Tang Weigang

@clawhub-tangweigang-jpg-8679fec286

Beancount Plaintext Ledger

Skill

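Beancount plain-text double-entry bookkeeping framework. Supports importing bank statements and transaction data, and automatically generates financial reports such as balance sheets and income statements.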

---
name: beancount-plaintext-ledger
description: |-
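  Beancount plain-text double-entry bookkeeping framework. Supports importing bank statements and transaction data, and automatically generates financial reports such as balance sheets and income statements.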
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-129"
  compiled_at: "2026-04-22T13:01:04.739311+00:00"
  capability_markets: "global"
  capability_activities: "accounting"
  sop_version: "crystal-compilation-v6.1"
---
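# Beancount Plaintext Ledger (beancount-plaintext-ledger)

> Beancount plain-text double-entry bookkeeping framework. Supports importing bank statements and transaction data, and automatically generates financial reports such as balance sheets and income statements.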

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (2 total)

### Beancount Test Utilities Framework (`UC-101`)
Provides reusable testing utilities for beancount test scripts including temporary directory management and test file creation for integration testing
**Triggers**: testing utilities, tempdir, test files

### Test Utils Validation Suite (`UC-102`)
Unit tests that validate the correctness of test utility functions including temporary directory cleanup and test file generation for beancount test scripts.
**Triggers**: unit test, validation, test utilities

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (15 total)

- **`AP-ACCOUNTING-001`**: Using floating-point arithmetic for monetary amounts
- **`AP-ACCOUNTING-002`**: Skipping initialization calls before VM/script execution
- **`AP-ACCOUNTING-003`**: Mixing different asset types in monetary operations

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-129. Evidence verify ratio = 51.5% and audit fail total = 7. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative data (source-of-truth) | Required reading when a behavior or decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-129` blueprint at 2026-04-22T13:01:04.739311+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Test Utils Validation Suite
  - Beancount Test Utilities Framework
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-073--ledger (7)

### `AP-ACCOUNTING-002` — Skipping initialization calls before VM/script execution <sub>(high)</sub>

Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking financial operations entirely.

### `AP-ACCOUNTING-003` — Mixing different asset types in monetary operations <sub>(high)</sub>

Performing addition, subtraction, or take operations on amounts with different asset types produces invalid financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot be combined, leading to corrupted account balances and failed reconciliations.
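
One way to guard against this is to tag every number with its asset and refuse mixed arithmetic. The sketch below is illustrative only: `Money` is a hypothetical value type invented for this example, not a class from the ledger or Beancount codebases.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical value type: a number tagged with its asset or currency."""
    number: Decimal
    asset: str

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.asset != self.asset:
            # Refuse to combine different assets instead of producing a bogus sum.
            raise ValueError(f"cannot add {other.asset} to {self.asset}")
        return Money(self.number + other.number, self.asset)


usd_total = Money(Decimal("10.00"), "USD") + Money(Decimal("2.50"), "USD")

try:
    Money(Decimal("10.00"), "USD") + Money(Decimal("5.00"), "EUR")
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

Same-asset addition succeeds, while the USD plus EUR attempt raises before any balance can be corrupted.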

### `AP-ACCOUNTING-004` — Missing insufficient funds validation <sub>(high)</sub>

Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues in global markets.

### `AP-ACCOUNTING-005` — Non-atomic transaction commit/rollback <sub>(high)</sub>

Processing database operations without atomic commit/rollback leaves partial state when failures occur. This corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable for global regulatory compliance.
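
As a rough illustration of atomic commit/rollback, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and the account names are assumptions made for this example; the source ledger project uses its own storage layer.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, cents: int) -> None:
    """Move `cents` from src to dst; either both updates persist or neither does."""
    with conn:  # commits on success, rolls back if the body raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (cents, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (cents, dst)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 40)
after_ok = balances(conn)

try:
    transfer(conn, "alice", "bob", 500)  # overdraft: the debit is rolled back
except ValueError:
    pass
after_failed = balances(conn)
```

The failed transfer leaves both balances exactly as they were after the successful one, so no partial debit survives.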

### `AP-ACCOUNTING-006` — On-demand posting generation causing double-spending <sub>(high)</sub>

Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics and can result in significant financial losses.

### `AP-ACCOUNTING-007` — Log insertion after transaction commit breaking event sourcing <sub>(high)</sub>

Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for global financial compliance.

### `AP-ACCOUNTING-008` — Incomplete transaction log hash chaining <sub>(high)</sub>

Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
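
A minimal sketch of hash chaining, assuming SHA-256 over the previous hash concatenated with a canonical JSON encoding of the log payload. The function names and the payload shape are hypothetical; the actual ledger may serialize and hash its logs differently.

```python
import hashlib
import json


def chain_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous log's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def build_chain(payloads: list) -> list:
    hashes, prev = [], ""  # the first log has no predecessor
    for payload in payloads:
        prev = chain_hash(prev, payload)
        hashes.append(prev)
    return hashes


def verify_chain(payloads: list, hashes: list) -> bool:
    return build_chain(payloads) == hashes


logs = [{"id": 1, "amount": "10.00"}, {"id": 2, "amount": "-10.00"}]
hashes = build_chain(logs)

tampered = [{"id": 1, "amount": "99.00"}, logs[1]]
chain_intact = verify_chain(logs, hashes)
tamper_detected = not verify_chain(tampered, hashes)
```

Because each hash depends on its predecessor, editing the first record invalidates every later hash as well.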

## finance-bp-073--ledger, finance-bp-129--beancount (1)

### `AP-ACCOUNTING-001` — Using floating-point arithmetic for monetary amounts <sub>(high)</sub>

Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential financial losses. This violates the fundamental requirement that monetary calculations must be exact.
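
The drift is easy to reproduce with Python's standard library. This small example sums ten postings of 0.10 first as binary floats and then as `decimal.Decimal` values:

```python
from decimal import Decimal

# Ten postings of 0.10 each: binary floats drift, Decimal stays exact.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

float_is_exact = float_total == 1.0
decimal_is_exact = decimal_total == Decimal("1.00")
```

Here `float_is_exact` is `False` while `decimal_is_exact` is `True`. Note that `Decimal` values should be built from strings; `Decimal(0.1)` would inherit the float's representation error.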

## finance-bp-078--fava_investor (4)

### `AP-ACCOUNTING-009` — Incorrect row data access patterns on query results <sub>(high)</sub>

Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError since namedtuples only support attribute access. This breaks all module queries expecting attribute-style access, causing asset allocation, tax loss harvesting, and other critical financial computations to fail.
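
The failure can be reproduced with a plain `collections.namedtuple`. The `Row` type and its fields below are made up for illustration and are not the actual query result schema:

```python
from collections import namedtuple

# Hypothetical result row; real query rows have their own column names.
Row = namedtuple("Row", ["account", "balance"])
row = Row("Assets:Bank", "100.00 USD")

by_attribute = row.account  # supported
by_position = row[1]        # supported: namedtuples are still tuples

try:
    row["account"]          # string keys are not valid tuple indices
    dict_access_failed = False
except TypeError:
    dict_access_failed = True

# If dictionary access is really needed, convert explicitly.
as_dict = row._asdict()
```

When key-based access is unavoidable, `_asdict()` gives an explicit conversion instead of relying on subscripting.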

### `AP-ACCOUNTING-010` — Missing bidirectional inference for fund relationship declarations <sub>(medium)</sub>

When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax savings for investors.
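
A minimal sketch of the bidirectional inference, assuming declarations arrive as a mapping from a ticker to its declared partners. The function name, input shape, and example tickers are hypothetical and not fava_investor's API:

```python
from collections import defaultdict


def symmetric_partners(declared: dict) -> dict:
    """Return a partner map in which every declared A->B also yields B->A."""
    partners = defaultdict(set)
    for ticker, declared_partners in declared.items():
        for partner in declared_partners:
            partners[ticker].add(partner)
            partners[partner].add(ticker)  # inferred reverse direction
    return dict(partners)


partner_map = symmetric_partners({"AAA": ["BBB", "CCC"]})
```

After this pass, `BBB` and `CCC` each list `AAA` as a partner even though only the forward direction was declared.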

### `AP-ACCOUNTING-011` — Wash sale comparison within substantially identical groups <sub>(high)</sub>

Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings. This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax losses and offset capital gains.

### `AP-ACCOUNTING-012` — Missing substantially identical tickers in wash sale queries <sub>(high)</sub>

Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales of the original position.

## finance-bp-129--beancount (3)

### `AP-ACCOUNTING-013` — Using parsed entries with MISSING sentinel values for calculations <sub>(high)</sub>

Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures, compromising financial reporting accuracy.

### `AP-ACCOUNTING-014` — Underspecified interpolation with multiple missing values per currency <sub>(high)</sub>

Having more than one missing value per currency group creates an underdetermined system with no unique solution during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected accounts.
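
The constraint can be illustrated with a deliberately simplified model: the postings of one currency group are a list of numbers, with `None` marking a missing value. This is not Beancount's booking code; `fill_missing` is a hypothetical helper, and a plain `ValueError` stands in for the real interpolation error.

```python
from decimal import Decimal
from typing import List, Optional


def fill_missing(numbers: List[Optional[Decimal]]) -> List[Decimal]:
    """Fill at most one missing number so the group sums to zero."""
    missing = [i for i, n in enumerate(numbers) if n is None]
    if len(missing) > 1:
        # Two unknowns, one balancing equation: no unique solution.
        raise ValueError("more than one missing value in currency group")
    filled = list(numbers)
    if missing:
        known = sum((n for n in numbers if n is not None), Decimal("0"))
        filled[missing[0]] = -known
    return filled


solved = fill_missing([Decimal("100.00"), Decimal("-30.00"), None])

try:
    fill_missing([Decimal("100.00"), None, None])
    underdetermined_rejected = False
except ValueError:
    underdetermined_rejected = True
```

With one unknown, the balancing equation pins it down; with two, any split of the residual would balance, so the sketch refuses to guess.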

### `AP-ACCOUNTING-015` — Violating accounting identity in opening balance transactions <sub>(high)</sub>

Creating opening balance transactions where the total balance of summarized entries does not equal exactly zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be fundamentally incorrect with non-zero total assets and liabilities.
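
A simple pre-flight check for this identity is to sum the postings per currency and report anything that does not net to zero. The sketch assumes postings are `(number, currency)` pairs; `residual_by_currency` and the account-free data shape are illustrative, not part of Beancount.

```python
from collections import defaultdict
from decimal import Decimal


def residual_by_currency(postings) -> dict:
    """Return the non-zero residual for each currency; empty means balanced."""
    totals = defaultdict(Decimal)  # Decimal() is exactly zero
    for number, currency in postings:
        totals[currency] += number
    return {cur: total for cur, total in totals.items() if total != 0}


balanced = residual_by_currency(
    [(Decimal("500.00"), "USD"), (Decimal("-500.00"), "USD")]
)
unbalanced = residual_by_currency(
    [(Decimal("500.00"), "USD"), (Decimal("-495.00"), "USD")]
)
```

An empty result means the opening transaction balances; any entry in the result names the currency and the amount by which it is off.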

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-129--beancount
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 19, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [parsing](components/parsing.md): 4 classes
- [booking_(lot_matching)](components/booking_-lot_matching.md): 3 classes
- [transformation_(plugins)](components/transformation_-plugins.md): 3 classes
- [realization](components/realization.md): 3 classes
- [summarization](components/summarization.md): 3 classes
- [validation](components/validation.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 116
  fatal_constraints_count: 38
  non_fatal_constraints_count: 146
  use_cases_count: 2
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
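
As an illustration of how a lock such as `SL-05` could be enforced in code, the sketch below validates that exactly one sizing field is set. `Signal` is a hypothetical stand-in defined for this example; it is not ZVT's actual `TradingSignal` class, and only the three field names come from the lock text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical stand-in for a trading signal; not ZVT's actual class."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        sizing = (self.position_pct, self.order_money, self.order_amount)
        provided = sum(value is not None for value in sizing)
        if provided != 1:
            raise ValueError(f"expected exactly one sizing field, got {provided}")


def is_valid(**fields) -> bool:
    try:
        Signal(**fields)
        return True
    except ValueError:
        return False


one_field = is_valid(position_pct=0.5)
no_field = is_valid()
two_fields = is_valid(position_pct=0.5, order_money=1000.0)
```

Only the single-field case passes; both the empty signal and the doubly specified one are rejected at construction time.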

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **2**

## `KUC-101`
**Source**: `beancount/utils/test_utils.py`

Provides reusable testing utilities for beancount test scripts including temporary directory management and test file creation for integration testing.

## `KUC-102`
**Source**: `beancount/utils/test_utils_test.py`

Unit tests that validate the correctness of test utility functions including temporary directory cleanup and test file generation for beancount test scripts.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-ACCOUNTING-001` — Use exact-precision integer types for monetary representation
**From**: finance-bp-073--ledger, finance-bp-129--beancount · **Applicable to**: accounting

Both the Numscript ledger and the Beancount parser mandate exact-precision types instead of floating point: Decimal (beancount) or MonetaryInt based on big.Int (ledger). This pattern ensures no rounding errors accumulate in financial calculations, which is critical for audit compliance in global markets.

## `CW-ACCOUNTING-002` — Mandatory initialization sequence before execution
**From**: finance-bp-073--ledger · **Applicable to**: accounting

The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful state setup—always verify prerequisites before running financial logic.

## `CW-ACCOUNTING-003` — Dual idempotency key strategy
**From**: finance-bp-073--ledger · **Applicable to**: accounting

Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly succeed. Single-key approaches leave gaps in financial transaction safety.
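
The interplay of the two values can be sketched with an in-memory store: the key identifies the request, and a hash of the parameters detects reuse of that key with a different body. `IdempotentExecutor`, `IdempotencyConflict`, and the SHA-256-over-JSON hashing are assumptions for this example, not the ledger's implementation.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class IdempotentExecutor:
    """Hypothetical in-memory sketch; a real ledger would persist these records."""

    def __init__(self):
        self._records = {}  # key -> (request hash, stored result)

    def submit(self, key, params, handler):
        digest = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key in self._records:
            stored_digest, result = self._records[key]
            if stored_digest != digest:
                raise IdempotencyConflict(key)
            return result  # exact retry: replay the stored result
        result = handler(params)
        self._records[key] = (digest, result)
        return result


calls = []


def post_transaction(params):
    calls.append(params)
    return {"txid": len(calls)}


executor = IdempotentExecutor()
first = executor.submit("req-1", {"amount": "10.00"}, post_transaction)
retry = executor.submit("req-1", {"amount": "10.00"}, post_transaction)

try:
    executor.submit("req-1", {"amount": "99.00"}, post_transaction)
    conflict_detected = False
except IdempotencyConflict:
    conflict_detected = True
```

The exact retry replays the stored result without running the handler again, while the altered request under the same key is rejected rather than silently treated as a duplicate.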

## `CW-ACCOUNTING-004` — Log-before-commit event sourcing pattern
**From**: finance-bp-073--ledger · **Applicable to**: accounting

In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical for regulatory compliance in global accounting.

## `CW-ACCOUNTING-005` — Read Committed isolation with FOR UPDATE locks
**From**: finance-bp-073--ledger · **Applicable to**: accounting

When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks. This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail due to insufficient funds), ensuring data integrity under concurrent load.

## `CW-ACCOUNTING-006` — Transitive closure for equivalence relationships
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting

When building commodity groups or substantially identical fund relationships, apply transitive closure to infer complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection and TLH calculations are complete and accurate across all declared relationships.
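
One common way to compute such a closure is a union-find (disjoint-set) structure over the declared pairs. The sketch below is a generic illustration with placeholder tickers; `equivalence_groups` is a hypothetical helper, not the function fava_investor uses.

```python
def equivalence_groups(pairs):
    """Group tickers so that declared pairs are closed under transitivity."""
    parent = {}

    def find(ticker):
        parent.setdefault(ticker, ticker)
        while parent[ticker] != ticker:
            parent[ticker] = parent[parent[ticker]]  # path halving
            ticker = parent[ticker]
        return ticker

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for ticker in list(parent):
        groups.setdefault(find(ticker), set()).add(ticker)
    return sorted(sorted(group) for group in groups.values())


groups = equivalence_groups([("A", "B"), ("B", "C"), ("X", "Y")])
```

`A` and `C` end up in the same group even though no pair links them directly, which is the behavior the paragraph above asks for.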

## `CW-ACCOUNTING-007` — Canonical representative selection for relationship groups
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting

When selecting a representative for a substantially identical fund group, always return the same representative ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where the same ticker gets different partners depending on which group member is queried.

## `CW-ACCOUNTING-008` — Immutable monetary objects with __slots__
**From**: finance-bp-129--beancount · **Applicable to**: accounting

Constructing Amount or Position objects using immutable Decimal values with __slots__ = () pattern prevents accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent throughout transaction processing and audit trails.
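
The pattern can be demonstrated with a `namedtuple` subclass that declares empty `__slots__`. This `Amount` is an illustrative stand-in written for this example, not a copy of Beancount's own class definition.

```python
from collections import namedtuple
from decimal import Decimal

_AmountBase = namedtuple("_AmountBase", ["number", "currency"])


class Amount(_AmountBase):
    """Illustrative stand-in, not Beancount's own class definition."""
    __slots__ = ()  # no per-instance __dict__, so no stray attributes


def try_set(obj, name, value) -> bool:
    """Return True if the attribute assignment succeeded."""
    try:
        setattr(obj, name, value)
        return True
    except AttributeError:
        return False


amount = Amount(Decimal("12.34"), "USD")
field_mutated = try_set(amount, "number", Decimal("0"))
attr_added = try_set(amount, "note", "oops")
```

Both assignments fail: tuple fields are read-only, and the empty `__slots__` removes the instance dictionary that would otherwise accept new attributes.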

## `CW-ACCOUNTING-009` — Eliminate all MISSING values before presenting parsed data as complete
**From**: finance-bp-129--beancount · **Applicable to**: accounting

Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance calculations or realized/unrealized gains computation.

## `CW-ACCOUNTING-010` — Strict schema compatibility across class hierarchies
**From**: finance-bp-078--fava_investor, finance-bp-129--beancount · **Applicable to**: accounting

When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared for the base class, breaking wash sale detection and TLH recommendations.

FILE:references/components/booking_-lot_matching.md
# booking_(lot_matching) (3 classes)

## `Inventory.reduce`
`booking_(lot_matching)/inventory-reduce.py:0`

## `booking_method_STRICT`
`booking_(lot_matching)/booking-method-strict.py:0`

## `booking_method_fn`
`booking_(lot_matching)/booking-method-fn.py:0`

FILE:references/components/parsing.md
# parsing (4 classes)

## `Builder.build`
`parsing/builder-build.py:0`

## `OptDesc.convert`
`parsing/optdesc-convert.py:0`

## `booking_method`
`parsing/booking-method.py:0`

## `plugin`
`parsing/plugin.py:0`

FILE:references/components/realization.md
# realization (3 classes)

## `RealAccount.txn_postings`
`realization/realaccount-txn-postings.py:0`

## `Amount.__slots__`
`realization/amount-slots.py:0`

## `balance_reducer`
`realization/balance-reducer.py:0`

FILE:references/components/summarization.md
# summarization (3 classes)

## `AccountTypes.equity`
`summarization/accounttypes-equity.py:0`

## `summarize.open`
`summarization/summarize-open.py:0`

## `conversion_currency`
`summarization/conversion-currency.py:0`

FILE:references/components/transformation_-plugins.md
# transformation_(plugins) (3 classes)

## `DocumentError.check`
`transformation_(plugins)/documenterror-check.py:0`

## `PadError.check`
`transformation_(plugins)/paderror-check.py:0`

## `plugin_module`
`transformation_(plugins)/plugin-module.py:0`

FILE:references/components/validation.md
# validation (3 classes)

## `ValidationError.check`
`validation/validationerror-check.py:0`

## `validate_open_close`
`validation/validate-open-close.py:0`

## `extra_validations`
`validation/extra-validations.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-129-v5.3
  version: v6.1
  blueprint_id: finance-bp-129
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:01:04.739311+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - accounting
  upgraded_from: finance-bp-129-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:35.880096+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-129--beancount/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-129--beancount/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-ACCOUNTING-001
  title: Using floating-point arithmetic for monetary amounts
  description: Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic
    operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential
    financial losses. This violates the fundamental requirement that monetary calculations must be exact.
  project_source: finance-bp-073--ledger, finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-002
  title: Skipping initialization calls before VM/script execution
  description: Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized
    or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking
    financial operations entirely.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-003
  title: Mixing different asset types in monetary operations
  description: Performing addition, subtraction, or take operations on amounts with different asset types produces invalid
    financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot
    be combined, leading to corrupted account balances and failed reconciliations.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-004
  title: Missing insufficient funds validation
  description: Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond
    permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues
    in global markets.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-005
  title: Non-atomic transaction commit/rollback
  description: Processing database operations without atomic commit/rollback leaves partial state when failures occur. This
    corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable
    for global regulatory compliance.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-006
  title: On-demand posting generation causing double-spending
  description: Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent
    funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics
    and can result in significant financial losses.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-007
  title: Log insertion after transaction commit breaking event sourcing
  description: Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to
    accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for
    global financial compliance.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-008
  title: Incomplete transaction log hash chaining
  description: Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows
    undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-009
  title: Incorrect row data access patterns on query results
  description: Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError since namedtuples
    only support attribute access. This breaks all module queries expecting attribute-style access, causing asset allocation,
    tax loss harvesting, and other critical financial computations to fail.
  project_source: finance-bp-078--fava_investor
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-010
  title: Missing bidirectional inference for fund relationship declarations
  description: When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads
    to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax
    savings for investors.
  project_source: finance-bp-078--fava_investor
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-011
  title: Wash sale comparison within substantially identical groups
  description: Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings.
    This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax
    losses and offset capital gains.
  project_source: finance-bp-078--fava_investor
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-012
  title: Missing substantially identical tickers in wash sale queries
  description: Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar
    funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales
    of the original position.
  project_source: finance-bp-078--fava_investor
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-013
  title: Using parsed entries with MISSING sentinel values for calculations
  description: Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes
    runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures,
    compromising financial reporting accuracy.
  project_source: finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-014
  title: Underspecified interpolation with multiple missing values per currency
  description: Having more than one missing value per currency group creates an underdetermined system with no unique solution
    during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected
    accounts.
  project_source: finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-015
  title: Violating accounting identity in opening balance transactions
  description: Creating opening balance transactions where the balances of the summarized entries do not sum to exactly
    zero violates the fundamental accounting identity (Assets = Liabilities + Equity). The resulting balance sheet is
    fundamentally incorrect because total assets no longer equal total liabilities plus equity.
  project_source: finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
cross_project_wisdom:
- wisdom_id: CW-ACCOUNTING-001
  source_project: finance-bp-073--ledger, finance-bp-129--beancount
  pattern_name: Use exact-precision numeric types for monetary representation
  description: Both the Numscript ledger and the Beancount parser mandate exact-precision types, Decimal (beancount) or MonetaryInt
    based on big.Int (ledger), instead of floating-point. This pattern ensures that no rounding errors accumulate in financial
    calculations, which is critical for audit compliance in global markets.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-002
  source_project: finance-bp-073--ledger
  pattern_name: Mandatory initialization sequence before execution
  description: 'The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must
    both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful
    state setup—always verify prerequisites before running financial logic.'
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-003
  source_project: finance-bp-073--ledger
  pattern_name: Dual idempotency key strategy
  description: 'Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey
    prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly
    succeed. Single-key approaches leave gaps in financial transaction safety.'
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-004
  source_project: finance-bp-073--ledger
  pattern_name: Log-before-commit event sourcing pattern
  description: In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain
    event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical
    for regulatory compliance in global accounting.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-005
  source_project: finance-bp-073--ledger
  pattern_name: Read Committed isolation with FOR UPDATE locks
  description: When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks.
    This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail
    due to insufficient funds), ensuring data integrity under concurrent load.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-006
  source_project: finance-bp-078--fava_investor
  pattern_name: Transitive closure for equivalence relationships
  description: When building commodity groups or substantially identical fund relationships, apply transitive closure to infer
    complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection
    and TLH calculations are complete and accurate across all declared relationships.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-007
  source_project: finance-bp-078--fava_investor
  pattern_name: Canonical representative selection for relationship groups
  description: When selecting a representative for a substantially identical fund group, always return the same representative
    ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where
    the same ticker gets different partners depending on which group member is queried.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-008
  source_project: finance-bp-129--beancount
  pattern_name: Immutable monetary objects with __slots__
  description: Constructing Amount or Position objects from immutable Decimal values with the __slots__ = () pattern prevents
    accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent
    throughout transaction processing and audit trails.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-009
  source_project: finance-bp-129--beancount
  pattern_name: Eliminate all MISSING values before presenting parsed data as complete
  description: Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All
    MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance
    calculations or realized/unrealized gains computation.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-010
  source_project: finance-bp-078--fava_investor, finance-bp-129--beancount
  pattern_name: Strict schema compatibility across class hierarchies
  description: When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain
    compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared
    for the base class, breaking wash sale detection and TLH recommendations.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: beancount/utils/test_utils.py
  business_problem: Provides reusable testing utilities for beancount test scripts including temporary directory management
    and test file creation for integration testing.
  intent_keywords:
  - testing utilities
  - tempdir
  - test files
  - mock repository
  - integration testing
  stage: testing
  data_domain: internal
  type: builtin_factor
- kuc_id: KUC-102
  source_file: beancount/utils/test_utils_test.py
  business_problem: Unit tests that validate the correctness of test utility functions including temporary directory cleanup
    and test file generation for beancount test scripts.
  intent_keywords:
  - unit test
  - validation
  - test utilities
  - tempdir cleanup
  - test file generation
  stage: testing
  data_domain: internal
  type: builtin_factor
component_capability_map:
  project: finance-bp-129--beancount
  scan_date: '2026-04-22'
  stats:
    total_files: 6
    total_classes: 19
    total_functions: 0
    total_stages: 6
  modules:
    parsing:
      class_count: 4
      stage_id: parsing
      stage_order: 1
      responsibility: Tokenize and parse Beancount DSL files into directive data structures. Provides the foundation for all
        downstream processing.
      classes:
      - name: Builder.build
        file: parsing/builder-build.py
        line: 0
        kind: required_method
        signature: ''
      - name: OptDesc.convert
        file: parsing/optdesc-convert.py
        line: 0
        kind: required_method
        signature: ''
      - name: booking_method
        file: parsing/booking-method.py
        line: 0
        kind: replaceable_point
      - name: plugin
        file: parsing/plugin.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    booking_(lot_matching):
      class_count: 3
      stage_id: booking
      stage_order: 2
      responsibility: Match inventory reductions to existing lots using configurable methods; infer missing posting amounts
        via interpolation
      classes:
      - name: Inventory.reduce
        file: booking_(lot_matching)/inventory-reduce.py
        line: 0
        kind: required_method
        signature: ''
      - name: booking_method_STRICT
        file: booking_(lot_matching)/booking-method-strict.py
        line: 0
        kind: required_method
        signature: ''
      - name: booking_method_fn
        file: booking_(lot_matching)/booking-method-fn.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    transformation_(plugins):
      class_count: 3
      stage_id: transformation
      stage_order: 3
      responsibility: Apply user plugins and built-in transformations to synthesized entries, pad balances, check assertions
      classes:
      - name: DocumentError.check
        file: transformation_(plugins)/documenterror-check.py
        line: 0
        kind: required_method
        signature: ''
      - name: PadError.check
        file: transformation_(plugins)/paderror-check.py
        line: 0
        kind: required_method
        signature: ''
      - name: plugin_module
        file: transformation_(plugins)/plugin-module.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    realization:
      class_count: 3
      stage_id: realization
      stage_order: 4
      responsibility: Convert chronological list of directives into account tree with running balances for reporting
      classes:
      - name: RealAccount.txn_postings
        file: realization/realaccount-txn-postings.py
        line: 0
        kind: required_method
        signature: ''
      - name: Amount.__slots__
        file: realization/amount-slots.py
        line: 0
        kind: required_method
        signature: ''
      - name: balance_reducer
        file: realization/balance-reducer.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    summarization:
      class_count: 3
      stage_id: summarization
      stage_order: 5
      responsibility: Fold historical entries into balance sheet opening transactions; support period reporting
      classes:
      - name: AccountTypes.equity
        file: summarization/accounttypes-equity.py
        line: 0
        kind: required_method
        signature: ''
      - name: summarize.open
        file: summarization/summarize-open.py
        line: 0
        kind: required_method
        signature: ''
      - name: conversion_currency
        file: summarization/conversion-currency.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    validation:
      class_count: 3
      stage_id: validation
      stage_order: 6
      responsibility: Verify invariants hold after each transformation; ensure accounting rules are not violated
      classes:
      - name: ValidationError.check
        file: validation/validationerror-check.py
        line: 0
        kind: required_method
        signature: ''
      - name: validate_open_close
        file: validation/validate-open-close.py
        line: 0
        kind: required_method
        signature: ''
      - name: extra_validations
        file: validation/extra-validations.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.5154639175257731
    evidence_invalid: 47
    evidence_verified: 50
    evidence_auto_fixed: 0
    audit_coverage: 29/29 (100%)
    audit_pass_rate: 7/29 (24%)
    audit_fail_total: 7
    audit_finance_universal:
      pass: 4
      warn: 8
      fail: 4
    audit_subdomain_totals:
      pass: 3
      warn: 7
      fail: 3
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-129. Evidence verify ratio
    = 51.5% and audit fail total = 7. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-129-v6.1.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Beancount Test Utilities Framework
    positive_terms:
    - testing utilities
    - tempdir
    - test files
    - mock repository
    - integration testing
    data_domain: internal
    negative_terms:
    - trading strategy
    - screening
    - live trading
    - data pipeline
    - monitoring
    - reporting
    ambiguity_question: Are you looking for reusable testing utilities for your beancount project, or are you looking for
      a specific trading, screening, or data processing use case?
  - uc_id: UC-102
    name: Test Utils Validation Suite
    positive_terms:
    - unit test
    - validation
    - test utilities
    - tempdir cleanup
    - test file generation
    data_domain: internal
    negative_terms:
    - trading signals
    - portfolio screening
    - data ingestion
    - live execution
    - performance reporting
    ambiguity_question: Are you looking for test coverage of beancount utilities, or do you need a specific trading, screening,
      or analytical use case?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
    The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain at least one of the 'filter_result' or 'score_result' columns
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 116
    fatal_constraints_count: 38
    non_fatal_constraints_count: 146
    use_cases_count: 2
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes; otherwise
        initialization fails with an assertion error (fatal violation finance-C-001).'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 37 source groups: API Usage(1),
        Architecture(1), Caching(1), Compatibility(1), Concurrency(1), Configuration(2), and 31 more.'
      key_decisions: 116 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-GAP-013
      type: B
      summary: Batch update API is used for transforming links in Google Docs
    - id: BD-GAP-007
      type: T
      summary: Index document discovery pattern uses a known index document to find all linked documentation
    - id: BD-GAP-006
      type: B/DK
      summary: File-based caching is used for Google Drive API responses to avoid repeated downloads
    - id: BD-GAP-008
      type: B
      summary: Dual VCS support (Git AND Mercurial) for extracting file modification years in copyright update
    - id: BD-GAP-005
      type: B
      summary: threading.local storage is used to save function call returns during regexp matching
    - id: BD-GAP-010
      type: T
      summary: Google Docs is used as the storage mechanism for configuration options
    - id: BD-GAP-011
      type: B
      summary: Redirect file pattern is used to lookup Google Doc IDs dynamically
    - id: BD-GAP-002
      type: B/DK
      summary: MIME type parameter controls which document formats are downloaded from Google Drive
    - id: BD-GAP-004
      type: B
      summary: Block-based document processing preserves blockquotes during DOCX-to-RST conversion
    - id: BD-GAP-016
      type: B
      summary: Regexp-based file selection with ignore directories for copyright update operations
    - id: BD-GAP-003
      type: T
      summary: reStructuredText (RST) is the target format for documentation conversion, not Markdown
    - id: BD-GAP-009
      type: T
      summary: Year compression transforms sequential years into interval notation (2018-2020) in copyright notices
    - id: BD-GAP-001
      type: BA
      summary: Google Drive API with service account authentication is used instead of OAuth user flow for document downloads
    - id: BD-GAP-015
      type: DK
      summary: Test completeness is determined by file existence rather than code coverage analysis
    - id: BD-GAP-012
      type: B
      summary: Benchmark execution is prevented when uncommitted local changes exist
    - id: BD-GAP-014
      type: B
      summary: Dry-run mode is supported for link transformations to preview changes before applying
    - id: BD-GAP-017
      type: T
      summary: Pandoc is used as the DOCX-to-RST conversion engine
    - id: BD-028
      type: B
      summary: Accounts use hierarchical colon-separated naming with 5 standard types (Assets, Liabilities, Equity, Income,
        Expenses)
    - id: BD-029
      type: B/BA
      summary: Equity contains 'Opening-Balances' and 'Current-Earnings' standard sub-accounts
    - id: BD-050
      type: B
      summary: Open entries mark accounts as active with optional currencies and booking method
    - id: BD-051
      type: B
      summary: Close entries mark accounts as inactive after the specified date
    - id: BD-063
      type: B
      summary: Use date + 1 day offset for balance check placement
    - id: BD-064
      type: B/BA
      summary: Zero balance assertion for position verification
    - id: BD-065
      type: B/BA
      summary: 'Metadata boolean ''closing: TRUE'' as trigger'
    - id: BD-066
      type: B
      summary: Balance check date verification equals original + 1 day
    - id: BD-067
      type: B/BA
      summary: Extra tolerance multiplier for dual constraint satisfaction
    - id: BD-068
      type: B/BA
      summary: Expected proceeds = price * (-units) for short/long positions
    - id: BD-069
      type: B
      summary: Proceeds accumulation for non-income accounts
    - id: BD-070
      type: B/DK
      summary: Currency-by-currency inventory comparison
    - id: BD-071
      type: B/BA
      summary: Absolute difference tolerance check
    - id: BD-072
      type: B/BA
      summary: Tolerance inference per currency from postings
    - id: BD-073
      type: B
      summary: Proceeds inventory accumulation using weight
    - id: BD-074
      type: B/RC
      summary: Require every posting held at cost to have a price for validation
    - id: BD-075
      type: B
      summary: Error when proceeds inventory has unmatched currencies
    - id: BD-076
      type: B
      summary: Proceeds types include equity (for stock vesting)
    - id: BD-077
      type: B/BA
      summary: No errors expected for balanced multi-leg sale
    - id: BD-078
      type: B/BA
      summary: SellGainsError on unbalanced cash vs cost
    - id: BD-079
      type: B
      summary: Dual error type on imbalance with missing expense
    - id: BD-080
      type: B
      summary: Other currency (CAD) proceeds accepted as valid
    - id: BD-081
      type: B
      summary: Zero price sale accepted as valid
    - id: BD-039
      type: B
      summary: Weight calculation uses cost for lots, units for non-cost postings, explicit price overrides
    - id: BD-026
      type: B/BA
      summary: Interpolation tolerance is 0.005 (0.5% of balance) for balance assertions
    - id: BD-027
      type: B
      summary: Balance assertions use date-ordered preceding postings for interpolation
    - id: BD-053
      type: B/BA
      summary: Pad entries create balance between two dates using interpolation from preceding entries
    - id: BD-005
      type: BA
      summary: 'Two-pass booking: reductions first, then augmentations'
    - id: BD-006
      type: B
      summary: CostSpec separates incomplete from resolved costs
    - id: BD-007
      type: BA
      summary: Booking method dispatch via _BOOKING_METHODS dict
    - id: BD-008
      type: M/DK
      summary: STRICT_WITH_SIZE fallback for size-exact matches
    - id: BD-009
      type: B
      summary: MISSING sentinel for unfilled numbers (not None)
    - id: BD-025
      type: B/BA
      summary: 'Booking method defaults to ''NONE'' (strict mode: no currency mixing without explicit cost)'
    - id: BD-058
      type: B/RC
      summary: Position lot merging requires identical cost basis (number, currency, date, label)
    - id: BD-059
      type: B
      summary: 'Inventory addition: same-lot positions combine, different lots coexist in inventory'
    - id: BD-043
      type: B
      summary: 'Vesting example: 4-year vesting with 1-year cliff, monthly vesting thereafter'
    - id: BD-044
      type: B
      summary: 'Trading simulation: sell biggest winner or biggest loser based on random 50/50 selection'
    - id: BD-045
      type: B
      summary: 'Trading simulation: skip selling on days when buying (avoid same-day round-trip)'
    - id: BD-046
      type: B
      summary: 'Trading simulation: skip lots without price movement when selecting sell candidates'
    - id: BD-056
      type: B
      summary: Commission tracked as separate expense line (Expenses:Financial:Commissions) in trading
    - id: BD-057
      type: B
      summary: Vesting calculation uses EXACT decimal arithmetic (Decimal type) for precision
    - id: BD-082
      type: B/BA
      summary: 'INTERACTION: BD-001 × BD-003 → Parser line tracking enables secondary sort key for same-day ordering'
    - id: BD-083
      type: BA
      summary: 'INTERACTION: BD-023 × BD-026 → Balanced postings invariant vs tolerance creates tolerance boundary ambiguity'
    - id: BD-084
      type: B/BA
      summary: 'INTERACTION: BD-025 × BD-008 → Strict booking default vs STRICT_WITH_SIZE fallback creates behavioral inconsistency'
    - id: BD-085
      type: BA
      summary: 'INTERACTION: BD-057 × BD-026 → Decimal precision vs tolerance existence reveals incomplete Decimal coverage'
    - id: BD-086
      type: B/RC
      summary: 'INTERACTION: BD-006 × BD-005 × BD-031 → CostSpec incompleteness flows through two-pass booking to multi-lot
        inventory'
    - id: BD-087
      type: B
      summary: 'INTERACTION: BD-017 × BD-029 × BD-040 → Equity rollforward requires standard accounts and consistent date
        ordering'
    - id: BD-088
      type: BA
      summary: 'INTERACTION: BD-037 × BD-038 × BD-036 × BD-021 → Currency conversion chain creates systemic conversion dependency'
    - id: BD-089
      type: B/BA
      summary: 'INTERACTION: BD-010 × BD-011 × BD-012 × BD-032 → Plugin pipeline ordering creates invariant check dependencies'
    - id: BD-090
      type: B/BA
      summary: 'INTERACTION: BD-068 × BD-073 × BD-070 × BD-075 → Proceeds validation cascade from negation through currency
        matching'
    - id: BD-091
      type: B/BA
      summary: 'INTERACTION: BD-013 × BD-014 × BD-030 → Realization tree structure assumptions enable hierarchical aggregation'
    - id: BD-092
      type: B
      summary: 'INTERACTION: BD-002 × BD-061 → Immutable directives enable safe post-modification invariant checking'
    - id: BD-093
      type: BA
      summary: 'RISK CASCADE: Price lookup failure → multi-hop failure → constraint violation → validation error'
    - id: BD-094
      type: BA
      summary: 'RISK CASCADE: Incomplete CostSpec → booking resolution failure → incorrect lot assignment → wrong cost basis'
    - id: BD-095
      type: BA
      summary: 'RISK CASCADE: Plugin mode change → fixed ordering bypass → invariant check miss → silent corruption'
    - id: BD-096
      type: BA
      summary: 'RISK CASCADE: Proceeds negation error → weighted accumulation wrong → per-currency pass → false validation
        pass'
    - id: BD-052
      type: B
      summary: Document directives provide optional source file references for entries
    - id: BD-001
      type: B/DK
      summary: Lexer/Parser split using PLY (Python Lex-Yacc)
    - id: BD-002
      type: B
      summary: Directives are immutable NamedTuples
    - id: BD-003
      type: B/RC
      summary: Every directive requires a date; lineno is the secondary sort key
    - id: BD-004
      type: BA
      summary: Options parsed per-file with aggregation for includes
    - id: BD-GAP-018
      type: DK
      summary: 'Missing: Timezone explicit annotation + UTC normalization'
    - id: BD-GAP-019
      type: B
      summary: 'Missing: Provider Priority & Credential Isolation'
    - id: BD-GAP-020
      type: RC
      summary: 'Missing: Delinquency Definition (DPD 30/60/90)'
    - id: BD-032
      type: B
      summary: 'Plugin pipeline runs in order: check_closing first, then other plugins, then close_tree last'
    - id: BD-033
      type: B
      summary: sellgains plugin moves realized gains to income when cost basis exceeds proceeds
    - id: BD-048
      type: B
      summary: check_closing plugin verifies closing entries match computed balances
    - id: BD-049
      type: B
      summary: close_tree plugin removes empty account subtrees after all processing
    - id: BD-024
      type: B/RC
      summary: Lots are identified by (account, currency, cost_spec) triple including number, currency, date, and label
    - id: BD-031
      type: B
      summary: Inventory holds multiple lots per (account, currency) with set-based equality
    - id: BD-036
      type: B/BA
      summary: Cost currency defaults to the same currency as the lot's units when no explicit cost currency specified
    - id: BD-042
      type: B/RC
      summary: Cost label (optional) allows distinguishing lots with same date/currency/amount
    - id: BD-047
      type: B
      summary: Inventory reduce uses (amount, currency) key for aggregation across lots
    - id: BD-037
      type: B
      summary: Price lookup uses (base, quote) currency tuple as key for rate retrieval
    - id: BD-038
      type: B
      summary: Implied price conversion via intermediate currency hops when direct rate unavailable
    - id: BD-013
      type: B
      summary: RealAccount extends dict with account component keys
    - id: BD-014
      type: B
      summary: Balance stored as Inventory per RealAccount
    - id: BD-015
      type: B
      summary: Balances computed via Inventory.reduce with convert functions
    - id: BD-016
      type: B
      summary: Amount uses __slots__ = () to prevent dynamic attributes
    - id: BD-030
      type: B/DK
      summary: Realization creates complete tree from root 'Root' down through 4 account levels
    - id: BD-034
      type: B
      summary: Summarization uses previous-day dates to maintain chronological ordering in reports
    - id: BD-035
      type: B
      summary: Transfer entries use date minus 1 day to precede cutoff date for account transfers
    - id: BD-040
      type: B
      summary: 'Clamp operation: income/expenses to equity at begin_date, summarize period, truncate after end_date, convert
        at end'
    - id: BD-041
      type: B
      summary: Balance assertions following transferred accounts are removed from the new account
    - id: BD-054
      type: B
      summary: Date range filtering in exports uses half-open interval [begin, end)
    - id: BD-055
      type: B
      summary: Treeify expands flat account lists into nested dict structure for hierarchical reporting
    - id: BD-060
      type: B
      summary: Realization tree contains both aggregated balances and per-node lot lists
    - id: BD-017
      type: BA
      summary: Income/expenses rolled to equity at period boundaries
    - id: BD-018
      type: B
      summary: Conversion entries at open date for zero-priced items
    - id: BD-019
      type: M
      summary: GetAccounts class uses getattr dispatch on entry class name
    - id: BD-062
      type: B
      summary: 'Position scaling: negate position units to create selling lot from buying position'
    - id: BD-010
      type: BA
      summary: 'Plugin processing mode: raw vs default'
    - id: BD-011
      type: B/DK
      summary: Documents plugin is always prepended
    - id: BD-012
      type: B
      summary: pad and balance are always appended as post plugins
    - id: BD-020
      type: B
      summary: Open/close account lifecycle validation
    - id: BD-021
      type: B
      summary: Currency constraints from Open declaration
    - id: BD-022
      type: BA/DK
      summary: Extra validations injected at load time
    - id: BD-023
      type: B
      summary: 'Double-entry accounting: every transaction must have balanced postings (sum to zero)'
    - id: BD-061
      type: B
      summary: 'Invariant checking: verify_balance_interpolation called after every entry modification'
resources:
  packages:
  - name: click >=7.0
    version_pin: latest
  - name: python-dateutil >=2.6.0
    version_pin: latest
  - name: regex >=2022.9.13
    version_pin: latest
  - name: flex / winflexbison-bin >=2.6.4
    version_pin: latest
  - name: bison-bin / winflexbison-bin >=3.8.0
    version_pin: latest
  - name: meson >=1.2.1
    version_pin: latest
  - name: meson-python >=0.14.0
    version_pin: latest
  - name: pytest
    version_pin: latest
  - name: mypy
    version_pin: latest
  - name: types-python-dateutil
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install "click>=7.0"
    - python3 -m pip install "python-dateutil>=2.6.0"
    - python3 -m pip install "regex>=2022.9.13"
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing monetary calculations in the parsing stage
    action: use Decimal type from beancount.core.number instead of floating-point types
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Floating-point arithmetic produces rounding errors that accumulate in financial calculations, leading to
      incorrect account balances and audit failures
    stage_ids:
    - parsing
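  # Illustrative sketch for finance-C-001 (not part of the compiled constraint; assumes
  # beancount is installed and that D is imported from beancount.core.number):
  #   from beancount.core.number import D
  #   exact = D("0.10") + D("0.20")      # Decimal arithmetic; exact == D("0.30") holds
  #   inexact = 0.1 + 0.2                # float arithmetic; inexact == 0.3 does not hold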
  - id: finance-C-002
    when: When parsing directives from Beancount source files
    action: require every directive to have a valid date field as primary identifier
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Directives without dates cannot be temporally ordered, causing incorrect balance calculations and non-deterministic
      transaction sequencing
    stage_ids:
    - parsing
  - id: finance-C-003
    when: When encoding Beancount source files for parsing
    action: use UTF-8 encoding exclusively for all input files
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-UTF-8 encoded files cause character decoding errors, preventing valid entries from being parsed and resulting
      in lost transaction data
    stage_ids:
    - parsing
  - id: finance-C-009
    when: When using parser output directly for financial calculations
    action: use parsed entries that contain MISSING sentinel values for balance or cost computations
    severity: fatal
    kind: operational_lesson
    modality: must_not
    consequence: MISSING values in postings cause runtime errors or silent zero-value calculations, resulting in incorrect
      portfolio valuations and reconciliation failures
    stage_ids:
    - parsing
  - id: finance-C-014
    when: When presenting parsed data as completed ledger output
    action: claim that parsed entries are complete and ready for financial reporting without running booking
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Presenting incomplete parsed entries as final results misleads users into using entries with unresolved MISSING
      values and uninterpolated amounts for financial decisions
    stage_ids:
    - parsing
  - id: finance-C-017
    when: When processing inventory reduction postings
    action: Match reductions against existing lots before applying interpolation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Interpolation cannot succeed for reductions with missing price/cost because the booking method (FIFO/LIFO/etc.)
      must first determine which lot is being reduced
    stage_ids:
    - booking
  - id: finance-C-018
    when: When validating transaction completeness after booking
    action: Eliminate all MISSING values from posting units and costs
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Postings with incomplete amounts cannot be used for balance calculations or realized/unrealized gains computation
    stage_ids:
    - booking
  - id: finance-C-019
    when: When interpolating missing numbers in a currency group
    action: Have more than one missing value per currency group
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Multiple missing values create an underdetermined system with no unique solution, causing InterpolationError
      and transaction failure
    stage_ids:
    - booking
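  # Illustrative sketch for finance-C-019 (hypothetical ledger fragment; account names are
  # assumptions). Exactly one amount is omitted in the USD group, so it can be interpolated:
  #   2024-01-05 * "Groceries"
  #     Expenses:Food       45.10 USD
  #     Assets:Checking                 ; omitted amount, interpolated as -45.10 USD
  # Omitting both amounts would leave two unknowns in the same currency group and no
  # unique solution.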
  - id: finance-C-026
    when: When implementing AVERAGE booking method
    action: Claim support for AVERAGE method as it is not implemented
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: AVERAGE booking always returns an AmbiguousMatchError, so any code claiming AVERAGE support is incorrect
    stage_ids:
    - booking
  - id: finance-C-034
    when: When implementing a plugin function for beancount
    action: return a tuple of (modified_entries, errors_list) from the plugin function
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Plugin return value mismatch causes loader to crash or corrupt entry list when extending errors
    stage_ids:
    - transformation
  - id: finance-C-036
    when: When implementing plugin function signatures
    action: accept (entries, options_map, *optional_config) parameters in that exact order
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Plugin function signature mismatch causes TypeError when loader attempts to call the plugin callback with
      (entries, options_map, *args)
    stage_ids:
    - transformation
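  # Illustrative sketch of the plugin shape described in finance-C-034 and finance-C-036
  # (the name noop_plugin and the config parameter name are hypothetical):
  #   __plugins__ = ("noop_plugin",)
  #
  #   def noop_plugin(entries, options_map, config=None):
  #       errors = []                  # collect plugin errors here
  #       return entries, errors       # always a (entries, errors) pair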
  - id: finance-C-049
    when: When creating a RealAccount instance
    action: pass a string account_name, not None or non-string value
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Passing None or non-string as account_name causes ValueError, breaking account tree construction and preventing
      balance reporting for the entire account hierarchy
    stage_ids:
    - realization
  - id: finance-C-050
    when: When inserting a subaccount into a RealAccount tree
    action: use string keys matching the hierarchical account naming convention
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-string keys or mismatched account names cause KeyError/ValueError, breaking the tree structure and corrupting
      balance calculations for all child accounts
    stage_ids:
    - realization
  - id: finance-C-051
    when: When constructing Amount or Position objects
    action: use Decimal for all monetary number values and the immutable __slots__ = () pattern
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using float instead of Decimal for monetary values causes rounding errors in balance calculations, leading
      to incorrect financial reports and potential compliance issues
    stage_ids:
    - realization
  - id: finance-C-061
    when: When summarizing entries to create opening balance transactions
    action: Verify the total balance of summarized entries equals exactly zero
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: The accounting identity will be violated, causing the balance sheet to be fundamentally incorrect with non-zero
      total assets and liabilities
    stage_ids:
    - summarization
  - id: finance-C-062
    when: When computing balances for summarization before a cutoff date
    action: Include only entries strictly before the date parameter
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Entries at the boundary date will be incorrectly included or excluded, causing duplicate or missing transactions
      in the opening balances
    stage_ids:
    - summarization
  - id: finance-C-063
    when: When inserting synthesized entries at a period boundary
    action: Insert transfer and summary entries on the day before the boundary date
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Open directives will appear after transactions, violating the requirement that Open entries precede all activity
      for that account
    stage_ids:
    - summarization
  - id: finance-C-064
    when: When creating conversion entries for zero-priced items
    action: Use ZERO as the price amount to maintain the balance invariant
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: The balance invariant will be broken because conversion entries will not correctly offset the original positions,
      causing phantom gains or losses
    stage_ids:
    - summarization
  - id: finance-C-065
    when: When executing the open() function for period opening
    action: Execute conversions before clear, and clear before summarize in that exact order
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Income/expense accounts will still have balances when summarization runs, causing those accounts to incorrectly
      appear in opening balance transactions
    stage_ids:
    - summarization
  - id: finance-C-071
    when: When closing income and expense accounts at period boundaries
    action: Transfer the accumulated balances to the equity earnings account before summarization
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Income statement accounts will show residual balances that should have been closed to equity, causing the
      balance sheet to not balance correctly
    stage_ids:
    - summarization
  - id: finance-C-076
    when: When using the clamp function to filter entries to a time period
    action: Execute income/expense transfer before summarize, then truncate, then add conversion entries
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Income statement accounts will have residual balances and the period will not end with zero total balance
      as required for period reporting
    stage_ids:
    - summarization
  - id: finance-C-079
    when: When implementing account lifecycle validation
    action: Prevent duplicate Open or Close directives for the same account
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Duplicate Open/Close directives cause ambiguous account lifecycle, leading to incorrect balance calculations
      and reports that mix up different account states
    stage_ids:
    - validation
  - id: finance-C-080
    when: When implementing account close validation
    action: Verify Close directive date is strictly after its corresponding Open directive date
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Closing an account before it is opened creates invalid accounting state where transactions could reference
      an account that doesn't exist, corrupting the ledger integrity
    stage_ids:
    - validation
  - id: finance-C-081
    when: When implementing balance assertion validation
    action: Reject duplicate Balance entries with different amounts on the same (account, currency, date)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Conflicting balance assertions for the same account on the same date create irreconcilable accounting state,
      causing incorrect account balances and reporting errors
    stage_ids:
    - validation
  - id: finance-C-083
    when: When implementing transaction validation
    action: Verify that every transaction's postings balance to zero within tolerance
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Unbalanced transactions violate double-entry accounting principles, resulting in incorrect ledger balances
      and financial reports that do not sum correctly
    stage_ids:
    - validation
  - id: finance-C-084
    when: When implementing currency constraint validation
    action: Enforce that postings only use currencies declared in the account's Open directive
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Postings with currencies not allowed for an account violate currency constraints, leading to incorrect inventory
      tracking and mixing of incompatible currencies in the same account
    stage_ids:
    - validation
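  # Illustrative sketch for finance-C-084 (hypothetical ledger fragment; account names are
  # assumptions, and the Open directive for Income:Salary is omitted for brevity):
  #   2024-01-01 open Assets:Checking  USD
  #   2024-02-01 * "Deposit"
  #     Assets:Checking    100.00 EUR   ; EUR is not declared in the Open directive above
  #     Income:Salary     -100.00 EUR
  # Under this constraint, validation is expected to report an error for the first posting.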
  - id: finance-C-085
    when: When implementing active account validation
    action: Verify that all directive references to accounts occur within the account's open-close interval
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: References to accounts outside their active period create invalid transactions that reference non-existent
      or closed accounts, corrupting the accounting records
    stage_ids:
    - validation
  - id: finance-C-088
    when: When running transaction balance validation
    action: Execute balance checks AFTER all user plugin transformations
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Running balance checks before plugins means unbalanced input transactions are rejected even when plugins
      are designed to fix them, breaking valid workflows
    stage_ids:
    - validation
  - id: finance-C-089
    when: When calling the validation pipeline
    action: Invoke validation AFTER booking and transformations are complete
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Validation before booking or transformations checks incomplete or incorrect data, producing false errors
      that do not reflect the final state of the ledger
    stage_ids:
    - validation
  - id: finance-C-095
    when: When parsing produces directives with CostSpec (incomplete costs)
    action: Convert all CostSpec instances to Cost instances during the booking stage using the account's configured booking
      method
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Transaction balance calculations will be incorrect if CostSpec remains unresolved, causing wrong lot matching
      and incorrect cost basis for assets
  - id: finance-C-097
    when: When booking stage produces entries with resolved costs
    action: Verify entries remain sorted by date after booking completion using entry_sortkey
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Position calculations and balance checks will be incorrect if entries are processed out of chronological
      order, violating accounting ledger order requirements
  - id: finance-C-100
    when: When plugin_processing_mode is 'default'
    action: 'Execute plugins in exact order: PLUGINS_PRE first, then user plugins, then PLUGINS_AUTO, then PLUGINS_POST last'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Balance checks and padding will run at wrong time, allowing unbalanced transactions to pass validation or
      padding to be applied incorrectly
  - id: finance-C-102
    when: When fully transformed directives are passed to validation stage
    action: Verify that every Transaction's postings balance, with tolerance checking using inferred_tolerances from options_map
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Unbalanced transactions will appear in reports as valid, leading to incorrect financial records and potentially
      wrong tax calculations
  - id: finance-C-108
    when: When implementing or writing code that creates or manipulates monetary amounts in beancount
    action: Use the Decimal type (via D() function) for all monetary numbers instead of Python float or int; never use floating-point
      in an accounting system
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Floating-point arithmetic causes rounding errors that accumulate across transactions, leading to incorrect
      account balances and misreported financial positions
  - id: finance-C-110
    when: When defining or using directives (Transaction, Open, Close, Balance, etc.) in beancount
    action: Treat all directive instances as immutable (never mutate NamedTuple fields after construction); use the meta dict
      for metadata (filename, lineno) instead of modifying state
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Mutating a directive breaks immutability assumptions throughout the system, causing inconsistent balances
      and unpredictable plugin behavior
  - id: finance-C-113
    when: When sorting or retrieving entries in beancount
    action: Sort all directives using entry_sortkey(entry), which returns (date, directive_type_sort_order, lineno); never
      sort by date alone or by file order alone
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect sort order breaks balance assertions, causes check directives to process after transactions on
      the same day, and corrupts inventory calculations
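  # Illustrative sketch for finance-C-113 (assumes `entries` is a list of directives, for
  # example the first element returned by beancount.loader.load_file):
  #   from beancount.core import data
  #   entries = sorted(entries, key=data.entry_sortkey)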
  - id: finance-C-114
    when: When creating or accessing positions within an Inventory in beancount
    action: 'Key all Inventory positions by a tuple of (currency: str, cost: Cost|None); use None cost for non-booked positions
      and a Cost instance for booked lots'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect inventory keying causes positions to merge incorrectly, leading to wrong cost basis calculations
      and distorted portfolio reports
  - id: finance-C-143
    when: When implementing or refactoring period-bound financial snapshot logic in beancount/ops/summarize.py
    action: 'Execute clamp operation in the exact sequence: (1) move income/expenses to Equity:Current-Earnings at begin_date,
      (2) summarize the period, (3) remove entries after end_date, (4) apply currency conversion at end_date'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Reordering the clamp operation sequence—such as applying currency conversion before truncation—produces incorrect
      period snapshots where gains/losses are valued at wrong exchange rates or include entries outside the specified period
    derived_from_bd_id: BD-040
  regular:
  - id: finance-C-004
    when: When validating user-defined options from Beancount source files
    action: validate each option value using its designated converter function and raise ValueError on invalid inputs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid option values bypass validation and propagate through the system, causing unexpected behavior in
      downstream processing stages like booking and transformation
    stage_ids:
    - parsing
  - id: finance-C-005
    when: When providing input files to the Beancount parser
    action: provide files with non-absolute path names to the top-level load_file function
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Relative file paths cause include directive resolution to fail unpredictably depending on current working
      directory, resulting in missing entries or incorrect file loading
    stage_ids:
    - parsing
  - id: finance-C-006
    when: When loading plugin modules via the plugin directive
    action: ensure plugin modules define the __plugins__ tuple attribute so that the transformation system recognizes them
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Plugins without __plugins__ attribute are silently skipped during run_transformations, leaving entries unprocessed
      by expected validation and transformation logic
    stage_ids:
    - parsing
  - id: finance-C-007
    when: When including additional files via include directives
    action: resolve relative include paths against the directory of the file containing the directive
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Incorrect include path resolution causes files to not be found, resulting in missing directives and incomplete
      ledger data
    stage_ids:
    - parsing
  - id: finance-C-008
    when: When processing duplicate include file references
    action: allow the same file to be parsed more than once in a single load operation
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Duplicate file parsing creates duplicate directives with identical timestamps, causing double-counting of
      transactions and incorrect financial reports
    stage_ids:
    - parsing
  - id: finance-C-010
    when: When implementing directive sorting for downstream processing
    action: sort entries by date as primary key, then by directive type order, then by lineno for same-day directives
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect sorting causes Balance directives to be evaluated after Transactions on the same day, breaking
      the accounting invariant that balances apply at the beginning of the day
    stage_ids:
    - parsing
  - id: finance-C-011
    when: When building directive data structures
    action: create directives as immutable NamedTuple instances to ensure safe sharing and caching
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Mutable directive objects cause unexpected side effects when shared across plugins or cached, leading to
      non-deterministic behavior and hard-to-debug inconsistencies
    stage_ids:
    - parsing
  - id: finance-C-012
    when: When processing include directives recursively
    action: maintain a stack-based processing order where each included file is fully processed before moving to the next
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Non-stack-based include processing causes entries to be interleaved incorrectly, breaking the assumption
      that entries from included files are contiguous in date order
    stage_ids:
    - parsing
  - id: finance-C-013
    when: When aggregating options from multiple include files
    action: merge operating_currency lists and dcontext from included files into the top-level options map
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Unmerged operating_currency lists cause currency restrictions from included files to be ignored, allowing
      invalid currency postings without error
    stage_ids:
    - parsing
  - id: finance-C-015
    when: When using parser output for real-time financial decisions
    action: claim the parser provides real-time data synchronization with exchange systems
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The parser only reads static Beancount source files; it does not connect to any external data source, so
      presenting parsed data as real-time creates false confidence in data freshness
    stage_ids:
    - parsing
  - id: finance-C-016
    when: When implementing lot matching with incomplete cost specifications
    action: Use MISSING sentinel (not None) for unfilled numbers in CostSpec
    severity: high
    kind: domain_rule
    modality: must
    consequence: MISSING sentinel propagates through the booking process and surfaces in clear error messages, whereas None
      would silently cause type errors or incorrect matches
    stage_ids:
    - booking
  - id: finance-C-020
    when: When inferring price for a posting with cost specification
    action: Attempt to infer price from the residual for cost-held postings
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Cost-based postings should use cost for value calculation, not price interpolation, leading to incorrect
      lot valuation
    stage_ids:
    - booking
  - id: finance-C-021
    when: When creating a position with cost specification
    action: Allow zero units with a non-None cost
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Zero units with cost creates an invalid lot that cannot be meaningfully tracked or valued
    stage_ids:
    - booking
  - id: finance-C-022
    when: When creating or interpolating a cost value
    action: Allow negative cost numbers
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Negative cost values break financial calculations and PnL computations, leading to incorrect balance assertions
    stage_ids:
    - booking
  - id: finance-C-023
    when: When booking reductions against existing inventory
    action: Verify reduction postings match existing lots with matching cost currency
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Reducing inventory against lots with mismatched currencies creates phantom gains/losses in reports
    stage_ids:
    - booking
  - id: finance-C-024
    when: When handling ambiguous lot matching scenarios
    action: Use configured booking method from _BOOKING_METHODS dispatch table
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect booking method selection leads to wrong lot matching, causing wrong cost basis for subsequent transactions
    stage_ids:
    - booking
  - id: finance-C-025
    when: When updating inventory balances after booking reductions
    action: Update local balance tracking to avoid matching the same lot twice in one transaction
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Double-matching the same lot drives inventory negative or produces incorrect cost basis calculations
    stage_ids:
    - booking
  - id: finance-C-027
    when: When matching reductions against inventory balance
    action: Match a reduction posting against lots of the same sign as the posting (same-sign inventory)
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Matching a reduction to same-sign lots creates invalid double-negative or double-positive positions
    stage_ids:
    - booking
  - id: finance-C-028
    when: When resolving augmentation postings with incomplete cost
    action: Convert CostSpec to Cost only after interpolation completes
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Converting CostSpec before interpolation prevents filling missing cost_per/cost_total values from the transaction
      residual
    stage_ids:
    - booking
  - id: finance-C-029
    when: When converting CostSpec to Cost for augmenting postings
    action: Verify that all required cost fields (number_per or number_total, currency, date) are resolved
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incomplete Cost after conversion breaks position valuation and causes downstream errors in reports
    stage_ids:
    - booking
  - id: finance-C-030
    when: When using STRICT booking method
    action: Reject ambiguous matches unless all matching lots sum exactly to the reduction amount
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Ambiguous matches without exact sum lead to arbitrary lot selection, causing wrong tax lot calculations
    stage_ids:
    - booking
  - id: finance-C-031
    when: When booking with NONE method
    action: Treat postings as augmentations without attempting inventory matching
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: NONE method intentionally skips matching; forcing matching creates mixed inventories that violate account
      conventions
    stage_ids:
    - booking
  - id: finance-C-032
    when: When inventory has insufficient lots to satisfy a reduction request
    action: Report ReductionError or AmbiguousMatchError for insufficient lots
    severity: high
    kind: domain_rule
    modality: must
    consequence: Undetected insufficient lots create phantom inventory positions that cause incorrect balance assertions
    stage_ids:
    - booking
  - id: finance-C-033
    when: When processing same-day transactions affecting the same account
    action: Use local balance tracking that includes prior same-day postings
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without cumulative local balance, same-day reductions cannot match augmentations, causing phantom insufficient
      lot errors
    stage_ids:
    - booking
  - id: finance-C-035
    when: When implementing a beancount plugin module
    action: define the __plugins__ tuple with valid function names from that module
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: A plugin without a __plugins__ attribute is silently skipped by the loader, so its transformation logic never
      executes
    stage_ids:
    - transformation
  - id: finance-C-037
    when: When plugins modify the entry list during transformation
    action: preserve chronological ordering by using data.entry_sortkey for sorting new entries
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Unsorted entries after plugin transformation cause incorrect balance calculations and validation errors in
      subsequent stages
    stage_ids:
    - transformation
  - id: finance-C-038
    when: When a plugin raises a non-SystemExit exception during execution
    action: allow exceptions to propagate and stop processing; exceptions must instead be caught and converted to LoadError
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Uncaught plugin exception terminates the entire loader, preventing other plugins from running and losing
      partial work
    stage_ids:
    - transformation
  - id: finance-C-039
    when: When plugin_processing_mode is set to 'default' (default behavior)
    action: automatically prepend documents plugin and append pad/balance plugins to the plugin chain
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing documents plugin causes document directives to not be processed; missing pad/balance causes balance
      assertions to fail
    stage_ids:
    - transformation
  - id: finance-C-040
    when: When plugin_processing_mode is set to 'raw'
    action: only execute user-specified plugins without automatically running pre/post plugins
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Setting 'raw' mode without explicitly loading pad/balance causes balance assertions to never be checked,
      silently producing incorrect results
    stage_ids:
    - transformation
  - id: finance-C-041
    when: When implementing a plugin that synthesizes new entries (e.g., auto_accounts)
    action: sort newly synthesized entries using data.entry_sortkey before returning them
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Unsorted new entries cause temporal ordering violations where newer entries appear before older ones, corrupting
      account balance calculations
    stage_ids:
    - transformation
  - id: finance-C-042
    when: When a plugin raises SystemExit during transformation
    action: allow SystemExit to propagate immediately without catching or converting to error
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Catching SystemExit prevents intentional termination of beancount processing (e.g., bailing out
      on critical errors)
    stage_ids:
    - transformation
  - id: finance-C-043
    when: When configuring plugin processing in beancount files
    action: use invalid values for the plugin_processing_mode option (only 'raw' and 'default' are valid)
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: An invalid plugin_processing_mode value causes an assertion failure, and the loader crashes without processing any plugins
    stage_ids:
    - transformation
  - id: finance-C-044
    when: When importing plugin modules during transformation
    action: allow ImportError to terminate processing; import errors must instead be caught and reported as LoadError
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Uncaught ImportError from missing plugin module crashes the entire loader without reporting which plugin
      failed
    stage_ids:
    - transformation
  - id: finance-C-045
    when: When combining multiple plugins using loader.combine_plugins()
    action: verify that the combined module has a __plugins__ attribute containing all functions from the source modules
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Combined plugin without proper __plugins__ attribute is silently skipped, losing transformations from all
      constituent plugins
    stage_ids:
    - transformation
  - id: finance-C-046
    when: When processing entries in transformation stage
    action: verify entries are sorted after each plugin completes to maintain chronological order
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing sort after plugin transformation breaks temporal ordering causing incorrect balance calculations
      and validation errors
    stage_ids:
    - transformation
  - id: finance-C-047
    when: When loading beancount files with plugin configuration
    action: insert user-specified pythonpath entries at the front of sys.path before loading plugins
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Plugin modules in user pythonpath fail to import if pythonpath is not inserted first, causing plugin import
      failures
    stage_ids:
    - transformation
  - id: finance-C-048
    when: When setting the plugin_processing_mode option
    action: use 'default' mode for normal processing where pad/balance are needed; use 'raw' only when full control is required
    severity: medium
    kind: claim_boundary
    modality: must
    consequence: Using 'raw' mode without explicitly loading pad/balance creates false confidence that balances are being
      checked when they are not
    stage_ids:
    - transformation
  - id: finance-C-052
    when: When reducing an Inventory balance
    action: use the reducer function that matches the desired balance type (get_units for units, get_cost for cost basis,
      get_weight for weighted cost)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using the wrong reducer function produces incorrect balance types, causing financial reports to show wrong
      units vs cost basis vs market value
    stage_ids:
    - realization
  - id: finance-C-053
    when: When iterating postings with balance in realize function
    action: process entries in chronological order to maintain correct running balance
    severity: high
    kind: domain_rule
    modality: must
    consequence: Out-of-order entry processing corrupts running balance calculations, causing incorrect account balances and
      invalid financial reports
    stage_ids:
    - realization
  - id: finance-C-054
    when: When calling realize function
    action: pass entries that have been previously filtered and date-sorted
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Realizing unsorted entries leads to incorrect account tree structure and wrong balance ordering in reports,
      breaking the chronological integrity of financial data
    stage_ids:
    - realization
  - id: finance-C-055
    when: When building the RealAccount tree
    action: create parent accounts as dict containers before adding child account nodes
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing parent accounts breaks the hierarchical tree structure, causing KeyError exceptions and preventing
      balance aggregation across the account hierarchy
    stage_ids:
    - realization
  - id: finance-C-056
    when: When using balance_reducer function
    action: provide a compatible reducer function accepting Position and returning Amount
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Incompatible reducer functions cause type errors during balance computation, preventing the realization stage
      from completing and generating account reports
    stage_ids:
    - realization
  - id: finance-C-057
    when: When computing account balance from postings
    action: initialize balance as empty Inventory and accumulate positions incrementally
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Improper balance initialization causes position data loss, leading to incorrect final balances that do not
      reflect all transactions in the account
    stage_ids:
    - realization
  - id: finance-C-058
    when: When calling iterate_with_balance
    action: pass TxnPosting wrappers rather than raw Posting instances
    severity: high
    kind: domain_rule
    modality: must
    consequence: Passing raw Posting instances causes assertion errors at line 420, breaking the iteration and preventing
      balance accumulation
    stage_ids:
    - realization
  - id: finance-C-059
    when: When computing the total balance of parent accounts
    action: aggregate balances from all child accounts using Inventory addition
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Not aggregating child balances causes parent accounts to show incorrect balances that exclude subaccount
      positions, breaking hierarchical balance reporting
    stage_ids:
    - realization
  - id: finance-C-060
    when: When realizing entries without postings
    action: handle empty entry lists gracefully by returning root RealAccount with empty balances
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Not handling empty entry lists causes null reference errors, preventing the system from generating reports
      when no transactions exist
    stage_ids:
    - realization
  - id: finance-C-066
    when: When generating opening balance entries for accounts with empty balances
    action: Create opening balance transactions for accounts with zero inventory
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Synthetic transactions will be created for accounts with no activity, cluttering the balance sheet and potentially
      causing downstream calculation errors
    stage_ids:
    - summarization
  - id: finance-C-067
    when: When transfer_balances() removes balance assertions after a cutoff
    action: Remove Balance assertions for transferred accounts that occur after the transfer date
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Balance assertions will fail because the account balance has been transferred away, causing validation errors
      or incorrect balance checks
    stage_ids:
    - summarization
  - id: finance-C-068
    when: When summarizing entries that contain positions with cost basis
    action: Include cost information in the summarized postings to preserve the original cost basis
    severity: high
    kind: domain_rule
    modality: must
    consequence: Positions will lose their cost basis information, causing incorrect average cost calculations and misreported
      asset values
    stage_ids:
    - summarization
  - id: finance-C-069
    when: When using GetAccounts class to gather accounts from directives
    action: Verify that all directive classes have corresponding handler methods defined
    severity: high
    kind: resource_boundary
    modality: must
    consequence: AttributeError will be raised for unknown directive types, causing the summarization pipeline to fail on
      valid entries
    stage_ids:
    - summarization
  - id: finance-C-070
    when: When creating summarized entries for period reporting
    action: Sort the combined entries (open, price, summarizing) by data.entry_sortkey before returning
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Entries will not be in chronological order, violating the requirement that entries be sorted by date and
      potentially causing incorrect balance calculations
    stage_ids:
    - summarization
  - id: finance-C-072
    when: When entries list is empty before calling summarization functions
    action: Return the empty entries list immediately without processing
    severity: high
    kind: operational_lesson
    modality: must
    consequence: IndexError or other exceptions may be raised when processing empty lists, causing the pipeline to fail silently
      or with cryptic errors
    stage_ids:
    - summarization
  - id: finance-C-073
    when: When computing conversion balances for entries with positions at cost
    action: Use conversion_cost_balance (reduced by convert.get_cost) for creating conversion entries
    severity: high
    kind: domain_rule
    modality: must
    consequence: Conversion entries will be created for positions at cost rather than units, causing incorrect currency conversion
      calculations
    stage_ids:
    - summarization
  - id: finance-C-074
    when: When summarizing entries and preserving price directives
    action: Preserve only the last relevant price entry for each commodity before the cutoff date
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Multiple stale price entries will be retained, causing the price fetcher to query unnecessary historical
      prices and potentially use outdated conversion rates
    stage_ids:
    - summarization
  - id: finance-C-075
    when: When presenting summarized entries as the basis for a balance sheet
    action: Claim that the summarized entries represent actual original transactions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Stakeholders will be misled about the provenance of transactions; summarized entries are synthetic replacements,
      not original ledger entries
    stage_ids:
    - summarization
  - id: finance-C-077
    when: When creating conversion entries with conversion_currency parameter
    action: Use the specified conversion_currency for all zero-priced conversion postings
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Conversion entries will use the wrong target currency, causing incorrect balance calculations when the ledger
      contains multiple currencies
    stage_ids:
    - summarization
  - id: finance-C-078
    when: When handling entries with compress_unbooked option for NONE booking
    action: Merge postings together to obtain accurate cost basis for accounts with NONE booking method
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Individual lot positions will cause misleading profit/loss calculations because positions are not properly
      matched against existing cost bases
    stage_ids:
    - summarization
  - id: finance-C-082
    when: When implementing commodity declaration validation
    action: Enforce uniqueness of Commodity directives per currency
    severity: high
    kind: domain_rule
    modality: must
    consequence: Duplicate Commodity directives for the same currency create ambiguous price information, causing incorrect
      cost basis calculations and price lookups
    stage_ids:
    - validation
  - id: finance-C-086
    when: When loading and validating a ledger file
    action: Collect all validation errors rather than stopping at the first error
    severity: high
    kind: domain_rule
    modality: must
    consequence: Stopping at the first error prevents users from seeing all issues at once, requiring multiple fix-run cycles
      to discover all problems in the ledger
    stage_ids:
    - validation
  - id: finance-C-087
    when: When implementing extra validation injection
    action: Support the extra_validations parameter for custom validation functions
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without extensible validations, users cannot add custom business rules, limiting the system's ability to
      enforce domain-specific accounting policies
    stage_ids:
    - validation
  - id: finance-C-090
    when: When implementing account lifecycle exceptions
    action: Allow Balance, Document, and Note directives after account closure
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Rejecting legitimate Balance/Document/Note entries after close prevents users from verifying account closure
      correctness or attaching late-arriving documents
    stage_ids:
    - validation
  - id: finance-C-091
    when: When implementing document path validation
    action: Require all Document entries to have absolute file paths
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Relative paths in Document directives cause file lookup failures when the working directory changes, breaking
      document association functionality
    stage_ids:
    - validation
  - id: finance-C-092
    when: When implementing tolerance-based balance checking
    action: Use tolerance when comparing transaction residual to zero
    severity: high
    kind: domain_rule
    modality: must
    consequence: Comparing without tolerance causes false errors for legitimate rounding differences, rejecting valid transactions
      due to sub-cent precision differences
    stage_ids:
    - validation
  - id: finance-C-093
    when: When implementing data type validation
    action: Check entry attribute data types match expected schema
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Invalid data types in entries cause runtime errors during reporting or calculation, leading to crashes or
      silent data corruption
    stage_ids:
    - validation
  - id: finance-C-094
    when: When configuring validation levels
    action: Separate BASIC_VALIDATIONS from slow HARDCORE_VALIDATIONS
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Running all validations including slow ones during development creates unnecessary performance overhead,
      slowing down iteration cycles
    stage_ids:
    - validation
  - id: finance-C-096
    when: When booking method is specified in Open directive
    action: Use the per-account Booking enum value from Open directive instead of the global option_map booking_method
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect lot matching method applied to account, leading to wrong cost basis and potentially incorrect capital
      gains calculations
  - id: finance-C-098
    when: When BalanceError list is passed from booking stage
    action: Halt processing when a BalanceError is encountered; errors must instead accumulate for downstream reporting
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: User will not see validation errors indicating mismatched balances, leading to undetected accounting errors
      in final reports
  - id: finance-C-099
    when: When directives flow through plugin pipeline PLUGINS_PRE → user plugins → PLUGINS_AUTO → PLUGINS_POST
    action: Maintain entries sorted by entry_sortkey after each plugin execution
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Plugins may process entries in wrong chronological order, breaking accounting logic that depends on sequential
      position updates
  - id: finance-C-101
    when: When user plugin raises an exception during transformation
    action: Crash the entire loader; errors must instead be caught and accumulated for user reporting
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Entire ledger processing halts on single plugin error, preventing user from seeing all other errors or generating
      partial reports
  - id: finance-C-103
    when: When accumulated errors from all previous stages are passed to validation
    action: Combine all errors from the parsing, booking, and transformation stages before reporting to the user
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: User sees incomplete error list, missing critical issues from earlier stages that affect data integrity
  - id: finance-C-104
    when: When options map is passed from parsing to validation
    action: Provide tolerance_multiplier and inferred_tolerance_default values for balance validation checks
    severity: high
    kind: domain_rule
    modality: must
    consequence: Balance checks will use incorrect tolerances, failing valid entries or accepting invalid ones, corrupting
      financial records
  - id: finance-C-105
    when: When options map is passed from parsing to validation
    action: Provide account_types configuration for validating active account references
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Validation will incorrectly flag transactions on valid accounts as errors or fail to detect transactions
      on closed accounts
  - id: finance-C-106
    when: When validated directives are passed to realization for report generation
    action: Pass entries that have passed all validation checks; any failed transactions must still be included with the
      corresponding error flags
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Reports will show incorrect balances if invalid entries are silently dropped, or users will miss critical
      validation errors if entries are hidden
  - id: finance-C-107
    when: When validated directives are passed to realization
    action: Use the display_context from options_map for formatting monetary amounts in reports
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Reports will display incorrect number precision, showing wrong decimal places that could mislead users about
      actual account balances
  - id: finance-C-109
    when: When writing or loading beancount source files with numeric amounts
    action: Use the D() function to parse numbers instead of calling the Decimal() constructor directly; D() handles comma
      thousands separators and None values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Numbers with comma thousand separators fail to parse correctly, causing ValueError exceptions and preventing
      ledger file loading
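  # Illustration for finance-C-109 (a hedged sketch, assuming beancount is importable):
  #   from decimal import Decimal
  #   from beancount.core.number import D
  #   D("1,234.56") == Decimal("1234.56")   # commas are stripped before conversion
  #   D(None) == Decimal()                  # None yields a zero Decimal
  #   Decimal("1,234.56")                   # rejected by the standard constructor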
  - id: finance-C-111
    when: When processing financial data with incomplete cost or number specifications
    action: Use the MISSING sentinel class (not None or empty string) to represent incomplete/interpolatable fields — MISSING
      is designed to appear correctly in error messages
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using None instead of MISSING for incomplete data causes AttributeError or confusing TypeError messages when
      interpolation is attempted
  - id: finance-C-112
    when: When adding tags or links to Transaction or other directive instances
    action: Use EMPTY_SET (frozenset()) instead of None for absent tags or links — never use None as a placeholder for empty
      collections
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using None for empty tags/links causes AttributeError when code iterates over tags or links, breaking plugin
      processing and report generation
  - id: finance-C-115
    when: When validating or creating currency symbols in beancount
    action: Match currency names against CURRENCY_RE regex — valid currencies are uppercase alphanumeric with optional dots/underscores/hyphens,
      or forward-slash currency pairs like USD/EUR
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid currency symbols accepted without validation cause parsing errors in downstream processing and generate
      confusing error messages
  - id: finance-C-116
    when: When working with dates in beancount directives
    action: Use datetime.date objects only (no time component) — beancount is a date-based accounting system with no support
      for time-of-day timestamps
    severity: high
    kind: domain_rule
    modality: must
    consequence: Attempting to use datetime.datetime with time components causes type errors since all directive date fields
      expect datetime.date without time
  - id: finance-C-117
    when: When booking ambiguous lots in an Inventory (multiple matching lots for a posting)
    action: Apply the booking method declared on the account's Open directive (STRICT, STRICT_WITH_SIZE, AVERAGE, FIFO, LIFO,
      HIFO, or NONE) — never arbitrarily pick a lot
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect lot selection causes wrong cost basis for asset sales, distorting realized gains/losses and violating
      accounting consistency requirements
  - id: finance-C-118
    when: When presenting or reporting beancount's capabilities to users
    action: Claim that beancount supports real-time financial transactions — beancount is a batch-processing text-file-based
      double-entry bookkeeping language, not a real-time trading or payment system
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users build integrations expecting live transaction recording and immediate balance updates, but beancount
      requires file reloads and produces stale data between processing runs
  - id: finance-C-119
    when: When presenting or reporting beancount's capabilities to users
    action: Claim that beancount supports multi-user ledger systems — beancount is a single-user file-based accounting tool
      with no concurrency controls, user authentication, or access control
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Multiple users editing the same beancount file simultaneously cause data corruption from concurrent writes
      with no conflict resolution mechanism
  - id: finance-C-120
    when: When presenting or reporting beancount's capabilities to users
    action: Claim that beancount supports complex derivatives pricing — beancount handles basic cost basis tracking and currency
      conversion but lacks option pricing models, Greeks calculations, or margin modeling
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users expecting derivatives analytics receive only basic position tracking, leading to incorrect risk assessment
      and compliance failures for derivative portfolios
  - id: finance-C-121
    when: When presenting or reporting beancount's capabilities to users
    action: Claim that beancount provides a GUI-based accounting interface — beancount is a text-file DSL with CLI tools and
      a minimal web interface, not a point-and-click accounting application
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users expecting traditional GUI accounting workflows are misled about the technical skills required; beancount
      requires text file editing and command-line operation
  - id: finance-C-122
    when: When reporting balance assertion failures or validation errors from beancount
    action: Claim that beancount's error messages provide automatic correction suggestions — beancount detects errors but
      requires manual file editing to resolve them; it does not have auto-fix capabilities
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users expecting auto-correction wait indefinitely for fixes that require manual text file edits, delaying
      reconciliation and causing frustration
  - id: finance-C-123
    when: When loading and processing beancount files across different environments or timezones
    action: Use consistent date handling since beancount operates exclusively on date-only values (datetime.date) with no
      timezone component — no timezone normalization is needed because there is no time component
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Attempting to apply timezone transformations to beancount dates causes type errors since all dates are datetime.date
      without timezone information
  - id: finance-C-124
    when: When loading beancount files and the file or any included files have been modified
    action: Invalidate the pickle cache and recompute the loaded entries; cache validation must check the modification time
      of every included file, not just the top-level file
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Stale cache causes beancount to return outdated entries and balances that don't reflect recent file changes,
      leading to incorrect financial reports
  - id: finance-C-125
    when: When processing directive types that can appear after account closure
    action: Allow Balance, Document, and Note directives to appear after their account's Close directive — these are exempt
      from the general chronological ordering rule
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Rejecting valid post-close Balance/Document/Note directives causes validation errors on legitimate accounting
      entries received after account closure
  - id: finance-C-126
    when: When implementing or refactoring cost tracking data structures
    action: Maintain CostSpec as a separate type from Cost — CostSpec represents incomplete cost state (missing date, label,
      or number) while Cost represents fully-specified resolved cost; do not merge these into a single type with nullable
      fields
    severity: high
    kind: domain_rule
    modality: must
    consequence: Merging CostSpec and Cost into a single type with nullable fields creates ambiguity between intentional missing
      values and programming errors, causing cost basis calculations to use incomplete specifications and produce incorrect
      tax lot information
    derived_from_bd_id: BD-006
  - id: finance-C-127
    when: When implementing number handling in cost or position calculations
    action: Use MISSING sentinel for unfilled optional values instead of None — MISSING propagates through computations and
      surfaces in error messages with meaningful context, making absence explicit rather than ambiguous
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using None for missing values creates ambiguity between intentional absence and null-checking errors, causing
      bugs to produce silent incorrect results instead of immediate failures with traceable context
    derived_from_bd_id: BD-009
  - id: finance-C-128
    when: When implementing lot identification and merging logic for tax reporting
    action: Identify lots using the complete tuple (account, currency, cost_number, cost_currency, cost_date, cost_label)
      — any difference in these fields keeps positions as separate lots; only identical tuples are treated as the same lot
      for merging
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incomplete lot identification (e.g., omitting cost_date or cost_label) causes separate lots to merge incorrectly,
      producing wrong cost basis calculations that lead to incorrect capital gains/losses for tax reporting
    derived_from_bd_id: BD-024
  - id: finance-C-129
    when: When tracking positions with cost specifications for lot accounting
    action: Include cost_label in lot identification when positions share identical cost_number, cost_currency, and cost_date
      — cost_label (even when empty) is part of the lot identity tuple and must not be omitted for matching purposes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Omitting cost_label from lot matching causes lots with identical cost specifications to merge incorrectly,
      breaking tax lot accounting for securities that require separate lot tracking based on label differentiation
    derived_from_bd_id: BD-042
  - id: finance-C-130
    when: When processing dates in ledger entries
    action: Assume datetimes are timezone-aware without explicit annotation — the framework does not implement UTC normalization
      for naive datetimes; without explicit timezone handling, comparisons and calculations across different system timezones
      produce incorrect results
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without timezone annotation and UTC normalization, date comparisons and calculations produce inconsistent
      results depending on system timezone settings, causing ledger entries to be processed incorrectly and financial calculations
      to be wrong
    derived_from_bd_id: BD-GAP-018
  - id: finance-C-131
    when: When parsing or creating ledger entries with date/time fields
    action: Add an explicit timezone annotation to every datetime field and normalize to UTC during parsing; run the UTC
      normalization step before any date comparison or calculation so that timezone handling stays consistent
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without UTC normalization, ledger entries processed in different timezones produce inconsistent calculations,
      and cross-timezone reporting generates incorrect financial summaries that do not match when viewed from different locations
    derived_from_bd_id: BD-GAP-018
  - id: finance-C-132
    when: When implementing plugin processing in beancount's plugin_pipeline stage
    action: Execute the check_closing plugin first, before other plugins, and the close_tree plugin last, after other plugins;
      keep this fixed ordering so that validation occurs before cleanup operations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing plugin execution order causes validation to run after potential entry modifications, resulting in
      invariant checks that pass but don't reflect the final state — backtest results may appear valid but contain hidden
      inconsistencies
    derived_from_bd_id: BD-032
  - id: finance-C-133
    when: When implementing the sellgains plugin's P&L reclassification logic
    action: Move only losses (where cost basis exceeds proceeds) to income — must NOT move gains (where proceeds exceed cost)
      to income; gains remain in their original expense classification
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Reclassifying gains to income instead of keeping them as expenses creates accounting classification errors
      that distort P&L reports and cause incorrect tax calculations
    derived_from_bd_id: BD-033
  - id: finance-C-134
    when: When implementing summarization period date boundaries in report_generation
    action: Use end_date minus 1 day (end_date - 1) as the summarization period boundary; date summarized entries no later
      than (end_date - 1 day) so that source entries on end_date appear chronologically after the summary
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using end_date without the -1 offset causes period summaries to overlap with subsequent entries, creating
      chronological confusion in reports where the same date appears in both summary and detail
    derived_from_bd_id: BD-034
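  # Illustration for finance-C-134 (a sketch using only the standard library; end_date is a hypothetical value):
  #   import datetime
  #   end_date = datetime.date(2024, 1, 1)
  #   summary_date = end_date - datetime.timedelta(days=1)   # 2023-12-31
  #   # entries dated end_date then sort after the summary entry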
  - id: finance-C-135
    when: When implementing account transfer entry date assignments in report_generation
    action: Date transfer entries exactly 1 day before the transfer date (transfer_date - 1) so that transferred balances
      appear before the transfer cutoff point when filtering entries up to but not including the transfer date
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using the actual transfer date instead of date-1 causes transfer entries to be excluded from balance calculations
      filtered by transfer date, resulting in incorrect account balances at cutoff points
    derived_from_bd_id: BD-035
  - id: finance-C-136
    when: When implementing price lookup or currency conversion logic in price_resolution
    action: Use (base, quote) currency tuple as the lookup key for price retrieval — maintain explicit ordering as price lookups
      are directional and (USD, CAD) must be treated as distinct from (CAD, USD)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using a single currency pair string without direction causes incorrect rate selection, converting currencies
      at inverse rates and producing wrong valuations in portfolio reports
    derived_from_bd_id: BD-037
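  # Illustration for finance-C-136 (hypothetical in-memory map, not beancount's own price database; the rate is made up):
  #   from decimal import Decimal
  #   prices = {("USD", "CAD"): Decimal("1.35")}
  #   ("USD", "CAD") in prices   # True
  #   ("CAD", "USD") in prices   # False, the reverse direction is a separate key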
  - id: finance-C-137
    when: When implementing currency conversion for currency pairs without direct exchange rates
    action: Attempt multi-hop conversion through intermediate currencies when direct rate is unavailable — try each currency
      as a potential intermediate step, returning the first valid conversion path found through the price graph
    severity: high
    kind: domain_rule
    modality: must
    consequence: Skipping multi-hop conversion when direct rates are missing causes valid conversions to fail silently, resulting
      in incomplete portfolio valuations and missing transaction conversions
    derived_from_bd_id: BD-038
  - id: finance-C-138
    when: When implementing booking/lot matching logic and users request strict cost specification
    action: 'Be aware that strict booking mode has a fallback: STRICT_WITH_SIZE allows exact-size matches to bypass explicit
      cost specification — document this behavior to users and do not promise complete strict enforcement'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Users relying on strict mode for regulatory compliance expect complete enforcement but get unexpected matches
      via the STRICT_WITH_SIZE fallback, leading to incorrect lot selections and audit failures
    derived_from_bd_id: BD-084
  - id: finance-C-139
    when: When implementing or modifying the plugin pipeline execution order
    action: 'Preserve the fixed plugin ordering: check_closing must run first (after Documents prepend and before other plugins),
      and close_tree must run last (after pad/balance append and invariant checking) — changing this order breaks invariant
      detection at predictable stages'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Altering plugin order causes invariant violations to be detected at unexpected stages or missed entirely,
      resulting in data inconsistencies that pass validation but cause downstream calculation errors
    derived_from_bd_id: BD-089
  - id: finance-C-140
    when: When implementing proceeds calculation or validation in lot matching and P&L computation
    action: 'Implement validation checks at each cascade stage: verify negation produces positive proceeds (BD-068), verify
      weighted accumulation matches expected totals (BD-073), validate currency matching per-currency (BD-070), and fail on
      any unmatched currencies (BD-075) — subtle errors in earlier stages can pass through later validation'
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: 'Errors in proceeds negation accumulate through the validation cascade: wrong proceeds pass currency-by-currency
      validation due to tolerance settings, and only complete currency mismatches trigger errors, causing understated or overstated
      P&L'
    derived_from_bd_id: BD-090
  - id: finance-C-141
    when: When implementing account hierarchy or inventory tracking for hierarchical reporting
    action: Maintain the complete 4-level account tree structure (Root → Assets/Income/Expenses/Liabilities → sub-accounts
      → components) with nested dict for O(1) lookup, and Inventory objects per account for per-currency cost-basis segregation;
      the aggregation chain from leaf to root depends on every level being present
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing any account level breaks hierarchical rollup, causing incomplete aggregations where sub-account balances
      don't sum to parent totals and portfolio reports show incorrect totals
    derived_from_bd_id: BD-091
  - id: finance-C-142
    when: When implementing or refactoring position averaging logic in beancount/core/convert.py
    action: Use cost as weight for lots with cost specification, and units as weight for uncosted postings, with explicit
      price overrides taking precedence when provided
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing the weight calculation logic causes average price computations to use incorrect weighting, leading
      to misstated position values that diverge from actual cost basis and produce wrong P&L reports
    derived_from_bd_id: BD-039
  - id: finance-C-144
    when: When implementing or refactoring account transfer logic in beancount/ops/summarize.py
    action: Filter out any Balance directive on the destination (receiving) account after a transfer operation—the transfer
      entry itself establishes the correct balance
    severity: high
    kind: domain_rule
    modality: must
    consequence: Retaining balance assertions on the destination account after transfer causes false validation errors when
      the balance assertion value conflicts with the balance implicitly established by the transfer entry
    derived_from_bd_id: BD-041
  - id: finance-C-145
    when: When implementing or refactoring inventory aggregation logic in beancount/core/inventory.py
    action: Use (amount, currency) as the aggregation key during reduce operations to combine positions with identical amount
      and currency regardless of individual cost basis
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing the aggregation key to cost basis or other fields causes positions with identical amount and currency
      to remain uncombined, producing incorrect net position calculations and distorted portfolio reports
    derived_from_bd_id: BD-047
  - id: finance-C-146
    when: When implementing or refactoring closing entry validation in beancount/plugins/check_closing.py
    action: Verify that each Close directive's stated balance matches the computed balance from preceding entries—any discrepancy
      must trigger an error
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing the closing balance verification allows discrepancies between declared and actual closing balances
      to go undetected, corrupting ledger integrity and producing financial reports that do not reflect actual account state
    derived_from_bd_id: BD-048
  - id: finance-C-147
    when: When implementing or refactoring empty account tree cleanup in beancount/plugins/close_tree.py
    action: Remove only accounts that have zero balance AND no subaccounts—preserve any account with meaningful balance or
      child accounts
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Modifying the cleanup criteria to preserve empty accounts or remove accounts with children clutters the account
      tree with meaningless branches, degrading report readability and query performance
    derived_from_bd_id: BD-049
  - id: finance-C-148
    when: When implementing position adjustment or lot creation logic in the trading stage
    action: Negate position units to create a selling lot from a buying position — the negated position creates a new lot
      with negative units while the original lot remains unchanged until positions are netted by the booking method
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without negation, sell transactions cannot properly reduce buying positions, causing duplicate lot creation
      and incorrect cost-basis tracking that accumulates over multiple trades
    derived_from_bd_id: BD-062
  - id: finance-C-149
    when: When implementing balance check logic in the check_closing stage
    action: Date the balance check one day after the closing entry (closing date + 1 day) so that all closing postings have
      been applied before verification
    severity: high
    kind: domain_rule
    modality: must
    consequence: Same-day balance checks miss pending transactions in the closing batch, creating false balance discrepancies
      that trigger incorrect error reports and mask actual reconciliation failures
    derived_from_bd_id: BD-063
  - id: finance-C-150
    when: When implementing or validating balance check date offset in the check_closing stage
    action: Verify that the balance verification date equals exactly original entry date + 1 calendar day — the boundary condition
      ensures one full day separates closing entries from their verification
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect date offsets cause balance checks to run against incomplete closing batches or redundant post-verification
      entries, producing unreliable reconciliation results that hide actual accounting errors
    derived_from_bd_id: BD-066
  - id: finance-C-151
    when: When implementing proceeds accumulation logic in the sellgains stage
    action: Track proceeds only for account types designated in proceed_types; exclude income accounts and non-proceeds categories
      from inventory validation so that only cash-equivalent accounts participate in reconciliation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Including income accounts in proceeds tracking creates artificial reconciliation mismatches and produces
      incorrect gain/loss calculations that misstate taxable income and lead to IRS audit findings
    derived_from_bd_id: BD-069
  - id: finance-C-152
    when: When implementing proceeds inventory accumulation in the sellgains stage
    action: Accumulate proceeds using weighted conversion that accounts for position size — larger positions must contribute
      proportionally more to enable accurate FIFO/LIFO matching for cost-basis calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Simple sum accumulation ignores position weighting, causing incorrect cost-basis allocation in partial sales
      that systematically misstates gains/losses and triggers incorrect tax reporting
    derived_from_bd_id: BD-073
  - id: finance-C-153
    when: When implementing currency validation in the sellgains stage
    action: Fail validation if any proceeds currency cannot be matched to a cost currency; complete currency matching is
      required for accurate multi-currency gain/loss reporting
    severity: high
    kind: domain_rule
    modality: must
    consequence: Partial currency matching leaves orphaned proceeds that create accounting inconsistencies, causing multi-currency
      gain/loss reports to show phantom balances and incorrect per-currency totals
    derived_from_bd_id: BD-075
  - id: finance-C-154
    when: When configuring proceed_types in the sellgains stage for equity compensation tracking
    action: Include equity accounts in proceed_types to capture stock vesting, RSU vesting, and stock option exercises — these
      equity compensation events must be properly tracked in proceeds calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Excluding equity accounts misses RSU vesting and option exercise events, causing incomplete proceeds tracking
      that misstates cost-basis and generates incorrect 1099-B reports for equity compensation
    derived_from_bd_id: BD-076
  - id: finance-C-155
    when: When implementing error reporting for transactions with multiple validation failures in the sellgains stage
    action: Report every applicable error type when a transaction has both a proceeds imbalance and a missing expense category;
      emit all errors together so the user can correct everything in one pass rather than through sequential fixes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Single error reporting hides secondary validation failures, requiring multiple fix-and-rerun cycles that
      delay reconciliation and risk fixing symptoms rather than root causes
    derived_from_bd_id: BD-079
  - id: finance-C-156
    when: When validating proceeds currencies for international brokerage accounts in the sellgains stage
    action: Accept non-USD currencies (CAD, etc.) as valid proceeds when properly structured — validate foreign currency proceeds
      against matching cost currencies to support international accounts
    severity: high
    kind: domain_rule
    modality: must
    consequence: Rejecting non-USD currencies causes valid international brokerage transactions to fail validation, blocking
      reconciliation for CAD and other foreign currency accounts entirely
    derived_from_bd_id: BD-080
  - id: finance-C-157
    when: When processing zero-priced transactions for corporate actions in the sellgains stage
    action: Accept zero-priced transactions when properly structured — some corporate actions like stock splits or distributions
      result in zero-value sales that must be captured for correct cost-basis calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Rejecting zero-priced transactions causes legitimate corporate actions to fail, resulting in incomplete cost-basis
      records that cannot properly reflect stock splits and create phantom cost-basis gaps
    derived_from_bd_id: BD-081
  - id: finance-C-158
    when: When implementing period-bound reporting logic that rolls income/expenses to equity
    action: Use standard equity accounts (Opening-Balances, Current-Earnings) from BD-029 and enforce correct date ordering
      via BD-040's clamp operation when executing the equity rollforward
    severity: high
    kind: domain_rule
    modality: must
    consequence: Violating either the standard account requirement or date ordering causes period-bound reporting to fail
      silently, producing incorrect equity balances that don't match source transactions
    derived_from_bd_id: BD-087
  - id: finance-C-159
    when: When implementing post-modification invariant checking in the directive pipeline
    action: Maintain immutability of NamedTuple directives as defined in BD-002 and verify entries passed to BD-061's invariant
      checking remain unmodified during pipeline execution
    severity: high
    kind: domain_rule
    modality: must
    consequence: If directives become mutable, modifications could corrupt the entries being verified by invariant checking,
      making the verification unreliable and allowing invalid data to propagate through the pipeline
    derived_from_bd_id: BD-092
  - id: finance-C-160
    when: When configuring multiple data providers in production backtesting
    action: Assume the framework implements provider priority selection or credential isolation between providers — these
      capabilities are not present in the codebase; the framework uses no priority ordering and does not enforce credential
      separation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without provider priority handling, the system may select an incorrect or higher-cost data provider, causing
      inconsistent data quality and increased operational costs in live trading
    derived_from_bd_id: BD-GAP-019
  - id: finance-C-161
    when: When configuring multiple data providers in production backtesting
    action: Implement provider priority configuration with a priority_rank field per provider and use credential isolation
      by storing each provider's credentials in separate secure storage entries (e.g., provider_<name>_api_key, provider_<name>_secret)
      instead of shared configuration
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit priority ranking, the system defaults to arbitrary provider selection, leading to inconsistent
      data feeds and potential credential cross-contamination that can cause authentication failures or data leakage
    derived_from_bd_id: BD-GAP-019
  - id: finance-C-162
    when: When implementing or modifying directive sorting and ordering logic in Beancount
    action: Maintain date as the primary sort key and lineno (line number) as the secondary sort key for directives occurring
      on the same date — preserve this two-level ordering guarantee
    severity: high
    kind: domain_rule
    modality: must
    consequence: Removing lineno as secondary sort key breaks deterministic ordering for same-day directives, causing balance
      calculation inconsistencies when ledger files are reorganized or include ordering changes
    derived_from_bd_id: BD-003
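  # Illustration for finance-C-162 (a sketch; assumes each entry exposes .date and meta["lineno"]):
  #   ordered = sorted(entries, key=lambda e: (e.date, e.meta["lineno"]))
  #   # same-day directives keep their source-file order because lineno breaks the tie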
  - id: finance-C-163
    when: When implementing option parsing and aggregation logic for multi-file ledgers with includes
    action: Verify that options are parsed per-file with proper aggregation, ensuring top-level file options dominate after
      aggregation for predictable override behavior
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Incorrect option aggregation causes wrong operating_currency and display_context settings in included files,
      leading to calculation errors when shared settings in included files conflict with main file settings
    derived_from_bd_id: BD-004
  - id: finance-C-164
    when: When implementing or modifying the Beancount loader validation system
    action: Preserve the ability to inject custom validation functions at load time — extra validation functions must execute
      after standard validation completes and before data processing
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing custom validation injection breaks domain-specific compliance checks such as regulatory rules and
      personal accounting policies, allowing invalid entries to pass through the pipeline undetected
    derived_from_bd_id: BD-022
  - id: finance-C-165
    when: When implementing position cost basis calculations for lot matching
    action: Explicitly specify the booking method (FIFO, LIFO, MOST_RECENT, STRICT) when creating lots rather than relying
      on framework defaults, as the booking method directly determines which lots are consumed first and affects realized
      gains calculations
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Without explicit booking method specification, the framework uses its default method which may differ across
      versions or configurations, producing inconsistent cost basis calculations that lead to materially different realized
      gains or losses reports
    derived_from_bd_id: BD-007
  - id: finance-C-166
    when: When processing positions with cost specifications
    action: Explicitly specify cost currency in cost specifications when it differs from the position's units currency; when
      cost currency is omitted, verify that the framework correctly infers cost currency matches units currency
    severity: high
    kind: domain_rule
    modality: must
    consequence: Silent cost currency inference causes multi-currency portfolios to incorrectly value positions when cost
      and units currencies differ, leading to balance sheet errors and potentially incorrect P&L calculations in foreign-denominated
      holdings
    derived_from_bd_id: BD-036
  - id: finance-C-167
    when: When implementing period boundary processing or generating balance sheets for arbitrary periods
    action: Verify that income and expense accounts are properly closed and rolled into retained earnings at period boundaries
      — verify P&L closure follows standard accounting practice for accurate period-relative balance sheets
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Without proper income/expense rollforward to retained earnings, period-relative balance sheets show incorrect
      retained earnings balances, leading to wrong equity calculations and potentially misguided strategy decisions based
      on faulty capital estimates
    derived_from_bd_id: BD-017
  - id: finance-C-168
    when: When validating transaction balance assertions or implementing posting validation logic
    action: Enforce balanced postings invariant (sum to zero) strictly at transaction creation time; apply tolerance-based
      balance checking only during balance assertion interpolation, not during transaction posting validation — maintain clear
      scope separation between BD-023 and BD-026
    severity: high
    kind: domain_rule
    modality: must
    consequence: Mixing invariant-level and tolerance-level validation scopes causes inconsistent balance checking behavior
      — transactions may be accepted that fail strict invariant checks, or rejected that pass tolerance checks, leading to
      silent data integrity issues
    derived_from_bd_id: BD-083
  - id: finance-C-169
    when: When processing financial calculations involving external data ingestion or price lookups
    action: Verify that all financial arithmetic operations use the Decimal type for precision; verify that values from
      external data feeds, price lookups, and data ingestion pipelines are converted to Decimal before calculations; do
      not rely on floating-point arithmetic for any monetary values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Floating-point calculations for financial values introduce rounding errors that accumulate over multiple
      transactions; strategies relying on precise cost-basis calculations may execute incorrectly due to sub-cent errors,
      causing systematic profit/loss misreporting
    derived_from_bd_id: BD-085
  - id: finance-C-170
    when: When caching Google Drive API responses using file-based caching
    action: Assume the file-based cache handles data staleness automatically — the cache has no automatic invalidation mechanism
      and persists indefinitely until manually deleted
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without automatic cache invalidation, backtests use stale data when source Google Drive files are updated;
      this causes backtest-live inconsistency where strategy decisions are based on outdated financial data
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-171
    when: When caching Google Drive API responses in backtesting workflows
    action: Implement explicit cache invalidation by checking file modification timestamps or API ETag headers; delete stale
      cache entries when remote data is newer than cached version, or use TTL-based expiration
    severity: high
    kind: domain_rule
    modality: must
    consequence: Backtests using stale cached API responses will execute strategies based on outdated financial data, causing
      incorrect allocation decisions and misleading performance metrics
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-172
    when: When implementing or modifying plugin mode configurations in the framework
    action: Verify that plugin ordering invariants (BD-061) are maintained across all execution modes, especially when using
      raw mode that bypasses built-in plugins (BD-010)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: In raw mode, fixed plugin ordering (BD-011, BD-012) is bypassed, causing invariant checks (BD-061) to run
      on incompletely-transformed entries and miss silent data corruption that standard mode would catch
    derived_from_bd_id: BD-095
  - id: finance-C-173
    when: When implementing or modifying proceeds calculation logic, especially for short positions
    action: Verify that unit negation logic for proceeds calculation (BD-068) produces correct signed values, and validate
      weighted accumulation results (BD-073) against independent calculations before relying on per-currency tolerance checks
      (BD-070, BD-071)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: A negation error in proceeds calculation (BD-068) propagates through weighted accumulation (BD-073), and
      the resulting incorrect totals may pass validation due to tolerances (BD-026, BD-071), allowing incorrect gain/loss
      to be recorded
    derived_from_bd_id: BD-096
  - id: finance-C-174
    when: When configuring Google Drive API authentication for document downloads in enterprise environments
    action: Verify that service account has domain-wide delegation authority configured in GSuite Admin Console before using
      service account authentication; if delegation is not available, implement OAuth2 user flow with proper token refresh
      handling
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Service account authentication silently fails in GSuite environments without domain-wide delegation authority,
      causing document download failures that may crash the system or produce incomplete data
    derived_from_bd_id: BD-GAP-001
  - id: finance-C-175
    when: When implementing account structure in beancount
    action: 'Use hierarchical colon-separated account naming with one of the 5 standard type prefixes: Assets, Liabilities,
      Equity, Income, or Expenses. Accounts must begin with these prefixes to be valid for account type classification and
      proper balance sheet organization.'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Breaking hierarchical account naming would disrupt account type classification, causing balance sheet and
      income statement reports to fail or produce incorrect totals
    derived_from_bd_id: BD-028
  - id: finance-C-176
    when: When implementing inventory equality comparison logic
    action: Use set-based equality for inventory comparison — two inventories are equal only when they contain identical lot
      sets (sorted position comparison by account and currency), regardless of acquisition order
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using list-based or order-dependent comparison would cause inventories with identical lots in different orders
      to be treated as unequal, breaking cost basis tracking and lot-level accounting accuracy
    derived_from_bd_id: BD-031
  - id: finance-C-177
    when: When testing or validating multi-leg sale transactions
    action: Verify that balanced multi-leg sales (where cash proceeds equal cost basis within tolerance) produce zero validation
      errors and are accepted by the sellgains plugin
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Modifying validation to reject balanced transactions would incorrectly flag valid economic outcomes, causing
      false accounting errors in correctly structured trades
    derived_from_bd_id: BD-077
  - id: finance-C-178
    when: When processing sell transactions in the sellgains plugin
    action: Reject transactions where cash proceeds do not match cost basis within tolerance — must trigger SellGainsError
      for unbalanced cash vs cost
    severity: high
    kind: domain_rule
    modality: must
    consequence: Allowing mismatched cash flows to pass validation would create accounting errors that propagate through the
      system, producing incorrect realized gains/losses calculations
    derived_from_bd_id: BD-078
  - id: finance-C-179
    when: When implementing directive ordering for same-day transactions
    action: Preserve parser source location tracking (PLY line numbers) as the secondary sort key for same-day directive ordering
      — the sellgains plugin relies on deterministic lineno-based ordering when timestamps are equal
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing or modifying parser line number tracking would make same-day directive ordering non-deterministic,
      causing inconsistent validation results and potential silent ordering-dependent bugs in accounting
    derived_from_bd_id: BD-082
  - id: finance-C-180
    when: When implementing Open directive logic for account initialization
    action: Preserve the currencies field to limit which currencies can post to the account, and preserve the booking field
      to specify lot merging strategy — both constraints are enforced from the Open date onward
    severity: high
    kind: domain_rule
    modality: must
    consequence: Modifying or removing the currencies or booking fields during Open directive processing changes account scope
      and lot merging behavior, causing unexpected transaction rejections or incorrect position tracking
    derived_from_bd_id: BD-050
  - id: finance-C-181
    when: When implementing Close directive logic for account lifecycle management
    action: Verify that any transaction dated on or after the Close date for an account triggers an error, preventing
      posthumous entries to closed accounts
    severity: high
    kind: domain_rule
    modality: must
    consequence: Allowing transactions on or after the Close date creates posthumous entries that violate account lifecycle
      semantics, causing data integrity issues and incorrect historical reporting
    derived_from_bd_id: BD-051
  - id: finance-C-182
    when: When implementing Document directive processing or balance calculation logic
    action: Use Document directives for any balance calculations or position tracking — they are purely informational for
      audit trail and source file references only
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Incorporating Document directives into balance calculations creates phantom balances since Document merely
      associates filenames with entry dates without affecting account balances
    derived_from_bd_id: BD-052
  - id: finance-C-183
    when: When implementing date range filtering logic for export or report generation
    action: Use half-open interval [begin, end) semantics — include entries on the begin date and exclude entries on the end
      date, matching standard Python slice semantics
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using closed intervals [begin, end] causes double-counting when exporting sequential periods, as entries
      on period boundaries are counted in multiple exports simultaneously
    derived_from_bd_id: BD-054
  - id: finance-C-184
    when: When implementing treeify logic for hierarchical report generation
    action: Preserve the conversion of flat (account, balance) tuples to nested dict structure where each level corresponds
      to account segments separated by colons — the hierarchical structure enables nested reporting aggregation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Returning flat structure instead of nested dict breaks hierarchical aggregation and nested reporting, causing
      incorrect or missing account segment rollups in financial reports
    derived_from_bd_id: BD-055
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-129 / Beancount Test Utilities Framework
    version: v5.3
    intent_keywords:
    - testing utilities
    - tempdir
    - test files
    - mock repository
    - integration testing
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 2
        ucs:
        - uc_id: UC-101
          name: Beancount Test Utilities Framework
          short_description: Provides reusable testing utilities for beancount test scripts including temporary directory
            management and test file creation for integration testing
          sample_triggers:
          - testing utilities
          - tempdir
          - test files
        - uc_id: UC-102
          name: Test Utils Validation Suite
          short_description: Unit tests that validate the correctness of test utility functions including temporary directory
            cleanup and test file generation for beancount test s
          sample_triggers:
          - unit test
          - validation
          - test utilities
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try beancount test utilities framework
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try test utils validation suite
      auto_selected: true
    - uc_id: UC-100
      beginner_prompt: Try capability UC-100
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 2 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Test Utils Validation Suite
    - Beancount Test Utilities Framework
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

Backtrader Event Driven

Skill

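Runs a backtest of the classic dual moving-average crossover strategy, simulating signal generation and position holding in an event-driven way, and outputs a PyFolio performance report.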

---
name: backtrader-event-driven
description: |-
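  Runs a backtest of the classic dual moving-average crossover strategy, simulating signal generation and position holding in an event-driven way, and outputs a PyFolio performance report.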
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-086"
  compiled_at: "2026-04-22T13:00:35.750880+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
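# Backtrader Event-Driven Backtesting (backtrader-event-driven)

> Runs a backtest of the classic dual moving-average crossover strategy, simulating signal generation and position holding in an event-driven way, and outputs a PyFolio performance report.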

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (2 total)

### SMA Crossover Backtester with PyFolio Analytics (`UC-101`)
Implements a classic dual moving average crossover trading strategy using backtrader, generating LONG/LONGSHORT signals when fast and slow SMAs cross,
**Triggers**: backtrader, SMA crossover, moving average

### OHLC Data Printer Utility (`UC-102`)
Provides a minimal backtrader strategy that logs and prints OHLC (Open, High, Low, Close) data in CSV format for debugging and verifying data feed int
**Triggers**: backtrader, data printing, OHLC logging

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
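
As a rough illustration of how two of the locks above could be checked before a run, the sketch below covers SL-03 (entity ID format) and SL-05 (exactly one sizing field). The regular expression, the function names, and the dict-based signal are assumptions made for this example; they are not taken from ZVT or from LOCKS.md.

```python
import re

# Assumed reading of SL-03: entity_type_exchange_code, e.g. stock_sh_600000.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9.]+")

# SL-05: a trading signal must set exactly one of these fields.
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id: str) -> bool:
    """Return True when the ID has the three underscore-separated parts."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def check_signal_sizing(signal: dict) -> bool:
    """Return True when exactly one sizing field is set (not None)."""
    set_fields = [f for f in SIZING_FIELDS if signal.get(f) is not None]
    return len(set_fields) == 1
```

A caller would halt on the first `False`, matching the "halt" column in the table.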

## Top Anti-Patterns (25 total)

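- **`AP-ZVT-183`**: An ex-rights adjustment factor of inf/NaN is multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API exceeds its rate limit are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) versus QFQ (forward-adjusted), causes factor calculations to drift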

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-086. Evidence verify ratio = 28.8% and audit fail total = 15. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
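| [references/seed.yaml](references/seed.yaml) | Complete V6+ authoritative definitions (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed from other projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |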

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-086` blueprint at 2026-04-22T13:00:35.750880+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - OHLC Data Printer Utility
  - SMA Crossover Backtester with PyFolio Analytics
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

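### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object causes predictions to be overwritten by the first model's <sub>(high)</sub>

When multiple models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means that all models actually use the same set of prediction signals. The symptom is a backtest net-value curve that is identical whether the model is LightGBM, XGBoost, or a DNN. This is the most dangerous anti-pattern of the "the experiment appears to be running, but every conclusion is invalid" kind.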

Source: https://github.com/microsoft/qlib/issues/1930
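
One cheap guard against this symptom is to compare the prediction series of the models before trusting any backtest. The helper below is a hypothetical sketch that works on plain Python sequences; it does not use Qlib's API, and the model names in the usage note are only illustrative.

```python
def identical_prediction_groups(preds_by_model: dict) -> list:
    """Group model names whose prediction sequences are exactly equal.

    Returns a list of groups (each a list of model names) that share one
    identical prediction sequence. An empty list means no duplicates.
    """
    groups = {}
    for name, preds in preds_by_model.items():
        groups.setdefault(tuple(preds), []).append(name)
    return [names for names in groups.values() if len(names) > 1]
```

If `identical_prediction_groups({"lgb": p1, "xgb": p2, "dnn": p3})` returns a non-empty list, the models in each group produced the same signals, which is a strong hint that the dataset was shared rather than rebuilt per model.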

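### `AP-QLIB-2090` — Dual configuration of fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

A Qlib DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two settings are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.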

Source: https://github.com/microsoft/qlib/issues/2090
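
The coupling between `fit_end_time` and the train segment can be checked before any model is fitted. The following is a minimal sketch, assuming the configuration is available as plain dicts with ISO date strings; `check_no_normalizer_leak` is a hypothetical helper, not part of Qlib.

```python
from datetime import date


def check_no_normalizer_leak(handler_kwargs: dict, segments: dict) -> None:
    """Raise ValueError if the normalizer's fit window runs past the train segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include valid/test data"
        )


# Hypothetical config fragment, shaped like a DatasetH configuration.
handler_kwargs = {"fit_start_time": "2008-01-01", "fit_end_time": "2014-12-31"}
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}
check_no_normalizer_leak(handler_kwargs, segments)  # passes: windows are consistent
```

Running the check at configuration-load time makes the leak a loud failure instead of a silently optimistic backtest.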

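### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in the official Qlib documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from the documented formula differs markedly from the correct one after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.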

Source: https://github.com/microsoft/qlib/issues/2036

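### `AP-QLIB-2184` — Custom A-share data imported without the conventional NaN fill on suspension days, causing downstream factor noise <sub>(high)</sub>

Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields should all be NaN so the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce "false signals" during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.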

Source: https://github.com/microsoft/qlib/issues/2184
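
One way to apply the NaN convention before import is sketched below. It assumes each bar is a plain dict and that zero (or missing) volume identifies a suspension day, which is a simplification; `mask_suspended` and the sample rows are hypothetical.

```python
import math

PRICE_FIELDS = ("open", "close", "high", "low", "volume", "factor")


def mask_suspended(rows: list[dict]) -> list[dict]:
    """Return copies of rows with all price fields set to NaN on zero-volume days."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the input is left untouched
        if not row.get("volume"):  # assumption: 0 or missing volume means suspended
            for field in PRICE_FIELDS:
                row[field] = math.nan
        out.append(row)
    return out


rows = [
    {"date": "2024-01-02", "open": 10.0, "close": 10.2, "high": 10.3,
     "low": 9.9, "volume": 1000, "factor": 1.0},
    # Exported with the previous close carried forward on a suspension day.
    {"date": "2024-01-03", "open": 10.2, "close": 10.2, "high": 10.2,
     "low": 10.2, "volume": 0, "factor": 1.0},
]
masked = mask_suspended(rows)
```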

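### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>

On initialization, Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() to fetch the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError outright when the list is incomplete. If users follow the documented steps, the financial dataset covers only part of the market, and backtests based on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).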

Source: https://github.com/microsoft/qlib/issues/1892

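### `AP-QLIB-2097` — Whole-market instrument="all" OOMs on a 32GB machine while CSI300 runs fine <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage for instrument="csi300" (300 stocks) versus instrument="all" (5000+ stocks) differs by roughly 16x. A 32GB machine running the whole market hits OOM during init_instance_by_config, and the error message does not point to memory. Users easily mistake this for a configuration error when the real fix is to load in batches or use streaming feature computation.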

Source: https://github.com/microsoft/qlib/issues/2097

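### `AP-QLIB-1984` — LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label input, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises an opaque error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.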

Source: https://github.com/microsoft/qlib/issues/1984

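### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data-access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (for example 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.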

Source: https://github.com/microsoft/qlib/issues/1915

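### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails on initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from the D.features call, and the stack trace does not let users tell whether the problem is the parallel backend or the data.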

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

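### `AP-VNPY-3691` — Bar generator's first bar timestamp is misaligned, producing a wrong signal for the first period <sub>(high)</sub>

When the vnpy BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters in on_bar with datetime.minute % 5, that first bar happens to pass, but it contains less than one full period of data and yields a wrong entry signal.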

Source: https://github.com/vnpy/vnpy/issues/3691
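
A strategy can guard against the partial first bar by checking where its first tick fell inside the N-minute window. This is a minimal standard-library sketch, not vnpy code; `window_start` and `first_bar_is_partial` are hypothetical helpers, and windows are assumed to align to the hour (N divides 60).

```python
from datetime import datetime


def window_start(ts: datetime, n_minutes: int) -> datetime:
    """Floor a timestamp to the start of its N-minute window."""
    floored_minute = ts.minute - ts.minute % n_minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def first_bar_is_partial(first_tick: datetime, n_minutes: int) -> bool:
    """True if the first tick arrived after the first minute of its window."""
    first_minute = first_tick.replace(second=0, microsecond=0)
    return first_minute != window_start(first_tick, n_minutes)


# A tick at 09:59:30 belongs to the 09:55 window, so the first 5-minute bar
# built from it covers only one minute and should be skipped.
skip_first = first_bar_is_partial(datetime(2024, 1, 2, 9, 59, 30), 5)
```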

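### `AP-VNPY-3669` — Incremental save of historical data in the Alpha module raises SchemaError when old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When saving bar data to Parquet, the vnpy Alpha module concatenates newly downloaded data (which may contain Float64 columns) with the existing file (historical Int64 columns) directly via polars.concat. Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (for example, volume is an integer in some feeds and a float in others), and there is no schema-alignment step before the concat. This affects every historical-data build used for backtesting with vnpy alpha.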

Source: https://github.com/vnpy/vnpy/issues/3669

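### `AP-VNPY-3685` — Spread trading module's run_backtesting() fails silently under Jupyter, making results untrustworthy <sub>(high)</sub>

In vnpy 4.10, the SpreadTrading module's run_backtesting() hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not run, yet no exception is raised and seemingly normal statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where the backtest result looks correct but is actually incomplete.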

Source: https://github.com/vnpy/vnpy/issues/3685

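### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy version is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs directly into the system or conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.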

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

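### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

The Zipline tutorial uses an AAPL price chart for its demo, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by roughly 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates create huge "signals" in unadjusted data, leading technical indicators to generate false breakout signals on ex-rights dates.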

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

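### `AP-ZIPLINE-235` — Default fill at the current bar's close understates live slippage and inflates backtest returns <sub>(high)</sub>

With Zipline's default slippage model, an order triggered on a bar is filled at that same bar's close (current bar close fill). In live trading a signal can only be filled near the next bar's open (T+1 order execution). For A-share daily bars, backtesting at the close overstates daily return by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.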

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
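
The difference between same-bar and next-bar execution can be illustrated without any backtesting library. The sketch below assumes bars are dicts with `date`, `open`, and `close` keys and that `signals[i]` is computed after bar `i` closes; `next_bar_fills` and the sample data are hypothetical.

```python
def next_bar_fills(bars: list[dict], signals: list[bool]) -> list[tuple]:
    """Pair each signal on bar i with the open of bar i+1.

    A signal on the last bar has no next bar and therefore produces no fill.
    """
    fills = []
    for i, fired in enumerate(signals):
        if fired and i + 1 < len(bars):
            fills.append((bars[i + 1]["date"], bars[i + 1]["open"]))
    return fills


bars = [
    {"date": "2024-01-02", "open": 10.0, "close": 10.5},
    {"date": "2024-01-03", "open": 10.6, "close": 10.4},
    {"date": "2024-01-04", "open": 10.3, "close": 10.8},
]
signals = [True, False, True]
fills = next_bar_fills(bars, signals)
```

Here the first signal fills at 10.6 (the next open) rather than 10.5 (the signal bar's close), and the final signal is dropped because no later bar exists.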

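### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if start_session falls on a non-trading day (such as New Year's Day, 1998-01-01), calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The message shows only the calendar's first session and does not suggest changing the value to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the holiday right after the last trading day before Spring Festival triggers the same kind of error, and debugging it is very costly.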

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
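
Snapping a requested start date to the first valid session avoids the error up front. This sketch assumes the calendar's sessions are available as a sorted list of dates; `first_session_on_or_after` and the sample sessions are hypothetical and do not use the Zipline calendar API.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions: list[date], start: date) -> date:
    """Return the earliest session that is not before `start`."""
    i = bisect_left(sessions, start)
    if i == len(sessions):
        raise ValueError(f"{start} is after the last session {sessions[-1]}")
    return sessions[i]


# Illustrative sessions only; a real run would take these from the calendar.
sessions = [date(1998, 1, 2), date(1998, 1, 5), date(1998, 1, 6)]
start_session = first_session_on_or_after(sessions, date(1998, 1, 1))
```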

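### `AP-ZIPLINE-181` — With a stale asset db, Pipeline reports "no assets traded", sending users to check the data range instead <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, backtesting a newer date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if prices are re-downloaded but the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, so Pipeline filtering quietly excludes those stocks and introduces survivorship bias.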

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

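### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last day of the week, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation built for the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the whole holiday week without rebalancing, and users cannot spot the missed schedule in the logs.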

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

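### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone info, such as pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it triggers AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data in local CST time hit this trap easily and need to call tz_localize('UTC') explicitly when registering the bundle.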

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
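
For intraday timestamps recorded in local China Standard Time, a conversion step at the pipeline entry makes the timezone explicit. This standard-library sketch interprets naive values as fixed UTC+8 and converts them to UTC, which is a different operation from labelling a naive value as UTC with `tz_localize('UTC')`; `to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), "CST")  # China Standard Time, fixed UTC+8


def to_utc(ts: datetime, assume_tz: timezone = CST) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted in `assume_tz`."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


# The 15:00 A-share close in local time is 07:00 UTC on the same day.
close_utc = to_utc(datetime(2020, 1, 2, 15, 0))
```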

## zvt (6)

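### `AP-ZVT-183` — Ex-rights factor of inf/NaN used directly in multiplication makes price adjustment fail silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (first day of a new listing, or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. The result: the adjustment for the whole entity is aborted and all subsequent bars are lost, but the main flow only logs ERROR and keeps going, so users often do not know that data for many stocks is corrupted.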

Source: https://github.com/zvtvz/zvt/issues/183
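
A guard around the ratio keeps undefined factors out of the multiplication. The sketch below is not ZVT code; `safe_adjust_factor` is a hypothetical helper that returns `None` for any input that would produce a missing, infinite, or NaN factor, leaving the caller to decide how to handle that bar.

```python
import math
from typing import Optional


def safe_adjust_factor(new_price: Optional[float],
                       old_price: Optional[float]) -> Optional[float]:
    """Return new_price / old_price, or None when the ratio is undefined."""
    if new_price is None or old_price is None:
        return None  # unfilled suspension day
    if old_price == 0:
        return None  # first listing day or missing data
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


factor = safe_adjust_factor(11.0, 10.0)
```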

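### `AP-ZVT-179` — Exceptions swallowed after a third-party data API quota is exceeded, leaving data silently missing <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: daily maximum query count exceeded). ZVT catches the exception and moves on to the next entity, so that day's data for every stock after the limit is hit goes silently missing. A backtest run on this incomplete database produces systematically biased factor values, with no warning.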

Source: https://github.com/zvtvz/zvt/issues/179
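
Instead of continuing past a quota error, a fetch loop can stop and report exactly which entities were not retrieved. This is a generic sketch, not ZVT or jqdatasdk code; `QuotaExceeded`, `fetch_all`, and the `fetch_one` callback are hypothetical names.

```python
class QuotaExceeded(Exception):
    """Hypothetical error raised by a data client when the daily quota is used up."""


def fetch_all(entity_ids, fetch_one):
    """Fetch entities in order; on a quota error, stop and return what is missing."""
    ids = list(entity_ids)
    results, missing = {}, []
    for i, entity_id in enumerate(ids):
        try:
            results[entity_id] = fetch_one(entity_id)
        except QuotaExceeded:
            missing = ids[i:]  # this entity and everything after it was not fetched
            break
    return results, missing
```

A caller can then persist `missing` and resume the next day, or refuse to run a backtest while it is non-empty.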

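### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once exceeds SQLite's SQLITE_MAX_VARIABLE_NUMBER limit, which defaults to 999 in SQLite versions before 3.32.0. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on variables. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.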

Source: https://github.com/zvtvz/zvt/issues/161
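
Batched querying, as recommended above, can be sketched with the standard `sqlite3` module. The table name `kdata`, its columns, and `query_in_chunks` are hypothetical; the chunk size of 500 is an arbitrary value chosen to stay under the 999-variable limit.

```python
import sqlite3


def query_in_chunks(conn, entity_ids, chunk_size=500):
    """Run an IN query in batches so no statement exceeds the variable limit."""
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# Demo on an in-memory database with more ids than the old 999 limit allows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sz_{i:06d}" for i in range(1200)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 1.0) for e in ids])
rows = query_in_chunks(conn, ids)
```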

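### `AP-ZVT-129` — Wildcard imports hide API version changes, and enums such as AdjustType mysteriously disappear <sub>(medium)</sub>

ZVT's documentation examples use `from zvt import *` to import every symbol. After a ZVT upgrade refactors the enums (for example moving AdjustType into a submodule), the wildcard import no longer includes the symbol, which triggers AttributeError. Users assume an installation problem, when in fact it is an API breaking change between versions that was not noted in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.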

Source: https://github.com/zvtvz/zvt/issues/129

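### `AP-ZVT-187` — Backtest engine does not stop early on an empty data-layer result, leading to a cascade of null-reference crashes <sub>(medium)</sub>

When ZVT Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame into the selector computation, which sets off a chain of NoneType failures. The stack trace is deep and the root cause is hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.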

Source: https://github.com/zvtvz/zvt/issues/187

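### `AP-ZVT-183B` — Misuse of the HFQ (backward-adjusted) and QFQ (forward-adjusted) bar tables causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two tables while computing price momentum or moving-average factors (for example, unadjusted prices for the moving average and backward-adjusted prices for returns), factor values jump around ex-rights dates. ZVT does not check that adjustment types are consistent across tables, so the mix passes silently.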

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-086--backtrader
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 24, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [data_ingestion](components/data_ingestion.md): 3 classes
- [data_filtering_&_resampling](components/data_filtering_-_resampling.md): 2 classes
- [indicator_computation](components/indicator_computation.md): 3 classes
- [strategy_logic](components/strategy_logic.md): 6 classes
- [order_execution_&_broker](components/order_execution_-_broker.md): 4 classes
- [analysis_&_reporting](components/analysis_-_reporting.md): 2 classes
- [cerebro_orchestration](components/cerebro_orchestration.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 184
  fatal_constraints_count: 30
  non_fatal_constraints_count: 239
  use_cases_count: 2
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

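- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing the signal from the close and filling at the same day's close; (2) stamping an indicator computed after the day-T close onto that same bar; (3) assuming fills at the day's high or low. Signal computation and fill time must be aligned: compute the signal after the day-T close and fill at the day T+1 open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions must be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL time-series data must be split in time order, TRAIN < VALID < TEST; random k-fold splitting is not allowed (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (such as a momentum strategy that takes profit "at the high") is plain lookahead, because the intraday high and low are only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Data alignment offset (off-by-one): pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record the "observation time" of each series explicitly in code and verify key time-alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests even when the true Sharpe is zero, so a peak Sharpe is an inflated estimate. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.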
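- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be done in-sample; out-of-sample data is for final validation only. Do not keep tuning after repeatedly "peeking" at out-of-sample data (this turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspensions and missing data: suspension-day prices must not simply be forward-filled with the previous close, because "zero volume" days would then take part in factor computation and signal generation. Filter missing trading days explicitly at the factor layer and do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it uncleaned into factor computation produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.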
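- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not allowed. Performance metrics from a cost-free backtest are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices worsen. At minimum configure a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed alongside costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to the cost assumption: every 10 bps change in cost can flip the profit-or-loss conclusion, so a cost sensitivity analysis is mandatory.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect the capital constraint: the backtest should simulate the actual (rounded) number of shares held under a fixed amount of capital rather than assume fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, so the rounding effect should be simulated in the backtest.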
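- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timezone consistency: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to a single timezone at the pipeline entry (UTC for storage and the market's local timezone for display are recommended), and never mix timezones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental-update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after it. Re-downloading and appending existing data creates duplicate timestamps and corrupts the backtest's time ordering. Verify there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Use a passive benchmark the strategy can actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark's holding return.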
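- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed on the net asset value series (portfolio value), not on a cumulative return series. Summing log returns and reading the drop as a drawdown misstates its depth, because a fall of x in cumulative log return corresponds to a simple drawdown of 1 - exp(-x), not x.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ across systems, so confirm the annualization factor before comparing across systems; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, while return distributions in A-share and crypto markets are markedly left-skewed (fat-tailed), so it understates downside risk. A quantitative evaluation should also report Sortino (downside volatility only) and Calmar (annualized return / max drawdown) rather than rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Performance attribution should break returns down into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot tell "a good strategy" from "a favorable market where the beta happened to be right".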
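- **`SHARED-FR-IC-001`** <sub>(high)</sub>: The IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing evidence for factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-horizon factor and does not suit monthly rebalancing, and vice versa. Using the wrong holding period seriously damages live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has reported 300+ "significant" factors, many of which are false discoveries under multiple testing. A valid factor should have t-stat > 3.0 (rather than the traditional 1.96), or replicate independently across periods or markets, or have a clear economic rationale. Factors meeting none of these conditions are very likely products of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, the net IC after transaction costs may be negative. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-horizon factor monthly by mistake lets much of the alpha dissipate within the holding period.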
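- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry-rotation returns, and it becomes hard to tell whether the factor itself or industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates nearly every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and the return comes from size exposure rather than the factor. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes that distort quantile analysis (such as Q1/Q10 deciles). Winsorize raw values (clip to [1%, 99%] or 3-sigma) or clip by MAD (median absolute deviation) before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into one score, combining highly correlated factors amounts to overweighting a single factor and dilutes signal diversity. Apply Gram-Schmidt orthogonalization or PCA before combining to remove multicollinearity among the factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaNs in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean can introduce lookahead bias (when the mean itself embeds future information), while dropping them entirely creates survivorship bias. The correct approach is to fill with the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.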
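- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the main metric rather than the long-only return. The monotonicity test (long Q1, short Q5) is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one market state is not fit for all-weather deployment; show the IC time series in segments (rolling 12M) to identify the periods in which the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank changes trigger rebalancing and generate large, pointless transaction costs. Set a buffer zone at selection time: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile return spread (Q1-Q5 spread) may be due to chance even if it is large in historical data, so test its significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.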
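- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability check: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; cross-market generality must not be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: once-effective factors gradually stop working as the market learns and arbitrages them away (McLean & Pontiff 2016 find an average post-publication decay of 58%). Re-evaluate factor IC regularly (quarterly or yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the rate cycle, business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) are more effective when rates are rising; momentum factors are more effective in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor works best.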
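
Two of the performance constraints above (`SHARED-BT-PERF-001` and `SHARED-BT-PERF-002`) reduce to short formulas. The following standard-library sketch computes max drawdown from a NAV series and an annualized Sharpe ratio with an explicit annualization factor; the function names are hypothetical and the risk-free rate is assumed to be zero.

```python
import math
import statistics


def max_drawdown(nav: list[float]) -> float:
    """Largest peak-to-trough decline of a NAV series, as a positive fraction."""
    peak, worst = nav[0], 0.0
    for value in nav:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_sharpe(returns: list[float], periods_per_year: int = 252) -> float:
    """Mean over sample std of per-period returns, scaled by sqrt(periods_per_year)."""
    mean = statistics.fmean(returns)
    std = statistics.stdev(returns)
    return mean / std * math.sqrt(periods_per_year)


nav = [1.0, 1.2, 0.9, 1.1]
mdd = max_drawdown(nav)  # peak 1.2 to trough 0.9

daily = [0.01, -0.01, 0.02, 0.0]
sharpe_equity = annualized_sharpe(daily, 252)
sharpe_crypto = annualized_sharpe(daily, 365)  # same returns, different convention
```

Passing `periods_per_year` explicitly keeps the annualization convention visible at every call site, which is the point of the comparability rule.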

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
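
Locks `SL-03` and `SL-05` above lend themselves to simple pre-flight checks. This is a minimal sketch: the regular expression is an assumed reading of the `entity_type_exchange_code` format based on the example IDs in this document, and both function names are hypothetical rather than part of ZVT.

```python
import re

# Assumed pattern: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")


def is_valid_entity_id(entity_id: str) -> bool:
    """SL-03: entity IDs follow entity_type_exchange_code."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def has_exactly_one_size(position_pct=None, order_money=None, order_amount=None) -> bool:
    """SL-05: exactly one of the three sizing fields may be set."""
    given = [v for v in (position_pct, order_money, order_amount) if v is not None]
    return len(given) == 1


ok_id = is_valid_entity_id("stock_sh_600000")
ok_signal = has_exactly_one_size(position_pct=0.2)
```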

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **2**

## `KUC-101`
**Source**: `samples/pyfolio2/backtrader-pyfolio.ipynb`

Implements a classic dual moving average crossover trading strategy using backtrader, generating LONG/LONGSHORT signals when fast and slow SMAs cross, with integrated PyFolio portfolio analytics for performance measurement.

## `KUC-102`
**Source**: `samples/pyfoliotest/backtrader-pyfolio.ipynb`

Provides a minimal backtrader strategy that logs and prints OHLC (Open, High, Low, Close) data in CSV format for debugging and verifying data feed integrity.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

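## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies and multiple data sources. zvt's StockTrader currently binds only one set of factors per instance and lacks a unified layer for orchestrating multi-strategy portfolios; adopting the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.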

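## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach any performance metric without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; adopting this pattern would give users a risk-adjusted return report from a single cerebro.addanalyzer(SharpeRatio) call.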

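## `CW-BT-003` — Sizer: separating position sizing
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what fraction to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.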
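
What a standalone sizing interface might look like is sketched below. This is not backtrader's or zvt's API: the `Sizer` protocol, `PercentOfCashSizer`, and the 100-share lot default are assumptions made for illustration.

```python
from typing import Protocol


class Sizer(Protocol):
    """Hypothetical interface: turn available cash and a price into a share count."""

    def size(self, cash: float, price: float) -> int: ...


class PercentOfCashSizer:
    """Spend a fixed fraction of cash, rounded down to whole lots."""

    def __init__(self, percent: float, lot_size: int = 100):
        self.percent = percent
        self.lot_size = lot_size  # assumption: A-share style 100-share lots

    def size(self, cash: float, price: float) -> int:
        budget = cash * self.percent
        lots = int(budget // (price * self.lot_size))
        return lots * self.lot_size


sizer: Sizer = PercentOfCashSizer(percent=0.5)
shares = sizer.size(cash=100_000, price=12.34)
```

Because the signal code only ever calls `size`, a different money-management rule can be swapped in without touching the strategy.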

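## `CW-BT-004` — Full set of Order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack limit orders and combined-order simulation; for high-frequency or live-trading integration, fuller order-type support would greatly improve backtest realism.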

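## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how an OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording bars at each level and lacks dynamic resampling at run time; adopting this pattern would support backtests over any combination of time granularities without recording data twice.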

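## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy equity curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page; adopting this pattern would make it possible to package an out-of-the-box strategy performance dashboard.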

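## `CW-VN-004` — vnpy.alpha ML factor research Lab
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0's vnpy.alpha.lab provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; adopting the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.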

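## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was actually known at t (report publication lags and revision history are handled correctly), which eliminates look-ahead bias in backtests. zvt currently uses the reporting period as the timestamp for financial data and lacks a "publication date" dimension, so stock selection may be using financial reports from the future; introducing the PIT pattern would greatly improve backtest credibility.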
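
The core of a point-in-time lookup is keying on the publication date instead of the reporting period. The sketch below is not qlib or zvt code; it assumes reports are tuples of `(publish_date, report_period, value)` sorted by publication date, and `latest_known` plus the sample figures are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def latest_known(reports: list[tuple], as_of: date):
    """Return the most recent report published on or before `as_of`, else None."""
    publish_dates = [r[0] for r in reports]
    i = bisect_right(publish_dates, as_of)
    return reports[i - 1] if i else None


reports = [
    (date(2023, 4, 28), "2023Q1", 1.0),
    (date(2023, 8, 30), "2023Q2", 1.2),
]
# On 2023-07-15 the Q2 period has ended, but its report is not yet published.
known = latest_known(reports, date(2023, 7, 15))
```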

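## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically log the hyperparameters, features, metrics, and predictions of every training run and support cross-experiment comparison and model version management. zvt currently lacks an ML experiment-tracking mechanism, and each rerun overwrites the previous result; adopting the Recorder pattern would persist the parameters and results of every factor experiment, supporting quick reproduction and version comparison.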

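## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling intraday optimal-execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution-algorithm modeling; adopting the nested framework would let zvt separate the question of "which stocks to hold and when" from "how to build the position with minimal market impact".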

FILE:references/components/analysis_-_reporting.md
# analysis_&_reporting (2 classes)

## `Analyzer.get_analysis`
`analysis_&_reporting/analyzer-get-analysis.py:0`

## `Output format`
`analysis_&_reporting/output-format.py:0`

FILE:references/components/cerebro_orchestration.md
# cerebro_orchestration (4 classes)

## `Cerebro.adddata`
`cerebro_orchestration/cerebro-adddata.py:0`

## `Cerebro.addstrategy`
`cerebro_orchestration/cerebro-addstrategy.py:0`

## `Cerebro.run`
`cerebro_orchestration/cerebro-run.py:0`

## `Execution mode`
`cerebro_orchestration/execution-mode.py:0`

FILE:references/components/data_filtering_-_resampling.md
# data_filtering_&_resampling (2 classes)

## `Resampler._process`
`data_filtering_&_resampling/resampler-process.py:0`

## `Resampling mode`
`data_filtering_&_resampling/resampling-mode.py:0`

FILE:references/components/data_ingestion.md
# data_ingestion (3 classes)

## `Cerebro.adddata`
`data_ingestion/cerebro-adddata.py:0`

## `CSVDataBase._loadline`
`data_ingestion/csvdatabase-loadline.py:0`

## `Data Source`
`data_ingestion/data-source.py:0`

FILE:references/components/indicator_computation.md
# indicator_computation (3 classes)

## `Indicator.next`
`indicator_computation/indicator-next.py:0`

## `Indicator.once`
`indicator_computation/indicator-once.py:0`

## `Indicator calculation`
`indicator_computation/indicator-calculation.py:0`

FILE:references/components/order_execution_-_broker.md
# order_execution_&_broker (4 classes)

## `BrokerBase.submit`
`order_execution_&_broker/brokerbase-submit.py:0`

## `BrokerBase.getvalue`
`order_execution_&_broker/brokerbase-getvalue.py:0`

## `BrokerBase.getposition`
`order_execution_&_broker/brokerbase-getposition.py:0`

## `Broker backend`
`order_execution_&_broker/broker-backend.py:0`

FILE:references/components/strategy_logic.md
# strategy_logic (6 classes)

## `Strategy.__init__`
`strategy_logic/strategy-init.py:0`

## `Strategy.next`
`strategy_logic/strategy-next.py:0`

## `Strategy.buy`
`strategy_logic/strategy-buy.py:0`

## `Strategy.sell`
`strategy_logic/strategy-sell.py:0`

## `Sizing logic`
`strategy_logic/sizing-logic.py:0`

## `Trade tracking`
`strategy_logic/trade-tracking.py:0`

Arcticdb Timeseries

Skill

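Manages storage and querying of large-scale time-series data, supports efficient aggregation over billions of rows, provides lazy DataFrame loading and batch concatenation, and is compatible with multiple storage backends such as AWS S3.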

---
name: arcticdb-timeseries
description: |-
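  Manages storage and querying of large-scale time-series data, supports efficient aggregation over billions of rows, provides lazy DataFrame loading and batch concatenation, and is compatible with multiple storage backends such as AWS S3.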
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-103"
  compiled_at: "2026-04-22T13:00:48.376963+00:00"
  capability_markets: "multi-market"
  capability_activities: "data-sourcing"
  sop_version: "crystal-compilation-v6.1"
---
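# ArcticDB Time-Series Storage (arcticdb-timeseries)

> Manages storage and querying of large-scale time-series data, supports efficient aggregation over billions of rows, provides lazy DataFrame loading and batch concatenation, and is compatible with multiple storage backends such as AWS S3.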

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (17 total)

### AWS S3 Configuration for Public Blockchain Data Access (`UC-101`)
Setting up AWS credentials to enable secure access to public blockchain data stored in S3, allowing integration with ArcticDB for time-series storage
**Triggers**: aws, s3, credentials

### Billion Row Challenge - Large Scale Data Performance (`UC-102`)
Demonstrates ArcticDB's ability to handle massive datasets (1 billion rows of temperature data) with efficient aggregation, serving as a performance b
**Triggers**: billion rows, large scale, performance

### Batch DataFrame Concatenation with Lazy Loading (`UC-103`)
Demonstrates efficient concatenation of multiple DataFrames stored in ArcticDB using lazy loading to minimize memory consumption during batch operatio
**Triggers**: concat, batch, lazy

For all **17** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-103. Evidence verify ratio = 79.0% and audit fail total = 19. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Full V6+ source of truth | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-103` blueprint at 2026-04-22T13:00:48.376963+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Batch DataFrame Concatenation with Lazy Loading', 'Billion Row Challenge - Large Scale Data Performance', 'AWS S3 Configuration for Public Blockchain Data Access', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-070--edgartools (2)

### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>

Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.

### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>

SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.

## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>

Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.

## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>

SEC EDGAR requires a valid User-Agent identity, including contact information, in the request headers. Without it, requests are rejected with 403 Forbidden errors, which blocks all filing access. Both edgartools and edgar-crawler enforce this constraint as a prerequisite for any data retrieval operation.
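
A minimal sketch of attaching an identifying User-Agent before any request is sent, using only the standard library. The helper name `build_sec_request` and the rule that an identity must contain an email address are assumptions for illustration; neither edgartools nor edgar-crawler is claimed to work this way.

```python
from urllib.request import Request


def build_sec_request(url: str, identity: str) -> Request:
    """Return a Request that carries a User-Agent with contact information."""
    # Assumption: an identity containing an email address counts as contact info.
    if "@" not in identity:
        raise ValueError("User-Agent identity should include a contact email")
    return Request(url, headers={"User-Agent": identity})


# No network call is made here; the request object is only constructed.
req = build_sec_request("https://www.sec.gov/", "Example Research admin@example.com")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

Failing at construction time keeps a missing identity from surfacing later as an opaque 403.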

## finance-bp-079--akshare (4)

### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>

HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.

### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>

Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
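
The two failure modes above (malformed JSON and empty-but-valid JSON) can be caught at one choke point. This is a sketch under stated assumptions: `parse_payload` and `EmptyPayloadError` are hypothetical names, not part of akshare.

```python
import json


class EmptyPayloadError(ValueError):
    """Raised when a response parses as JSON but carries no data."""


def parse_payload(text: str):
    """Parse a JSON response body, rejecting malformed and empty payloads."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Re-raise with context instead of letting a bare error crash downstream code.
        raise ValueError(f"malformed JSON response: {exc}") from exc
    # An empty dict, empty list, or null would otherwise become an empty DataFrame.
    if data is None or data == {} or data == []:
        raise EmptyPayloadError("response contained no data")
    return data
```

Callers can then treat `EmptyPayloadError` as "no data for this request" and other `ValueError`s as a broken response, rather than discovering an empty DataFrame later.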

### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>

Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.

### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>

Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
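
One way to surface the mismatch early is to compare lengths before renaming anything. The sketch below is illustrative: the function name, the raw fields `f2`/`f3`, and the English column names are assumptions, not akshare's actual mapping constants.

```python
def apply_column_names(rows, raw_fields, names):
    """Rename raw API fields to readable names, checking the mapping length first."""
    if len(raw_fields) != len(names):
        raise ValueError(
            f"column mapping mismatch: {len(raw_fields)} raw fields vs {len(names)} names"
        )
    rename = dict(zip(raw_fields, names))
    return [{rename[field]: row[field] for field in raw_fields} for row in rows]


rows = [{"f2": 10.5, "f3": 1.2}]
renamed = apply_column_names(rows, ["f2", "f3"], ["latest_price", "pct_change"])
print(renamed)
```

The explicit check produces an error message that names both counts, which is easier to act on than a generic DataFrame construction error.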

## finance-bp-103--ArcticDB (3)

### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>

ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.
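
A pre-write check along these lines can reject the three DataFrame shapes listed above before they reach storage. It uses pandas only and does not call ArcticDB; `unsupported_reasons` is a hypothetical helper, and the check is a sketch rather than a complete list of what ArcticDB rejects.

```python
import pandas as pd


def unsupported_reasons(df: pd.DataFrame) -> list[str]:
    """Return reasons this frame matches one of the unsupported shapes (empty if none)."""
    reasons = []
    if isinstance(df.columns, pd.MultiIndex):
        reasons.append("MultiIndex columns")
    # pd.ArrowDtype only exists in newer pandas releases, so look it up defensively.
    arrow_dtype = getattr(pd, "ArrowDtype", None)
    for name, dtype in df.dtypes.items():
        if pd.api.types.is_timedelta64_dtype(dtype):
            reasons.append(f"timedelta64 column: {name}")
        elif arrow_dtype is not None and isinstance(dtype, arrow_dtype):
            reasons.append(f"PyArrow-backed column: {name}")
    return reasons


ok = pd.DataFrame({"close": [1.0, 2.0]})
bad = pd.DataFrame({"held_for": pd.to_timedelta(["1D", "2D"])})
print(unsupported_reasons(ok), unsupported_reasons(bad))
```

Returning reasons instead of raising lets the caller decide whether to convert the offending columns or abort the write.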

### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>

Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.

### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>

Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.

## finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>

8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
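
The scheme choice reduces to a date comparison. In this sketch the labels `"legacy"` and `"current"` and the function name are assumptions, as is treating the cutoff date itself as belonging to the new scheme (the text above only says "before" and "after").

```python
from datetime import date

# Cutoff taken from the description above.
NEW_SCHEME_START = date(2004, 8, 23)


def item_scheme(filing_date: date) -> str:
    """Pick the 8-K item numbering scheme for a filing date."""
    # Assumption: filings dated exactly on the cutoff use the new numbering.
    return "legacy" if filing_date < NEW_SCHEME_START else "current"


print(item_scheme(date(2003, 5, 1)), item_scheme(date(2010, 1, 4)))
```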

## finance-bp-128--yfinance (2)

### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>

Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.
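
A small sketch of normalizing an index before combining frames: localize naive timestamps to an assumed exchange time zone, then convert to UTC. The helper name and the default zone `America/New_York` are assumptions; the right zone depends on the market the data comes from.

```python
import pandas as pd


def to_utc_index(df: pd.DataFrame, assumed_tz: str = "America/New_York") -> pd.DataFrame:
    """Return a copy whose DatetimeIndex is timezone-aware and expressed in UTC."""
    out = df.copy()
    if out.index.tz is None:
        # Naive timestamps are interpreted as local exchange time (an assumption).
        out.index = out.index.tz_localize(assumed_tz)
    out.index = out.index.tz_convert("UTC")
    return out


# Two 09:30 opens on either side of the 2024 US spring DST change (March 10).
naive = pd.DataFrame(
    {"close": [100.0, 101.0]},
    index=pd.to_datetime(["2024-03-08 09:30", "2024-03-11 09:30"]),
)
utc = to_utc_index(naive)
print(utc.index)
```

The same local opening time maps to 14:30 UTC before the change and 13:30 UTC after it, which is exactly the offset that gets lost when the index is naive.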

### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>

Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-103--ArcticDB
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 32, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [uri_parsing_and_storage_adapter_selection](components/uri_parsing_and_storage_adapter_selection.md): 3 classes
- [library_configuration_and_management](components/library_configuration_and_management.md): 3 classes
- [data_normalization](components/data_normalization.md): 3 classes
- [recursive_normalization_(nested_structures)](components/recursive_normalization_-nested_structures.md): 3 classes
- [write_operations](components/write_operations.md): 5 classes
- [read_operations](components/read_operations.md): 3 classes
- [query_processing](components/query_processing.md): 4 classes
- [version_and_snapshot_management](components/version_and_snapshot_management.md): 5 classes
- [storage_backend_(c++_layer)](components/storage_backend_-c-_layer.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 153
  fatal_constraints_count: 40
  non_fatal_constraints_count: 278
  use_cases_count: 17
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (16)

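- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting + exponential backoff retry: every external data API call must apply rate-limit control and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern: it adds load on the server and can trigger an IP ban. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must cap concurrency (max_workers); unbounded parallelism is not allowed. Free APIs (akshare, tushare free tier) usually allow only 1-3 concurrent requests; paid APIs also have concurrency ceilings (tushare is points-based, and different point levels map to different limits). Exceeding the limit triggers 429 responses or an IP ban. Control concurrency explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token / credential safety: data-source API keys (a tushare token; akshare needs no token, but other commercial sources do) must not be hard-coded; read them from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should insert a minimum interval between requests (some akshare endpoints require ≥ 0.5 s; the tushare free tier allows 200 calls per minute). A plain sleep in code is less precise than a token-bucket algorithm; a mature library such as ratelimit or slowapi is recommended.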
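- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspension days: a suspended stock has no trade data during the suspension, so the database shows date gaps. Do not forward-fill the missing dates (this fabricates volume); instead mark the rows with is_suspended=True, set volume and turnover to 0, and keep the previous day's close as the price. Factor computation must filter out rows where is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: Historical data boundary for newly listed stocks: a new stock appears in the database from its first trading day and has no history before listing. If a factor's lookback window is longer than the number of days since listing, every factor value is NaN. Record each stock's listing date (list_date) during collection, and start collection from that date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream sources (akshare/tushare) still serve the pre-delisting history of delisted stocks, but there is no data after the delisting date. A historical stock universe must include delisted stocks (otherwise it suffers from survivorship bias), and collection must treat the delisting date as an explicit cutoff.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data (e.g. the closing price) can differ slightly between sources (akshare/tushare/baostock) because of different price-adjustment methods, holiday handling, or ex-dividend adjustment timing. The pipeline should run multi-source reconciliation checks and, when the difference exceeds a threshold (e.g. 0.1%), log an alert for manual confirmation.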
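- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; force the conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time versus calendar time: the "date" of daily bars usually refers to a trading day (day T), whereas the "time" of news or announcement data is calendar time. When merging the two, map calendar time to the next available trading day; otherwise the result has a look-ahead problem in which an announcement dated day T is treated as already available during day T's session.
- **`SHARED-DS-TIME-003`** <sub>(medium)</sub>: Daylight saving time (DST): when collecting US or European market data, the DST switch days (March/November) make the same HH:MM correspond to a different UTC time; if this is not handled, that day's time series drifts by one hour. Always store timestamps in UTC and convert to the market's local time zone for display.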
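- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data update scripts must be idempotent (repeated runs produce the same result). If a script fails midway because of a network interruption, re-running it must not create duplicate rows or data gaps. Implementation: write to a temporary table first, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums / row counts): after each update, check the key fields: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A pipeline without automatic checks is the root cause of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: pipeline outputs should be versioned. When a source revises historical data (e.g. restated financial data), keep the old version traceable instead of silently overwriting it, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that data coverage for every stock/asset is consistent with the trading calendar. Each stock should have one row for every trading day (a suspension marker, not a missing row). Checking the NaN ratio through pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: frequently read static or slowly changing data (stock metadata, industry classification, index constituents) should be cached locally to avoid repeated API calls on every run. The cache must have an expiry (TTL) so that stale industry classifications or outdated constituent lists are not used.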
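
As an illustration of `SHARED-DS-MISS-001`, the sketch below reindexes one stock's daily bars onto a trading calendar, flags the gap days, carries the last close forward, and zeroes volume and turnover. The column names (`close`, `volume`, `turnover`) and the helper name are assumptions, and the calendar is supplied by the caller.

```python
import pandas as pd


def mark_suspensions(bars: pd.DataFrame, calendar: pd.DatetimeIndex) -> pd.DataFrame:
    """Align bars to the trading calendar and flag days with no trades."""
    out = bars.reindex(calendar)
    # A calendar day with no bar is treated as a suspension day.
    out["is_suspended"] = out["close"].isna()
    # Only the price is carried forward; volume and turnover are set to zero.
    out["close"] = out["close"].ffill()
    out[["volume", "turnover"]] = out[["volume", "turnover"]].fillna(0)
    return out


calendar = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
bars = pd.DataFrame(
    {"close": [10.0, 10.4], "volume": [1000.0, 1200.0], "turnover": [10000.0, 12480.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-04"]),
)
filled = mark_suspensions(bars, calendar)
tradable = filled[~filled["is_suspended"]]  # what factor computation should see
print(filled)
```

A gap at the very start of the calendar has no earlier close to carry, so it stays NaN and is still flagged; listing dates need separate handling.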

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
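
A minimal validation sketch for this format, splitting on the first two underscores so that codes containing underscores survive. `parse_entity_id` is a hypothetical helper, not a ZVT function.

```python
def parse_entity_id(entity_id: str) -> tuple[str, str, str]:
    """Split an entity ID into (entity_type, exchange, code)."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return parts[0], parts[1], parts[2]


print(parse_entity_id("stock_sh_600000"))
```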

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
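
The "exactly one" rule can be enforced at construction time. The dataclass below is a stand-in with only the three sizing fields; `SignalSizing` is a hypothetical name and is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Holds exactly one of the three mutually exclusive sizing fields."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        provided = [
            value
            for value in (self.position_pct, self.order_money, self.order_amount)
            if value is not None
        ]
        if len(provided) != 1:
            raise ValueError(
                "exactly one of position_pct, order_money, order_amount is required; "
                f"got {len(provided)}"
            )


sizing = SignalSizing(position_pct=0.2)
print(sizing)
```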

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
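
For reference, a generic MACD computation with these locked parameters might look like the sketch below. It uses pandas exponential moving averages with `adjust=False`; the output column names are assumptions, and this is not claimed to match ZVT's own implementation or scaling.

```python
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Compute the MACD line, its signal line, and their difference."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame(
        {"macd": macd_line, "signal": signal_line, "hist": macd_line - signal_line}
    )


prices = pd.Series([float(i) for i in range(1, 61)])
print(macd(prices).tail(3))
```

Keeping the parameters as defaults rather than arguments passed at call sites makes an accidental deviation from 12/26/9 easy to spot in review.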

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **17**

## `KUC-101`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_aws_public_blockchain.ipynb`

Setting up AWS credentials to enable secure access to public blockchain data stored in S3, allowing integration with ArcticDB for time-series storage.

## `KUC-102`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_billion_row_challenge.ipynb`

Demonstrates ArcticDB's ability to handle massive datasets (1 billion rows of temperature data) with efficient aggregation, serving as a performance benchmark for large-scale data operations.

## `KUC-103`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_concat.ipynb`

Demonstrates efficient concatenation of multiple DataFrames stored in ArcticDB using lazy loading to minimize memory consumption during batch operations.

## `KUC-104`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_equity_analytics.ipynb`

Demonstrates downloading historical equity market data from yfinance and storing it in ArcticDB for analytics, enabling time-series analysis of stock prices and volumes across multiple symbols.

## `KUC-105`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_equity_options.ipynb`

Demonstrates storing and querying equity options data including expiry analysis and option Greeks, supporting options strategy research and analysis workflows.

## `KUC-106`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_lazydataframe.ipynb`

Demonstrates reading large datasets (10M-1B rows) efficiently using ArcticDB's lazy loading to reduce memory usage while selecting specific columns.

## `KUC-107`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_lmdb.ipynb`

Demonstrates basic storage operations (write, read, append, update) with ArcticDB using LMDB backend, including version management and subframe reading capabilities.

## `KUC-108`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_querybuilder.ipynb`

Demonstrates efficient querying of large datasets (up to 1B rows) with specific column selection, optimizing read performance by avoiding unnecessary data loading.

## `KUC-109`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_read_as_arrow.ipynb`

Demonstrates reading ArcticDB data into Arrow and Polars formats for interoperability with modern data science tooling, enabling seamless integration with downstream processing pipelines.

## `KUC-110`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_recursive_normalizers.ipynb`

Demonstrates storing complex nested data structures including DataFrames within dictionaries using recursive normalizers, enabling preservation of hierarchical data relationships.

## `KUC-111`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_resample.ipynb`

Demonstrates resampling high-frequency time-series data (12M rows at second frequency) to lower frequencies (1-minute) using built-in aggregation, optimizing storage and query performance.

## `KUC-112`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_snapshots.ipynb`

Demonstrates creating and managing data snapshots for point-in-time recovery, enabling reproducibility and audit trails for time-series data in ArcticDB.

## `KUC-113`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_merge.ipynb`

Demonstrates merging new data with existing datasets using merge strategies (update, do_nothing) for handling price corrections and data synchronization in financial applications.

## `KUC-114`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_pythagorean_won_loss_formula_notebook.ipynb`

Demonstrates sports data analytics using the Pythagorean expectation formula to analyze team performance, including data storage, visualization, and OLS statistical modeling.

## `KUC-115`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_staged_data_with_tokens.ipynb`

Demonstrates staging data from multiple concurrent writers before finalizing with tokens, enabling distributed data ingestion workflows with proper synchronization.

## `KUC-116`
**Source**: `docs/mkdocs/docs/notebooks/styling.py`

Provides styling functions for DataFrame visualization with custom themes, color schemes, and export capabilities for creating professional data presentations.

## `KUC-117`
**Source**: `docs/mkdocs/docs/technical/release_checks.py`

Provides automated release validation tests that verify basic ArcticDB functionality including library creation, data write/read operations, and library deletion.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Implement retry logic with exponential backoff specifically for HTTP 429 rate-limit responses; retrying immediately on rate-limit errors makes the block worse. Keep retry handling for transient network errors (TimeoutError, ConnectionError) separate from permanent errors (ValueError, KeyError): retrying only the transient ones avoids wasted resources and keeps underlying bugs from being masked.
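
A compact sketch of that split: transient errors are retried with capped exponential backoff and full jitter, while anything else propagates on the first failure. The function name, the default retry count, base, and cap are assumptions; `sleep` and `rng` are injectable so the behavior can be checked without waiting.

```python
import random
import time

# Errors worth retrying; everything else is treated as permanent.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def retry_with_backoff(fn, max_retries=4, base=1.0, cap=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(), retrying transient errors with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last transient error
            delay = min(cap, base * 2 ** attempt)
            # Full jitter: sleep a random fraction of the computed delay.
            sleep(delay * rng())
```

A `ValueError` or `KeyError` raised by `fn` is never caught here, so a parsing bug fails on the first call instead of being retried.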

## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
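
A standard-library sketch of that validation: a regular expression enforces zero-padded shape, and `datetime.strptime` rejects impossible calendar dates such as 29 February in a non-leap year. The function name is an assumption.

```python
import re
from datetime import datetime

_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}|\d{8}")


def normalize_date(value: str) -> str:
    """Accept YYYY-MM-DD or YYYYMMDD and return a validated YYYY-MM-DD string."""
    if not _DATE_SHAPE.fullmatch(value):
        raise ValueError(f"unsupported date format: {value!r}")
    fmt = "%Y-%m-%d" if "-" in value else "%Y%m%d"
    # strptime raises ValueError for invalid months, days, and leap-year dates.
    return datetime.strptime(value, fmt).date().isoformat()


print(normalize_date("20240229"))
```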

## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.

## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing

Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.

## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.

## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing

Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.

## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
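
The check can be kept independent of any HTTP client by operating on the status code alone, as in this sketch. `RateLimitError` is named in the text above; `HTTPStatusError` and `check_status` are hypothetical names.

```python
class RateLimitError(RuntimeError):
    """Raised for HTTP 429 so callers can back off instead of parsing the body."""


class HTTPStatusError(RuntimeError):
    """Raised for any other non-2xx status."""


def check_status(status_code: int) -> None:
    """Validate a status code before the response body is parsed."""
    if status_code == 429:
        raise RateLimitError("rate limited (HTTP 429)")
    if not 200 <= status_code < 300:
        raise HTTPStatusError(f"unexpected HTTP status {status_code}")
```

Calling `check_status(response_status)` before any JSON parsing keeps HTML error pages out of the parser and gives callers a distinct exception type to back off on.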

## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.

FILE:references/components/data_normalization.md
# data_normalization (3 classes)

## `CompositeNormalizer.normalize`
`data_normalization/compositenormalizer-normalize.py:0`

## `DataFrameNormalizer.normalize`
`data_normalization/dataframenormalizer-normalize.py:0`

## `Data Format`
`data_normalization/data-format.py:0`

FILE:references/components/library_configuration_and_management.md
# library_configuration_and_management (3 classes)

## `Arctic.get_library`
`library_configuration_and_management/arctic-get-library.py:0`

## `LibraryOptions.__init__`
`library_configuration_and_management/libraryoptions-init.py:0`

## `Library Options`
`library_configuration_and_management/library-options.py:0`

FILE:references/components/query_processing.md
# query_processing (4 classes)

## `QueryBuilder.__getitem__`
`query_processing/querybuilder-getitem.py:0`

## `QueryBuilder.groupby`
`query_processing/querybuilder-groupby.py:0`

## `QueryBuilder.resample`
`query_processing/querybuilder-resample.py:0`

## `Query Optimization`
`query_processing/query-optimization.py:0`

FILE:references/components/read_operations.md
# read_operations (3 classes)

## `NativeVersionStore.read`
`read_operations/nativeversionstore-read.py:0`

## `NativeVersionStore.batch_read`
`read_operations/nativeversionstore-batch-read.py:0`

## `Output Format`
`read_operations/output-format.py:0`

FILE:references/components/recursive_normalization_-nested_structures.md
# recursive_normalization_(nested_structures) (3 classes)

## `Flattener.flatten`
`recursive_normalization_(nested_structures)/flattener-flatten.py:0`

## `Flattener.reconstruct`
`recursive_normalization_(nested_structures)/flattener-reconstruct.py:0`

## `Metastructure Version`
`recursive_normalization_(nested_structures)/metastructure-version.py:0`

FILE:references/components/storage_backend_-c-_layer.md
# storage_backend_(c++_layer) (3 classes)

## `PythonVersionStore.write`
`storage_backend_(c++_layer)/pythonversionstore-write.py:0`

## `PythonVersionStore.read`
`storage_backend_(c++_layer)/pythonversionstore-read.py:0`

## `Key-Value Backend`
`storage_backend_(c++_layer)/key-value-backend.py:0`

FILE:references/components/uri_parsing_and_storage_adapter_selection.md
# uri_parsing_and_storage_adapter_selection (3 classes)

## `Arctic.__init__`
`uri_parsing_and_storage_adapter_selection/arctic-init.py:0`

## `ArcticLibraryAdapter.supports_uri`
`uri_parsing_and_storage_adapter_selection/arcticlibraryadapter-supports-uri.py:0`

## `Storage Backend`
`uri_parsing_and_storage_adapter_selection/storage-backend.py:0`

FILE:references/components/version_and_snapshot_management.md
# version_and_snapshot_management (5 classes)

## `NativeVersionStore.list_versions`
`version_and_snapshot_management/nativeversionstore-list-versions.py:0`

## `NativeVersionStore.snapshot`
`version_and_snapshot_management/nativeversionstore-snapshot.py:0`

## `NativeVersionStore.delete`
`version_and_snapshot_management/nativeversionstore-delete.py:0`

## `NativeVersionStore.prune_previous_versions`
`version_and_snapshot_management/nativeversionstore-prune-previous-versio.py:0`

## `Version Pruning`
`version_and_snapshot_management/version-pruning.py:0`

FILE:references/components/write_operations.md
# write_operations (5 classes)

## `NativeVersionStore.write`
`write_operations/nativeversionstore-write.py:0`

## `NativeVersionStore.append`
`write_operations/nativeversionstore-append.py:0`

## `NativeVersionStore.update`
`write_operations/nativeversionstore-update.py:0`

## `NativeVersionStore.stage`
`write_operations/nativeversionstore-stage.py:0`

## `Write Mode`
`write_operations/write-mode.py:0`

Arch Garch Volatility

Skill

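Volatility modeling and forecasting with GARCH-family models, with support for Sharpe ratio statistical inference and SPA model-comparison tests, applied to risk management in global markets.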

---
name: arch-garch-volatility
description: |-
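  Volatility modeling and forecasting with GARCH-family models, with support for Sharpe ratio statistical inference and SPA model-comparison tests, applied to risk management in global markets.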
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-124"
  compiled_at: "2026-04-22T13:01:01.570350+00:00"
  capability_markets: "global"
  capability_activities: "derivatives-pricing"
  sop_version: "crystal-compilation-v6.1"
---
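# GARCH Volatility Model (arch-garch-volatility)

> Volatility modeling and forecasting with GARCH-family models, with support for Sharpe ratio statistical inference and SPA model-comparison tests, applied to risk management in global markets.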

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (9 total)

### Sharpe Ratio Bootstrap Statistical Inference (`UC-101`)
Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using bootstrap methods to quantify uncertainty in risk-ad
**Triggers**: bootstrap, sharpe ratio, statistical inference

### Multiple Model Comparison with SPA Test (`UC-102`)
Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA) test to determine if any models significantly outperfor
**Triggers**: model comparison, SPA test, multiple models

### Oil Price Cointegration Analysis (`UC-103`)
Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting spread opportunities using Engle-Granger and P
**Triggers**: cointegration, unit root, ADF test

For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (15 total)

- **`AP-DERIVATIVES-PRICING-001`**: Instrument NPV called without attached pricing engine
- **`AP-DERIVATIVES-PRICING-002`**: BSM forward price ignores dividend yield
- **`AP-DERIVATIVES-PRICING-003`**: Negative discount factors passed to log-domain interpolation

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-124. Evidence verify ratio = 47.2% and audit fail total = 32. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative data (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-124` blueprint at 2026-04-22T13:01:01.570350+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Oil Price Cointegration Analysis', 'Multiple Model Comparison with SPA Test', 'Sharpe Ratio Bootstrap Statistical Inference', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## FinancePy (finance-bp-101) (3)

### `AP-DERIVATIVES-PRICING-003` — Negative discount factors passed to log-domain interpolation <sub>(high)</sub>

When Numba-jitted interpolation functions perform log transformation on discount factors, negative or zero values cause domain errors. This occurs because log(-x) and log(0) are mathematically undefined. The consequence is runtime crashes in jitted functions and complete failure of discount curve interpolation, blocking all downstream pricing calculations.

### `AP-DERIVATIVES-PRICING-004` — Non-monotonic time points in discount curve interpolation <sub>(high)</sub>

Interpolation over non-monotonically increasing time points produces undefined behavior at crossing times, causing discount factors to be incorrectly computed where time values overlap. This corrupts the entire term structure because the bootstrap algorithm cannot determine which discount factor corresponds to which maturity. The consequence is incorrect present value calculations across all downstream products priced against the curve.
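
Both curve-input conditions described above (positive discount factors and strictly increasing times) can be checked before any log transform. This is a plain-Python sketch with a hypothetical function name; it is not FinancePy code and does not reproduce its Numba-jitted interpolators.

```python
import math


def log_discount_factors(times, discount_factors):
    """Validate curve inputs, then return log discount factors for interpolation."""
    if len(times) != len(discount_factors):
        raise ValueError("times and discount_factors must have the same length")
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    if any(df <= 0 for df in discount_factors):
        raise ValueError("discount factors must be positive before taking logs")
    return [math.log(df) for df in discount_factors]


print(log_discount_factors([0.5, 1.0, 2.0], [0.99, 0.97, 0.93]))
```

Validating in ordinary Python before entering jitted code gives a readable error at the boundary instead of a domain error deep inside the interpolation.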

### `AP-DERIVATIVES-PRICING-005` — Bootstrap calibration instruments not in maturity order <sub>(high)</sub>

When building yield curves from market instruments (deposits, FRAs, swaps), the instruments must be provided in strictly increasing maturity order. Out-of-order instruments cause the bootstrap algorithm to solve for discount factors at incorrect time points, corrupting the entire term structure. The consequence is wrong forward rates and discount factors that propagate into all priced instruments.

## QuantLib-SWIG (finance-bp-123) (4)

### `AP-DERIVATIVES-PRICING-001` — Instrument NPV called without attached pricing engine <sub>(high)</sub>

Calling NPV() on a derivatives instrument without first calling setPricingEngine() returns uninitialized garbage values or throws null pointer exceptions. This occurs because the Instrument class relies on the attached PricingEngine to perform actual valuation logic. The consequence is silently incorrect pricing results that appear valid, potentially leading to bad trading decisions.

### `AP-DERIVATIVES-PRICING-006` — Option Exercise type mismatches VanillaOption constructor <sub>(high)</sub>

VanillaOption requires both a StrikedTypePayoff and a matching Exercise object. Using the wrong Exercise type (e.g., AmericanExercise for a European option) causes compilation failures in C++ or runtime errors in the SWIG bindings. The consequence is that the pricing system cannot initialize options, blocking all option pricing workflows.

### `AP-DERIVATIVES-PRICING-013` — Evaluation date not set before QuantLib term structure construction <sub>(medium)</sub>

QuantLib expects ql.Settings.instance().evaluationDate to be set before constructing yield term structures and instruments. Without an explicit evaluation date, QuantLib falls back to the current system date, so curve reference dates and settlement dates shift with the day the code runs and may not match the market data snapshot. The consequence is wrong discount factors and NPV calculations across the entire portfolio.

### `AP-DERIVATIVES-PRICING-014` — Market quotes passed without QuoteHandle wrapper <sub>(medium)</sub>

QuantLib's observer pattern requires all market quotes to be wrapped in QuoteHandle before passing to rate helpers. Raw quote values bypass the observable notification mechanism, causing dependent instruments to never recalculate when market data updates. The consequence is stale pricing that doesn't reflect current market conditions.

## arch (finance-bp-124) (2)

### `AP-DERIVATIVES-PRICING-007` — NaN/inf values in ARCH model input data <sub>(high)</sub>

ARCH model estimation relies on recursive variance computations and scipy optimize. Non-finite input values (NaN, inf) cause optimizers to produce NaN results and recursive variance calculations to fail. The consequence is complete model estimation failure with meaningless outputs that appear valid, leading to incorrect volatility forecasts and risk misestimation.
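
A small pre-flight check can stop non-finite values before they reach the optimizer. This is a standard-library sketch under the assumption that the input is a flat sequence of floats; the helper names are hypothetical and are not part of the arch package.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None if all are finite."""
    for i, value in enumerate(values):
        if not math.isfinite(value):
            return i
    return None


def require_finite(values, name="returns"):
    """Raise with the offending position instead of letting NaN propagate silently."""
    index = first_non_finite(values)
    if index is not None:
        raise ValueError(f"{name}[{index}] is not finite: {values[index]!r}")
    return values
```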

### `AP-DERIVATIVES-PRICING-008` — ARCH parameter array concatenation in wrong order <sub>(high)</sub>

ARCHModel is composed of three components (mean, volatility, distribution) and requires parameter arrays concatenated in fixed order: [mean_params, volatility_params, distribution_params]. Incorrect ordering causes _parse_parameters to assign wrong values to wrong components, producing mathematically invalid models (e.g., volatility parameters interpreted as distribution parameters). The consequence is invalid conditional variance forecasts.

## py_vollib (finance-bp-127) (6)

### `AP-DERIVATIVES-PRICING-002` — BSM forward price ignores dividend yield <sub>(high)</sub>

When calculating option prices on dividend-paying stocks using BSM, the forward price must be adjusted as F = S * exp((r-q)*t). Omitting the dividend yield adjustment (using F = S * exp(r*t)) causes systematic mispricing for all dividend-paying assets. The consequence is consistently wrong option prices that diverge from market prices, leading to arbitrage opportunities and trading losses.
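
To make the size of the error concrete, the worked example below compares the two forward prices for one set of inputs. The inputs (spot 100, 5% rate, 2% dividend yield, one year) are illustrative assumptions, and `forward_price` is a hypothetical helper rather than a py_vollib function.

```python
import math


def forward_price(spot, rate, dividend_yield, t):
    """Forward price under continuous compounding: F = S * exp((r - q) * t)."""
    return spot * math.exp((rate - dividend_yield) * t)


spot, rate, div_yield, t = 100.0, 0.05, 0.02, 1.0
with_dividends = forward_price(spot, rate, div_yield, t)  # about 103.05
without_dividends = forward_price(spot, rate, 0.0, t)     # about 105.13
overstatement = without_dividends - with_dividends        # about 2.08
```

With these inputs the forward is overstated by roughly 2%, which biases call prices upward and put prices downward for every strike.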

### `AP-DERIVATIVES-PRICING-009` — Zero or negative time-to-expiration in option pricing <sub>(high)</sub>

Option pricing formulas (Black-Scholes, Black model) compute sqrt(t) in the denominator. Zero time causes division by zero; negative time produces NaN in d1/d2 calculations. The consequence is invalid option prices (NaN, inf) that break downstream Greeks calculations and hedging workflows.
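
The guard can sit directly in front of the d1/d2 computation. The sketch below is an illustration of that idea using the Black (forward-based) form of d1 and d2; the function name `d1_d2` and the exact error messages are assumptions, not py_vollib code.

```python
import math


def d1_d2(forward, strike, sigma, t):
    """Compute Black d1 and d2 after rejecting inputs that break the formula."""
    if not (math.isfinite(t) and t > 0.0):
        raise ValueError(f"time to expiration must be positive, got {t!r}")
    if not (math.isfinite(sigma) and sigma > 0.0):
        raise ValueError(f"volatility must be positive, got {sigma!r}")
    if forward <= 0.0 or strike <= 0.0:
        raise ValueError("forward and strike must be positive")
    vol_sqrt_t = sigma * math.sqrt(t)  # the denominator that vanishes at t == 0
    d1 = (math.log(forward / strike) + 0.5 * sigma * sigma * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return d1, d2
```

An expired option (t equal to zero) is usually better handled by returning the intrinsic value in a separate branch than by pushing a tiny positive t through the formula.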

### `AP-DERIVATIVES-PRICING-010` — Black model applies spot price instead of forward price <sub>(high)</sub>

The Black model is designed for options on futures/forwards and expects futures price F as input, not spot price S. Using spot directly causes incorrect pricing because the Black formula treats F as driftless (a martingale) under the pricing measure, so the cost of carry must already be embedded in the forward price. The consequence is systematically wrong forward option prices.

### `AP-DERIVATIVES-PRICING-011` — Missing discount factor in Black model pricing <sub>(medium)</sub>

Black model pricing must apply time value discounting with deflater = exp(-r*t) to undiscounted option prices. Omitting the discount factor produces forward option prices that exceed their fair value by the risk-free compounding amount. The consequence is violation of time value of money principles and prices that cannot be used for fair valuation or hedging.
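
A compact sketch of a discounted Black (1976) price, showing where the `deflater` enters. It is written from the textbook formula with the standard library and is not py_vollib's implementation; the function names are hypothetical, and the sketch assumes positive `forward`, `strike`, `sigma`, and `t`.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_price(flag, forward, strike, t, rate, sigma):
    """Discounted Black (1976) price of a European option on a forward."""
    vol_sqrt_t = sigma * math.sqrt(t)
    d1 = (math.log(forward / strike) + 0.5 * sigma * sigma * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    if flag == "c":
        undiscounted = forward * norm_cdf(d1) - strike * norm_cdf(d2)
    elif flag == "p":
        undiscounted = strike * norm_cdf(-d2) - forward * norm_cdf(-d1)
    else:
        raise ValueError("flag must be 'c' or 'p'")
    deflater = math.exp(-rate * t)  # present-value factor applied to the forward-measure price
    return deflater * undiscounted
```

A quick sanity check for any implementation is put-call parity on the forward: the call price minus the put price should equal `deflater * (forward - strike)`.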

### `AP-DERIVATIVES-PRICING-012` — Invalid flag parameter ('c'/'p') passed to py_vollib without validation <sub>(medium)</sub>

py_vollib's binary_flag dict only contains the keys 'c' and 'p'. Passing any other flag value raises a KeyError. The library does not validate this input and crashes on invalid values. The consequence is unhandled exceptions in production systems when flag values come from external sources with unexpected formats.
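
A thin normalization layer in front of the library call is one way to handle externally sourced flags. The helper below is a hypothetical sketch; accepting the aliases "call" and "put" is an assumption made for illustration, not py_vollib behavior.

```python
def normalize_flag(raw):
    """Map an externally supplied option flag to 'c' or 'p', or raise a clear error."""
    if not isinstance(raw, str):
        raise TypeError(f"option flag must be a string, got {type(raw).__name__}")
    aliases = {"c": "c", "call": "c", "p": "p", "put": "p"}  # assumed alias set
    key = raw.strip().lower()
    if key not in aliases:
        raise ValueError(f"unsupported option flag {raw!r}; expected 'c' or 'p'")
    return aliases[key]
```

Raising ValueError with the original input in the message is easier to diagnose in logs than a bare KeyError from inside the pricing library.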

### `AP-DERIVATIVES-PRICING-015` — Implied volatility computed without proper bounds validation <sub>(medium)</sub>

When computing implied volatility, option prices outside theoretical bounds (below intrinsic value or above maximum) must raise appropriate exceptions. Returning invalid IV values (negative volatility or extreme values) violates mathematical definitions and leads to incorrect pricing, risk calculations, and hedging ratios. The consequence is systemic pricing errors across all vol-dependent derivatives.
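
The bounds themselves are easy to state for a forward-based (Black) setup: a call is worth at least its discounted intrinsic value and at most the discounted forward, and a put at most the discounted strike. The sketch below checks them before any root-finding starts; the exception classes and function name are hypothetical, chosen for illustration rather than taken from py_vollib.

```python
import math


class PriceBelowIntrinsic(ValueError):
    """Quoted price is below the discounted intrinsic value (no volatility fits)."""


class PriceAboveMaximum(ValueError):
    """Quoted price is above the model's upper bound (no volatility fits)."""


def check_black_price_bounds(flag, price, forward, strike, t, rate):
    """Return (intrinsic, maximum) if the price is invertible, else raise."""
    deflater = math.exp(-rate * t)
    if flag == "c":
        intrinsic = deflater * max(forward - strike, 0.0)
        maximum = deflater * forward
    elif flag == "p":
        intrinsic = deflater * max(strike - forward, 0.0)
        maximum = deflater * strike
    else:
        raise ValueError("flag must be 'c' or 'p'")
    if price < intrinsic:
        raise PriceBelowIntrinsic(f"price {price} < intrinsic {intrinsic}")
    if price > maximum:
        raise PriceAboveMaximum(f"price {price} > maximum {maximum}")
    return intrinsic, maximum
```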

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-124--arch
**Scan date**: 2026-04-22
**Stats**: 7 files · 40 classes · 0 functions · 7 stages

## Modules (7)

- [data_input_&_validation](components/data_input_-_validation.md): 4 classes
- [model_specification](components/model_specification.md): 7 classes
- [parameter_estimation](components/parameter_estimation.md): 5 classes
- [forecasting_&_simulation](components/forecasting_-_simulation.md): 5 classes
- [unit_root_&_cointegration_testing](components/unit_root_-_cointegration_testing.md): 7 classes
- [bootstrap_&_multiple_comparison](components/bootstrap_-_multiple_comparison.md): 7 classes
- [results_reporting_&_visualization](components/results_reporting_-_visualization.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 107
  fatal_constraints_count: 77
  non_fatal_constraints_count: 151
  use_cases_count: 9
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run `python3 -m pip install zvt`, then run `python3 -m zvt.init_dirs` to initialize the data directories.
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run the recorder first: `python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000` (replace with your target entity IDs).
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run `python3 -m zvt.init_dirs`.
- **`PC-04`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions (`chmod u+w ~/.zvt`) or set the `ZVT_HOME` environment variable to a writable location.

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **9**

## `KUC-101`
**Source**: `examples/bootstrap_examples.ipynb`

Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using bootstrap methods to quantify uncertainty in risk-adjusted performance metrics.
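
The idea can be sketched as a percentile IID bootstrap written with the standard library. This is an illustration only: it does not use arch's bootstrap classes, it assumes a zero risk-free rate and independent returns, and the function names are hypothetical.

```python
import random
import statistics


def sharpe_ratio(returns):
    """Per-period Sharpe ratio with a zero risk-free rate (an assumption)."""
    return statistics.fmean(returns) / statistics.stdev(returns)


def bootstrap_sharpe_ci(returns, reps=1000, level=0.95, seed=0):
    """Percentile confidence interval from resampling returns with replacement."""
    rng = random.Random(seed)
    n = len(returns)
    draws = sorted(sharpe_ratio(rng.choices(returns, k=n)) for _ in range(reps))
    lower = draws[int((1.0 - level) / 2.0 * reps)]
    upper = draws[min(reps - 1, int((1.0 + level) / 2.0 * reps))]
    return lower, upper
```

For serially dependent returns, a block-based scheme such as the stationary bootstrap is the more appropriate choice, since IID resampling destroys the time dependence.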

## `KUC-102`
**Source**: `examples/multiple-comparison_examples.ipynb`

Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA) test to determine if any models significantly outperform the benchmark.

## `KUC-103`
**Source**: `examples/unitroot_cointegration_examples.ipynb`

Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting spread opportunities using Engle-Granger and Phillips-Ouliaris tests.

## `KUC-104`
**Source**: `examples/unitroot_examples.ipynb`

Tests for stationarity in credit spreads (BAA-AAA) using Augmented Dickey-Fuller tests to determine if mean-reversion trading strategies are applicable.

## `KUC-105`
**Source**: `examples/univariate_forecasting_with_exogenous_variables.ipynb`

Forecasts univariate time series using Autoregressive models with exogenous variables (ARX) to capture the impact of external factors on the target variable.

## `KUC-106`
**Source**: `examples/univariate_using_fixed_variance.ipynb`

Demonstrates how to specify a HARX mean model with fixed/external variance inputs and iteratively fit volatility models using the estimated conditional volatility.

## `KUC-107`
**Source**: `examples/univariate_volatility_forecasting.ipynb`

Forecasts future volatility of S&P 500 returns using GARCH models, including multi-step ahead forecasts and rolling window out-of-sample predictions.
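
For intuition, the multi-step forecast of a GARCH(1,1) model follows a simple recursion: the one-step variance is `omega + alpha * resid**2 + beta * variance`, and each further step applies `omega + (alpha + beta) * previous`. The sketch below implements that textbook recursion with hypothetical names and hand-picked parameters; it is not the arch package's forecasting code.

```python
def garch11_variance_forecast(omega, alpha, beta, last_resid, last_variance, horizon):
    """Return conditional variance forecasts for steps 1..horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # One step ahead uses the last observed residual and variance.
    forecasts = [omega + alpha * last_resid ** 2 + beta * last_variance]
    # Further steps replace the unknown squared residual with its expectation.
    for _ in range(horizon - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts
```

When `alpha + beta < 1`, the forecasts decay geometrically toward the long-run variance `omega / (1 - alpha - beta)`.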

## `KUC-108`
**Source**: `examples/univariate_volatility_modeling.ipynb`

Fits and compares different GARCH volatility model specifications (symmetric, asymmetric, power) with various error distributions to characterize S&P 500 return volatility dynamics.

## `KUC-109`
**Source**: `examples/univariate_volatility_scenarios.ipynb`

Generates multiple volatility scenarios for NASDAQ returns using simulation-based forecasting methods, useful for risk management and option pricing applications.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DERIVATIVES-PRICING-001` — Strict input validation before financial calculations
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing

Both FinancePy and QuantLib-SWIG enforce strict validation of all input parameters before any financial computation. FinancePy validates day count types, date arguments, tolerance parameters, and max iterations. QuantLib-SWIG validates exercise types and swap direction enums. This pattern prevents corrupted calculations and provides clear error messages. Apply this pattern by validating all inputs at function entry points.

## `CW-DERIVATIVES-PRICING-002` — Bootstrap requires ordered instrument calibration
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing

Both FinancePy and QuantLib-SWIG require calibration instruments to be provided in strict maturity order for curve bootstrapping. FinancePy enforces monotonically increasing time points and validates instrument sequencing (deposits before FRAs before swaps). QuantLib-SWIG uses bootstrap helpers (DepositRateHelper, FraRateHelper, SwapRateHelper) that assume ordered inputs. This ensures the bootstrap algorithm solves for discount factors at mathematically correct time points.

## `CW-DERIVATIVES-PRICING-003` — Handle pattern for lazy evaluation chains
**From**: QuantLib-SWIG · **Applicable to**: derivatives-pricing

QuantLib-SWIG requires wrapping market data (quotes, term structures) in Handle objects to enable lazy evaluation and automatic recalculation. QuoteHandle for market quotes and Handle for term structures enable the observer pattern. When market data updates, all dependent instruments automatically recalculate. This pattern is essential for live pricing systems where prices must reflect current market conditions.
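
The mechanism can be illustrated with a toy observer in plain Python. The classes below are deliberately simplified stand-ins invented for this sketch; they are not QuantLib's `QuoteHandle` or `Handle` API and only show the notify-then-recompute-on-demand idea.

```python
import math


class ToyQuote:
    """Observable market value that notifies dependents when it changes."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def value(self):
        return self._value

    def set_value(self, value):
        self._value = value
        for observer in self._observers:
            observer.invalidate()


class ToyDiscountedCashFlow:
    """Toy instrument: one cash flow discounted at a quoted rate, valued lazily."""

    def __init__(self, rate_quote, amount, t):
        self._quote, self._amount, self._t = rate_quote, amount, t
        self._cached = None
        self.calculations = 0
        rate_quote.register(self)

    def invalidate(self):
        self._cached = None  # drop the cached result; recompute only when asked

    def npv(self):
        if self._cached is None:
            self.calculations += 1
            self._cached = self._amount * math.exp(-self._quote.value() * self._t)
        return self._cached
```

Repeated `npv()` calls reuse the cached value, and a call to `set_value` on the quote is enough to make the next `npv()` reflect the new rate.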

## `CW-DERIVATIVES-PRICING-004` — Parameter composition requires fixed ordering and partitioning
**From**: arch · **Applicable to**: derivatives-pricing

arch enforces a strict parameter composition pattern where mean, volatility, and distribution parameters must be concatenated in fixed order with explicit offset partitioning. The offsets array partitions the unified parameter vector into components. This pattern prevents parameter assignment errors that would corrupt model components. Apply this when composing financial models from multiple sub-components.
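
The offset-based partitioning can be sketched in a few lines. The function below is a hypothetical illustration of the pattern, assuming the caller knows how many parameters each component owns; it is not arch's `_parse_parameters`.

```python
from itertools import accumulate


def split_parameters(params, n_mean, n_vol, n_dist):
    """Partition a flat vector into (mean, volatility, distribution) parameters."""
    counts = (n_mean, n_vol, n_dist)
    if len(params) != sum(counts):
        raise ValueError(f"expected {sum(counts)} parameters, got {len(params)}")
    offsets = [0, *accumulate(counts)]  # e.g. counts (1, 3, 1) give [0, 1, 4, 5]
    mean, vol, dist = (list(params[offsets[i]:offsets[i + 1]]) for i in range(3))
    return mean, vol, dist
```

Because the split relies on position alone, a length check is the only structural error it can detect; a vector with the right length but the wrong component order passes silently, which is why the ordering convention matters.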

## `CW-DERIVATIVES-PRICING-005` — Strict mathematical constraint enforcement
**From**: arch, py_vollib · **Applicable to**: derivatives-pricing

Both arch and py_vollib enforce strict mathematical constraints: arch enforces volatility model stationarity constraints (A.dot(params) - b >= 0) for SLSQP optimization; py_vollib validates implied volatility is positive and option prices within intrinsic/maximum bounds. Violating these constraints produces mathematically invalid results. Always enforce domain constraints on all financial model parameters.
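
As an illustration of the `A.dot(params) - b >= 0` form, the sketch below encodes the usual GARCH(1,1) restrictions (non-negative `omega`, `alpha`, `beta`, and `alpha + beta <= 1`) as a matrix and vector. The specific `A` and `b` are written out by hand for this example and are an assumption, not values read from arch.

```python
def constraint_slack(a, b, params):
    """Return A @ params - b, one entry per constraint row."""
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, params)) - b_i
            for row, b_i in zip(a, b)]


def is_feasible(a, b, params, tol=0.0):
    """True when every constraint satisfies A @ params - b >= -tol."""
    return all(slack >= -tol for slack in constraint_slack(a, b, params))


# params = [omega, alpha, beta]
A = [
    [1.0, 0.0, 0.0],    # omega >= 0
    [0.0, 1.0, 0.0],    # alpha >= 0
    [0.0, 0.0, 1.0],    # beta >= 0
    [0.0, -1.0, -1.0],  # 1 - alpha - beta >= 0
]
B = [0.0, 0.0, 0.0, -1.0]
```

A parameter vector such as `[0.01, 0.10, 0.85]` satisfies all four rows, while `[0.01, 0.30, 0.80]` violates the last one because `alpha + beta` exceeds 1.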

## `CW-DERIVATIVES-PRICING-006` — Forward price adjustment for dividend yield in BSM
**From**: py_vollib · **Applicable to**: derivatives-pricing

py_vollib demonstrates the correct BSM implementation: compute forward price F = S * exp((r-q)*t) to adjust for continuous dividend yield before passing to the pricing engine. This pattern is essential for all options on dividend-paying assets. Forgetting the dividend adjustment causes systematic mispricing for the entire equity derivatives book.

## `CW-DERIVATIVES-PRICING-007` — Monotonicity validation for interpolation arrays
**From**: FinancePy · **Applicable to**: derivatives-pricing

FinancePy enforces strictly monotonically increasing time arrays before interpolation operations. This prevents undefined behavior at crossing times and ensures each time point maps to exactly one discount factor. Apply this validation whenever implementing interpolation over financial time series (discount curves, volatility surfaces, forward rates).

## `CW-DERIVATIVES-PRICING-008` — Production vs reference implementation selection
**From**: py_vollib · **Applicable to**: derivatives-pricing

py_vollib explicitly distinguishes between ref_python (slow, educational) and production (fast, C-based lets_be_rational) implementations. Using the reference implementation in production causes 10-100x performance degradation. Always select the implementation tier that matches the use case: reference for testing and education, optimized for production trading systems.

FILE:references/components/bootstrap_-_multiple_comparison.md
# bootstrap_&_multiple_comparison (7 classes)

## `IIDBootstrap.conf_int`
`bootstrap_&_multiple_comparison/iidbootstrap-conf-int.py:0`

## `StationaryBootstrap.__init__`
`bootstrap_&_multiple_comparison/stationarybootstrap-init.py:0`

## `MCS.__init__`
`bootstrap_&_multiple_comparison/mcs-init.py:0`

## `SPA.__init__`
`bootstrap_&_multiple_comparison/spa-init.py:0`

## `Bootstrap type`
`bootstrap_&_multiple_comparison/bootstrap-type.py:0`

## `Confidence interval method`
`bootstrap_&_multiple_comparison/confidence-interval-method.py:0`

## `Multiple comparison procedure`
`bootstrap_&_multiple_comparison/multiple-comparison-procedure.py:0`

FILE:references/components/data_input_-_validation.md
# data_input_&_validation (4 classes)

## `ARCHModel.__init__`
`data_input_&_validation/archmodel-init.py:0`

## `ensure1d`
`data_input_&_validation/ensure1d.py:0`

## `to_array_1d`
`data_input_&_validation/to-array-1d.py:0`

## `Input type coercion`
`data_input_&_validation/input-type-coercion.py:0`

FILE:references/components/forecasting_-_simulation.md
# forecasting_&_simulation (5 classes)

## `ARCHModelResult.forecast`
`forecasting_&_simulation/archmodelresult-forecast.py:0`

## `VarianceForecast._analytic_forecast`
`forecasting_&_simulation/varianceforecast-analytic-forecast.py:0`

## `VarianceForecast._simulation_forecast`
`forecasting_&_simulation/varianceforecast-simulation-forecast.py:0`

## `Forecasting method`
`forecasting_&_simulation/forecasting-method.py:0`

## `Alignment`
`forecasting_&_simulation/alignment.py:0`

FILE:references/components/model_specification.md
# model_specification (7 classes)

## `ARCHModel.fit`
`model_specification/archmodel-fit.py:0`

## `ARCHModel.forecast`
`model_specification/archmodel-forecast.py:0`

## `GARCH.__init__`
`model_specification/garch-init.py:0`

## `HARX.__init__`
`model_specification/harx-init.py:0`

## `Mean model`
`model_specification/mean-model.py:0`

## `Volatility model`
`model_specification/volatility-model.py:0`

## `Distribution`
`model_specification/distribution.py:0`

FILE:references/components/parameter_estimation.md
# parameter_estimation (5 classes)

## `ARCHModel.fit`
`parameter_estimation/archmodel-fit.py:0`

## `ARCHModelResult.summary`
`parameter_estimation/archmodelresult-summary.py:0`

## `ARCHModelResult.conf_int`
`parameter_estimation/archmodelresult-conf-int.py:0`

## `Starting values`
`parameter_estimation/starting-values.py:0`

## `Covariance type`
`parameter_estimation/covariance-type.py:0`

FILE:references/components/results_reporting_-_visualization.md
# results_reporting_&_visualization (5 classes)

## `ARCHModelResult.summary`
`results_reporting_&_visualization/archmodelresult-summary.py:0`

## `ARCHModelResult.conf_int`
`results_reporting_&_visualization/archmodelresult-conf-int.py:0`

## `ARCHModelResult.arch_lm_test`
`results_reporting_&_visualization/archmodelresult-arch-lm-test.py:0`

## `WaldTestStatistic`
`results_reporting_&_visualization/waldteststatistic.py:0`

## `Output format`
`results_reporting_&_visualization/output-format.py:0`

FILE:references/components/unit_root_-_cointegration_testing.md
# unit_root_&_cointegration_testing (7 classes)

## `ADF.__init__`
`unit_root_&_cointegration_testing/adf-init.py:0`

## `UnitRootTest.stat`
`unit_root_&_cointegration_testing/unitroottest-stat.py:0`

## `CointegrationTestResult.stat`
`unit_root_&_cointegration_testing/cointegrationtestresult-stat.py:0`

## `DynamicOLS.__init__`
`unit_root_&_cointegration_testing/dynamicols-init.py:0`

## `Test statistic`
`unit_root_&_cointegration_testing/test-statistic.py:0`

## `Lag selection method`
`unit_root_&_cointegration_testing/lag-selection-method.py:0`

## `Covariance kernel`
`unit_root_&_cointegration_testing/covariance-kernel.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-124-v5.3
  version: v6.1
  blueprint_id: finance-bp-124
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:01:01.570350+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  upgraded_from: finance-bp-124-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:34.223301+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-124--arch/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-124--arch/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-DERIVATIVES-PRICING-001
  title: Instrument NPV called without attached pricing engine
  description: Calling NPV() on a derivatives instrument without first calling setPricingEngine() does not produce a price.
    QuantLib raises a "null pricing engine" error, which surfaces as a RuntimeError in the SWIG Python bindings. This occurs
    because the Instrument class relies on the attached PricingEngine to perform actual valuation logic. The consequence is
    a valuation run that aborts at the first instrument missing an engine, or misleading results if calling code catches the
    error and substitutes a default value.
  project_source: QuantLib-SWIG (finance-bp-123)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-002
  title: BSM forward price ignores dividend yield
  description: When calculating option prices on dividend-paying stocks using BSM, the forward price must be adjusted as F
    = S * exp((r-q)*t). Omitting the dividend yield adjustment (using F = S * exp(r*t)) causes systematic mispricing for all
    dividend-paying assets. The consequence is consistently wrong option prices that diverge from market prices, leading to
    arbitrage opportunities and trading losses.
  project_source: py_vollib (finance-bp-127)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-003
  title: Negative discount factors passed to log-domain interpolation
  description: When Numba-jitted interpolation functions perform log transformation on discount factors, negative or zero
    values cause domain errors. This occurs because log(-x) and log(0) are mathematically undefined. The consequence is runtime
    crashes in jitted functions and complete failure of discount curve interpolation, blocking all downstream pricing calculations.
  project_source: FinancePy (finance-bp-101)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-004
  title: Non-monotonic time points in discount curve interpolation
  description: Interpolation over non-monotonically increasing time points produces undefined behavior at crossing times,
    causing discount factors to be incorrectly computed where time values overlap. This corrupts the entire term structure
    because the bootstrap algorithm cannot determine which discount factor corresponds to which maturity. The consequence
    is incorrect present value calculations across all downstream products priced against the curve.
  project_source: FinancePy (finance-bp-101)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-005
  title: Bootstrap calibration instruments not in maturity order
  description: When building yield curves from market instruments (deposits, FRAs, swaps), the instruments must be provided
    in strictly increasing maturity order. Out-of-order instruments cause the bootstrap algorithm to solve for discount factors
    at incorrect time points, corrupting the entire term structure. The consequence is wrong forward rates and discount factors
    that propagate into all priced instruments.
  project_source: FinancePy (finance-bp-101)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-006
  title: Option Exercise type mismatches VanillaOption constructor
  description: VanillaOption requires both a StrikedTypePayoff and a matching Exercise object. Using the wrong Exercise type
    (e.g., AmericanExercise for a European option) causes compilation failures in C++ or runtime errors in the SWIG bindings.
    The consequence is that the pricing system cannot initialize options, blocking all option pricing workflows.
  project_source: QuantLib-SWIG (finance-bp-123)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-007
  title: NaN/inf values in ARCH model input data
  description: ARCH model estimation relies on recursive variance computations and scipy optimize. Non-finite input values
    (NaN, inf) cause optimizers to produce NaN results and recursive variance calculations to fail. The consequence is complete
    model estimation failure with meaningless outputs that appear valid, leading to incorrect volatility forecasts and risk
    misestimation.
  project_source: arch (finance-bp-124)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-008
  title: ARCH parameter array concatenation in wrong order
  description: 'ARCHModel is composed of three components (mean, volatility, distribution) and requires parameter arrays
    concatenated in fixed order: [mean_params, volatility_params, distribution_params]. Incorrect ordering causes _parse_parameters
    to assign wrong values to wrong components, producing mathematically invalid models (e.g., volatility parameters interpreted
    as distribution parameters). The consequence is invalid conditional variance forecasts.'
  project_source: arch (finance-bp-124)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-009
  title: Zero or negative time-to-expiration in option pricing
  description: Option pricing formulas (Black-Scholes, Black model) compute sqrt(t) in the denominator. Zero time causes division
    by zero; negative time produces NaN in d1/d2 calculations. The consequence is invalid option prices (NaN, inf) that break
    downstream Greeks calculations and hedging workflows.
  project_source: py_vollib (finance-bp-127)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-010
  title: Black model applies spot price instead of forward price
  description: The Black model is designed for options on futures/forwards and expects futures price F as input, not spot
    price S. Using spot directly causes incorrect pricing because the Black formula treats F as driftless (a martingale)
    under the pricing measure, so the cost of carry must already be embedded in the forward price. The consequence is systematically
    wrong forward option prices.
  project_source: py_vollib (finance-bp-127)
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-011
  title: Missing discount factor in Black model pricing
  description: Black model pricing must apply time value discounting with deflater = exp(-r*t) to undiscounted option prices.
    Omitting the discount factor produces forward option prices that exceed their fair value by the risk-free compounding
    amount. The consequence is violation of time value of money principles and prices that cannot be used for fair valuation
    or hedging.
  project_source: py_vollib (finance-bp-127)
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-012
  title: Invalid flag parameter ('c'/'p') passed to py_vollib without validation
  description: py_vollib's binary_flag dict only contains the keys 'c' and 'p'. Passing any other flag value raises a KeyError.
    The library does not validate this input and crashes on invalid values. The consequence is unhandled exceptions in production
    systems when flag values come from external sources with unexpected formats.
  project_source: py_vollib (finance-bp-127)
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-013
  title: Evaluation date not set before QuantLib term structure construction
  description: QuantLib expects ql.Settings.instance().evaluationDate to be set before constructing yield term structures
    and instruments. Without an explicit evaluation date, QuantLib falls back to the current system date, so curve reference
    dates and settlement dates shift with the day the code runs and may not match the market data snapshot. The consequence
    is wrong discount factors and NPV calculations across the entire portfolio.
  project_source: QuantLib-SWIG (finance-bp-123)
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-014
  title: Market quotes passed without QuoteHandle wrapper
  description: QuantLib's observer pattern requires all market quotes to be wrapped in QuoteHandle before passing to rate
    helpers. Raw quote values bypass the observable notification mechanism, causing dependent instruments to never recalculate
    when market data updates. The consequence is stale pricing that doesn't reflect current market conditions.
  project_source: QuantLib-SWIG (finance-bp-123)
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-015
  title: Implied volatility computed without proper bounds validation
  description: When computing implied volatility, option prices outside theoretical bounds (below intrinsic value or above
    maximum) must raise appropriate exceptions. Returning invalid IV values (negative volatility or extreme values) violates
    mathematical definitions and leads to incorrect pricing, risk calculations, and hedging ratios. The consequence is systemic
    pricing errors across all vol-dependent derivatives.
  project_source: py_vollib (finance-bp-127)
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - derivatives-pricing
  _source_file: anti-patterns/derivatives-pricing.yaml
cross_project_wisdom:
- wisdom_id: CW-DERIVATIVES-PRICING-001
  source_project: FinancePy, QuantLib-SWIG
  pattern_name: Strict input validation before financial calculations
  description: Both FinancePy and QuantLib-SWIG enforce strict validation of all input parameters before any financial computation.
    FinancePy validates day count types, date arguments, tolerance parameters, and max iterations. QuantLib-SWIG validates
    exercise types and swap direction enums. This pattern prevents corrupted calculations and provides clear error messages.
    Apply this pattern by validating all inputs at function entry points.
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-002
  source_project: FinancePy, QuantLib-SWIG
  pattern_name: Bootstrap requires ordered instrument calibration
  description: Both FinancePy and QuantLib-SWIG require calibration instruments to be provided in strict maturity order for
    curve bootstrapping. FinancePy enforces monotonically increasing time points and validates instrument sequencing (deposits
    before FRAs before swaps). QuantLib-SWIG uses bootstrap helpers (DepositRateHelper, FraRateHelper, SwapRateHelper) that
    assume ordered inputs. This ensures the bootstrap algorithm solves for discount factors at mathematically correct time
    points.
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-003
  source_project: QuantLib-SWIG
  pattern_name: Handle pattern for lazy evaluation chains
  description: QuantLib-SWIG requires wrapping market data (quotes, term structures) in Handle objects to enable lazy evaluation
    and automatic recalculation. QuoteHandle for market quotes and Handle for term structures enable the observer pattern.
    When market data updates, all dependent instruments automatically recalculate. This pattern is essential for live pricing
    systems where prices must reflect current market conditions.
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-004
  source_project: arch
  pattern_name: Parameter composition requires fixed ordering and partitioning
  description: arch enforces a strict parameter composition pattern where mean, volatility, and distribution parameters must
    be concatenated in fixed order with explicit offset partitioning. The offsets array partitions the unified parameter vector
    into components. This pattern prevents parameter assignment errors that would corrupt model components. Apply this when
    composing financial models from multiple sub-components.
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-005
  source_project: arch, py_vollib
  pattern_name: Strict mathematical constraint enforcement
  description: 'Both arch and py_vollib enforce strict mathematical constraints: arch enforces volatility model stationarity
    constraints (A.dot(params) - b >= 0) for SLSQP optimization; py_vollib validates implied volatility is positive and option
    prices within intrinsic/maximum bounds. Violating these constraints produces mathematically invalid results. Always enforce
    domain constraints on all financial model parameters.'
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-006
  source_project: py_vollib
  pattern_name: Forward price adjustment for dividend yield in BSM
  description: 'py_vollib demonstrates the correct BSM implementation: compute forward price F = S * exp((r-q)*t) to adjust
    for continuous dividend yield before passing to the pricing engine. This pattern is essential for all options on dividend-paying
    assets. Forgetting the dividend adjustment causes systematic mispricing for the entire equity derivatives book.'
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-007
  source_project: FinancePy
  pattern_name: Monotonicity validation for interpolation arrays
  description: FinancePy enforces strictly monotonically increasing time arrays before interpolation operations. This prevents
    undefined behavior at crossing times and ensures each time point maps to exactly one discount factor. Apply this validation
    whenever implementing interpolation over financial time series (discount curves, volatility surfaces, forward rates).
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-008
  source_project: py_vollib
  pattern_name: Production vs reference implementation selection
  description: py_vollib explicitly distinguishes between ref_python (slow, educational) and production (fast, C-based lets_be_rational)
    implementations. Using the reference implementation in production causes 10-100x performance degradation. Always select
    the appropriate implementation tier based on use case requirements—reference for testing/education, optimized for production
    trading systems.
  applicable_to_activity: derivatives-pricing
  _source_file: cross-project-wisdom/derivatives-pricing.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: examples/bootstrap_examples.ipynb
  business_problem: Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using bootstrap
    methods to quantify uncertainty in risk-adjusted performance metrics.
  intent_keywords:
  - bootstrap
  - sharpe ratio
  - statistical inference
  - confidence intervals
  - stationary bootstrap
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-102
  source_file: examples/multiple-comparison_examples.ipynb
  business_problem: Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA) test to
    determine if any models significantly outperform the benchmark.
  intent_keywords:
  - model comparison
  - SPA test
  - multiple models
  - benchmark comparison
  - superior predictive ability
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-103
  source_file: examples/unitroot_cointegration_examples.ipynb
  business_problem: Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting
    spread opportunities using Engle-Granger and Phillips-Ouliaris tests.
  intent_keywords:
  - cointegration
  - unit root
  - ADF test
  - Engle-Granger
  - oil prices
  - mean reversion
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-104
  source_file: examples/unitroot_examples.ipynb
  business_problem: Tests for stationarity in credit spreads (BAA-AAA) using Augmented Dickey-Fuller tests to determine if
    mean-reversion trading strategies are applicable.
  intent_keywords:
  - unit root
  - ADF test
  - stationarity
  - credit spread
  - mean reversion
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-105
  source_file: examples/univariate_forecasting_with_exogenous_variables.ipynb
  business_problem: Forecasts univariate time series using Autoregressive models with exogenous variables (ARX) to capture
    the impact of external factors on the target variable.
  intent_keywords:
  - ARX
  - exogenous variables
  - forecasting
  - autoregressive
  - regression
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-106
  source_file: examples/univariate_using_fixed_variance.ipynb
  business_problem: Demonstrates how to specify a HARX mean model with fixed/external variance inputs and iteratively fit
    volatility models using the estimated conditional volatility.
  intent_keywords:
  - fixed variance
  - HARX
  - volatility modeling
  - GARCH
  - VIX
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-107
  source_file: examples/univariate_volatility_forecasting.ipynb
  business_problem: Forecasts future volatility of S&P 500 returns using GARCH models, including multi-step ahead forecasts
    and rolling window out-of-sample predictions.
  intent_keywords:
  - GARCH
  - volatility forecasting
  - S&P 500
  - conditional variance
  - out-of-sample
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-108
  source_file: examples/univariate_volatility_modeling.ipynb
  business_problem: Fits and compares different GARCH volatility model specifications (symmetric, asymmetric, power) with
    various error distributions to characterize S&P 500 return volatility dynamics.
  intent_keywords:
  - GARCH
  - volatility modeling
  - S&P 500
  - model comparison
  - asymmetric GARCH
  - t-distribution
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-109
  source_file: examples/univariate_volatility_scenarios.ipynb
  business_problem: Generates multiple volatility scenarios for NASDAQ returns using simulation-based forecasting methods,
    useful for risk management and option pricing applications.
  intent_keywords:
  - volatility scenarios
  - simulation
  - NASDAQ
  - GARCH
  - scenario analysis
  - risk management
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
component_capability_map:
  project: finance-bp-124--arch
  scan_date: '2026-04-22'
  stats:
    total_files: 7
    total_classes: 40
    total_functions: 0
    total_stages: 7
  modules:
    data_input_&_validation:
      class_count: 4
      stage_id: data_input
      stage_order: 1
      responsibility: Accept time series data (numpy/pandas), validate finiteness, convert to contiguous float64 arrays for
        core computation. Provides type flexibility while ensuring memory layout efficiency for downstream numeric operations.
      classes:
      - name: ARCHModel.__init__
        file: data_input_&_validation/archmodel-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: ensure1d
        file: data_input_&_validation/ensure1d.py
        line: 0
        kind: required_method
        signature: ''
      - name: to_array_1d
        file: data_input_&_validation/to-array-1d.py
        line: 0
        kind: required_method
        signature: ''
      - name: Input type coercion
        file: data_input_&_validation/input-type-coercion.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    model_specification:
      class_count: 7
      stage_id: model_specification
      stage_order: 2
      responsibility: Allow users to compose mean model + volatility process + distribution as pluggable components. Each
        component implements a common interface for unified likelihood computation and forecasting.
      classes:
      - name: ARCHModel.fit
        file: model_specification/archmodel-fit.py
        line: 0
        kind: required_method
        signature: ''
      - name: ARCHModel.forecast
        file: model_specification/archmodel-forecast.py
        line: 0
        kind: required_method
        signature: ''
      - name: GARCH.__init__
        file: model_specification/garch-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: HARX.__init__
        file: model_specification/harx-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: Mean model
        file: model_specification/mean-model.py
        line: 0
        kind: replaceable_point
      - name: Volatility model
        file: model_specification/volatility-model.py
        line: 0
        kind: replaceable_point
      - name: Distribution
        file: model_specification/distribution.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    parameter_estimation:
      class_count: 5
      stage_id: parameter_estimation
      stage_order: 3
      responsibility: Jointly estimate mean+volatility+distribution parameters via constrained maximum likelihood. Uses SLSQP
        with bounds and inequality constraints derived from stationarity/coerciveness requirements.
      classes:
      - name: ARCHModel.fit
        file: parameter_estimation/archmodel-fit.py
        line: 0
        kind: required_method
        signature: ''
      - name: ARCHModelResult.summary
        file: parameter_estimation/archmodelresult-summary.py
        line: 0
        kind: required_method
        signature: ''
      - name: ARCHModelResult.conf_int
        file: parameter_estimation/archmodelresult-conf-int.py
        line: 0
        kind: required_method
        signature: ''
      - name: Starting values
        file: parameter_estimation/starting-values.py
        line: 0
        kind: replaceable_point
      - name: Covariance type
        file: parameter_estimation/covariance-type.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    forecasting_&_simulation:
      class_count: 5
      stage_id: forecasting
      stage_order: 4
      responsibility: Generate multi-step volatility/mean forecasts using analytic formulas, simulation, or bootstrap. Handles
        alignment (origin vs target), reindexing, and exogenous regressors.
      classes:
      - name: ARCHModelResult.forecast
        file: forecasting_&_simulation/archmodelresult-forecast.py
        line: 0
        kind: required_method
        signature: ''
      - name: VarianceForecast._analytic_forecast
        file: forecasting_&_simulation/varianceforecast-analytic-forecast.py
        line: 0
        kind: required_method
        signature: ''
      - name: VarianceForecast._simulation_forecast
        file: forecasting_&_simulation/varianceforecast-simulation-forecast.py
        line: 0
        kind: required_method
        signature: ''
      - name: Forecasting method
        file: forecasting_&_simulation/forecasting-method.py
        line: 0
        kind: replaceable_point
      - name: Alignment
        file: forecasting_&_simulation/alignment.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    unit_root_&_cointegration_testing:
      class_count: 7
      stage_id: unitroot_testing
      stage_order: 5
      responsibility: Test time series for stationarity (unit roots) and cross-series cointegration relationships. Supports
        ADF, DFGLS, PhillipsPerron, KPSS, ZivotAndrews, VarianceRatio, Engle-Granger, Phillips-Ouliaris, DOLS, FMOLS.
      classes:
      - name: ADF.__init__
        file: unit_root_&_cointegration_testing/adf-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: UnitRootTest.stat
        file: unit_root_&_cointegration_testing/unitroottest-stat.py
        line: 0
        kind: required_method
        signature: ''
      - name: CointegrationTestResult.stat
        file: unit_root_&_cointegration_testing/cointegrationtestresult-stat.py
        line: 0
        kind: required_method
        signature: ''
      - name: DynamicOLS.__init__
        file: unit_root_&_cointegration_testing/dynamicols-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: Test statistic
        file: unit_root_&_cointegration_testing/test-statistic.py
        line: 0
        kind: replaceable_point
      - name: Lag selection method
        file: unit_root_&_cointegration_testing/lag-selection-method.py
        line: 0
        kind: replaceable_point
      - name: Covariance kernel
        file: unit_root_&_cointegration_testing/covariance-kernel.py
        line: 0
        kind: replaceable_point
      design_decision_count: 1
    bootstrap_&_multiple_comparison:
      class_count: 7
      stage_id: bootstrap_inference
      stage_order: 6
      responsibility: Time-series bootstrap for standard errors/confidence intervals; multiple comparison procedures (MCS,
        StepM, SPA) for model selection. Supports block bootstrap (circular, stationary, moving) and independent resampling.
      classes:
      - name: IIDBootstrap.conf_int
        file: bootstrap_&_multiple_comparison/iidbootstrap-conf-int.py
        line: 0
        kind: required_method
        signature: ''
      - name: StationaryBootstrap.__init__
        file: bootstrap_&_multiple_comparison/stationarybootstrap-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: MCS.__init__
        file: bootstrap_&_multiple_comparison/mcs-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: SPA.__init__
        file: bootstrap_&_multiple_comparison/spa-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: Bootstrap type
        file: bootstrap_&_multiple_comparison/bootstrap-type.py
        line: 0
        kind: replaceable_point
      - name: Confidence interval method
        file: bootstrap_&_multiple_comparison/confidence-interval-method.py
        line: 0
        kind: replaceable_point
      - name: Multiple comparison procedure
        file: bootstrap_&_multiple_comparison/multiple-comparison-procedure.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    results_reporting_&_visualization:
      class_count: 5
      stage_id: results_reporting
      stage_order: 7
      responsibility: Format and display estimation results (summary tables, R², AIC/BIC, parameter table with std_err/tvalues/pvalues),
        residual diagnostics (ARCH-LM test), and visualization (hedgehog plots, residual plots).
      classes:
      - name: ARCHModelResult.summary
        file: results_reporting_&_visualization/archmodelresult-summary.py
        line: 0
        kind: required_method
        signature: ''
      - name: ARCHModelResult.conf_int
        file: results_reporting_&_visualization/archmodelresult-conf-int.py
        line: 0
        kind: required_method
        signature: ''
      - name: ARCHModelResult.arch_lm_test
        file: results_reporting_&_visualization/archmodelresult-arch-lm-test.py
        line: 0
        kind: required_method
        signature: ''
      - name: WaldTestStatistic
        file: results_reporting_&_visualization/waldteststatistic.py
        line: 0
        kind: required_method
        signature: ''
      - name: Output format
        file: results_reporting_&_visualization/output-format.py
        line: 0
        kind: replaceable_point
      design_decision_count: 1
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.4722222222222222
    evidence_invalid: 38
    evidence_verified: 34
    evidence_auto_fixed: 0
    audit_coverage: 53/53 (100%)
    audit_pass_rate: 3/53 (5%)
    audit_fail_total: 32
    audit_finance_universal:
      pass: 1
      warn: 9
      fail: 10
    audit_subdomain_totals:
      pass: 2
      warn: 9
      fail: 22
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-124. Evidence verify ratio
    = 47.2% and audit fail total = 32. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-124-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run the recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc:
  - UC-101
  - UC-102
  - UC-103
  - UC-104
  - UC-105
  - UC-106
  - UC-107
  - UC-108
  - UC-109
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt, or set the ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Sharpe Ratio Bootstrap Statistical Inference
    positive_terms:
    - bootstrap
    - sharpe ratio
    - statistical inference
    - confidence intervals
    - stationary bootstrap
    data_domain: financial_data
    negative_terms:
    - GARCH
    - volatility forecasting
    - unit root
    - cointegration
    - model comparison
    ambiguity_question: Are you looking to compute statistical inference on performance metrics like Sharpe Ratio using bootstrap
      methods?
  - uc_id: UC-102
    name: Multiple Model Comparison with SPA Test
    positive_terms:
    - model comparison
    - SPA test
    - multiple models
    - benchmark comparison
    - superior predictive ability
    data_domain: financial_data
    negative_terms:
    - GARCH
    - volatility forecasting
    - unit root
    - cointegration
    - bootstrap
    ambiguity_question: Do you want to test whether multiple models can significantly beat a benchmark predictor?
  - uc_id: UC-103
    name: Oil Price Cointegration Analysis
    positive_terms:
    - cointegration
    - unit root
    - ADF test
    - Engle-Granger
    - oil prices
    - mean reversion
    data_domain: financial_data
    negative_terms:
    - GARCH
    - volatility forecasting
    - bootstrap
    - model comparison
    ambiguity_question: Are you testing whether two related financial series (like crude oil prices) move together in a long-run
      equilibrium?
  - uc_id: UC-104
    name: Credit Spread Stationarity Testing
    positive_terms:
    - unit root
    - ADF test
    - stationarity
    - credit spread
    - mean reversion
    data_domain: financial_data
    negative_terms:
    - cointegration
    - GARCH
    - volatility forecasting
    - bootstrap
    ambiguity_question: Are you testing whether a time series (like credit spreads) is stationary or has a unit root?
  - uc_id: UC-105
    name: ARX Forecasting with Exogenous Variables
    positive_terms:
    - ARX
    - exogenous variables
    - forecasting
    - autoregressive
    - regression
    data_domain: financial_data
    negative_terms:
    - GARCH
    - volatility modeling
    - cointegration
    - unit root
    ambiguity_question: Do you want to forecast a time series while accounting for the effect of external/exogenous variables?
  - uc_id: UC-106
    name: HARX Volatility Modeling with Fixed Variance
    positive_terms:
    - fixed variance
    - HARX
    - volatility modeling
    - GARCH
    - VIX
    data_domain: financial_data
    negative_terms:
    - cointegration
    - unit root
    - exogenous variables
    - bootstrap
    ambiguity_question: Do you want to model volatility using a pre-specified or externally computed variance series?
  - uc_id: UC-107
    name: S&P 500 GARCH Volatility Forecasting
    positive_terms:
    - GARCH
    - volatility forecasting
    - S&P 500
    - conditional variance
    - out-of-sample
    data_domain: financial_data
    negative_terms:
    - cointegration
    - unit root
    - bootstrap
    - model comparison
    ambiguity_question: Do you need to forecast future volatility levels for the S&P 500 or similar assets?
  - uc_id: UC-108
    name: S&P 500 GARCH Volatility Model Comparison
    positive_terms:
    - GARCH
    - volatility modeling
    - S&P 500
    - model comparison
    - asymmetric GARCH
    - t-distribution
    data_domain: financial_data
    negative_terms:
    - cointegration
    - unit root
    - bootstrap
    - exogenous variables
    ambiguity_question: Are you fitting GARCH volatility models to characterize return variance dynamics?
  - uc_id: UC-109
    name: NASDAQ Volatility Scenario Generation
    positive_terms:
    - volatility scenarios
    - simulation
    - NASDAQ
    - GARCH
    - scenario analysis
    - risk management
    data_domain: financial_data
    negative_terms:
    - cointegration
    - unit root
    - bootstrap
    - model comparison
    ambiguity_question: Do you need to generate multiple simulated volatility scenarios for stress testing or risk analysis?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to
    the user. The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
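    # Illustrative sketch only: the "exactly one of" rule in SL-05 expressed as a standalone check.
    # `check_sl05` is a hypothetical helper; the actual enforcement is the one referenced in locked_value.
    #
    #   def check_sl05(position_pct=None, order_money=None, order_amount=None) -> None:
    #       # Count how many of the three sizing fields were actually supplied
    #       provided = [v for v in (position_pct, order_money, order_amount) if v is not None]
    #       if len(provided) != 1:
    #           raise ValueError(f"SL-05 violated: expected exactly one sizing field, got {len(provided)}")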
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 107
    fatal_constraints_count: 77
    non_fatal_constraints_count: 151
    use_cases_count: 9
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
        otherwise initialization fails with an assertion error (finance-C-001 fatal violation).'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 21 source groups: arch.bootstrap(2),
        arch.covariance(2), arch.unitroot(11), arch.unitroot.critical_values.simulation(2), arch.univariate(11), bandwidth_selection(1),
        and 15 more.'
      key_decisions: 107 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-030
      type: B
      summary: Use Politis-White optimal block length formula for Stationary and Circular Block Bootstrap
    - id: BD-054
      type: B/BA
      summary: Use Stationary Bootstrap with geometric block length distribution
    - id: BD-035
      type: B
      summary: Use Bartlett kernel with automatic bandwidth for long-run covariance in KPSS
    - id: BD-053
      type: B
      summary: Use Quadratic Spectral kernel for Andrews-optimal long-run covariance
    - id: BD-031
      type: B/BA
      summary: Use BIC as default lag selection method for ADF test
    - id: BD-032
      type: B/BA
      summary: Use MacKinnon critical value regression surface for ADF/PP p-values
    - id: BD-033
      type: B/BA
      summary: Use automatic max_lags formula 12*(nobs/100)^(1/4) for ADF when not specified
    - id: BD-041
      type: B
      summary: Use Elliott-Rothenberg-Stock GLS detrending for DFGLS test
    - id: BD-042
      type: B/RC
      summary: Use Engle-Granger two-step cointegration test on cross-sectional regression residuals
    - id: BD-043
      type: B
      summary: Use Dynamic OLS with leads and lags for cointegrating vector estimation
    - id: BD-044
      type: B/BA
      summary: Use Newey-West automatic bandwidth for KPSS with Hobijn et al. formula
    - id: BD-045
      type: B/BA
      summary: Use Zivot-Andrews structural break test with single-break assumption
    - id: BD-046
      type: B/BA
      summary: Use Phillips-Ouliaris Za/Zt/Pu/Pz tests with kernel-based long-run covariance
    - id: BD-047
      type: B
      summary: Use Variance Ratio test with heteroskedasticity-robust inference for random walk
    - id: BD-055
      type: B/BA
      summary: Use OLS with t-stat threshold of 1.645 for lag selection in t-stat method
    - id: BD-052
      type: B
      summary: Use Weighted Least Squares for critical value surface estimation in simulations
    - id: BD-057
      type: B
      summary: Use 250,000 simulations for critical value surface estimation
    - id: BD-034
      type: B
      summary: Use GARCH recursion with power transformation for variance bounds
    - id: BD-036
      type: B
      summary: Use EWMA recursion with lambda=0.94 for RiskMetrics2006 variance
    - id: BD-037
      type: B/BA
      summary: Use Student's t distribution with kurtosis-based starting values for ARCH models
    - id: BD-038
      type: B/BA
      summary: Use 0.94^i exponential decay backcast for GARCH initialization
    - id: BD-039
      type: B
      summary: Use FIGARCH with ARCH(infinity) representation for long-memory volatility
    - id: BD-040
      type: B/BA
      summary: Use EGARCH with log-variance for asymmetric volatility modeling
    - id: BD-048
      type: B/BA
      summary: Use Skew Student's t with Hansen (1994) parameterization for asymmetric returns
    - id: BD-049
      type: B/RC
      summary: Use Generalized Error Distribution with nu>1 for flexible tail modeling
    - id: BD-050
      type: B/BA
      summary: Use HAR (Heterogeneous Autoregressive) model for financial volatility forecasting
    - id: BD-051
      type: B
      summary: Use scipy.optimize.minimize with L-BFGS-B for ARCH model maximum likelihood
    - id: BD-056
      type: B/BA
      summary: Use APARCH with delta=1 for TARCH specification and power parameter
    - id: BD-028
      type: B
      summary: Auto-bandwidth selection for KPSS test
    - id: BD-007
      type: B/BA
      summary: sqrt(T) as default bootstrap block size
    - id: BD-008
      type: B/BA
      summary: Stationary Bootstrap as default for MCS
    - id: BD-009
      type: B/BA
      summary: 1000 bootstrap replications for MCS
    - id: BD-013
      type: B/BA
      summary: 2500 bootstrap replications for Sharpe ratio
    - id: BD-019
      type: B
      summary: Two-sided p-values using normal SF
    - id: BD-026
      type: B/BA
      summary: 'Bootstrap confidence intervals: ''basic'' and ''percentile'' methods'
    - id: BD-022
      type: B/BA
      summary: EWMA decay parameter 0.94 for variance bounds
    - id: BD-023
      type: B
      summary: 'Variance bounds: [var/1e6, var*1e6] with floor/ceiling'
    - id: BD-027
      type: B/BA
      summary: 'Parametric constraints: alpha[i] > 0, beta[i] > 0, sum < 1'
    - id: BD-003
      type: B/BA
      summary: 100 * pct_change() for returns calculation
    - id: BD-GAP-001
      type: DK
      summary: 'Missing: as-of vs processing time'
    - id: BD-GAP-002
      type: DK
      summary: 'Missing: Point-in-Time data availability'
    - id: BD-GAP-003
      type: DK
      summary: 'Missing: Stale data detection and expiry'
    - id: BD-GAP-004
      type: DK
      summary: 'Missing: Model/data version snapshot binding'
    - id: BD-GAP-005
      type: B
      summary: 'Missing: Currency/unit explicit annotation'
    - id: BD-GAP-006
      type: RC
      summary: 'Missing: Settlement and delivery time'
    - id: BD-GAP-007
      type: RC
      summary: 'Missing: Price and quantity precision'
    - id: BD-GAP-008
      type: B
      summary: 'Missing: Covariance matrix PSD repair strategy'
    - id: BD-GAP-009
      type: B
      summary: 'Missing: Covariance estimator selection and shrinkage'
    - id: BD-GAP-010
      type: B
      summary: 'Missing: VaR/CVaR confidence level and window'
    - id: BD-GAP-011
      type: DK
      summary: 'Missing: Versioned writes and snapshot semantics'
    - id: BD-GAP-012
      type: DK
      summary: 'Missing: Implement explicit timezone annotation policy: all DatetimeIndex inputs must be UTC-normalized
        with explicit tz_localize before processing; add a validate_timezone() helper'
    - id: BD-GAP-013
      type: M
      summary: 'Missing: Add Hessian condition number check before np.linalg.inv() in arch/univariate/base.py:979 and
        in cointegration module; warn or regularize if cond > 1e10'
    - id: BD-GAP-014
      type: B
      summary: 'Missing: Add PSD (positive semi-definite) validation to kernel covariance estimator output in arch/covariance/kernel.py;
        symmetrize + eigenfloor any non-PSD estimates'
    - id: BD-GAP-015
      type: M
      summary: 'Missing: Add explicit DataScaleWarning behavior description for poorly-scaled data: document the 1-1000
        scale recommendation and add a rescale helper'
    - id: BD-GAP-016
      type: M
      summary: 'Missing: Add optional ConvergenceDiagnosis object that stores iteration history, log-likelihood path,
        and parameter trajectory for post-hoc convergence quality assessment'
    - id: BD-GAP-017
      type: B
      summary: 'Missing: Add explicit annualized_volatility() helper with configurable compounding convention (252, 365,
        simple); clarify that all volatility values are in frequency-of-data units'
    - id: BD-GAP-018
      type: B
      summary: 'Missing: Add backtest validation framework: automatic train/test split with historical VaR/CVaR/realized
        PnL tracking for volatility models'
    - id: BD-GAP-019
      type: DK
      summary: 'Missing: as-of vs processing time'
    - id: BD-GAP-020
      type: DK
      summary: 'Missing: Point-in-Time data availability'
    - id: BD-GAP-021
      type: DK
      summary: 'Missing: Stale data detection and expiry'
    - id: BD-GAP-022
      type: M/DK
      summary: 'Missing: Day count convention'
    - id: BD-GAP-023
      type: B
      summary: 'Missing: Currency/unit explicit annotation'
    - id: BD-GAP-024
      type: RC
      summary: 'Missing: Settlement and delivery time'
    - id: BD-GAP-025
      type: RC
      summary: 'Missing: Price and quantity precision'
    - id: BD-060
      type: BA/DK
      summary: GARCH power=2.0 defaults to standard GARCH; power!=2.0 blocks analytic forecasts
    - id: BD-061
      type: B/BA
      summary: ConstantVariance() and Normal() are hardcoded ARCHModel defaults
    - id: BD-066
      type: BA/M
      summary: Backcast uses exponential decay tau=min(75, nobs) with 0.94^weight
    - id: BD-069
      type: BA
      summary: hold_back=0 by default; all observations are used in estimation
    - id: BD-072
      type: T
      summary: BCa confidence intervals require equal-length datasets across args/kwargs
    - id: BD-004
      type: B/BA
      summary: Analytic forecast method as default
    - id: BD-005
      type: B/BA
      summary: 1000 simulations for simulation/bootstrap forecasting
    - id: BD-014
      type: B/DK
      summary: Rolling window forecasts with 20 replications
    - id: BD-020
      type: B
      summary: 3-d array for multiple exogenous variable forecasts
    - id: BD-029
      type: B/BA
      summary: Simulation-based forecasting for multi-step GARCH with power!=2
    - id: BD-073
      type: BA/DK
      summary: 'INTERACTION: BD-002 (Normal distribution) × BD-017 (Student''s T distribution) → CONTRADICTION: Gaussian default
        vs heavy-tail reality'
    - id: BD-074
      type: BA/DK
      summary: 'INTERACTION: BD-003 (100*pct_change returns) × BD-012 (Sharpe annualization 12x) → HIDDEN DEPENDENCY: Return
        scaling propagates to performance metrics'
    - id: BD-075
      type: BA/DK
      summary: 'INTERACTION: BD-001 (GARCH(1,1)) × BD-015 (GJR-GARCH leverage) × BD-040 (EGARCH log-variance) → RISK CASCADE:
        Asymmetric volatility models cascade through VaR estimation'
    - id: BD-076
      type: BA/M
      summary: 'INTERACTION: BD-010 (AIC for ADF) × BD-031 (BIC for ADF) → CONTRADICTION: Conflicting lag selection defaults
        across codebase'
    - id: BD-077
      type: BA/DK
      summary: 'INTERACTION: BD-004 (Analytic forecast) × BD-029 (Simulation for power!=2) × BD-060 (power blocks analytic)
        → RISK CASCADE: Power specification determines forecast method availability'
    - id: BD-078
      type: BA/M
      summary: 'INTERACTION: BD-006 (EWMA backcast tau=75) × BD-022 (lambda=0.94 decay) × BD-036 (RiskMetrics lambda=0.94)
        → AMPLIFICATION: Consistent EWMA parameters amplify initialization sensitivity'
    - id: BD-079
      type: BA/M
      summary: 'INTERACTION: BD-007 (sqrt(T) block size) × BD-008 (Stationary Bootstrap) × BD-030 (Politis-White optimal block)
        → AMPLIFICATION: Multiple block length selection mechanisms interact'
    - id: BD-080
      type: BA/DK
      summary: 'INTERACTION: BD-021 (Bollerslev-Wooldridge robust SE) × BD-027 (GARCH constraints alpha+beta<1) → HIDDEN DEPENDENCY:
        Robust inference requires correctly specified volatility model'
    - id: BD-081
      type: BA
      summary: 'INTERACTION: BD-069 (hold_back=0 default) × BD-014 (20 rolling windows) → CONTRADICTION: Full-sample estimation
        vs out-of-sample validation requirements'
    - id: BD-082
      type: BA
      summary: 'INTERACTION: BD-033 (MacKinnon critical values) × BD-043 (Engle-Granger cointegration) × BD-047 (Phillips-Ouliaris
        tests) → RISK CASCADE: Critical value surface accuracy cascades through each unit roo'
    - id: BD-065
      type: B/BA
      summary: CircularBlockBootstrap inherits from IIDBootstrap with block_length override
    - id: BD-006
      type: B/BA
      summary: EWMA backcast with lambda=0.94 and tau=75
    - id: BD-062
      type: DK
      summary: 'Parameter ordering: [mean_params, vol_params, dist_params] with computed offsets'
    - id: BD-063
      type: DK
      summary: 'Loglikelihood computation order: resid -> sigma2 -> distribution.loglikelihood'
    - id: BD-070
      type: DK
      summary: 'rescale threshold: variance must be in [0.1, 10000) to avoid rescaling'
    - id: BD-001
      type: B/BA
      summary: GARCH(1,1) as default volatility model
    - id: BD-002
      type: B/BA
      summary: Normal (Gaussian) distribution as default error distribution
    - id: BD-010
      type: B/BA
      summary: AIC as default lag selection criterion for ADF
    - id: BD-011
      type: B/BA
      summary: Constant as default ADF deterministic trend
    - id: BD-015
      type: B/BA
      summary: GJR-GARCH with o=1 captures leverage effect
    - id: BD-016
      type: B/BA
      summary: TARCH (power=1.0) models absolute volatility
    - id: BD-017
      type: B/BA
      summary: Student's T distribution for heavy-tailed returns
    - id: BD-018
      type: B
      summary: R-squared adjusted for degrees of freedom
    - id: BD-024
      type: B/BA
      summary: Fixed parameters via fix() method for counterfactuals
    - id: BD-058
      type: T
      summary: fit() MUST be called before forecast() on ARCHModelResult
    - id: BD-059
      type: T
      summary: Bootstrap clone() requires fresh fit data - old fit indices persist
    - id: BD-068
      type: T
      summary: fit() closed-form path requires Normal() dist AND ConstantVariance volatility
    - id: BD-025
      type: B
      summary: 'Horizon naming: ''h.1'', ''h.2'', ... for forecast columns'
    - id: BD-021
      type: B
      summary: Bollerslev-Wooldridge robust covariance estimator
    - id: BD-064
      type: M/DK
      summary: 'Strategy pattern: VolatilityProcess/Distribution are pluggable strategy objects'
    - id: BD-067
      type: M/DK
      summary: Starting values search uses fixed grid of (alpha, gamma, beta) tuples
    - id: BD-071
      type: B
      summary: Numba JIT compilation fallback to Python in _cov_kernel
    - id: BD-012
      type: B/BA
      summary: Sharpe Ratio annualized with 12 multiplier
resources:
  packages:
  - name: pandas
    version_pin: ==1.5.3
  - name: numpy
    version_pin: ==1.24.4
  - name: matplotlib
    version_pin: '>=2'
  - name: requests
    version_pin: ==2.31.0
  - name: scipy
    version_pin: '>=1.3.0'
  - name: scikit-learn
    version_pin: '>1.4.2'
  - name: pytest
    version_pin: '>=8.3'
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing data input for ARCH model initialization
    action: validate that input data contains only finite values using np.all(np.isfinite) before any numeric computation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Optimizers and recursive variance computations will produce NaN/inf results, causing the entire model estimation
      to fail silently with meaningless outputs
    stage_ids:
    - data_input
  - id: finance-C-002
    when: When implementing data input for ARCH model initialization
    action: convert all input data to contiguous float64 arrays using np.ascontiguousarray before storing in self._y
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-contiguous arrays or non-float64 types will cause buffer errors in Cython/Numba optimized recursive computations,
      leading to segmentation faults or incorrect variance calculations
    stage_ids:
    - data_input
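    # Illustrative sketch only: finance-C-001 and finance-C-002 combined into one input-preparation
    # step with NumPy. `prepare_y` is a hypothetical helper name, not the library's own routine.
    #
    #   import numpy as np
    #
    #   def prepare_y(y) -> np.ndarray:
    #       # finance-C-002: store a C-contiguous float64 copy or view of the input
    #       arr = np.ascontiguousarray(y, dtype=np.float64)
    #       # finance-C-001: reject NaN/inf before any numeric computation
    #       if not np.all(np.isfinite(arr)):
    #           raise ValueError("input data must contain only finite values")
    #       return arr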
  - id: finance-C-009
    when: When initializing ARCH models with data
    action: pass None as input data without raising RuntimeError when attempting to fit the model
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Fitting attempt with no data will cause cryptic errors in scipy optimize or segfault in Cython recursions
    stage_ids:
    - data_input
  - id: finance-C-013
    when: When implementing an ARCH model with custom components
    action: verify input data y contains only finite values without NaN or inf
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: NaN or inf values in the input data cause the model to fail silently or produce invalid likelihood computations
      during optimization, leading to incorrect parameter estimates
    stage_ids:
    - model_specification
  - id: finance-C-014
    when: When plugging in a volatility component to ARCHModel
    action: verify the volatility parameter inherits from VolatilityProcess abstract base class
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using a non-VolatilityProcess subclass causes TypeError during initialization, and incompatible volatility
      processes will fail during variance computation and forecasting
    stage_ids:
    - model_specification
  - id: finance-C-015
    when: When plugging in a distribution component to ARCHModel
    action: verify the distribution parameter inherits from Distribution abstract base class
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using a non-Distribution subclass causes TypeError during initialization, and incompatible distributions
      will fail during log-likelihood computation
    stage_ids:
    - model_specification
  - id: finance-C-016
    when: When implementing custom volatility or distribution classes
    action: implement the constraints() method returning (A, b) arrays where A.dot(params) - b >= 0
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing or incorrect constraints implementation causes optimization to use invalid parameter regions, producing
      mathematically invalid volatility models (e.g., negative variances)
    stage_ids:
    - model_specification
  - id: finance-C-019
    when: When implementing custom volatility processes
    action: provide compute_variance() method that fills sigma2 array with conditional variances
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing or incorrect compute_variance implementation causes the likelihood function to fail, making parameter
      estimation impossible
    stage_ids:
    - model_specification
  - id: finance-C-020
    when: When implementing custom distribution classes
    action: provide loglikelihood() method for likelihood evaluation during optimization
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing loglikelihood implementation causes the optimization to fail during parameter estimation, as log-likelihood
      is the objective function for SLSQP
    stage_ids:
    - model_specification
  - id: finance-C-029
    when: When composing ARCHModel from three components
    action: 'concatenate parameter arrays in the fixed order: [mean_params, volatility_params, distribution_params]'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect parameter ordering causes _parse_parameters to assign wrong values to each component, leading to
      mathematically invalid models (e.g., volatility parameters interpreted as mean parameters)
    stage_ids:
    - model_specification
  - id: finance-C-030
    when: When constructing constraints for ARCH model fitting
    action: stack constraint matrices from mean model, volatility, and distribution in parameter order
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect constraint stacking causes the optimizer to enforce wrong constraints on wrong parameters, producing
      invalid or non-stationary models
    stage_ids:
    - model_specification
  - id: finance-C-031
    when: When constructing starting values for ARCH model fitting
    action: concatenate starting values from mean model, volatility (computed from resids), and distribution (computed from
      std_resids)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect starting values concatenation causes the optimizer to use wrong initial values for wrong parameters,
      leading to poor convergence or wrong solutions
    stage_ids:
    - model_specification
  - id: finance-C-035
    when: When implementing SLSQP constrained optimization
    action: enforce inequality constraints a.dot(params) - b >= 0 for all parameters
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Volatility model parameters violating stationarity constraints produce invalid conditional variances, causing
      downstream risk misestimation and potential trading losses
    stage_ids:
    - parameter_estimation
  - id: finance-C-036
    when: When computing conditional variance in optimization loop
    action: verify sigma2 (conditional variance) >= 0 for all observations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Negative variance values cause loglikelihood to produce NaN, invalidating parameter estimates and causing
      downstream computations to fail silently
    stage_ids:
    - parameter_estimation
  - id: finance-C-040
    when: When constructing the unified parameter vector
    action: use offsets array to partition parameters into mean|volatility|distribution components
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect parameter partitioning causes wrong component parameters to be passed to mean model, volatility
      model, and distribution, producing invalid results
    stage_ids:
    - parameter_estimation
  - id: finance-C-041
    when: When assembling inequality constraints for joint estimation
    action: combine constraints from mean model, volatility model, and distribution in correct order
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect constraint ordering places volatility constraints in wrong parameter positions, allowing invalid
      parameters that violate stationarity or positivity requirements
    stage_ids:
    - parameter_estimation
  - id: finance-C-051
    when: When estimating volatility persistence close to 1.0
    action: check that persistence = sum(alpha) + sum(gamma)/2 + sum(beta) < 1 for stationarity
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Persistence >= 1 violates covariance stationarity, producing non-mean-reverting variance that explodes over
      time, invalidating long-horizon forecasts
    stage_ids:
    - parameter_estimation
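    # Illustrative sketch only: the persistence formula from finance-C-051 as a plain function.
    # `persistence` and `is_stationary` are hypothetical names; the inputs are sequences of fitted coefficients.
    #
    #   def persistence(alpha, gamma, beta) -> float:
    #       # sum(alpha) + sum(gamma)/2 + sum(beta), as stated in the action above
    #       return sum(alpha) + sum(gamma) / 2 + sum(beta)
    #
    #   def is_stationary(alpha, gamma, beta) -> bool:
    #       return persistence(alpha, gamma, beta) < 1
    #
    # For example, alpha=[0.05], gamma=[0.04], beta=[0.90] gives roughly 0.05 + 0.02 + 0.90 = 0.97,
    # which satisfies the constraint, while raising beta to 0.95 would not.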
  - id: finance-C-052
    when: When implementing ARCH model forecasting code
    action: verify forecast variances are finite and non-negative throughout the forecast horizon
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-finite or negative variance forecasts indicate mathematical errors in the ARCH recursion, producing invalid
      statistical inferences and potentially misleading risk estimates
    stage_ids:
    - forecasting
  - id: finance-C-054
    when: When validating forecast horizon parameter
    action: require horizon to be a positive integer (>= 1)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid horizon values cause undefined forecast behavior or silent data corruption in downstream risk calculations
    stage_ids:
    - forecasting
  - id: finance-C-055
    when: When forecasting with EGARCH volatility models
    action: use analytic method for horizons greater than 1
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: EGARCH models the logarithm of the variance rather than the variance itself. Analytic multi-step formulas
      assume a recursion that is linear in squared residuals, so applying them to EGARCH produces mathematically invalid forecasts
    stage_ids:
    - forecasting
  - id: finance-C-063
    when: When validating forecasting method parameter
    action: accept only 'analytic', 'simulation', or 'bootstrap' as valid method values
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Invalid method strings cause undefined behavior or silent fallback to incorrect forecasting algorithm
    stage_ids:
    - forecasting
  - id: finance-C-068
    when: When comparing backtested forecast performance to live trading
    action: claim backtest returns equal expected live trading returns
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Backtest results exclude transaction costs, slippage, liquidity constraints, and market impact that affect
      live execution. Claiming equivalence misleads risk assessment
    stage_ids:
    - forecasting
  - id: finance-C-070
    when: When implementing unit root tests that use lag selection
    action: enforce non-negative integer lags by raising ValueError for negative lags
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Negative lag values produce invalid regression specifications with wrong degrees of freedom, causing misleading
      test statistics and invalid statistical inference
    stage_ids:
    - unitroot_testing
  - id: finance-C-071
    when: When running ADF, DFGLS, or KPSS tests
    action: validate trend against the test-specific supported trends before computation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid trend specification causes wrong statistical distribution assumptions, leading to incorrect critical
      values and p-values that invalidate test conclusions
    stage_ids:
    - unitroot_testing
  - id: finance-C-072
    when: When implementing ADF or DFGLS test regression
    action: verify sample size exceeds minimum requirement of 3 + trend_order + lag_len observations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Insufficient observations cause singular or near-singular regression matrices, leading to unstable or undefined
      test statistics
    stage_ids:
    - unitroot_testing
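    # Worked example for finance-C-072, assuming trend_order is 2 for trend 'ct'
    # (an assumption about the encoding; the source does not define it) and lag_len = 4:
    #   minimum = 3 + 2 + 4 = 9, so the series needs at least 10 observations.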
  - id: finance-C-073
    when: When computing test statistics for unit root tests
    action: verify the statistic is finite and within the interpolation bounds of critical value tables
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-finite or out-of-range statistics produce undefined p-values (0.0 or 1.0) that miss actual stationarity
      patterns or create false rejections
    stage_ids:
    - unitroot_testing
  - id: finance-C-074
    when: When implementing VarianceRatio test
    action: enforce lags parameter to be an integer >= 2 before computation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Lags less than 2 produce undefined multi-period variance ratios, causing division by zero or mathematically
      invalid test statistics
    stage_ids:
    - unitroot_testing
  - id: finance-C-075
    when: When implementing ZivotAndrews test trim parameter
    action: validate that trim is a float in the range [0, 1/3] so that the break-period search region is well defined
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid trim values cause incorrect break point exclusion regions, leading to structural break misdetection
      and invalid unit root conclusions
    stage_ids:
    - unitroot_testing
  - id: finance-C-076
    when: When implementing DFGLS test
    action: use trend values other than 'c' or 'ct'
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: Unsupported trends produce GLS-detrending coefficients outside valid ranges (-7.0 for 'c', -13.5 for 'ct'),
      causing undefined test statistics
    stage_ids:
    - unitroot_testing
  - id: finance-C-077
    when: When implementing cointegration tests
    action: verify y and x have identical number of observations before cross-sectional regression
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Misaligned observation counts produce incorrect cointegrating vectors and residuals, leading to spurious
      cointegration conclusions
    stage_ids:
    - unitroot_testing
  - id: finance-C-080
    when: When implementing Engle-Granger cointegration test
    action: limit the number of cross-sectional variables (num_x) to range [1, 12]
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Cross-sectional variables outside [1,12] lack pre-computed critical value tables, causing KeyError or invalid
      cointegration inference
    stage_ids:
    - unitroot_testing
  - id: finance-C-081
    when: When using MacKinnon critical value functions
    action: use regression='ctt' with dist_type='dfgls' since DFGLS only supports 'c' and 'ct'
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: Invalid regression-dist_type combination causes KeyError when accessing non-existent critical value table
      entries
    stage_ids:
    - unitroot_testing
  - id: finance-C-083
    when: When computing automatic bandwidth for KPSS test
    action: require at least 2 observations in the input series
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Single observation series causes division by zero or undefined bandwidth, leading to crashes in KPSS test
      execution
    stage_ids:
    - unitroot_testing
  - id: finance-C-084
    when: When running PhillipsPerron test
    action: allow zero or negative regression coefficient standard error
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Zero variance indicates constant-value series or perfect multicollinearity, producing undefined PP test statistics
    stage_ids:
    - unitroot_testing
  - id: finance-C-085
    when: When implementing ZivotAndrews test
    action: validate regressor matrix rank to detect singular matrices from constant regions
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Singular regressor matrices cause undefined OLS estimates, producing NaN test statistics and invalid structural
      break conclusions
    stage_ids:
    - unitroot_testing
  - id: finance-C-093
    when: When implementing confidence interval calculation using conf_int method
    action: use size parameter strictly between 0 and 1
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid CI size produces undefined behavior or runtime ValueError, breaking statistical inference and producing
      meaningless intervals that cannot be interpreted for decision-making
    stage_ids:
    - bootstrap_inference
  - id: finance-C-094
    when: When implementing confidence interval calculation using conf_int method
    action: use tail parameter as one of 'two', 'lower', or 'upper'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid tail parameter causes ValueError and fails to produce one-sided or two-sided confidence intervals
      needed for directional hypothesis testing
    stage_ids:
    - bootstrap_inference
  - id: finance-C-095
    when: When implementing Model Confidence Set (MCS) with multiple_comparison module
    action: provide losses array with at least two columns (models)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: MCS with fewer than 2 models cannot compute pairwise comparisons, resulting in ValueError and failure to
      produce any model confidence set output
    stage_ids:
    - bootstrap_inference
  - id: finance-C-096
    when: When implementing BCa (bias-corrected and accelerated) confidence interval method
    action: verify empirical probability p is strictly between 0 and 1
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: BCa fails when empirical probability is 0 or 1 (extreme statistics), causing RuntimeError and preventing
      bias correction for distributions not well-approximated by normal in finite samples
    stage_ids:
    - bootstrap_inference
  - id: finance-C-097
    when: When implementing bootstrap-based forecasting using _bootstrap_forecast
    action: verify start index includes more than 100 observations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Bootstrap forecast with fewer than 100 observations produces unreliable standard errors and confidence intervals,
      invalidating volatility forecasts for risk management decisions
    stage_ids:
    - bootstrap_inference
  - id: finance-C-098
    when: When implementing bootstrap confidence intervals
    action: validate confidence interval size as strictly between 0 and 1
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid CI size causes ValueError and prevents computation of statistically valid confidence intervals needed
      for parameter uncertainty quantification
    stage_ids:
    - bootstrap_inference
  - id: finance-C-103
    when: When implementing MCS or SPA
    action: require all input arrays to have the same number of elements along axis 0
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Misaligned data causes silent misalignment in bootstrap resampling, producing incorrect standard errors and
      invalid confidence intervals that appear valid
    stage_ids:
    - bootstrap_inference
  - id: finance-C-106
    when: When implementing SPA p-value calculation
    action: require the pvalue argument to be strictly between 0 and 1 when computing critical values
    severity: fatal
    kind: operational_lesson
    modality: must
    consequence: Invalid p-value causes ValueError and prevents computation of critical values needed for model selection
      decisions
    stage_ids:
    - bootstrap_inference
  - id: finance-C-107
    when: When implementing bootstrap-based model comparison
    action: call compute() before accessing pvalues or included/excluded model sets
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Accessing results before compute() causes RuntimeError and prevents retrieval of MCS/SPA results
    stage_ids:
    - bootstrap_inference
  - id: finance-C-109
    when: When implementing bootstrap data validation
    action: verify that all input data are numpy ndarray, pandas DataFrame, or pandas Series objects
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Unsupported data types cause TypeError and prevent bootstrap from resampling data for inference
    stage_ids:
    - bootstrap_inference
  - id: finance-C-110
    when: When implementing PRNG for bootstrap
    action: use NumPy Generator or RandomState as PRNG source
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Invalid PRNG type causes TypeError and prevents bootstrap from generating random indices for resampling
    stage_ids:
    - bootstrap_inference
  - id: finance-C-116
    when: When computing t-statistics for model parameters
    action: compute tvalues as the ratio of params divided by std_err
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect t-statistics will lead to wrong hypothesis testing conclusions, causing invalid statistical inference
      about parameter significance
    stage_ids:
    - results_reporting
  - id: finance-C-117
    when: When computing p-values for parameter t-statistics
    action: compute pvalues using two-sided normal distribution survival function on absolute tvalues
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect p-values will cause wrong conclusions about parameter significance, leading to improper model specification
      decisions
    stage_ids:
    - results_reporting
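    # Worked example for finance-C-116 and finance-C-117, with hypothetical values:
    #   params = 0.50, std_err = 0.25  ->  tvalue = 0.50 / 0.25 = 2.0
    #   pvalue = 2 * sf(|2.0|) = 2 * 0.02275 = 0.0455 (approximately)
    # where sf is the standard normal survival function, 1 - cdf.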
  - id: finance-C-118
    when: When constructing parameter confidence intervals
    action: compute confidence intervals using normal distribution quantile with specified alpha coverage
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect confidence interval coverage will misrepresent the precision of parameter estimates, violating
      statistical guarantees
    stage_ids:
    - results_reporting
  - id: finance-C-119
    when: When computing ARCH-LM test statistic
    action: compute the ARCH-LM statistic as nobs multiplied by the R-squared of the auxiliary regression
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect ARCH-LM statistic will produce wrong diagnostic conclusions about remaining heteroskedasticity
      in model residuals
    stage_ids:
    - results_reporting
  - id: finance-C-120
    when: When computing standard errors from parameter covariance
    action: extract standard errors as the square root of diagonal elements of the parameter covariance matrix
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect standard errors propagate to all downstream inference, affecting t-statistics, p-values, and confidence
      intervals
    stage_ids:
    - results_reporting
  - id: finance-C-121
    when: When computing model fit statistics
    action: compute AIC as negative two times loglikelihood plus two times number of parameters
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect AIC will lead to wrong model selection decisions when comparing different ARCH specifications
    stage_ids:
    - results_reporting
  - id: finance-C-122
    when: When computing Schwarz/Bayesian Information Criteria
    action: compute BIC as negative two times loglikelihood plus log of observations times number of parameters
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect BIC will cause improper model selection, potentially choosing over-parameterized or under-fitted
      models
    stage_ids:
    - results_reporting
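    # Worked example for finance-C-121 and finance-C-122, with hypothetical values
    # loglikelihood = -1234.5, num_params = 4, nobs = 1000:
    #   AIC = -2 * (-1234.5) + 2 * 4         = 2469.0 + 8.0    = 2477.0
    #   BIC = -2 * (-1234.5) + ln(1000) * 4  = 2469.0 + 27.631 = 2496.631 (approximately)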
  - id: finance-C-123
    when: When computing adjusted R-squared
    action: compute adjusted R-squared using the degrees of freedom correction formula
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect adjusted R-squared will misrepresent model explanatory power after accounting for parameter count
    stage_ids:
    - results_reporting
  - id: finance-C-124
    when: When displaying model estimation summary
    action: display parameter table with columns for coefficient, standard error, t-statistic, p-value, and confidence interval
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing columns in summary output will prevent users from performing valid statistical inference on model
      parameters
    stage_ids:
    - results_reporting
  - id: finance-C-125
    when: When displaying fit statistics in summary
    action: display R-squared, adjusted R-squared, log-likelihood, AIC, and BIC in the summary header
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing fit statistics will prevent users from assessing model quality and performing model comparison
    stage_ids:
    - results_reporting
  - id: finance-C-128
    when: When running ARCH-LM test for residual diagnostics
    action: require at least 3 non-nan observations for valid test results
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: ARCH-LM test with insufficient observations produces unreliable test statistics and misleading diagnostic
      conclusions
    stage_ids:
    - results_reporting
  - id: finance-C-130
    when: When computing R-squared for model fit assessment
    action: detect implicit constants so that the total sum of squares is computed correctly
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect R-squared when model contains implicit constant leads to wrong assessment of model explanatory
      power
    stage_ids:
    - results_reporting
  - id: finance-C-135
    when: When testing ARCH-LM with default lag selection
    action: compute default lags using the formula ceil(12 * (nobs/100)^(1/4)) bounded by half the sample size
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect default lag selection will produce either under-powered or over-fitted ARCH-LM tests, leading to
      wrong diagnostic conclusions
    stage_ids:
    - results_reporting
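    # Worked example for finance-C-135, assuming the bound is applied as
    # min(formula, nobs // 2) (an assumption about how "bounded by" is implemented):
    #   nobs = 1000: ceil(12 * (10.0)^(1/4)) = ceil(21.34) = 22; min(22, 500) = 22
    #   nobs = 10:   ceil(12 * (0.1)^(1/4))  = ceil(6.75)  = 7;  min(7, 5)    = 5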
  - id: finance-C-136
    when: When initializing an ARCHModel with input data
    action: Convert y to a float64 contiguous array and validate that all values are finite (no NaN or inf)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-finite values (NaN/inf) in the input data will cause variance computations to produce NaN, leading to
      failed optimization and meaningless model results
    stage_ids:
    - data_input
    - model_specification
  - id: finance-C-137
    when: When converting input data to the internal representation
    action: Verify input y is converted to 1D float64 contiguous array via to_array_1d
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Multi-dimensional or non-contiguous arrays will cause index errors in variance recursions and parameter estimation
    stage_ids:
    - data_input
    - model_specification
  - id: finance-C-139
    when: When combining starting values from mean, volatility, and distribution
    action: 'Concatenate starting values in the correct order: mean params, volatility params, distribution params'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect parameter ordering will cause parameter parsing (_parse_parameters) to assign wrong values to mean/volatility/distribution,
      producing invalid models
    stage_ids:
    - model_specification
    - parameter_estimation
  - id: finance-C-140
    when: When combining bounds from mean, volatility, and distribution
    action: 'Extend bounds list in the same order as parameters: mean bounds first, then volatility bounds, then distribution
      bounds'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Misaligned bounds will cause SLSQP optimizer to enforce wrong constraints on wrong parameters, potentially
      producing invalid parameter values
    stage_ids:
    - model_specification
    - parameter_estimation
  - id: finance-C-141
    when: When constructing linear constraints from all model components
    action: Block-diagonalize constraint matrix A so each component's constraints only affect its own parameters
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-block-diagonal constraints will incorrectly constrain unrelated parameters, causing optimization to fail
      or produce wrong parameter values
    stage_ids:
    - model_specification
    - parameter_estimation
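    # Sketch of the block-diagonal layout in finance-C-141. A_mean, A_vol and A_dist
    # are hypothetical names for the per-component constraint matrices; 0 denotes a
    # zero block of matching shape:
    #   A = [ A_mean  0      0      ]
    #       [ 0       A_vol  0      ]
    #       [ 0       0      A_dist ]
    #   b = [ b_mean, b_vol, b_dist ]   (stacked in the same order)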
  - id: finance-C-144
    when: When passing fitted parameters from estimation to forecasting
    action: Parse params using _parse_parameters to extract mean/volatility/distribution parameter subsets
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Using raw params array without parsing will pass wrong parameter subsets to variance recursions, producing
      incorrect forecasts
    stage_ids:
    - parameter_estimation
    - forecasting
  - id: finance-C-159
    when: When initializing an ARCHModel with input data (y parameter)
    action: Provide only finite values in the data array — NaN and inf values are not permitted and cause a ValueError
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: NaN or inf values in the input time series cause the model's loglikelihood computation to produce NaN results,
      corrupting all parameter estimates and forecasts
  - id: finance-C-160
    when: When implementing a new VolatilityProcess subclass
    action: Verify that all computed conditional variance values (sigma2) are non-negative throughout the variance recursion;
      values below the lower var_bounds are clamped up, and values above the upper bound are log-adjusted
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Negative sigma2 values cause the distribution loglikelihood to receive invalid inputs (e.g., sqrt of negative
      for Normal), producing NaN loglikelihood and corrupted parameter estimates
  - id: finance-C-161
    when: When constructing variance bounds (var_bounds) for any volatility model
    action: Format var_bounds as a 2-column array of shape (nobs, 2) where column 0 is the lower bound and column 1 is the
      upper bound for each observation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrectly formatted var_bounds causes the bounds_check function to misread lower/upper bounds, allowing
      invalid sigma2 values to pass through and corrupt the loglikelihood
  - id: finance-C-162
    when: When providing a user-supplied backcast value to the volatility model's backcast_transform
    action: Verify the backcast value is strictly positive — negative backcast values cause a ValueError
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Negative backcast causes the volatility recursion to start with an invalid initial sigma2 value, producing
      invalid loglikelihood values and corrupted estimates
  - id: finance-C-163
    when: When implementing a new VolatilityProcess or Distribution subclass
    action: Return constraint arrays (a, b) where parameters satisfy a.dot(params) - b >= 0 for every row of a; this linear
      constraint format is required by the SLSQP optimizer
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrectly formatted constraint arrays cause SLSQP to receive invalid constraints, producing undefined optimization
      behavior and potentially invalid parameter estimates
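    # Illustrative encoding for finance-C-163 (a sketch, not taken from the library):
    # a GARCH(1,1)-style parameter vector params = [omega, alpha, beta] with
    # omega >= 0, alpha >= 0, beta >= 0 and alpha + beta <= 1 can be written as
    #   a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, -1, -1]]
    #   b = [0, 0, 0, -1]
    # The last row gives -alpha - beta - (-1) = 1 - alpha - beta >= 0.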
  - id: finance-C-171
    when: When implementing a new mean model, volatility model, or distribution
    action: 'Verify the concatenated parameter array follows the fixed ordering: [mean_params, vol_params, dist_params], using
      the pre-computed offsets array to slice each sub-parameter set'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect parameter ordering causes _parse_parameters to return wrong slices for mean, volatility, and distribution
      parameters, producing invalid loglikelihood values and corrupted estimates
  - id: finance-C-172
    when: When computing the loglikelihood in the ARCHModel._loglikelihood method
    action: 'Follow the fixed three-step computation order: (1) compute resids from the mean model, (2) compute sigma2 using
      volatility.compute_variance(), (3) call distribution.loglikelihood() with the computed resids and sigma2'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Skipping or reordering any step produces an incorrect loglikelihood value, leading to wrong parameter estimates
      during optimization
  - id: finance-C-173
    when: When calling forecast() on an ARCHModelResult
    action: Call fit() first to produce an ARCHModelResult with estimated params — forecast() requires the params attribute
      which is only populated after successful fit() execution
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Calling forecast() before fit() causes an AttributeError because params is None, preventing any forecast
      generation
  - id: finance-C-174
    when: When implementing a new VolatilityProcess subclass that is NOT ConstantVariance
    action: Set closed_form = True; only ConstantVariance has closed_form = True, and all other volatility processes must
      explicitly set closed_form = False or accept the default
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: Setting closed_form = True on a non-ConstantVariance volatility process causes the fit() method to incorrectly
      enter the closed-form path, producing mathematically invalid parameter estimates
  - id: finance-C-175
    when: When using the Normal/Gaussian distribution with any ARCH volatility model
    action: Set num_params = 0 on the Normal distribution — Normal has no additional parameters beyond those estimated by
      the mean and volatility models
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrectly setting num_params > 0 on the Normal distribution disrupts the parameter offset calculations
      in fit(), causing parameter slicing errors and wrong estimates
  - id: finance-C-210
    when: When processing price and quantity data in financial calculations
    action: Assume infinite precision for monetary calculations using native float types — floating-point representation causes
      rounding errors in price aggregation and P&L computation
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Using float for price and quantity causes accumulated rounding errors in high-frequency trading; a 0.01 cent
      error per trade compounds to significant P&L discrepancies in live trading
    derived_from_bd_id: BD-GAP-007
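    # Illustration for finance-C-210 using Python semantics: with binary floats,
    # 0.1 + 0.2 == 0.3 evaluates to False (the sum is 0.30000000000000004),
    # whereas Decimal("0.1") + Decimal("0.2") == Decimal("0.3") evaluates to True.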
  - id: finance-C-230
    when: When setting APARCH model parameters in volatility estimation
    action: Change the delta parameter to values other than 1 when intending TARCH specification — delta=1 recovers standard
      absolute return specification required for threshold ARCH
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: Setting delta!=1 fundamentally alters the APARCH power transformation, breaking the TARCH nesting property
      and producing incorrect asymmetric volatility estimates that misrepresent downside risk
    derived_from_bd_id: BD-056
  regular:
  - id: finance-C-003
    when: When implementing input type detection in ARCH model
    action: track whether the original input was a pandas DataFrame or Series using isinstance(y, (pd.DataFrame, pd.Series))
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Results reporting will return incorrect types (numpy array instead of Series), breaking user API expectations
      and causing downstream type errors
    stage_ids:
    - data_input
  - id: finance-C-004
    when: When implementing data input for ARCH model initialization
    action: store the original input data unchanged in _y_original before any transformation for results reporting
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Results and forecasts will be reported using transformed/scaled data instead of original user input, making
      results unintelligible to users
    stage_ids:
    - data_input
  - id: finance-C-005
    when: When implementing input coercion in ARCH library
    action: handle various input types (Series, DataFrame, numpy arrays, lists, scalars) by converting to consistent 1D format
      using ensure1d
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Incompatible input types will raise unexpected TypeErrors, preventing users from using common data formats
      like pandas Series or numpy arrays
    stage_ids:
    - data_input
  - id: finance-C-006
    when: When implementing array conversion in ARCH library
    action: verify that all converted arrays are 1D with float64 dtype using to_array_1d for downstream numeric operations
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Multi-dimensional or non-float64 arrays will cause shape mismatches in matrix operations and optimize.outer
      calls, producing incorrect log-likelihood values
    stage_ids:
    - data_input
  - id: finance-C-007
    when: When implementing rescale logic in ARCH model fitting
    action: apply scale factor consistently to original data when rescaling is triggered, then update model state via _scale_changed()
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Parameter estimates and forecasts will be on wrong scale, making results meaningless for users who expect
      outputs in original data units
    stage_ids:
    - data_input
  - id: finance-C-008
    when: When implementing data validation in ARCH model
    action: raise ValueError immediately when encountering non-1D-reshapable input, with clear message indicating dimensionality
      requirement
    severity: high
    kind: domain_rule
    modality: must
    consequence: Multi-dimensional data will silently produce incorrect results in variance calculations, leading to misleading
      ARCH parameter estimates
    stage_ids:
    - data_input
  - id: finance-C-010
    when: When implementing automatic data rescaling
    action: warn users when data variance is outside [0.1, 10000) range to prevent numerical instability in optimization
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Poorly scaled data causes optimizer convergence failure or excessive iterations, wasting computational resources
      and producing unreliable parameter estimates
    stage_ids:
    - data_input
  - id: finance-C-011
    when: When implementing ARCH model forecasting
    action: claim that forecast outputs equal actual realized returns or that backtest returns predict live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: ARCH forecasts represent conditional variance estimates based on historical patterns; presenting these as
      predictions of actual returns violates fundamental statistical principles and may mislead users into financial losses
    stage_ids:
    - data_input
  - id: finance-C-012
    when: When handling pandas Series input with ensure1d
    action: preserve the Series name attribute during conversion when series=True, or set name from provided name parameter
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Results will have unlabeled or incorrectly labeled output Series, making downstream data analysis and debugging
      difficult
    stage_ids:
    - data_input
  - id: finance-C-017
    when: When implementing custom volatility processes
    action: provide starting_values() method returning valid initial parameter values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid or poorly-chosen starting values cause the SLSQP optimizer to fail convergence or converge to local
      optima, producing suboptimal parameter estimates
    stage_ids:
    - model_specification
  - id: finance-C-018
    when: When implementing custom volatility processes
    action: provide bounds() method returning list of (lower, upper) tuples for each parameter
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing bounds causes the optimizer to use unbounded parameter search, potentially producing numerically
      unstable or invalid parameter values
    stage_ids:
    - model_specification
  - id: finance-C-021
    when: When using the arch library for volatility modeling
    action: claim real-time trading capability; arch is an estimation, simulation, and forecasting library only
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Claiming live trading capability when arch only provides estimation and simulation leads to operational misuse
      and potential financial losses from attempting to deploy estimation-only code in production trading
    stage_ids:
    - model_specification
  - id: finance-C-022
    when: When presenting ARCH model estimation results
    action: present backtest simulation results as equivalent to live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Simulated backtest returns systematically differ from live trading due to execution slippage, transaction
      costs, market impact, and liquidity constraints not captured in the estimation framework
    stage_ids:
    - model_specification
  - id: finance-C-023
    when: When estimating ARCH models on financial time series
    action: claim parameter estimates are the true population parameters
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: ARCH model parameters are estimated via maximum likelihood on finite samples, introducing estimation uncertainty.
      Standard errors and confidence intervals must be reported to avoid overstating precision
    stage_ids:
    - model_specification
  - id: finance-C-024
    when: When initializing ARCHModel with default components
    action: use ConstantVariance as the default volatility process since it has closed-form estimation
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Using non-ConstantVariance volatility without explicit specification causes the model to require iterative
      optimization, increasing computation time and potential convergence issues
    stage_ids:
    - model_specification
  - id: finance-C-025
    when: When initializing ARCHModel with default components
    action: use Normal distribution as the default since it has no shape parameters (closed-form fit available)
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Using heavy-tailed distributions (StudentsT, SkewStudent) without explicit selection may cause optimization
      to fail if starting values are poorly chosen
    stage_ids:
    - model_specification
  - id: finance-C-026
    when: When optimizing ARCH model parameters
    action: use SLSQP optimizer since it supports both bound constraints and linear inequality constraints
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Optimizers without support for linear inequality constraints (L-BFGS-B, Nelder-Mead) cannot enforce ARCH parameter
      constraints, producing mathematically invalid models
    stage_ids:
    - model_specification
  - id: finance-C-027
    when: When estimating ARCH models with HAR or other lag-based mean models
    action: set hold_back parameter to exclude pre-sample observations that would cause look-ahead bias
    severity: high
    kind: operational_lesson
    modality: must
    consequence: HAR models use historical average calculations that can include pre-sample data if hold_back is not set,
      causing look-ahead bias where information not yet available affects current estimates
    stage_ids:
    - model_specification
  - id: finance-C-028
    when: When using closed-form estimation path
    action: verify volatility has closed_form=True AND distribution has num_params=0 AND volatility is ConstantVariance
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Failing to meet all three conditions forces the model through iterative optimization instead of closed-form
      estimation, significantly increasing computation time
    stage_ids:
    - model_specification
  - id: finance-C-032
    when: When implementing volatility process starting values
    action: compute volatility starting values using residuals from the mean model fit
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using raw data instead of residuals for volatility starting values produces incorrect initial variance estimates,
      potentially causing divergence in optimization
    stage_ids:
    - model_specification
  - id: finance-C-033
    when: When implementing distribution starting values
    action: compute distribution starting values using standardized residuals from volatility
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using non-standardized residuals for distribution starting values produces incorrect shape parameter initialization,
      especially for heavy-tailed distributions
    stage_ids:
    - model_specification
  - id: finance-C-034
    when: When validating user-provided starting values
    action: check starting values satisfy both bounds and constraint inequalities before optimization
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Starting values outside bounds or violating constraints cause the optimizer to either fail immediately or
      produce invalid intermediate results
    stage_ids:
    - model_specification
  - id: finance-C-037
    when: When estimating parameter covariance matrix
    action: verify the parameter covariance matrix is positive definite
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-positive-definite covariance matrix produces invalid standard errors, t-statistics, and confidence intervals,
      corrupting statistical inference
    stage_ids:
    - parameter_estimation
  - id: finance-C-038
    when: When constructing residuals in parameter estimation
    action: produce mean-zero residuals by subtracting the conditional mean
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-zero mean residuals bias volatility estimation, as the ARCH variance equation assumes mean-zero shocks,
      leading to systematic risk mismeasurement
    stage_ids:
    - parameter_estimation
  - id: finance-C-039
    when: When implementing SLSQP optimization for ARCH models
    action: skip convergence status validation because the estimates 'look reasonable'
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Non-zero optimizer status indicates the solution may be suboptimal or infeasible, producing biased parameter
      estimates that corrupt volatility forecasts
    stage_ids:
    - parameter_estimation
  - id: finance-C-042
    when: When setting up backcasting for variance recursion
    action: use EWMA(0.94) with tau=75 for backcast computation
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Incorrect backcast values bias initial variance estimates, affecting convergence speed and potentially producing
      suboptimal parameter estimates
    stage_ids:
    - parameter_estimation
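  # Illustrative sketch for finance-C-042. The min(75, nobs) cap and the weight normalization are
  # assumptions about how the rule is applied; this file does not confirm them.
  #   tau      = min(75, nobs)
  #   w_i      = 0.94^i / sum_{j=0}^{tau-1} 0.94^j,  i = 0..tau-1
  #   backcast = sum_{i=0}^{tau-1} w_i * resid_i^2
  # The weights sum to 1, so the backcast is a weighted average of the first tau squared residuals.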
  - id: finance-C-043
    when: When executing likelihood computation in ARCH estimation
    action: use Numba JIT compilation with nopython=True for speed, with graceful pure Python fallback
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without JIT compilation, likelihood evaluation becomes prohibitively slow for large datasets, making estimation
      impractical
    stage_ids:
    - parameter_estimation
  - id: finance-C-044
    when: When the optimizer returns non-zero convergence status
    action: emit ConvergenceWarning to alert user about potential optimization difficulty
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Silent acceptance of non-converged optimization produces unreliable parameter estimates that may not represent
      the true optimum, misleading downstream risk calculations
    stage_ids:
    - parameter_estimation
  - id: finance-C-045
    when: When providing custom starting values for optimization
    action: validate starting values satisfy both bounds and inequality constraints before use
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Invalid starting values cause optimization to start from an infeasible point, potentially converging to invalid
      parameters or failing to converge
    stage_ids:
    - parameter_estimation
  - id: finance-C-046
    when: When estimating ARCH models with SLSQP optimizer
    action: tolerate a non-zero convergence status, which is common in ARCH estimation when parameters sit on constraint boundaries
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Treating a non-zero status as a fatal error prevents valid estimates from being returned when the optimizer reaches
      constraint boundaries (common in volatility models)
    stage_ids:
    - parameter_estimation
  - id: finance-C-047
    when: When presenting ARCH model estimation results
    action: claim that backtested volatility forecasts equal expected live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: ARCH volatility estimates are conditional on historical data; structural breaks, regime changes, and market
      conditions cause live performance to diverge from backtested results
    stage_ids:
    - parameter_estimation
  - id: finance-C-048
    when: When computing parameter covariance using numerical derivatives
    action: invert the numerical Hessian without a conditioning check when it is near-singular
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Near-singular Hessian indicates model identification issues; blind matrix inversion produces unreliable standard
      errors and invalid inference
    stage_ids:
    - parameter_estimation
  - id: finance-C-049
    when: When the optimizer status is non-zero but parameters look reasonable
    action: dismiss the convergence warning as 'false alarm' without investigation
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Dismissing convergence warnings based on superficial parameter inspection ignores constraint boundary conditions
      that invalidate stationarity guarantees
    stage_ids:
    - parameter_estimation
  - id: finance-C-050
    when: When constraints appear satisfied and optimization completes
    action: skip validating that every inequality constraint a.dot(params) - b >= 0 holds for the final parameters
    severity: medium
    kind: rationalization_guard
    modality: must_not
    consequence: Optimizer may return parameters at constraint boundaries that technically satisfy a.dot(params) - b >= 0
      but produce numerically unstable or invalid variance forecasts
    stage_ids:
    - parameter_estimation
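  # Illustrative check for finance-C-050, assuming a GARCH(1,1) parameter vector (omega, alpha, beta)
  # and hypothetical constraint rows; the actual a and b come from the volatility process in use.
  #   a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, -1, -1]],  b = [0, 0, 0, -1]
  #   params = (0.05, 0.10, 0.85)  ->  a.dot(params) - b = (0.05, 0.10, 0.85, 0.05), all >= 0
  #   params = (0.05, 0.20, 0.85)  ->  last entry = -1.05 + 1 = -0.05 < 0, so the last constraint is violated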
  - id: finance-C-053
    when: When implementing multi-step ARCH variance forecasting
    action: verify first-horizon multi-step variance equals one-step variance due to ARCH model structure
    severity: high
    kind: domain_rule
    modality: must
    consequence: Violation indicates incorrect implementation of ARCH recursion, causing forecast variance to diverge from
      true model dynamics at h=1
    stage_ids:
    - forecasting
  - id: finance-C-057
    when: When using bootstrap forecasting method
    action: require at least 10 initial observations and horizon/start ratio less than 20%
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Bootstrap with insufficient burn-in or excessive extrapolation ratio produces unreliable variance estimates
      with high sampling bias
    stage_ids:
    - forecasting
  - id: finance-C-058
    when: When setting simulation count for Monte Carlo forecasting
    action: use default of 1000 simulations to balance Monte Carlo error against computation time
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Insufficient simulations increase variance of forecast estimates; excessive simulations waste computation
      without meaningful accuracy gains
    stage_ids:
    - forecasting
  - id: finance-C-059
    when: When forecasting with FixedVariance volatility process
    action: rely on simulation method producing meaningful variance forecasts
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: FixedVariance process returns NaN for all forecast paths when using simulation method, as the variance is
      predetermined and cannot be simulated
    stage_ids:
    - forecasting
  - id: finance-C-060
    when: When forecasting with ARCHInMean models
    action: attempt to generate forecasts as this model variant does not support prediction
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: ARCHInMean models raise NotImplementedError because the ARCH-in-mean specification makes multi-step forecasting
      mathematically undefined
    stage_ids:
    - forecasting
  - id: finance-C-061
    when: When validating simulation output shape
    action: verify simulated_paths/variances have shape (horizons x simulations x reindex_dim)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Incorrect simulation shape causes downstream indexing errors or silent misalignment between forecast paths
      and their variance estimates
    stage_ids:
    - forecasting
  - id: finance-C-062
    when: When forecasting models with exogenous regressors
    action: limit horizon to 1 or accept NaN-filled columns for multi-step forecasts
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Multi-step exogenous variable forecasts require aligned out-of-sample values that are typically unavailable,
      producing NaN columns
    stage_ids:
    - forecasting
  - id: finance-C-065
    when: When validating forecasting capability before computation
    action: call _check_forecasting_method to verify method compatibility with model type and horizon
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Skipping method validation allows unsupported forecast types (e.g., EGARCH analytic multi-step) to reach
      computation stage
    stage_ids:
    - forecasting
  - id: finance-C-066
    when: When using bootstrap method for variance forecasting
    action: wrap standardized residual sampling through BootstrapRng to enable simulation-based path generation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Direct bootstrap sampling without BootstrapRng wrapper produces non-reproducible results and breaks the RNG
      interface contract for simulation paths
    stage_ids:
    - forecasting
  - id: finance-C-067
    when: When presenting simulation-based forecast results
    action: claim exact reproducibility without setting random_state or providing an identical random seed
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Simulation paths contain inherent Monte Carlo randomness; presenting them as deterministic produces misleading
      risk estimates
    stage_ids:
    - forecasting
  - id: finance-C-069
    when: When using simulation or bootstrap methods
    action: recognize that forecast variance estimates contain Monte Carlo sampling error
    severity: medium
    kind: claim_boundary
    modality: must
    consequence: With finite simulations (default 1000), variance estimates have standard error proportional to 1/sqrt(simulations).
      Claims of point precision are statistically invalid
    stage_ids:
    - forecasting
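  # Worked numbers for finance-C-069 (scale of the 1/sqrt(simulations) factor only):
  #   simulations = 1000  ->  1/sqrt(1000) is about 0.0316
  #   simulations = 4000  ->  1/sqrt(4000) is about 0.0158
  # Quadrupling the simulation count halves the Monte Carlo standard error.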
  - id: finance-C-078
    when: When implementing PhillipsPerron automatic lag selection
    action: use the formula 12 * (nobs/100)^(1/4) as the default when lags is None
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Non-standard lag selection produces inconsistent long-run variance estimates and invalid PP test statistics
      across implementations
    stage_ids:
    - unitroot_testing
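  # Worked example for finance-C-078 using the formula in its action field
  # (how the value is rounded to an integer lag is left to the implementation):
  #   nobs = 100   ->  12 * (100/100)^(1/4) = 12
  #   nobs = 250   ->  12 * (250/100)^(1/4) is about 15.09
  #   nobs = 1600  ->  12 * (1600/100)^(1/4) = 12 * 2 = 24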
  - id: finance-C-079
    when: When implementing PhillipsPerron test
    action: allow lag parameter to exceed available observations for covariance estimation
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Excessive lags relative to observations cause ill-conditioned long-run covariance matrices, producing unreliable
      PP test statistics
    stage_ids:
    - unitroot_testing
  - id: finance-C-082
    when: When running unit root tests with automatic lag selection
    action: warn users when max_lags is large relative to sample size due to performance impact
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Large lag search spaces with many observations cause slow computation without proportional statistical benefit
    stage_ids:
    - unitroot_testing
  - id: finance-C-088
    when: When implementing unit root test base class
    action: make _check_specification and _compute_statistic abstract methods requiring subclass implementation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing abstract method enforcement allows instantiation of incomplete test classes that lack core computation
      logic
    stage_ids:
    - unitroot_testing
  - id: finance-C-089
    when: When using unit root test results for trading decisions
    action: claim unit root test results as predictions of future price behavior
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Unit root tests are statistical hypothesis tests on historical data, not forecasts; presenting stationarity
      conclusions as trading signals misleads stakeholders
    stage_ids:
    - unitroot_testing
  - id: finance-C-090
    when: When presenting cointegration test results
    action: claim cointegration implies causal trading relationships
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Cointegration only indicates statistical equilibrium relationships; presenting it as evidence of profitable
      pairs trading without proper risk management overstates the capability
    stage_ids:
    - unitroot_testing
  - id: finance-C-091
    when: When using critical values for unit root tests
    action: claim asymptotic critical values are exact for finite samples
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: MacKinnon critical values are asymptotic approximations; presenting them as precise thresholds for small
      samples overstates test accuracy and may lead to incorrect conclusions
    stage_ids:
    - unitroot_testing
  - id: finance-C-092
    when: When running multiple unit root tests on the same series
    action: select the test with the most favorable p-value without a pre-specified justification
    severity: medium
    kind: claim_boundary
    modality: should_not
    consequence: Multiple testing without correction inflates Type I error rate; selecting favorable results misleads about
      statistical evidence for stationarity
    stage_ids:
    - unitroot_testing
  - id: finance-C-099
    when: When initializing bootstrap for confidence intervals
    action: use default method='basic' for confidence interval calculation
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Using non-basic methods without understanding tradeoffs may produce incorrect coverage; basic is simplest
      and matches default behavior expected by the framework
    stage_ids:
    - bootstrap_inference
  - id: finance-C-100
    when: When initializing bootstrap for confidence intervals
    action: use default sampling='nonparametric'
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Nonparametric is the safest default; parametric/semiparametric require specific assumptions that may not
      hold
    stage_ids:
    - bootstrap_inference
  - id: finance-C-101
    when: When initializing bootstrap for confidence intervals
    action: use at least 1000 bootstrap replications for stable estimates
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Fewer than 1000 replications produce high-variance confidence intervals with poor coverage, leading to unreliable
      statistical inference and potentially wrong conclusions
    stage_ids:
    - bootstrap_inference
  - id: finance-C-102
    when: When implementing MCS or SPA multiple comparison procedures
    action: default block_size to sqrt(T) when not provided
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Wrong block size invalidates time-series bootstrap standard errors; sqrt(T) is theoretically justified for
      block bootstraps
    stage_ids:
    - bootstrap_inference
  - id: finance-C-104
    when: When initializing MCS
    action: default MCS test size to 0.05 (5% significance level)
    severity: low
    kind: resource_boundary
    modality: should
    consequence: Non-standard significance levels may not be appropriate for model selection; 0.05 is conventional
    stage_ids:
    - bootstrap_inference
  - id: finance-C-105
    when: When implementing BCa confidence intervals
    action: compute acceleration parameter using jackknife estimation
    severity: high
    kind: operational_lesson
    modality: must
    consequence: BCa without proper jackknife acceleration produces biased confidence intervals that fail to achieve nominal
      coverage
    stage_ids:
    - bootstrap_inference
  - id: finance-C-108
    when: When initializing any bootstrap class for reproducibility
    action: pass seed parameter (int, Generator, or RandomState) to enable reproducible results
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without seed, each run produces different bootstrap replicates, preventing reproducible inference and making
      results impossible to verify
    stage_ids:
    - bootstrap_inference
  - id: finance-C-111
    when: When implementing bootstrap state management
    action: use reset() to restore initial state or reset with new seed
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without proper reset, bootstrap continues from current state causing reproducibility issues in sequential
      inference
    stage_ids:
    - bootstrap_inference
  - id: finance-C-112
    when: When using bootstrap confidence intervals
    action: claim bootstrap CI coverage is exact for finite samples
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Bootstrap confidence intervals are asymptotically valid; claiming exact finite-sample coverage is misleading
      and violates statistical theory
    stage_ids:
    - bootstrap_inference
  - id: finance-C-113
    when: When using MCS model confidence set
    action: claim MCS produces guaranteed model rankings
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: MCS produces a confidence set of models, not rankings; claiming a guaranteed ranking is statistically incorrect
    stage_ids:
    - bootstrap_inference
  - id: finance-C-114
    when: When using SPA test of Superior Predictive Ability
    action: claim SPA p-values have exact finite-sample distribution
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: SPA uses bootstrap p-values that are asymptotically calibrated; exact finite-sample distribution is unknown
    stage_ids:
    - bootstrap_inference
  - id: finance-C-115
    when: When implementing MCS with identical loss values
    action: handle standard deviation of 0 in loss differences with warning
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Identical losses produce zero variance, causing division by zero, a RuntimeWarning, and an invalid MCS
      computation
    stage_ids:
    - bootstrap_inference
  - id: finance-C-126
    when: When displaying model results summary
    action: organize parameters into separate tables for Mean Model, Volatility Model, and Distribution
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Flat parameter listing without model component separation obscures model structure and complicates interpretation
    stage_ids:
    - results_reporting
  - id: finance-C-127
    when: When outputting results using statsmodels Summary
    action: use statsmodels SimpleTable and Summary classes for consistent output formatting
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Non-standard output formatting will break compatibility with Jupyter notebooks and the econometrics ecosystem
    stage_ids:
    - results_reporting
  - id: finance-C-129
    when: When optimizer indicates failed convergence
    action: display convergence warning in summary output with optimizer message
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Silent convergence failure will produce unreliable parameter estimates that appear valid but are actually
      suboptimal
    stage_ids:
    - results_reporting
  - id: finance-C-131
    when: When presenting ARCH-LM test results to users
    action: claim that a significant ARCH-LM test indicates adequate model specification
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: A significant ARCH-LM p-value means remaining ARCH effects exist, indicating the model is misspecified;
      claiming otherwise misleads users
    stage_ids:
    - results_reporting
  - id: finance-C-132
    when: When presenting R-squared values from ARCH model estimation
    action: claim that high R-squared indicates good volatility forecasting ability
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: R-squared measures mean model fit, not volatility model adequacy; ARCH models are estimated for volatility
      forecasting, not point prediction
    stage_ids:
    - results_reporting
  - id: finance-C-133
    when: When visualizing forecast results with hedgehog plots
    action: align forecast spines with actual historical values at the forecast origin
    severity: high
    kind: domain_rule
    modality: must
    consequence: Misaligned hedgehog plot spines will mislead users about the timing and accuracy of forecasts relative to
      actual observations
    stage_ids:
    - results_reporting
  - id: finance-C-134
    when: When using matplotlib for visualization
    action: handle matplotlib version compatibility for date plotting methods
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Incompatibility with matplotlib version < 3.10 will cause date axis plotting failures in hedgehog and residual
      plots
    stage_ids:
    - results_reporting
  - id: finance-C-138
    when: When passing data between data_input and model_specification
    action: Preserve the hold_back parameter when excluding initial observations from estimation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect hold_back handling will cause the first hold_back observations to be incorrectly included or excluded
      from parameter estimation
    stage_ids:
    - data_input
    - model_specification
  - id: finance-C-142
    when: When computing variance bounds for use in loglikelihood
    action: Pass variance_bounds computed from residuals to every variance recursion call to prevent NaN in loglikelihood
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without variance bounds, extreme variance values will produce -inf loglikelihood, causing optimizer to fail
      or produce invalid parameters
    stage_ids:
    - model_specification
    - parameter_estimation
  - id: finance-C-143
    when: When validating user-provided starting values
    action: Check starting values satisfy both bounds AND linear constraints before passing to optimizer
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid starting values that violate constraints will cause SLSQP to fail immediately or produce undefined
      behavior in optimization
    stage_ids:
    - model_specification
    - parameter_estimation
  - id: finance-C-145
    when: When passing backcast and var_bounds to forecasting
    action: Use the same backcast and variance_bounds computed during estimation, not recomputed values
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Recomputing backcast/var_bounds may produce slightly different values, breaking alignment between in-sample
      fit and out-of-sample forecasts
    stage_ids:
    - parameter_estimation
    - forecasting
  - id: finance-C-146
    when: When computing multi-step variance forecasts
    action: Verify that horizon is >= 1 and use only a variance forecasting method supported by the volatility model
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using unsupported forecasting method (e.g., analytic for EGARCH multi-step) will raise ValueError or produce
      mathematically incorrect forecasts
    stage_ids:
    - parameter_estimation
    - forecasting
  - id: finance-C-147
    when: When passing ARCHModelResult to results reporting
    action: Populate residuals array with NaN in positions outside estimation window (first_obs:last_obs)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without NaN padding, users cannot distinguish observations excluded from estimation from actual zero residuals,
      causing misinterpretation of results
    stage_ids:
    - parameter_estimation
    - results_reporting
  - id: finance-C-148
    when: When reporting ARCHModelResult summary
    action: Report both fit_start and fit_stop indices to indicate which observations were used in estimation
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without explicit fit window reporting, users may incorrectly analyze residuals or apply forecasts to wrong
      time periods
    stage_ids:
    - parameter_estimation
    - results_reporting
  - id: finance-C-149
    when: When passing bootstrap results to results reporting
    action: Return confidence intervals with shape (2, num_params) where row 0 is lower bounds and row 1 is upper bounds
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect confidence interval shape will cause downstream reporting to display wrong bounds or raise dimension
      errors
    stage_ids:
    - bootstrap_inference
    - results_reporting
  - id: finance-C-150
    when: When computing Model Confidence Set (MCS) p-values
    action: Return p-values in DataFrame format with model indices as rows
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Wrong format will cause downstream model selection to fail or select wrong models
    stage_ids:
    - bootstrap_inference
    - results_reporting
  - id: finance-C-151
    when: When computing SPA (Reality Check) p-values
    action: Return three p-values (lower, consistent, upper) to account for the test's one-sided nature
    severity: high
    kind: domain_rule
    modality: must
    consequence: A single p-value ignores the SPA's multiple-testing correction, leading to incorrect model selection decisions
    stage_ids:
    - bootstrap_inference
    - results_reporting
  - id: finance-C-152
    when: When computing unit root test statistics
    action: Store stat, pvalue, and critical_values (dict with keys 1%, 5%, 10%) in the result object
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing critical values will prevent users from making decisions using critical-value-based thresholds instead
      of p-values
    stage_ids:
    - unitroot_testing
    - results_reporting
  - id: finance-C-153
    when: When returning cointegration test results
    action: Include cointegrating_vector in the result so users can implement the discovered relationship
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without the cointegrating vector, users cannot implement the discovered long-run equilibrium relationship
      between variables
    stage_ids:
    - unitroot_testing
    - results_reporting
  - id: finance-C-154
    when: When validating unit root test lags parameter
    action: Reject negative lag values as they are mathematically invalid
    severity: high
    kind: domain_rule
    modality: must
    consequence: Negative lags will produce undefined behavior in the test regression or raise cryptic errors downstream
    stage_ids:
    - unitroot_testing
    - results_reporting
  - id: finance-C-155
    when: When passing forecast results to reporting
    action: Return mean, variance, and residual_variance as DataFrames aligned by index
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Misaligned forecasts will cause incorrect visualization and summary statistics in results reporting
    stage_ids:
    - forecasting
    - results_reporting
  - id: finance-C-156
    when: When creating hedgehog forecast plot data
    action: Pad forecasts with NaN for dates before the earliest forecastable date to prevent look-ahead bias in visualization
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Without NaN padding, the hedgehog plot will display incorrect forecast trajectories that include in-sample
      information
    stage_ids:
    - forecasting
    - results_reporting
  - id: finance-C-157
    when: When forecasting with align='target'
    action: Align forecasts so that column h contains h-step ahead forecast from time t-h, matching evaluation methodology
    severity: high
    kind: domain_rule
    modality: must
    consequence: Misaligned target forecasts will show incorrect alignment between realizations and forecasts, invalidating
      forecast evaluation metrics
    stage_ids:
    - forecasting
    - results_reporting
  - id: finance-C-158
    when: When using bootstrap forecasting method
    action: Require at least 10 initial observations and horizon/start ratio < 0.2 for valid bootstrap
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Insufficient initial observations for bootstrap will produce unstable variance estimates and unreliable forecasts
    stage_ids:
    - forecasting
    - results_reporting
  - id: finance-C-164
    when: When using a GARCH volatility model with power not equal to 2.0
    action: Request analytic forecasting method for horizon > 1 — only 'simulation' or 'bootstrap' are valid forecasting methods
      for non-square-power GARCH models
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Using method='analytic' with power!=2.0 at horizon > 1 raises a ValueError and produces no forecast, forcing
      re-estimation or re-configuration
  - id: finance-C-165
    when: When using an EGARCH volatility model
    action: Request analytic forecasting method for horizon > 1 — EGARCH does not support analytic multi-step variance forecasts
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Using method='analytic' with EGARCH at horizon > 1 raises a ValueError, preventing multi-step forecast generation
  - id: finance-C-166
    when: When determining the estimation path in ARCHModel.fit()
    action: 'The closed-form estimation path is only taken when all three conditions hold simultaneously: volatility.closed_form=True,
      distribution.num_params=0, and isinstance(volatility, ConstantVariance); if any condition fails, use the general SLSQP
      optimization path'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Attempting to use the closed-form path without all conditions met causes incorrect parameter estimates or
      AttributeErrors, as the closed-form formulas are specific to the ConstantVariance + Normal combination
  - id: finance-C-167
    when: When calling the forecast() method on an ARCHModel or ARCHModelResult
    action: 'Use only one of the three explicitly supported ForecastingMethod values: ''analytic'', ''simulation'', or ''bootstrap'';
      any other string raises a ValueError'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using an unsupported forecasting method string raises a ValueError in the function call chain, preventing
      forecast generation
  - id: finance-C-168
    when: When configuring unit root tests (ADF, DFGLS, Phillips-Perron, KPSS, Zivot-Andrews, VarianceRatio)
    action: 'Use only a trend specification that the chosen test supports: ADF accepts ''n'' (no constant), ''c'' (constant only),
      ''ct'' (constant and time trend), or ''ctt'' (constant, time trend, and squared time trend), while other tests accept
      only a subset (for example, DFGLS and KPSS accept only ''c'' and ''ct'')'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using an unsupported trend specification causes a TypeError or produces statistically incorrect test results
      with wrong degrees of freedom
  - id: finance-C-169
    when: When configuring parameter covariance estimation in ARCHModel.fit()
    action: Use only 'robust' or 'classic' for the cov_type parameter — 'robust' uses the sandwich estimator with numerical
      derivatives, 'classic' uses the inverse Hessian
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using an unsupported cov_type value causes incorrect standard errors and invalid inference (wrong t-statistics,
      p-values, and confidence intervals)
  - id: finance-C-170
    when: When calling forecast() on an ARCHModel or ARCHModelResult
    action: Use only 'origin' or 'target' for the align parameter — 'origin' aligns forecasts by their information origin
      time, 'target' aligns by the forecast target time
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using an unsupported align value causes a TypeError and prevents forecast computation
  - id: finance-C-176
    when: When initializing any bootstrap class (IIDBootstrap, CircularBlockBootstrap, StationaryBootstrap, MovingBlockBootstrap)
    action: 'Pass the index parameter as one of the three supported types: an Int64Array1D, a tuple of Int64Array1D, or a
      tuple of (list of Int64Array1D, dict of Int64Array1D) — matching the BootstrapIndexT union type'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Passing an unsupported index type causes the bootstrap to produce invalid resampled indices, corrupting all
      bootstrap confidence intervals, p-values, and covariance estimates
  - id: finance-C-177
    when: When estimating any ARCH model and the variance of input residuals is outside [0.1, 10000.0)
    action: Allow automatic rescaling of the data or provide explicit rescale=True — the _check_scale function automatically
      rescales data outside this range by powers of 10 to avoid numerical issues in the optimizer
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Data with variance outside [0.1, 10000) causes the SLSQP optimizer to converge slowly or fail to find the
      optimal parameters, producing suboptimal or invalid estimates
  - id: finance-C-178
    when: When presenting or reporting results from this package to users
    action: Claim that the package supports real-time streaming data analysis — it is a batch statistical estimation library
      that operates on static historical time series
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users attempt to integrate the package into real-time trading pipelines expecting live data ingestion, leading
      to system failures when the package cannot process streaming data
  - id: finance-C-179
    when: When presenting or reporting this system's capabilities
    action: Claim support for high-frequency trading systems requiring sub-second latency — the package performs batch maximum-likelihood
      estimation unsuitable for latency-critical applications
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users deploy the package in HFT contexts where sub-second decision-making is required, causing severe financial
      losses due to estimation latency
  - id: finance-C-180
    when: When presenting or reporting this system's capabilities
    action: Claim support for multivariate volatility models; the package only implements univariate ARCH/GARCH variants,
      and users requiring multivariate volatility must use another tool, such as the R package 'rmgarch' called from Python
      through rpy2
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users attempt to model multivariate volatility correlations using the univariate package, producing incorrect
      risk estimates and wrong portfolio allocation decisions
  - id: finance-C-181
    when: When presenting or reporting this system's capabilities
    action: Claim support for structural break detection beyond unit root tests — the package's unit root tests cannot detect
      multiple structural breaks; users requiring this should use dedicated structural change packages
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users rely on unit root tests for structural break detection, missing multiple breaks that invalidate the
      entire time-series model specification
  - id: finance-C-182
    when: When presenting or reporting this system's capabilities
    action: Claim drag-and-drop GUI interfaces — the package is a Python API-only library with no graphical user interface;
      users requiring GUI access must build their own wrappers
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users expect a graphical interface and cannot use the package's Python API, wasting development resources
      on attempting to find a non-existent GUI
  - id: finance-C-183
    when: When presenting or reporting this system's capabilities
    action: Claim that this package supports live trading — it is a pure backtesting and statistical estimation library with
      no exchange connectivity, order execution, or portfolio management capabilities
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users connect the package directly to a brokerage expecting automated trade execution, causing unintended
      market orders to be placed with real capital
  - id: finance-C-184
    when: When presenting or reporting this system's backtested or estimated returns to users
    action: Claim that estimated model parameters or historical backtest results equal expected future performance; past
      ARCH model estimates do not guarantee future volatility or returns
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users make live capital allocation decisions based on inflated historical estimates, leading to severe underperformance
      when market regimes shift away from the estimation period
  - id: finance-C-185
    when: When presenting or reporting volatility forecasts to users
    action: Claim that model-based forecasts fully account for market microstructure costs — forecasts ignore market impact,
      bid-ask spread, financing costs, slippage, and execution delays
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users incorporate ARCH forecasts directly into live trading strategies without adjusting for execution costs,
      producing strategies that appear profitable in backtests but lose money in live trading after costs
  - id: finance-C-186
    when: When implementing GARCH volatility model configuration in arch package
    action: Use GARCH(1,1) with power=2.0 as the default configuration — verify default p=1, q=1 parameters are used unless
      explicit model selection is performed; for asymmetric volatility, use GJR-GARCH; for log-volatility use EGARCH
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-standard GARCH parameters (p>1, q>1) without sufficient data causes unreliable volatility estimates,
      leading to incorrect risk forecasts and poor hedging decisions in live trading
    derived_from_bd_id: BD-001
  - id: finance-C-187
    when: When using GARCH model with non-standard power parameter
    action: Use simulation-based forecasting when power != 2.0 — analytic forecast methods are only available for standard
      GARCH (power=2.0); verify forecast method is explicitly set to simulation if power differs from default
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using analytic forecasting with power != 2.0 produces incorrect forecast values since closed-form solutions
      do not exist for non-standard power specifications, causing systematic mispricing in option hedging and VaR calculations
    derived_from_bd_id: BD-060
  - id: finance-C-188
    when: When configuring backcast parameters for GARCH model initialization
    action: Verify backcast tau calculation matches min(75, nobs) formula — for samples smaller than 75, tau should equal
      nobs; for larger samples, tau should equal 75; changing decay factor from 0.94 affects backcast smoothness and initial
      variance estimates
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Incorrect backcast tau calculation produces biased initial variance estimates, affecting the accuracy of
      short-term volatility forecasts which are critical for intraday risk management and option pricing
    derived_from_bd_id: BD-066
  - id: finance-C-189
    when: When testing for cointegration between two financial time series
    action: Apply Engle-Granger two-step cointegration test using OLS regression followed by ADF test on residuals — this
      method applies to bivariate relationships only; use MacKinnon critical values for significance determination
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using univariate time series methods for cointegration testing incorrectly identifies or misses cointegrating
      relationships, causing pairs trading strategies to trade on spurious relationships or miss profitable opportunities
    derived_from_bd_id: BD-042
  - id: finance-C-190
    when: When calculating returns for downstream performance metrics
    action: Verify that 100 * pct_change() scaling matches the expected annualization multiplier (12x for percentage returns)
      used in Sharpe ratio and other risk-adjusted performance calculations
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using log returns without adjusting the annualization multiplier from 12x to sqrt(12) causes systematic mismeasurement
      of risk-adjusted performance, making strategies appear more or less attractive than they actually are
    derived_from_bd_id: BD-003
  - id: finance-C-191
    when: When computing Sharpe ratio or other risk-adjusted performance metrics
    action: 'Verify the annualization multiplier matches the return scaling convention: use 12x for percentage returns (100*pct_change)
      or sqrt(12) for log returns; document the dependency explicitly in strategy analysis code'
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Mismatch between return scaling convention and annualization assumption creates systematic mismeasurement
      of risk-adjusted performance, causing strategies to appear more or less attractive than their true performance
    derived_from_bd_id: BD-074
  - id: finance-C-192
    when: When implementing optimization or fitting methods for ARCH models
    action: Assume the framework provides comprehensive convergence diagnostics beyond scipy status codes — the current convergence_flag
      only returns status codes without iteration history, log-likelihood path, or parameter trajectory data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without trajectory data, local optima issues in complex ARCH models cannot be diagnosed post-hoc, leading
      to unreliable parameter estimates being used in production strategies
    derived_from_bd_id: BD-GAP-016
  - id: finance-C-193
    when: When fitting complex ARCH models requiring convergence quality assessment
    action: 'Implement or use a ConvergenceDiagnosis object that stores: iteration history, log-likelihood path, and parameter
      trajectory for post-hoc assessment of convergence quality to diagnose local optima issues'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Fitting complex ARCH models without convergence diagnostics prevents diagnosis of local optima issues, causing
      unreliable parameter estimates to be used in production strategies
    derived_from_bd_id: BD-GAP-016
  - id: finance-C-194
    when: When running simulation/bootstrap forecasting
    action: Verify that 1000 simulations (Monte Carlo SE ≈ 3.2%) provides sufficient precision for the intended use case;
      for extreme quantile estimation such as VaR at 99%, increase to 10000+ simulations to achieve stable tail estimates
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using 1000 simulations for extreme quantile estimation produces unstable VaR estimates with high variance,
      leading to either excessive capital reserves or underestimation of tail risk
    derived_from_bd_id: BD-005
  - id: finance-C-195
    when: When initializing variance in EWMA volatility models
    action: Verify that lambda=0.94 decay rate and tau=75 observation window (~3 months of daily data) align with your data
      frequency and volatility characteristics before using default backcast values
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default EWMA parameters may not match asset-specific volatility dynamics, causing systematic initialization
      bias that propagates through all forecasts
    derived_from_bd_id: BD-006
  - id: finance-C-196
    when: When selecting power specification for APARCH or TARCH volatility models
    action: Verify power parameter before selecting forecast method; power!=2 automatically switches from analytic to simulation-based
      forecasting with different computational cost and replication requirements
    severity: high
    kind: domain_rule
    modality: must
    consequence: Selecting non-quadratic power models without understanding the forecast method switch causes unexpected runtime
      increases and potentially insufficient simulation replications for stable tail estimates
    derived_from_bd_id: BD-077
  - id: finance-C-197
    when: When relying on Bollerslev-Wooldridge robust standard errors for inference
    action: Assume robust SE rescues inference under a misspecified conditional variance; robust SE corrects inference only
      when the conditional variance specification is approximately correct, and misspecification such as ignoring leverage
      effects makes robust SE unreliable even though the standard errors appear corrected
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Robust standard errors provide false confidence when the GARCH specification is misspecified, leading to
      invalid hypothesis tests and potentially wrong conclusions about coefficient significance
    derived_from_bd_id: BD-080
  - id: finance-C-198
    when: When selecting bootstrap block length for time series resampling
    action: Verify sqrt(T) rule-of-thumb against Politis-White optimal block calculation; for highly persistent series, larger
      blocks than sqrt(T) may be needed to account for long memory
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using sqrt(T) default block length for highly persistent series inflates bootstrap variance and produces
      unreliable confidence intervals, leading to incorrect statistical inference
    derived_from_bd_id: BD-079
  - id: finance-C-199
    when: When implementing or refactoring RiskMetrics2006 variance calculations
    action: Maintain lambda=0.94 as the fixed decay factor for EWMA recursion, as this is the RiskMetrics 2006 industry standard
      for balancing responsiveness and stability in variance estimation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing lambda from the RiskMetrics 2006 standard of 0.94 breaks comparability with industry benchmarks
      and produces variance estimates that do not reflect the intended balance between responsiveness and stability, potentially
      leading to misaligned risk management decisions
    derived_from_bd_id: BD-036
  - id: finance-C-200
    when: When implementing volatility model selection for time series with long-memory characteristics
    action: Use GARCH instead of FIGARCH for series exhibiting long memory; GARCH cannot capture hyperbolic decay in volatility
      autocorrelation, so FIGARCH(1,d,1) with fractional differencing parameter d in (0,1) is required
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Substituting GARCH for FIGARCH when modeling long-memory volatility causes the model to miss the characteristic
      hyperbolic decay pattern, leading to materially incorrect variance forecasts that distort risk estimates and hedging
      ratios
    derived_from_bd_id: BD-039
  - id: finance-C-201
    when: When implementing DFGLS unit root test for stationarity detection
    action: Apply GLS detrending (ERS 1996) before Dickey-Fuller regression in the DFGLS variant — standard OLS detrending
      must not be used as it provides materially lower test power
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using OLS detrending instead of GLS detrending in DFGLS reduces test power below the designed efficiency
      gains, causing higher rates of failing to reject the unit-root null for series that are actually stationary and leading
      to false conclusions about stationarity
    derived_from_bd_id: BD-041
  - id: finance-C-202
    when: When implementing cointegrating vector estimation for bivariate or multivariate relationships
    action: Add leads and lags of differenced regressors (Dynamic OLS) to address endogeneity in the cointegrating regression
      — static OLS without augmentation must not be used
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using static OLS without lead/lag augmentation leaves endogeneity bias in the cointegrating regression; the
      estimates remain super-consistent but carry second-order bias and non-standard limiting distributions, which invalidates
      inference on the identified long-run relationship
    derived_from_bd_id: BD-043
  - id: finance-C-203
    when: When implementing volatility calculations in backtesting or production code
    action: Assume the framework provides a built-in annualized_volatility() helper function with configurable compounding
      convention — no such standardized helper exists in the current framework
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without a standardized annualization helper, users apply inconsistent formulas, leading to incorrect risk
      estimates and strategy comparisons that diverge from live trading results
    derived_from_bd_id: BD-GAP-017
  - id: finance-C-204
    when: When implementing volatility calculations in backtesting or production code
    action: Implement an explicit annualized_volatility() helper that accepts configurable compounding convention (252 for
      daily trading days, 365 for calendar days, simple for no compounding) and documents that input volatility is in frequency-of-data
      units
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit annualization, users apply inconsistent formulas causing systematic risk mis-estimation
      that compounds over time in live trading
    derived_from_bd_id: BD-GAP-017
  - id: finance-C-205
    when: When implementing GARCH model evaluation workflows in backtesting
    action: Assume the framework provides a standardized backtest validation framework with automatic train/test splits and
      VaR/CVaR/realized PnL tracking — no such framework exists in the current implementation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without standardized backtest methodology, users implement ad-hoc validation that fails to detect GARCH forecast
      failures, leading to live trading losses from unvalidated volatility predictions
    derived_from_bd_id: BD-GAP-018
  - id: finance-C-206
    when: When implementing GARCH model evaluation workflows in backtesting
    action: Implement a backtest validation framework that includes automatic train/test split for time series, historical
      VaR/CVaR tracking against realized PnL, and diagnostic plots for volatility model evaluation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without standardized backtest validation, GARCH forecast failures go undetected until live trading, causing
      significant financial losses from incorrect volatility predictions
    derived_from_bd_id: BD-GAP-018
  - id: finance-C-207
    when: When implementing or extending GARCH model parameter extraction and constraint application
    action: Use hardcoded parameter indices without computed offsets — parameter ordering follows [mean_params, vol_params,
      dist_params] with dynamically computed offsets during fit()
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Direct index access assuming fixed parameter ordering breaks custom GARCH variants; constraint application
      fails silently causing invalid optimization results
    derived_from_bd_id: BD-GAP-015
  - id: finance-C-208
    when: When implementing or extending GARCH model parameter extraction and constraint application
    action: Always use dynamically computed offsets from the model to access parameter indices; validate that offsets remain
      within bounds before parameter extraction
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without using computed offsets, custom GARCH variants with non-standard parameter counts cause index out-of-bounds
      errors or silently corrupted parameter values
    derived_from_bd_id: BD-GAP-015
  - id: finance-C-209
    when: When configuring starting values for GARCH optimization
    action: Validate GARCH starting values against stationarity constraints (alpha >= 0, gamma >= 0, beta >= 0, alpha + gamma/2
      + beta < 1) before passing them to the optimizer
    severity: high
    kind: domain_rule
    modality: must
    consequence: The fixed grid search may produce invalid starting values for non-standard GARCH variants; using invalid
      starting values causes optimizer divergence or convergence to incorrect parameters
    derived_from_bd_id: BD-067
  - id: finance-C-211
    when: When using ARCH library for volatility calculations in production backtesting
    action: Assume the framework handles stale data detection and expiry — the ARCH library does not implement data freshness
      validation; stale data may be processed as current without warning
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without stale data detection, stale price data is processed as current, causing PnL calculations to be incorrect
      and potentially resulting in significant financial losses or reporting errors in production systems
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-212
    when: When implementing data ingestion in ARCH library production backtesting
    action: Implement data timestamp validation using validate_timestamp() helper — check data timestamps against current
      time and reject data older than the configured staleness threshold (e.g., 5 minutes for intraday data)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Stale data processed as current leads to incorrect PnL calculations and reporting errors in production backtesting
      systems, potentially causing significant financial losses
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-213
    when: When running backtests or production strategies with ARCH library
    action: Assume the framework automatically maintains model and data version binding — the framework does not implement
      snapshot binding; different runs may silently use different data versions without tracking
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without version snapshot binding, backtest results become non-reproducible because different executions may
      load different data versions without any tracking or warning, making strategy audits impossible
    derived_from_bd_id: BD-GAP-004
  - id: finance-C-214
    when: When running backtests or production strategies with ARCH library
    action: Capture and persist data hashes (e.g., hashlib.sha256 on source data) and model version identifiers for each backtest
      run, storing them alongside results in a version manifest file
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without storing version snapshots, backtest results cannot be reproduced or audited; different data versions
      may silently change strategy performance and invalidate historical comparisons
    derived_from_bd_id: BD-GAP-004
  - id: finance-C-215
    when: When implementing custom data provider integration or attempting to write external datasets to ARCH library
    action: Assume the framework supports versioned writes and snapshot semantics for data persistence — the framework only
      supports built-in datasets (sp500, cpu, realized volatility) without version control or atomic write guarantees
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without versioned writes, concurrent data updates can corrupt datasets, and snapshot semantics cannot guarantee
      data consistency across parallel operations or strategy executions
    derived_from_bd_id: BD-GAP-011
  - id: finance-C-216
    when: When implementing custom data provider integration with ARCH library
    action: Use external database transactions or file versioning systems (e.g., git LFS, versioned S3 buckets) for custom
      dataset writes — implement atomic write patterns using database transactions or write-to-temp-then-rename operations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without version control and atomic writes, data corruption can occur during concurrent updates; strategies
      may execute on inconsistent snapshots, leading to unpredictable backtest results
    derived_from_bd_id: BD-GAP-011
  - id: finance-C-217
    when: When processing DatetimeIndex inputs with timezone-naive indices in arch/utility/array.py:259-276
    action: Assume the framework handles timezone conversion automatically; timezone-naive indices are silently treated
      as GMT without warning, causing subtle off-by-one errors in quarterly and monthly data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Timezone-naive date indices are silently treated as GMT, leading to off-by-one errors that accumulate over
      time in quarterly and monthly data, corrupting statistical analysis and strategy signals
    derived_from_bd_id: BD-GAP-012
  - id: finance-C-218
    when: When processing DatetimeIndex inputs before passing to ARCH library functions
    action: UTC-normalize all DatetimeIndex inputs using tz_localize(tz='UTC') before processing, and apply a validate_timezone()
      helper to verify that all date indices carry explicit UTC timezone information
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit timezone normalization, timezone-naive indices silently default to GMT, causing subtle off-by-one
      errors that corrupt quarterly and monthly statistical analysis used for strategy decisions
    derived_from_bd_id: BD-GAP-012
  - id: finance-C-219
    when: When implementing or modifying KPSS stationarity test logic in arch/unitroot/unitroot.py
    action: Use automatic bandwidth selection for KPSS test via auto_bandwidth() function — the auto_bandwidth function minimizes
      asymptotic mean squared error of the variance estimator; do not replace with fixed bandwidth formulas (T^0.5, T^0.4)
      unless sample size is small, series is trending, or exhibits heavy-tailed distributions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Replacing auto-bandwidth with fixed bandwidth may cause KPSS test to reject stationarity incorrectly or accept
      non-stationary series as stationary, leading to incorrect trading strategy signals and financial losses
    derived_from_bd_id: BD-028
  - id: finance-C-220
    when: When processing monetary values or quantitative data in backtesting
    action: Assume the framework provides explicit currency/unit annotation for data fields — the framework does not implement
      currency/unit annotation; numeric values lack metadata about their denomination or measurement unit
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without explicit currency/unit annotation, mixed-currency portfolios or unit-mismatched data cause silent
      conversion errors that accumulate over time, leading to incorrect portfolio valuations and risk calculations in production
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-221
    when: When defining portfolio data structures or processing multi-currency positions
    action: Add explicit currency/unit metadata to all monetary fields (e.g., currency_code='CNY', unit='share', scale=1)
      in data schemas and validate unit consistency before calculations
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Explicit currency/unit annotation prevents silent unit mismatches that cause portfolio value miscalculations
      when strategies operate across multiple currencies or asset classes
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-222
    when: When estimating covariance matrices for portfolio optimization or risk calculation
    action: Assume the framework automatically fixes non-positive semi-definite (PSD) covariance matrices — the framework
      does not implement PSD correction; eigenvalue adjustment, Higham's method, or shrinkage is not applied automatically
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Non-PSD covariance matrices cause portfolio optimizers to fail or produce invalid allocations with negative
      variances, leading to incorrect risk estimates and potentially catastrophic trading decisions
    derived_from_bd_id: BD-GAP-008
  - id: finance-C-223
    when: When performing covariance matrix estimation in portfolio construction
    action: Apply PSD correction using eigenvalue clipping (set negative eigenvalues to 0, rescale trace) or Higham's method
      before passing covariance matrix to portfolio optimizer
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without PSD correction, portfolio optimization fails for ill-conditioned covariance matrices, causing strategies
      to produce unstable or infeasible allocations
    derived_from_bd_id: BD-GAP-008
  - id: finance-C-224
    when: When estimating covariance matrices for portfolios with N > T (high-dimensional case)
    action: Select Ledoit-Wolf shrinkage estimator or OAS (Oracle Approximating Shrinkage) with target=diagonal, and configure
      shrinkage intensity α in range [0.2, 0.4] based on cross-validation
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Shrinkage estimators provide finite-sample bias correction that improves portfolio out-of-sample performance
      by 5-15% in high-dimensional cases compared to raw sample covariance
    derived_from_bd_id: BD-GAP-009
  - id: finance-C-225
    when: When configuring VaR/CVaR risk metrics for regulatory or internal risk management
    action: Set confidence_level=0.99 for regulatory VaR (Basel requirements) or 0.95 for internal models, and configure lookback_window=252
      (1 year) for daily VaR or 60 for monthly; validate parameters against risk mandate before backtesting
    severity: high
    kind: domain_rule
    modality: must
    consequence: Explicit VaR/CVaR parameter configuration ensures regulatory compliance and accurate tail risk estimation
      aligned with the specific risk management mandate
    derived_from_bd_id: BD-GAP-010
  - id: finance-C-226
    when: When implementing volatility forecasting using realized volatility (RV) data
    action: Use HAR (Heterogeneous Autoregressive) model with predetermined lags of 1, 5, and 22 days representing daily,
      weekly, and monthly components respectively
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using alternative volatility models like standard GARCH without understanding HAR's multi-scale advantage
      produces systematically different volatility forecasts, leading to incorrect risk assessments and suboptimal hedging
      decisions in live trading
    derived_from_bd_id: BD-050
  - id: finance-C-227
    when: When implementing bootstrap confidence intervals or hypothesis tests for time series
    action: Use Stationary Bootstrap with geometric block length distribution where expected block length equals 1/p (p =
      success probability) to preserve stationarity of bootstrap samples
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using fixed block length bootstrap methods like Circular Block Bootstrap or Moving Block Bootstrap introduces
      non-stationarity in bootstrap samples, causing confidence intervals to be systematically miscalibrated and hypothesis
      tests to have incorrect rejection rates
    derived_from_bd_id: BD-054
  - id: finance-C-228
    when: When implementing t-stat based lag selection for statistical tests
    action: Apply t-stat threshold of |1.645| (10% two-sided significance) for lag elimination — remove lags where absolute
      t-stat < 1.645 and continue until all remaining lags meet the threshold
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing the t-stat threshold alters the lag selection aggressiveness; using a stricter threshold (e.g.,
      1.96 for 5%) retains fewer lags while using a looser threshold retains more lags, directly affecting model specification
      and test power
    derived_from_bd_id: BD-055
  - id: finance-C-229
    when: When implementing or configuring APARCH volatility models in ARCH library
    action: Maintain delta=1 when using APARCH for TARCH specification; the power parameter controls volatility asymmetry
      where negative shocks produce larger volatility increases (absolute return specification)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without delta=1, APARCH no longer nests TARCH specification, eliminating the asymmetric volatility mechanism
      that captures negative shock amplification, causing systematically biased volatility forecasts
    derived_from_bd_id: BD-056
  - id: finance-C-231
    when: When configuring ARCHModel instances for volatility estimation
    action: Explicitly specify volatility model (ConstantVariance, EGARCH, GARCH, etc.) and distribution (Normal, StudentT,
      etc.) in constructor parameters rather than relying on hardcoded defaults; verify default path matches intended estimation
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using hardcoded defaults without verification may result in ConstantVariance+Normal being silently used,
      causing incorrect volatility and distribution specifications that corrupt estimation results and invalidate downstream
      risk metrics
    derived_from_bd_id: BD-061
  - id: finance-C-232
    when: When refactoring bootstrap implementations in arch library
    action: Preserve the inheritance relationship between CircularBlockBootstrap and IIDBootstrap; verify block_length parameter
      override is maintained as the inheritance enables clone behavior and shared sampling interface
    severity: high
    kind: domain_rule
    modality: must
    consequence: Breaking the inheritance relationship between CircularBlockBootstrap and IIDBootstrap breaks the circular
      block bootstrap implementation, causing sampling failures and invalid statistical inference in bootstrap-based confidence
      intervals
    derived_from_bd_id: BD-065
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-124 / Sharpe Ratio Bootstrap Statistical Inference
    version: v5.3
    intent_keywords:
    - bootstrap
    - sharpe ratio
    - statistical inference
    - confidence intervals
    - stationary bootstrap
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 9
        ucs:
        - uc_id: UC-101
          name: Sharpe Ratio Bootstrap Statistical Inference
          short_description: Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using
            bootstrap methods to quantify uncertainty in risk-ad
          sample_triggers:
          - bootstrap
          - sharpe ratio
          - statistical inference
        - uc_id: UC-102
          name: Multiple Model Comparison with SPA Test
          short_description: Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA)
            test to determine if any models significantly outperfor
          sample_triggers:
          - model comparison
          - SPA test
          - multiple models
        - uc_id: UC-103
          name: Oil Price Cointegration Analysis
          short_description: Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting
            spread opportunities using Engle-Granger and P
          sample_triggers:
          - cointegration
          - unit root
          - ADF test
        - uc_id: UC-104
          name: Credit Spread Stationarity Testing
          short_description: Tests for stationarity in credit spreads (BAA-AAA) using Augmented Dickey-Fuller tests to determine
            if mean-reversion trading strategies are applicabl
          sample_triggers:
          - unit root
          - ADF test
          - stationarity
        - uc_id: UC-105
          name: ARX Forecasting with Exogenous Variables
          short_description: Forecasts univariate time series using Autoregressive models with exogenous variables (ARX) to
            capture the impact of external factors on the target va
          sample_triggers:
          - ARX
          - exogenous variables
          - forecasting
        - uc_id: UC-106
          name: HARX Volatility Modeling with Fixed Variance
          short_description: Demonstrates how to specify a HARX mean model with fixed/external variance inputs and iteratively
            fit volatility models using the estimated conditiona
          sample_triggers:
          - fixed variance
          - HARX
          - volatility modeling
        - uc_id: UC-107
          name: S&P 500 GARCH Volatility Forecasting
          short_description: Forecasts future volatility of S&P 500 returns using GARCH models, including multi-step ahead
            forecasts and rolling window out-of-sample predictions
          sample_triggers:
          - GARCH
          - volatility forecasting
          - S&P 500
        - uc_id: UC-108
          name: S&P 500 GARCH Volatility Model Comparison
          short_description: 'Fits and compares different GARCH volatility model specifications (symmetric, asymmetric, power)
            with various error distributions to characterize S&P '
          sample_triggers:
          - GARCH
          - volatility modeling
          - S&P 500
        - uc_id: UC-109
          name: NASDAQ Volatility Scenario Generation
          short_description: Generates multiple volatility scenarios for NASDAQ returns using simulation-based forecasting
            methods, useful for risk management and option pricing a
          sample_triggers:
          - volatility scenarios
          - simulation
          - NASDAQ
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try sharpe ratio bootstrap statistical inference
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try multiple model comparison with spa test
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try oil price cointegration analysis
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 9 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Oil Price Cointegration Analysis
    - Multiple Model Comparison with SPA Test
    - Sharpe Ratio Bootstrap Statistical Inference
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

@clawhub-tangweigang-jpg-8679fec286

Aml Data Generator

Skill

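Generates synthetic transaction data in AMLSim format, converting transaction logs into simulated datasets for testing anti-money-laundering detection systems. Supports splitting accounts by bank ID, merging multi-source outputs, and generating transaction network graphs.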

---
name: aml-data-generator
description: |-
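  Generates synthetic transaction data in AMLSim format, converting transaction logs into simulated datasets for testing anti-money-laundering detection systems. Supports splitting accounts by bank ID, merging multi-source outputs, and generating transaction network graphs.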
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-060"
  compiled_at: "2026-04-22T13:00:18.242568+00:00"
  capability_markets: "global"
  capability_activities: "regtech-compliance"
  sop_version: "crystal-compilation-v6.1"
---
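# AML Data Generation (aml-data-generator)

> Generates synthetic transaction data in AMLSim format, converting transaction logs into simulated datasets for testing anti-money-laundering detection systems. Supports splitting accounts by bank ID, merging multi-source outputs, and generating transaction network graphs.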

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (13 total)

### Convert Logs to AML Simulation Data (`UC-101`)
Convert transaction log files into synthetic AML simulation data for testing anti-money laundering detection systems
**Triggers**: convert logs, synthetic data, AML simulation

### Split Accounts by Bank ID (`UC-102`)
Partition account CSV files by bank identifier for bank-specific analysis and processing
**Triggers**: split accounts, bank ID, partition data

### Combine AML Simulation Outputs (`UC-103`)
Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
**Triggers**: combine outputs, merge data, AMLSim aggregation

For all **13** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
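
Several of these locks can be checked mechanically before any order is placed. The following is a minimal sketch, not ZVT's own validation code: the `Signal` dataclass, its field layout, the regular expression, and both helper functions are hypothetical stand-ins that illustrate SL-01, SL-03, and SL-05 only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for SL-03 (entity_type_exchange_code), e.g. stock_sh_600000.
ENTITY_ID_RE = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9.]+$")


@dataclass
class Signal:
    """Illustrative stand-in for a trading signal (not the ZVT class)."""
    entity_id: str
    action: str  # "buy" or "sell"
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None


def check_signal(sig: Signal) -> None:
    """Raise ValueError if the signal breaks SL-03 or SL-05."""
    if not ENTITY_ID_RE.match(sig.entity_id):
        raise ValueError(f"SL-03: malformed entity id {sig.entity_id!r}")
    sizing = [sig.position_pct, sig.order_money, sig.order_amount]
    if sum(v is not None for v in sizing) != 1:
        raise ValueError("SL-05: set exactly one of position_pct, order_money, order_amount")


def order_for_cycle(signals: list[Signal]) -> list[Signal]:
    """SL-01: return the cycle's signals with every sell ahead of every buy."""
    for sig in signals:
        check_signal(sig)
    # sorted() is stable, so the original order within sells and within buys is kept.
    return sorted(signals, key=lambda s: 0 if s.action == "sell" else 1)
```

Running `order_for_cycle` on each cycle's signals before execution means a malformed signal stops the cycle early, which matches the `halt` behavior listed in the table.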

## Top Anti-Patterns (15 total)

- **`AP-REGTECH-001`**: Missing attribute initialization on data structures
- **`AP-REGTECH-002`**: Self-loops in transaction graphs violate domain rules
- **`AP-REGTECH-003`**: Unvalidated floating-point inputs cause runtime crashes

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-060. Evidence verify ratio = 15.9% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source of truth) | Must read when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-060` blueprint at 2026-04-22T13:00:18.242568+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Combine AML Simulation Outputs
  - Split Accounts by Bank ID
  - Convert Logs to AML Simulation Data
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-060--AMLSim (1)

### `AP-REGTECH-011` — Mismatched configuration parameters across coupled components <sub>(medium)</sub>

When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence: AML typology patterns placed on wrong accounts, invalidating simulation results.
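
One way to avoid this mismatch is to have both components read the threshold from a single shared object instead of holding separate copies. The sketch below is illustrative only: `GraphConfig` and the two simplified classes are hypothetical, the `>=` hub criterion is an assumption, and nothing here reproduces AMLSim's actual `TransactionGenerator` or `Nominator` constructors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphConfig:
    """Single source of truth for parameters shared across components (hypothetical)."""
    degree_threshold: int


class SketchTransactionGenerator:
    def __init__(self, config: GraphConfig) -> None:
        self.config = config

    def hub_accounts(self, degrees: dict[str, int]) -> set[str]:
        # Assumed criterion: an account is a hub when its degree reaches the threshold.
        return {a for a, d in degrees.items() if d >= self.config.degree_threshold}


class SketchNominator:
    def __init__(self, config: GraphConfig) -> None:
        self.config = config

    def candidates(self, degrees: dict[str, int]) -> set[str]:
        # Same criterion, read from the same frozen config object.
        return {a for a, d in degrees.items() if d >= self.config.degree_threshold}
```

Because the dataclass is frozen, neither component can change the threshold after construction, so the two selections cannot drift apart at runtime.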

## finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest (1)

### `AP-REGTECH-002` — Self-loops in transaction graphs violate domain rules <sub>(high)</sub>

When generating directed transaction graphs or AML typologies, allowing source == destination edges creates self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology validation.
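
A guard for this rule can be as small as a filter over the edge list. The following sketch assumes edges are plain `(source, destination)` tuples of account identifiers; both function names are hypothetical and not taken from either project.

```python
from typing import Iterable


def drop_self_loops(edges: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return only the edges whose source differs from their destination."""
    return [(src, dst) for src, dst in edges if src != dst]


def assert_no_self_loops(edges: Iterable[tuple[str, str]]) -> None:
    """Raise ValueError on the first self-loop instead of silently dropping it."""
    for src, dst in edges:
        if src == dst:
            raise ValueError(f"self-loop on account {src!r}")
```

Dropping is convenient when sampling random edges, while the asserting variant suits a final validation pass where a self-loop signals a generator bug.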

## finance-bp-060--AMLSim, finance-bp-071--opensanctions (1)

### `AP-REGTECH-001` — Missing attribute initialization on data structures <sub>(high)</sub>

When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes (e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.
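
The usual remedy is to create every required attribute at construction time, so that no later read can hit a missing key. The sketch below assumes accounts are plain dictionaries keyed by an `id` field; that record layout and both function names are hypothetical, and only the `normal_models` attribute name comes from the description above.

```python
def new_account(account_id: str) -> dict:
    """Create an account record with every list attribute present from the start."""
    return {"id": account_id, "normal_models": []}


def load_accounts(rows: list[dict]) -> dict[str, dict]:
    """Build an id -> account mapping from raw rows, merging repeated ids."""
    accounts: dict[str, dict] = {}
    for row in rows:
        acct = accounts.setdefault(row["id"], new_account(row["id"]))
        # setdefault also protects records that were created elsewhere without the key.
        acct.setdefault("normal_models", []).extend(row.get("normal_models", []))
    return accounts
```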

## finance-bp-062--ifrs9 (3)

### `AP-REGTECH-005` — Incorrect amortization windows violate IFRS 9 compliance <sub>(high)</sub>

Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence: regulatory non-compliance and materially incorrect loan loss provisions.

### `AP-REGTECH-010` — Incorrect cumulative PD ordering corrupts lifetime ECL term structure <sub>(high)</sub>

Using cumprod(1-conPD) without shift(1) and fillna(1) produces corrupted first-period survival probability. This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements. Consequence: systematically incorrect provisions across all remaining tenor periods.
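
The effect of `shift(1)` and `fillna(1)` can be shown without pandas. The sketch below assumes `conPD` is the conditional (per-period) PD and computes the probability of surviving to the start of each period, which is 1.0 for the first period. The function names are hypothetical.

```python
def survival_at_start(conditional_pds: list[float]) -> list[float]:
    """Probability of surviving up to the start of each period.

    Equivalent to a cumulative product of (1 - conPD) shifted by one
    period, with the first entry filled with 1.0.
    """
    survival = 1.0
    result = []
    for pd_t in conditional_pds:
        if not 0.0 <= pd_t <= 1.0:
            raise ValueError(f"conditional PD out of range: {pd_t}")
        result.append(survival)      # survival before this period's default risk
        survival *= 1.0 - pd_t       # update for the next period
    return result


def marginal_pds(conditional_pds: list[float]) -> list[float]:
    """Marginal PD per period: survive to the period start, then default in it."""
    starts = survival_at_start(conditional_pds)
    return [s * pd_t for s, pd_t in zip(starts, conditional_pds)]
```

Without the shift, the first period would be weighted by `1 - conPD[0]` instead of 1.0, and every later period would inherit that error.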

### `AP-REGTECH-015` — Missing EAD component in ECL formula produces incomplete provisions <sub>(high)</sub>

IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning and reporting processes.
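
A minimal sketch of the formula follows, with an explicit failure when EAD is not supplied rather than a silent default. The function name and the range checks are assumptions for illustration.

```python
from typing import Optional


def expected_credit_loss(pd_: float, lgd: float, ead: Optional[float]) -> float:
    """ECL = PD x LGD x EAD for a single exposure and period."""
    if ead is None:
        # Refuse to produce a number when the EAD component is missing.
        raise ValueError("EAD is required; ECL cannot be computed without it")
    if not 0.0 <= pd_ <= 1.0:
        raise ValueError(f"PD out of range: {pd_}")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError(f"LGD out of range: {lgd}")
    if ead < 0.0:
        raise ValueError(f"EAD must be non-negative: {ead}")
    return pd_ * lgd * ead
```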

## finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest (2)

### `AP-REGTECH-003` — Unvalidated floating-point inputs cause runtime crashes <sub>(high)</sub>

When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN values. Consequence: entire model crashes before simulation or corrupted downstream calculations.

### `AP-REGTECH-004` — Division by zero in financial calculations produces inf/NaN <sub>(high)</sub>

When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators (total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations, corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.
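
A guard of the following shape stops the zero-denominator case at the source. This is a sketch under the assumption that DDP is a simple count ratio; `downgrade_ratio` is a hypothetical name.

```python
def downgrade_ratio(downgrades: int, total: int) -> float:
    """Return downgrades / total, rejecting empty or inconsistent samples."""
    if total <= 0:
        # An empty sample has no meaningful ratio; fail instead of returning inf/NaN.
        raise ValueError("total observations must be positive")
    if not 0 <= downgrades <= total:
        raise ValueError("downgrades must lie between 0 and total")
    return downgrades / total
```

Raising is one choice; returning a sentinel such as `None` is another, as long as callers cannot mistake it for a valid ratio.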

## finance-bp-067--firesale_stresstest (4)

### `AP-REGTECH-006` — Wrong leverage formula in threshold-based decisions <sub>(high)</sub>

Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values. This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue operating with negative equity, or healthy banks unnecessarily deleverage.

### `AP-REGTECH-007` — Confusing deleveraging buffer threshold with insolvency threshold <sub>(high)</sub>

Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence: excessive bank failures amplify systemic contagion.
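
The two entries above can be combined into one small classifier: leverage is computed as equity over assets, insolvency is checked first, and the buffer threshold is kept separate. The 3% and 4% values come from the text; the enum and function names are hypothetical.

```python
from enum import Enum

INSOLVENCY_THRESHOLD = 0.03  # below this, the bank must default
BUFFER_THRESHOLD = 0.04      # below this (but solvent), the bank deleverages


class BankState(Enum):
    INSOLVENT = "insolvent"
    DELEVERAGE = "deleverage"
    HEALTHY = "healthy"


def leverage(assets: float, liabilities: float) -> float:
    """Equity-to-assets ratio (E/A), not equity-to-liabilities."""
    if assets <= 0.0:
        raise ValueError("assets must be positive")
    equity = assets - liabilities
    return equity / assets


def classify(assets: float, liabilities: float) -> BankState:
    lev = leverage(assets, liabilities)
    if lev < INSOLVENCY_THRESHOLD:   # insolvency is checked before anything else
        return BankState.INSOLVENT
    if lev < BUFFER_THRESHOLD:       # buffer zone: corrective action, no default
        return BankState.DELEVERAGE
    return BankState.HEALTHY
```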

### `AP-REGTECH-013` — Order-dependent execution creates first-mover advantage bias <sub>(medium)</sub>

Without separating step() and act() phases, first-acting banks sell assets before others decide, creating systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable systemic risk estimates that understate contagion for late-acting banks.

### `AP-REGTECH-014` — Immediate asset sales cause double-selling and undefined state <sub>(medium)</sub>

Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect cash transfers in market clearing.
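
The two ordering issues above (AP-REGTECH-013 and AP-REGTECH-014) are commonly addressed with a two-phase loop: every bank decides in `step()`, then every bank queues its order in `act()`, and the market clears once. The classes below are a hypothetical sketch, not the fire-sale model's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Market:
    pending: list[tuple[str, float]] = field(default_factory=list)

    def queue_sale(self, bank: str, quantity: float) -> None:
        """Buffer a sale order instead of executing it immediately."""
        self.pending.append((bank, quantity))

    def clear(self) -> float:
        """Settle all queued orders at once and return the total quantity sold."""
        total = sum(quantity for _, quantity in self.pending)
        self.pending.clear()
        return total


@dataclass
class Bank:
    name: str
    holdings: float
    planned_sale: float = 0.0

    def step(self, fraction: float) -> None:
        """Decision phase: read state only, change nothing visible to others."""
        self.planned_sale = self.holdings * fraction

    def act(self, market: Market) -> None:
        """Action phase: submit the decision made in step() to the buffer."""
        if self.planned_sale > 0.0:
            market.queue_sale(self.name, self.planned_sale)
            self.holdings -= self.planned_sale
            self.planned_sale = 0.0


def run_round(banks: list[Bank], market: Market, fraction: float) -> float:
    for bank in banks:   # every bank decides on the same pre-round state
        bank.step(fraction)
    for bank in banks:   # only then are orders submitted
        bank.act(market)
    return market.clear()
```

Because decisions are taken before any order is submitted, the result does not depend on the order in which banks are iterated.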

## finance-bp-071--opensanctions (3)

### `AP-REGTECH-008` — Cache keys omit request body for state-changing methods <sub>(high)</sub>

Using only URL for cache fingerprints on POST/PATCH requests means different request bodies return identical cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence: sanctions matches missed or false positives from stale entity data.
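
A fingerprint that covers the method, URL, credentials, and body could look like the sketch below. The function name and parameter set are assumptions; real pipelines may need to include further headers.

```python
import hashlib
from typing import Optional


def request_fingerprint(
    method: str,
    url: str,
    body: Optional[bytes] = None,
    auth: Optional[str] = None,
) -> str:
    """Hash every request dimension that can change the response."""
    digest = hashlib.sha256()
    parts = (method.upper().encode(), url.encode(), (auth or "").encode(), body or b"")
    for part in parts:
        # Length-prefix each part so that field boundaries stay unambiguous.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()
```

With this key, two POST requests to the same URL with different bodies map to different cache entries.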

### `AP-REGTECH-009` — ID collision in entity construction creates false sanctions matches <sub>(high)</sub>

When constructing entity IDs from source identifiers, insufficient identifying attributes cause different real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned entity's ID matches an innocent entity, causing false positive compliance alerts.
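
As an illustration, an ID helper can hash several discriminating attributes together instead of a single identifier. This sketch is not the opensanctions API; `make_entity_id` and its arguments are hypothetical.

```python
import hashlib


def make_entity_id(dataset: str, *parts: str) -> str:
    """Build a deterministic ID from a dataset prefix and identifying attributes."""
    if not any(part.strip() for part in parts):
        raise ValueError("at least one non-empty identifying attribute is required")
    # Keep positions and join with a separator that is unlikely to occur in values,
    # so ("ab", "c") and ("a", "bc") do not collide.
    material = "\x1f".join(part.strip() for part in parts)
    digest = hashlib.sha1(material.encode("utf-8")).hexdigest()
    return f"{dataset}-{digest}"
```

Passing the identifier type along with the document number keeps, for example, a passport number and a national ID number with the same digits apart.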

### `AP-REGTECH-012` — Reverse property assignment corrupts entity construction <sub>(medium)</sub>

Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting to add values to stub properties instead of forward properties causes ValueError, aborting entity construction. Consequence: entities lost from output, incomplete compliance datasets.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-060--AMLSim
**Scan date**: 2026-04-22
**Stats**: 5 files, 20 classes, 0 functions, 5 stages

## Modules (5)

- [graph_construction](components/graph_construction.md): 5 classes
- [alert_pattern_generation](components/alert_pattern_generation.md): 3 classes
- [log_conversion](components/log_conversion.md): 5 classes
- [alert_validation](components/alert_validation.md): 5 classes
- [data_combination](components/data_combination.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 114
  fatal_constraints_count: 54
  non_fatal_constraints_count: 129
  use_cases_count: 13
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
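
The exactly-one rule can be checked at construction time. The class below is a hypothetical stand-in that reuses the three field names from the lock; it is not ZVT's `TradingSignal`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignalSizing:
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        names = ("position_pct", "order_money", "order_amount")
        given = [name for name in names if getattr(self, name) is not None]
        if len(given) != 1:
            raise ValueError(
                f"exactly one of {names} must be set, got {given or 'none'}"
            )
```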

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run `python3 -m pip install zvt`, then run `python3 -m zvt.init_dirs` to initialize the data directories.
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run the recorder first: `python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000` (replace with your target entity IDs).
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run `python3 -m zvt.init_dirs`.
- **`PC-04`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions with `chmod u+w ~/.zvt`, or set the `ZVT_HOME` environment variable to a writable location.

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **13**

## `KUC-101`
**Source**: `scripts/convert_logs.py`

Convert transaction log files into synthetic AML simulation data for testing anti-money laundering detection systems

## `KUC-102`
**Source**: `scripts/split_accounts_bank.py`

Partition account CSV files by bank identifier for bank-specific analysis and processing

## `KUC-103`
**Source**: `scripts/combine_data.py`

Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis

## `KUC-104`
**Source**: `scripts/transaction_graph_generator.py`

Generate the base transaction network graph used as input for AML simulation, defining account relationships and transaction patterns

## `KUC-105`
**Source**: `scripts/generate_scalefree.py`

Generate scale-free network graphs using Kronecker graph algorithm for research on network topology and distribution analysis

## `KUC-106`
**Source**: `scripts/visualize/plot_alert_pattern_subgraphs.py`

Visualize alert pattern subgraphs showing which accounts and transactions are involved in each generated alert for debugging and validation

## `KUC-107`
**Source**: `scripts/visualize/plot_distributions.py`

Generate statistical distribution plots (degree, amount, frequency) from transaction graphs for analysis and reporting

## `KUC-108`
**Source**: `scripts/amlsim/random_amount.py`

Generate random transaction amounts within configurable min/max bounds for transaction simulation

## `KUC-109`
**Source**: `scripts/amlsim/nominator.py`

Select appropriate accounts for different transaction types (fan-in, fan-out, single, mutual, periodical) based on network degree thresholds

## `KUC-110`
**Source**: `scripts/amlsim/rounded_amount.py`

Generate rounded transaction amounts (e.g., 100, 500, 1000) to simulate realistic human transaction patterns

## `KUC-111`
**Source**: `scripts/amlsim/normal_model.py`

Define and manage normal (non-suspicious) account behavior models including main accounts and member accounts for transaction simulation

## `KUC-112`
**Source**: `scripts/validation/network_analytics.py`

Load AMLSim outputs and analyze transaction network characteristics including degree distribution, connected components, and graph properties

## `KUC-113`
**Source**: `scripts/validation/validate_alerts.py`

Validate generated alerts against expected alert parameters to ensure AML simulation produces correct alert patterns and amounts

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-REGTECH-001` — Input bounds validation before statistical computation
**From**: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts > 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
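
A bounds check in front of the inverse CDF might look like this. The sketch uses the standard library's `statistics.NormalDist().inv_cdf` as a stand-in for `norm.ppf`; the function name and the `eps` margin are assumptions.

```python
from statistics import NormalDist


def default_threshold(ddp: float, *, eps: float = 1e-12) -> float:
    """Return the standard-normal quantile of ddp after checking 0 < ddp < 1."""
    # The chained comparison is also False for NaN, so NaN is rejected here.
    if not (eps < ddp < 1.0 - eps):
        raise ValueError(f"DDP must lie strictly inside (0, 1), got {ddp!r}")
    return NormalDist().inv_cdf(ddp)
```

Rejecting the input early gives an error that names the offending value, instead of an infinity or NaN surfacing several steps downstream.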

## `CW-REGTECH-002` — Graph/topology invariant verification before construction
**From**: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees) = sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before expensive graph construction operations. Apply to any bipartite or directed graph generation.
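
Two of these invariants are cheap to check up front, as the sketch below shows. Both function names are hypothetical, and the divisibility check is omitted because its exact form depends on the generator.

```python
import math
from typing import Sequence


def check_degree_sequence(in_degrees: Sequence[int], out_degrees: Sequence[int]) -> None:
    """Verify that a directed degree sequence can describe a consistent graph."""
    if len(in_degrees) != len(out_degrees):
        raise ValueError("in-degree and out-degree sequences differ in length")
    if any(d < 0 for d in in_degrees) or any(d < 0 for d in out_degrees):
        raise ValueError("degrees must be non-negative")
    if sum(in_degrees) != sum(out_degrees):
        # Every edge contributes exactly one in-degree and one out-degree.
        raise ValueError("sum(in-degrees) must equal sum(out-degrees)")


def check_row_stochastic(matrix: Sequence[Sequence[float]], tol: float = 1e-9) -> None:
    """Verify that every row of a transition matrix sums to 1.0 within tol."""
    for index, row in enumerate(matrix):
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {index} sums to {sum(row)}, expected 1.0")
```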

## `CW-REGTECH-003` — Regulatory amortization window discipline
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance

IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations), full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window logic explicitly rather than reusing a single loop variable across stages.

## `CW-REGTECH-004` — Fingerprint composition must include all request dimensions
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance

Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication headers, and request body for state-changing methods. POST requests with different bodies that return identical cached content are a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.

## `CW-REGTECH-005` — Floating-point zero-equivalence with explicit epsilon tolerance
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This prevents division-by-zero crashes and incorrect cash transfers.
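
The difference between an exact comparison and an epsilon comparison shows up with ordinary decimal arithmetic. In this sketch `EPS` takes the value from the text, while `is_zero` and `sold_fraction` are hypothetical helpers.

```python
EPS = 1e-9


def is_zero(value: float, eps: float = EPS) -> bool:
    """Treat values within eps of zero as zero."""
    return abs(value) < eps


def sold_fraction(quantity_sold: float, total_quantity: float) -> float:
    """Fraction of an asset sold, or 0.0 when nothing is outstanding."""
    if is_zero(total_quantity):
        # Guarding here avoids a division by a (near-)zero denominator.
        return 0.0
    return quantity_sold / total_quantity


residual = 0.1 + 0.2 - 0.3   # mathematically zero, but not in binary floating point
print(residual == 0.0)        # False
print(is_zero(residual))      # True
```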

## `CW-REGTECH-006` — Stage classification threshold ordering enforcement
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance

IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering and document bucket-to-stage mapping explicitly.
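
The ordering can be made explicit by testing the highest threshold first. The bucket boundaries below come from the text; treating buckets below 2 as Stage 1 is an assumption, as is the function name.

```python
def ifrs9_stage(bucket: int) -> int:
    """Map a delinquency bucket to an IFRS 9 stage (1, 2, or 3)."""
    if bucket < 0:
        raise ValueError(f"bucket must be non-negative, got {bucket}")
    if bucket >= 4:      # check the most severe threshold first
        return 3
    if bucket >= 2:      # buckets 2 and 3
        return 2
    return 1             # assumption: buckets 0 and 1 stay in Stage 1
```

Checking `bucket >= 2` before `bucket >= 4` would classify every Stage 3 exposure as Stage 2, which is the ordering error the rule warns about.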

## `CW-REGTECH-007` — Initialization-before-use dependency ordering
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration, CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.

## `CW-REGTECH-008` — Sufficient entity ID collision prevention
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance

Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number) to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive sanctions matches. Include the maximum available discriminating attributes in ID construction.

## `CW-REGTECH-009` — Hub selection with candidate removal before addition
**From**: finance-bp-060--AMLSim · **Applicable to**: regtech-compliance

When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment across overlapping patterns. Apply to any allocation algorithm with candidate pooling.

## `CW-REGTECH-010` — Insolvency detection before operational decisions
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate operational decisions on prior insolvency state.

FILE:references/components/alert_pattern_generation.md
# alert_pattern_generation (3 classes)

## `TransactionGenerator.add_aml_typology`
`alert_pattern_generation/transactiongenerator-add-aml-typology.py:0`

## `AMLTypology.add_transaction`
`alert_pattern_generation/amltypology-add-transaction.py:0`

## `AlertPattern`
`alert_pattern_generation/alertpattern.py:0`

FILE:references/components/alert_validation.md
# alert_validation (5 classes)

## `AlertValidator.validate_all`
`alert_validation/alertvalidator-validate-all.py:0`

## `satisfies_params`
`alert_validation/satisfies-params.py:0`

## `is_cycle`
`alert_validation/is-cycle.py:0`

## `is_scatter_gather`
`alert_validation/is-scatter-gather.py:0`

## `PatternValidator`
`alert_validation/patternvalidator.py:0`

FILE:references/components/data_combination.md
# data_combination (2 classes)

## `Combiner.combine`
`data_combination/combiner-combine.py:0`

## `Combiner.merge_schemas`
`data_combination/combiner-merge-schemas.py:0`

FILE:references/components/graph_construction.md
# graph_construction (5 classes)

## `TransactionGenerator.generate_normal_transactions`
`graph_construction/transactiongenerator-generate-normal-tra.py:0`

## `TransactionGenerator.build_normal_models`
`graph_construction/transactiongenerator-build-normal-models.py:0`

## `Nominator.place_normal_models`
`graph_construction/nominator-place-normal-models.py:0`

## `AmountGenerator`
`graph_construction/amountgenerator.py:0`

## `NormalModelType`
`graph_construction/normalmodeltype.py:0`

FILE:references/components/log_conversion.md
# log_conversion (5 classes)

## `LogConverter.convert`
`log_conversion/logconverter-convert.py:0`

## `Schema.get_tx_row`
`log_conversion/schema-get-tx-row.py:0`

## `Schema.get_account_row`
`log_conversion/schema-get-account-row.py:0`

## `FakerLocale`
`log_conversion/fakerlocale.py:0`

## `OutputFormat`
`log_conversion/outputformat.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-060-v5.3
  version: v6.1
  blueprint_id: finance-bp-060
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:18.242568+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  upgraded_from: finance-bp-060-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:11.565905+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-060--AMLSim/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-060--AMLSim/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-REGTECH-001
  title: Missing attribute initialization on data structures
  description: 'When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes
    (e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures
    assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.'
  project_source: finance-bp-060--AMLSim, finance-bp-071--opensanctions
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-002
  title: Self-loops in transaction graphs violate domain rules
  description: 'When generating directed transaction graphs or AML typologies, allowing source == destination edges creates
    self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering
    pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology
    validation.'
  project_source: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-003
  title: Unvalidated floating-point inputs cause runtime crashes
  description: 'When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against
    acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN
    values. Consequence: entire model crashes before simulation or corrupted downstream calculations.'
  project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-004
  title: Division by zero in financial calculations produces inf/NaN
  description: 'When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators
    (total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations,
    corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.'
  project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-005
  title: Incorrect amortization windows violate IFRS 9 compliance
  description: 'Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full
    remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence:
    regulatory non-compliance and materially incorrect loan loss provisions.'
  project_source: finance-bp-062--ifrs9
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-006
  title: Wrong leverage formula in threshold-based decisions
  description: 'Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values.
    This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue
    operating with negative equity, or healthy banks unnecessarily deleverage.'
  project_source: finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-007
  title: Confusing deleveraging buffer threshold with insolvency threshold
  description: 'Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using
    the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence:
    excessive bank failures amplify systemic contagion.'
  project_source: finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-008
  title: Cache keys omit request body for state-changing methods
  description: 'Using only URL for cache fingerprints on POST/PATCH requests means different request bodies return identical
    cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence:
    sanctions matches missed or false positives from stale entity data.'
  project_source: finance-bp-071--opensanctions
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-009
  title: ID collision in entity construction creates false sanctions matches
  description: 'When constructing entity IDs from source identifiers, insufficient identifying attributes cause different
    real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned
    entity''s ID matches an innocent entity, causing false positive compliance alerts.'
  project_source: finance-bp-071--opensanctions
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-010
  title: Incorrect cumulative PD ordering corrupts lifetime ECL term structure
  description: 'Using cumprod(1-conPD) without shift(1) and fillna(1) produces corrupted first-period survival probability.
    This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements.
    Consequence: systematically incorrect provisions across all remaining tenor periods.'
  project_source: finance-bp-062--ifrs9
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-011
  title: Mismatched configuration parameters across coupled components
  description: 'When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts
    using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence:
    AML typology patterns placed on wrong accounts, invalidating simulation results.'
  project_source: finance-bp-060--AMLSim
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-012
  title: Reverse property assignment corrupts entity construction
  description: 'Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting
    to add values to stub properties instead of forward properties causes ValueError, aborting entity construction. Consequence:
    entities lost from output, incomplete compliance datasets.'
  project_source: finance-bp-071--opensanctions
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-013
  title: Order-dependent execution creates first-mover advantage bias
  description: 'Without separating step() and act() phases, first-acting banks sell assets before others decide, creating
    systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable
    systemic risk estimates that understate contagion for late-acting banks.'
  project_source: finance-bp-067--firesale_stresstest
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-014
  title: Immediate asset sales cause double-selling and undefined state
  description: 'Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same
    asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect
    cash transfers in market clearing.'
  project_source: finance-bp-067--firesale_stresstest
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-015
  title: Missing EAD component in ECL formula produces incomplete provisions
  description: 'IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation
    is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning
    and reporting processes.'
  project_source: finance-bp-062--ifrs9
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
cross_project_wisdom:
- wisdom_id: CW-REGTECH-001
  source_project: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
  pattern_name: Input bounds validation before statistical computation
  description: Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce
    infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts
    > 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-002
  source_project: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
  pattern_name: Graph/topology invariant verification before construction
  description: 'Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees)
    = sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before
    expensive graph construction operations. Apply to any bipartite or directed graph generation.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-003
  source_project: finance-bp-062--ifrs9
  pattern_name: Regulatory amortization window discipline
  description: 'IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations),
    full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window
    logic explicitly rather than reusing a single loop variable across stages.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-004
  source_project: finance-bp-071--opensanctions
  pattern_name: Fingerprint composition must include all request dimensions
  description: 'Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication
    headers, and request body for state-changing methods. POST requests with different bodies that return the same cached
    response are a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-005
  source_project: finance-bp-067--firesale_stresstest
  pattern_name: Floating-point zero-equivalence with explicit epsilon tolerance
  description: IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use
    eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This
    prevents division-by-zero crashes and incorrect cash transfers.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-006
  source_project: finance-bp-062--ifrs9
  pattern_name: Stage classification threshold ordering enforcement
  description: 'IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying
    thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering
    and document bucket-to-stage mapping explicitly.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-007
  source_project: finance-bp-067--firesale_stresstest
  pattern_name: Initialization-before-use dependency ordering
  description: 'Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration,
    CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError
    that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-008
  source_project: finance-bp-071--opensanctions
  pattern_name: Sufficient entity ID collision prevention
  description: Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number)
    to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive
    sanctions matches. Include the maximum available discriminating attributes in ID construction.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-009
  source_project: finance-bp-060--AMLSim
  pattern_name: Hub selection with candidate removal before addition
  description: When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for
    each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment
    across overlapping patterns. Apply to any allocation algorithm with candidate pooling.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-010
  source_project: finance-bp-067--firesale_stresstest
  pattern_name: Insolvency detection before operational decisions
  description: Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging
    decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate
    operational decisions on prior insolvency state.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: scripts/convert_logs.py
  business_problem: Convert transaction log files into synthetic AML simulation data for testing anti-money laundering detection
    systems
  intent_keywords:
  - convert logs
  - synthetic data
  - AML simulation
  - generate transaction logs
  - test data generation
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-102
  source_file: scripts/split_accounts_bank.py
  business_problem: Partition account CSV files by bank identifier for bank-specific analysis and processing
  intent_keywords:
  - split accounts
  - bank ID
  - partition data
  - bank filtering
  - account grouping
  stage: data_collection
  data_domain: holding_data
  type: data_pipeline
- kuc_id: KUC-103
  source_file: scripts/combine_data.py
  business_problem: Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
  intent_keywords:
  - combine outputs
  - merge data
  - AMLSim aggregation
  - consolidate simulation results
  - dataset assembly
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-104
  source_file: scripts/transaction_graph_generator.py
  business_problem: Generate the base transaction network graph used as input for AML simulation, defining account relationships
    and transaction patterns
  intent_keywords:
  - transaction graph
  - network generation
  - graph topology
  - AMLSim input
  - account relationships
  stage: data_collection
  data_domain: trading_data
  type: data_pipeline
- kuc_id: KUC-105
  source_file: scripts/generate_scalefree.py
  business_problem: Generate scale-free network graphs using the Kronecker graph algorithm for research on network topology
    and distribution analysis
  intent_keywords:
  - scale-free
  - Kronecker graph
  - network topology
  - degree distribution
  - graph generation research
  stage: network_generation
  data_domain: market_data
  type: research_analysis
- kuc_id: KUC-106
  source_file: scripts/visualize/plot_alert_pattern_subgraphs.py
  business_problem: Visualize alert pattern subgraphs showing which accounts and transactions are involved in each generated
    alert for debugging and validation
  intent_keywords:
  - alert visualization
  - subgraph plot
  - alert debugging
  - pattern inspection
  - AMLSim validation
  stage: validation
  data_domain: trading_data
  type: monitoring
- kuc_id: KUC-107
  source_file: scripts/visualize/plot_distributions.py
  business_problem: Generate statistical distribution plots (degree, amount, frequency) from transaction graphs for analysis
    and reporting
  intent_keywords:
  - distribution plot
  - statistics
  - degree distribution
  - amount analysis
  - transaction visualization
  stage: validation
  data_domain: trading_data
  type: reporting
- kuc_id: KUC-108
  source_file: scripts/amlsim/random_amount.py
  business_problem: Generate random transaction amounts within configurable min/max bounds for transaction simulation
  intent_keywords:
  - random amount
  - transaction generator
  - random number
  - amount range
  - simulation utility
  stage: factor_computation
  data_domain: trading_data
  type: builtin_factor
- kuc_id: KUC-109
  source_file: scripts/amlsim/nominator.py
  business_problem: Select appropriate accounts for different transaction types (fan-in, fan-out, single, mutual, periodical)
    based on network degree thresholds
  intent_keywords:
  - account selection
  - nominator
  - transaction routing
  - fan-in fan-out
  - network degree
  stage: factor_computation
  data_domain: holding_data
  type: builtin_factor
- kuc_id: KUC-110
  source_file: scripts/amlsim/rounded_amount.py
  business_problem: Generate rounded transaction amounts (e.g., 100, 500, 1000) to simulate realistic human transaction patterns
  intent_keywords:
  - rounded amount
  - realistic transaction
  - human pattern
  - currency rounding
  - simulation utility
  stage: factor_computation
  data_domain: trading_data
  type: builtin_factor
- kuc_id: KUC-111
  source_file: scripts/amlsim/normal_model.py
  business_problem: Define and manage normal (non-suspicious) account behavior models including main accounts and member accounts
    for transaction simulation
  intent_keywords:
  - normal model
  - behavior model
  - account group
  - main account
  - member account
  stage: factor_computation
  data_domain: holding_data
  type: builtin_factor
- kuc_id: KUC-112
  source_file: scripts/validation/network_analytics.py
  business_problem: Load AMLSim outputs and analyze transaction network characteristics including degree distribution, connected
    components, and graph properties
  intent_keywords:
  - network analysis
  - graph analytics
  - validation
  - topology analysis
  - degree analysis
  stage: validation
  data_domain: trading_data
  type: monitoring
- kuc_id: KUC-113
  source_file: scripts/validation/validate_alerts.py
  business_problem: Validate generated alerts against expected alert parameters to ensure AML simulation produces correct
    alert patterns and amounts
  intent_keywords:
  - validate alerts
  - alert verification
  - simulation accuracy
  - alert parameters
  - SAR validation
  stage: validation
  data_domain: trading_data
  type: monitoring
component_capability_map:
  project: finance-bp-060--AMLSim
  scan_date: '2026-04-22'
  stats:
    total_files: 5
    total_classes: 20
    total_functions: 0
    total_stages: 5
  modules:
    graph_construction:
      class_count: 5
      stage_id: graph_construction
      stage_order: 1
      responsibility: Builds a directed transaction graph from account lists and degree sequences using configuration-model
        random graphs. This is the foundation layer that creates the network topology for all downstream processing stages.
      classes:
      - name: TransactionGenerator.generate_normal_transactions
        file: graph_construction/transactiongenerator-generate-normal-tra.py
        line: 0
        kind: required_method
        signature: ''
      - name: TransactionGenerator.build_normal_models
        file: graph_construction/transactiongenerator-build-normal-models.py
        line: 0
        kind: required_method
        signature: ''
      - name: Nominator.place_normal_models
        file: graph_construction/nominator-place-normal-models.py
        line: 0
        kind: required_method
        signature: ''
      - name: AmountGenerator
        file: graph_construction/amountgenerator.py
        line: 0
        kind: replaceable_point
      - name: NormalModelType
        file: graph_construction/normalmodeltype.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    alert_pattern_generation:
      class_count: 3
      stage_id: alert_pattern_generation
      stage_order: 2
      responsibility: Injects suspicious AML typology patterns (fan-in, fan-out, cycle, scatter-gather) into the base transaction
        graph. These are the ground-truth alerts that the validation stage later checks.
      classes:
      - name: TransactionGenerator.add_aml_typology
        file: alert_pattern_generation/transactiongenerator-add-aml-typology.py
        line: 0
        kind: required_method
        signature: ''
      - name: AMLTypology.add_transaction
        file: alert_pattern_generation/amltypology-add-transaction.py
        line: 0
        kind: required_method
        signature: ''
      - name: AlertPattern
        file: alert_pattern_generation/alertpattern.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    log_conversion:
      class_count: 5
      stage_id: log_conversion
      stage_order: 3
      responsibility: Transforms simulator output into standardized database schema formats (Neo4j, JanusGraph). Applies Faker-generated
        names, computes party relationships, and formats timestamps.
      classes:
      - name: LogConverter.convert
        file: log_conversion/logconverter-convert.py
        line: 0
        kind: required_method
        signature: ''
      - name: Schema.get_tx_row
        file: log_conversion/schema-get-tx-row.py
        line: 0
        kind: required_method
        signature: ''
      - name: Schema.get_account_row
        file: log_conversion/schema-get-account-row.py
        line: 0
        kind: required_method
        signature: ''
      - name: FakerLocale
        file: log_conversion/fakerlocale.py
        line: 0
        kind: replaceable_point
      - name: OutputFormat
        file: log_conversion/outputformat.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    alert_validation:
      class_count: 5
      stage_id: alert_validation
      stage_order: 4
      responsibility: Validates that generated alert patterns match their expected typology parameters. Checks account counts,
        amounts, periods, and structural properties like cycle ordering and scatter-gather chronology.
      classes:
      - name: AlertValidator.validate_all
        file: alert_validation/alertvalidator-validate-all.py
        line: 0
        kind: required_method
        signature: ''
      - name: satisfies_params
        file: alert_validation/satisfies-params.py
        line: 0
        kind: required_method
        signature: ''
      - name: is_cycle
        file: alert_validation/is-cycle.py
        line: 0
        kind: required_method
        signature: ''
      - name: is_scatter_gather
        file: alert_validation/is-scatter-gather.py
        line: 0
        kind: required_method
        signature: ''
      - name: PatternValidator
        file: alert_validation/patternvalidator.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    data_combination:
      class_count: 2
      stage_id: data_combination
      stage_order: 5
      responsibility: Merges multiple simulation outputs into a single dataset. Aggregates degrees and appends output CSVs
        for multi-simulation batch runs, enabling large-scale dataset creation.
      classes:
      - name: Combiner.combine
        file: data_combination/combiner-combine.py
        line: 0
        kind: required_method
        signature: ''
      - name: Combiner.merge_schemas
        file: data_combination/combiner-merge-schemas.py
        line: 0
        kind: required_method
        signature: ''
      design_decision_count: 1
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.1590909090909091
    evidence_invalid: 74
    evidence_verified: 14
    evidence_auto_fixed: 0
    audit_coverage: 38/38 (100%)
    audit_pass_rate: 1/38 (2%)
    audit_fail_total: 22
    audit_finance_universal:
      pass: 1
      warn: 9
      fail: 10
    audit_subdomain_totals:
      pass: 0
      warn: 6
      fail: 12
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-060. Evidence verify ratio
    = 15.9% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-060-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize the data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run the recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc:
  - UC-108
  - UC-109
  - UC-110
  - UC-111
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt, or set the ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Convert Logs to AML Simulation Data
    positive_terms:
    - convert logs
    - synthetic data
    - AML simulation
    - generate transaction logs
    - test data generation
    data_domain: mixed
    negative_terms:
    - live trading
    - real-time data
    - production alerts
    - screening
    ambiguity_question: Are you generating synthetic test data for simulation, or processing real transaction logs for analysis?
  - uc_id: UC-102
    name: Split Accounts by Bank ID
    positive_terms:
    - split accounts
    - bank ID
    - partition data
    - bank filtering
    - account grouping
    data_domain: holding_data
    negative_terms:
    - alert generation
    - transaction simulation
    - network graph
    ambiguity_question: Do you need to split existing account data by bank, or are you looking for transaction graph generation?
  - uc_id: UC-103
    name: Combine AML Simulation Outputs
    positive_terms:
    - combine outputs
    - merge data
    - AMLSim aggregation
    - consolidate simulation results
    - dataset assembly
    data_domain: mixed
    negative_terms:
    - live trading
    - real-time processing
    - screening alerts
    ambiguity_question: Are you combining simulation outputs into one dataset, or running the simulation itself?
  - uc_id: UC-104
    name: Generate Transaction Graph
    positive_terms:
    - transaction graph
    - network generation
    - graph topology
    - AMLSim input
    - account relationships
    data_domain: trading_data
    negative_terms:
    - visualize graph
    - plot distributions
    - alert analysis
    ambiguity_question: Do you need to create/generate a new transaction network, or analyze/visualize an existing one?
  - uc_id: UC-105
    name: Generate Scale-Free Network Graph
    positive_terms:
    - scale-free
    - Kronecker graph
    - network topology
    - degree distribution
    - graph generation research
    data_domain: market_data
    negative_terms:
    - AML simulation
    - alert generation
    - transaction data
    ambiguity_question: Are you generating mathematical network graphs for research, or creating transaction networks for
      AML simulation?
  - uc_id: UC-106
    name: Plot Alert Pattern Subgraphs
    positive_terms:
    - alert visualization
    - subgraph plot
    - alert debugging
    - pattern inspection
    - AMLSim validation
    data_domain: trading_data
    negative_terms:
    - generate alerts
    - create transactions
    - distributions
    ambiguity_question: Are you visualizing existing alerts, or generating new transaction patterns and alerts?
  - uc_id: UC-107
    name: Plot Transaction Distributions
    positive_terms:
    - distribution plot
    - statistics
    - degree distribution
    - amount analysis
    - transaction visualization
    data_domain: trading_data
    negative_terms:
    - alert generation
    - transaction simulation
    - network construction
    ambiguity_question: Are you plotting statistics from existing transaction data, or generating new transactions for simulation?
  - uc_id: UC-108
    name: Random Amount Generator
    positive_terms:
    - random amount
    - transaction generator
    - random number
    - amount range
    - simulation utility
    data_domain: trading_data
    negative_terms:
    - fixed amount
    - rounded amount
    - real data
    ambiguity_question: Do you need random amounts with uniform distribution, or rounded/specific amounts for transactions?
  - uc_id: UC-109
    name: Account Nominator for Transaction Selection
    positive_terms:
    - account selection
    - nominator
    - transaction routing
    - fan-in fan-out
    - network degree
    data_domain: holding_data
    negative_terms:
    - alert generation
    - visualization
    - data loading
    ambiguity_question: Are you selecting accounts for transaction routing, or generating/analyzing alerts?
  - uc_id: UC-110
    name: Rounded Amount Generator
    positive_terms:
    - rounded amount
    - realistic transaction
    - human pattern
    - currency rounding
    - simulation utility
    data_domain: trading_data
    negative_terms:
    - random precise
    - exact amount
    - real data
    ambiguity_question: Do you need realistic rounded amounts, or precise random amounts for transactions?
  - uc_id: UC-111
    name: Normal Account Behavior Model
    positive_terms:
    - normal model
    - behavior model
    - account group
    - main account
    - member account
    data_domain: holding_data
    negative_terms:
    - SAR
    - suspicious activity
    - alert
    ambiguity_question: Are you defining normal transaction behavior patterns, or working with suspicious activity (SAR) alerts?
  - uc_id: UC-112
    name: Analyze Transaction Networks
    positive_terms:
    - network analysis
    - graph analytics
    - validation
    - topology analysis
    - degree analysis
    data_domain: trading_data
    negative_terms:
    - generate network
    - create transactions
    - simulation
    ambiguity_question: Are you analyzing existing network properties, or generating new transaction networks?
  - uc_id: UC-113
    name: Validate AML Simulation Alerts
    positive_terms:
    - validate alerts
    - alert verification
    - simulation accuracy
    - alert parameters
    - SAR validation
    data_domain: trading_data
    negative_terms:
    - generate alerts
    - create transactions
    - visualization
    ambiguity_question: Are you validating that alerts match expected parameters, or generating new alerts?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to
    the user. The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 114
    fatal_constraints_count: 54
    non_fatal_constraints_count: 129
    use_cases_count: 13
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        against get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep
        the domain schema provider-agnostic.
      common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes; otherwise
        initialization fails with an assertion error (fatal violation finance-C-001).'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide a uniform read/write interface.
      key_decisions: BD-004 chose a StorageBackend abstraction (not hardcoded SQLite) to allow a future swap to cloud storage; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions:
    - id: BD-062
      type: B/DK
      summary: Graphviz layout for alert subgraph visualization
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 33 source groups: account_attribute(1),
        account_classification(1), account_config(1), account_initialization(1), alert_pattern_generation(17), alert_validation(10),
        and 27 more.'
      key_decisions: 113 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-035
      type: B/BA
      summary: Gender assigned with 50/50 probability (Male/Female)
    - id: BD-036
      type: B/BA
      summary: Account type assigned 50/50 (individual vs organization)
    - id: BD-043
      type: B/BA
      summary: 'Initial balance range: min=50000, max=100000'
    - id: BD-028
      type: B/BA
      summary: Account balance generated with uniform distribution between min_balance and max_balance
    - id: BD-006
      type: B
      summary: AML typologies use hub accounts as main nodes
    - id: BD-007
      type: B
      summary: Accounts removed from hub pool after being selected
    - id: BD-008
      type: BA/M
      summary: Alert types encoded as integer model IDs
    - id: BD-024
      type: B/BA
      summary: Transaction amounts rounded to psychologically appealing values (multiples of 10, 100, 1000)
    - id: BD-025
      type: B/BA
      summary: 'Step size selection: find power of ten giving 7-30 slots in range'
    - id: BD-046
      type: B/BA
      summary: 'Fan-in pattern: multiple originators send to single main account'
    - id: BD-047
      type: B
      summary: 'Fan-out pattern: single main account sends to multiple beneficiaries'
    - id: BD-048
      type: B
      summary: 'Bipartite pattern: split accounts evenly between originators and beneficiaries'
    - id: BD-049
      type: B
      summary: 'Stack pattern: divide accounts into thirds for originator/intermediate/beneficiary'
    - id: BD-050
      type: B/BA
      summary: 'Cycle pattern: transactions form ring using modulo arithmetic, margin decrements amount'
    - id: BD-051
      type: B/BA
      summary: 'Scatter-gather: split at midpoint date, scatter (orig->mid) then gather (mid->bene)'
    - id: BD-052
      type: B/BA
      summary: 'Gather-scatter: collect from origins to mid at midpoint, then distribute to beneficiaries'
    - id: BD-060
      type: B/RC
      summary: Random amount generation using uniform distribution
    - id: BD-069
      type: DK/B
      summary: Nominator uses circular iterator pattern with manual index wrapping - next_node_id() resets index to 0 on IndexError
    - id: BD-074
      type: M/DK
      summary: Schema classes use factory pattern via get_*_row() methods - row builders take **attrs for extensible columns
    - id: BD-077
      type: DK
      summary: Nominator state machine uses increment_type_index() round-robin across types - assumes balanced workload but
        allows type starvation
    - id: BD-082
      type: BA/DK
      summary: RoundedAmount implements adaptive step size algorithm (7-30 slots per range) - non-uniform distribution favoring
        round numbers
    - id: BD-012
      type: B
      summary: Validation uses graph-theoretic properties rather than regex/text matching
    - id: BD-013
      type: BA
      summary: Ordered patterns check chronological sequencing of transactions
    - id: BD-014
      type: B
      summary: Scatter-gather requires intermediate amounts to decrease
    - id: BD-018
      type: B
      summary: In-degree and out-degree sequences must have equal sums
    - id: BD-019
      type: B/BA
      summary: Total accounts must be multiple of degree sequence length
    - id: BD-030
      type: B/DK
      summary: SAR flag marks accounts involved in suspicious activity reports
    - id: BD-053
      type: B/BA
      summary: 'Alert validation checks: number of accounts, amount range, period range'
    - id: BD-054
      type: B
      summary: 'Cycle pattern validation: single cycle, chronological ordering, unique amounts'
    - id: BD-055
      type: B
      summary: 'Scatter-gather validation: intermediate degree=1, amounts decrease, chronological order'
    - id: BD-064
      type: B
      summary: Alert is_sar checked with > 0 comparison (sar_id > 0)
    - id: BD-037
      type: B/BA
      summary: Powerlaw distribution fitting for degree distribution visualization
    - id: BD-GAP-001
      type: T
      summary: Transaction generator uses INI configuration files to define test scenarios, enabling non-technical users to
        create fraud test data without modifying code
    - id: BD-031
      type: B
      summary: External (inter-bank) transactions allowed when multiple banks exist and bank_id is empty
    - id: BD-GAP-002
      type: B/BA
      summary: Suspicious account classification uses boolean flags (country_risk, business_risk) rather than continuous risk
        scores, forcing discrete categorization
    - id: BD-GAP-003
      type: B/BA
      summary: AML rule engine combines multiple indicators (amount, frequency, country, business) into single rule definitions,
        treating them as conjunction requirements
    - id: BD-GAP-005
      type: BA
      summary: Fraud patterns are explicitly typed (fan_in, fan_out, dense, mixed, stack) rather than emerging from configuration,
        encoding domain expertise about common laundering techniques
    - id: BD-044
      type: B/BA
      summary: Cash-in normal interval=100, fraud interval=50; cash-out reversed
    - id: BD-045
      type: B/BA
      summary: Cash-in normal amount=50-100, fraud=500-1000; cash-out reversed
    - id: BD-017
      type: B
      summary: Environment variable RANDOM_SEED overrides config file random seed
    - id: BD-056
      type: B/BA
      summary: Degree threshold of 4 for hub account selection
    - id: BD-015
      type: BA
      summary: Schema loaded from first input and reused for all
    - id: BD-033
      type: B
      summary: Transaction deduplication using (orig_id, dest_id, type, amount, date) tuple
    - id: BD-034
      type: B/DK
      summary: Faker library (en_US locale) generates account names and addresses
    - id: BD-063
      type: B/DK
      summary: Address retry loop ensures valid US address format
    - id: BD-GAP-004
      type: B
      summary: Transaction network generation models hub accounts as high-degree vertices with preferential attachment, reflecting
        real-world concentration of transaction volume
    - id: BD-067
      type: BA
      summary: DEFAULT_MARGIN_RATIO=0.1 encodes business assumption that intermediaries retain 10% of funds in cycle/scatter-gather
        patterns
    - id: BD-073
      type: DK
      summary: 'base_date inconsistency: conf.json and convert_logs.py use ''2017-01-01'' but network_analytics.py uses ''1970-01-01'''
    - id: BD-078
      type: BA/M
      summary: schedule_id defaults to 1 for normal models (hardcoded) vs AML typologies using schedule parameter from CSV
    - id: BD-083
      type: DK
      summary: 'degree_threshold test/production mismatch: conf.json uses threshold=10 but test fixtures use threshold=3'
    - id: BD-058
      type: B/DK
      summary: Active edge marking for normal model subgraph edges
    - id: BD-084
      type: B/BA
      summary: 'INTERACTION: BD-066 × BD-072 → Initialization sequence violations cause Nominator AttributeError cascades'
    - id: BD-085
      type: BA
      summary: 'INTERACTION: BD-073 × BD-038 → Inconsistent base dates (2017-01-01 vs 1970-01-01) corrupt temporal calculations
        across pipeline boundaries'
    - id: BD-086
      type: B/BA
      summary: 'INTERACTION: BD-083 × BD-003 → Test/production threshold mismatch causes false confidence in hub detection
        validation'
    - id: BD-087
      type: B/BA
      summary: 'INTERACTION: BD-006 × BD-007 → Hub main node selection conflicts with account pool depletion under high alert
        volumes'
    - id: BD-088
      type: BA
      summary: 'INTERACTION: BD-012 × BD-079 → Graph-theoretic validation amplifies maintenance burden and detection divergence
        risk'
    - id: BD-089
      type: BA
      summary: 'INTERACTION: BD-021 × BD-050 × BD-051 → Margin ratio creates detectable signature across cycle and scatter-gather
        patterns'
    - id: BD-090
      type: B
      summary: 'INTERACTION: BD-080 × BD-018 → Graph construction constraints formalize flow conservation requirements'
    - id: BD-091
      type: BA
      summary: 'INTERACTION: BD-009 × BD-015 → Schema-driven mapping enables multi-format support but assumes consistency
        across combined data'
    - id: BD-092
      type: B/BA
      summary: 'RISK CASCADE: BD-066 → BD-071 → BD-027 → BD-003 → BD-006 → BD-046/BD-047 → BD-005/BD-059 → Alert pipeline
        failure'
    - id: BD-093
      type: BA/M
      summary: 'RISK CASCADE: BD-073 → BD-010 → BD-029 → BD-053 → BD-013 → Incorrect temporal validation'
    - id: BD-094
      type: B/BA
      summary: 'CONTRADICTION: BD-015 assumes schema consistency while BD-009 enables schema evolution - these create conflicting
        requirements'
    - id: BD-095
      type: BA/M
      summary: 'CONTRADICTION: BD-078 hardcodes schedule_id=1 for normal models while AML typologies use dynamic CSV scheduling'
    - id: BD-001
      type: B
      summary: Directed configuration model avoids self-loops by swapping IDs
    - id: BD-002
      type: B
      summary: Degree sequences are repeated to fill total account count
    - id: BD-003
      type: B/BA
      summary: Hub nodes defined by degree_threshold crossing either in OR out degree
    - id: BD-004
      type: BA
      summary: Nominator uses degree-based candidate sorting
    - id: BD-005
      type: BA
      summary: Fan breakdown algorithm can steal nodes from existing clumps
    - id: BD-016
      type: B
      summary: Use directed configuration model to generate transaction graphs from degree sequences
    - id: BD-039
      type: B
      summary: Weakly connected components analyzed for network structure
    - id: BD-040
      type: B/BA
      summary: Clustering coefficient computed at intervals (default 30 steps) for performance
    - id: BD-GAP-006
      type: DK
      summary: 'Missing: Timezone explicit annotation + UTC normalization'
    - id: BD-GAP-007
      type: M
      summary: 'Missing: Convergence criteria explicit declaration'
    - id: BD-GAP-008
      type: DK
      summary: 'Missing: Point-in-Time data availability'
    - id: BD-GAP-009
      type: DK
      summary: 'Missing: Stale data detection and expiry policy'
    - id: BD-GAP-010
      type: B
      summary: 'Missing: Train/test time split integrity'
    - id: BD-GAP-011
      type: DK
      summary: 'Missing: Model and data version snapshot binding'
    - id: BD-GAP-012
      type: RC
      summary: 'Missing: Settlement and delivery time convention'
    - id: BD-GAP-013
      type: B
      summary: 'Missing: Fuzzy-matching algorithm and thresholds (Jaro-Winkler/Levenshtein)'
    - id: BD-GAP-014
      type: RC
      summary: 'Missing: False-positive rate monitoring and model governance'
    - id: BD-GAP-015
      type: B
      summary: 'Missing: Implement immutable audit logging with cryptographic hash chains and append-only storage'
    - id: BD-GAP-016
      type: RC
      summary: 'Missing: Add Decimal type for all currency amounts (balance, transaction amounts) instead of float/double'
    - id: BD-GAP-017
      type: B
      summary: 'Missing: Implement jurisdiction-specific CTR/SAR threshold configuration with audit trail'
    - id: BD-GAP-018
      type: DK
      summary: 'Missing: Add run_id/experiment_id for reproducible simulation snapshots'
    - id: BD-GAP-019
      type: M
      summary: 'Missing: Convergence criteria explicit declaration'
    - id: BD-020
      type: B/BA
      summary: Hub accounts selected as accounts with degree >= degree_threshold
    - id: BD-070
      type: B/BA
      summary: ResultGraphLoader overrides count_hub_accounts() but calls super() then extends - inheritance creates dual
        counting behavior
    - id: BD-068
      type: T
      summary: degree_threshold MUST be consistent between TransactionGenerator and Nominator - both receive identical value
        at construction
    - id: BD-071
      type: RC
      summary: Each account node MUST have 'normal_models' list attribute initialized at add_account() time for Nominator
        graph lookups
    - id: BD-076
      type: DK/B
      summary: fan_in/fan_out candidates are mutually exclusive after first assignment - node removed from opposite list on
        first use
    - id: BD-080
      type: T
      summary: 'Directed graph degree sequences MUST satisfy: sum(in_deg) == sum(out_deg) and num_accounts % len(sequence)
        == 0'
    - id: BD-009
      type: BA
      summary: Schema drives all column mappings via dataType annotations
    - id: BD-010
      type: B/DK
      summary: Days converted to UTC ISO 8601 via base_date offset
    - id: BD-011
      type: BA
      summary: SAR accounts extracted via org_type lookup
    - id: BD-038
      type: B/BA
      summary: Base date (2017-01-01) plus days offset for transaction timestamps
    - id: BD-027
      type: B/BA
      summary: Nominator uses degree_threshold to determine fan_in/fan_out candidates
    - id: BD-057
      type: B
      summary: Nominator tracks remaining/used counts per type for model assignment
    - id: BD-059
      type: B/BA
      summary: 'Fan breakdown candidates: subtract existing fan nodes, fill if below threshold'
    - id: BD-026
      type: B
      summary: 'Normal model types: single, fan_in, fan_out, forward, mutual, periodical'
    - id: BD-065
      type: B/BA
      summary: Normal model type count initialized from normalModels.csv
    - id: BD-061
      type: B/BA
      summary: Normal model schedule_id defaults to 2
    - id: BD-066
      type: B/BA
      summary: 'TransactionGenerator init sequence MUST be: set_num_accounts -> generate_normal_transactions -> load_account_list
        -> load_normal_models -> build_normal_models -> set_main_acct_candidates -> load_alert'
    - id: BD-072
      type: B
      summary: remove_typology_candidate MUST be called BEFORE add_node in every AML typology generator - order matters for
        hub accounting
    - id: BD-075
      type: BA
      summary: scatter_gather pattern requires scatter_date < gather_date AND scatter_amount > gather_amount - two independent
        ordering constraints
    - id: BD-081
      type: B
      summary: normal_models list must be written AFTER mark_active_edges sets edge attributes - active flag drives CSV export
        filter
    - id: BD-041
      type: B/BA
      summary: Simulation total_steps=150, base_date=2017-01-01, random_seed=0
    - id: BD-079
      type: M
      summary: validation/ module implements independent alert pattern detection (is_cycle, is_scatter_gather, is_gather_scatter)
        mirroring graph generator
    - id: BD-029
      type: B/BA
      summary: Transaction dates distributed uniformly within [start_date, end_date] inclusive
    - id: BD-021
      type: B/BA
      summary: Default margin ratio of 0.1 (10%) for intermediate accounts
    - id: BD-042
      type: B/BA
      summary: 'Transaction amount range: min=100, max=1000'
    - id: BD-032
      type: B/BA
      summary: Cash transactions identified by type in CASH_TYPES set ("CASH-IN", "CASH-OUT")
    - id: BD-022
      type: B
      summary: 'AML typology types: fan_in, fan_out, cycle, bipartite, stack, random, scatter_gather, gather_scatter'
    - id: BD-023
      type: B
      summary: 'Alert type ID mapping: fan_out=1, fan_in=2, cycle=3, bipartite=4, stack=5, random=6, scatter_gather=7, gather_scatter=8'
resources:
  packages:
  - name: numpy
    version_pin: latest
  - name: networkx
    version_pin: latest
  - name: matplotlib
    version_pin: latest
  - name: pygraphviz
    version_pin: latest
  - name: powerlaw
    version_pin: latest
  - name: python-dateutil
    version_pin: latest
  - name: Faker
    version_pin: latest
  - name: MASON
    version_pin: latest
  - name: JSON in Java
    version_pin: latest
  - name: WebGraph
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install numpy
    - python3 -m pip install networkx
    - python3 -m pip install matplotlib
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing directed_configuration_model graph generation
    action: Enforce sum of in-degrees equals sum of out-degrees before edge creation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid degree sequences will produce an inconsistent directed graph where some nodes have unmatched incoming/outgoing
      edges, corrupting the transaction network topology for AML analysis
    stage_ids:
    - graph_construction
  - id: finance-C-002
    when: When loading account lists via load_account_list_param
    action: Initialize normal_models as empty list for every account node
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing normal_models attribute causes KeyError when Nominator methods attempt to access it during fan_in_breakdown
      and fan_out_breakdown operations, breaking the entire normal model generation pipeline
    stage_ids:
    - graph_construction
  - id: finance-C-003
    when: When loading raw account lists via load_account_list_raw
    action: Initialize normal_models as empty list for every account node attribute dictionary
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Raw account loading path does not include normal_models initialization, causing KeyError when downstream
      Nominator code attempts to append to the missing attribute during normal model construction
    stage_ids:
    - graph_construction
  - id: finance-C-004
    when: When constructing directed graphs from degree sequences
    action: Swap IDs to eliminate self-loops when source equals destination after shuffling
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Self-loops in the transaction graph would represent accounts sending money to themselves, which violates
      AML domain requirements and corrupts downstream fan-in/fan-out pattern analysis
    stage_ids:
    - graph_construction
  - id: finance-C-005
    when: When parsing degree distribution CSV files
    action: Verify in-degree sequence length equals out-degree sequence length
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Mismatched sequence lengths produce a graph where the number of accounts with incoming edges differs from
      those with outgoing edges, corrupting the bipartite degree sequence matching for directed configuration model
    stage_ids:
    - graph_construction
  - id: finance-C-006
    when: When scaling degree sequences to match account count
    action: Require num_accounts to be evenly divisible by degree sequence length
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-divisible account count causes incomplete graph scaling where some accounts lack degree assignments,
      resulting in orphaned nodes with undefined transaction patterns in the AML simulation
    stage_ids:
    - graph_construction
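  # Worked example for finance-C-001, finance-C-005 and finance-C-006 (illustrative values only, not from the source):
  # in_deg = [2, 1, 0], out_deg = [0, 1, 2]: equal lengths (3) and equal sums (3), so the sequences are valid.
  # num_accounts = 6 is accepted (6 % 3 == 0, each sequence repeated twice); num_accounts = 7 is rejected (7 % 3 == 1).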
  - id: finance-C-008
    when: When instantiating TransactionGenerator and Nominator classes
    action: Pass identical degree_threshold value to both TransactionGenerator and Nominator
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Mismatched degree_threshold causes Nominator to identify hub accounts using different criteria than TransactionGenerator,
      leading to incorrect fan-in/fan-out candidate selection and corrupted AML pattern generation
    stage_ids:
    - graph_construction
  - id: finance-C-014
    when: When loading account data from aggregated CSV files
    action: Expand degree sequence entries by the repeat count before graph construction
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without proper expansion, degree sequences remain at sample size causing graph topology to be incorrect for
      the full account set, with accounts receiving incorrect transaction pattern assignments
    stage_ids:
    - graph_construction
  - id: finance-C-015
    when: When implementing scatter_gather pattern generation
    action: verify scatter transactions occur before gather transactions (scatter_date < gather_date)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Scatter-gather pattern validation will fail if scatter_date >= gather_date, breaking the chronological ordering
      required for AML typology verification
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-016
    when: When implementing scatter_gather pattern generation
    action: verify scatter_amount exceeds gather_amount for each intermediate account
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Validation will reject scatter-gather patterns if scatter_amount <= gather_amount, as the margin must be
      retained by intermediate accounts
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-017
    when: When loading margin_ratio configuration
    action: verify margin_ratio value is within the valid range [0.0, 1.0]
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid margin_ratio will cause ValueError during pattern generation, preventing any AML typology from being
      placed in the transaction graph
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-018
    when: When implementing cycle pattern generation
    action: verify cycle transactions are chronologically ordered with decreasing amounts
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Validation will reject cycle patterns if transaction dates are not strictly increasing or amounts are not
      strictly decreasing, breaking the expected money laundering funnel pattern
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-019
    when: When adding transaction edges in AML typologies
    action: create self-loops where originator equals beneficiary account
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Self-loops are not valid transaction patterns for AML detection systems and will cause ValueError to be raised
      during edge creation
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-020
    when: When creating AML typology patterns
    action: call remove_typology_candidate BEFORE add_node for each selected account
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Reversing this order causes hub self-selection and duplicate account assignment across overlapping alert
      patterns, corrupting the generated transaction graph
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-021
    when: When selecting hub accounts for AML typologies
    action: validate hub pool is non-empty before calling add_main_acct
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Calling add_main_acct with empty hub pool raises ValueError and stops all further typology generation, preventing
      alert pattern placement
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-025
    when: When generating scatter_gather patterns
    action: apply margin_ratio to intermediate account amounts correctly (gather_amount = scatter_amount - scatter_amount
      * margin_ratio)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect margin application violates the scatter_amount > gather_amount invariant required for validation,
      causing pattern rejection
    stage_ids:
    - alert_pattern_generation
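  # Worked example for finance-C-025 (illustrative values, not taken from the source code):
  # with scatter_amount = 1000 and margin_ratio = 0.1, gather_amount = 1000 - 1000 * 0.1 = 900,
  # so scatter_amount > gather_amount holds as finance-C-016 requires.
  # Note: margin_ratio = 0.0 lies inside the range allowed by finance-C-017 but gives gather_amount == scatter_amount,
  # which the strict inequality in finance-C-016 would appear to reject.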
  - id: finance-C-038
    when: When converting simulator day offsets to timestamps
    action: Append 'Z' suffix to mark UTC timezone in ISO 8601 format
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Database imports fail or misattribute transaction times to wrong timezone, causing incorrect AML alert sequencing
      and compliance violations
    stage_ids:
    - log_conversion
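  # Illustrative example for finance-C-038, assuming base_date = 2017-01-01 (BD-038) and a midnight,
  # second-resolution layout (the exact output format is an assumption, not stated in the source):
  # day offset 3 -> 2017-01-04T00:00:00Z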
  - id: finance-C-039
    when: When parsing SAR flag from input CSV files
    action: Convert SAR flag to lowercase string 'true'/'false' for consistent CSV output
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Alert filtering logic in downstream analytics fails silently because case-sensitive comparisons miss SAR
      transactions, causing compliance detection gaps
    stage_ids:
    - log_conversion
  - id: finance-C-041
    when: When outputting transaction rows with date valueType
    action: Apply days2date conversion to all date-typed columns before writing CSV rows
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: CSV columns contain raw day integers instead of ISO timestamps, causing database schema violations and failed
      imports for Neo4j/JanusGraph
    stage_ids:
    - log_conversion
  - id: finance-C-042
    when: When parsing alert transactions for SAR extraction
    action: Verify alert_id exists in self.reports dictionary before calling get_reason()
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Python raises AttributeError when accessing get_reason() on None, causing transaction conversion to abort
      and leaving incomplete CSV outputs
    stage_ids:
    - log_conversion
  - id: finance-C-044
    when: When converting raw transaction logs to CSV format
    action: Execute convert_alert_members() before convert_acct_tx() to populate self.reports
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Alert transaction extraction fails with NoneType errors because reports dictionary is empty, preventing SAR
      case generation
    stage_ids:
    - log_conversion
  - id: finance-C-045
    when: When loading schema.json for column mapping
    action: Parse dataType annotations to determine field roles (account_id, timestamp, sar_flag, alert_id)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Schema-driven field mapping fails, causing wrong columns to populate critical identifiers and preventing
      join operations across CSV outputs
    stage_ids:
    - log_conversion
  - id: finance-C-053
    when: When validating cycle alert patterns
    action: check that the alert subgraph contains exactly one closed loop detectable by nx.simple_cycles
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Cycle patterns with zero or multiple closed loops will pass validation incorrectly, causing invalid AML typologies
      to be treated as legitimate alerts
    stage_ids:
    - alert_validation
  - id: finance-C-054
    when: When validating ordered scatter-gather alert patterns
    action: check that scatter_date is chronologically before gather_date for each intermediate account
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Scatter-gather patterns with transactions in reverse chronological order will be incorrectly validated, breaking
      the fundamental fan-out then fan-in structure of the AML typology
    stage_ids:
    - alert_validation
  - id: finance-C-055
    when: When validating ordered scatter-gather alert patterns
    action: check that scatter_amount exceeds gather_amount for each intermediate account to verify margin extraction
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Scatter-gather patterns where intermediate accounts do not receive margin will be incorrectly validated,
      failing to detect money laundering via fee extraction
    stage_ids:
    - alert_validation
  - id: finance-C-056
    when: When validating ordered cycle patterns
    action: check that cycle transaction amounts are strictly monotonically decreasing in chronological order
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Cycle patterns with unordered transaction amounts will be incorrectly validated, breaking the margin extraction
      chain in circular fund movements
    stage_ids:
    - alert_validation
  - id: finance-C-057
    when: When validating ordered cycle patterns
    action: check that cycle transaction dates are chronologically ordered and that each successor edge originates from
      its predecessor's beneficiary
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Cycle patterns with unordered transaction dates or broken chain connections will be incorrectly validated,
      failing to represent legitimate circular fund flow
    stage_ids:
    - alert_validation
  - id: finance-C-063
    when: When validating gather-scatter patterns
    action: check that gather transactions complete before scatter transactions commence chronologically
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Gather-scatter patterns where scatter occurs before gather completes violate the fundamental fan-in then
      fan-out structure of this AML typology
    stage_ids:
    - alert_validation
  - id: finance-C-064
    when: When validating gather-scatter patterns
    action: check that scatter amounts do not exceed the average gathered amount per beneficiary account
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Gather-scatter patterns where scatter amounts exceed gathered amounts indicate impossible fund flows that
      should not pass validation
    stage_ids:
    - alert_validation
  - id: finance-C-065
    when: When modifying alert pattern validation rules
    action: modify validation rules in isolation without synchronizing changes to transaction_graph_generator.py
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: Desynchronization between generation and validation rules will cause valid generated patterns to fail validation
      or invalid patterns to pass
    stage_ids:
    - alert_validation
  - id: finance-C-066
    when: When loading alert parameter CSV files
    action: 'parse all required columns: count, type, schedule_id, min_accounts, max_accounts, min_amount, max_amount, min_period,
      max_period, bank_id, is_sar'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing column indices will cause KeyError exceptions during parameter loading, preventing alert validation
      from executing
    stage_ids:
    - alert_validation
  - id: finance-C-067
    when: When loading alert transaction CSV files
    action: construct a directed graph with edges containing amount and date attributes for each transaction
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Directed graph without proper edge attributes will cause KeyError exceptions during pattern validation when
      accessing date or amount properties
    stage_ids:
    - alert_validation
  - id: finance-C-072
    when: When validating alert patterns against typology specifications
    action: only pass validation if the alert subgraph matches at least one parameter set with matching alert_type
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Alert patterns matched against wrong typology parameters will produce incorrect validation results, compromising
      the integrity of generated simulation data
    stage_ids:
    - alert_validation
  - id: finance-C-079
    when: When combining multiple simulation outputs into a single dataset
    action: use only input simulations that share the same schema structure as the first input
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Combined CSV files will have mismatched column counts and names, causing downstream alert validation and
      ML training pipelines to fail with column index errors
    stage_ids:
    - data_combination
  - id: finance-C-080
    when: When appending output data from each input simulation
    action: offset all account IDs by the cumulative account ID offset from previous simulations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Account IDs will collide across combined simulations, causing referential integrity failures when transactions
      reference accounts that appear in multiple simulations
    stage_ids:
    - data_combination
  - id: finance-C-081
    when: When appending output data from each input simulation
    action: offset all transaction IDs by the cumulative transaction ID offset from previous simulations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Transaction IDs will duplicate across combined simulations, breaking alert-to-transaction joins and creating
      false-positive SAR identifications
    stage_ids:
    - data_combination
  - id: finance-C-082
    when: When appending output data from each input simulation
    action: offset all alert IDs by the cumulative alert ID offset from previous simulations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Alert IDs will duplicate across combined simulations, causing alert_members and alert_transactions to join
      incorrectly and corrupt suspicious activity reports
    stage_ids:
    - data_combination
  - id: finance-C-083
    when: When combining transaction outputs from multiple simulations
    action: offset both orig_id and dest_id (account references) by the cumulative account ID offset
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Transaction sender/receiver references will point to wrong accounts across simulation boundaries, corrupting
      the transaction graph and breaking downstream graph analytics
    stage_ids:
    - data_combination
  - id: finance-C-084
    when: When combining alert member outputs from multiple simulations
    action: offset account_id references within alert_members by the cumulative account ID offset
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Alert-to-account mappings will reference incorrect accounts, causing investigators to examine wrong accounts
      when reviewing alerts
    stage_ids:
    - data_combination
  - id: finance-C-085
    when: When combining alert transaction outputs from multiple simulations
    action: offset tx_id, orig_id, and dest_id references within alert_transactions by cumulative offsets
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Alert transactions will reference non-existent transactions and accounts, breaking the link between suspicious
      activity alerts and the underlying transaction records
    stage_ids:
    - data_combination
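  # Illustration for finance-C-080 through finance-C-085 (a minimal sketch under assumed
  # column names; the row layout and helper name are hypothetical, not the combiner's code).
  #
  #   def offset_tx_row(row: dict, tx_offset: int, acct_offset: int) -> dict:
  #       # Shift the transaction ID and both account references by the cumulative offsets.
  #       return {**row,
  #               "tx_id": row["tx_id"] + tx_offset,
  #               "orig_id": row["orig_id"] + acct_offset,
  #               "dest_id": row["dest_id"] + acct_offset}
  #
  #   offset_tx_row({"tx_id": 0, "orig_id": 1, "dest_id": 2}, tx_offset=100, acct_offset=50)
  #   # {"tx_id": 100, "orig_id": 51, "dest_id": 52}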
  - id: finance-C-088
    when: When writing output CSV headers for combined files
    action: use the output schema column names (acct_names, tx_names, alert_acct_names, alert_tx_names)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Column headers in combined CSVs will not match the schema definition, causing downstream parsers to misidentify
      columns and corrupt data loading
    stage_ids:
    - data_combination
  - id: finance-C-096
    when: When configuring the degree sequence for directed graph generation
    action: Verify the sum of in-degrees equals the sum of out-degrees
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Directed configuration model will raise NetworkXError, causing the entire transaction graph generation pipeline
      to fail
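  # Illustration for finance-C-096 (a hedged sketch; the function name is hypothetical).
  # networkx.directed_configuration_model requires equal in- and out-degree sums,
  # so a pre-check along these lines can fail early with a clearer message.
  #
  #   def check_degree_sums(in_deg: list[int], out_deg: list[int]) -> None:
  #       if sum(in_deg) != sum(out_deg):
  #           raise ValueError(f"in-degree sum {sum(in_deg)} != out-degree sum {sum(out_deg)}")
  #
  #   check_degree_sums([1, 2, 0], [0, 1, 2])  # passes, both sums are 3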
  - id: finance-C-098
    when: When outputting alert members CSV from alert_pattern_generation
    action: Include the alertID column that uniquely identifies each AML typology
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Log converter cannot link alert transactions to their corresponding typology members, breaking the SAR reporting
      chain
  - id: finance-C-100
    when: When generating hub account candidates for AML typologies
    action: Select accounts with degree exceeding the degree_threshold configuration parameter
    severity: fatal
    kind: operational_lesson
    modality: must
    consequence: Alert generation will fail with ValueError when no hub accounts exist, halting simulation
  - id: finance-C-103
    when: When combining multiple simulation outputs in data_combination
    action: Offset account IDs by the maximum ID from previously combined simulations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Account ID collisions will cause incorrect transaction linkage in downstream analysis, producing invalid
      money laundering patterns
  - id: finance-C-104
    when: When combining multiple simulation outputs in data_combination
    action: Offset alert IDs by the maximum alert ID from previously combined simulations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Alert ID collisions will merge distinct SAR cases in the alert database, corrupting compliance investigation
      workflows
  - id: finance-C-105
    when: When mapping transaction originator and beneficiary IDs during combination
    action: Apply account ID offset to both orig_id and dest_id fields in transactions
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Transaction sender/receiver relationships will be incorrectly attributed, breaking transaction graph topology
      for AML analysis
  - id: finance-C-120
    when: When generating directed graphs from degree sequences
    action: Validate that the sum of in-degrees equals the sum of out-degrees before graph construction
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: NetworkXError raised during graph generation causes simulation failure; uncaught exception crashes the pipeline
      and loses all generated data
  - id: finance-C-121
    when: When loading degree sequences for directed graph generation
    action: Validate that the total number of accounts is divisible by the degree sequence length
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: ValueError raised when degree sequence cannot evenly tile the account graph; simulation fails to initialize
      the transaction network
  - id: finance-C-130
    when: When using AMLSim in any production or compliance context
    action: Treat synthetic AML alerts as regulatory-grade findings or use them to satisfy AML compliance obligations
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Non-compliant AML program may face regulatory sanctions, fines, or enforcement actions from financial regulators;
      synthetic data does not satisfy reporting requirements
  - id: finance-C-131
    when: When deploying AMLSim for real-time financial operations
    action: Connect AMLSim outputs to real-time transaction processing, payment systems, or live financial infrastructure
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Synthetic transaction data injected into live systems may trigger incorrect fraud alerts, freeze legitimate
      customer accounts, or corrupt financial databases with fabricated records
  - id: finance-C-138
    when: When implementing account creation logic in AML transaction graph simulation
    action: Initialize 'normal_models' as an empty list attribute for each account node at add_account() time — accounts must
      have this attribute before any Nominator graph operations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Accounts added without normal_models initialization cause AttributeError during Nominator operation when
      pattern generators attempt to extend the list, breaking graph construction and preventing alert generation
    derived_from_bd_id: BD-071
  - id: finance-C-160
    when: When implementing timestamp conversion and temporal validation logic
    action: Mix epoch-based timestamps (Unix epoch 1970-01-01) with date-string-based timestamps (2017-01-01 base) in temporal
      validation; all timestamps must use one consistent reference date throughout the pipeline, from generation (BD-073)
      through conversion (BD-010), distribution logic (BD-029), alert validation (BD-053), and chronological ordering (BD-013)
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Transactions generated with a 2017-01-01 base date are interpreted relative to the 1970-01-01 Unix epoch,
      so period range validation either accepts invalid patterns or rejects valid ones, corrupting downstream analytics
    derived_from_bd_id: BD-093
  - id: finance-C-161
    when: When validating transaction temporal ranges against configured time periods
    action: Implement centralized date constant management — use a single source of truth for base_date (e.g., BASE_DATE =
      datetime(2017, 1, 1)) imported consistently across timestamp generation (BD-073), UTC conversion (BD-010), uniform distribution
      (BD-029), alert validation (BD-053), and chronological ordering (BD-013) modules
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without centralized date management, the base_date inconsistency (2017-01-01 vs 1970-01-01) propagates through
      each transformation stage, causing period validation to incorrectly compare timestamps against the wrong epoch and produce
      systematically wrong results
    derived_from_bd_id: BD-093
  regular:
  - id: finance-C-007
    when: When using AMLSim for transaction graph generation
    action: Use networkx version other than 1.11 for large graph generation
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: NetworkX version 2.* exhibits severe performance degradation when creating large transaction graphs, causing
      exponential slowdown in graph generation for datasets with 10K+ accounts
    stage_ids:
    - graph_construction
  - id: finance-C-009
    when: When implementing hub node identification logic
    action: Identify hub accounts using OR semantics for in/out degree threshold crossing
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using AND instead of OR semantics excludes pure senders or pure receivers from hub identification, breaking
      the AML typology design where both fan-in aggregators and fan-out distributors serve as main accounts
    stage_ids:
    - graph_construction
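  # Illustration for finance-C-009 (a hedged sketch; the function and parameter names are
  # hypothetical, and whether the comparison is strict is an assumption).
  #
  #   def is_hub(in_degree: int, out_degree: int, degree_threshold: int) -> bool:
  #       # OR semantics: crossing the threshold on either side qualifies, so pure
  #       # senders and pure receivers are both kept as hub candidates.
  #       return in_degree > degree_threshold or out_degree > degree_threshold
  #
  #   is_hub(0, 12, 10)  # True, a pure fan-out distributor
  #   is_hub(3, 4, 10)   # False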
  - id: finance-C-010
    when: When validating transaction graph generation outputs
    action: Verify that at least one hub account exists before proceeding to model building
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Proceeding without hub accounts causes AML typology generation to fail when trying to assign main accounts,
      requiring users to reconfigure degree_threshold with no clear error message
    stage_ids:
    - graph_construction
  - id: finance-C-011
    when: When generating directed configuration model graphs
    action: Use the same random seed across TransactionGenerator and Nominator for reproducibility
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Different random seeds cause shuffled degree lists to produce different graph topologies between graph generation
      and model assignment, breaking reproducibility of AML simulation runs
    stage_ids:
    - graph_construction
  - id: finance-C-012
    when: When presenting AMLSim generated data as research or compliance evidence
    action: Claim generated transaction graphs represent real financial transaction data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting synthetic AML simulation data as real transactions violates research integrity and could lead
      to regulatory compliance violations if used in actual AML investigations without proper disclosure
    stage_ids:
    - graph_construction
  - id: finance-C-013
    when: When evaluating graph generation quality or AML detection accuracy
    action: Assume backtest performance on synthetic data predicts live AML detection effectiveness
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Synthetic transaction patterns may not capture real-world evasion techniques, data quality issues, or temporal
      dynamics, leading to over-optimistic evaluation of detection algorithms that fail on actual financial crime data
    stage_ids:
    - graph_construction
  - id: finance-C-022
    when: When generating alert subgroups
    action: assign sequential alert_id values and store subgraph under correct alert_id key
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Alert IDs in transaction log must match alert_members.csv for joinability in downstream validation; mismatched
      IDs break data integrity for alert correlation
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-023
    when: When placing AML typology accounts
    action: use hub accounts (high-degree vertices) as main accounts for pattern centroids
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Non-hub main accounts create highly anomalous patterns that stand out artificially, defeating the purpose
      of realistic AML simulation
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-024
    when: When implementing ordered pattern types
    action: verify transaction dates fall within the generated start_date and end_date range
    severity: high
    kind: domain_rule
    modality: must
    consequence: Out-of-range dates cause validation failures and create invalid temporal patterns that do not match the intended
      alert typology period
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-026
    when: When generating cycle patterns
    action: apply margin_ratio to transfer amounts sequentially through each account in the cycle
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without sequential margin deduction, cycle amounts would remain constant instead of decreasing, violating
      the expected money laundering funnel behavior
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-027
    when: When selecting accounts for AML typology members
    action: allow hub accounts to be selected as main accounts for multiple patterns
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Hub account reuse across patterns causes overlapping suspicious activity that inflates detection metrics
      and creates duplicate SAR assignments
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-028
    when: When running AMLSim with large transaction graphs
    action: use networkx version 2.* or later
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: NetworkX 2.* has significant performance issues with large graph creation, causing excessive runtime or memory
      exhaustion during transaction graph generation
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-029
    when: When configuring AML typology generation
    action: verify sufficient hub account candidates exist relative to pattern count
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Insufficient hub accounts relative to alert pattern count causes ValueError at check_hub_exists and stops
      all pattern generation; solution requires lowering degree_threshold
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-030
    when: When generating external-bank AML patterns
    action: verify sub-bank has sufficient candidate accounts before attempting selection
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Pattern generation silently returns without placing the pattern if insufficient accounts exist in the target
      bank, causing incomplete alert coverage
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-031
    when: When presenting AMLSim output data
    action: claim synthetic AML patterns represent real-world money laundering behavior
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting synthetic transaction patterns as real AML cases misleads stakeholders about the system's actual
      detection capability on genuine suspicious activity
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-032
    when: When using AMLSim validation results
    action: present validation success rates as indicators of real-world detection performance
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: AMLSim validates that generated patterns match their parameters, but this does not guarantee equivalent detection
      rates on real financial crime patterns which have different characteristics
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-033
    when: When loading typology pattern names
    action: verify typology name is one of the supported alert_types keys
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Unknown typology names are skipped with a warning but the pattern count for that row is not retried, potentially
      leaving alert coverage below intended levels
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-034
    when: When marking accounts involved in alert patterns
    action: set IS_SAR_KEY attribute to True for every vertex participating in alert typologies
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing IS_SAR_KEY flag causes SAR account list generation to miss alerted accounts, breaking downstream
      compliance reporting requirements
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-035
    when: When specifying external-bank typology requirements
    action: require at least 2 banks to exist when bank_id is empty in pattern configuration
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Attempting external transactions without multiple banks causes KeyError when checking if bank exists, terminating
      pattern generation
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-036
    when: When implementing bipartite and stack patterns
    action: calculate originator and beneficiary account counts correctly (num_orig_accts = num_accounts // 2 for bipartite,
      num_accounts // 3 for stack)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect account count allocation causes insufficient accounts for one partition, breaking the expected
      multi-layer transaction structure
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-037
    when: When implementing gather_scatter pattern
    action: accumulate amounts from origin accounts and distribute equal amounts to beneficiary accounts
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-equal distribution breaks the expected gather-scatter money flow pattern and causes validation failures
    stage_ids:
    - alert_pattern_generation
  - id: finance-C-040
    when: When configuring the base_date parameter
    action: Set base_date to '2017-01-01' to match hardcoded fallback in days2date calculation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Transaction timestamps drift by years, causing all AML alert correlations to reference wrong date ranges
      and invalidating historical pattern analysis
    stage_ids:
    - log_conversion
  - id: finance-C-043
    when: When determining account organization type for SAR routing
    action: Return 'INDIVIDUAL' for account type 'I' and 'COMPANY' for all other types
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: SAR accounts misrouted to wrong entity tables, causing party enrichment queries to return empty results for
      legitimate SAR investigations
    stage_ids:
    - log_conversion
  - id: finance-C-046
    when: When generating Faker-based personal attributes
    action: Use 'en_US' locale for consistent US-style name and address generation
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Mixed locale attributes cause address parsing failures and inconsistent naming conventions across account
      records
    stage_ids:
    - log_conversion
  - id: finance-C-047
    when: When validating transaction log row integrity
    action: Skip rows with fewer columns than the header defines to prevent index out of bounds errors
    severity: high
    kind: domain_rule
    modality: must
    consequence: CSV reader raises IndexError on malformed rows, causing transaction conversion to crash with incomplete output
    stage_ids:
    - log_conversion
  - id: finance-C-048
    when: When presenting AMLSim converted outputs
    action: Claim synthetic transaction data represents real-world AML patterns or compliance-ready alerts
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Regulatory bodies may take enforcement action if synthetic data is presented as validated AML intelligence
      without proper disclosure
    stage_ids:
    - log_conversion
  - id: finance-C-049
    when: When outputting Faker-generated personal information
    action: Present Faker-generated names and SSNs as real personal identification data
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Data misuse if synthetic personal data is mistaken for actual PII, violating data handling policies and privacy
      expectations
    stage_ids:
    - log_conversion
  - id: finance-C-050
    when: When handling is_sar boolean to string conversion
    action: Write 'YES'/'NO' strings to IS_SAR column in sar_accounts.csv (not 'true'/'false')
    severity: high
    kind: domain_rule
    modality: must
    consequence: SAR filtering in downstream dashboards fails because 'YES'/'NO' values are expected but 'true'/'false' are
      written, causing zero SAR alerts detected
    stage_ids:
    - log_conversion
  - id: finance-C-051
    when: When reading prior_sar_count boolean field from accounts CSV
    action: Map prior_sar_count boolean through AccountDataTypeLookup.inputType before writing to output
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: SAR history field mismatches schema expectations, causing account risk scoring algorithms to receive invalid
      boolean values
    stage_ids:
    - log_conversion
  - id: finance-C-052
    when: When creating the Python Faker instance for name anonymization
    action: Seed Faker with deterministic value (Faker.seed(0)) for reproducible name generation
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Different Faker outputs across runs cause non-deterministic account names, breaking regression tests and
      reproducibility requirements
    stage_ids:
    - log_conversion
  - id: finance-C-058
    when: When validating alert subgraph structures
    action: check that the number of accounts falls within the specified min_accounts to max_accounts range
    severity: high
    kind: domain_rule
    modality: must
    consequence: Alert patterns with incorrect account counts will be incorrectly validated, causing the generated simulation
      to deviate from specified typology parameters
    stage_ids:
    - alert_validation
  - id: finance-C-059
    when: When validating alert subgraph structures
    action: check that the initial transaction amount falls within the specified min_amount to max_amount range
    severity: high
    kind: domain_rule
    modality: must
    consequence: Alert patterns with incorrect transaction amounts will be incorrectly validated, causing AML typologies to
      violate financial thresholds specified in simulation parameters
    stage_ids:
    - alert_validation
  - id: finance-C-060
    when: When validating alert subgraph structures
    action: check that the transaction period falls within the specified min_period to max_period range
    severity: high
    kind: domain_rule
    modality: must
    consequence: Alert patterns with incorrect transaction periods will be incorrectly validated, causing temporal characteristics
      of AML typologies to deviate from simulation parameters
    stage_ids:
    - alert_validation
  - id: finance-C-061
    when: When implementing or extending pattern validation logic
    action: introduce custom validation rules that diverge from the graph-theoretic property-based approach
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Text-based or regex matching approaches are less robust than graph-theoretic validation and may produce false
      positives or negatives in pattern matching
    stage_ids:
    - alert_validation
  - id: finance-C-062
    when: When validating scatter-gather patterns
    action: check that intermediate accounts have exactly one incoming edge and one outgoing edge (in-degree 1 and out-degree 1)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Intermediate accounts with incorrect vertex degrees indicate malformed scatter-gather structures that should
      not pass validation
    stage_ids:
    - alert_validation
  - id: finance-C-068
    when: When parsing schedule_id from alert parameter CSV
    action: 'convert schedule_id to boolean ordered flag: schedule_id > 0 means ordered, schedule_id == 0 means unordered'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect conversion of schedule_id will cause ordered vs unordered validation checks to be applied incorrectly,
      either missing required checks or adding invalid ones
    stage_ids:
    - alert_validation
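    # Illustrative reading of the schedule_id rule above (hypothetical values, not taken from a real parameter file):
    #   schedule_id = 0  ->  ordered = false
    #   schedule_id = 2  ->  ordered = true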
  - id: finance-C-069
    when: When running the AlertValidator class
    action: validate alerts before alert_transactions.csv has been generated by the transaction simulator
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Validating before the transaction file exists raises FileNotFoundError, and validation fails
      without producing results
    stage_ids:
    - alert_validation
  - id: finance-C-070
    when: When validating individual alerts via AlertValidator.validate_single
    action: raise KeyError if the requested alert_id does not exist in the loaded alert graphs
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Silent failure to handle non-existent alert IDs may cause misleading validation results in batch processing
    stage_ids:
    - alert_validation
  - id: finance-C-071
    when: When validating alert subgraph structures
    action: extract the initial amount from the transaction occurring on the start_date (earliest transaction)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using the wrong transaction for initial amount comparison will cause amount range validation to fail for
      valid patterns or pass for invalid ones
    stage_ids:
    - alert_validation
  - id: finance-C-073
    when: When reporting validation results
    action: log both successful matches with parameter line number and failed matches with mismatch reason
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Missing diagnostic information will make it difficult to debug validation failures and identify which parameter
      constraints were violated
    stage_ids:
    - alert_validation
  - id: finance-C-074
    when: When calculating transaction period for alert validation
    action: compute period as the number of days between start_date and end_date inclusive
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect period calculation (e.g., exclusive end_date) will cause valid patterns to fail or invalid patterns
      to pass validation
    stage_ids:
    - alert_validation
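    # Worked example of the inclusive period rule above, assuming whole-day dates (hypothetical values):
    #   start_date 2017-01-01, end_date 2017-01-10  ->  period = 9 elapsed days + 1 = 10
    #   an exclusive end_date would give 9 instead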
  - id: finance-C-075
    when: When validating alert patterns
    action: claim that validation results prove real-world AML detection effectiveness
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting synthetic simulation validation as evidence of real-world AML detection capability misrepresents
      the system's limitations
    stage_ids:
    - alert_validation
  - id: finance-C-076
    when: When generating validation reports
    action: present validation results as proof of financial crime detection capability
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: AML typology pattern validation only confirms synthetic data generation parameters, not the system's ability
      to detect actual money laundering
    stage_ids:
    - alert_validation
  - id: finance-C-077
    when: When interpreting validation failure messages
    action: dismiss validation failures as simulation artifacts rather than investigating root causes
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Attributing validation failures to simulation quirks without investigation may mask genuine bugs in pattern
      generation or validation logic
    stage_ids:
    - alert_validation
  - id: finance-C-078
    when: When extending AML typology support
    action: skip adding corresponding validation logic for newly added pattern types
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Unvalidated pattern types will allow invalid synthetic data to be generated, compromising the integrity of
      downstream ML training and evaluation
    stage_ids:
    - alert_validation
  - id: finance-C-086
    when: When using the combine_data script for batch combination runs
    action: provide an even number of command-line arguments (InputConfJSON and Repetitions pairs)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Script will exit with error code 1 and no data combination occurs, leaving incomplete datasets
    stage_ids:
    - data_combination
  - id: finance-C-087
    when: When aggregating degree statistics across multiple simulations
    action: accumulate degree counts from each simulation using Counter addition
    severity: high
    kind: domain_rule
    modality: must
    consequence: Degree distribution statistics will be incomplete, causing graph analysis tools to miscalculate node connectivity
      and miss high-degree suspicious accounts
    stage_ids:
    - data_combination
  - id: finance-C-089
    when: When processing the first alert member row in each simulation
    action: initialize last_alert_id to 0 before processing if it is None
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Alert ID offsetting will use None as offset, causing TypeError exceptions or silent ID corruption
    stage_ids:
    - data_combination
  - id: finance-C-090
    when: When skipping CSV header rows during data combination
    action: call next(reader) once before processing each input CSV file
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Header rows will be included as data rows, corrupting aggregated statistics and causing type conversion errors
    stage_ids:
    - data_combination
  - id: finance-C-091
    when: When validating combined dataset outputs for research purposes
    action: claim that combined synthetic data represents real-world transaction patterns
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Models trained on synthetic AMLSim data will not necessarily generalize to real AML detection, potentially
      wasting investigation resources on patterns that do not exist in actual financial crime
    stage_ids:
    - data_combination
  - id: finance-C-092
    when: When combining simulations that were generated with different random seeds
    action: expect the combined dataset to maintain temporal ordering across simulation boundaries
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Transaction timestamps from later simulations may overlap with or precede those from earlier simulations,
      breaking time-series analysis assumptions
    stage_ids:
    - data_combination
  - id: finance-C-093
    when: When using combine_data.py for very large-scale dataset creation
    action: load entire output CSV files into memory during append operations
    severity: medium
    kind: resource_boundary
    modality: should_not
    consequence: Memory consumption will grow linearly with combined dataset size, potentially raising MemoryError for
      multi-million row combinations
    stage_ids:
    - data_combination
  - id: finance-C-094
    when: When interpreting combined alert outputs for downstream AML analysis
    action: assume that alert_id uniqueness alone guarantees cross-simulation alert attribution
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Alert type, schedule_id, and bank_id fields from different simulations may reference the same conceptual
      alert pattern with different IDs after offset, causing analysis tools to miss related alerts
    stage_ids:
    - data_combination
  - id: finance-C-095
    when: When combining simulation outputs with repetitions parameter
    action: load each input simulation configuration exactly N times as specified by the repetitions argument
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Combined dataset will have incorrect simulation count, skewing statistical properties and reducing dataset
      diversity
    stage_ids:
    - data_combination
  - id: finance-C-097
    when: When passing account IDs from graph_construction to alert_pattern_generation
    action: Allow duplicate account IDs across different banks within the same simulation
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Alert validation will produce false matches when comparing transaction subgraphs against parameter definitions
  - id: finance-C-099
    when: When converting transaction timestamps from days to ISO format
    action: Use the base_date configuration parameter as the reference epoch (2017-01-01 default)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Alert validation will compute incorrect transaction periods, causing false negatives in pattern matching
  - id: finance-C-101
    when: When reading alert transactions CSV in alert_validation
    action: Parse date strings with ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Date parsing will raise ValueError, preventing validation from executing on any alert subgraph
  - id: finance-C-102
    when: When loading alert transaction subgraphs for validation
    action: Construct NetworkX DiGraph with edge attributes containing both amount and date fields
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Pattern validation functions will raise KeyError when accessing edge attributes for cycle/scatter-gather
      checks
  - id: finance-C-106
    when: When referencing degree sequences during alert validation
    action: Use degree.csv from the same simulation run as the alert parameter file
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Structural validation will compare alerts against mismatched degree distributions, producing false validation
      failures
  - id: finance-C-107
    when: When using Python NetworkX library for graph operations
    action: Use networkx version 2.x, which has performance issues with large-scale graph creation
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Graph construction will become extremely slow or run out of memory for large transaction networks (10K+ accounts)
  - id: finance-C-108
    when: When configuring the number of members in AML typologies
    action: Specify member count greater than 1 to avoid degenerate single-account patterns
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Typology generation will raise ValueError for insufficient member count, breaking the alert generation pipeline
  - id: finance-C-109
    when: When presenting backtest simulation results
    action: Claim that simulated transaction patterns represent real-world money laundering behavior
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Compliance teams may make incorrect regulatory decisions based on unrealistic synthetic data
  - id: finance-C-110
    when: When validating alert patterns against simulation parameters
    action: Assume that generated alerts perfectly match parameter specifications despite the variation introduced by random sampling
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Validation will report false mismatches for edge cases in random amount generation and temporal scheduling
  - id: finance-C-114
    when: When generating synthetic transaction data for AML analysis
    action: Present the generated synthetic data as real-world financial transaction data or claim it reflects actual banking
      activity
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users or organizations may use synthetic data in regulatory submissions or compliance reports, misrepresenting
      the nature of the dataset and violating financial reporting standards
  - id: finance-C-115
    when: When using AMLSim for compliance or regulatory purposes
    action: Claim that AMLSim-generated alerts or SAR flags are equivalent to real Suspicious Activity Reports or regulatory
      compliance findings
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Regulatory filings based on synthetic alerts may be rejected by authorities, leading to compliance violations
      and potential legal liability for the filing organization
  - id: finance-C-116
    when: When integrating AMLSim into operational transaction monitoring systems
    action: Use AMLSim outputs as inputs to real-time transaction monitoring, alerting, or blocking systems
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Real-time monitoring systems receiving synthetic data may generate false alerts, fail to detect actual suspicious
      activity, or block legitimate transactions based on simulated patterns
  - id: finance-C-117
    when: When interpreting simulation results for machine learning model training
    action: Claim that ML detection models trained on AMLSim synthetic data will perform equivalently on real-world transaction
      data without validation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: ML models may exhibit significant performance degradation when deployed on real data, leading to missed detections
      of actual money laundering activity and regulatory non-compliance
  - id: finance-C-118
    when: When converting transaction logs to CSV outputs
    action: Output SAR flag values as lowercase string 'true' or 'false' (matching the schema specification)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Downstream alert-processing systems expecting lowercase boolean strings may fail to correctly identify SAR-flagged
      transactions, causing incorrect compliance categorization
  - id: finance-C-119
    when: When representing in-memory transaction graphs
    action: Use the NetworkX DiGraph class for every in-memory graph representation (accounts as nodes, transactions as directed
      edges)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using MultiDiGraph for the main transaction graph may cause duplicate edge handling inconsistencies, while
      using undirected graphs loses transaction directionality critical for AML typology detection
  - id: finance-C-122
    when: When configuring the AMLSim system
    action: Set degree_threshold identically in both TransactionGenerator and Nominator instances
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Mismatched degree thresholds cause incorrect identification of main account candidates; fan-in/fan-out patterns
      are misclassified, corrupting AML typology simulation results
  - id: finance-C-123
    when: When initializing account nodes in the transaction graph
    action: Initialize each account vertex with a 'normal_models' list attribute
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: KeyError raised when Nominator methods attempt to access 'normal_models' attribute for filtering; AML typology
      assignment fails for accounts without initialized normal_models
  - id: finance-C-124
    when: When assigning AML typology roles to account candidates
    action: Remove assigned nodes from the opposite candidate list (fan-in assigned nodes must be removed from fan-out candidates)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Same account may be assigned multiple conflicting AML typology roles; simulation generates invalid nested
      or circular transaction patterns that do not match parameter definitions
  - id: finance-C-125
    when: When initializing the TransactionGenerator for simulation
    action: 'Execute initialization methods in the specified order: set_num_accounts -> generate_normal_transactions -> load_account_list
      -> load_normal_models -> build_normal_models -> set_main_acct_candidates -> load_alert_patterns -> mark_active_edges'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Dependency violations cause AttributeError or KeyError exceptions; for example, generating transactions before
      setting account count creates mismatched graph topology
  - id: finance-C-126
    when: When interpreting timestamp values in simulator outputs
    action: Treat each timestamp value as a day offset from base_date (default 2017-01-01), not as an absolute date or a Unix
      timestamp
    severity: high
    kind: domain_rule
    modality: must
    consequence: Misinterpretation of day offsets as Unix timestamps produces dates in year 1970 or beyond year 4000; misinterpretation
      as absolute dates produces incorrect temporal ordering of transactions
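    # Worked example of the day-offset rule above, assuming the default base_date of 2017-01-01:
    #   timestamp 0   ->  2017-01-01
    #   timestamp 30  ->  2017-01-31 (base_date plus 30 days), not 30 seconds after the Unix epoch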
  - id: finance-C-127
    when: When joining transaction and alert member datasets
    action: Verify Alert IDs in transaction log match those in alert_members.csv for joinability
    severity: high
    kind: domain_rule
    modality: must
    consequence: SQL or pandas join operations fail to match alert transactions with alert members; downstream compliance
      analysis cannot correlate transactions to suspicious accounts
  - id: finance-C-128
    when: When configuring AMLSim Python dependencies
    action: Use networkx version 1.11 specifically (version 2.* is not supported due to performance issues with large graph
      creation)
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Using networkx 2.* causes severe performance degradation or out-of-memory errors when generating transaction
      graphs with thousands of accounts; simulation may not complete
  - id: finance-C-129
    when: When creating base transaction graphs from degree sequences
    action: Use MultiDiGraph as intermediate representation in directed_configuration_model, then convert to DiGraph for TransactionGenerator
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Skipping MultiDiGraph intermediate step may cause NetworkX API incompatibilities; duplicate edges in MultiDiGraph
      are lost when converted to simple DiGraph, affecting transaction multiplicity
  - id: finance-C-132
    when: When validating alert transaction subgraphs
    action: Match generated alert subgraphs against parameter definitions to detect structural inconsistencies
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Undetected inconsistencies between generated patterns and parameter files produce invalid typologies; ML
      training data contains incorrectly structured transaction sequences
  - id: finance-C-133
    when: When implementing or refactoring the directed transaction graph generation logic
    action: Maintain the self-loop avoidance logic that swaps IDs to prevent self-referential edges in the generated graph
    severity: high
    kind: domain_rule
    modality: must
    consequence: Removing self-loop swap logic causes artificial self-loops in transaction graphs, distorting AML pattern
      analysis and producing unrealistic account-to-account relationships that bias detection algorithms toward false positives
      or negatives
    derived_from_bd_id: BD-001
  - id: finance-C-134
    when: When implementing hub node identification logic in AML transaction graph analysis
    action: Use OR semantics when checking if degree_threshold is crossed (check if in_degree >= threshold OR out_degree >=
      threshold) — must NOT use AND semantics that requires both in and out degree to exceed threshold
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using AND semantics for hub detection excludes legitimate one-sided hub accounts (high senders or high receivers
      only), reducing AML pattern coverage and missing detection opportunities for one-sided transaction patterns common in
      layering and structuring schemes
    derived_from_bd_id: BD-003
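    # Illustrative case for the OR rule above (hypothetical values): with degree_threshold = 10, an account with
    # in_degree = 12 and out_degree = 2 counts as a hub under OR semantics but would be excluded under AND semantics.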
  - id: finance-C-135
    when: When implementing fan-in or fan-out alert pattern generation in the Nominator
    action: Verify candidate sorting uses degree-based selection (out-degree for fan-in collection points, in-degree for fan-out
      distribution points) — verify that high-activity nodes are prioritized as aggregation points rather than using random
      selection
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using random selection instead of degree-based sorting creates unrealistic aggregation points with no outbound
      capability, generating AML alerts that appear anomalous to reviewers and reducing backtest fidelity for pattern detection
      systems
    derived_from_bd_id: BD-004
  - id: finance-C-136
    when: When implementing amount rounding logic for transaction generation
    action: Implement the adaptive step size algorithm (7-30 slots per range) to create non-uniform distribution favoring
      round numbers — verify step_size is between 7 and 30, and amounts align to step boundaries
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using uniform distribution or step sizes below 7 produces unrealistic transaction amounts that lack the natural
      clustering around round figures, causing generated AML alerts to appear artificial and fail pattern authenticity validation
    derived_from_bd_id: BD-082
  - id: finance-C-137
    when: When modifying pattern detection logic (cycle, scatter_gather, gather_scatter) in either the graph generator or
      validation module
    action: Verify identical pattern detection logic is maintained in both validation/validate_alerts.py and the graph generator
      — apply changes to both modules simultaneously to maintain detection consistency
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Modifying pattern detection in only one module creates divergence where validation flags patterns the generator
      missed or vice versa, causing inconsistent alert classification and breaking the independent verification capability
    derived_from_bd_id: BD-079
  - id: finance-C-139
    when: When performing cross-module date arithmetic involving logs and analytics
    action: Normalize base_date to a single consistent value before performing date arithmetic across modules; do not mix
      conf.json/convert_logs.py (2017-01-01) with network_analytics.py (1970-01-01) without explicit conversion
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using inconsistent base dates across modules produces incorrect duration calculations, causing transaction
      age and risk scoring errors that accumulate silently across pipeline boundaries
    derived_from_bd_id: BD-073
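    # Worked conversion for the rule above, assuming both modules count whole days from their own base_date:
    #   2017-01-01 is 17167 days after 1970-01-01, so a day offset d relative to 2017-01-01
    #   corresponds to d + 17167 relative to 1970-01-01.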
  - id: finance-C-140
    when: When implementing scatter-gather pattern validation logic
    action: Validate scatter-gather patterns with degree exactly 1 for intermediate nodes (neither sending nor receiving additional
      transactions), monotonically decreasing amounts through the chain, and chronological transaction order within each phase
    severity: high
    kind: domain_rule
    modality: must
    consequence: Loose validation accepts malformed scatter-gather patterns that don't represent real money laundering schemes,
      causing false positive alerts that waste investigation resources and dilute detection signal
    derived_from_bd_id: BD-055
  - id: finance-C-141
    when: When implementing model assignment logic for AML typology simulation
    action: Track remaining and used counts per typology type to ensure that specified model quantities match the allocation,
      preventing over- or under-assignment of patterns to simulation accounts
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Random assignment without per-type counters produces uncontrolled pattern distributions unsuitable for testing,
      causing validation failures and unreliable detection algorithm assessment
    derived_from_bd_id: BD-057
  - id: finance-C-142
    when: When implementing suspicious activity report (SAR) status checking logic
    action: Check alert is_sar status using sar_id > 0 comparison (positive integer), where sar_id equals 0 indicates no SAR
      filed and positive values indicate filed report IDs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using zero check (sar_id == 0) instead of positive integer check incorrectly marks null-SAR accounts as having
      filed suspicious activity reports, violating database nullable integer semantics and causing compliance violations
    derived_from_bd_id: BD-064
  - id: finance-C-143
    when: When implementing AML typology graph generation with hub accounting
    action: Call remove_typology_candidate BEFORE add_node in each typology generator - this ordering ensures hub accounting
      tracks candidates before node registration
    severity: high
    kind: domain_rule
    modality: must
    consequence: Reversing the order causes hub accounts to be miscounted and alerts to reference unregistered nodes, corrupting
      the transaction graph structure and breaking alert correlation logic
    derived_from_bd_id: BD-072
  - id: finance-C-144
    when: When implementing normal model account population for CSV export
    action: Populate normal_models list AFTER mark_active_edges sets edge attributes - the active flag drives CSV export filter
      and must be set before population
    severity: high
    kind: domain_rule
    modality: must
    consequence: Writing normal_models before mark_active_edges includes inactive accounts in exports, causing data quality
      issues where CSV files contain accounts without valid transaction patterns
    derived_from_bd_id: BD-081
  - id: finance-C-145
    when: When configuring cash transaction amount ranges for AML simulation
    action: Set cash-in amounts with normal range 50-100 and fraud range 500-1000 (10x normal), and reverse the ranges for
      cash-out - these thresholds create multi-dimensional fraud signatures essential for detection
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using uniform amount ranges for both normal and fraud transactions eliminates the characteristic volume increase
      signature, making transactions indistinguishable from legitimate cash activity and breaking detection algorithms
    derived_from_bd_id: BD-045
  - id: finance-C-146
    when: When implementing fan-in pattern generation for structuring detection
    action: Configure fan-in pattern with multiple originators sending to a single main account - this models smurfing schemes
      where individuals make sub-threshold deposits to avoid reporting
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using fan-out pattern (single originator to multiple destinations) instead reverses the money flow direction,
      causing detection algorithms to look for opposite convergence patterns and miss actual structuring activity
    derived_from_bd_id: BD-046
  - id: finance-C-147
    when: When implementing cycle pattern generation for sophisticated laundering detection
    action: Form transactions into ring structures using modulo arithmetic for deterministic paths, and decrement amounts
      at each hop via margin extraction so that final amounts differ from the initial amount
    severity: high
    kind: domain_rule
    modality: must
    consequence: Random cycle paths without modulo arithmetic or missing margin decrements cause funds to return unchanged
      to origin, misrepresenting laundering fund degradation through layering stages
    derived_from_bd_id: BD-050
  - id: finance-C-148
    when: When implementing scatter-gather pattern generation with temporal segmentation
    action: Split scatter-gather at midpoint date with scatter phase (originators to intermediaries) executing before gather
      phase (intermediaries to beneficiaries) - this creates two-phase temporal signature
    severity: high
    kind: domain_rule
    modality: must
    consequence: Implementing single-phase patterns instead of two-phase scatter-gather eliminates the temporal evasion dimension,
      causing detection systems to miss timing-based evasion techniques that rely on phase delays
    derived_from_bd_id: BD-051
  - id: finance-C-149
    when: When implementing gather-scatter pattern generation with reversed phase order
    action: Execute gather phase (originators to intermediaries) first, then scatter phase (intermediaries to beneficiaries)
      - the phase order is critical for creating mirror pattern to scatter-gather
    severity: high
    kind: domain_rule
    modality: must
    consequence: Reversing to scatter-first order makes the pattern identical to scatter-gather, creating a detection blind
      spot where collection-first schemes are never identified
    derived_from_bd_id: BD-052
  - id: finance-C-150
    when: When implementing graph construction logic in amlsim.nominator (Nominator stage)
    action: 'Maintain flow conservation invariants: the sum of in-degrees must equal the sum of out-degrees across all vertices,
      and num_accounts % len(sequence) == 0 must hold; graph construction must fail fast if these constraints are violated'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Violating flow conservation invariants causes Nominator failures (BD-071) and prevents directed graph generation
      entirely; backtest pipeline halts without generating transaction networks
    derived_from_bd_id: BD-090
  - id: finance-C-151
    when: When implementing multi-jurisdiction AML compliance reporting
    action: Assume the framework provides configurable CTR/SAR threshold handling per jurisdiction — the framework uses hardcoded
      thresholds that cannot accommodate jurisdictional variations
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Hardcoded CTR/SAR thresholds prevent deployment across multiple jurisdictions with different regulatory requirements,
      causing compliance violations in production environments where thresholds differ from the hardcoded values
    derived_from_bd_id: BD-GAP-017
  - id: finance-C-152
    when: When configuring AML threshold parameters for compliance reporting
    action: Implement jurisdiction-specific CTR/SAR threshold configuration with audit trail — externalize thresholds to configuration
      files with jurisdiction codes and maintain change history for regulatory audit purposes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without configurable thresholds, organizations cannot meet multi-jurisdiction AML requirements where CTR
      limits vary (e.g., the FinCEN CTR threshold is $10,000, while other jurisdictions set different limits) and regulators
      require documented threshold changes
    derived_from_bd_id: BD-GAP-017
  - id: finance-C-153
    when: When initializing the TransactionGraphGenerator component
    action: 'Execute initialization sequence exactly as: set_num_accounts -> generate_normal_transactions -> load_account_list
      -> load_normal_models -> build_normal_models -> set_main_acct_candidates -> load_alert_patterns -> mark_active_edges'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Violating the initialization order causes Nominator graph lookups to fail when normal_models lists are missing
      or accounts are uninitialized, leading to AttributeError cascades in the alert generation pipeline
    derived_from_bd_id: BD-066
  - id: finance-C-154
    when: When using ResultGraphLoader.count_hub_accounts() for analytics reporting
    action: Verify that the dual counting behavior (base + extension) is expected for the use case — callers should not assume
      this returns a simple hub account count as it includes both parent implementation and extended analytics counting
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Callers expecting a single hub account count will misinterpret the inflated value from dual counting, causing
      metric discrepancies in downstream reporting and potentially incorrect AML alert prioritization
    derived_from_bd_id: BD-070
  - id: finance-C-155
    when: When testing hub detection patterns at different threshold values
    action: Verify test configurations match production threshold values — validate that tests run with threshold=10 (production
      value) to guarantee correct behavior for hub-based pattern assignment
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Tests passing at threshold=3 do not guarantee correct behavior at threshold=10, creating false confidence
      where insufficient candidate pools for pattern assignment go undetected until production
    derived_from_bd_id: BD-086
  - id: finance-C-156
    when: When running alert generation under high volume conditions
    action: Monitor hub pool depletion rates and verify fallback behavior produces acceptable results — when hub pool exhausts,
      the fallback to lower-degree accounts may violate realism requirements for pattern blending
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Under high alert volumes, hub pool depletion causes fallback to lower-degree accounts that violate the realism
      requirement, creating obvious anomalies that real-world AML systems would detect and reject
    derived_from_bd_id: BD-087
  - id: finance-C-157
    when: When combining simulation runs with different schema versions
    action: Combine data from runs with varying schema versions without schema validation — BD-015 enforces consistency while
      BD-009 enables evolution, creating silent misinterpretation when schemas differ
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Schema evolution enabled by BD-009 combines with BD-015 consistency enforcement, causing silent data misinterpretation
      when simulation runs with different schema versions are combined
    derived_from_bd_id: BD-094
  - id: finance-C-158
    when: When implementing suspicious account classification for tiered AML monitoring
    action: Verify that boolean risk flags (country_risk, business_risk) are sufficient for the AML rule engine — if nuanced
      risk levels are needed, the architecture requires redesign as the system only supports discrete thresholds
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Boolean risk classification forces discrete categorization that breaks when nuanced risk levels (medium-high)
      are required for tiered monitoring, potentially missing suspicious activity that falls between binary thresholds
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-159
    when: When implementing hub account detection logic using degree threshold
    action: Verify that degree_threshold=4 matches the actual statistical outliers in degree distribution for the specific
      dataset being analyzed; adjust threshold based on the actual degree distribution rather than using the default value
      blindly
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using degree_threshold=4 without verification may identify incorrect hub accounts; in money laundering detection,
      misidentified hubs cause both false positives (unnecessary investigations) and false negatives (missed consolidation
      points), violating FATF compliance requirements
    derived_from_bd_id: BD-020
  - id: finance-C-162
    when: When using the framework's default margin ratio parameter for transaction amount generation
    action: Verify that DEFAULT_MARGIN_RATIO=0.1 matches the actual intermediary fee structure in the target laundering scenario,
      and adjust to reflect specific layering scheme economics if needed
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using 10% margin creates detectable decrement patterns across multi-hop chains; if actual intermediary fees
      differ, the generated transaction amounts will exhibit unrealistic margins that either overstate or understate laundering
      costs, compromising detection validation
    derived_from_bd_id: BD-021
  - id: finance-C-163
    when: When implementing transaction amount generation logic
    action: Verify that transaction amount rounding follows psychologically appealing patterns (multiples of 10, 100, 1000)
      as configured, and confirm the rounding strategy matches the target scenario's behavioral assumptions
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Rounding to round numbers creates realistic launderer behavior patterns that avoid obvious structuring thresholds;
      removing this rounding produces either unnaturally distributed amounts or constant-amount chains that fail to represent
      real transaction patterns
    derived_from_bd_id: BD-024
  - id: finance-C-164
    when: When implementing normal model subgraph edge generation
    action: Mark subgraph edges as active when they represent current-period transactions — active edges must be distinguishable
      from dormant historical edges to enable downstream pattern detection filtering
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without active edge marking, dormant historical transactions incorrectly match against current-period alert
      patterns, causing false positive alerts that trigger unnecessary investigator review and dilute detection system effectiveness
    derived_from_bd_id: BD-058
  - id: finance-C-165
    when: When implementing SAR account extraction logic during log conversion
    action: Use org_type lookup to classify SAR accounts before schema routing — verify individual and organizational SAR
      accounts are routed to their respective schemas to comply with reporting requirements
    severity: high
    kind: domain_rule
    modality: must
    consequence: Failing to classify SAR accounts by org_type causes schema routing violations where individual accounts receive
      organizational schemas or vice versa, resulting in non-compliant SAR reports that regulatory authorities will reject
    derived_from_bd_id: BD-011
  - id: finance-C-166
    when: When implementing alert validation logic that checks transaction patterns for AML detection
    action: Verify that the validation framework enforces strict chronological ordering of transactions; the transaction
      sequence must be validated as a temporal dependency, not merely checked for data presence
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without chronological ordering enforcement, AML typologies like layering sequences are not detected correctly;
      alerts for time-sensitive patterns generate false negatives, allowing suspicious transactions to pass undetected
    derived_from_bd_id: BD-013
  - id: finance-C-167
    when: When routing normal model alerts through the scheduling system
    action: Assume normal model alerts use the same dynamic CSV scheduling as AML typology patterns — normal model distribution
      is hardcoded to schedule_id=1 regardless of CSV parameters
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Hardcoded schedule_id=1 prevents multi-schedule simulation scenarios where normal activity distribution differs;
      analysts cannot route normal model alerts to alternative schedules, limiting backtesting flexibility for schedule-dependent
      strategies
    derived_from_bd_id: BD-095
  - id: finance-C-168
    when: When implementing schedule routing configuration for pattern distribution
    action: Use dynamic CSV scheduling configuration for AML typology patterns while acknowledging normal models require hardcoded
      schedule_id=1 — do not attempt to override normal model schedule routing via CSV
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Attempting to route normal model alerts through dynamic CSV causes routing conflicts; normal model alerts
      always default to schedule 1, so configuration changes for normal models in CSV have no effect
    derived_from_bd_id: BD-095
  - id: finance-C-169
    when: When processing transaction timestamps during graph_construction
    action: Assume the framework handles timezone conversion or UTC normalization automatically — timestamps are not explicitly
      annotated with timezone and may be treated as naive
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without explicit timezone annotation, transactions across multiple timezones are incorrectly sequenced in
      the graph; UTC-based systems may misalign events by hours, causing cycle detection algorithms to miss or incorrectly
      flag temporal patterns
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-170
    when: When constructing transaction graphs from multiple data sources with timestamps
    action: Annotate each timestamp with an explicit timezone identifier and normalize to UTC before graph construction;
      convert local timestamps using source timezone metadata and store them as UTC-aware datetime objects
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing UTC normalization causes cross-timezone transaction graphs to have incorrect temporal ordering; alerts
      relying on chronological sequences may trigger at wrong times or miss detection windows entirely
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-171
    when: When selecting historical data snapshots for graph_construction
    action: Assume the framework provides point-in-time data availability — historical queries return current-state data,
      not the state that existed at the query timestamp
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without point-in-time data, backtests use current entity states that include future changes unknown at the
      historical timestamp; this introduces look-ahead bias where alerts reference accounts or entities modified after the
      backtest date
    derived_from_bd_id: BD-GAP-008
  - id: finance-C-172
    when: When running historical backtests or validating alerts against past timestamps
    action: Query data using point-in-time semantics — use temporal query methods that return the entity state as it existed
      at the specified timestamp, filtering out records created or modified after that point
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using current-state data for historical backtests causes false positive alerts; entities that were valid
      at the historical timestamp but were subsequently closed or flagged appear as suspicious when they were not at that
      time
    derived_from_bd_id: BD-GAP-008
  - id: finance-C-173
    when: When implementing pattern validation logic for AML alert detection
    action: Use graph-theoretic algorithms (such as NetworkX simple_cycles for cycle detection) rather than regex or text-based
      pattern matching — validate patterns based on transaction graph structure
    severity: high
    kind: domain_rule
    modality: must
    consequence: Regex-based validation can be evaded by simple field value changes or formatting variations; suspicious transactions
      that modify field contents bypass detection while still exhibiting structurally suspicious patterns
    derived_from_bd_id: BD-012
  - id: finance-C-174
    when: When combining multiple data inputs in the data_combination pipeline
    action: Verify that all combined inputs share the same schema structure before processing; if schemas differ, the framework
      will silently load the schema from the first input only and may misinterpret subsequent data fields
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Silent schema mismatch causes the framework to load structure from the first input only, potentially misinterpreting
      field names and types in subsequent inputs and corrupting the combined dataset without raising errors
    derived_from_bd_id: BD-015
  - id: finance-C-175
    when: When using the framework's DEFAULT_MARGIN_RATIO parameter for transaction cycle simulation
    action: Verify that DEFAULT_MARGIN_RATIO=0.1 (10% fund retention) matches the actual regulatory requirement for intermediaries
      in cycle/scatter-gather patterns, and adjust if the mandated retention ratio differs in the target jurisdiction
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Hardcoded 0.1 margin ratio causes the simulation to under-flag or over-flag transaction cycles if the actual
      regulatory retention requirement differs, leading to validation results that don't match compliance expectations
    derived_from_bd_id: BD-067
  - id: finance-C-176
    when: When processing data in the graph_construction stage
    action: Assume the framework implements stale data detection or automatic data expiry — the framework does not include
      staleness checks; expired or outdated data is processed as current without warning
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without stale data detection, the framework processes outdated data as current, causing downstream analysis
      to use stale values and producing unreliable results in production systems
    derived_from_bd_id: BD-GAP-009
  - id: finance-C-177
    when: When managing data feeds in the graph_construction stage
    action: Implement a data staleness policy with configurable TTL (time-to-live) — add a timestamp or version field to each
      data record, and mark records as expired when current_time - timestamp exceeds the configured TTL threshold
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without a staleness policy, stale data continues to flow through the pipeline causing downstream systems
      to make decisions based on outdated information
    derived_from_bd_id: BD-GAP-009
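    # Illustrative sketch for finance-C-177 (hypothetical key names; the framework itself
    # ships no staleness check, so this is one possible policy shape, not an existing feature):
    #
    #   staleness_policy:
    #     timestamp_field: ingested_at   # UTC-aware timestamp stored on every data record
    #     ttl_seconds: <ttl>             # record is expired when current_time - ingested_at > ttl_seconds
    #     on_expired: reject             # alternative: flag the record and continue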
  - id: finance-C-178
    when: When managing model and data artifacts in production systems
    action: Assume the framework enforces model-data version consistency — the framework does not implement snapshot binding
      between model versions and their corresponding training/inference data versions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without version snapshot binding, models trained on old data can run against new data without validation,
      causing prediction quality degradation that accumulates silently in production
    derived_from_bd_id: BD-GAP-011
  - id: finance-C-179
    when: When registering or loading model artifacts in the graph_construction stage
    action: Implement version snapshot binding by storing model_version and data_version metadata together in the artifact
      registry, and validate that loaded model artifacts' data_version matches the target dataset's version before inference
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without version binding, models trained on outdated data continue serving predictions against new data distributions,
      causing prediction quality degradation that remains undetected until significant business impact occurs
    derived_from_bd_id: BD-GAP-011
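    # Illustrative sketch for finance-C-179 (hypothetical registry entry, not a framework schema):
    # the two versions are stored together and compared before inference.
    #
    #   artifact:
    #     model_version: <model version>
    #     data_version: <version of the dataset the model was built from>
    #   load_check: artifact.data_version == target_dataset.data_version   # refuse inference on mismatch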
  - id: finance-C-180
    when: When generating synthetic transaction data with cycle patterns or scatter-gather patterns for AML system training
    action: Introduce randomized margin ratios instead of fixed DEFAULT_MARGIN_RATIO=0.1; vary margin ratio stochastically
      (e.g., uniform[0.05, 0.15] or normally distributed) to prevent uniform 10% decrement signature detection
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Fixed 10% margin ratio creates uniform decrement signature across cycle and scatter-gather patterns; adversaries
      can identify synthetic data origin by the consistent 0.1 ratio, compromising AML system training validity
    derived_from_bd_id: BD-089
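    # Illustrative sketch for finance-C-180 (hypothetical keys; the framework default is the
    # fixed DEFAULT_MARGIN_RATIO=0.1). The bounds mirror the uniform[0.05, 0.15] example above.
    #
    #   margin_ratio:
    #     distribution: uniform
    #     low: 0.05
    #     high: 0.15
    #     seed: <integer>   # fix the seed so generated datasets stay reproducible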
  - id: finance-C-181
    when: When combining data from multiple input sources or simulation runs in the fraud detection pipeline
    action: Verify that all combined inputs share the same schema version before processing; implement schema validation
      checks that detect drift between the first-loaded schema and subsequent inputs
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: When inputs have different schema versions, the framework silently applies the first-loaded schema to all
      combined data, misinterpreting fields in subsequent inputs and causing silent data corruption in aggregated alerts
    derived_from_bd_id: BD-091
  - id: finance-C-182
    when: When implementing graph analysis algorithms for money laundering detection
    action: Use weakly connected component analysis to identify isolated transaction clusters representing distinct money
      laundering networks — do not replace with strongly connected components alone
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Replacing weakly connected components with strongly connected components misses direction-agnostic connectivity
      patterns in undirected graph views, causing isolated shell company networks and segmented operations to remain invisible
      to detection algorithms
    derived_from_bd_id: BD-039
  - id: finance-C-183
    when: When implementing money laundering pattern detection in transaction graphs
    action: Use deterministic fan-out pattern where a single main account sends to multiple beneficiaries — do not replace
      with random distribution recipients
    severity: high
    kind: domain_rule
    modality: must
    consequence: Replacing deterministic fan-out with random distribution breaks the reproducible test case structure and
      misses the single-source multi-destination anomalies that model the final laundering distribution stage
    derived_from_bd_id: BD-047
  - id: finance-C-184
    when: When implementing peer-to-peer layering pattern detection in transaction graphs
    action: Use even split between originators and beneficiaries in bipartite patterns — do not use uneven splits that create
      obvious hub accounts
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using uneven splits creates obvious hub accounts detectable by simple degree thresholds, breaking the balanced
      bipartite subgraphs that obscure the overall laundering flow by distributing activity symmetrically
    derived_from_bd_id: BD-048
  - id: finance-C-185
    when: When implementing three-tier layering pattern generation in transaction graphs
    action: Divide accounts into equal thirds for originator, intermediate, and beneficiary roles — do not use variable tier
      sizes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using variable tiers blurs the distinct role boundaries between placement, layering, and integration stages,
      causing the recognizable tiered structures representing classic three-tier laundering to become unrecognizable
    derived_from_bd_id: BD-049
  - id: finance-C-186
    when: When implementing alert validation for cycle pattern detection
    action: 'Enforce cycle-specific validation constraints: single cycle topology, chronological transaction ordering, and
      unique transaction amounts — do not use generic validation that lacks topological and temporal constraints'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using generic validation produces malformed synthetic cycles that do not match real-world ring structure
      characteristics, causing false-positive detections in money laundering cycle alerts
    derived_from_bd_id: BD-054
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-060 / Convert Logs to AML Simulation Data
    version: v5.3
    intent_keywords:
    - convert logs
    - synthetic data
    - AML simulation
    - generate transaction logs
    - test data generation
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (5 distinct values, balanced distribution)
      groups:
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 4
        ucs:
        - uc_id: UC-101
          name: Convert Logs to AML Simulation Data
          short_description: Convert transaction log files into synthetic AML simulation data for testing anti-money laundering
            detection systems
          sample_triggers:
          - convert logs
          - synthetic data
          - AML simulation
        - uc_id: UC-102
          name: Split Accounts by Bank ID
          short_description: Partition account CSV files by bank identifier for bank-specific analysis and processing
          sample_triggers:
          - split accounts
          - bank ID
          - partition data
        - uc_id: UC-103
          name: Combine AML Simulation Outputs
          short_description: Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
          sample_triggers:
          - combine outputs
          - merge data
          - AMLSim aggregation
        - uc_id: UC-104
          name: Generate Transaction Graph
          short_description: Generate the base transaction network graph used as input for AML simulation, defining account
            relationships and transaction patterns
          sample_triggers:
          - transaction graph
          - network generation
          - graph topology
      - group_id: research_analysis
        name: Research Analysis
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-105
          name: Generate Scale-Free Network Graph
          short_description: Generate scale-free network graphs using Kronecker graph algorithm for research on network topology
            and distribution analysis
          sample_triggers:
          - scale-free
          - Kronecker graph
          - network topology
      - group_id: monitoring
        name: Monitoring
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-106
          name: Plot Alert Pattern Subgraphs
          short_description: Visualize alert pattern subgraphs showing which accounts and transactions are involved in each
            generated alert for debugging and validation
          sample_triggers:
          - alert visualization
          - subgraph plot
          - alert debugging
        - uc_id: UC-112
          name: Analyze Transaction Networks
          short_description: Load AMLSim outputs and analyze transaction network characteristics including degree distribution,
            connected components, and graph properties
          sample_triggers:
          - network analysis
          - graph analytics
          - validation
        - uc_id: UC-113
          name: Validate AML Simulation Alerts
          short_description: Validate generated alerts against expected alert parameters to ensure AML simulation produces
            correct alert patterns and amounts
          sample_triggers:
          - validate alerts
          - alert verification
          - simulation accuracy
      - group_id: reporting
        name: Reporting
        description: ''
        emoji: 📋
        uc_count: 1
        ucs:
        - uc_id: UC-107
          name: Plot Transaction Distributions
          short_description: Generate statistical distribution plots (degree, amount, frequency) from transaction graphs for
            analysis and reporting
          sample_triggers:
          - distribution plot
          - statistics
          - degree distribution
      - group_id: builtin_factor
        name: Builtin Factor
        description: ''
        emoji: 🧮
        uc_count: 4
        ucs:
        - uc_id: UC-108
          name: Random Amount Generator
          short_description: Generate random transaction amounts within configurable min/max bounds for transaction simulation
          sample_triggers:
          - random amount
          - transaction generator
          - random number
        - uc_id: UC-109
          name: Account Nominator for Transaction Selection
          short_description: Select appropriate accounts for different transaction types (fan-in, fan-out, single, mutual,
            periodical) based on network degree thresholds
          sample_triggers:
          - account selection
          - nominator
          - transaction routing
        - uc_id: UC-110
          name: Rounded Amount Generator
          short_description: Generate rounded transaction amounts (e.g., 100, 500, 1000) to simulate realistic human transaction
            patterns
          sample_triggers:
          - rounded amount
          - realistic transaction
          - human pattern
        - uc_id: UC-111
          name: Normal Account Behavior Model
          short_description: Define and manage normal (non-suspicious) account behavior models including main accounts and
            member accounts for transaction simulation
          sample_triggers:
          - normal model
          - behavior model
          - account group
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try converting logs to AML simulation data
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try splitting accounts by bank ID
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try combining AML simulation outputs
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 13 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Combine AML Simulation Outputs
    - Split Accounts by Bank ID
    - Convert Logs to AML Simulation Data
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

T@clawhub-tangweigang-jpg-8679fec286

Alphalens Factor Analysis

Skill

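Analyzes the predictive power and forward-return characteristics of alpha factors, producing quantile-return, IC, and turnover reports to support factor research and event analysis in quantitative strategies.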

---
name: alphalens-factor-analysis
description: |-
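  Analyzes the predictive power and forward-return characteristics of alpha factors, producing quantile-return, IC, and turnover reports to support factor research and event analysis in quantitative strategies.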
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-120"
  compiled_at: "2026-04-22T13:00:58.879278+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
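# Alphalens Factor Analysis (alphalens-factor-analysis)

> Analyzes the predictive power and forward-return characteristics of alpha factors, producing quantile-return, IC, and turnover reports to support factor research and event analysis in quantitative strategies.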

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (6 total)

### Documentation Deployment (`UC-101`)
Automated build and deployment of project documentation to ensure consistent and reproducible documentation releases
**Triggers**: docs, deploy, build

### Sphinx Documentation Configuration (`UC-102`)
Configures the Sphinx documentation system with extensions for Python API documentation, Jupyter notebooks, and mathematical expressions
**Triggers**: sphinx, config, documentation

### PyFolio Portfolio Integration (`UC-106`)
Combines Alphalens factor analysis with PyFolio portfolio analytics to evaluate factor-derived portfolio performance, risk metrics, and tearsheet generation
**Triggers**: pyfolio, integration, portfolio

For all **6** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

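- **`AP-ZVT-183`**: Adjustment factors of inf/NaN go straight into multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API hits its quota are swallowed, so data goes silently missing
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table makes factor values drift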

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-120. Evidence verify ratio = 55.2% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source of truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When you need complete examples |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-120` blueprint at 2026-04-22T13:00:58.879278+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Event Study Analysis', 'Sphinx Documentation Configuration', 'Documentation Deployment', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

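### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>

When several models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same prediction signal. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.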

Source: https://github.com/microsoft/qlib/issues/1930

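### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which sets the range the normalizer is fitted on) and segments.train (which sets the range the model trains on). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.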

Source: https://github.com/microsoft/qlib/issues/2090
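
The coupling between the normalizer's fit window and the training segment can be checked before any dataset is built. The following is a minimal sketch that works on plain dictionaries; `check_fit_window` and the config values are hypothetical and not part of Qlib's API.

```python
from datetime import date


def check_fit_window(handler_kwargs: dict, segments: dict) -> None:
    """Raise if the normalizer's fit window extends past the training segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )


# Hypothetical config values, for illustration only.
handler_kwargs = {"fit_start_time": "2008-01-01", "fit_end_time": "2014-12-31"}
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}
check_fit_window(handler_kwargs, segments)  # passes silently
```

Running a check like this at config-load time turns a silent leak into an immediate error.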

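### `AP-QLIB-2036` — MACD factor formula is wrong in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs markedly from the correct one after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.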

Source: https://github.com/microsoft/qlib/issues/2036

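### `AP-QLIB-2184` — Custom A-share data imported without the conventional NaN fill on suspension days introduces noise into downstream factors <sub>(high)</sub>

Qlib's convention is that the open/close/high/low/volume/factor fields are all NaN on suspension days, so the framework can recognize and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price momentum factors produce "false signals" during the suspension (the price does not move but the factor is nonzero). Qlib does not validate this convention, so the error flows silently into the training data.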

Source: https://github.com/microsoft/qlib/issues/2184

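### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>

At initialization, Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() to fetch the Shanghai and Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset covering only some stocks, and backtests built on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).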

Source: https://github.com/microsoft/qlib/issues/1892

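### `AP-QLIB-2097` — Full-market instrument="all" hits OOM on a 32GB machine while CSI300 runs fine <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use with instrument="all" (5000+ stocks) is about 16 times that of instrument="csi300" (300 stocks). On a 32GB machine a full-market run goes OOM during init_instance_by_config, and the error message does not point to memory. Users easily mistake it for a configuration error when the real fix is batched loading or streaming feature computation.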

Source: https://github.com/microsoft/qlib/issues/2097

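### `AP-QLIB-1984` — The LightGBM model's label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises an obscure error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.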

Source: https://github.com/microsoft/qlib/issues/1984

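### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors raise Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.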

Source: https://github.com/microsoft/qlib/issues/1915

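### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel through joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from the D.features call, and users cannot tell from the stack trace whether the problem lies in the parallel backend or in the data.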

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

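### `AP-VNPY-3691` — The first bar from the bar generator has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with "the minute of the current tick" rather than "the end time of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters directly on datetime.minute % 5 in on_bar, the first bar happens to pass the filter but contains less than one full period of data, and using it for signal computation produces a wrong entry signal.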

Source: https://github.com/vnpy/vnpy/issues/3691

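### `AP-VNPY-3669` — Incremental saves of historical data in the Alpha module raise SchemaError when old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When saving bar data to Parquet files, the vnpy Alpha module runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the existing file (with historical Int64 columns). Polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (for example, volume is an integer in some feeds and a float in others) and there is no schema alignment step before the concat. This affects every historical-data build workflow that uses vnpy alpha for backtesting.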

Source: https://github.com/vnpy/vnpy/issues/3669

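### `AP-VNPY-3685` — The spread trading module's run_backtesting() fails silently under Jupyter, making results untrustworthy <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not execute, yet no exception is raised and the returned backtest statistics look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop, making this a hidden trap where backtest results look correct but are actually incomplete.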

Source: https://github.com/vnpy/vnpy/issues/3685

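### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy gets downgraded and breaks other dependencies <sub>(medium)</sub>

vnpy's install.bat installs directly into the system or conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as recent versions of scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In quant research environments where several tools coexist, vnpy's install script is a common source of global environment pollution.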

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

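### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices shown in the chart differ from actual market prices by roughly a factor of 4 (Apple's cumulative split factor), and users mistake the "4x price increase" for strategy returns. The A-share case is worse: the price jump around an ex-rights date shows up in unadjusted data as a huge "signal", luring technical indicators into false breakout signals on the ex-rights date.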

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

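### `AP-ZIPLINE-235` — Orders fill at the current bar's close by default, understating live slippage and inflating backtest returns <sub>(high)</sub>

After a signal fires on a bar, Zipline's default slippage model fills the order at that same bar's close (current bar close fill). In live trading the order can only fill near the next bar's open (T+1 order execution). For A-share daily bars, backtesting with close-price fills overstates the daily return by about 0.1-0.3% on average compared with next-day-open fills, and the annualized gap can exceed 30%. The slippage model must be configured explicitly as VolumeShareSlippage or FixedSlippage with a reasonable volume_limit.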

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
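
The difference between a same-bar close fill and a next-bar open fill can be made explicit in a few lines. This is a framework-independent sketch, not Zipline's slippage API; `next_bar_fills` and the bar values are hypothetical.

```python
def next_bar_fills(bars, signals):
    """Pair each signal bar with the next bar's open as the fill price.

    bars: list of dicts with "open" and "close"; signals: one bool per bar.
    Returns (signal_index, fill_index, fill_price) tuples. A signal on the
    last bar has no next bar and is left unfilled.
    """
    fills = []
    for i, fired in enumerate(signals):
        if fired and i + 1 < len(bars):
            fills.append((i, i + 1, bars[i + 1]["open"]))
    return fills


# Hypothetical daily bars: the signal on bar 0 fills at bar 1's open (11.5),
# not at bar 0's close (11.0).
bars = [
    {"open": 10.0, "close": 11.0},
    {"open": 11.5, "close": 12.0},
    {"open": 12.2, "close": 12.0},
]
signals = [True, False, True]
print(next_bar_fills(bars, signals))  # [(0, 1, 11.5)]
```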

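### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if start_session happens to be a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message only shows the calendar's first session and does not suggest changing the value to the first trading day. In the A-share case, with the SSE/SZSE calendars, a start_date that falls on the day after the last session before Spring Festival (a holiday) triggers the same kind of error, and debugging it is very costly.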

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
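
One way to avoid this class of error is to snap the requested start date to the first session on or after it before handing it to the framework. The sketch below assumes you already have a sorted list of session dates from whatever calendar you use; `first_session_on_or_after` and the sample sessions are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions, start):
    """Return the first trading session >= start.

    sessions must be sorted in ascending order.
    """
    i = bisect_left(sessions, start)
    if i == len(sessions):
        raise ValueError(f"{start} is after the last known session {sessions[-1]}")
    return sessions[i]


# Hypothetical session list; a real one would come from the trading calendar.
sessions = [date(1998, 1, 2), date(1998, 1, 5), date(1998, 1, 6)]
print(first_session_on_or_after(sessions, date(1998, 1, 1)))  # 1998-01-02
```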

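### `AP-ZIPLINE-181` — A stale asset db makes Pipeline report "no assets traded", sending users off to check the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records the start/end trading dates of every stock. With an old Quandl or self-built bundle that has not been re-ingested, backtesting a new date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if only the price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.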

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

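### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

The date_rules.week_start() and date_rules.week_end() rules of Zipline's schedule_function depend on the trading calendar's logic for the first and last session of a week. Under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation of the NYSE calendar, so the schedule either never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing a long holiday such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot detect the unfired schedule from the logs.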

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

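### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises an AssertionError deep in the stack <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), no error is raised at the entry point; instead, deep in algorithm execution it triggers AssertionError: Algorithm should have a utc datetime, and the deep stack makes it hard to locate. A-share developers importing data from local CST time fall into this trap very easily and need to call tz_localize('UTC') explicitly when registering the bundle.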

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
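
Normalizing timestamps at the entry point keeps naive datetimes from reaching the engine. The sketch below uses only the standard library and, as an assumption, interprets naive inputs as China Standard Time (fixed UTC+8) before converting; `to_utc` is a hypothetical helper, and whether naive values should be read as local time or as UTC depends on how your data was recorded.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time as a fixed UTC+8 offset (no daylight saving).
CST = timezone(timedelta(hours=8), "CST")


def to_utc(ts: datetime, local_tz: timezone = CST) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted as local_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=local_tz)
    return ts.astimezone(timezone.utc)


print(to_utc(datetime(2020, 1, 1, 9, 30)))  # 2020-01-01 01:30:00+00:00
```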

## zvt (6)

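### `AP-ZVT-183` — Adjustment factors of inf/NaN go straight into multiplication, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT calculates qfq_factor as the new/old price ratio. When old==0 (a new listing's first day or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. As a result the adjustment computation for the whole entity aborts and all subsequent bars are lost, but the main flow only logs an ERROR without stopping, so users often do not know that data for many stocks is already corrupted.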

Source: https://github.com/zvtvz/zvt/issues/183
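
A defensive version of the ratio and the multiplication could look like the sketch below. It is not ZVT's implementation; `adjust_factor` and `adjust_price` are hypothetical helpers that return None for unusable inputs, so the caller can skip a single bar instead of aborting the whole entity.

```python
import math
from typing import Optional


def adjust_factor(new_price: Optional[float], old_price: Optional[float]) -> Optional[float]:
    """Return new/old as an adjustment factor, or None when it is not usable."""
    if new_price is None or old_price is None or old_price == 0:
        return None
    factor = new_price / old_price
    # Reject inf and NaN so they never reach a multiplication.
    return factor if math.isfinite(factor) else None


def adjust_price(price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Apply the factor; a missing price or factor yields None instead of raising."""
    if price is None or factor is None:
        return None
    return price * factor


print(adjust_factor(10.0, 0.0))   # None
print(adjust_price(None, 2.0))    # None
print(adjust_price(5.0, adjust_factor(10.0, 5.0)))  # 10.0
```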

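### `AP-ZVT-179` — Exceptions raised after a third-party data API hits its quota are swallowed, so data goes silently missing <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: 已超过每日最大查询数量, i.e. the daily maximum query count has been exceeded). ZVT catches the exception and moves on to the next entity, so after the limit is hit that day's data is silently missing for every remaining stock. A backtest run on this incomplete database yields systematically biased factor values with no warning.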

Source: https://github.com/zvtvz/zvt/issues/179
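
An alternative to swallowing per-entity exceptions is to record them and abort once too many have accumulated. The sketch below is generic and does not use jqdatasdk or ZVT; `fetch_all`, `make_quota_limited_fetcher`, and the quota behaviour are hypothetical stand-ins.

```python
def fetch_all(entity_ids, fetch_one, max_failures=0):
    """Fetch every entity, recording failures instead of hiding them.

    Aborts with RuntimeError once more than max_failures entities have failed.
    """
    results, failures = {}, {}
    for entity_id in entity_ids:
        try:
            results[entity_id] = fetch_one(entity_id)
        except Exception as exc:
            failures[entity_id] = exc
            if len(failures) > max_failures:
                raise RuntimeError(
                    f"aborting after {len(failures)} failed entities; last error: {exc}"
                ) from exc
    return results, failures


def make_quota_limited_fetcher(quota):
    """Hypothetical fetcher that fails after `quota` successful calls."""
    calls = {"n": 0}

    def fetch_one(entity_id):
        calls["n"] += 1
        if calls["n"] > quota:
            raise RuntimeError("daily query quota exceeded")
        return [entity_id]

    return fetch_one


ids = ["stock_sh_600000", "stock_sh_600001", "stock_sh_600002"]
try:
    fetch_all(ids, make_quota_limited_fetcher(quota=2))
except RuntimeError as err:
    print(err)
```

With the default `max_failures=0` the first quota error stops the run, so a partially filled database is never mistaken for a complete one.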

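### `AP-ZVT-161` — Full-market batch factor computation on SQLite triggers a too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect, because the root cause is SQLite's cap on the number of variables. The correct fix is to query in batches, but early versions of ZVT did not handle this boundary.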

Source: https://github.com/zvtvz/zvt/issues/161
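
Batched querying can be sketched with the standard-library sqlite3 module. The table name `kdata`, its columns, and `query_in_chunks` are hypothetical; the point is only that each statement binds at most `chunk_size` variables.

```python
import sqlite3


def query_in_chunks(conn, entity_ids, chunk_size=500):
    """Run the IN query in batches so no statement exceeds the variable limit."""
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# In-memory demo with 5000 hypothetical entity IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(eid, 10.0) for eid in ids])
rows = query_in_chunks(conn, ids)
print(len(rows))  # 5000
```

A chunk size of 500 stays below the 999-variable default mentioned above with room to spare.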

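### `AP-ZVT-129` — Wildcard imports hide API changes between versions, and enums such as AdjustType vanish without explanation <sub>(medium)</sub>

ZVT's documentation examples use `from zvt import *` to import every symbol. After a ZVT upgrade refactors the enums (for example, moving AdjustType into a submodule), the wildcard import no longer includes the symbol and triggers AttributeError. Users assume an installation problem, when it is actually a breaking API change between versions that was not flagged in the CHANGELOG, and the wildcard import obscures where the symbol came from. Import enum classes explicitly.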

Source: https://github.com/zvtvz/zvt/issues/129

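### `AP-ZVT-187` — The backtest engine does not terminate early when the data layer returns nothing, causing a cascade of null-reference crashes <sub>(medium)</sub>

When ZVT's Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame into the selector computation, triggering a chain of crashes from subsequent NoneType operations. The stack trace is deep and the root cause is hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.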

Source: https://github.com/zvtvz/zvt/issues/187

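### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table makes factor values drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users mix two of them when computing price momentum or moving-average factors (for example, unadjusted prices for the moving average and backward-adjusted prices for returns), which makes factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.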

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-120--alphalens-reloaded
**Scan date**: 2026-04-22
**Stats**: {'total_files': 4, 'total_classes': 32, 'total_functions': 0, 'total_stages': 4}

## Modules (4)

- [data_preparation_&_alignment](components/data_preparation_-_alignment.md): 8 classes
- [performance_&_risk_metrics](components/performance_-_risk_metrics.md): 9 classes
- [plotting_&_visualization](components/plotting_-_visualization.md): 8 classes
- [tear_sheet_reporting](components/tear_sheet_reporting.md): 7 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 136
  fatal_constraints_count: 31
  non_fatal_constraints_count: 129
  use_cases_count: 6
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

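- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, information that only becomes known after t must not be used. The most common forms: (1) computing the signal from the close and filling at the same day's close; (2) stamping an indicator computed after day T's close on that same bar; (3) assuming fills at the day's high/low. Signal computation and fill time must be aligned: compute the signal after day T's close and fill at day T+1's open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period is required to equal the longest lookback, and positions must be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series data splits for ML/DL models must follow time order, TRAIN < VALID < TEST; random k-fold splitting is not allowed (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily-bar backtest that one can sell at the day's high or buy at the day's low (e.g. a momentum strategy that "takes profit at the high") is plain look-ahead, because the intraday high/low is only confirmed after the close. The fill price can only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in the code and verify the key time-alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials rises with the number of backtests even when the true Sharpe ratio is zero. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.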
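- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates the strategy's historical returns. The backtest universe must use point-in-time snapshots.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample, and out-of-sample data is used only for final validation. Do not keep tuning after "looking" at the out-of-sample data several times (that turns it into new in-sample data and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended/missing data: prices on suspension days must not simply be forward-filled with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation in the backtest. Filter out missing trading days explicitly at the factor computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.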
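- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics of a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then suffer serious losses in live trading as fill prices deteriorate. At minimum configure a fixed spread or proportional slippage; large orders should use a volume-share model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to the cost assumption, and every 10bps change in cost can flip the conclusion on profitability, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual number of shares held (rounded) under a fixed amount of capital instead of assuming fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.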
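- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Unified timestamp timezone: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. All timestamps must be converted to a single timezone at the pipeline entry (recommended: store in UTC, display in the market's local timezone); different timezones must not be mixed mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after that date. Re-downloading and appending existing data creates rows with duplicate timestamps and breaks the time order in backtests. Verify before and after each update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Choose a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark return of the holdings.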
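- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed on the net-value series (portfolio value), not on a cumulative return series. Summing log returns and reading the result as a simple return misstates the drawdown depth, because a log return is more negative than the corresponding simple return on a decline.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, whereas return distributions in A-share and crypto markets are markedly left-skewed (fat-tailed), so it understates downside risk. Quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should be decomposed into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".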
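- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and is unsuited to monthly rebalancing, and vice versa. Using the wrong holding period seriously damages live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires a t-stat > 3.0 (rather than the traditional 1.96), or independent replication across periods/markets, or a clear economic rationale. Factors meeting none of these conditions are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have negative net IC after trading costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.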
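- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns absorb industry rotation returns, and it becomes hard to tell whether the factor itself or industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor is highly correlated with market cap, stock selection leans systematically toward small caps and the return comes from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values that distort quantile analysis (e.g. Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping, and only then rank/neutralize.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Before combining, apply Gram-Schmidt orthogonalization or PCA to the factors to remove multicollinearity among them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaN in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean introduces look-ahead bias (the mean itself contains future information); dropping them entirely creates survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.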
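- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric rather than the plain long-only return. The monotonicity test of long Q1 / short Q5 is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, and range-bound markets) is important evidence of robustness. A factor whose IC holds only in one particular market state is unsuited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate a large amount of wasted trading cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of quantile returns (bootstrap test): a quantile return spread (Q1-Q5) can be large on historical data and still be due to chance; test significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.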
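- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability check: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures/crypto, requires independent IC validation; cross-market generality must not be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Temporal stability of factor validity: factors that once worked gradually stop working as the market learns and arbitrages them away (McLean & Pontiff 2016 show that factors decay by 58% on average after publication). Re-evaluate factor IC regularly (quarterly/yearly) and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the interest-rate cycle, business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's optimal environment.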

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
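
A format check for this lock can be sketched with a regular expression. The pattern below is an assumption inferred from the example IDs in this document (stock_sh_600000, stockus_nasdaq_AAPL); the allowed character classes and the `parse_entity_id` helper are hypothetical, not ZVT's own validation.

```python
import re

# entity_type and exchange: lowercase letters/digits; code: letters, digits, dots.
ENTITY_ID_RE = re.compile(
    r"^(?P<entity_type>[a-z0-9]+)_(?P<exchange>[a-z0-9]+)_(?P<code>[A-Za-z0-9.]+)$"
)


def parse_entity_id(entity_id: str) -> dict:
    """Split an entity ID into its three parts or raise ValueError."""
    match = ENTITY_ID_RE.match(entity_id)
    if match is None:
        raise ValueError(f"not in entity_type_exchange_code format: {entity_id!r}")
    return match.groupdict()


print(parse_entity_id("stock_sh_600000"))
# {'entity_type': 'stock', 'exchange': 'sh', 'code': '600000'}
```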

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
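
The "exactly one of" rule can be enforced at construction time. The dataclass below is a hypothetical stand-in that only reuses the three field names from the lock; it is not ZVT's TradingSignal class.

```python
from dataclasses import dataclass
from typing import Optional

_SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SignalSizing:
    """Hypothetical container holding exactly one sizing instruction."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        provided = [name for name in _SIZING_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {_SIZING_FIELDS} must be set, got {provided or 'none'}"
            )


ok = SignalSizing(position_pct=0.2)  # valid: one field set
try:
    SignalSizing(position_pct=0.2, order_money=10000.0)
except ValueError as err:
    print(err)
```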

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **6**

## `KUC-101`
**Source**: `docs/deploy.py`

Automated build and deployment of project documentation to ensure consistent and reproducible documentation releases.

## `KUC-102`
**Source**: `docs/source/conf.py`

Configures the Sphinx documentation system with extensions for Python API documentation, Jupyter notebooks, and mathematical expressions.

## `KUC-103`
**Source**: `docs/source/notebooks/event_study.ipynb, src/alphalens/examples/event_study.ipynb`

Identifies and analyzes specific market events (e.g., price crossing thresholds) to study their predictive power and forward return characteristics.

## `KUC-104`
**Source**: `docs/source/notebooks/intraday_factor.ipynb, src/alphalens/examples/intraday_factor.ipynb`

Analyzes factors across multiple market sectors (11 GICS sectors) to evaluate cross-sector factor performance and sector-specific factor behavior.

## `KUC-105`
**Source**: `docs/source/notebooks/overview.ipynb, src/alphalens/examples/overview.ipynb`

Provides a comprehensive introduction to Alphalens capabilities for factor analysis, including data preparation, factor computation, and performance visualization.

## `KUC-106`
**Source**: `docs/source/notebooks/pyfolio_integration.ipynb, src/alphalens/examples/pyfolio_integration.ipynb`

Combines Alphalens factor analysis with PyFolio portfolio analytics to evaluate factor-derived portfolio performance, risk metrics, and tearsheet generation.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

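## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies and data sources in one cerebro.run(). zvt's StockTrader currently binds only one set of factors per instance and lacks a unified orchestration layer for multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.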

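## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changes to strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users obtain a risk-adjusted return report with a call like cerebro.addanalyzer(SharpeRatio).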

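## `CW-BT-003` — Sizer: position sizing kept separate
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares, or what fraction of capital, to buy on each entry) into a separate Sizer that is fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.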

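## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration, fuller order-type support would make backtests substantially more realistic.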

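## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-level data (such as 1 min) into a higher level (such as 1 day) on the fly and drive strategies in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and has no runtime resampling; borrowing this pattern would support backtests over any combination of time granularities without recording the data twice.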

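## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy equity curve, maximum drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt currently visualizes backtest results through the draw_result method, which calls Plotly, but has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.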

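## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0's vnpy.alpha.lab provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.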

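## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection can be biased by financial data from the future; introducing the PIT pattern would make backtests much more credible.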
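
The point-in-time idea can be sketched in a few lines of plain Python. This is a minimal illustration, not qlib's or zvt's implementation: the `FinancialRecord` fields, the `latest_known` helper, and the sample figures are all hypothetical and made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FinancialRecord:
    """One reported figure; field names are illustrative, not a real schema."""
    entity_id: str
    report_period: date      # end of the accounting period
    announcement_date: date  # day the figure became public
    roe: float


def latest_known(
    records: list[FinancialRecord], entity_id: str, as_of: date
) -> Optional[FinancialRecord]:
    """Return the newest record that had already been published on `as_of`."""
    visible = [
        r for r in records
        if r.entity_id == entity_id and r.announcement_date <= as_of
    ]
    if not visible:
        return None
    # Newest period first; for a restated period, the latest announcement wins.
    return max(visible, key=lambda r: (r.report_period, r.announcement_date))


# Made-up sample data: the annual figure is published months after period end.
records = [
    FinancialRecord("stock_sh_600000", date(2023, 9, 30), date(2023, 10, 28), 0.081),
    FinancialRecord("stock_sh_600000", date(2023, 12, 31), date(2024, 4, 27), 0.094),
]
```

Querying with `as_of=date(2024, 1, 15)` returns the Q3 record even though the annual period has already ended, because the annual figure was not yet public on that day. Filtering on `report_period` instead would leak it.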

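## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of every training run and supports cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of each factor experiment for quick reproduction and version comparison.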

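## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling intraday execution algorithms such as TWAP and VWAP rebalancing. zvt backtests currently assume daily-bar fills only and do not model execution algorithms; borrowing the nested framework would let zvt separate "which stocks to hold and when" from "how to build the position at minimum impact cost".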

FILE:references/components/data_preparation_-_alignment.md
# data_preparation_&_alignment (8 classes)

## `utils.get_clean_factor_and_forward_returns`
`data_preparation_&_alignment/utils-get-clean-factor-and-forward-retur.py:0`

## `utils.compute_forward_returns`
`data_preparation_&_alignment/utils-compute-forward-returns.py:0`

## `utils.quantize_factor`
`data_preparation_&_alignment/utils-quantize-factor.py:0`

## `utils.demean_forward_returns`
`data_preparation_&_alignment/utils-demean-forward-returns.py:0`

## `utils.infer_trading_calendar`
`data_preparation_&_alignment/utils-infer-trading-calendar.py:0`

## `forward_returns_computation`
`data_preparation_&_alignment/forward-returns-computation.py:0`

## `binning_strategy`
`data_preparation_&_alignment/binning-strategy.py:0`

## `zscore_filter`
`data_preparation_&_alignment/zscore-filter.py:0`

FILE:references/components/performance_-_risk_metrics.md
# performance_&_risk_metrics (9 classes)

## `performance.factor_information_coefficient`
`performance_&_risk_metrics/performance-factor-information-coefficie.py:0`

## `performance.factor_weights`
`performance_&_risk_metrics/performance-factor-weights.py:0`

## `performance.factor_returns`
`performance_&_risk_metrics/performance-factor-returns.py:0`

## `performance.factor_alpha_beta`
`performance_&_risk_metrics/performance-factor-alpha-beta.py:0`

## `performance.mean_return_by_quantile`
`performance_&_risk_metrics/performance-mean-return-by-quantile.py:0`

## `performance.factor_rank_autocorrelation`
`performance_&_risk_metrics/performance-factor-rank-autocorrelation.py:0`

## `IC_computation`
`performance_&_risk_metrics/ic-computation.py:0`

## `weighting_scheme`
`performance_&_risk_metrics/weighting-scheme.py:0`

## `portfolio_type`
`performance_&_risk_metrics/portfolio-type.py:0`

FILE:references/components/plotting_-_visualization.md
# plotting_&_visualization (8 classes)

## `plotting.plot_ic_ts`
`plotting_&_visualization/plotting-plot-ic-ts.py:0`

## `plotting.plot_ic_hqq`
`plotting_&_visualization/plotting-plot-ic-hqq.py:0`

## `plotting.plot_quantile_returns_bar`
`plotting_&_visualization/plotting-plot-quantile-returns-bar.py:0`

## `plotting.plot_cumulative_returns`
`plotting_&_visualization/plotting-plot-cumulative-returns.py:0`

## `plotting.plot_turnover_table`
`plotting_&_visualization/plotting-plot-turnover-table.py:0`

## `plotting.plot_event_returns`
`plotting_&_visualization/plotting-plot-event-returns.py:0`

## `plotting_style`
`plotting_&_visualization/plotting-style.py:0`

## `context`
`plotting_&_visualization/context.py:0`

FILE:references/components/tear_sheet_reporting.md
# tear_sheet_reporting (7 classes)

## `tears.create_full_tear_sheet`
`tear_sheet_reporting/tears-create-full-tear-sheet.py:0`

## `tears.create_summary_tear_sheet`
`tear_sheet_reporting/tears-create-summary-tear-sheet.py:0`

## `tears.create_returns_tear_sheet`
`tear_sheet_reporting/tears-create-returns-tear-sheet.py:0`

## `tears.create_information_tear_sheet`
`tear_sheet_reporting/tears-create-information-tear-sheet.py:0`

## `tears.create_turnover_tear_sheet`
`tear_sheet_reporting/tears-create-turnover-tear-sheet.py:0`

## `tears.create_event_returns_tear_sheet`
`tear_sheet_reporting/tears-create-event-returns-tear-sheet.py:0`

## `tear_sheet_type`
`tear_sheet_reporting/tear-sheet-type.py:0`


Akshare Financial Data

Skill

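Fetches real-time quotes, historical K-lines, financial statements, and fund and futures data for China's A-share market, with queries across stocks, bonds, options, and other instruments.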

---
name: akshare-financial-data
description: |-
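  Fetches real-time quotes, historical K-lines, financial statements, and fund and futures data for China's A-share market, with queries across stocks, bonds, options, and other instruments.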
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-079"
  compiled_at: "2026-04-22T13:00:30.352072+00:00"
  capability_markets: "cn-astock"
  capability_activities: "data-sourcing"
  sop_version: "crystal-compilation-v6.1"
---
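# AkShare Financial Data (akshare-financial-data)

> Fetches real-time quotes, historical K-lines, financial statements, and fund and futures data for China's A-share market, with queries across stocks, bonds, options, and other instruments.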

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (1 total)

### Sphinx Documentation Configuration for Akshare (`UC-101`)
Sets up the Sphinx documentation builder with Chinese language support (via ctex), Markdown parsing via recommonmark, and automatic version string ext
**Triggers**: documentation, sphinx, docs build

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-079. Evidence verify ratio = 30.6% and audit fail total = 41. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative data (source-of-truth) | Required reading when behavior or a decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a full example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-079` blueprint at 2026-04-22T13:00:30.352072+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Sphinx Documentation Configuration for Akshare', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-070--edgartools (2)

### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>

Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.

### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>

SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.

## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>

Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.

## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>

SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.

## finance-bp-079--akshare (4)

### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>

HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.

### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>

Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
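
A minimal sketch of a guard against both failure modes, using only the standard library. `parse_json_payload` and `EmptyPayloadError` are hypothetical names introduced here for illustration; they are not part of akshare.

```python
import json
from typing import Any


class EmptyPayloadError(ValueError):
    """Raised when a body parses as JSON but carries no data (hypothetical name)."""


def parse_json_payload(text: str) -> Any:
    """Parse a response body, rejecting malformed and empty payloads explicitly."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        # Re-raise with context instead of letting a bare decode error escape.
        raise ValueError(f"malformed JSON at position {exc.pos}") from exc
    # null, {}, [] and "" all parse cleanly but mean "no data".
    if payload is None or payload == {} or payload == [] or payload == "":
        raise EmptyPayloadError("response parsed but contained no data")
    return payload
```

Raising on an empty payload forces the caller to decide whether "no rows" is legitimate for that endpoint, rather than letting an empty DataFrame flow silently into analysis.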

### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>

Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.
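
One way to keep symbol formatting in a single place is a small converter like the sketch below. It is an assumption-laden illustration: the prefix rule (6 → Shanghai, 0 or 3 → Shenzhen) is a simplification that covers common A-share stock codes only and is wrong for indices and other boards, and the format names `"prefixed"`, `"numeric"`, and `"entity_id"` are labels invented for this example.

```python
def exchange_of(code: str) -> str:
    """Guess the exchange of a six-digit A-share stock code (simplified rule)."""
    if len(code) != 6 or not code.isascii() or not code.isdigit():
        raise ValueError(f"expected a six-digit code, got {code!r}")
    if code[0] == "6":
        return "sh"
    if code[0] in ("0", "3"):
        return "sz"
    # Refuse to guess for prefixes this sketch does not cover.
    raise ValueError(f"no exchange rule for code {code!r}")


def to_source_symbol(code: str, style: str) -> str:
    """Format a bare code for one target convention, failing on unknown styles."""
    exchange = exchange_of(code)
    if style == "prefixed":   # e.g. "sh600000", the sh/sz-prefix style in the text
        return f"{exchange}{code}"
    if style == "numeric":    # bare numeric code
        return code
    if style == "entity_id":  # entity_type_exchange_code, as in stock_sh_600000
        return f"stock_{exchange}_{code}"
    raise ValueError(f"unknown symbol style: {style!r}")
```

Raising on unknown prefixes and styles, instead of passing the input through, turns a silent wrong-ticker lookup into an immediate error.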

### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>

Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
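
Mapping by field name rather than by position avoids the length-mismatch failure altogether. The sketch below uses plain dictionaries so it runs without pandas; the `FIELD_NAMES` mapping and its English target names are illustrative assumptions, not the actual field definitions of any data source.

```python
# Hypothetical mapping from raw field keys to readable column names.
FIELD_NAMES = {"f2": "latest_price", "f3": "pct_change", "f12": "code"}


def rename_records(records: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Rename raw field keys by name, failing loudly on any mismatch."""
    renamed = []
    for row in records:
        missing = mapping.keys() - row.keys()   # expected but absent
        unmapped = row.keys() - mapping.keys()  # present but not mapped
        if missing or unmapped:
            raise ValueError(
                f"column mismatch: missing={sorted(missing)}, "
                f"unmapped={sorted(unmapped)}"
            )
        renamed.append({mapping[key]: value for key, value in row.items()})
    return renamed
```

The error message names the offending fields, which is usually enough to tell whether the upstream API added a column or the mapping constant went stale.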

## finance-bp-103--ArcticDB (3)

### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>

ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.

### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>

Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.

### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>

Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.

## finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>

8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
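
Selecting the numbering scheme from the filing date might look like the sketch below. The cutoff date comes from the text above; treating a filing dated exactly on the cutoff as using the new scheme is an assumption, and the regular expressions are simplified illustrations rather than edgar-crawler's actual patterns.

```python
import re
from datetime import date

NEW_SCHEME_START = date(2004, 8, 23)  # cutoff quoted in the text

ITEM_PATTERNS = {
    # Legacy items are whole numbers ("Item 5."), never followed by ".dd".
    "legacy": re.compile(r"\bItem\s+(\d{1,2})(?!\.\d|\d)", re.IGNORECASE),
    # Current items look like "Item 5.02".
    "current": re.compile(r"\bItem\s+(\d\.\d{2})\b", re.IGNORECASE),
}


def item_scheme(filing_date: date) -> str:
    """Pick the 8-K item numbering scheme for a filing date."""
    return "legacy" if filing_date < NEW_SCHEME_START else "current"


def find_items(text: str, filing_date: date) -> list[str]:
    """Return the item numbers found in `text` under the applicable scheme."""
    return ITEM_PATTERNS[item_scheme(filing_date)].findall(text)
```

Applying the legacy pattern to a modern filing returns an empty list rather than partial matches, which makes a wrong scheme choice easy to detect.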

## finance-bp-128--yfinance (2)

### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>

Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.

### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>

Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-079--akshare
**Scan date**: 2026-04-22
**Stats**: {'total_files': 10, 'total_classes': 18, 'total_functions': 0, 'total_stages': 10}

## Modules (10)

- [http_request_layer](components/http_request_layer.md): 3 classes
- [source-specific_data_acquisition](components/source-specific_data_acquisition.md): 3 classes
- [html_table_extraction](components/html_table_extraction.md): 2 classes
- [json_response_parsing](components/json_response_parsing.md): 1 class
- [column_name_standardization](components/column_name_standardization.md): 2 classes
- [data_type_conversion](components/data_type_conversion.md): 1 class
- [paginated_data_fetching](components/paginated_data_fetching.md): 1 class
- [trading_calendar_validation](components/trading_calendar_validation.md): 2 classes
- [price_adjustment_processing](components/price_adjustment_processing.md): 1 class
- [realized_volatility_calculation](components/realized_volatility_calculation.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 158
  fatal_constraints_count: 30
  non_fatal_constraints_count: 198
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (47)

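- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share stocks settle T+1: shares bought on day T can be sold on T+1 at the earliest, while proceeds from a sale on day T can be used to buy again the same day. A backtesting framework that does not enforce the T+1 position lock overstates turnover and win rate, which especially undermines the realism of intraday reversal strategies.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: Shanghai and Shenzhen main-board stocks have a daily price limit of ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up, buy orders cannot be filled; when it is locked at limit-down, sell orders cannot be filled. A backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: The STAR Market and ChiNext (after the August 2020 reform) have a ±20% limit on normal trading days; the Beijing Stock Exchange has ±30%; new listings have no price limit for the first 5 trading days. A backtest that applies a uniform ±10% filter to all stocks wrongly excludes or wrongly includes fills on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily price limit and very poor liquidity, so their fill assumptions must not be mixed with those for normal stocks. Leaving historical ST stocks (eventually delisted) out of the backtest introduces survivorship bias; including them without the ST price limit simulates fills incorrectly.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the price is set by the maximum-volume principle rather than by continuous matching. A backtest that assumes an immediate full fill at the open or close understates real slippage risk, most visibly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Trading suspensions: during long A-share suspensions (which could last months before 2018), capital in the position is locked and cannot be rebalanced, and backtests generally ignore this opportunity cost. Filter out suspension days (volume == 0 or is_suspended == True) before factor computation, and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: New listings have no price limit for the first 5 trading days (the first-day gain can exceed 300%) and lack a full history (moving-average, volatility, and turnover factors cannot be computed). Filter out stocks listed for fewer than N trading days (typically 60-252) before factor computation.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New rules on A-share program trading (effective 7 July 2025): an account that submits or cancels ≥ 300 orders in one second, or ≥ 20000 in one day, is classified as high-frequency trading and must report to the exchange. An AI-generated quant strategy that exceeds these rates cannot run in compliance, so the limit should be flagged at strategy design time.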
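- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a bookkeeping adjustment, not a real loss. Choice of adjustment: unadjusted prices overstate strategy losses; forward adjustment (qfq) embeds future dividend information in historical prices (lookahead bias); backward adjustment (hfq), which accumulates from the first listing day, is the best choice for quant backtests.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: Financial reports of A-share listed companies have statutory disclosure lags: annual reports are due by 30 April of the following year, half-year reports by 31 August, and quarterly reports by 30 April (Q1) and 31 October (Q3). A backtest that uses financial data must treat the actual disclosure date (announcement_date), not the accounting period end, as the time the data becomes available; otherwise it introduces point-in-time lookahead bias.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Stock dividends, capitalization issues, and rights issues increase share capital after the ex-date: the number of shares held historically stays the same while the price shrinks proportionally. If the backtest system does not adjust the held share count in step, it produces spurious losses or gains on the ex-date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Price gap between block trades and auction trading: block trades can execute at up to a 10% discount to the market price (main board), but that price does not affect the next day's auction open. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the closing price and volume calculations.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly, the pool of lendable securities is limited (mainly large-cap blue chips; lendable small and mid caps are extremely scarce), and securities-lending rates are far higher than margin-financing rates. A backtest that assumes any stock can be shorted produces an unexecutable strategy, and live results diverge badly from the backtest.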
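- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30%, with a warning line at 28%. When foreign ownership reaches 28%, the Hong Kong exchange (SEHK) suspends new buy orders in the stock until the ratio falls to 26%. Strategies with heavy positions in stocks favored by foreign investors (consumer and pharmaceutical leaders) need to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% disclosure rule: a single investor whose holding exceeds 5% of a listed company's issued shares must report to the CSRC and the exchange and make an announcement within 3 days, and may not trade the stock during that period or for 2 days after the announcement. A quant stock-selection system that ignores this rule will be forced to stop buying once a heavy position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: The "double ten" rule for public mutual funds: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same manager together may hold no more than 10% of a company's issued shares. A quant portfolio deployed in a public fund must include these compliance caps as hard optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: all input data for an AI-assisted quant system must come from publicly disclosed information. Automated trading triggered through non-public channels (private data services, inside tips, advance knowledge of a restructuring) constitutes insider trading under Articles 80-83 of the Securities Law and the Guidelines for the Recognition of Insider Trading (《内幕交易行为认定指引》).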
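- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using current A-share index constituents (such as today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated in 2020-2024 (a record 41 per year), so the bias is getting worse. Historical point-in-time snapshots must be used.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index reconstitution effect: the CSI 300, CSI 500, and similar indices are rebalanced twice a year (June and December). Added stocks usually rise markedly between the announcement date and the effective date (forced buying by passive funds), and removed stocks do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing window.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, their holdings overlap heavily, and a market shock triggers collective selling and a stampede. The February 2024 A-share quant crisis is the typical case (a small-cap index fell more than 10% in one day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share quant hedging strategies commonly use IF/IC/IM index futures, long or short, to hedge systematic risk. But A-share index futures trade at a persistent discount (futures price < spot), and the annualized IC discount can reach 10-20%. A backtest that counts only price return and ignores the futures discount or premium seriously overstates the net return of a hedged strategy.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor in A-shares points the opposite way from US stocks: the best performers over the past month are likely to reverse in the following month (a reversal effect rather than momentum). Broker research (Huatai Securities, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.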
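- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors: they tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange finds the effect in more than 90% of individual accounts. AI-assisted tools should not indulge the "hold the loser until it breaks even" intuition; they should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are dominated by retail investors (individual accounts make up more than 80% of trading volume), and herding is pronounced: retail investors tend to follow the crowd, causing irrational price swings (such as the leveraged bull and bear market of 2015). Quant strategies should avoid using volume rankings, popularity rankings, or similar indicators that may amplify herding signals as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: retail annual turnover exceeds 500%, and institutions significantly outperform retail investors over the long run. High-turnover strategies often have lower net returns after trading costs. AI should not encourage frequent trading; it should recommend trading on low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and the 1-3 days after the holiday) and the turn-of-the-month effect (the first 1-5 trading days of a month outperform the middle and end) have academic empirical support (Nanjing University of Finance and Economics, among others). Strategies should lower signal confidence in these calendar windows or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small-cap and micro-cap stocks have average daily turnover of only a few million yuan, so large buy or sell orders cause severe price impact, and real strategy capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to hundreds of millions in capital; add a cap on participation as a share of traded volume to the backtest.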
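- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share transaction cost structure (after the August 2023 adjustment): stamp duty of 0.05% on sells only; commission of about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai market) of 0.001%; slippage and impact cost of 0.1%-0.5% per trade for small caps. The annualized return of a backtest that ignores costs is misleading, especially for high-frequency or high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact cost is usually missing entirely from backtests but may be the largest cost in live trading. Buying 1 million yuan of an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size, not a linear relationship; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders, directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): a shareholder holding 5% or more who sells through centralized auction trading must disclose the reduction plan 15 trading days in advance and may sell no more than 1% of total shares within 3 months. Selling pressure from lock-up expirations is a systematic risk factor specific to A-shares; a backtest that ignores the lock-up expiry calendar understates the holding risk of the affected stocks.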
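- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar does not match the civil calendar: rescheduled statutory holidays create make-up workdays (working Saturdays), and there have been ad hoc market closures (the emergency closure of 8-10 July 2015 during the market crash). Deriving A-share trading days from a generic weekday calendar produces errors; use a dedicated A-share trading calendar (such as exchange_calendars or tushare's trading-calendar interface).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After a delisting, an A-share stock code may be reused by a new listing (very rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or a listing date range, can confuse historical data with the current stock; this needs particular care in long-horizon backtests.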
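- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting plus exponential-backoff retry: every external data API call must apply rate-limit control and retry with exponential backoff and jitter. Retrying immediately after a 429/503 response is an anti-pattern that adds load on the server and triggers IP bans. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must cap concurrency (max_workers); unbounded parallelism is not allowed. Free APIs (akshare, the tushare free tier) usually allow 1-3 concurrent requests; paid APIs also have concurrency caps (tushare uses a points system, with different caps per points tier). Exceeding the limit triggers 429 responses or IP bans. Control it explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token and credential security: data-source API keys (the tushare token; akshare needs no token, but other commercial sources do) must not be hardcoded and must be read from environment variables or a configuration file. A hardcoded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should be spaced by a minimum interval (some akshare endpoints require ≥ 0.5 s; the tushare free tier allows 200 calls per minute). A plain sleep in code is less precise than a token-bucket algorithm; a mature library such as ratelimit or slowapi is recommended.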
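- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspension days: a suspended stock has no trade data during the suspension, so the database shows date gaps. Do not forward-fill the missing dates (that fabricates volume); mark the rows with is_suspended=True, set volume and turnover to 0, and carry the previous close as the price. Factor computation must filter out rows with is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: History boundary for new listings: a new stock appears in the database from its first trading day and has no history before that. If a factor's lookback period exceeds the number of days since listing, every factor value is NaN. Record each stock's listing date (list_date) during collection and start collection from that date, not from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream sources (akshare, tushare) still serve the pre-delisting history of delisted stocks, with no data after the delisting date. A historical universe must include delisted stocks (otherwise it suffers survivorship bias), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data (such as the close price) can differ slightly between sources (akshare, tushare, baostock) because of different adjustment methods, holiday handling, or ex-dividend adjustment timing. Run cross-source reconciliation checks in the pipeline, and when a difference exceeds a threshold (such as 0.1%), log an alert and confirm manually.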
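- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; force the conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Distinguish trading time from wall-clock time: the "date" of a daily bar usually refers to the trading day (day T), whereas the "time" of news and announcement data is wall-clock time. When merging the two, map wall-clock time to the next available trading day; otherwise an announcement published on day T is treated as already available during day T's session, which is a lookahead problem.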
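- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: a data update script must be idempotent (repeated runs give the same result). If the script fails midway because of a network interruption, rerunning it must not create duplicates or gaps. Implementation: write to a staging table first, validate, then UPSERT into the main table rather than using direct INSERT/APPEND.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums and row counts): after every data update, validate key fields: whether the row count is in the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A data pipeline without automatic validation is the root cause of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: pipeline outputs should be versioned. When a source revises historical data (such as restated financials), keep the old version traceable instead of silently overwriting it, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that data coverage for every stock or asset is complete and consistent with the trading calendar. Every stock should have one row on every trading day (a suspension marker, not a gap). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: frequently read static or slowly changing data (such as stock info, industry classification, and index constituents) should be cached locally to avoid repeated API calls on every run. The cache must set an expiry time (TTL) to prevent use of stale industry classifications or outdated constituent lists.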
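
As a worked example of the cost structure in `SHARED-CN-ASTOCK-COST-001`, the sketch below computes explicit fees for one trade using the rates quoted there. It is an illustration under stated assumptions: the `trade_cost` function and its parameters are hypothetical, the transfer fee is applied to Shanghai trades only (following the parenthetical in the constraint), and slippage and impact cost are deliberately left out.

```python
from decimal import Decimal

STAMP_DUTY = Decimal("0.0005")     # 0.05%, sell side only
COMMISSION = Decimal("0.0001")     # about 0.01%, both sides
MIN_COMMISSION = Decimal("5")      # yuan per trade
TRANSFER_FEE = Decimal("0.00001")  # 0.001%, Shanghai market per the constraint


def trade_cost(amount: Decimal, side: str, exchange: str = "sh") -> Decimal:
    """Explicit fees in yuan for one trade of `amount` yuan (no slippage)."""
    if side not in ("buy", "sell"):
        raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")
    commission = max(amount * COMMISSION, MIN_COMMISSION)
    stamp = amount * STAMP_DUTY if side == "sell" else Decimal("0")
    transfer = amount * TRANSFER_FEE if exchange == "sh" else Decimal("0")
    return commission + stamp + transfer


amount = Decimal("100000")
round_trip = trade_cost(amount, "buy") + trade_cost(amount, "sell")
```

For a 100,000 yuan Shanghai trade this gives 11 yuan to buy and 61 yuan to sell, so a round trip costs 72 yuan, or 0.072% of the amount, before any slippage. On small trades the 5 yuan commission floor dominates: a 10,000 yuan buy pays 5 yuan in commission, which is 0.05% rather than 0.01%.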

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
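
Two of these locks are easy to check mechanically before any backtest code runs. The sketch below shows one possible pre-flight check for `SL-03` and `SL-05`; the function names and the regular expression are assumptions made for illustration (the pattern is inferred from examples such as `stock_sh_600000`), not zvt's own validation.

```python
import re
from typing import Optional

# Assumed shape of entity_type_exchange_code, inferred from stock_sh_600000.
ENTITY_ID = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")


def check_entity_id(entity_id: str) -> None:
    """SL-03: reject ids that do not look like entity_type_exchange_code."""
    if not ENTITY_ID.fullmatch(entity_id):
        raise ValueError(f"SL-03 violated: bad entity id {entity_id!r}")


def check_signal_sizing(
    position_pct: Optional[float] = None,
    order_money: Optional[float] = None,
    order_amount: Optional[int] = None,
) -> str:
    """SL-05: exactly one sizing field may be set; return its name."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    given = [name for name, value in fields.items() if value is not None]
    if len(given) != 1:
        raise ValueError(f"SL-05 violated: expected exactly one of {list(fields)}, got {given}")
    return given[0]
```

Comparing against `None` rather than truthiness matters here: an explicit `position_pct=0.0` still counts as a field that was set.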

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **1**

## `KUC-101`
**Source**: `docs/conf.py`

Sets up the Sphinx documentation builder with Chinese language support (via ctex), Markdown parsing via recommonmark, and automatic version string extraction from the akshare package for consistent documentation.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block situation. Separating the retry logic for transient network errors (TimeoutError, ConnectionError) from the handling of permanent errors (ValueError, KeyError) prevents resource waste and avoids masking underlying bugs.
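
A minimal sketch of this retry policy follows, using only the standard library. `call_with_backoff` and `RateLimitError` are hypothetical names for illustration, and the defaults (4 retries, 1 s base, 60 s cap) are picked from the ranges given in `SHARED-DS-RL-001`. The `sleep` and `rng` parameters are injectable only so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 signal (hypothetical; not from any library)."""


# Only these are retried; ValueError, KeyError, etc. propagate immediately.
TRANSIENT = (TimeoutError, ConnectionError, RateLimitError)


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 4,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `fn`, retrying transient failures with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TRANSIENT:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last transient error
            # Full jitter: wait a random fraction of the capped exponential delay.
            sleep(min(cap, base * 2 ** attempt) * rng())
    raise AssertionError("unreachable")
```

Because permanent errors are not in `TRANSIENT`, a parsing bug fails on the first call instead of being retried several times and then reported as a network problem.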

## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
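
The validation and conversion described above can be sketched with the standard library alone. `to_iso_date` is a hypothetical helper name; it accepts only the two formats mentioned here and relies on `datetime.strptime` for the calendar checks (leap years, month lengths).

```python
import re
from datetime import datetime

_COMPACT = re.compile(r"\d{8}", re.ASCII)            # YYYYMMDD
_DASHED = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)  # YYYY-MM-DD


def to_iso_date(value: str) -> str:
    """Normalize 'YYYYMMDD' or 'YYYY-MM-DD' to 'YYYY-MM-DD', rejecting bad dates."""
    if _COMPACT.fullmatch(value):
        fmt = "%Y%m%d"
    elif _DASHED.fullmatch(value):
        fmt = "%Y-%m-%d"
    else:
        raise ValueError(f"unsupported date format: {value!r}")
    # strptime raises ValueError for impossible dates such as 2023-02-29.
    return datetime.strptime(value, fmt).date().isoformat()
```

The regular expressions run first because `strptime` on its own is lenient about zero padding and would accept inputs such as `2024-2-9`.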

## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.

## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing

Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.

## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.

## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing

Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.
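
The ordering can be illustrated with a toy in-memory store. This is a sketch of the idea only: `VersionedStore` and its methods are hypothetical and are not ArcticDB's API.

```python
class VersionedStore:
    """Toy store: write-once atom keys plus one mutable ref key per symbol."""

    def __init__(self):
        self._atoms = {}  # TABLE_DATA / TABLE_INDEX / VERSION, never modified
        self._refs = {}   # VERSION_REF: symbol -> key of the latest VERSION atom

    def _put_atom(self, key, value):
        if key in self._atoms:
            raise KeyError(f"atom key already written: {key}")
        self._atoms[key] = value

    def write(self, symbol, version, data, index):
        data_key = ("TABLE_DATA", symbol, version)
        index_key = ("TABLE_INDEX", symbol, version)
        version_key = ("VERSION", symbol, version)
        # 1. Atom keys first: data, index, then the version pointing at them.
        self._put_atom(data_key, data)
        self._put_atom(index_key, index)
        self._put_atom(version_key, {"data": data_key, "index": index_key})
        # 2. Mutable reference last, so a reader following the ref never
        #    lands on a version whose data or index is missing.
        self._refs[symbol] = version_key

    def read_latest(self, symbol):
        version = self._atoms[self._refs[symbol]]
        return self._atoms[version["data"]]
```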

## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
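
A small sketch of the check, written against a bare status code and body so it stays independent of any HTTP client. `RateLimitError` and `HttpStatusError` are hypothetical exception names chosen for this example.

```python
import json


class RateLimitError(Exception):
    """Hypothetical error for HTTP 429, so callers can back off and retry."""


class HttpStatusError(Exception):
    """Hypothetical error for any other non-2xx response."""


def parse_json_response(status_code: int, body: str):
    """Validate the status code first, then parse the body as JSON."""
    if status_code == 429:
        raise RateLimitError("HTTP 429: rate limited")
    if not 200 <= status_code < 300:
        # A 404 or 500 body is often an HTML error page, not JSON.
        raise HttpStatusError(f"HTTP {status_code}")
    return json.loads(body)
```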

## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.

FILE:references/components/column_name_standardization.md
# column_name_standardization (2 classes)

## `set_df_columns`
`column_name_standardization/set-df-columns.py:0`

## `Column naming convention`
`column_name_standardization/column-naming-convention.py:0`

FILE:references/components/data_type_conversion.md
# data_type_conversion (1 classes)

## `N/A`
`data_type_conversion/n-a.py:0`

FILE:references/components/html_table_extraction.md
# html_table_extraction (2 classes)

## `N/A`
`html_table_extraction/n-a.py:0`

## `HTML parser`
`html_table_extraction/html-parser.py:0`

FILE:references/components/http_request_layer.md
# http_request_layer (3 classes)

## `AkshareConfig`
`http_request_layer/akshareconfig.py:0`

## `ProxyContext`
`http_request_layer/proxycontext.py:0`

## `HTTP client`
`http_request_layer/http-client.py:0`

FILE:references/components/json_response_parsing.md
# json_response_parsing (1 classes)

## `N/A`
`json_response_parsing/n-a.py:0`

FILE:references/components/paginated_data_fetching.md
# paginated_data_fetching (1 classes)

## `fetch_paginated_data`
`paginated_data_fetching/fetch-paginated-data.py:0`

FILE:references/components/price_adjustment_processing.md
# price_adjustment_processing (1 classes)

## `stock_zh_a_daily`
`price_adjustment_processing/stock-zh-a-daily.py:0`

FILE:references/components/realized_volatility_calculation.md
# realized_volatility_calculation (2 classes)

## `volatility_yz_rv`
`realized_volatility_calculation/volatility-yz-rv.py:0`

## `Volatility estimator`
`realized_volatility_calculation/volatility-estimator.py:0`

FILE:references/components/source-specific_data_acquisition.md
# source-specific_data_acquisition (3 classes)

## `TLSAdapter`
`source-specific_data_acquisition/tlsadapter.py:0`

## `DataApi`
`source-specific_data_acquisition/dataapi.py:0`

## `Data source`
`source-specific_data_acquisition/data-source.py:0`

FILE:references/components/trading_calendar_validation.md
# trading_calendar_validation (2 classes)

## `get_rank_sum_daily`
`trading_calendar_validation/get-rank-sum-daily.py:0`

## `Calendar source`
`trading_calendar_validation/calendar-source.py:0`


Advanced Financial Ml

Skill

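MlFinLab provides advanced financial machine learning implementations, including information-driven bars (tick/volume/dollar/imbalance bars), fractional differentiation, and backtesting tools, supporting multi-market factor research and strategy validation.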

---
name: advanced-financial-ml
description: |-
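  MlFinLab provides advanced financial machine learning implementations, including information-driven bars (tick/volume/dollar/imbalance bars), fractional differentiation, and backtesting tools, supporting multi-market factor research and strategy validation.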
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-115"
  compiled_at: "2026-04-22T13:00:55.567727+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
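# Financial Machine Learning (advanced-financial-ml)

> MlFinLab provides advanced financial machine learning implementations, including information-driven bars (tick/volume/dollar/imbalance bars), fractional differentiation, and backtesting tools, supporting multi-market factor research and strategy validation.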

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (1 total)

### Sphinx Documentation Configuration (`UC-101`)
How to configure and generate project documentation using Sphinx autodoc and extensions for API documentation coverage
**Triggers**: documentation, sphinx, autodoc

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

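- **`AP-ZVT-183`**: Adjustment factors of inf/NaN are multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) vs QFQ (forward-adjusted), causes factor drift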

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-115. Evidence verify ratio = 43.7% and audit fail total = 34. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Complete V6+ authority (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-115` blueprint at 2026-04-22T13:00:55.567727+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Sphinx Documentation Configuration
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern
  - Custom Transformer + Accumulator factor with per-entity rolling state

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

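### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object leaves every model with the first model's predictions <sub>(high)</sub>

When several models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model effectively uses the same prediction signal. The symptom is that the backtest net-value curve is identical whether the model is LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.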

Source: https://github.com/microsoft/qlib/issues/1930

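### `AP-QLIB-2090` — fit_start_time and the train segment are configured separately, causing implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which sets the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two settings are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.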

Source: https://github.com/microsoft/qlib/issues/2090
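
A guard for the `fit_end_time <= train_end` relationship can be run before the dataset is built. The sketch below assumes ISO-format date strings and a `(start, end)` tuple for the train segment; `check_fit_window` is a hypothetical helper and not part of Qlib.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple) -> None:
    """Raise if the normalizer would be fitted on data past the train segment."""
    train_end = date.fromisoformat(train_segment[1])
    if date.fromisoformat(fit_end_time) > train_end:
        raise ValueError(
            f"fit_end_time {fit_end_time} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )
```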

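### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. After cross-sectional standardization, a MACD factor built from the documented formula differs significantly from the correct one, and its IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.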

Source: https://github.com/microsoft/qlib/issues/2036

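### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspension days as the convention requires, adding noise to downstream factors <sub>(high)</sub>

Qlib's convention is that open/close/high/low/volume/factor should all be NaN on suspension days so the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce false signals during the suspension (the price does not change but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.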

Source: https://github.com/microsoft/qlib/issues/2184

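### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API and fetches an incomplete A-share universe <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to fetch the Shanghai and Shenzhen stock list. The function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset that covers only part of the market, and backtests built on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).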

Source: https://github.com/microsoft/qlib/issues/1892

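### `AP-QLIB-2097` — Whole-market instrument="all" runs out of memory on a 32GB machine while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the chosen universe into memory at once. Memory use with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. On a 32GB machine a whole-market run hits OOM during init_instance_by_config, and the error message does not point to memory. Users easily mistake it for a configuration error, when the actual fix is batched loading or streaming feature computation.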

Source: https://github.com/microsoft/qlib/issues/2097

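### `AP-QLIB-1984` — The LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label data, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore skips the squeeze branch, goes straight into LightGBM training, and raises an opaque error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.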

Source: https://github.com/microsoft/qlib/issues/1984

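### `AP-QLIB-1915` — After dump_bin on custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If custom A-share CSV data is dumped with a field order or symbol format (for example 600000.SH vs SH600000) that does not match Qlib's conventions, the DataHandler processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.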

Source: https://github.com/microsoft/qlib/issues/1915

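### `AP-QLIB-1949` — Multiprocessing backend on Colab/Linux conflicts with Qlib ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a deeply nested exception from the D.features call, and the stack trace does not reveal whether the problem lies in the parallel backend or in the data.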

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

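### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, so the first period's signal is wrong <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the timestamp of the first bar it pushes is the minute of the current tick rather than the end of a complete N-minute period. Concretely, a tick at 09:59 triggers an incomplete 5-minute bar (which should not be pushed until 10:04). If the strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass, but it contains less than a full period of data, and using it for signal computation produces a false entry signal.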

Source: https://github.com/vnpy/vnpy/issues/3691

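### `AP-VNPY-3669` — Incremental saves of historical data in the Alpha module raise SchemaError when old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When the vnpy Alpha module saves bar data to Parquet, it runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the existing file (with historical Int64 columns). Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (volume, for example, is an integer from some feeds and a float from others), and there is no schema-alignment step before concat. This affects every workflow that builds historical data for backtesting with vnpy alpha.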

Source: https://github.com/vnpy/vnpy/issues/3669

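### `AP-VNPY-3685` — Spread trading run_backtesting() fails silently under Jupyter, making results untrustworthy <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine logic does not execute, yet no exception is raised and the returned statistics look normal. The same code runs without the problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop, making this a hidden trap where backtest results look correct but are actually incomplete.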

Source: https://github.com/vnpy/vnpy/issues/3685

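### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs directly into the system or conda base environment and forcibly downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as recent scipy and pytorch). There is no requirements.txt, so dependency boundaries are opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.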

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

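### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

The Zipline tutorial uses an AAPL price chart for its demo, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake the "4x price increase" for strategy returns. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data, drawing false breakout signals from technical indicators on the ex-rights date.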

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

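### `AP-ZIPLINE-235` — Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>

After a signal fires on a bar, Zipline's default slippage model fills at the close of that same bar (current bar close fill). In live trading the signal can only be filled near the next bar's open (T+1 order execution). For A-share daily bars, backtesting with close-price fills overstates daily returns by about 0.1-0.3% on average compared with next-day-open fills, and the annualized gap can exceed 30%. The slippage model must be configured explicitly as VolumeShareSlippage or FixedSlippage with a reasonable volume_limit.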

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

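### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if start_session falls on a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The message only shows the first date of the trading calendar and does not suggest switching to the first trading day. In the A-share case, with the SSE/SZSE calendars, a start_date that falls on the holiday right after the last trading day before Spring Festival triggers the same kind of error, and debugging it is very costly.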

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

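### `AP-ZIPLINE-181` — A stale asset db makes Pipeline report "no assets traded", sending users to debug the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, backtesting a new date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if market data is re-downloaded but only prices are updated and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, Pipeline filtering quietly excludes them, and survivorship bias results.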

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

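### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

Zipline's schedule_function rules date_rules.week_start() and date_rules.week_end() depend on the trading calendar's week-start/week-end logic, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation used for the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the logs give the user no sign of the missed schedule.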

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

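### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises an AssertionError deep in the stack <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone, for example pd.Timestamp('2020-01-01')), no error is raised at the entry point. Instead, AssertionError: Algorithm should have a utc datetime is triggered deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data in local CST time hit this trap easily and need an explicit tz_localize('UTC') when registering the bundle.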

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240

## zvt (6)

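### `AP-ZVT-183` — Adjustment factors of inf/NaN are multiplied in directly, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (a stock's first trading day, or missing data) the factor is inf; when kdata.open itself is None (an unfilled suspension day) the multiplication raises TypeError. As a result the adjustment calculation for the entire entity is aborted and all subsequent bars are lost, but the main flow only logs an ERROR and keeps going, so users often do not realize that data for many stocks is already corrupted.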

Source: https://github.com/zvtvz/zvt/issues/183
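
A defensive version of the ratio and the multiplication might look like the sketch below. The helpers `price_ratio` and `safe_adjust` are hypothetical and not ZVT functions; they return `None` so the caller can count and report skipped bars instead of aborting the whole entity.

```python
import math


def price_ratio(new, old):
    """Return new / old, or None when the ratio cannot be computed safely."""
    if new is None or old is None or old == 0:
        return None
    ratio = new / old
    return ratio if math.isfinite(ratio) else None


def safe_adjust(price, factor):
    """Return price * factor, or None when either input is missing or non-finite."""
    if price is None or factor is None:
        return None
    if not (math.isfinite(price) and math.isfinite(factor)):
        return None
    return price * factor
```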

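### `AP-ZVT-179` — Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the limit is hit the day's data for all remaining stocks goes silently missing. A backtest run on this incomplete database produces systematically biased factor values, with no warning.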

Source: https://github.com/zvtvz/zvt/issues/179

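### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers a too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect, because the root cause is SQLite's variable limit. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.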

Source: https://github.com/zvtvz/zvt/issues/161
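
A batched `IN` query can be sketched with the standard `sqlite3` module as follows. The table name `kdata`, its columns, and the helper `select_in_batches` are assumptions for illustration and do not reflect ZVT's schema or code.

```python
import sqlite3


def select_in_batches(conn, entity_ids, batch_size=900):
    """Query rows for many entity_ids, staying under SQLite's variable limit."""
    rows = []
    for start in range(0, len(entity_ids), batch_size):
        batch = entity_ids[start:start + batch_size]
        # One "?" placeholder per id in this batch; values are bound, not inlined.
        placeholders = ",".join("?" * len(batch))
        sql = (
            "SELECT entity_id, close FROM kdata "
            f"WHERE entity_id IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows
```

A batch size of 900 leaves headroom under the 999 limit quoted in the issue.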

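### `AP-ZVT-129` — Wildcard imports hide API version changes, and enums such as AdjustType disappear without explanation <sub>(medium)</sub>

ZVT's documentation examples use `from zvt import *` to import every symbol. After a ZVT upgrade refactors the enums (for example moving AdjustType into a submodule), the wildcard import no longer includes the symbol and triggers AttributeError. Users assume an installation problem, when it is actually a breaking API change between versions that was not noted in the CHANGELOG, with the wildcard import obscuring where the symbol came from. Import enum classes explicitly.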

Source: https://github.com/zvtvz/zvt/issues/129

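### `AP-ZVT-187` — The backtest engine does not stop early on an empty result from the data layer, causing a cascading null-pointer crash <sub>(medium)</sub>

When ZVT's Trader finds the data empty after load_data completes, it does not exit early. It passes the empty DataFrame into the selector calculation, which triggers a chain of failing NoneType operations. The stack trace is deep and the root cause hard to locate, so users assume the strategy logic is at fault. The real cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.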

Source: https://github.com/zvtvz/zvt/issues/187

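### `AP-ZVT-183B` — Using the wrong K-line table, HFQ (backward-adjusted) vs QFQ (forward-adjusted), causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two tables while computing price-momentum or moving-average factors (for example unadjusted prices for the moving average and backward-adjusted prices for returns), factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.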

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-115--mlfinlab
**Scan date**: 2026-04-22
**Stats**: {'total_files': 12, 'total_classes': 58, 'total_functions': 0, 'total_stages': 12}

## Modules (12)

- [data_ingestion_&_bar_construction](components/data_ingestion_-_bar_construction.md): 4 classes
- [event_filtering_&_sampling](components/event_filtering_-_sampling.md): 3 classes
- [triple_barrier_labeling_&_meta-labeling](components/triple_barrier_labeling_-_meta-labeling.md): 7 classes
- [sample_weighting_&_uniqueness](components/sample_weighting_-_uniqueness.md): 5 classes
- [feature_engineering_&_importance](components/feature_engineering_-_importance.md): 6 classes
- [model_training_with_sequential_bootstrap](components/model_training_with_sequential_bootstrap.md): 5 classes
- [bet_sizing](components/bet_sizing.md): 5 classes
- [backtesting_&_statistics](components/backtesting_-_statistics.md): 6 classes
- [correlation_&_codependence_analysis](components/correlation_-_codependence_analysis.md): 4 classes
- [clustering_&_network_generation](components/clustering_-_network_generation.md): 6 classes
- [synthetic_data_generation](components/synthetic_data_generation.md): 5 classes
- [structural_break_detection](components/structural_break_detection.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 131
  fatal_constraints_count: 76
  non_fatal_constraints_count: 250
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

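- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, information that only becomes known after t must not be used. The most common forms are: (1) computing the signal from the closing price and filling at that same day's close; (2) stamping an indicator computed after the close of day T onto the same bar; (3) assuming fills at the day's high or low. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions should be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow chronological order, TRAIN < VALID < TEST. Random k-fold splitting must not be used (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (for example a momentum strategy that takes profit at the high) is clear lookahead, because the intraday high and low are only known after the close. Fills may only use the open price or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: pandas operations such as rolling and shift easily introduce subtle one-period offset errors. Record each series' observation time explicitly in code and verify the key time alignments with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum in-sample Sharpe ratio grows with the number of trials even when no strategy has real skill, so the best backtest increasingly overstates true performance. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.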
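- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates the strategy's historical returns. The backtest universe must use point-in-time snapshots.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample. Out-of-sample data is for final validation only and must not be "looked at" repeatedly and then tuned against (that turns it into new in-sample data and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspensions and missing data: suspension-day prices must not simply be forward-filled with the previous close, because that lets zero-volume days take part in factor computation and signal generation. Filter missing trading days explicitly at the factor computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme value contamination: raw market data may contain source errors (such as ex-rights adjustments applied late, or extreme prices from manual entry mistakes). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.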
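- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization, and a zero-cost default must not be used. Performance metrics from a cost-free backtest are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and a high-frequency strategy will then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage. Large orders should use a volume-share model (for example no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must appear in the backtest performance report and be analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions, and each 10bps change in cost can flip the strategy's profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual (rounded) share counts held with a fixed amount of capital rather than assuming fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes achievable holdings deviate from target weights, and the backtest should simulate this rounding effect.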
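- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Consistent timestamp timezones: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. All timestamps must be converted to a single timezone at the pipeline entry (UTC for storage and market-local time for display is recommended), and timezones must not be mixed midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after it. Re-downloading existing data and appending it creates rows with duplicate timestamps and corrupts the backtest's time ordering. Verify before and after each update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Choose a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that is not directly investable (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark's holding return.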
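- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. Summing log returns and treating the sum as a cumulative return misstates the drawdown depth (a log return is never greater than the corresponding simple return, so the two are not interchangeable).
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems, so confirm the annualization factor before comparing across systems; otherwise the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are better risk-adjusted return metrics than the Sharpe Ratio: Sharpe assumes normally distributed returns, while return distributions in A-share and crypto markets are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot distinguish a good strategy from one whose beta happened to suit a favorable market.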
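- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 is preliminary evidence of predictive power, and ICIR > 0.5 indicates stability. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the 1/5/10/20-day IC series to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and does not suit a monthly rebalancing strategy, and vice versa. Using the wrong holding period severely damages the factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. A valid factor should have a t-stat > 3.0 (rather than the traditional 1.96), or replicate independently across periods or markets, or have a clear economic rationale. Factors that meet none of these conditions are very likely products of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have a negative net IC after transaction costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-term factor monthly by mistake lets much of the alpha dissipate during the holding period.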
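- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry rotation returns, and it becomes hard to tell whether the factor itself or industry exposure drove the returns. The operation is factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every unneutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and returns come from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values, which distort quantile analysis (such as Q1/Q10 deciles). Winsorize the raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Before combining, apply Gram-Schmidt orthogonalization or PCA to remove multicollinearity among the factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing data fill policy: filling NaN values in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean introduces lookahead bias (the mean itself contains future information), while dropping them entirely produces survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.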
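- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintiles) or Q1/Q10 (deciles) as the main metric, rather than simple long-only returns. The monotonicity test of long Q1 / short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one particular market state is not suited to all-weather deployment. Show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate large, pointless transaction costs. Set a buffer zone during selection and rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of quantile returns (bootstrap test): even a large historical quantile spread (Q1-Q5) may be due to chance and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.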
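- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying a US equity factor directly to A-shares, or an equity factor to futures or crypto, requires independent IC validation, and cross-market generality must not be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually fail through market learning and arbitrage (McLean & Pontiff 2016 show an average decay of 58% after publication). Re-evaluate factor IC periodically (quarterly or yearly) and promptly replace or down-weight failed factors.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the rate cycle, business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor performs best.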
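
Two of the performance rules above, `SHARED-BT-PERF-001` and `SHARED-BT-PERF-002`, reduce to a few lines of arithmetic. The sketch below uses only the standard library; the function names and the `periods_per_year` parameter are illustrative assumptions.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252, risk_free_daily=0.0):
    """Daily Sharpe scaled by sqrt(periods_per_year): 252 equities, 365 crypto."""
    excess = [r - risk_free_daily for r in daily_returns]
    daily_sharpe = statistics.mean(excess) / statistics.stdev(excess)
    return daily_sharpe * math.sqrt(periods_per_year)


def max_drawdown(net_values):
    """Largest peak-to-trough decline of a net value series, as a negative fraction."""
    peak = net_values[0]
    worst = 0.0
    for value in net_values:
        peak = max(peak, value)               # highest net value seen so far
        worst = min(worst, value / peak - 1.0)
    return worst
```

For a net value path of 100, 120, 60, 90 the drawdown is measured from the peak of 120 to the trough of 60, which is why the function tracks a running peak instead of comparing against the starting value.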

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
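
Locks such as `SL-03` and `SL-05` lend themselves to small validation helpers that run before any order is generated. The sketch below is an assumption-laden illustration: the regular expression is one possible reading of `entity_type_exchange_code`, and `SignalSizing` is a hypothetical stand-in, not ZVT's `TradingSignal` class.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: lowercase type, lowercase exchange, alphanumeric code,
# e.g. stock_sh_600000 or stockus_nasdaq_AAPL.
ENTITY_ID = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9.]+")


def is_valid_entity_id(entity_id: str) -> bool:
    """SL-03: entity IDs follow entity_type_exchange_code."""
    return ENTITY_ID.fullmatch(entity_id) is not None


@dataclass
class SignalSizing:
    """SL-05: exactly one of the three sizing fields may be set."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        fields = (self.position_pct, self.order_money, self.order_amount)
        given = [value for value in fields if value is not None]
        if len(given) != 1:
            raise ValueError(
                "exactly one of position_pct, order_money, order_amount "
                f"must be set, got {len(given)}"
            )
```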

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **1**

## `KUC-101`
**Source**: `docs/source/conf.py`

How to configure and generate project documentation using Sphinx autodoc and extensions for API documentation coverage.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

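## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies over multiple data sources. zvt's StockTrader currently binds one set of factors per instance and lacks a unified layer for orchestrating multi-strategy combinations. Borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.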

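## `CW-BT-002` — Analyzer plug-in performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changing strategy code. zvt's current performance evaluation is weak and has no standardized Analyzer interface. Borrowing this pattern would let a user get a risk-adjusted return report from a single cerebro.addanalyzer(SharpeRatio) call.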

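## `CW-BT-003` — Sizer: position sizing separated out
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic. FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control. Introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.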

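## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders. For high-frequency or live-trading integration, fuller order types would greatly improve backtest realism.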

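## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) in real time and drive the strategy in sync, or replay ticks one at a time to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently implements multiple timeframes by pre-recording bars at each level and lacks dynamic resampling at run time. Borrowing this pattern would support backtests over any combination of time granularities without recording the data again.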

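## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy's net value curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page. Borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.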

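## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0's vnpy.alpha.lab provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework. Borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.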

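## `CW-QL-001` — Point-in-Time database (prevents future data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and has no "publication date" dimension, so stock selection can be biased by financial-report data from the future. Introducing the PIT pattern would greatly improve backtest credibility.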

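## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of every model training run and supports cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment, supporting fast reproduction and version comparison.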

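## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling intraday execution algorithms such as TWAP or VWAP rebalancing. zvt backtests currently assume daily-bar fills only and do not model execution algorithms; borrowing the nested framework would let zvt separate the question of "which stocks to hold and when" from "how to build the position at minimum market impact."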
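
The inner execution layer can be pictured with the simplest TWAP-style split: the daily layer decides a parent order, and the inner layer slices it evenly across intraday intervals. `twap_slices` is a hypothetical helper, not qlib's nested executor API, and it ignores prices, lot sizes, and market impact.

```python
from typing import List


def twap_slices(total_shares: int, n_slices: int) -> List[int]:
    """Split a parent order into n near-equal child orders that sum to the total."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    base, extra = divmod(total_shares, n_slices)
    # The first `extra` slices carry one additional share each.
    return [base + (1 if i < extra else 0) for i in range(n_slices)]


child_orders = twap_slices(1000, 3)
```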

FILE:references/components/backtesting_-_statistics.md
# backtesting_&_statistics (6 classes)

## `CampbellBacktesting.haircut_sharpe_ratios`
`backtesting_&_statistics/campbellbacktesting-haircut-sharpe-ratio.py:0`

## `sharpe_ratio`
`backtesting_&_statistics/sharpe-ratio.py:0`

## `probabilistic_sharpe_ratio`
`backtesting_&_statistics/probabilistic-sharpe-ratio.py:0`

## `deflated_sharpe_ratio`
`backtesting_&_statistics/deflated-sharpe-ratio.py:0`

## `drawdown_and_time_under_water`
`backtesting_&_statistics/drawdown-and-time-under-water.py:0`

## `Sharpe adjustment method`
`backtesting_&_statistics/sharpe-adjustment-method.py:0`

FILE:references/components/bet_sizing.md
# bet_sizing (5 classes)

## `M2N.fit`
`bet_sizing/m2n-fit.py:0`

## `bet_size_probability`
`bet_sizing/bet-size-probability.py:0`

## `bet_size_dynamic`
`bet_sizing/bet-size-dynamic.py:0`

## `bet_size_reserve`
`bet_sizing/bet-size-reserve.py:0`

## `Sizing function`
`bet_sizing/sizing-function.py:0`

FILE:references/components/clustering_-_network_generation.md
# clustering_&_network_generation (6 classes)

## `MST.create_mst`
`clustering_&_network_generation/mst-create-mst.py:0`

## `PMFG.create_pmfg`
`clustering_&_network_generation/pmfg-create-pmfg.py:0`

## `ALMST.create_almst`
`clustering_&_network_generation/almst-create-almst.py:0`

## `get_feature_clusters`
`clustering_&_network_generation/get-feature-clusters.py:0`

## `optimal_hierarchical_cluster`
`clustering_&_network_generation/optimal-hierarchical-cluster.py:0`

## `Network type`
`clustering_&_network_generation/network-type.py:0`

FILE:references/components/correlation_-_codependence_analysis.md
# correlation_&_codependence_analysis (4 classes)

## `get_dependence_matrix`
`correlation_&_codependence_analysis/get-dependence-matrix.py:0`

## `get_mutual_info`
`correlation_&_codependence_analysis/get-mutual-info.py:0`

## `optimal_transport_dependence`
`correlation_&_codependence_analysis/optimal-transport-dependence.py:0`

## `Dependence metric`
`correlation_&_codependence_analysis/dependence-metric.py:0`

FILE:references/components/data_ingestion_-_bar_construction.md
# data_ingestion_&_bar_construction (4 classes)

## `MicrostructuralFeaturesGenerator.generate_features`
`data_ingestion_&_bar_construction/microstructuralfeaturesgenerator-generat.py:0`

## `Bar threshold calculation`
`data_ingestion_&_bar_construction/bar-threshold-calculation.py:0`

## `Imbalance metric`
`data_ingestion_&_bar_construction/imbalance-metric.py:0`

## `Bar type`
`data_ingestion_&_bar_construction/bar-type.py:0`

FILE:references/components/event_filtering_-_sampling.md
# event_filtering_&_sampling (3 classes)

## `cusum_filter`
`event_filtering_&_sampling/cusum-filter.py:0`

## `z_score_filter`
`event_filtering_&_sampling/z-score-filter.py:0`

## `Filter type`
`event_filtering_&_sampling/filter-type.py:0`

FILE:references/components/feature_engineering_-_importance.md
# feature_engineering_&_importance (6 classes)

## `FractionalDifferentiation.frac_diff_ffd`
`feature_engineering_&_importance/fractionaldifferentiation-frac-diff-ffd.py:0`

## `mean_decrease_impurity`
`feature_engineering_&_importance/mean-decrease-impurity.py:0`

## `mean_decrease_accuracy`
`feature_engineering_&_importance/mean-decrease-accuracy.py:0`

## `get_orthogonal_features`
`feature_engineering_&_importance/get-orthogonal-features.py:0`

## `Fractional differentiation method`
`feature_engineering_&_importance/fractional-differentiation-method.py:0`

## `Importance metric`
`feature_engineering_&_importance/importance-metric.py:0`

FILE:references/components/model_training_with_sequential_bootstrap.md
# model_training_with_sequential_bootstrap (5 classes)

## `SequentiallyBootstrappedBaggingClassifier.fit`
`model_training_with_sequential_bootstrap/sequentiallybootstrappedbaggingclassifie.py:0`

## `ml_cross_val_score`
`model_training_with_sequential_bootstrap/ml-cross-val-score.py:0`

## `PurgedKFold.split`
`model_training_with_sequential_bootstrap/purgedkfold-split.py:0`

## `CombinatorialPurgedKFold.split`
`model_training_with_sequential_bootstrap/combinatorialpurgedkfold-split.py:0`

## `Cross-validation generator`
`model_training_with_sequential_bootstrap/cross-validation-generator.py:0`

FILE:references/components/sample_weighting_-_uniqueness.md
# sample_weighting_&_uniqueness (5 classes)

## `get_weights_by_return`
`sample_weighting_&_uniqueness/get-weights-by-return.py:0`

## `get_weights_by_time_decay`
`sample_weighting_&_uniqueness/get-weights-by-time-decay.py:0`

## `seq_bootstrap`
`sample_weighting_&_uniqueness/seq-bootstrap.py:0`

## `get_ind_matrix`
`sample_weighting_&_uniqueness/get-ind-matrix.py:0`

## `Weighting scheme`
`sample_weighting_&_uniqueness/weighting-scheme.py:0`

FILE:references/components/structural_break_detection.md
# structural_break_detection (2 classes)

## `get_sadf`
`structural_break_detection/get-sadf.py:0`

## `Break detection model`
`structural_break_detection/break-detection-model.py:0`

FILE:references/components/synthetic_data_generation.md
# synthetic_data_generation (5 classes)

## `sample_from_dvine`
`synthetic_data_generation/sample-from-dvine.py:0`

## `sample_from_cvine`
`synthetic_data_generation/sample-from-cvine.py:0`

## `generate_hcmb_mat`
`synthetic_data_generation/generate-hcmb-mat.py:0`

## `sample_from_corrgan`
`synthetic_data_generation/sample-from-corrgan.py:0`

## `Generation method`
`synthetic_data_generation/generation-method.py:0`

FILE:references/components/triple_barrier_labeling_-_meta-labeling.md
# triple_barrier_labeling_&_meta-labeling (7 classes)

## `apply_pt_sl_on_t1`
`triple_barrier_labeling_&_meta-labeling/apply-pt-sl-on-t1.py:0`

## `get_events`
`triple_barrier_labeling_&_meta-labeling/get-events.py:0`

## `get_bins`
`triple_barrier_labeling_&_meta-labeling/get-bins.py:0`

## `add_vertical_barrier`
`triple_barrier_labeling_&_meta-labeling/add-vertical-barrier.py:0`

## `drop_labels`
`triple_barrier_labeling_&_meta-labeling/drop-labels.py:0`

## `Vertical barrier`
`triple_barrier_labeling_&_meta-labeling/vertical-barrier.py:0`

## `Labeling approach`
`triple_barrier_labeling_&_meta-labeling/labeling-approach.py:0`

Abs Cashflow Modeling

Skill

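Models asset-backed securities deal structures, simulates mortgage pool cashflows, tranched bond repayment, and waterfall distribution, and analyzes tranche return and risk performance.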

---
name: abs-cashflow-modeling
description: |-
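  Models asset-backed securities deal structures, simulates mortgage pool cashflows, tranched bond repayment, and waterfall distribution, and analyzes tranche return and risk performance.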
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-076"
  compiled_at: "2026-04-22T13:00:28.210602+00:00"
  capability_markets: "global"
  capability_activities: "insurance-actuarial"
  sop_version: "crystal-compilation-v6.1"
---
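# ABS Cashflow Modeling (abs-cashflow-modeling)

> Models asset-backed securities deal structures, simulates mortgage pool cashflows, tranched bond repayment, and waterfall distribution, and analyzes tranche return and risk performance.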

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (40 total)

### Basic ABS Deal Model (`UC-001`)
Model a basic asset-backed securities deal with mortgage pool, bonds, fees, and waterfall to analyze cashflows and tranche performance
**Triggers**: basic deal, ABS, mortgage pool

### Adjustable Rate Mortgage Pool (`UC-002`)
Model an adjustable rate mortgage pool with LIBOR-based floating rates and periodic resets
**Triggers**: ARM, adjustable rate, LIBOR

### Bond Step-Up Rate (`UC-003`)
Model bonds with scheduled rate step-ups at specific dates for ABS deal structuring
**Triggers**: step-up, bond rate, scheduled increase

For all **40** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
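
To make `SL-02` and `SL-06` concrete, here is a small sketch that maps each bar's `filter_result` value to the action executed on the following bar. `next_bar_actions` is a hypothetical helper, and the input is assumed to hold plain Python `True`/`False`/`None`/`float('nan')` values (NumPy or pandas booleans would need converting first).

```python
from typing import List, Optional


def next_bar_actions(filter_result: List[object]) -> List[Optional[str]]:
    """Shift signals by one bar: bar i's signal is acted on at bar i + 1."""
    actions: List[Optional[str]] = [None]  # nothing can execute on the first bar
    for value in filter_result[:-1]:
        if value is True:
            actions.append("BUY")
        elif value is False:
            actions.append("SELL")
        else:
            actions.append(None)  # None or NaN means no action
    return actions[: len(filter_result)]


actions = next_bar_actions([True, None, False, float("nan")])
```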

## Top Anti-Patterns (15 total)

- **`AP-INSURANCE-001`**: Implicit numeric format assumptions without validation
- **`AP-INSURANCE-002`**: Triangle axis construction with invalid temporal ordering
- **`AP-INSURANCE-003`**: Cumulative/incremental triangle representation misuse

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-076. Evidence verify ratio = 37.8% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-076` blueprint at 2026-04-22T13:00:28.210602+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Bond Step-Up Rate', 'Adjustable Rate Mortgage Pool', 'Basic ABS Deal Model', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-063--chainladder-python (4)

### `AP-INSURANCE-002` — Triangle axis construction with invalid temporal ordering <sub>(high)</sub>

Development dates are created without verifying they are strictly greater than origin dates, or development lags are calculated with incorrect formulas (e.g., using wrong divisor for monthly difference). This creates logically impossible triangle cells where development <= origin, corrupting the fundamental data structure and producing wrong loss development patterns.

### `AP-INSURANCE-003` — Cumulative/incremental triangle representation misuse <sub>(high)</sub>

Link ratios are computed on incremental triangles instead of cumulative form, or cum_to_incr/incr_to_cum conversions are not properly inverse-applied. This produces link ratios near 1.0 regardless of actual claims development, leading to misleading development factors and incorrect IBNR estimates.
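
A minimal sketch of the two representations for a single origin row, using plain lists and made-up figures. The function names mirror the `incr_to_cum`/`cum_to_incr` conversions mentioned above but are standalone illustrations, not the chainladder-python API.

```python
from itertools import accumulate
from typing import List


def incr_to_cum(incremental: List[float]) -> List[float]:
    """Running total of incremental claims."""
    return list(accumulate(incremental))


def cum_to_incr(cumulative: List[float]) -> List[float]:
    """Exact inverse of incr_to_cum: first value, then successive differences."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def link_ratios(cumulative: List[float]) -> List[float]:
    """Age-to-age factors; the input must be in cumulative form."""
    return [b / a for a, b in zip(cumulative, cumulative[1:])]


incremental = [100, 50, 25]
cumulative = incr_to_cum(incremental)
ratios = link_ratios(cumulative)
```

Checking that `cum_to_incr(incr_to_cum(x)) == x` on a sample row is a cheap guard against conversions that are not true inverses.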

### `AP-INSURANCE-004` — Including incomplete latest diagonal in development analysis <sub>(high)</sub>

Link ratio computation includes the latest diagonal which contains incomplete/in-progress development data. Without excluding this diagonal via valuation_date filtering, development factor estimation uses partial data that biases IBNR estimates. The latest diagonal must be excluded to capture true historical development patterns.

### `AP-INSURANCE-015` — Triangle grain transformation with incompatible parameters <sub>(medium)</sub>

Triangle grain() method is called without setting is_cumulative attribute, or origin grain is made finer than development grain. These produce invalid triangular data structures with misaligned periods and undefined behavior, corrupting actuarial reserving calculations.

## finance-bp-064--insurance_python (2)

### `AP-INSURANCE-005` — EIOPA calibration workflow violations <sub>(high)</sub>

Smith-Wilson calibration workflow is violated in multiple ways: calibration step is skipped before extrapolation, different alpha values are used for calibration vs extrapolation, or convergence point T uses incorrect formula. These violations produce mathematically inconsistent rate curves where observed points do not match market data and extrapolated rates violate EIOPA specifications.

### `AP-INSURANCE-006` — Missing iteration bounds causing infinite loops <sub>(high)</sub>

Root-finding algorithms like bisection for alpha calibration lack maxIter parameters. When the algorithm fails to converge (e.g., no sign change in Galfa at interval bounds), the application freezes indefinitely, causing service disruption. This is especially critical in regulatory compliance workflows where calibration must complete.
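
A bounded bisection might look like the sketch below. The `max_iter` parameter plays the role of the `maxIter` bound mentioned above; the function name, defaults, and error types are assumptions made for illustration and do not come from insurance_python.

```python
from typing import Callable


def bisect(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> float:
    """Find a root of f in [lo, hi], failing loudly instead of looping forever."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        # No sign change: bisection cannot make progress on this interval.
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    raise RuntimeError(f"no convergence within {max_iter} iterations")


root = bisect(lambda x: x * x - 2.0, 0.0, 2.0)
```

Both failure modes surface as exceptions the caller can handle, so a calibration that cannot converge stops with an error rather than freezing the service.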

## finance-bp-064--insurance_python, finance-bp-126--lifelines (1)

### `AP-INSURANCE-007` — Invalid financial/mathematical constraints not validated <sub>(high)</sub>

Correlation coefficients outside [-1,1], non-positive-semidefinite covariance matrices, negative durations, or entry times >= duration are not validated before use. These cause Cholesky decomposition failures, imaginary values in sqrt(1-rho²), or logically impossible scenarios, producing NaN prices or corrupted at-risk calculations.
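
For the correlation case, an upfront check is enough to rule out the imaginary square root. This sketch uses hypothetical helpers `valid_rho` and `correlated_pair` for a two-variable example; a full covariance matrix would additionally need a positive-semidefiniteness check, which is not shown here.

```python
import math
from typing import Tuple


def valid_rho(rho: float) -> bool:
    """True only for finite values in [-1, 1]; NaN fails the comparison."""
    return -1.0 <= rho <= 1.0


def correlated_pair(z1: float, z2: float, rho: float) -> Tuple[float, float]:
    """Combine two independent draws into a pair with correlation rho."""
    if not valid_rho(rho):
        raise ValueError(f"correlation must lie in [-1, 1], got {rho!r}")
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2
```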

## finance-bp-065--pyliferisk (4)

### `AP-INSURANCE-008` — None values propagated to arithmetic operations <sub>(high)</sub>

Critical parameters like interest rate i are passed as None to actuarial calculations. In pyliferisk, Actuarial.__init__ with i=None causes TypeError in (1/(1+i)) and commutation arrays remain empty. Bare except clauses catch these TypeErrors and silently return 0, masking the fundamental issue and producing incorrect but seemingly valid results.

### `AP-INSURANCE-009` — Stub function implementations and duplicate definitions <sub>(high)</sub>

Critical insurance functions like deferred temporary annuities are implemented as empty stubs (only 'pass' statement) or have duplicate definitions where the second shadows the first. This causes functions to return None instead of calculated values, breaking increasing annuity and premium calculations silently in production.

### `AP-INSURANCE-010` — Dispatcher routing to undefined functions <sub>(medium)</sub>

Complex function dispatchers (like annuity()) handle many parameter combinations but call functions that do not exist (e.g., qtaaxn, qtaxn). This causes NameError at runtime when specific parameter combinations are requested, preventing deferred temporary increasing annuity calculations entirely.

### `AP-INSURANCE-014` — Actuarial convention violations in life table construction <sub>(high)</sub>

Life tables violate standard actuarial conventions: using incorrect radix (not 100000), failing to append 0 to lx array for complete extinction, or using wrong payment adjustment formula for fractional annuities. These violations scale all derived quantities (dx, ex, reserves, premiums) incorrectly.

## finance-bp-065--pyliferisk, finance-bp-064--insurance_python (1)

### `AP-INSURANCE-001` — Implicit numeric format assumptions without validation <sub>(high)</sub>

Data formats like per-mille qx values or rate-to-price conversions are applied implicitly without validation. In pyliferisk, qx values stored as per-mille (qx*1000) are used directly as probabilities yielding 1000x errors. In insurance_python, rates are converted to prices using p=(1+r)^(-M) without verifying input format. This causes material miscalculations in reserve and premium calculations.

## finance-bp-126--lifelines (3)

### `AP-INSURANCE-011` — Survival function monotonicity not enforced <sub>(high)</sub>

Non-parametric survival curve estimators do not verify that S(t) is monotonically non-increasing across timeline values. Violations produce mathematically invalid survival curves where probability of survival increases over time, or S(0) is not initialized to 1.0, breaking interpretation as probability distribution.
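
The checks described here are cheap to express. The sketch below validates a list of S(t) values ordered by time; `survival_curve_problems` is a hypothetical helper and not part of lifelines.

```python
from typing import List, Sequence


def survival_curve_problems(s: Sequence[float], tol: float = 1e-12) -> List[str]:
    """Return a list of violations; an empty list means the curve is valid."""
    if not s:
        return ["empty curve"]
    problems = []
    if abs(s[0] - 1.0) > tol:
        problems.append("S(0) must equal 1.0")
    if any(v < -tol or v > 1.0 + tol for v in s):
        problems.append("values must lie in [0, 1]")
    if any(cur > prev + tol for prev, cur in zip(s, s[1:])):
        problems.append("S(t) must be non-increasing")
    return problems
```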

### `AP-INSURANCE-012` — Input data corruption via inplace operations <sub>(medium)</sub>

User-provided DataFrames are modified inplace using .pop() operations without first creating a copy. This permanently corrupts user data by removing columns, violating data isolation principles and potentially affecting downstream analysis on the original data.

### `AP-INSURANCE-013` — Interval censoring bounds not validated <sub>(medium)</sub>

Lower and upper bounds for interval-censored data are not validated, allowing upper_bound < lower_bound. Invalid interval bounds produce undefined survival probability calculations, potentially negative time intervals in the likelihood function, and corrupt NPMLE estimation.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-076--AbsBox
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 29, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [deal_definition](components/deal_definition.md): 4 classes
- [component_transformation](components/component_transformation.md): 7 classes
- [deal_execution_(api)](components/deal_execution_-api.md): 4 classes
- [result_parsing](components/result_parsing.md): 4 classes
- [asset_type_system](components/asset_type_system.md): 3 classes
- [input_validation](components/input_validation.md): 2 classes
- [report_generation](components/report_generation.md): 2 classes
- [root_finding](components/root_finding.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 122
  fatal_constraints_count: 52
  non_fatal_constraints_count: 210
  use_cases_count: 40
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
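
Two of these locks, `SL-03` and `SL-05`, can be checked before a signal is ever submitted. The sketch below is a standalone illustration under the assumption that an entity ID is three non-empty parts joined by underscores; `entity_id_ok` and `signal_problems` are hypothetical helpers, not zvt functions.

```python
from typing import List, Optional


def entity_id_ok(entity_id: str) -> bool:
    """SL-03: three non-empty parts in the form entity_type_exchange_code."""
    parts = entity_id.split("_", 2)
    return len(parts) == 3 and all(parts)


def signal_problems(
    entity_id: str,
    position_pct: Optional[float] = None,
    order_money: Optional[float] = None,
    order_amount: Optional[int] = None,
) -> List[str]:
    """Return lock violations; an empty list means the signal passes both checks."""
    problems = []
    if not entity_id_ok(entity_id):
        problems.append("SL-03: entity_id must look like entity_type_exchange_code")
    given = [v for v in (position_pct, order_money, order_amount) if v is not None]
    if len(given) != 1:
        problems.append("SL-05: set exactly one of position_pct, order_money, order_amount")
    return problems
```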

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **40**

## `KUC-001`
**Source**: `docs/source/deal_sample/test01.py`

Model a basic asset-backed securities deal with mortgage pool, bonds, fees, and waterfall to analyze cashflows and tranche performance

## `KUC-002`
**Source**: `docs/source/deal_sample/arm_sample.py`

Model an adjustable rate mortgage pool with LIBOR-based floating rates and periodic resets

## `KUC-003`
**Source**: `docs/source/deal_sample/bondStepUp.py`

Model bonds with scheduled rate step-ups at specific dates for ABS deal structuring

## `KUC-004`
**Source**: `docs/source/deal_sample/test10.py`

Incorporate interest rate swap to hedge floating rate exposure in ABS deal

## `KUC-005`
**Source**: `docs/source/deal_sample/conditionAgg.py`

Implement conditional aggregation rules in waterfall that trigger based on pool status

## `KUC-006`
**Source**: `docs/source/deal_sample/fee1.py`

Calculate fees based on period, pool balance percentages, and tiered tables in ABS deals

## `KUC-007`
**Source**: `docs/source/deal_sample/fireTrigger.py`

Implement trigger mechanisms that fire events in waterfall based on performance conditions

## `KUC-008`
**Source**: `docs/source/deal_sample/float_bond.py`

Model ABS deal with floating rate bonds tied to SOFR index

## `KUC-009`
**Source**: `docs/source/deal_sample/multi_pool.py`

Model ABS deal with multiple pools containing different asset types (mortgage and loan) with separate assumptions

## `KUC-010`
**Source**: `docs/source/deal_sample/payPrinSeq.py`

Structure sequential principal payments across multiple bond tranches
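
The sequential rule itself is simple: each tranche is paid down in priority order until the available principal runs out. The sketch below shows only that arithmetic with made-up balances; `pay_principal_sequential` is a hypothetical helper and does not reflect how the deal is declared in `payPrinSeq.py`.

```python
from typing import List, Tuple


def pay_principal_sequential(
    available: float, balances: List[float]
) -> Tuple[List[float], float]:
    """Pay tranches in list order (most senior first); return payments and leftover cash."""
    payments = []
    for balance in balances:
        pay = min(available, balance)
        payments.append(pay)
        available -= pay
    return payments, available


payments, leftover = pay_principal_sequential(150.0, [100.0, 80.0, 50.0])
```

With 150 available, the senior tranche is retired in full, the second receives the remaining 50, and the third receives nothing this period.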

## `KUC-011`
**Source**: `docs/source/deal_sample/rateCap.py`

Implement interest rate cap to limit floating rate exposure in ABS deal

## `KUC-012`
**Source**: `docs/source/deal_sample/resec.py`

Model re-securitization where bonds from underlying deals become assets in a new structure

## `KUC-013`
**Source**: `docs/source/deal_sample/stepup_sample.py`

Model bonds with conditional step-up rates that increase after specified dates

## `KUC-014`
**Source**: `docs/source/deal_sample/test02.py`

Implement multiple waterfall phases (amortizing, accelerated) with different payment priorities

## `KUC-015`
**Source**: `docs/source/deal_sample/test04.py`

Split pool income (interest/principal) proportionally across multiple accounts

## `KUC-016`
**Source**: `docs/source/deal_sample/test05.py`

Model insurance or liquidation provider supporting interest payments when pool income is insufficient

## `KUC-017`
**Source**: `docs/source/deal_sample/test08.py`

Model GNMA (Ginnie Mae) mortgage-backed deal with custom ARM loans, guarantor fees, and servicer fees

## `KUC-018`
**Source**: `docs/source/deal_sample/ysoc.py`

Implement yield supplement overcollateralization to bridge yield gap between low-rate assets and higher-rate bonds

## `KUC-019`
**Source**: `docs/source/deal_sample/test13.py`

Model assets with pre-defined projected cashflows rather than individual loan calculations

## `KUC-020`
**Source**: `docs/source/nbsample/pool_multiScenario.ipynb`

Run single pool through multiple CDR/CPR scenarios to compare default and prepayment impacts

## `KUC-021`
**Source**: `docs/source/nbsample/multiAsset.ipynb`

Run multiple asset pools (Mortgage, Loan) with separate assumptions and inspect pool balances

## `KUC-022`
**Source**: `docs/source/nbsample/single_mortgage.ipynb`

Project cashflows for individual mortgage with various CDR scenarios

## `KUC-023`
**Source**: `docs/source/nbsample/single_loan.ipynb`

Model individual loan with SOFR-based floating rate and rate assumption scenarios

## `KUC-024`
**Source**: `docs/source/nbsample/How-to-price-Balloon-Mortgage.ipynb`

Price balloon mortgages and analyze impact of default assumptions on pricing

## `KUC-025`
**Source**: `docs/source/nbsample/bond_pricing.ipynb`

Price bonds using discount curve to determine present value of cashflows

## `KUC-026`
**Source**: `docs/source/nbsample/firstLoss.ipynb`

Calculate first loss position and equity tranche absorption using root finder

## `KUC-027`
**Source**: `docs/source/nbsample/triggers.ipynb`

Monitor default rate triggers and cumulative defaults over deal life

## `KUC-028`
**Source**: `docs/source/nbsample/HowDealEnded.ipynb`

Model deal call options and determine deal termination conditions

## `KUC-029`
**Source**: `docs/source/nbsample/Irr_002.ipynb`

Calculate IRR for equity tranche with target return and incentive fee structure

## `KUC-030`
**Source**: `docs/source/nbsample/masterTrust.ipynb`

Model master trust with multiple sub-tranches (A-1, A-2) under same series

## `KUC-031`
**Source**: `docs/source/nbsample/comboSensitivity.ipynb`

Run combined scenarios with different deal structures and pool assumptions

## `KUC-032`
**Source**: `docs/source/nbsample/InspectSample.ipynb`

Inspect and extract intermediate waterfall variables for debugging deal logic

## `KUC-033`
**Source**: `docs/source/nbsample/re_securitization_example.ipynb`

Model complete re-securitization with child deals, parent deal, and asset pooling from bond proceeds

## `KUC-034`
**Source**: `docs/source/nbsample/revolving_buy_multiple_pools.ipynb`

Model revolving credit structure that purchases multiple pools of assets over time

## `KUC-035`
**Source**: `docs/source/nbsample/warehouse.ipynb`

Model warehouse facility with funding period before term deal issuance

## `KUC-036`
**Source**: `docs/source/nbsample/SRT_Example_Native_Prod.ipynb`

Model synthetic risk transfer where credit risk is transferred via derivatives rather than asset transfer

## `KUC-037`
**Source**: `docs/source/nbsample/PoolAndTag.ipynb`

Run pool analysis with tag-based filtering and multiple assumption scenarios

## `KUC-038`
**Source**: `docs/source/nbsample/MultiIntBond.ipynb`

Model bond with multiple interest components (multipliers and separate rate types)

## `KUC-039`
**Source**: `docs/source/nbsample/structuring-lease-doc.ipynb`

Structure ABS deal backed by lease assets with rental income collections

## `KUC-040`
**Source**: `docs/source/nbsample/WhyByTerm.ipynb`

Apply time-varying assumptions by term periods for CPR and other parameters

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-INSURANCE-001` — Validate input data format and type before computation
**From**: finance-bp-063--chainladder-python, finance-bp-126--lifelines · **Applicable to**: insurance-actuarial

Both triangle construction and survival analysis require strict input validation: numeric types for triangle columns, valid event indicators (0/1), no NaN/Inf values, and correct temporal ordering. This prevents downstream numerical failures and ensures mathematical validity of actuarial computations.

## `CW-INSURANCE-002` — Initialize probability distributions to boundary values
**From**: finance-bp-065--pyliferisk, finance-bp-126--lifelines · **Applicable to**: insurance-actuarial

Survival probability S(0) must equal 1.0 and life table lx must start at standard radix (100000) and end at 0. Properly initializing boundary values ensures actuarial quantities have correct scale and interpretation as probability distributions.

## `CW-INSURANCE-003` — Include iteration limits in numerical root-finding
**From**: finance-bp-064--insurance_python · **Applicable to**: insurance-actuarial

Bisection and other root-finding algorithms must include maxIter parameters and verify interval contains valid root (sign change). This prevents infinite loops when calibration fails, ensuring service availability in regulatory compliance workflows.

## `CW-INSURANCE-004` — Avoid bare except clauses that mask TypeErrors
**From**: finance-bp-065--pyliferisk · **Applicable to**: insurance-actuarial

Bare except clauses that catch all exceptions including TypeError and return default values (0 or None) mask fundamental parameter errors. Use specific exception handling and validate inputs upfront to fail fast with clear error messages.

## `CW-INSURANCE-005` — Preserve standard radix and extinction conventions in life tables
**From**: finance-bp-065--pyliferisk · **Applicable to**: insurance-actuarial

Life insurance calculations rely on industry-standard conventions: radix of 100000 at age 0 and lx[-1]=0 for complete extinction. Deviating from these conventions scales all derived quantities incorrectly and breaks interoperability with other actuarial systems.

## `CW-INSURANCE-006` — Ensure workflow step ordering and parameter consistency
**From**: finance-bp-063--chainladder-python, finance-bp-064--insurance_python · **Applicable to**: insurance-actuarial

Multi-step algorithms (triangle transformations, Smith-Wilson calibration) require strict step ordering: compute calibration vector before extrapolation, use consistent alpha values throughout. Violating workflow order produces undefined or mathematically inconsistent results.

## `CW-INSURANCE-007` — Validate probability bounds for confidence intervals
**From**: finance-bp-126--lifelines · **Applicable to**: insurance-actuarial

Confidence interval bounds must be constrained to [0,1] for probability estimates. Use fillna and formula constraints to ensure CI bounds remain valid probability ranges, preventing invalid statistical inference from actuarial models.

## `CW-INSURANCE-008` — Validate matrix properties before decomposition
**From**: finance-bp-065--pyliferisk, finance-bp-064--insurance_python · **Applicable to**: insurance-actuarial

Positive semi-definite matrices must be verified before Cholesky decomposition. Invalid matrices cause math domain errors or invalid correlated samples. Similarly, correlation coefficients must be validated to [-1,1] bounds before use in sqrt(1-rho²).
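
For the two-variable case, the check can be sketched as follows. `correlate_pair` is an illustrative helper, assuming two independent standard normal draws as input; it validates `rho` before it reaches `sqrt(1 - rho²)`.

```python
import math


def correlate_pair(z1, z2, rho):
    """Turn two independent standard normal draws into a pair with correlation rho."""
    # NaN fails both comparisons, so it is rejected here as well.
    if not math.isfinite(rho) or not -1.0 <= rho <= 1.0:
        raise ValueError(f"rho must be in [-1, 1], got {rho!r}")
    # The Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]].
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2
```

Without the guard, `rho = 1.5` would reach `math.sqrt` with a negative argument and fail with a math domain error far from the real cause.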

## `CW-INSURANCE-009` — Make defensive copies of input DataFrames
**From**: finance-bp-126--lifelines · **Applicable to**: insurance-actuarial

User-provided DataFrames should be copied before inplace modifications (.pop(), .drop()). This preserves user data integrity and prevents side effects from leaking into caller code, maintaining data isolation principles.

## `CW-INSURANCE-010` — Exclude incomplete diagonals from historical analysis
**From**: finance-bp-063--chainladder-python · **Applicable to**: insurance-actuarial

The latest diagonal in claims triangles contains incomplete development data from the current period. Excluding this diagonal via valuation_date filtering ensures development factors capture only completed, reliable historical patterns for unbiased IBNR estimation.

FILE:references/components/asset_type_system.md
# asset_type_system (3 classes)

## `mkAsset`
`asset_type_system/mkasset.py:0`

## `mkAssumpType`
`asset_type_system/mkassumptype.py:0`

## `Asset Classification`
`asset_type_system/asset-classification.py:0`

FILE:references/components/component_transformation.md
# component_transformation (7 classes)

## `mkDate`
`component_transformation/mkdate.py:0`

## `mkAsset`
`component_transformation/mkasset.py:0`

## `mkBndComp`
`component_transformation/mkbndcomp.py:0`

## `mkAction`
`component_transformation/mkaction.py:0`

## `mkWaterfall`
`component_transformation/mkwaterfall.py:0`

## `Asset Type Builder`
`component_transformation/asset-type-builder.py:0`

## `Waterfall Phase Tag`
`component_transformation/waterfall-phase-tag.py:0`

FILE:references/components/deal_definition.md
# deal_definition (4 classes)

## `Generic.__init__`
`deal_definition/generic-init.py:0`

## `SPV.__init__`
`deal_definition/spv-init.py:0`

## `mkDeal`
`deal_definition/mkdeal.py:0`

## `Deal Locale`
`deal_definition/deal-locale.py:0`

FILE:references/components/deal_execution_-api.md
# deal_execution_(api) (4 classes)

## `API.run`
`deal_execution_(api)/api-run.py:0`

## `API.connect`
`deal_execution_(api)/api-connect.py:0`

## `Run Mode`
`deal_execution_(api)/run-mode.py:0`

## `Response Locale`
`deal_execution_(api)/response-locale.py:0`

FILE:references/components/input_validation.md
# input_validation (2 classes)

## `vDate`
`input_validation/vdate.py:0`

## `isListOfDict`
`input_validation/islistofdict.py:0`

FILE:references/components/report_generation.md
# report_generation (2 classes)

## `toHtml`
`report_generation/tohtml.py:0`

## `Report Format`
`report_generation/report-format.py:0`

FILE:references/components/result_parsing.md
# result_parsing (4 classes)

## `Generic.read`
`result_parsing/generic-read.py:0`

## `SPV.read`
`result_parsing/spv-read.py:0`

## `readBondStmt`
`result_parsing/readbondstmt.py:0`

## `Response Locale`
`result_parsing/response-locale.py:0`

FILE:references/components/root_finding.md
# root_finding (3 classes)

## `mkTweak`
`root_finding/mktweak.py:0`

## `mkStop`
`root_finding/mkstop.py:0`

## `Target Metric`
`root_finding/target-metric.py:0`


A Stock Quant Lab

Skill

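A-share quant lab: one-stop data collection, factor research, and backtest execution built on the zvt framework. Covers 31 scenarios, including institutional holdings, financial statements, index constituents, and MACD/MA/volume timing. China A-shares only.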

---
name: a-stock-quant-lab
description: |-
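  A-share quant lab: one-stop data collection, factor research, and backtest execution built on the zvt framework.
  Covers 31 scenarios, including institutional holdings, financial statements, index constituents, and MACD/MA/volume timing. China A-shares only.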
license: MIT-0
compatibility: Python 3.12+, uv package manager. Network access to eastmoney / joinquant / baostock / akshare for data fetch.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-009"
  compiled_at: "2026-04-20T07:34:47.524525+00:00"
  capability_markets: "cn-astock"
  capability_activities: "data-sourcing, backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
  openclaw:
    emoji: "📈"
    skillKey: a-stock-quant-lab
    category: finance
    primaryEnv: python
    requires:
      bins: ["python3", "uv"]
---
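# A-Share Quant Lab (a-stock-quant-lab)

> Say "follow institutional holdings" or "MACD backtest" and I'll write and run the code directly on zvt, so you don't have to dig through the docs. US stock data quality is mediocre and not recommended.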

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (31 total)

### Actor Data Recorder (`UC-101`)
Collects institutional investor holdings and top 10 free float shareholders on a weekly schedule for tracking major player positions
**Triggers**: institutional investor, top holders, actor data

### Financial Statement Recorder (`UC-102`)
Collects fundamental financial data including balance sheets, income statements, and cash flow statements from eastmoney on a weekly basis
**Triggers**: financial statements, balance sheet, income statement

### Index Data Recorder (`UC-103`)
Collects index metadata, index compositions (SZ1000, SZ2000, growth, value indices), and daily index price data
**Triggers**: index data, index composition, SZ1000

For all **31** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
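
Two of the locks above lend themselves to a cheap pre-flight check. The sketch below is an assumption-laden illustration, not ZVT code: the regular expression is one plausible reading of the `entity_type_exchange_code` format in `SL-03`, and `SignalSizing` is a hypothetical stand-in for the three sizing fields named in `SL-05`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape for SL-03: lowercase entity type and exchange, then a code.
ENTITY_ID_RE = re.compile(r"^[a-z0-9]+_[a-z0-9]+_[A-Za-z0-9.]+$")


def check_entity_id(entity_id: str) -> str:
    """Return entity_id unchanged, or raise if it breaks the assumed SL-03 format."""
    if not ENTITY_ID_RE.fullmatch(entity_id):
        raise ValueError(f"SL-03 violated: {entity_id!r}")
    return entity_id


@dataclass
class SignalSizing:
    """Hypothetical container for the sizing fields named in SL-05."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        fields = (self.position_pct, self.order_money, self.order_amount)
        given = [v for v in fields if v is not None]
        if len(given) != 1:
            raise ValueError(
                f"SL-05 violated: expected exactly one sizing field, got {len(given)}"
            )
```

Both `stock_sh_600000` and `stockus_nasdaq_AAPL` pass the format check, while a vendor-style symbol such as `600000.SH` is rejected.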

## Top Anti-Patterns (47 total)

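- **`AP-ZVT-183`**: An ex-rights factor of inf/NaN goes straight into a multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API quota is exceeded, so data goes missing silently
- **`AP-ZVT-200`**: After a token expires, data queries return an empty DataFrame instead of raising an error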

All 47 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-009. Evidence verify ratio = 55.0% and audit fail total = 36. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
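| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 47 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |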

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-009` blueprint at 2026-04-20T07:34:47.524525+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Index Data Recorder', 'Financial Statement Recorder', 'Actor Data Recorder', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **47**

## qlib (12)

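### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>

When multiple models in Qlib reuse the same already-fitted DatasetH instance, the normalization parameters inside the dataset (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same set of prediction signals. The symptom is that the backtest equity curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.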

Source: https://github.com/microsoft/qlib/issues/1930

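### `AP-QLIB-2090` — Dual configuration of fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (the range the normalizer is fitted on) and segments.train (the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.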

Source: https://github.com/microsoft/qlib/issues/2090
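
The coupling described above can be checked before any dataset is built. This is a minimal sketch assuming ISO date strings and a `(start, end)` tuple for the train segment; `check_fit_window` is a hypothetical helper, not part of Qlib.

```python
from datetime import date


def check_fit_window(fit_start_time, fit_end_time, train_segment):
    """Raise if the normalizer's fit window runs past the end of the train segment."""
    fit_start = date.fromisoformat(fit_start_time)
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_start > fit_end:
        raise ValueError("fit_start_time is after fit_end_time")
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include future data"
        )
```

Calling it with the same values that go into the handler and `segments.train` turns a silent leak into an immediate configuration error.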

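### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. After cross-sectional standardization, a MACD factor built from this documented formula differs significantly from the correct one, and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.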

Source: https://github.com/microsoft/qlib/issues/2036

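### `AP-QLIB-2184` — Suspension days not filled with NaN per convention before importing custom A-share data, causing downstream factor noise <sub>(high)</sub>

Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields should all be NaN, so the framework can recognize and skip them during factor computation. If a user building a custom A-share dataset keeps the previous day's price on suspension days (common with data exported directly from Eastmoney/Wind), price momentum factors produce "false signals" during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into training data.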

Source: https://github.com/microsoft/qlib/issues/2184

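### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock list API, so full A-share coverage is incomplete <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset covering only some stocks, and backtests based on PIT financial factors suffer from severe survivorship bias (stocks that were not collected are implicitly excluded).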

Source: https://github.com/microsoft/qlib/issues/1892

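### `AP-QLIB-2097` — Whole-market instrument="all" OOMs on a 32GB machine, while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. A 32GB machine running the whole market OOMs during the init_instance_by_config stage, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.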

Source: https://github.com/microsoft/qlib/issues/2097

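### `AP-QLIB-1974` — data_collector still calls the Eastmoney A-share API for the stock list even when --region US is specified <sub>(medium)</sub>

During download_data, Qlib's Yahoo data collector calls the Eastmoney API (_get_eastmoney) to fetch the full stock list as a base regardless of the --region parameter, then supplements the data from Yahoo Finance. On international networks the Eastmoney endpoint is unreachable, so a proxy is required even when the US region is specified. This implicit dependency has never been documented and is a typical design trap in which A-share data infrastructure is assumed as the global default.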

Source: https://github.com/microsoft/qlib/issues/1974

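### `AP-QLIB-1984` — LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to decide whether the task is multi-label, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch and goes straight into LightGBM training, which throws a semantically unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.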

Source: https://github.com/microsoft/qlib/issues/1984

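### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processor triggers a Length mismatch during align/reindex, while D.features succeeds because it does not go through the processor. This inconsistency between the two paths leads users to believe the data was imported correctly.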

Source: https://github.com/microsoft/qlib/issues/1915

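### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler uses joblib to load features in parallel, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is that D.features calls throw multi-level nested exceptions, and users cannot tell from the stack trace whether the problem is the parallel backend or the data.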

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (11)

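### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the timestamp of the first bar pushed is "the minute of the current tick" rather than "the end time of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If a strategy filters directly with datetime.minute % 5 in on_bar, the first bar happens to pass, but it contains less than a full period of data, and using it for signal computation produces a wrong entry signal.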

Source: https://github.com/vnpy/vnpy/issues/3691

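### `AP-VNPY-3705` — CTP orders priced beyond the limit are cancelled automatically by the system, and vnpy logs nothing: a silent failure <sub>(high)</sub>

Under the A-share/futures price-limit rules, an order priced beyond the limit is cancelled directly on the CTP side (OnRtnOrder statusMsg="50:已撤单被拒绝SHFE:价格跌破跌停板", i.e. cancelled and rejected because the price is below the limit-down price) rather than triggering the OnRspOrderInsert rejection callback. vnpy's CTP Gateway only logs rejections in onRspOrderInsert and does not parse or distinguish the cancellation reason in OnRtnOrder. For strategy developers who rely on logs to monitor order failures, over-limit orders vanish silently, and live positions drift far from what was expected.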

Source: https://github.com/vnpy/vnpy/issues/3705

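### `AP-VNPY-3669` — Alpha module: incompatible schemas between new and old DataFrames during incremental history saves cause SchemaError <sub>(medium)</sub>

When saving bar data to Parquet files, vnpy's Alpha module passes newly downloaded data (which may contain Float64 columns) and the existing file (historical Int64 columns) straight to polars.concat. polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (for example, volume is an integer in some feeds and a float in others) and there is no schema alignment step before concat. This affects every historical data build flow that uses vnpy alpha for backtesting.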

Source: https://github.com/vnpy/vnpy/issues/3669

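### `AP-VNPY-3707` — CTP Gateway crashes on a C++ null pointer at logout; reconnecting or switching accounts terminates the process <sub>(high)</sub>

When vnpy_ctp calls close() to log out, the C++ side MdApi/TdApi does not check for null pointers and has a fairly high chance of triggering a segmentation fault that crashes the entire Python process. Affected scenarios: frequent login/logout during strategy testing, switching between simulated and live accounts, and reconnecting after a server shutdown. The crash raises no Python exception and cannot be caught with try/except, making it one of the most dangerous stability traps in live trading.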

Source: https://github.com/vnpy/vnpy/issues/3707

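### `AP-VNPY-3685` — Spread trading module run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event loop conflict under Jupyter (asyncio already running), so part of the backtest engine's logic does not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop. It is a subtle trap in which the backtest results look correct but are actually incomplete.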

Source: https://github.com/vnpy/vnpy/issues/3685

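### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy version is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs directly into the system/conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer versions of scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of "global environment pollution".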

Source: https://github.com/vnpy/vnpy/issues/3700

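### `AP-VNPY-3715` — An order object containing curly braces in a loguru format string triggers KeyError and crashes the logging system <sub>(high)</sub>

vnpy's engine.py uses an f-string to format order.__dict__ directly and passes the result to loguru via write_log. When a field name of the order (such as gateway_name) happens to match a loguru format placeholder, loguru parses it as a template variable and raises KeyError, crashing the entire logging thread. In live trading, a crashed logging system means all subsequent order/trade records are lost, which makes this a high-risk trap in production.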

Source: https://github.com/vnpy/vnpy/issues/3715

## zipline (13)

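### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users into misjudging strategy returns <sub>(high)</sub>

The Zipline tutorial uses an AAPL price chart for its demo, but the bundle stores unadjusted prices (raw price) rather than prices adjusted for splits/dividends. The historical prices shown in the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy returns. It is worse for A-shares: the price jump around the ex-rights date forms a huge "signal" in unadjusted data, luring technical indicators into false breakout signals on the ex-rights date.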

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

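### `AP-ZIPLINE-235` — Fills default to the current bar's close, underestimating live slippage and inflating backtest returns <sub>(high)</sub>

After a signal fires on a bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading a signal can only be filled near the next bar's open (T+1 order execution). Taking A-share daily bars as an example, backtesting with the close overstates daily returns by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.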

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
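
Independent of any framework, next-bar execution can be sketched as below. The bar dictionaries and the True/False/None signal convention are assumptions made for this example (the convention mirrors `SL-06`); this is not Zipline's or ZVT's fill logic.

```python
def next_bar_fills(bars, signals):
    """Pair each signal with the open of the following bar.

    bars: list of dicts with "open" and "close", oldest first.
    signals: list of the same length; True means buy, False sell, None no action.
    """
    if len(bars) != len(signals):
        raise ValueError("bars and signals must have the same length")
    fills = []
    for i, signal in enumerate(signals):
        if signal is None:
            continue
        if i + 1 >= len(bars):
            break  # a signal on the last bar has no next bar to fill on
        side = "buy" if signal else "sell"
        fills.append((i + 1, side, bars[i + 1]["open"]))
    return fills
```

A signal computed from bar `i` never touches bar `i`'s own close, which is the property the same-bar fill violates.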

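### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if the start_session parameter happens to be a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message only shows the trading calendar's start date and does not suggest changing it to the first trading day. For A-shares: with the SSE/SZSE calendar, if start_date happens to be the day after the last trading day before Spring Festival (a holiday), the same kind of error is triggered and debugging is very costly.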

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

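### `AP-ZIPLINE-182` — An empty/None symbol column in a CSV bundle fails the SQLite constraint, and the full import aborts silently <sub>(medium)</sub>

During ingest, Zipline's csvdir bundle parses every CSV file name as a symbol and writes it to the equity_symbol_mappings table. If a CSV file name does not follow Zipline's conventions (for example, it contains Chinese characters or carries an exchange suffix such as .SH), the symbol field is parsed as an empty string or None, triggering sqlite3.IntegrityError: NOT NULL constraint failed. The error occurs near the end of ingest, the data already written is rolled back, and the whole bundle is unusable. This is common with A-share data (the 000001.SZ.csv format); preprocess the file names to remove the suffix.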

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/182
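
The file-name preprocessing mentioned above might look like the following sketch. The suffix set and the helper name `clean_csv_name` are assumptions; the function only computes the new base name and leaves the actual renaming to the caller.

```python
from pathlib import Path

# Assumed set of exchange suffixes; extend it to match your data vendor.
EXCHANGE_SUFFIXES = {".SH", ".SZ", ".BJ"}


def clean_csv_name(filename: str) -> str:
    """Map '000001.SZ.csv' to '000001.csv'; leave other CSV names unchanged."""
    path = Path(filename)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"not a CSV file: {filename!r}")
    stem = Path(path.stem)  # e.g. '000001.SZ'
    if stem.suffix.upper() in EXCHANGE_SUFFIXES:
        stem = Path(stem.stem)  # e.g. '000001'
    return f"{stem}.csv"
```

Note that stripping the suffix can make Shanghai and Shenzhen codes collide if the same digits exist on both exchanges, so checking the cleaned names for duplicates before ingest is a sensible extra step.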

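### `AP-ZIPLINE-181` — After the asset db goes stale, Pipeline reports "no assets traded", misleading users into checking the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records the start/end trading dates of each stock. If an old Quandl/self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. For A-shares: if only the price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted/newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.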

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

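### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function depend on the trading calendar's logic for determining the first/last day of the week, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation for the NYSE calendar, so the schedule never fires or fires on the wrong date. For A-shares: with the SSE calendar, in weeks containing a long consecutive holiday such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot find the missed schedule in the logs.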

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

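### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime triggers a deep AssertionError <sub>(medium)</sub>

Zipline internally requires all timestamps to be UTC-aware datetimes. When a user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), no error is raised at the entry point; instead an AssertionError: Algorithm should have a utc datetime is triggered deep inside algorithm execution, where the deep stack makes it hard to locate. A-share developers importing data from local CST time hit this trap very easily; explicitly call tz_localize('UTC') when registering the bundle.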

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
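
A standard-library sketch of the same normalization, for code paths that handle plain `datetime` objects. `to_utc` and the fixed-offset `CST` constant are illustrative names; the default labels naive values as UTC, which matches what `tz_localize('UTC')` does, while passing `CST` converts an intraday local time instead.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time as a fixed UTC+8 offset


def to_utc(ts: datetime, assume_tz: timezone = timezone.utc) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted in assume_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)
```

For daily session dates the default is usually what you want, because converting midnight CST would shift the date to the previous day in UTC.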

## zvt (11)

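### `AP-ZVT-183` — An ex-rights factor of inf/NaN goes straight into a multiplication, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT computes qfq_factor as the new/old price ratio. When old==0 (a new stock's first day or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. The result: price adjustment for the whole entity is interrupted and all subsequent bars are lost, but the main flow only logs ERROR without stopping, so users often do not know that data for many stocks is already corrupted.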

Source: https://github.com/zvtvz/zvt/issues/183
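
One way to guard against this, sketched with hypothetical helper names rather than ZVT's actual code: compute the factor defensively, and make the multiplication refuse to run with a missing price or an unusable factor.

```python
import math


def safe_adjust_factor(new_price, old_price):
    """Return new/old as a finite float, or None when it cannot be trusted."""
    if new_price is None or old_price is None:
        return None
    if old_price == 0:
        return None  # avoids a division error or an infinite factor
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def adjust_open(open_price, factor):
    """Apply the factor, raising instead of silently producing bad prices."""
    if open_price is None or factor is None:
        raise ValueError("cannot adjust: missing price or unusable factor")
    return open_price * factor
```

The caller can then count the entities for which `safe_adjust_factor` returned `None` and fail the run when that count is non-zero, instead of only logging.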

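### `AP-ZVT-179` — Exceptions are swallowed after a third-party data API quota is exceeded, so data goes missing silently <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: 已超过每日最大查询数量, i.e. daily maximum query count exceeded). ZVT catches the exception and moves on to the next entity, so that day's data for every stock after the limit is hit goes missing silently. If a backtest uses this incomplete database, factor computations are systematically biased, with no warning.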

Source: https://github.com/zvtvz/zvt/issues/179

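### `AP-ZVT-200` — After a token expires, data queries return an empty DataFrame instead of raising an error <sub>(high)</sub>

When a JoinQuant/Eastmoney token expires, ZVT's record_data does not raise an exception; instead it parses the error message returned by the API (such as "error: token无效", i.e. invalid token) as DataFrame column names and ends up with an empty 0-row table. The subsequent update logic decides there is "no new data" and skips, so the database stops updating for a long time with no error log at all. Users only discover the data has been stale for months when backtest results look abnormal.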

Source: https://github.com/zvtvz/zvt/issues/200

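### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers a too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable count limit. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.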

Source: https://github.com/zvtvz/zvt/issues/161
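
Batched querying can be sketched with the standard-library `sqlite3` module. The table name `kdata`, its columns, and the chunk size of 500 are assumptions for the example (500 stays below the 999 figure cited above); this is not ZVT's query layer.

```python
import sqlite3


def query_in_chunks(conn, entity_ids, chunk_size=500):
    """Fetch rows for many entity_ids without exceeding SQLite's variable limit."""
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# Demo on an in-memory database with 1200 hypothetical entities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sz_{i:06d}" for i in range(1, 1201)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])
rows = query_in_chunks(conn, ids)
```

Each statement binds at most `chunk_size` parameters, so the query count grows with the universe size while the per-statement variable count stays bounded.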

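### `AP-ZVT-129` — Wildcard imports hide API version changes, and enums such as AdjustType mysteriously disappear <sub>(medium)</sub>

ZVT's documentation examples use `from zvt import *` to import all symbols. After a ZVT upgrade refactors enums (for example, moving AdjustType into a submodule), the wildcard import no longer includes that symbol, triggering AttributeError. Users assume it is an installation problem, when it is actually a breaking API change between versions that was not flagged in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.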

Source: https://github.com/zvtvz/zvt/issues/129

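### `AP-ZVT-187` — The backtest engine does not terminate early on empty data-layer results, causing a cascading null-pointer crash <sub>(medium)</sub>

When ZVT's Trader finds the data empty after load_data completes, it does not exit early but passes the empty DataFrame into the selector computation, triggering a chain of NoneType failures. The stack trace is deep and the root cause is hard to locate, so users assume the strategy logic is at fault. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.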

Source: https://github.com/zvtvz/zvt/issues/187

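### `AP-ZVT-184` — After replacing the sample historical data, a provider directory mismatch causes update errors <sub>(low)</sub>

ZVT offers a downloadable historical snapshot database, but the documentation does not explain that it must be placed in a specific zvt_home subdirectory that corresponds to the provider name. After the data is placed in the wrong directory, running record_data makes the framework find the local database empty and start a full fetch from scratch, which again runs into API quota or permission errors. The implicit binding between the database path and the provider is a common blind spot.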

Source: https://github.com/zvtvz/zvt/issues/184

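### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) bar table causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two tables while computing price momentum/moving-average factors (for example, unadjusted prices for the moving average and backward-adjusted prices for returns), factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.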

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: zvt
**Scan date**: 2026-04-20
**Stats**: {'total_files': 325, 'total_classes': 424, 'total_functions': 571, 'total_business_decision_candidates': 147}

## Modules (14)

- [factors](components/factors.md): 54 classes
- [recorders](components/recorders.md): 90 classes
- [trader](components/trader.md): 22 classes
- [domain](components/domain.md): 114 classes
- [api](components/api.md): 2 classes
- [contract](components/contract.md): 53 classes
- [broker](components/broker.md): 6 classes
- [ml](components/ml.md): 5 classes
- [tag](components/tag.md): 42 classes
- [trading](components/trading.md): 19 classes
- [common](components/common.md): 9 classes
- [misc](components/misc.md): 2 classes
- [informer](components/informer.md): 4 classes
- [samples](components/samples.md): 2 classes

## Data Flow Hints (6)

- {'from': 'EntitySchema (contract/schema.py)', 'to': 'Recorder (contract/recorder.py)', 'how': 'Recorder.data_schema = EntitySchema; RecorderManager registers recorders per entity'}
- {'from': 'Recorder', 'to': 'Domain DB (SQLAlchemy models in domain/)', 'how': 'Recorder.run() calls schema.query_data() / session.add() via zvt storage layer'}
- {'from': 'Domain DB', 'to': 'Factor (contract/factor.py)', 'how': 'Factor.__init__ reads entity_schema; Factor.compute() loads data via TechnicalFactor/TransformerFactor'}
- {'from': 'Factor', 'to': 'TargetSelector (factors/target_selector.py)', 'how': 'TargetSelector aggregates multiple Factors; filters entities by score'}
- {'from': 'TargetSelector', 'to': 'Trader (trader/)', 'how': 'Trader consumes TargetSelector.run() result to make buy/sell decisions'}
- {'from': 'Trader', 'to': 'SimAccount / Broker', 'how': 'Trader places orders via Account.order(); SimAccount simulates fills'}
FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 188
  fatal_constraints_count: 34
  non_fatal_constraints_count: 124
  use_cases_count: 31
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (86)

- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A 股股票实行 T+1 交收制度：T 日买入的股票最早 T+1 日方可卖出。 T 日卖出所得资金可当日再用于买入。回测框架若未施加 T+1 持仓锁定， 将高估换手率与策略胜率，尤其损害日内反转类策略的真实性。
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: 沪深主板股票日涨跌幅上限为 ±10%（ST/SST 股票 ±5%）。 涨停封板时买方消失、跌停封板时卖方消失；回测若假设当日可以任意价格 成交，会系统性高估可执行性。封板检测应在成交模拟层强制实施。
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: 科创板和创业板（2020年8月改革后）正常交易日涨跌幅为 ±20%； 北交所 ±30%；新股上市后前5个交易日不设涨跌幅限制。 回测若对所有股票统一套用 ±10% 过滤逻辑，会错误剔除或错误包含这些板块的成交。
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST 股票日涨跌幅限制为 ±5%，流动性极差，成交假设不可与正常股票混用。 包含历史 ST 股票（最终退市）但不纳入回测会产生幸存者偏差； 纳入回测但不区分 ST 涨跌幅会错误模拟成交。
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: A 股开盘集合竞价（9:15-9:25）和收盘集合竞价（14:57-15:00）期间， 成交价由"最大成交量原则"确定，非即时撮合。回测以开盘价或收盘价假设 即时全量成交会低估实际滑点风险，大单策略尤为明显。
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: 停牌制度：A 股长期停牌（2018年前可长达数月）期间，持仓资金被锁定， 无法再平衡，机会成本在回测中普遍被忽略。应在因子计算前过滤停牌日 （volume == 0 或 is_suspended == True），停牌期间不发出信号。
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: 新股上市后前5个交易日无涨跌幅限制（首日涨幅可超300%）， 且无完整历史数据（均线/波动率/换手率因子无法计算）。 应在因子计算前过滤上市不足 N 个交易日（通常 60-252 日）的股票。
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: A 股程序化交易监管新规（2025年7月7日施行）：单账户每秒申报/撤单 ≥ 300 笔， 或单日申报/撤单 ≥ 20000 笔，被认定为高频交易，须向交易所报备。 AI 生成的量化策略若频率超标则无法合规运行，应在策略设计期提示。
- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: 除权除息日股价跳空是账面调整而非真实亏损。复权选择： 不复权会虚增策略亏损；前复权会将历史价格内嵌未来分红信息（lookahead bias）； 后复权以上市首日为基准累积，是量化回测的最优选择。
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A 股上市公司财务报告披露有法定延迟：年报在次年4月30日前、 半年报在8月31日前、季报分别在4月30日（一季）/10月31日（三季）前披露。 回测中使用财务数据时，必须以实际披露日期（announcement_date）而非 会计期间结束日作为数据可用时间点，否则引入 point-in-time lookahead bias。
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: 分红送股转增和配股会导致除权除息日后股本增加，历史持股数量不变但股价等比 缩水，若回测系统未同步调整持仓股数，会在除权日产生虚假亏损或盈利。
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: 大宗交易与竞价交易价差：大宗交易成交价可比市价折价最多 10%（主板）， 但此价格不影响次日竞价开盘。大宗交易数据出现在收盘后，若将其混入 日内 OHLCV 数据，会污染收盘价和成交量的正常计算。
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: 融资融券（两融）做空限制：A 股散户无法直接卖空，融券标的池有限（主要为 大盘蓝筹，中小盘融券极度稀缺），融券利率远高于融资利率。 回测若直接假设可做空任意股票，会产生不可执行的策略，实盘与回测严重背离。
- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: 通过沪深港通（北向）买入股票，境外投资者合计持股上限 30%，预警线 28%。 当外资持股比例达 28% 时，联交所暂停该股新增买盘，直到降至 26% 才恢复。 策略若重仓外资偏好股（消费/医药龙头），需监控外资持股比例。
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% 举牌规则：单一投资者持有上市公司已发行股份超过 5%，须在3日内向证监会 和交易所报告并公告；在此期间及公告后2日内不得再买卖。 量化选股系统若不考虑此规则，重仓股超过 5% 阈值后将面临强制停止买入。
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: 公募基金"双十原则"：单基金持有单只股票不超过净资产 10%， 同一基金管理人旗下所有基金合计不超过该公司已发行股份 10%。 量化选股组合若部署于公募基金，需在优化约束中强制加入合规上限。
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: 内幕交易边界：AI 辅助量化系统的所有输入数据必须来自公开已披露信息。 通过非公开渠道（私有数据服务/内部消息/重组前预知）触发的自动化交易 构成内幕交易，适用《证券法》第80-83条及《内幕交易行为认定指引》。
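- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using the current A-share constituents (for example today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. Delistings accelerated over 2020-2024 (a record 41 per year), so the bias keeps growing. Use point-in-time historical snapshots.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index reconstitution effect: the CSI 300, CSI 500, and similar indices are rebalanced every six months (June and December). Added stocks usually rise significantly between the announcement date and the effective date as passive funds are forced to buy; removed stocks do the opposite. The backtest universe should use historical constituent snapshots and mark the reconstitution windows.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, their holdings overlap heavily, and in a market shock they sell together and cause a stampede. The February 2024 A-share quant crisis is the typical case (small-cap indices fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share market-neutral strategies commonly hedge systematic risk with IF/IC/IM index futures. These futures trade at a persistent discount (futures price < spot), and the annualized discount on IC can reach 10-20%. A backtest that counts only price returns and ignores the futures basis (discount or premium) badly overstates the net return of the hedged strategy.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor in A-shares points the opposite way from US equities: the best performers over the past month tend to reverse in the following month (a reversal effect, not momentum). Broker research (Huatai, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.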
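- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially strong among A-share retail investors, who tend to sell winners too early and hold losers too long. An empirical study by the Shanghai Stock Exchange found the effect in more than 90% of individual accounts. An AI assistant should not cater to the "hold the loser until it breaks even" instinct; it should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are retail-dominated (individual accounts make up more than 80% of trading volume) and herding is pronounced: retail investors follow the crowd, producing irrational price swings (for example the leveraged boom and bust of 2015). Quant strategies should avoid using volume rankings, popularity rankings, and similar herd-amplifying indicators as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: their average annual turnover exceeds 500%, and institutions significantly outperform them over the long run. High-turnover strategies often earn less net of trading costs. The AI should not encourage frequent trading; it should recommend trading on low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and 1-3 days after the holiday) and the turn-of-month effect (trading days 1-5 of a month outperform mid-month and month-end) have academic support (Nanjing University of Finance and Economics, among others). Strategies should lower signal confidence in these calendar windows, or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small and micro caps trade only a few million yuan a day on average, so large orders cause severe price impact and real capacity may be only tens of millions of yuan. Backtest results do not extrapolate to portfolios of hundreds of millions; add a cap on participation as a share of volume to the backtest.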
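- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share trading cost structure (after the August 2023 adjustment): stamp duty 0.05%, sell side only; commission about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai) 0.001%; slippage and impact cost 0.1%-0.5% per trade for small caps. Annualized returns from a backtest that ignores costs are misleading, especially for high-frequency or high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact is usually missing entirely from backtests yet can be the largest cost in live trading. A 1 million yuan buy in an A-share small cap can push the price up by more than 1%. Impact scales with trade size as a power law, not linearly; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders, directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): shareholders holding 5% or more must disclose a reduction plan 15 trading days before selling through the central auction, and may sell no more than 1% of total shares within 3 months. Selling pressure from lock-up expiries is a systematic risk factor specific to A-shares; a backtest that ignores the lock-up calendar understates the holding risk of the affected stocks.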
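- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the ordinary calendar: statutory holidays create make-up working days (working Saturdays), and market-wide trading has occasionally been halted outside the schedule (for example the circuit-breaker halts of 4 and 7 January 2016). Deriving A-share trading days from a generic weekday calendar gives wrong results; use an A-share-specific trading calendar (such as exchange_calendars or tushare's trading-day API).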
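- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After a delisting, an A-share stock code may be reused by a new listing (rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or a listing-date range, can mix up historical data with the current stock; take particular care in long-horizon backtests.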
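- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting with exponential-backoff retries: every external data API call must apply rate limiting and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern that adds load on the server and can trigger an IP ban. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must cap concurrency (max_workers) and never run with unbounded parallelism. Free APIs (akshare, the free tier of tushare) typically allow 1-3 concurrent requests; paid APIs also have caps (tushare is points-based, with concurrency depending on points). Exceeding the cap triggers 429 responses or an IP ban. Control it explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token and credential safety: data-source API keys (a tushare token; akshare needs none, but other commercial sources do) must not be hard-coded and must be read from environment variables or a config file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should be separated by a minimum interval (some akshare endpoints require ≥ 0.5 s; the free tier of tushare allows 200 calls per minute). A bare sleep is less precise than a token-bucket algorithm; a mature library such as ratelimit or slowapi is recommended.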
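- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Handling missing data on suspension days: a suspended stock has no trades, so its rows are missing for those dates. Do not forward-fill the gap (that fabricates volume); store the rows with is_suspended=True, volume and turnover set to 0, and the price held at the previous close. Factor calculations must filter out rows with is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: History boundary for new listings: a new stock appears in the database from its first trading day and has no earlier history. If the factor lookback is longer than the number of days since listing, every factor value is NaN. Record each stock's listing date (list_date) during collection and start collection from that date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream sources (akshare, tushare) still serve the pre-delisting history of delisted stocks, but nothing after the delisting date. The historical universe must include delisted stocks (otherwise it has survivorship bias), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same field (such as the close price) can differ slightly between sources (akshare, tushare, baostock) because of different adjustment methods, holiday handling, or ex-dividend adjustment timing. Run multi-source reconciliation checks in the pipeline; when a difference exceeds a threshold (for example 0.1%), log an alert and have a person confirm it.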
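- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: store timestamps in one data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; convert at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time vs. calendar time: the "date" of daily bars is a trading day (day T), whereas news and announcement timestamps are in calendar time. When merging the two, map calendar time to the next available trading day; otherwise an announcement made on day T is treated as already available during that day's session, which is lookahead.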
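- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: update scripts must be idempotent (repeated runs give the same result). If a script fails midway on a network error, re-running it must not create duplicate rows or gaps. Implementation: write to a staging table, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Integrity checks (checksums and row counts): after every update, check the key fields: is the row count within the expected range, are prices positive, are dates contiguous (no missing trading days)? A pipeline without automatic validation is where silent data rot starts.
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: pipeline outputs should be versioned. When a source revises history (for example restated financials), keep the old version traceable rather than silently overwriting it, so that versions can be compared and past backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to the trading calendar: after collection, verify that coverage for every stock or asset matches the trading calendar. Each stock should have one row per trading day (flagged as suspended, not missing). Checking the NaN ratio with pivot_table is a quick, effective diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: cache static or slowly changing data that is read often (stock info, industry classification, index constituents) locally to avoid repeated API calls on every run. Every cache entry needs an expiry (TTL) so that stale industry classifications or outdated constituents are not used.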
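- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at that same close; (2) stamping an indicator computed after the close of day T onto day T's bar; (3) assuming fills at the day's high or low. Signal and execution times must line up: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not feed signals or position decisions. Require the warmup_period to equal the longest lookback, and keep positions at zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they leak future data into the training set); use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: a daily backtest that assumes it can sell at the day's high or buy at the day's low (for example a momentum strategy that "takes profit at the high") has obvious lookahead, because the intraday high and low are only known after the close. Fill only at the open or at the previous close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment errors: pandas rolling/shift and similar operations easily introduce a subtle one-period offset. Document the observation time of every series in the code and verify the key alignment relationships with assert.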
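- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the chance of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials rises with the number of backtests even when no configuration has real skill, so the best in-sample Sharpe is inflated. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.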
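- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the current market constituents as the historical backtest universe omits stocks that existed but were later delisted, removed, or merged, and systematically overstates historical returns. The backtest universe must be a point-in-time snapshot.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be done in-sample; out-of-sample data is for final validation only. Tuning further after repeatedly looking at out-of-sample results turns it into new in-sample data and repeats the overfitting.
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Filling suspended or missing data: do not simply forward-fill suspended days with the previous close, because zero-volume days then take part in factor calculation and signal generation. Filter out missing trading days explicitly in the factor layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data can contain source errors (late ex-rights adjustments, extreme prices from manual entry errors). Feeding it uncleaned into factor calculations produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry point.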
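- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not acceptable. Performance metrics that ignore costs are misleading, most of all for high-turnover strategies, where round-trip costs often consume more than 50% of gross returns.
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: without slippage, a backtest assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading from worse fills. Configure at least a fixed spread or proportional slippage; for large orders use a volume-participation model (for example no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions: a 10 bps change in cost can flip the profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual (rounded) share counts held under a fixed amount of capital instead of assuming fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight; simulate this rounding effect in the backtest.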
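- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Uniform time zones: mixing UTC and local time is a common source of data corruption when merging sources. Convert all timestamps to one time zone at the pipeline entry point (store in UTC, display in the market's local time zone) and never mix time zones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (for example daily prices with weekly factors), reindex/merge on an explicit trading calendar. Do not outer-join and then fillna, which creates fake rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: query the database for the latest stored date and download only data after it. Re-downloading and appending existing data creates duplicate timestamps and breaks the time order of the backtest. Verify that there are no duplicates before and after each update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts alpha and beta. Use a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a non-investable price index (such as the HS300 index). A price index excludes reinvested dividends and therefore understates the benchmark's return.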
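- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. Summing log returns misstates drawdown depth: a peak-to-trough drop of x in cumulative log return corresponds to a drawdown of 1 - exp(-x) in portfolio value, not x.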
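- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) for equities (252 trading days) or × sqrt(365) for crypto (365 days). Defaults differ between systems; confirm the annualization factor before comparing Sharpe ratios across systems, otherwise they are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio here: Sharpe assumes normally distributed returns, while A-share and crypto returns are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. Report Sortino (downside volatility only) and Calmar (annualized return / max drawdown) alongside Sharpe rather than relying on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Performance attribution should decompose returns into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot tell a good strategy from one that happened to have the right beta in a favorable market.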
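- **`SHARED-FR-IC-001`** <sub>(high)</sub>: The IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns; ICIR = mean(IC) / std(IC). |IC| > 0.05 is preliminary evidence of predictive power, and ICIR > 0.5 indicates stability. Reporting backtest performance without computing the IC leaves factor validity unproven.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the IC series at 1/5/10/20 days to find the factor's best holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and does not suit monthly rebalancing, and vice versa. Using the wrong holding period badly hurts live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Requirements for factor validity: a t-stat > 3.0 (not the traditional 1.96), or independent replication in other periods or markets, or a clear economic rationale. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a high-IC but high-turnover factor can have a negative net IC after trading costs. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: A factor's half-life is a core parameter of its signal strength and directly determines the best rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-lived factor monthly lets most of the alpha dissipate during the holding period.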
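- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns include industry-rotation returns, and it is hard to tell whether the factor or the industry exposure drove them. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the size effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor is highly correlated with market cap, selection tilts systematically toward small caps and the return comes from size exposure rather than the factor. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (winsorize/MAD): raw factor values usually contain extremes that distort group analysis (for example Q1/Q10 deciles). Winsorize raw values (clip to [1%, 99%] or 3-sigma) or clip by MAD (median absolute deviation) before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors amounts to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors (Gram-Schmidt or PCA) before combining to remove multicollinearity.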
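- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Filling missing data: filling NaN values in factor calculations (suspensions, new listings, data gaps) with a mean that includes later observations introduces lookahead bias; dropping the rows entirely introduces survivorship bias. The right approach is to fill with the cross-sectional median (the median over all stocks on that day, which does not depend on the future) or to exclude the stock for that day.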
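- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: evaluate factors primarily by the long-short return spread (top minus bottom) between quantile groups, Q1/Q5 (quintiles) or Q1/Q10 (deciles), not by the long-only return. The monotonicity test across groups in a long-Q1, short-Q5 setup is the core evidence of factor validity: monotonically increasing or decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, sideways) is important evidence of robustness. A factor whose IC holds only in one market regime is not suitable for all-weather deployment; plot the IC over time in rolling 12-month segments to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank changes trigger rebalancing and generate a lot of wasted trading cost. Set a buffer zone in selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): even a large historical Q1-Q5 spread can be chance, so test its significance with a bootstrap or t-test (p-value < 0.05). Group returns from short backtests (< 3 years) are especially unreliable.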
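- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying US equity factors to A-shares, or equity factors to futures or crypto, requires independent IC validation; cross-market generality cannot be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked fade as the market learns and arbitrages them away (McLean & Pontiff 2016 show an average post-publication decay of 58%). Re-evaluate factor IC regularly (quarterly or annually) and replace or down-weight factors that have stopped working.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction with the macro environment: the interest-rate cycle, the business cycle, and market sentiment strongly affect factor validity. Value (low P/B) works better when rates are rising; momentum works better in trending markets and fails in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment in which it performs best.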
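
The retry policy in `SHARED-DS-RL-001` (exponential backoff with jitter, 3-5 retries, base 1-2 s, cap 60 s) can be sketched as below. This is a minimal illustration, not part of any data-source SDK: `RetryableError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and "full jitter" (a uniform draw between 0 and the exponential ceiling) is one common variant chosen here as an assumption.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for a 429/503-style response from a data API."""


def backoff_delays(max_retries=5, base=1.0, cap=60.0, rng=random.random):
    """Return the wait in seconds before each retry, using full jitter."""
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)  # 1, 2, 4, ... capped at `cap`
        delays.append(rng() * ceiling)           # uniform in [0, ceiling)
    return delays


def call_with_retry(fetch, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fetch(); on RetryableError wait with backoff, then try again."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fetch()
        except RetryableError:
            sleep(delay)
    return fetch()  # final attempt; any error now propagates to the caller
```

Passing `sleep` and `rng` as parameters keeps the policy testable without real waiting.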

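The two performance conventions in `SHARED-BT-PERF-001` and `SHARED-BT-PERF-002` are easy to get wrong, so here is a small standard-library sketch. The function names are illustrative assumptions, and the Sharpe calculation assumes a zero risk-free rate for simplicity.

```python
import math
import statistics


def max_drawdown(values):
    """Largest peak-to-trough loss of a net value series, as a fraction."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)                      # running high-water mark
        worst = max(worst, (peak - value) / peak)    # loss relative to that peak
    return worst


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Daily Sharpe scaled by sqrt(periods); risk-free rate assumed to be 0."""
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)            # sample standard deviation
    return mean / std * math.sqrt(periods_per_year)
```

Calling `annualized_sharpe(returns, 365)` instead of the default 252 changes the result by a factor of sqrt(365/252), which is why the annualization factor has to be confirmed before comparing systems.
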
FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
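
As an illustration of the `entity_type_exchange_code` format, the sketch below builds and parses IDs such as `stock_sh_600000`. `EntityId`, `parse_entity_id`, and `build_entity_id` are hypothetical helpers written for this example, not functions from zvt.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'stock_sh_600000' into its three parts, or raise ValueError."""
    parts = entity_id.split("_", 2)  # split at most twice; the code keeps any later '_'
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return EntityId(*parts)


def build_entity_id(entity_type: str, exchange: str, code: str) -> str:
    """Join the three parts back into the canonical ID string."""
    return f"{entity_type}_{exchange}_{code}"
```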

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
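
A minimal sketch of how the exactly-one constraint could be enforced at construction time. `SignalSketch` is a hypothetical stand-in written for this example; it is not zvt's `TradingSignal` class, and only the three field names come from the lock above.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SignalSketch:
    """Hypothetical signal object that accepts exactly one sizing field."""
    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        given = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(given) != 1:
            raise ValueError(
                f"exactly one of {SIZING_FIELDS} is required, got {given or 'none'}"
            )
```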

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
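
The three-state semantics are easy to break with a plain truthiness check, because `bool(float("nan"))` is `True` in Python. The sketch below shows one way to map a cell to an action; `action_for` is a hypothetical helper for illustration.

```python
import math


def action_for(filter_result):
    """Map a filter_result cell to 'BUY', 'SELL', or 'NO_ACTION'."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"  # NaN is truthy, so it must be handled before bool()
    return "BUY" if filter_result else "SELL"
```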

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
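
For reference, the locked parameters plug into the textbook MACD calculation as sketched below. This is a standard-library illustration under stated assumptions (EMA seeded with the first value, smoothing factor 2 / (span + 1)); it is not zvt's implementation, and the names `ema` and `macd` are chosen for this example.

```python
def ema(values, span):
    """Exponential moving average seeded with the first observation."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for value in values[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (diff, dea, hist) lists using the locked 12/26/9 defaults."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    diff = [f - s for f, s in zip(fast_ema, slow_ema)]  # fast EMA minus slow EMA
    dea = ema(diff, signal)                             # signal line
    hist = [d - e for d, e in zip(diff, dea)]           # histogram
    return diff, dea, hist
```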

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
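
To make the defaults concrete, the sketch below applies them to a round trip. How zvt combines cost and slippage internally is not stated here, so the convention used (slippage moves the fill price against the trader, then the cost rate is charged on the traded value) is an assumption, and both function names are hypothetical.

```python
BUY_COST = 0.001
SELL_COST = 0.001
SLIPPAGE = 0.001


def buy_cash_out(price, shares, buy_cost=BUY_COST, slippage=SLIPPAGE):
    """Cash paid for a buy: fill above the quoted price, plus the buy cost."""
    fill = price * (1 + slippage)
    return fill * shares * (1 + buy_cost)


def sell_cash_in(price, shares, sell_cost=SELL_COST, slippage=SLIPPAGE):
    """Cash received from a sell: fill below the quoted price, minus the sell cost."""
    fill = price * (1 - slippage)
    return fill * shares * (1 - sell_cost)
```

Under these assumptions, buying and immediately selling 1000 shares at a quoted price of 10.00 costs about 40 yuan, roughly 0.4% of the notional.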

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
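
A daily backtest can enforce T+1 by tracking when each lot was bought. The helper below is a hypothetical sketch; it assumes the backtest only steps through trading days, so "bought before today" is equivalent to "bought on an earlier trading day".

```python
from datetime import date


def sellable_shares(lots, today):
    """Sum shares from lots bought strictly before `today`.

    `lots` is a list of (buy_date, shares) tuples.
    """
    return sum(shares for buy_date, shares in lots if buy_date < today)
```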

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **41**

## `KUC-DR-001`
**Source**: `examples/data_runner/actor_runner.py`

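Scheduled batch collection (every Wednesday at 01:00) of institutional investor holdings, top ten free-float shareholders, and shareholder summary data, supporting later quantitative analysis of changes in institutional holdings.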

**Inputs**:
- {'data_provider': 'em'}
- {'entity_provider': 'em'}
- {'cron_schedule': 'hour=1, minute=0, day_of_week=2'}

**Components**:
- StockInstitutionalInvestorHolder
- StockTopTenFreeHolder
- StockActorSummary
- run_data_recorder
- BackgroundScheduler

**Parameters**:
```
{'day_data': True, 'sleeping_time': None}
```

**Validation**:
```
After the run, StockActorSummary.query_data() returns a non-empty DataFrame whose timestamps reach the latest reporting period.

```

## `KUC-DR-002`
**Source**: `examples/data_runner/finance_runner.py`

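Every Saturday at 01:00 (day_of_week=5 counts from Monday = 0, as with day_of_week=2 for Wednesday in `KUC-DR-001`), syncs the four financial tables for all A-shares (income statement, balance sheet, cash flow statement, financial factors), keeping the local database in step with the Eastmoney data source and providing raw data for fundamental stock selection.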

**Inputs**:
- {'data_provider': 'eastmoney'}
- {'entity_provider': 'eastmoney'}
- {'cron_schedule': 'hour=1, minute=0, day_of_week=5'}

**Components**:
- Stock
- StockDetail
- FinanceFactor
- BalanceSheet
- IncomeStatement
- CashFlowStatement
- run_data_recorder
- BackgroundScheduler

**Parameters**:
```
{'day_data': True}
```

**Validation**:
```
FinanceFactor.query_data(limit=5) returns records that include the latest reporting period, and CashFlowStatement.query_data() is not empty.

```

## `KUC-DR-003`
**Source**: `examples/data_runner/index_runner.py`

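Maintains basic information and constituent lists for A-share indices (CNI 1000/2000/Growth/Value) and, after 16:20 on each trading day, syncs daily bars for the important indices, providing base data for sector-rotation analysis.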

**Inputs**:
- {'data_provider': '"exchange" / "em"'}
- {'index_ids': ['index_sz_399311', 'index_sz_399303', 'index_sz_399370', 'index_sz_399371']}
- {'important_index_codes': 'IMPORTANT_INDEX constant'}

**Components**:
- Index
- Index1dKdata
- IndexStock
- run_data_recorder
- BackgroundScheduler

**Parameters**:
```
{'day_data': True, 'entity_provider': 'exchange'}
```

**Validation**:
```
Index1dKdata.query_data(codes=IMPORTANT_INDEX) returns the day's closing-price records.

```

## `KUC-DR-004`
**Source**: `examples/data_runner/joinquant_fund_runner.py`

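Every Saturday, collects mutual fund basic information, fund holding details, and stock valuations (PE/PB) from JoinQuant, supporting analysis of funds' top holdings and value-timing strategies.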

**Inputs**:
- {'data_provider': 'joinquant'}
- {'entity_provider': 'joinquant'}

**Components**:
- Fund
- FundStock
- StockValuation
- run_data_recorder

**Parameters**:
```
{'sleeping_time': 0, 'day_data': True}
```

**Validation**:
```
FundStock.query_data(limit=5) returns records containing fund codes and held-stock codes.

```

## `KUC-DR-005`
**Source**: `examples/data_runner/joinquant_kdata_runner.py`

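After 15:30 on each trading day, collects backward-adjusted (hfq) daily A-share bars and the trading calendar from JoinQuant, keeping the local price history complete as the full data basis for technical factor calculation.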

**Inputs**:
- {'data_provider': 'joinquant'}
- {'entity_provider': 'joinquant'}

**Components**:
- Stock
- StockTradeDay
- Stock1dHfqKdata
- run_data_recorder

**Parameters**:
```
{'force_update': False, 'day_data': True, 'sleeping_time': 0}
```

**Validation**:
```
Stock1dHfqKdata.query_data(entity_id="stock_sz_000001", limit=5) returns the close of the latest trading day.

```

## `KUC-DR-006`
**Source**: `examples/data_runner/kdata_runner.py`

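Main daily recording flow for the full A-share and Hong Kong markets: limit-up data, index bars, sector bars (concept/industry), backward-adjusted A-share bars, and Hong Kong (southbound Stock Connect) bars, plus an email notification for new sectors.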

**Inputs**:
- {'data_provider': 'em'}
- {'entity_provider': 'em'}
- {'sleeping_time': 0}

**Components**:
- LimitUpInfo
- Index
- Index1dKdata
- Block
- Block1dKdata
- Stock
- Stock1dHfqKdata
- Stockhk
- Stockhk1dHfqKdata
- get_entity_ids_by_filter
- EmailInformer
- run_data_recorder

**Parameters**:
```
{'ignore_delist': True, 'ignore_st': False, 'ignore_new_stock': False, 'return_unfinished': True, 'force_update': False}
```

**Validation**:
```
Stock1dHfqKdata.query_data(day_data=True) and LimitUpInfo.query_data() both contain records for the day, and the new-sector notification email is received.

```

## `KUC-DR-007`
**Source**: `examples/data_runner/kdata_runner.py`

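Collects the limit-up reasons for stocks that hit limit-up and counts recent hot limit-up themes (sorted by frequency), producing a theme ranking to support short-term trade review.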

**Inputs**:
- {'days_ago': '20 / 5'}
- {'limit': 15}

**Components**:
- LimitUpInfo
- get_hot_topics
- EmailInformer

**Parameters**:
```
{'reason_split_char': '+'}
```

**Validation**:
```
get_hot_topics(days_ago=5) returns a non-empty dict whose keys are theme names and whose values are occurrence counts.

```
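
The counting step described above (split each limit-up reason on `reason_split_char`, then rank themes by frequency) might look like the sketch below. `count_topics` is a hypothetical stand-in written for illustration; it does not reproduce the actual `get_hot_topics` implementation, which also queries `LimitUpInfo`.

```python
from collections import Counter


def count_topics(reasons, split_char="+", limit=15):
    """Count themes across reason strings such as 'AI+robotics'."""
    counter = Counter()
    for reason in reasons:
        for topic in reason.split(split_char):
            topic = topic.strip()
            if topic:                      # skip empty fragments like 'AI+'
                counter[topic] += 1
    # most_common sorts by count, highest first, and keeps at most `limit` themes
    return dict(counter.most_common(limit))
```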

## `KUC-DR-008`
**Source**: `examples/data_runner/kdata_runner.py`

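Collects news headlines for all A-shares, groups the stocks linked to each theme by configurable keywords, and distinguishes long-running hot themes from newly hot and fading ones, supporting thematic investment decisions.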

**Inputs**:
- {'hot_words_config': 'hot.json (theme: keyword list)'}
- {'days_ago': '20 / 5'}
- {'threshold': 3}

**Components**:
- StockNews
- run_data_recorder
- get_hot_topics
- group_stocks_by_topic
- EmailInformer

**Parameters**:
```
{'sleeping_time': 2, 'force_update': False}
```

**Validation**:
```
The report_hot_topics() email contains three sections, "一直热门" (consistently hot), "+++", and "---", and each is non-empty.

```

## `KUC-DR-009`
**Source**: `examples/data_runner/sina_data_runner.py`

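Collects basic information and money flow for A-share sectors (concept/industry) from Sina, giving a money-flow view that complements the Eastmoney data source.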

**Inputs**:
- {'data_provider': 'sina'}

**Components**:
- Block
- BlockMoneyFlow
- run_data_recorder

**Parameters**:
```
{'day_data': True}
```

**Validation**:
```
BlockMoneyFlow.query_data(provider="sina", limit=5) returns records that include the main_net_inflow field.

```

## `KUC-DR-010`
**Source**: `examples/data_runner/trading_runner.py`

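At 18:00 on each trading day, collects the Dragon and Tiger list (龙虎榜), picks out well-known hot-money trading seats with a high win rate over the past year, then filters for stocks in which such a seat took part within the last 30 days and whose turnover and turnover rate meet the thresholds that day, and emails them for manual follow-up.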

**Inputs**:
- {'data_provider': 'em'}
- {'entity_provider': 'em'}
- {'look_back_days': 400}
- {'recent_days': 30}
- {'dep_rate_threshold': 5}
- {'turnover_threshold': 300000000}
- {'turnover_rate_threshold': 0.02}

**Components**:
- DragonAndTiger
- Stock1dHfqKdata
- get_big_players
- EmailInformer
- run_data_recorder

**Parameters**:
```
{'sleeping_time': 2, 'day_data': True}
```

**Validation**:
```
DragonAndTiger.query_data(limit=5) has records for the day, and the email subject contains "report 龙虎榜".

```

## `KUC-FA-001`
**Source**: `examples/factors/boll_factor.py`

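Computes Bollinger Bands for individual A-share stocks and marks breakouts of the upper and lower bands, visualizing price against the bands to support mean-reversion and trend-following strategies.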

**Inputs**:
- {'entity_ids': ['stock_sz_000338', 'stock_sh_601318']}
- {'provider': 'em'}
- {'start_timestamp': '2019-01-01'}
- {'data_level': '"1d" / "30m"'}

**Components**:
- BollTransformer (custom Transformer using ta.volatility.BollingerBands)
- BollFactor (subclass of TechnicalFactor)
- Stock1dHfqKdata / Stock30mHfqKdata

**Parameters**:
```
{'window': 20, 'window_dev': 2, 'output_columns': ['bb_bbm', 'bb_bbh', 'bb_bbl', 'bb_bbhi', 'bb_bbli', 'bb_bbw', 'bb_bbp'], 'filter_result': 'bb_bbli - bb_bbhi (1 = price at lower band, -1 = price at upper band)'}
```

**Validation**:
```
factor.draw(show=True) opens a chart of price with the three Bollinger band lines; factor.result_df contains the three states True/False/None.

```
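
To show what the `window=20`, `window_dev=2` parameters compute, here is a standard-library sketch of the bands and a three-state signal. It does not call the `ta` library; the population standard deviation and the mapping (below the lower band gives True, above the upper band gives False, otherwise None) are assumptions chosen to match the parameter notes above, and the function names are hypothetical.

```python
import statistics


def bollinger(closes, window=20, window_dev=2):
    """Return one (middle, upper, lower) tuple per bar; None during warmup."""
    bands = []
    for i in range(len(closes)):
        if i + 1 < window:
            bands.append(None)                      # not enough history yet
            continue
        recent = closes[i + 1 - window : i + 1]     # the last `window` closes
        mid = statistics.fmean(recent)
        dev = statistics.pstdev(recent)             # population std (assumption)
        bands.append((mid, mid + window_dev * dev, mid - window_dev * dev))
    return bands


def band_signal(close, band):
    """True below the lower band, False above the upper band, else None."""
    if band is None:
        return None
    _, upper, lower = band
    if close < lower:
        return True
    if close > upper:
        return False
    return None
```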

## `KUC-FA-002`
**Source**: `examples/factors/fundamental_selector.py`

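Screens for "core assets" on several fundamental dimensions: high ROE, strong cash flow, low financial leverage, growth, and low receivables (receivables <= 30% of total current assets), providing a stock pool for long-term value investing.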

**Inputs**:
- {'start_timestamp': '2016-01-01'}
- {'end_timestamp': 'current date string'}
- {'codes': 'null (all A-shares)'}

**Components**:
- FundamentalSelector (subclass of TargetSelector)
- GoodCompanyFactor (uses FinanceFactor data)
- GoodCompanyFactor (uses BalanceSheet with an accounts_receivable filter)
- BalanceSheet

**Parameters**:
```
{'provider': 'eastmoney', 'col_period_threshold': 'null (the second factor sets no column period threshold)', 'accounts_receivable_max': '0.3 * total_current_assets'}
```

**Validation**:
```
selector.get_targets("2019-06-30") returns a non-empty entity_id list; a manual check finds typical high-ROE leaders (such as Kweichow Moutai and Gree Electric).

```

## `KUC-FA-003`
**Source**: `examples/factors/tech_factor.py`

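Combines a MACD golden cross / bullish trend, bullish moving-average alignment (5/120/250-day), and turnover and turnover-rate filters to identify stocks rallying on rising volume, giving entry signals for daily trend-following strategies.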

**Inputs**:
- {'entity_ids': 'mid- and large-cap stock pool (pre-filtered by get_middle_and_big_stock)'}
- {'start_timestamp': '2019-01-01'}
- {'adjust_type': 'AdjustType.hfq'}

**Components**:
- BullAndUpFactor (subclass of MacdFactor)
- CrossMaTransformer (windows=[5, 120, 250])
- MacdFactor

**Parameters**:
```
{'turnover_threshold': 400000000, 'turnover_rate_threshold': 0.02, 'ma_windows': [5, 120, 250]}
```

**Validation**:
```
factor.result_df["filter_result"] contains True entries; the report_bull() email lists the qualifying targets.

```

## `KUC-RP-001`
**Source**: `examples/reports/report_bull.py`

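At 18:00 on each trading day, automatically screens A-share stocks and sectors that meet the "bull stock" conditions (MACD golden cross + bullish trend + sufficient volume), emails them by category, and syncs them to an Eastmoney watchlist group to support daily stock picking.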

**Inputs**:
- {'target_date': 'obtained automatically via get_latest_kdata_date()'}
- {'entity_ids': 'get_middle_and_big_stock(timestamp)'}
- {'adjust_type': 'AdjustType.hfq (stocks) / AdjustType.qfq (sectors)'}

**Components**:
- BullAndUpFactor
- report_targets
- get_middle_and_big_stock
- EmailInformer

**Parameters**:
```
{'turnover_threshold': 300000000, 'turnover_rate_threshold': 0.02, 'start_timestamp': '2019-01-01', 'em_group': 'bull股票', 'em_group_over_write': False, 'filter_by_volume': False}
```

**Validation**:
```
The email subject contains "bull股票" and the body contains stock codes; the Eastmoney "bull股票" group has matching records.

```

## `KUC-RP-002`
**Source**: `examples/reports/report_core_compay.py`

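Every Saturday, screens core assets with the fundamental multi-factor model (FundamentalSelector), attaches changes in fund and QFII holding ratios, sends an email, and syncs the Eastmoney "core" watchlist group, providing weekly picks for long-term allocation.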

**Inputs**:
- {'start_timestamp': '2016-01-01'}
- {'end_timestamp': 'current date'}
- {'subscriber_emails': 'subscriber_emails.json file'}

**Components**:
- FundamentalSelector
- TargetSelector
- StockActorSummary
- get_entities
- add_to_eastmoney
- EmailInformer

**Parameters**:
```
{'actor_type': 'ActorType.raised_fund / ActorType.qfii', 'em_group': 'core'}
```

**Validation**:
```
The email contains the selection results (with institutional holding ratios); if there are none, it sends "no targets".

```

## `KUC-RP-003`
**Source**: `examples/reports/report_tops.py`

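At 17:00 each day, computes the strongest A-share stocks over the short term (largest recent gains) and the medium term; at 17:30, computes the strongest industry and concept sectors (ranked by N-day gain); it also pushes the short- and medium-term strongest Hong Kong southbound stocks, supporting sector-rotation and momentum strategies.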

**Inputs**:
- {'periods': 'short term=[last N days], medium term=[30,50]'}
- {'top_count': '10 (sectors)'}
- {'turnover_threshold': '100000000 (Hong Kong stocks)'}

**Components**:
- get_top_stocks
- report_top_entities
- get_top_performance_entities_by_periods
- Block
- BlockCategory
- inform
- EmailInformer

**Parameters**:
```
{'return_type': 'TopType.positive', 'ignore_new_stock': 'false / true', 'adjust_type': 'AdjustType.hfq / null', 'em_group_over_write': True}
```

**Validation**:
```
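The emails carry the subjects "短期最强" (short-term strongest), "中期最强" (medium-term strongest), "最强行业" (strongest industries), and "最强概念" (strongest concepts), each listing top_count targets.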

```

## `KUC-RP-004`
**Source**: `examples/reports/report_vol_up.py`

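Screens A-shares (split into large-cap and small-cap groups) and Hong Kong stocks that break above the half-year or one-year moving average on rising volume, identifying moving-average breakout patterns to support medium-term trend entries.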

**Inputs**:
- {'windows': [120, 250]}
- {'up_intervals': 60}
- {'over_mode': 'or'}
- {'turnover_threshold': '100000000 (Hong Kong stocks)'}

**Components**:
- VolumeUpMaFactor
- get_top_stocks (return_type="small_vol_up" / "big_vol_up")
- report_targets
- inform
- EmailInformer

**Parameters**:
```
{'adjust_type': 'AdjustType.hfq', 'start_timestamp': '2021-01-01', 'filter_by_volume': False}
```

**Validation**:
```
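The email title contains "放量突破(半)年线" (volume breakout above the half-year or one-year line), and targets are sent in two emails by cap size.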

```

## `KUC-RP-005`
**Source**: `examples/reports/__init__.py`

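Identifies financially risky stocks: falling revenue or profit, low current or quick ratio, high receivables + high inventory + high goodwill, or receivables above half of net profit; used to avoid high-risk names or to screen short candidates.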

**Inputs**:
- {'the_date': 'current date (default)'}
- {'income_yoy': '-0.1 (year-over-year decline threshold)'}
- {'profit_yoy': -0.1}
- {'entity_ids': 'null (all A-shares)'}

**Components**:
- FinanceFactor
- BalanceSheet
- IncomeStatement
- risky_company (custom function)

**Parameters**:
```
{'current_ratio_min': 0.7, 'quick_ratio_min': 0.5, 'start_offset_days': 130}
```

**Validation**:
```
risky_company() returns a list of high-risk stock codes; a manual check finds known cases of financial blow-ups.

```

## `KUC-RS-001`
**Source**: `examples/research/dragon_and_tiger.py`

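Analyzes Dragon and Tiger list history to find the hot-money seats (big players) with the highest win rate over the past year (~400 days) and computes each seat's historical win rate over holding periods of 3/5/10 days, giving a statistical basis for seat-following strategies.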

**Inputs**:
- {'provider': 'em'}
- {'start_timestamp': 'date_time_by_interval(end_timestamp, -400)'}
- {'end_timestamp': 'date_time_by_interval(current_date(), -60)'}
- {'intervals': [3, 5, 10]}

**Components**:
- DragonAndTiger
- get_big_players
- get_player_success_rate

**Validation**:
```
get_player_success_rate() returns a DataFrame of seat names with win rates for several holding periods, in which well-known hot-money seats appear (such as "国泰君安证券股份有限公司上海江苏路证券营业部").

```

## `KUC-RS-002`
**Source**: `examples/research/top_dragon_tiger.py`

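For the top 30 gainers of each month, traces their Dragon and Tiger list records during the month of the gain and counts which seats repeatedly take part in monthly strong stocks, revealing "smart money" behavior patterns.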

**Inputs**:
- {'data_provider': 'em'}
- {'start_timestamp': '2021-01-01'}
- {'end_timestamp': '2022-01-01'}

**Components**:
- get_top_performance_by_month
- get_players
- DragonTigerFactor (subclass of TechnicalFactor, overlays seat annotations)

**Parameters**:
```
{'direction': 'in', 'top_count_per_month': 30}
```

**Validation**:
```
top_dragon_and_tiger() returns the merged player_df, sorted by the (entity_id, timestamp) index, in which well-known seats recur.

```

## `KUC-RS-003`
**Source**: `examples/research/top_tags.py`

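Tabulates the market-cap distribution of each month's top 30 gainers to test the small-cap effect hypothesis, recording the market cap and score of each monthly strong stock at that point in time as empirical support for setting selection rules.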

**Inputs**:
- {'data_provider': 'em'}
- {'start_timestamp': '2020-01-01'}
- {'end_timestamp': '2021-01-01'}

**Components**:
- get_top_performance_by_month
- Stock1dHfqKdata
- top_tags (custom function)

**Parameters**:
```
{'list_days': 250}
```

**Validation**:
```
top_tags() returns a list of records containing {entity_id, timestamp, cap, score}; the analysis tests the hypothesis that 90% of the market caps fall below 10 billion yuan.

```

## `KUC-ML-001`
**Source**: `examples/ml/sgd.py`

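Uses an SGD classifier on MA features to predict an A-share stock's next-period price behavior (up/down/sideways), training and predicting through a standardization pipeline and plotting predictions against the actual candlesticks.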

**Inputs**:
- {'data_provider': 'em'}
- {'entity_ids': ['stock_sz_000001']}
- {'label_method': 'behavior_cls'}

**Components**:
- MaStockMLMachine
- SGDClassifier (sklearn)
- StandardScaler
- make_pipeline

**Parameters**:
```
{'max_iter': 1000, 'tol': '1e-3'}
```

**Validation**:
```
machine.draw_result(entity_id="stock_sz_000001") shows the prediction chart; accuracy can be assessed from the DataFrame returned by machine.predict().

```

## `KUC-ML-002`
**Source**: `examples/ml/sgd.py`

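Uses an SGD regressor on MA features to predict an A-share stock's next-period return directly (a continuous value), for comparison with the classification mode and to assess the predictive power of a linear model.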

**Inputs**:
- {'data_provider': 'em'}
- {'entity_ids': ['stock_sz_000001']}
- {'label_method': 'raw'}

**Components**:
- MaStockMLMachine
- SGDRegressor (sklearn)
- StandardScaler
- make_pipeline

**Parameters**:
```
{'max_iter': 1000, 'tol': '1e-3'}
```

**Validation**:
```
machine.draw_result(entity_id="stock_sz_000001") shows the regression prediction line; prediction error is assessed with MSE/MAE.

```

## `KUC-IN-001`
**Source**: `examples/intent/intent.py`

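Compares the relative performance of the Shanghai Composite and the S&P 500 (`indexus_us_SPX`) since 2000, normalized to a common base, giving a direct view of the long-run correlation and divergence between the Chinese and US stock markets to support macro timing.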

**Inputs**:
- {'entity_ids': ['index_sh_000001', 'indexus_us_SPX']}
- {'start_timestamp': '2000-01-01'}
- {'scale_value': 100}

**Components**:
- Index
- Indexus
- Index1dKdata
- Indexus1dKdata
- compare

**Validation**:
```
compare() opens a line chart with the two series overlaid; the Y axis is the normalized value (base = 100).

```
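
The common-base normalization with `scale_value=100` amounts to dividing each series by its first value and multiplying by the scale. A minimal sketch follows; `rebase` is a hypothetical helper for illustration, not part of the `compare` API.

```python
def rebase(series, scale_value=100.0):
    """Rescale a price series so that its first observation equals scale_value."""
    first = series[0]
    return [value / first * scale_value for value in series]
```

After rebasing, two indices with very different price levels can be drawn on one axis and read as percentage moves from the start date.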

## `KUC-IN-002`
**Source**: `examples/intent/intent.py`

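Compares US Treasury yields (2-year/5-year) with the S&P 500 (`indexus_us_SPX`) over history to test the hypothesis that high rates weigh on equities, supporting asset allocation across Federal Reserve policy cycles.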

**Inputs**:
- {'entity_ids': ['country_galaxy_US', 'indexus_us_SPX']}
- {'start_timestamp': '1990-01-01'}

**Components**:
- TreasuryYield
- Indexus1dKdata
- compare

**Parameters**:
```
{'scale_value': None, 'schema_map_columns': {'TreasuryYield': ['yield_2', 'yield_5'], 'Indexus1dKdata': ['close']}}
```

**Validation**:
```
compare() shows a multi-series line chart in which the inverse relationship between rates and the index is visible.

```

## `KUC-IN-003`
**Source**: `examples/intent/intent.py`

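Compares Jiangxi Copper stock with Shanghai copper futures (normalized) to test the hypothesis that resource stocks track commodity prices and to spot divergences between the stock and the commodity price.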

**Inputs**:
- {'entity_ids': ['stock_sh_600362', 'future_shfe_CU']}
- {'start_timestamp': '2005-01-01'}
- {'scale_value': 100}

**Components**:
- compare

**Validation**:
```
compare() shows two normalized lines in which the copper stock and copper futures move in close correlation.

```

## `KUC-IN-004`
**Source**: `examples/intent/intent.py`

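Compares the price trends of three industrial metals (copper, aluminum, and rebar) to identify rotation and divergence among them, as a reference for cross-commodity arbitrage.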

**Inputs**:
- {'entity_ids': ['future_shfe_CU', 'future_shfe_AL', 'future_shfe_RB']}
- {'start_timestamp': '2009-04-01'}
- {'scale_value': 100}

**Components**:
- compare

**Validation**:
```
compare() shows three normalized lines in which periods of divergence between the metals are visible.

```

## `KUC-IN-005`
**Source**: `examples/intent/intent.py`

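Compares the Nasdaq, the S&P 500, and the US dollar index (after 2015) to study how dollar strength weighs on US equities, supporting overseas asset allocation.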

**Inputs**:
- {'entity_ids': ['indexus_us_NDX', 'indexus_us_SPX', 'indexus_us_UDI']}
- {'start_timestamp': '2015-01-01'}
- {'scale_value': 100}

**Components**:
- Indexus1dKdata
- compare

**Parameters**:
```
{'schema_map_columns': {'Indexus1dKdata': ['close']}}
```

**Validation**:
```
compare() shows three lines for analyzing the divergence between the Nasdaq and the dollar index.

```

## `KUC-IN-006`
**Source**: `examples/intent/intent.py`

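Compares the USD/CNY exchange rate with the Shanghai Composite to study how depreciation affects A-share fund flows, supporting assessment of foreign-outflow risk.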

**Inputs**:
- {'entity_ids': ['index_sh_000001', 'currency_forex_USDCNYC']}
- {'start_timestamp': '2005-01-01'}
- {'scale_value': 100}

**Components**:
- Currency1dKdata
- Index1dKdata
- compare

**Parameters**:
```
{'schema_map_columns': {'Currency1dKdata': ['close'], 'Index1dKdata': ['close']}}
```

**Validation**:
```
compare() shows two lines in which the exchange rate and the index are inversely correlated in some periods.

```

## `KUC-TR-001`
**Source**: `examples/trader/ma_trader.py`

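The simplest MA crossover backtest: buy when the 5-day line crosses above the 10-day line and sell when it crosses below, testing the historical return of the dual-MA strategy on one or more A-share stocks as the most basic trend-following baseline.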

**Inputs**:
- {'codes': ['000338']}
- {'level': 'IntervalLevel.LEVEL_1DAY'}
- {'start_timestamp': '2019-01-01'}
- {'end_timestamp': '2019-06-30'}
- {'windows': [5, 10]}

**Components**:
- CrossMaFactor
- MyMaTrader (subclass of StockTrader)

**Parameters**:
```
{'need_persist': False, 'trader_name': '000338_ma_trader'}
```

**Validation**:
```
After trader.run() finishes, the zvt database holds the trade records for trader_name; trader.draw_result() shows the net value curve.

```
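
The 5/10 crossover rule itself can be sketched without the framework. The code below is a standard-library illustration with hypothetical function names; it is not `CrossMaFactor`. It uses True for a cross above and False for a cross below, and leaves warmup bars as None.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def cross_signals(closes, short=5, long=10):
    """True when the short MA crosses above the long MA, False when below."""
    s, l = sma(closes, short), sma(closes, long)
    signals = [None] * len(closes)
    for i in range(1, len(closes)):
        if any(x is None for x in (s[i - 1], l[i - 1], s[i], l[i])):
            continue                                   # still in the warmup period
        if s[i - 1] <= l[i - 1] and s[i] > l[i]:
            signals[i] = True                          # golden cross
        elif s[i - 1] >= l[i - 1] and s[i] < l[i]:
            signals[i] = False                         # dead cross
    return signals
```

A signal computed on bar i uses that bar's close, so under `SL-02` it would be executed on the next bar.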

## `KUC-TR-002`
**Source**: `examples/trader/ma_trader.py`

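Backtest of a combined strategy, a MACD bull-market filter plus MA crossover: hold long positions only when the major trend is up (BullFactor), reducing the risk of going long in a bear market and testing the improvement from adding a trend filter.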

**Inputs**:
- {'codes': ['000338']}
- {'level': 'IntervalLevel.LEVEL_1DAY'}
- {'start_timestamp': '2019-01-01'}
- {'adjust_type': 'hfq'}

**Components**:
- BullFactor
- MyBullTrader (subclass of StockTrader)

**Validation**:
```
trader.run() finishes, and the net value curve shows a smaller drawdown than the pure MA strategy.

```

## `KUC-TR-003`
**Source**: `examples/trader/macd_day_trader.py`

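A complete framework example of a daily MACD golden-cross strategy: shows how to override lifecycle methods in StockTrader, including take-profit/stop-loss (on_profit_control), open/close hooks (on_trading_open/close), and batch handling of trading signals (on_trading_signals).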

**Inputs**:
- {'start_timestamp': '2019-01-01'}
- {'end_timestamp': '2020-01-01'}
- {'provider': 'joinquant'}
- {'level': 'IntervalLevel.LEVEL_1DAY'}

**Components**:
- GoldCrossFactor
- MacdDayTrader (subclass of StockTrader)

**Parameters**:
```
{'start_offset_days': -50}
```

**Validation**:
```
trader.run() raises no error; each hook method is called correctly (verify by adding logging in the overrides).

```

## `KUC-TR-004`
**Source**: `examples/trader/macd_week_and_day_trader.py`

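Dual-timeframe (weekly + daily) MACD strategy: open a long position only when the weekly and daily charts both show a golden cross, reducing false signals and testing the signal-quality gain from multi-timeframe confirmation.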

**Inputs**:
- {'start_timestamp': '2019-01-01'}
- {'end_timestamp': '2020-01-01'}
- {'provider': 'joinquant'}

**Components**:
- GoldCrossFactor (LEVEL_1WEEK)
- GoldCrossFactor (LEVEL_1DAY)
- MultipleLevelTrader (subclass of StockTrader)

**Parameters**:
```
{'on_targets_selected_from_levels': 'override to customize the multi-level merge logic'}
```

**Validation**:
```
trader.run() finishes; compared with the pure daily strategy, there are fewer trades but a higher win rate.

```

## `KUC-TR-005`
**Source**: `examples/trader/dragon_and_tiger_trader.py`

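Follows institution-only seats on the Dragon and Tiger list: a buy signal is generated when a "机构专用" (institution-only) seat appears on the list for a stock, testing the historical effectiveness of the follow-the-institutions strategy.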

**Inputs**:
- {'start_timestamp': '2020-01-01'}
- {'end_timestamp': '2022-05-01'}
- {'provider': 'em'}

**Components**:
- DragonTigerFactor (inherits from Factor; data source is DragonAndTiger)
- MyTrader (inherits from StockTrader)

**Parameters**:
```
{'filter': 'DragonAndTiger.dep1 == "机构专用"'}
```

**Validation**:
```
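trader.run() completes, and the opening times in the trade records match the dates on which institutions appeared on the Dragon and Tiger list.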
```

## `KUC-TR-006`
**Source**: `examples/trader/follow_ii_trader.py`

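Goes long or closes positions by following changes in public mutual fund holdings: buy when a quarterly report shows the fund holding ratio increased by more than 5%, and sell when holdings are reduced by more than 50%. This tests the "follow changes in heavy fund holdings" logic on a single stock (Kweichow Moutai, 600519).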

**Inputs**:
- {'code': '600519'}
- {'start_timestamp': '2002-01-01'}
- {'end_timestamp': '2021-01-01'}
- {'adjust_type': 'AdjustType.qfq'}

**Components**:
- FollowIITrader (inherits from StockTrader, overrides on_time)
- StockActorSummary
- Stock1dKdata

**Parameters**:
```
{'actor_type': 'ActorType.raised_fund', 'long_threshold': 0.05, 'short_threshold': -0.5, 'profit_threshold': None}
```

**Validation**:
```
trader.run() completes; the plotted net value curve should show a markedly positive return during Moutai's 2010-2021 rally.
```
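
The two thresholds above map naturally onto a small decision function. The sketch below is hypothetical (the name `holding_change_action` and the strict comparisons are assumptions); it only shows how `long_threshold` and `short_threshold` might be applied to a quarterly change ratio.

```python
from typing import Optional


def holding_change_action(change_ratio: float,
                          long_threshold: float = 0.05,
                          short_threshold: float = -0.5) -> Optional[str]:
    """Map a change in fund holdings to an action.

    change_ratio is the relative change reported for the quarter,
    e.g. 0.08 for +8% and -0.6 for -60%.
    """
    if change_ratio > long_threshold:
        return "buy"
    if change_ratio < short_threshold:
        return "sell"
    return None  # inside the dead band: keep the current position


actions = [holding_change_action(x) for x in (0.08, 0.01, -0.6)]
```

The wide gap between the two thresholds acts as a dead band, so small quarterly fluctuations leave the position unchanged.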

## `KUC-TR-007`
**Source**: `examples/trader/keep_run_trader.py`

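Rolling 40-day multi-factor portfolio strategy: every 40 days the stock pool is recomputed (the intersection of the top 30% by trading volume and the top 30% by institutional holdings), combined with a weekly bull-market check and a daily golden-cross entry, to test how much dynamic stock-pool screening adds to the combined strategy.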

**Inputs**:
- {'start': '2019-01-01'}
- {'end': '2021-01-01'}
- {'interval': 40}
- {'vol_pct': 0.3}
- {'ii_pct': 0.3}

**Components**:
- MultipleLevelTrader (BullFactor LEVEL_1WEEK + GoldCrossFactor LEVEL_1DAY)
- get_top_volume_entities
- get_top_fund_holding_stocks
- split_time_interval
- clear_trader

**Parameters**:
```
{'keep_history': True, 'draw_result': False, 'rich_mode': False, 'trader_name': 'keep_run_trader'}
```

**Validation**:
```
After all time intervals have been processed, the zvt database contains the complete keep_run_trader trade history.
```
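
The rolling-window and pool-intersection steps can be sketched with the standard library. The helpers `split_windows` and `top_fraction` are hypothetical stand-ins, not the signatures of zvt's `split_time_interval`, `get_top_volume_entities`, or `get_top_fund_holding_stocks`, and the scores are made-up values.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


def split_windows(start: date, end: date,
                  interval_days: int = 40) -> List[Tuple[date, date]]:
    """Cut [start, end) into consecutive windows of at most interval_days."""
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=interval_days), end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def top_fraction(scores: Dict[str, float], pct: float) -> Set[str]:
    """Codes ranked in the top `pct` share by score (at least one code)."""
    keep = max(1, round(len(scores) * pct))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


windows = split_windows(date(2019, 1, 1), date(2021, 1, 1), 40)

# Made-up scores for ten codes; the pool is the intersection of both top 30%.
volume = {"A": 10, "B": 9, "C": 8, "D": 7, "E": 6,
          "F": 5, "G": 4, "H": 3, "I": 2, "J": 1}
fund_holding = {"A": 1, "B": 10, "C": 9, "D": 8, "E": 2,
                "F": 3, "G": 4, "H": 5, "I": 6, "J": 7}
pool = top_fraction(volume, 0.3) & top_fraction(fund_holding, 0.3)
```

In a run like the one described, the pool would be recomputed at the start of each window before the trader is executed for that window.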

## `KUC-QU-001`
**Source**: `examples/query_snippet.py`

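Demonstrates querying the JSON field of StockTags: the SQLite JSON_EXTRACT function filters A-share stocks precisely by sub-tag (for example "低空经济", low-altitude economy), which addresses efficient retrieval when tags are stored as multi-valued fields.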

**Inputs**:
- {'sub_tag': '低空经济'}

**Components**:
- StockTags
- func.json_extract (SQLAlchemy)

**Parameters**:
```
{'json_path': '$."{tag}"'}
```

**Validation**:
```
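query_json() returns a DataFrame containing the matching sub-tag; the row count is close to the number of constituents in Eastmoney's "低空经济" (low-altitude economy) sector.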
```
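
The same JSON_EXTRACT filter can be tried directly against SQLite with the standard library. This sketch assumes a hypothetical table `stock_tags` with a JSON text column `sub_tags`; the real StockTags schema and the `query_json()` helper are not reproduced, and the tag reasons are made-up values. It also assumes the local SQLite build includes the JSON functions.

```python
import json
import sqlite3

# Hypothetical layout: one JSON object of sub-tags per stock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_tags (code TEXT PRIMARY KEY, sub_tags TEXT)")
conn.executemany(
    "INSERT INTO stock_tags VALUES (?, ?)",
    [
        ("000001", json.dumps({"低空经济": "drone supplier"}, ensure_ascii=False)),
        ("000002", json.dumps({"新能源": "battery maker"}, ensure_ascii=False)),
        ("000003", json.dumps({"低空经济": "avionics", "新能源": "storage"},
                              ensure_ascii=False)),
    ],
)


def codes_with_sub_tag(connection: sqlite3.Connection, tag: str) -> list:
    """Return codes whose sub_tags object contains the key `tag`."""
    json_path = f'$."{tag}"'  # quoted key, matching the json_path parameter
    cursor = connection.execute(
        "SELECT code FROM stock_tags "
        "WHERE json_extract(sub_tags, ?) IS NOT NULL ORDER BY code",
        (json_path,),
    )
    return [row[0] for row in cursor]


matches = codes_with_sub_tag(conn, "低空经济")
```

Quoting the key in the path keeps the lookup exact even when a tag name contains characters such as dots or spaces.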

## `KUC-QU-002`
**Source**: `examples/query_snippet.py`

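Quickly finds coverage gaps in the current tag library: lists the codes of listed A-share stocks that have no tags yet, giving tag maintenance an automatic diff.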

**Inputs**:
- {'provider': 'em'}
- {'ignore_delist': True}
- {'ignore_st': True}

**Components**:
- StockTags
- get_entity_ids_by_filter

**Validation**:
```
get_stocks_without_tag() returns a non-empty list; every code can be found on Eastmoney and has no tag record.
```

## `KUC-QU-003`
**Source**: `examples/tag_utils.py`

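Automatically generates default industry tags for a batch of A-share stocks (via an industry-block-to-theme mapping table), addressing the cold-start problem of missing tags when many new stocks are added at once.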

**Inputs**:
- {'codes': 'list of stock codes without tags'}
- {'provider': 'em'}

**Components**:
- BlockStock
- Block
- industry_to_tag (industry-to-theme mapping function)
- build_default_tags

**Parameters**:
```
{'block_category': 'industry'}
```

**Validation**:
```
build_default_tags(codes) returns a list of dicts with the fields code/name/tag/reason; a warning is printed automatically for stocks without industry information.
```

## `KUC-QU-004`
**Source**: `examples/utils.py`

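Groups the selected stocks by news headline according to a hot-word configuration (hot.json), assigning them to themes (such as "华为" (Huawei) or "新能源" (new energy)) so that a human reviewer can quickly see the hot themes behind the selection.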

**Inputs**:
- {'entities': 'list of selected stocks'}
- {'hot_words_config': 'hot.json (theme: list of keywords)'}
- {'days_ago': 60}
- {'threshold': 3}

**Components**:
- StockNews
- group_stocks_by_topic
- msg_group_stocks_by_topic

**Parameters**:
```
{'entity_ids': 'extracted automatically from entities'}
```

**Validation**:
```
msg_group_stocks_by_topic() returns a string grouped by theme, containing headers of the form "^^^^^^ theme(N) ^^^^^^".
```
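
The keyword-matching idea can be sketched as follows. Everything here is an assumption made for illustration: `group_by_topic` is a hypothetical helper, the headlines are made up, and `threshold` is interpreted as the minimum number of matching headlines, which may not be how zvt's `group_stocks_by_topic` uses it.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_topic(news_titles: Dict[str, List[str]],
                   hot_words: Dict[str, List[str]],
                   threshold: int = 3) -> Dict[str, List[str]]:
    """Assign each code to every topic matched by >= threshold headlines.

    news_titles maps code -> recent headlines; hot_words maps
    topic -> keywords (the shape described for hot.json).
    """
    groups = defaultdict(list)
    for code, titles in news_titles.items():
        for topic, keywords in hot_words.items():
            hits = sum(
                1 for title in titles
                if any(keyword.lower() in title.lower() for keyword in keywords)
            )
            if hits >= threshold:
                groups[topic].append(code)
    return dict(groups)


hot_words = {"new energy": ["solar", "battery"]}
news_titles = {
    "000001": ["Solar output rises", "New battery plant", "Battery recall",
               "Quarterly results"],
    "000002": ["Solar deal signed", "CEO resigns"],
}
groups = group_by_topic(news_titles, hot_words, threshold=3)
```

A threshold above one filters out stocks that are mentioned alongside a theme only in passing.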

## `KUC-QU-005`
**Source**: `examples/migration.py`

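Demonstrates how to register a custom data schema with zvt using Pydantic and the SQLAlchemy Mixin, so that business teams can extend the local database with custom entities and JSON fields without modifying the framework core.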

**Inputs**:
- {'custom_schema': 'User (with added_col: String, json_col: JSON)'}
- {'db_name': 'test'}
- {'providers': ['zvt']}

**Components**:
- Mixin
- register_schema
- get_db_session
- UserModel (Pydantic BaseModel)

**Parameters**:
```
{'declarative_base': 'ZvtInfoBase'}
```

**Validation**:
```
UserModel.validate(user) raises no errors, which shows that SQLAlchemy ORM objects convert seamlessly to Pydantic models.
```

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **23**

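## `CW-BT-001`: Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies over multiple data sources in one cerebro.run() call. zvt's StockTrader currently binds only one set of factors per instance and has no unified layer for orchestrating multiple strategies; borrowing the Cerebro pattern would let users combine several Trader instances in one runner and compare them.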

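## `CW-BT-002`: Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without any change to strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users obtain a risk-adjusted return report with a single cerebro.addanalyzer(SharpeRatio) call.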

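## `CW-BT-003`: Separating position sizing with Sizer
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares, or what fraction of capital, to buy on each entry) into a separate Sizer that is fully decoupled from signal logic; it includes built-ins such as FixedSize and PercentSizer, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.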
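
A Sizer-style interface could look roughly like the sketch below. It is neither backtrader's API nor anything zvt provides: the class names `FixedShares` and `PercentOfCash`, the `size` method, and the 100-share lot are all assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod


class Sizer(ABC):
    """Decides how many shares to buy, independent of signal logic."""

    @abstractmethod
    def size(self, cash: float, price: float) -> int:
        """Return the number of shares to buy with the given cash."""


class FixedShares(Sizer):
    def __init__(self, shares: int):
        self.shares = shares

    def size(self, cash: float, price: float) -> int:
        # Buy the fixed amount only if it is affordable.
        return self.shares if self.shares * price <= cash else 0


class PercentOfCash(Sizer):
    def __init__(self, percent: float, lot: int = 100):
        self.percent = percent
        self.lot = lot  # assumed round-lot size

    def size(self, cash: float, price: float) -> int:
        budget = cash * self.percent
        lots = int(budget // (price * self.lot))
        return lots * self.lot


fixed = FixedShares(500).size(cash=10_000, price=10.0)
percent = PercentOfCash(0.2).size(cash=100_000, price=15.0)
```

Because a strategy would only call `size`, the same signal logic can be paired with different sizing rules without modification.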

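## `CW-BT-004`: Full set of Order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack limit orders and combined-order simulation; for high-frequency or live-trading scenarios, fuller order-type support would make backtests considerably more realistic.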

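## `CW-BT-005`: Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently implements multiple timeframes by pre-recording K-line data at each level and lacks runtime resampling; borrowing this pattern would support backtests over any combination of time granularities without recording the data again.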
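
Runtime resampling amounts to folding lower-level bars into higher-level ones. The sketch below aggregates minute bars into daily OHLCV with the standard library; `resample_to_daily` and the tuple layout are hypothetical, the prices are made up, and neither backtrader's nor zvt's data structures are used.

```python
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

Bar = Tuple[datetime, float, float, float, float, int]  # ts, o, h, l, c, volume


def resample_to_daily(bars: Iterable[Bar]) -> Dict[date, dict]:
    """Aggregate time-sorted intraday bars into one OHLCV bar per day."""
    daily: Dict[date, dict] = {}
    for ts, open_, high, low, close, volume in bars:
        day = ts.date()
        if day not in daily:
            # The first bar of the day fixes the open.
            daily[day] = {"open": open_, "high": high, "low": low,
                          "close": close, "volume": volume}
        else:
            bar = daily[day]
            bar["high"] = max(bar["high"], high)
            bar["low"] = min(bar["low"], low)
            bar["close"] = close  # the last bar seen fixes the close
            bar["volume"] += volume
    return daily


minute_bars = [
    (datetime(2019, 1, 2, 9, 31), 10.0, 10.2, 9.9, 10.1, 100),
    (datetime(2019, 1, 2, 9, 32), 10.1, 10.5, 10.0, 10.4, 200),
    (datetime(2019, 1, 3, 9, 31), 10.4, 10.6, 10.3, 10.5, 150),
]
daily_bars = resample_to_daily(minute_bars)
```

Since the daily bar is updated after every minute bar, the same loop also shows the replay idea: the intermediate state is the partially formed bar.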

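## `CW-VN-001`: Event-driven engine (EventEngine)
**From**: vnpy · **Applicable to**: live-trading

At the core of vnpy is an asynchronous event bus (EventEngine): market data pushes, order updates, trade notifications, and similar messages flow between Apps and Gateways as events, which naturally lets live trading and backtesting share the same code. zvt's data flow is currently synchronous batch fetching with no event-driven architecture; when connecting to live market data pushes (such as a WebSocket tick stream), an event-driven model could substantially reduce latency.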
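
The publish/subscribe core of such an engine can be sketched in a few lines. This is a deliberately simplified, synchronous illustration: `EventBus`, `register`, and `publish` are hypothetical names, and the queue-and-thread dispatch of a real asynchronous engine is omitted.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class EventBus:
    """Routes events to handlers registered for their type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def register(self, event_type: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Any) -> None:
        # Handlers run in registration order; unknown types are ignored.
        for handler in list(self._handlers.get(event_type, [])):
            handler(payload)


received = []
bus = EventBus()
bus.register("tick", received.append)
bus.register("tick", lambda tick: received.append(tick["price"]))
bus.publish("tick", {"code": "000338", "price": 12.3})
bus.publish("order", {"id": 1})  # no handler registered, so nothing happens
```

A backtest and a live feed could both call `publish`, which is what allows the strategy handlers to stay identical in the two modes.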

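## `CW-VN-002`: Gateway, a unified interface abstraction across exchanges
**From**: vnpy · **Applicable to**: live-trading

vnpy's Gateway layer wraps dozens of trading interfaces, including CTP futures, XTP securities, and IB, behind one interface; the strategy layer calls only the generic buy/sell/cancel methods and does not need to know the underlying protocol differences. zvt's data recording currently depends on a specific provider (em/joinquant), and there is no unified Gateway for live trading; introducing a Gateway abstraction would let zvt's factor and stock-selection logic connect directly to live order placement.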

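## `CW-VN-003`: Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy net value curve, maximum drawdown, daily P&L, and trade details, with no Jupyter Notebook required. zvt currently visualizes backtest results by calling Plotly through the draw_result method and has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.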

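## `CW-VN-004`: vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow covering data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and no standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.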

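## `CW-VN-005`: Spread trading module
**From**: vnpy · **Applicable to**: live-trading

vnpy supports user-defined spreads (such as futures calendar-spread arbitrage or A-share/H-share premium arbitrage), computes spread quotes in real time, and automatically triggers spread-strategy orders. zvt's compare() currently only produces a visual comparison and lacks spread signal calculation and trade execution; borrowing the spread module would extend zvt to statistical arbitrage scenarios (such as the AH premium, or arbitrage between an index and its constituents).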

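## `CW-QL-001`: Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), which eliminates look-ahead bias in backtests. zvt currently uses the reporting period as the timestamp for financial data and has no "publication date" dimension, so stock selection can be biased by financial reports that were not yet published; introducing the PIT pattern would make backtests considerably more trustworthy.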
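
The point-in-time rule reduces to filtering on a publication date instead of the reporting period. The sketch below uses made-up records with a hypothetical `publish_date` field and a hypothetical helper `latest_known_report`; it does not reflect qlib's provider API or zvt's schema.

```python
from datetime import date
from typing import List, Optional

# Made-up reports: the year-end report is published months after period end.
reports = [
    {"code": "000001", "report_period": date(2019, 9, 30),
     "publish_date": date(2019, 10, 16), "roe": 0.25},
    {"code": "000001", "report_period": date(2019, 12, 31),
     "publish_date": date(2020, 4, 22), "roe": 0.30},
]


def latest_known_report(records: List[dict], code: str,
                        as_of: date) -> Optional[dict]:
    """Latest report for `code` that had been published by `as_of`."""
    known = [
        r for r in records
        if r["code"] == code and r["publish_date"] <= as_of
    ]
    return max(known, key=lambda r: r["report_period"], default=None)


# In mid-January 2020 the year-end report exists but is not yet public.
january_view = latest_known_report(reports, "000001", date(2020, 1, 15))
may_view = latest_known_report(reports, "000001", date(2020, 5, 1))
```

Filtering on `report_period` alone would have returned the year-end figures in January, which is exactly the look-ahead bias described above.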

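## `CW-QL-002`: Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment and Recorder, which automatically record the hyperparameters, features, metrics, and predictions of every model training run and support cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of each factor experiment, supporting quick reproduction and version comparison.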

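## `CW-QL-003`: Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and run in combination, enabling intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently assume daily-level fills only and do not model execution algorithms; borrowing the nested framework would let zvt separate two questions: "which stocks to hold and when" and "how to build the position with minimal market impact".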
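
The execution layer's job can be illustrated with the simplest TWAP-style split: one daily target order is divided into equal child orders. `twap_slices` is a hypothetical helper, the 100-share lot is an assumption, and any remainder smaller than one lot is dropped in this sketch.

```python
from typing import List


def twap_slices(total_shares: int, n_slices: int, lot: int = 100) -> List[int]:
    """Split an order into n_slices child orders of whole lots.

    Lots are spread as evenly as possible; earlier slices receive
    the extra lots when the division is not exact.
    """
    lots = total_shares // lot
    base, extra = divmod(lots, n_slices)
    return [(base + (1 if i < extra else 0)) * lot for i in range(n_slices)]


# The decision layer asks for 1,000 shares; execution spreads it over 4 slots.
child_orders = twap_slices(1000, 4)
```

The daily decision layer only produces the 1,000-share target, while a function like this one belongs to the nested intraday layer.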

FILE:references/components/api.md
# api (2 classes)

## `WindowMethod`
`api/stats.py:28`

## `TopType`
`api/stats.py:34`

FILE:references/components/broker.md
# broker (6 classes)

## `QmtContext`
`broker/qmt/context.py:8`

## `TraderError`
`broker/qmt/errors.py:2`
> Base class for exceptions in this module.

## `QmtError`
`broker/qmt/errors.py:8`

## `PositionOverflowError`
`broker/qmt/errors.py:13`

## `MyXtQuantTraderCallback`
`broker/qmt/qmt_account.py:26`

## `QmtStockAccount`
`broker/qmt/qmt_account.py:107`

FILE:references/components/common.md
# common (9 classes)

## `OrderByType`
`common/query_models.py:9`

## `TimeUnit`
`common/query_models.py:14`

## `AbsoluteTimeRange`
`common/query_models.py:23`

## `RelativeTimeRage`
`common/query_models.py:28`

## `TimeRange`
`common/query_models.py:33`

## `PositionType`
`common/trading_models.py:8`

## `BuyParameter`
`common/trading_models.py:17`

## `SellParameter`
`common/trading_models.py:25`

## `TradingResult`
`common/trading_models.py:30`

FILE:references/components/contract.md
# contract (53 classes)

## `IntervalLevel`
`contract/__init__.py:5`
> Repeated fixed time interval, e.g, 5m, 1d.

## `AdjustType`
`contract/__init__.py:121`
> split-adjusted type for :class:`~.zvt.contract.schema.TradableEntity` quotes

## `ActorType`
`contract/__init__.py:138`

## `TradableType`
`contract/__init__.py:159`

## `Exchange`
`contract/__init__.py:203`

## `StatefulService`
`contract/base_service.py:10`
> Base service with state could be stored in state_schema

## `OneStateService`
`contract/base_service.py:65`
> StatefulService which saving all states in one object

## `EntityStateService`
`contract/base_service.py:87`
> StatefulService which saving one state one entity

## `Registry`
`contract/context.py:13`
> Class storing zvt registering meta

## `Bean`
`contract/data_type.py:4`

## `ChartType`
`contract/drawer.py:20`
> Chart type enum

## `Rect`
`contract/drawer.py:45`
> rect struct with left-bottom(x0, y0), right-top(x1, y1)

## `Draw`
`contract/drawer.py:61`

## `Drawable`
`contract/drawer.py:231`

## `StackedDrawer`
`contract/drawer.py:296`

## `Drawer`
`contract/drawer.py:407`

## `TargetType`
`contract/factor.py:22`

## `Indicator`
`contract/factor.py:28`

## `Transformer`
`contract/factor.py:34`

## `Accumulator`
`contract/factor.py:82`

## `Scorer`
`contract/factor.py:163`

## `FactorMeta`
`contract/factor.py:181`

## `Factor`
`contract/factor.py:188`

## `ScoreFactor`
`contract/factor.py:667`

## `CustomModel`
`contract/model.py:7`

## `MixinModel`
`contract/model.py:11`

## `NormalData`
`contract/normal_data.py:6`

## `DataListener`
`contract/reader.py:16`

## `DataReader`
`contract/reader.py:40`

## `Meta`
`contract/recorder.py:71`

## `Recorder`
`contract/recorder.py:91`

## `EntityEventRecorder`
`contract/recorder.py:147`

## `TimeSeriesDataRecorder`
`contract/recorder.py:245`

## `FixedCycleDataRecorder`
`contract/recorder.py:612`

## `TimestampsDataRecorder`
`contract/recorder.py:712`

## `RouteRegistry`
`contract/route_registry.py:28`
> Maps (provider, db_name) or (provider, data_schema) to storage_id.

## `Mixin`
`contract/schema.py:34`
> Base class of schema.

## `NormalMixin`
`contract/schema.py:326`

## `Entity`
`contract/schema.py:333`

## `TradableEntity`
`contract/schema.py:348`
> tradable entity

## `ActorEntity`
`contract/schema.py:534`

## `NormalEntityMixin`
`contract/schema.py:538`

## `Portfolio`
`contract/schema.py:545`
> composition of tradable entities

## `PortfolioStock`
`contract/schema.py:580`

## `PortfolioStockHistory`
`contract/schema.py:596`

## `TradableMeetActor`
`contract/schema.py:613`

## `ActorMeetTradable`
`contract/schema.py:626`

## `StorageBackend`
`contract/storage.py:38`
> Abstract storage backend. Decouples physical storage from domain/read/record logic.

## `SqliteStorageBackend`
`contract/storage.py:65`
> SQLite storage backend. Default path: {data_path}/{provider}/{provider}_{db_name}.db
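
The default path pattern quoted above can be expanded as in the following sketch. `sqlite_db_path` is a hypothetical helper and the example values are assumptions; only the `{data_path}/{provider}/{provider}_{db_name}.db` layout comes from the docstring.

```python
from pathlib import PurePosixPath


def sqlite_db_path(data_path: str, provider: str, db_name: str) -> str:
    """Build {data_path}/{provider}/{provider}_{db_name}.db as a POSIX path."""
    return str(PurePosixPath(data_path) / provider / f"{provider}_{db_name}.db")


# Hypothetical data directory and database name.
example_path = sqlite_db_path("/data/zvt", "em", "stock_meta")
```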

## `StateMixin`
`contract/zvt_info.py:11`

## `RecorderState`
`contract/zvt_info.py:19`
> Schema for storing recorder state

## `TaggerState`
`contract/zvt_info.py:27`
> Schema for storing tagger state

## `FactorState`
`contract/zvt_info.py:35`
> Schema for storing factor state

FILE:references/components/domain.md
# domain (114 classes)

## `BlockCategory`
`domain/__init__.py:5`

## `IndexCategory`
`domain/__init__.py:14`

## `ReportPeriod`
`domain/__init__.py:48`

## `CompanyType`
`domain/__init__.py:59`

## `ActorMeta`
`domain/actor/actor_meta.py:12`

## `StockTopTenFreeHolder`
`domain/actor/stock_actor.py:11`

## `StockTopTenHolder`
`domain/actor/stock_actor.py:25`

## `StockInstitutionalInvestorHolder`
`domain/actor/stock_actor.py:39`

## `StockActorSummary`
`domain/actor/stock_actor.py:53`

## `LimitUpInfo`
`domain/emotion/emotion.py:11`

## `LimitDownInfo`
`domain/emotion/emotion.py:46`

## `Emotion`
`domain/emotion/emotion.py:63`

## `DividendFinancing`
`domain/fundamental/dividend_financing.py:11`

## `DividendDetail`
`domain/fundamental/dividend_financing.py:32`

## `SpoDetail`
`domain/fundamental/dividend_financing.py:49`

## `RightsIssueDetail`
`domain/fundamental/dividend_financing.py:60`

## `BalanceSheet`
`domain/fundamental/finance.py:11`

## `IncomeStatement`
`domain/fundamental/finance.py:460`

## `CashFlowStatement`
`domain/fundamental/finance.py:629`

## `FinanceFactor`
`domain/fundamental/finance.py:831`

## `ManagerTrading`
`domain/fundamental/trading.py:11`

## `HolderTrading`
`domain/fundamental/trading.py:37`

## `BigDealTrading`
`domain/fundamental/trading.py:53`

## `MarginTrading`
`domain/fundamental/trading.py:71`

## `DragonAndTiger`
`domain/fundamental/trading.py:91`

## `StockValuation`
`domain/fundamental/valuation.py:11`

## `EtfValuation`
`domain/fundamental/valuation.py:38`

## `Economy`
`domain/macro/macro.py:11`

## `TreasuryYield`
`domain/macro/monetary.py:11`

## `Block`
`domain/meta/block_meta.py:14`

## `BlockStock`
`domain/meta/block_meta.py:21`

## `Blockus`
`domain/meta/blockus_meta.py:14`

## `BlockusStockus`
`domain/meta/blockus_meta.py:21`

## `CBond`
`domain/meta/cbond_meta.py:13`

## `Country`
`domain/meta/country_meta.py:13`

## `Currency`
`domain/meta/currency_meta.py:12`

## `Etf`
`domain/meta/etf_meta.py:15`

## `EtfStock`
`domain/meta/etf_meta.py:26`

## `Fund`
`domain/meta/fund_meta.py:14`

## `FundStock`
`domain/meta/fund_meta.py:53`

## `Future`
`domain/meta/future_meta.py:11`

## `Index`
`domain/meta/index_meta.py:14`

## `IndexStock`
`domain/meta/index_meta.py:26`

## `Indexhk`
`domain/meta/indexhk_meta.py:14`

## `Indexus`
`domain/meta/indexus_meta.py:14`

## `Stock`
`domain/meta/stock_meta.py:14`

## `StockDetail`
`domain/meta/stock_meta.py:32`

## `Stockhk`
`domain/meta/stockhk_meta.py:13`

## `Stockus`
`domain/meta/stockus_meta.py:14`

## `HkHolder`
`domain/misc/holder.py:11`

## `TopTenTradableHolder`
`domain/misc/holder.py:29`

## `TopTenHolder`
`domain/misc/holder.py:52`

## `InstitutionalInvestorHolder`
`domain/misc/holder.py:75`

## `BlockMoneyFlow`
`domain/misc/money_flow.py:14`

## `StockMoneyFlow`
`domain/misc/money_flow.py:48`

## `IndexMoneyFlow`
`domain/misc/money_flow.py:82`

## `StockSummary`
`domain/misc/overall.py:14`

## `MarginTradingSummary`
`domain/misc/overall.py:33`

## `CrossMarketSummary`
`domain/misc/overall.py:56`

## `StockNews`
`domain/misc/stock_news.py:11`

## `StockHotTopic`
`domain/misc/stock_news.py:28`

## `KdataCommon`
`domain/quotes/__init__.py:7`

## `TickCommon`
`domain/quotes/__init__.py:33`

## `BlockKdataCommon`
`domain/quotes/__init__.py:60`

## `IndexKdataCommon`
`domain/quotes/__init__.py:64`

## `IndexhkKdataCommon`
`domain/quotes/__init__.py:68`

## `IndexusKdataCommon`
`domain/quotes/__init__.py:72`

## `EtfKdataCommon`
`domain/quotes/__init__.py:76`

## `StockKdataCommon`
`domain/quotes/__init__.py:83`

## `StockusKdataCommon`
`domain/quotes/__init__.py:90`

## `StockhkKdataCommon`
`domain/quotes/__init__.py:97`

## `FutureKdataCommon`
`domain/quotes/__init__.py:102`

## `CurrencyKdataCommon`
`domain/quotes/__init__.py:113`

## `Block1dKdata`
`domain/quotes/block/block_1d_kdata.py:11`

## `Block1monKdata`
`domain/quotes/block/block_1mon_kdata.py:11`

## `Block1wkKdata`
`domain/quotes/block/block_1wk_kdata.py:11`

## `Currency1dKdata`
`domain/quotes/currency/currency_1d_kdata.py:11`

## `Etf1dKdata`
`domain/quotes/etf/etf_1d_kdata.py:11`

## `Future1dKdata`
`domain/quotes/future/future_1d_kdata.py:11`

## `Index1dKdata`
`domain/quotes/index/index_1d_kdata.py:11`

## `Index1mKdata`
`domain/quotes/index/index_1m_kdata.py:12`

## `Index1wkKdata`
`domain/quotes/index/index_1wk_kdata.py:11`

## `Indexhk1dKdata`
`domain/quotes/indexhk/indexhk_1d_kdata.py:11`

## `Indexus1dKdata`
`domain/quotes/indexus/indexus_1d_kdata.py:11`

## `Stock15mHfqKdata`
`domain/quotes/stock/stock_15m_hfq_kdata.py:11`

## `Stock15mKdata`
`domain/quotes/stock/stock_15m_kdata.py:11`

## `Stock1dHfqKdata`
`domain/quotes/stock/stock_1d_hfq_kdata.py:11`

## `Stock1dKdata`
`domain/quotes/stock/stock_1d_kdata.py:11`

## `Stock1hHfqKdata`
`domain/quotes/stock/stock_1h_hfq_kdata.py:11`

## `Stock1hKdata`
`domain/quotes/stock/stock_1h_kdata.py:11`

## `Stock1mHfqKdata`
`domain/quotes/stock/stock_1m_hfq_kdata.py:11`

## `Stock1mKdata`
`domain/quotes/stock/stock_1m_kdata.py:11`

## `Stock1monHfqKdata`
`domain/quotes/stock/stock_1mon_hfq_kdata.py:11`

## `Stock1monKdata`
`domain/quotes/stock/stock_1mon_kdata.py:11`

## `Stock1wkHfqKdata`
`domain/quotes/stock/stock_1wk_hfq_kdata.py:11`

## `Stock1wkKdata`
`domain/quotes/stock/stock_1wk_kdata.py:11`

## `Stock30mHfqKdata`
`domain/quotes/stock/stock_30m_hfq_kdata.py:11`

## `Stock30mKdata`
`domain/quotes/stock/stock_30m_kdata.py:11`

## `Stock4hHfqKdata`
`domain/quotes/stock/stock_4h_hfq_kdata.py:11`

## `Stock4hKdata`
`domain/quotes/stock/stock_4h_kdata.py:11`

## `Stock5mHfqKdata`
`domain/quotes/stock/stock_5m_hfq_kdata.py:11`

## `Stock5mKdata`
`domain/quotes/stock/stock_5m_kdata.py:11`

## `StockQuote`
`domain/quotes/stock/stock_quote.py:12`

## `Stock1mQuote`
`domain/quotes/stock/stock_quote.py:32`

## `StockQuoteLog`
`domain/quotes/stock/stock_quote_log.py:11`

## `Stockhk1dHfqKdata`
`domain/quotes/stockhk/stockhk_1d_hfq_kdata.py:11`

## `Stockhk1dKdata`
`domain/quotes/stockhk/stockhk_1d_kdata.py:11`

## `StockhkQuote`
`domain/quotes/stockhk/stockhk_quote.py:12`

## `Stockhk1mQuote`
`domain/quotes/stockhk/stockhk_quote.py:36`

## `Stockus1dHfqKdata`
`domain/quotes/stockus/stockus_1d_hfq_kdata.py:11`

## `Stockus1dKdata`
`domain/quotes/stockus/stockus_1d_kdata.py:11`

## `StockusQuote`
`domain/quotes/stockus/stockus_quote.py:12`

## `Stockus1mQuote`
`domain/quotes/stockus/stockus_quote.py:36`

## `StockTradeDay`
`domain/quotes/trade_day.py:10`

FILE:references/components/factors.md
# factors (54 classes)

## `RankScorer`
`factors/algorithm.py:141`

## `MaTransformer`
`factors/algorithm.py:150`

## `IntersectTransformer`
`factors/algorithm.py:193`

## `MaAndVolumeTransformer`
`factors/algorithm.py:224`

## `MacdTransformer`
`factors/algorithm.py:269`

## `QuantileScorer`
`factors/algorithm.py:311`

## `FactorRequestModel`
`factors/factor_models.py:12`

## `TradingSignalModel`
`factors/factor_models.py:20`

## `FactorResultModel`
`factors/factor_models.py:31`

## `FinanceBaseFactor`
`factors/fundamental/finance_factor.py:13`

## `GoodCompanyFactor`
`factors/fundamental/finance_factor.py:77`

## `MaStatsFactorCommon`
`factors/ma/domain/common.py:7`

## `Stock1dMaFactor`
`factors/ma/domain/stock_1d_ma_factor.py:11`

## `Stock1dMaStatsFactor`
`factors/ma/domain/stock_1d_ma_stats_factor.py:10`

## `MaFactor`
`factors/ma/ma_factor.py:24`

## `CrossMaFactor`
`factors/ma/ma_factor.py:93`

## `VolumeUpMaFactor`
`factors/ma/ma_factor.py:107`

## `CrossMaVolumeFactor`
`factors/ma/ma_factor.py:233`

## `MaStatsAccumulator`
`factors/ma/ma_stats_factor.py:24`

## `MaStatsFactor`
`factors/ma/ma_stats_factor.py:71`

## `TFactor`
`factors/ma/ma_stats_factor.py:147`

## `TopBottomTransformer`
`factors/ma/top_bottom_factor.py:17`

## `TopBottomFactor`
`factors/ma/top_bottom_factor.py:34`

## `MacdFactor`
`factors/macd/macd_factor.py:11`

## `BullFactor`
`factors/macd/macd_factor.py:24`

## `KeepBullFactor`
`factors/macd/macd_factor.py:30`

## `LiveOrDeadFactor`
`factors/macd/macd_factor.py:46`

## `GoldCrossFactor`
`factors/macd/macd_factor.py:56`

## `Direction`
`factors/shape.py:17`

## `Fenxing`
`factors/shape.py:28`

## `FactorStateEncoder`
`factors/shape.py:220`

## `TradeType`
`factors/target_selector.py:16`

## `SelectMode`
`factors/target_selector.py:25`

## `TargetSelector`
`factors/target_selector.py:30`

## `TechnicalFactor`
`factors/technical_factor.py:11`

## `TopStocks`
`factors/top_stocks.py:38`

## `CrossMaTransformer`
`factors/transformers.py:26`

## `SpecificTransformer`
`factors/transformers.py:42`

## `FallBelowTransformer`
`factors/transformers.py:55`

## `FactorStateEncoder`
`factors/zen/base_factor.py:38`

## `ZenState`
`factors/zen/base_factor.py:64`

## `ZenAccumulator`
`factors/zen/base_factor.py:152`

## `ZenFactor`
`factors/zen/base_factor.py:619`

## `ZenFactorCommon`
`factors/zen/domain/common.py:7`

## `Index1dZenFactor`
`factors/zen/domain/index_1d_zen_factor.py:10`

## `Stock1dZenFactor`
`factors/zen/domain/stock_1d_zen_factor.py:10`

## `Stock1wkZenFactor`
`factors/zen/domain/stock_1wk_zen_factor.py:10`

## `ZhongshuRange`
`factors/zen/zen_factor.py:28`

## `ZhongshuLevel`
`factors/zen/zen_factor.py:42`

## `ZhongshuDistance`
`factors/zen/zen_factor.py:60`

## `Zhongshu`
`factors/zen/zen_factor.py:81`

## `ZenState`
`factors/zen/zen_factor.py:118`

## `TrendingFactor`
`factors/zen/zen_factor.py:253`

## `ShakingFactor`
`factors/zen/zen_factor.py:338`

FILE:references/components/informer.md
# informer (4 classes)

## `Informer`
`informer/informer.py:17`

## `EmailInformer`
`informer/informer.py:22`

## `WechatInformer`
`informer/informer.py:95`

## `QiyeWechatBot`
`informer/informer.py:162`

FILE:references/components/misc.md
# misc (2 classes)

## `TimeMessage`
`misc/misc_models.py:7`

## `ZhDate`
`misc/zhdate.py:11`

FILE:references/components/ml.md
# ml (5 classes)

## `BehaviorCategory`
`ml/lables.py:5`

## `RelativePerformance`
`ml/lables.py:12`

## `MLMachine`
`ml/ml.py:46`

## `StockMLMachine`
`ml/ml.py:208`

## `MaStockMLMachine`
`ml/ml.py:212`

FILE:references/components/recorders.md
# recorders (90 classes)

## `ApiWrapper`
`recorders/eastmoney/common.py:15`

## `EastmoneyApiWrapper`
`recorders/eastmoney/common.py:101`

## `BaseEastmoneyRecorder`
`recorders/eastmoney/common.py:106`

## `EastmoneyTimestampsDataRecorder`
`recorders/eastmoney/common.py:140`

## `EastmoneyPageabeDataRecorder`
`recorders/eastmoney/common.py:163`

## `EastmoneyMoreDataRecorder`
`recorders/eastmoney/common.py:201`

## `DividendDetailRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_dividend_detail_recorder.py:7`

## `DividendFinancingRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_dividend_financing_recorder.py:7`

## `RightsIssueDetailRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_rights_issue_detail_recorder.py:10`

## `SPODetailRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_spo_detail_recorder.py:9`

## `BaseChinaStockFinanceRecorder`
`recorders/eastmoney/finance/base_china_stock_finance_recorder.py:36`

## `ChinaStockBalanceSheetRecorder`
`recorders/eastmoney/finance/eastmoney_balance_sheet_recorder.py:433`

## `ChinaStockCashFlowRecorder`
`recorders/eastmoney/finance/eastmoney_cash_flow_recorder.py:176`

## `ChinaStockFinanceFactorRecorder`
`recorders/eastmoney/finance/eastmoney_finance_factor_recorder.py:144`

## `ChinaStockIncomeStatementRecorder`
`recorders/eastmoney/finance/eastmoney_income_statement_recorder.py:158`

## `EastmoneyActorRecorder`
`recorders/eastmoney/holder/eastmoney_stock_actor_recorder.py:10`

## `TopTenHolderRecorder`
`recorders/eastmoney/holder/eastmoney_top_ten_holder_recorder.py:9`

## `TopTenTradableHolderRecorder`
`recorders/eastmoney/holder/eastmoney_top_ten_tradable_holder_recorder.py:6`

## `EastmoneyBlockRecorder`
`recorders/eastmoney/meta/eastmoney_block_meta_recorder.py:14`

## `EastmoneyBlockStockRecorder`
`recorders/eastmoney/meta/eastmoney_block_meta_recorder.py:52`

## `EastmoneyStockRecorder`
`recorders/eastmoney/meta/eastmoney_stock_meta_recorder.py:13`

## `EastmoneyStockDetailRecorder`
`recorders/eastmoney/meta/eastmoney_stock_meta_recorder.py:18`

## `HolderTradingRecorder`
`recorders/eastmoney/trading/eastmoney_holder_trading_recorder.py:7`

## `ManagerTradingRecorder`
`recorders/eastmoney/trading/eastmoney_manager_trading_recorder.py:7`

## `EMStockActorSummaryRecorder`
`recorders/em/actor/em_stock_actor_summary_recorder.py:28`

## `EMStockIIRecorder`
`recorders/em/actor/em_stock_ii_recorder.py:39`

## `EMStockTopTenFreeRecorder`
`recorders/em/actor/em_stock_top_ten_free_recorder.py:16`

## `EMStockTopTenRecorder`
`recorders/em/actor/em_stock_top_ten_recorder.py:16`

## `EMTreasuryYieldRecorder`
`recorders/em/macro/em_treasury_yield_recorder.py:12`

## `EMBlockRecorder`
`recorders/em/meta/em_block_meta_recorder.py:10`

## `EMBlockStockRecorder`
`recorders/em/meta/em_block_meta_recorder.py:21`

## `EMCBondRecorder`
`recorders/em/meta/em_cbond_meta_recorder.py:9`

## `EMCurrencyRecorder`
`recorders/em/meta/em_currency_meta_recorder.py:9`

## `EMFutureRecorder`
`recorders/em/meta/em_future_meta_recorder.py:9`

## `EMIndexRecorder`
`recorders/em/meta/em_index_meta_recorder.py:9`

## `EMIndexhkRecorder`
`recorders/em/meta/em_indexhk_meta_recorder.py:9`

## `EMIndexusRecorder`
`recorders/em/meta/em_indexus_meta_recorder.py:9`

## `EMStockRecorder`
`recorders/em/meta/em_stock_meta_recorder.py:13`

## `EMStockhkRecorder`
`recorders/em/meta/em_stockhk_meta_recorder.py:12`

## `EMStockusRecorder`
`recorders/em/meta/em_stockus_meta_recorder.py:12`

## `EMStockNewsRecorder`
`recorders/em/misc/em_stock_news_recorder.py:12`

## `BaseEMStockKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:33`

## `EMStockKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:170`

## `EMStockusKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:200`

## `EMStockhkKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:208`

## `EMIndexhkKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:216`

## `EMIndexKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:223`

## `EMIndexusKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:230`

## `EMBlockKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:237`

## `EMFutureKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:244`

## `EMCurrencyKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:251`

## `EMDragonAndTigerRecorder`
`recorders/em/trading/em_dragon_and_tiger_recorder.py:12`

## `ChinaETFListSpider`
`recorders/exchange/exchange_etf_meta_recorder.py:18`

## `ExchangeIndexRecorder`
`recorders/exchange/exchange_index_recorder.py:9`

## `ExchangeIndexStockRecorder`
`recorders/exchange/exchange_index_stock_recorder.py:15`

## `ExchangeStockMetaRecorder`
`recorders/exchange/exchange_stock_meta_recorder.py:15`

## `ExchangeStockSummaryRecorder`
`recorders/exchange/exchange_stock_summary_recorder.py:13`

## `JqChinaEtfValuationRecorder`
`recorders/joinquant/fundamental/jq_etf_valuation_recorder.py:12`

## `MarginTradingRecorder`
`recorders/joinquant/fundamental/jq_margin_trading_recorder.py:14`

## `JqChinaStockValuationRecorder`
`recorders/joinquant/fundamental/jq_stock_valuation_recorder.py:14`

## `JqChinaFundRecorder`
`recorders/joinquant/meta/jq_fund_meta_recorder.py:15`

## `JqChinaFundStockRecorder`
`recorders/joinquant/meta/jq_fund_meta_recorder.py:72`

## `BaseJqChinaMetaRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:15`

## `JqChinaStockRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:44`

## `JqChinaEtfRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:58`

## `JqChinaStockEtfPortfolioRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:70`

## `StockTradeDayRecorder`
`recorders/joinquant/meta/jq_trade_day_recorder.py:11`

## `JoinquantHkHolderRecorder`
`recorders/joinquant/misc/jq_hk_holder_recorder.py:20`

## `JoinquantIndexMoneyFlowRecorder`
`recorders/joinquant/misc/jq_index_money_flow_recorder.py:12`

## `JoinquantStockMoneyFlowRecorder`
`recorders/joinquant/misc/jq_stock_money_flow_recorder.py:17`

## `CrossMarketSummaryRecorder`
`recorders/joinquant/overall/jq_cross_market_recorder.py:9`

## `MarginTradingSummaryRecorder`
`recorders/joinquant/overall/jq_margin_trading_recorder.py:14`

## `StockSummaryRecorder`
`recorders/joinquant/overall/jq_stock_summary_recorder.py:21`

## `JqChinaIndexKdataRecorder`
`recorders/joinquant/quotes/jq_index_kdata_recorder.py:18`

## `JqChinaStockKdataRecorder`
`recorders/joinquant/quotes/jq_stock_kdata_recorder.py:17`

## `JqkaLimitUpRecorder`
`recorders/jqka/emotion/JqkaEmotionRecorder.py:24`

## `JqkaLimitDownRecorder`
`recorders/jqka/emotion/JqkaEmotionRecorder.py:99`

## `JqkaEmotionRecorder`
`recorders/jqka/emotion/JqkaEmotionRecorder.py:163`

## `QmtIndexRecorder`
`recorders/qmt/index/qmt_index_recorder.py:16`

## `QMTStockRecorder`
`recorders/qmt/meta/qmt_stock_meta_recorder.py:9`

## `BaseQmtKdataRecorder`
`recorders/qmt/quotes/qmt_kdata_recorder.py:17`

## `QMTStockKdataRecorder`
`recorders/qmt/quotes/qmt_kdata_recorder.py:169`

## `SinaBlockRecorder`
`recorders/sina/meta/sina_block_recorder.py:15`

## `SinaChinaBlockStockRecorder`
`recorders/sina/meta/sina_block_recorder.py:59`

## `SinaBlockMoneyFlowRecorder`
`recorders/sina/money_flow/sina_block_money_flow_recorder.py:17`

## `SinaStockMoneyFlowRecorder`
`recorders/sina/money_flow/sina_stock_money_flow_recorder.py:12`

## `ChinaETFDayKdataRecorder`
`recorders/sina/quotes/sina_etf_kdata_recorder.py:16`

## `ChinaIndexDayKdataRecorder`
`recorders/sina/quotes/sina_index_kdata_recorder.py:15`

## `WBCountryRecorder`
`recorders/wb/wb_country_recorder.py:9`

## `WBEconomyRecorder`
`recorders/wb/wb_economy_recorder.py:10`

FILE:references/components/samples.md
# samples (2 classes)

## `MyMaTrader`
`samples/stock_traders.py:8`

## `MyBullTrader`
`samples/stock_traders.py:27`

FILE:references/components/tag.md
# tag (42 classes)

## `StockPoolType`
`tag/common.py:5`

## `DynamicPoolType`
`tag/common.py:11`

## `InsertMode`
`tag/common.py:16`

## `TagType`
`tag/common.py:21`

## `TagStatsQueryType`
`tag/common.py:28`

## `TagInfoModel`
`tag/tag_models.py:12`

## `CreateTagInfoModel`
`tag/tag_models.py:18`

## `IndustryInfoModel`
`tag/tag_models.py:23`

## `MainTagIndustryRelation`
`tag/tag_models.py:30`

## `BuildMainTagIndustryRelationModel`
`tag/tag_models.py:35`

## `MainTagSubTagRelation`
`tag/tag_models.py:41`

## `BuildMainTagSubTagRelationModel`
`tag/tag_models.py:46`

## `ChangeMainTagModel`
`tag/tag_models.py:52`

## `StockTagsModel`
`tag/tag_models.py:57`

## `SimpleStockTagsModel`
`tag/tag_models.py:71`

## `QueryStockTagsModel`
`tag/tag_models.py:85`

## `QuerySimpleStockTagsModel`
`tag/tag_models.py:89`

## `BatchSetStockTagsModel`
`tag/tag_models.py:93`

## `TagParameter`
`tag/tag_models.py:100`

## `StockTagOptions`
`tag/tag_models.py:109`

## `SetStockTagsModel`
`tag/tag_models.py:119`

## `StockPoolModel`
`tag/tag_models.py:151`

## `StockPoolInfoModel`
`tag/tag_models.py:156`

## `CreateStockPoolInfoModel`
`tag/tag_models.py:161`

## `StockPoolsModel`
`tag/tag_models.py:173`

## `CreateStockPoolsModel`
`tag/tag_models.py:178`

## `QueryStockTagStatsModel`
`tag/tag_models.py:193`

## `StockTagDetailsModel`
`tag/tag_models.py:224`

## `StockTagStatsModel`
`tag/tag_models.py:258`

## `ActivateSubTagsModel`
`tag/tag_models.py:269`

## `ActivateSubTagsResultModel`
`tag/tag_models.py:273`

## `IndustryInfo`
`tag/tag_schemas.py:12`

## `MainTagInfo`
`tag/tag_schemas.py:21`

## `SubTagInfo`
`tag/tag_schemas.py:28`

## `HiddenTagInfo`
`tag/tag_schemas.py:38`

## `StockTags`
`tag/tag_schemas.py:45`
> Schema for storing stock tags

## `StockSystemTags`
`tag/tag_schemas.py:70`

## `StockPoolInfo`
`tag/tag_schemas.py:100`

## `StockPools`
`tag/tag_schemas.py:106`

## `TagStats`
`tag/tag_schemas.py:115`

## `Tagger`
`tag/tagger.py:16`

## `StockTagger`
`tag/tagger.py:40`

FILE:references/components/trader.md
# trader (22 classes)

## `TradingSignalType`
`trader/__init__.py:11`

## `OrderType`
`trader/__init__.py:20`

## `TradingSignal`
`trader/__init__.py:39`

## `TradingListener`
`trader/__init__.py:77`

## `AccountService`
`trader/__init__.py:94`

## `TraderError`
`trader/errors.py:2`
> Base class for exceptions in this module.

## `InvalidOrderParamError`
`trader/errors.py:8`

## `NotEnoughMoneyError`
`trader/errors.py:13`

## `NotEnoughPositionError`
`trader/errors.py:18`

## `InvalidOrderError`
`trader/errors.py:23`

## `WrongKdataError`
`trader/errors.py:28`
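
The six entries from `trader/errors.py` read as a small exception hierarchy: `TraderError` carries the docstring "Base class for exceptions in this module", which suggests the other five derive from it. A minimal sketch of how a caller might rely on that, assuming the inheritance holds. The classes below are local stand-ins rather than imports from the indexed package, and `place_order` and `describe_order` are hypothetical helpers introduced only for illustration:

```python
# Local stand-ins mirroring names from trader/errors.py.
# ASSUMPTION: the concrete errors subclass TraderError; the index does not confirm this.
class TraderError(Exception):
    """Base class for exceptions in this module."""


class InvalidOrderParamError(TraderError):
    pass


class NotEnoughMoneyError(TraderError):
    pass


def place_order(cash: float, price: float, volume: int) -> float:
    """Hypothetical helper: return the remaining cash after buying `volume` units."""
    if volume <= 0 or price <= 0:
        raise InvalidOrderParamError(f"bad order: price={price}, volume={volume}")
    cost = price * volume
    if cost > cash:
        raise NotEnoughMoneyError(f"need {cost}, have {cash}")
    return cash - cost


def describe_order(cash: float, price: float, volume: int) -> str:
    """Hypothetical helper: catch the whole family through the base class."""
    try:
        place_order(cash, price, volume)
    except TraderError as exc:
        # One handler covers every subclass; the concrete type is still available.
        return type(exc).__name__
    return "filled"
```

Catching the base class keeps calling code short while still letting a handler branch on the concrete type when it needs to.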

## `SimAccountService`
`trader/sim_account.py:25`

## `Trader`
`trader/trader.py:26`

## `StockTrader`
`trader/trader.py:535`

## `AccountStatsReader`
`trader/trader_info_api.py:69`

## `OrderReader`
`trader/trader_info_api.py:119`

## `PositionModel`
`trader/trader_models.py:7`

## `AccountStatsModel`
`trader/trader_models.py:32`

## `TraderInfo`
`trader/trader_schemas.py:13`
> trader info

## `AccountStats`
`trader/trader_schemas.py:33`
> account stats of every day

## `Position`
`trader/trader_schemas.py:63`

## `Order`
`trader/trader_schemas.py:97`

FILE:references/components/trading.md
# trading (19 classes)

## `ExecutionStatus`
`trading/common.py:5`

## `KdataRequestModel`
`trading/trading_models.py:19`

## `KdataModel`
`trading/trading_models.py:28`

## `TSRequestModel`
`trading/trading_models.py:36`

## `TSModel`
`trading/trading_models.py:42`

## `QuoteStatsModel`
`trading/trading_models.py:49`

## `QueryStockQuoteSettingModel`
`trading/trading_models.py:70`

## `BuildQueryStockQuoteSettingModel`
`trading/trading_models.py:75`

## `QueryTagQuoteModel`
`trading/trading_models.py:87`

## `QueryStockQuoteModel`
`trading/trading_models.py:92`

## `StockQuoteModel`
`trading/trading_models.py:102`

## `TagQuoteStatsModel`
`trading/trading_models.py:140`

## `StockQuoteStatsModel`
`trading/trading_models.py:156`

## `TradingPlanModel`
`trading/trading_models.py:173`

## `BuildTradingPlanModel`
`trading/trading_models.py:193`

## `QueryTradingPlanModel`
`trading/trading_models.py:214`

## `TagQuoteStats`
`trading/trading_schemas.py:12`

## `TradingPlan`
`trading/trading_schemas.py:24`

## `QueryStockQuoteSetting`
`trading/trading_schemas.py:44`
