@clawhub-tangweigang-jpg-8679fec286
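Beancount plain-text double-entry bookkeeping framework. Supports importing bank statements and transaction data, and automatically generates financial reports such as balance sheets and income statements.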
---
name: beancount-plaintext-ledger
description: |-
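Beancount plain-text double-entry bookkeeping framework. Supports importing bank statements and transaction data, and automatically generates financial reports such as balance sheets and income statements.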
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-129"
compiled_at: "2026-04-22T13:01:04.739311+00:00"
capability_markets: "global"
capability_activities: "accounting"
sop_version: "crystal-compilation-v6.1"
---
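# Beancount Plain-Text Ledger (beancount-plaintext-ledger)
> Beancount plain-text double-entry bookkeeping framework. Supports importing bank statements and transaction data, and automatically generates financial reports such as balance sheets and income statements.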
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (2 total)
### Beancount Test Utilities Framework (`UC-101`)
Provides reusable testing utilities for beancount test scripts, including temporary directory management and test file creation for integration testing.
**Triggers**: testing utilities, tempdir, test files
### Test Utils Validation Suite (`UC-102`)
Unit tests that validate the correctness of the test utility functions, including temporary directory cleanup and test file generation for beancount test scripts.
**Triggers**: unit test, validation, test utilities
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
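To illustrate SL-05, the exactly-one-of rule can be checked before a signal is emitted. This is a minimal sketch: `validate_sizing` is a hypothetical helper and not part of any library; only the three field names come from the lock table.

```python
from typing import Optional


def validate_sizing(
    position_pct: Optional[float] = None,
    order_money: Optional[float] = None,
    order_amount: Optional[int] = None,
) -> str:
    """Return the name of the single sizing field that is set (SL-05).

    Hypothetical helper: raises ValueError when zero or several fields are set.
    """
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = [name for name, value in fields.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"SL-05 violated: expected exactly one sizing field, got {provided}"
        )
    return provided[0]
```

Comparing against `None` rather than relying on truthiness keeps a legitimate value of `0` from being treated as "not provided".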
## Top Anti-Patterns (15 total)
- **`AP-ACCOUNTING-001`**: Using floating-point arithmetic for monetary amounts
- **`AP-ACCOUNTING-002`**: Skipping initialization calls before VM/script execution
- **`AP-ACCOUNTING-003`**: Mixing different asset types in monetary operations
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-129. Evidence verify ratio = 51.5% and audit fail total = 7. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source of truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-129` blueprint at 2026-04-22T13:01:04.739311+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Test Utils Validation Suite
  - Beancount Test Utilities Framework
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## finance-bp-073--ledger (7)
### `AP-ACCOUNTING-002` — Skipping initialization calls before VM/script execution <sub>(high)</sub>
Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking financial operations entirely.
### `AP-ACCOUNTING-003` — Mixing different asset types in monetary operations <sub>(high)</sub>
Performing addition, subtraction, or take operations on amounts with different asset types produces invalid financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot be combined, leading to corrupted account balances and failed reconciliations.
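One way to guard against this is to tag every amount with its asset and refuse arithmetic across assets. The sketch below is an assumption for illustration: `Money` is a hypothetical type, not the ledger project's own amount class.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical amount type: a Decimal number tagged with its asset."""

    number: Decimal
    asset: str

    def __add__(self, other: "Money") -> "Money":
        # Refuse to combine amounts denominated in different assets.
        if self.asset != other.asset:
            raise ValueError(f"cannot add {self.asset} to {other.asset}")
        return Money(self.number + other.number, self.asset)


total = Money(Decimal("1.50"), "USD") + Money(Decimal("2.25"), "USD")
print(total)
```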
### `AP-ACCOUNTING-004` — Missing insufficient funds validation <sub>(high)</sub>
Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues in global markets.
### `AP-ACCOUNTING-005` — Non-atomic transaction commit/rollback <sub>(high)</sub>
Processing database operations without atomic commit/rollback leaves partial state when failures occur. This corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable for global regulatory compliance.
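As a rough illustration of atomic commit/rollback, the sketch below uses Python's standard `sqlite3` module, whose connection object commits on success and rolls back on an exception when used as a context manager. The `accounts` table, the `transfer` function, and the integer-cents balance column are assumptions made for this example, not the ledger project's schema.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, cents: int) -> None:
    """Move `cents` from src to dst; both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (cents, src)
        )
        row = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if row is None or row[0] < 0:
            # Raising here undoes the debit above.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (cents, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 0)])
conn.commit()
```

Because the funds check runs inside the same transaction, a failed transfer leaves no partial debit behind.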
### `AP-ACCOUNTING-006` — On-demand posting generation causing double-spending <sub>(high)</sub>
Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics and can result in significant financial losses.
### `AP-ACCOUNTING-007` — Log insertion after transaction commit breaking event sourcing <sub>(high)</sub>
Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for global financial compliance.
### `AP-ACCOUNTING-008` — Incomplete transaction log hash chaining <sub>(high)</sub>
Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
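A minimal sketch of hash chaining, assuming SHA-256 over a JSON-serialized payload: each entry's hash covers the previous entry's hash, so changing an early entry changes every later hash. The function names, the payload shape, and the empty-string seed for the first entry are all assumptions for illustration.

```python
import hashlib
import json


def chain_hash(previous_hash: str, payload: dict) -> str:
    """Hash a log entry together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("utf-8") + body).hexdigest()


def build_chain(payloads: list[dict]) -> list[str]:
    """Return one hash per payload, each linked to the hash before it."""
    hashes = []
    previous = ""  # assumption: the first entry chains from an empty string
    for payload in payloads:
        previous = chain_hash(previous, payload)
        hashes.append(previous)
    return hashes


original = build_chain([{"id": 1, "amount": "10"}, {"id": 2, "amount": "5"}])
tampered = build_chain([{"id": 1, "amount": "99"}, {"id": 2, "amount": "5"}])
print(original[1] != tampered[1])  # the second hash differs although entry 2 is unchanged
```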
## finance-bp-073--ledger, finance-bp-129--beancount (1)
### `AP-ACCOUNTING-001` — Using floating-point arithmetic for monetary amounts <sub>(high)</sub>
Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential financial losses. This violates the fundamental requirement that monetary calculations must be exact.
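The precision loss is easy to reproduce with Python's standard library. The values below are arbitrary and chosen only to show the effect.

```python
from decimal import Decimal

# The same two amounts added with binary floats and with Decimal.
float_sum = 0.10 + 0.20
decimal_sum = Decimal("0.10") + Decimal("0.20")

print(float_sum == 0.30)               # False: 0.10 and 0.20 have no exact binary representation
print(decimal_sum == Decimal("0.30"))  # True: Decimal addition of these values is exact
```

Constructing `Decimal` from strings matters here; `Decimal(0.10)` would inherit the float's representation error.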
## finance-bp-078--fava_investor (4)
### `AP-ACCOUNTING-009` — Incorrect row data access patterns on query results <sub>(high)</sub>
Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError, since namedtuples support attribute and positional access but not string keys. This breaks all module queries expecting attribute-style access, causing asset allocation, tax loss harvesting, and other critical financial computations to fail.
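The failure can be reproduced with a plain `collections.namedtuple`. The `Row` type and its column names are hypothetical stand-ins for a query result row.

```python
from collections import namedtuple

Row = namedtuple("Row", ["account", "market_value"])  # hypothetical column names
row = Row(account="Assets:Brokerage", market_value=1200)

print(row.market_value)  # attribute access works
print(row[1])            # positional access works too

try:
    row["market_value"]  # string keys are not supported on tuples
except TypeError as exc:
    print(f"TypeError: {exc}")

# Convert to a dict first if dictionary-style access is really needed.
print(row._asdict()["market_value"])
```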
### `AP-ACCOUNTING-010` — Missing bidirectional inference for fund relationship declarations <sub>(medium)</sub>
When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax savings for investors.
### `AP-ACCOUNTING-011` — Wash sale comparison within substantially identical groups <sub>(high)</sub>
Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings. This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax losses and offset capital gains.
### `AP-ACCOUNTING-012` — Missing substantially identical tickers in wash sale queries <sub>(high)</sub>
Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales of the original position.
## finance-bp-129--beancount (3)
### `AP-ACCOUNTING-013` — Using parsed entries with MISSING sentinel values for calculations <sub>(high)</sub>
Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures, compromising financial reporting accuracy.
### `AP-ACCOUNTING-014` — Underspecified interpolation with multiple missing values per currency <sub>(high)</sub>
Having more than one missing value per currency group creates an underdetermined system with no unique solution during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected accounts.
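The constraint can be sketched as follows: with one missing number per currency, the value is forced by the requirement that the currency sums to zero; with two or more, it is not. This is a simplified illustration, not Beancount's interpolation code. `MISSING` is a local stand-in sentinel, postings are plain tuples, and the currency of a missing posting is assumed to be known.

```python
from decimal import Decimal
from typing import Optional

MISSING = None  # stand-in sentinel for this sketch only

Posting = tuple[str, Optional[Decimal], str]  # (account, number, currency)


def interpolate(postings: list[Posting]) -> list[tuple[str, Decimal, str]]:
    """Fill at most one missing number per currency so each currency sums to zero."""
    missing_count: dict[str, int] = {}
    residual: dict[str, Decimal] = {}
    for _account, number, currency in postings:
        if number is MISSING:
            missing_count[currency] = missing_count.get(currency, 0) + 1
        else:
            residual[currency] = residual.get(currency, Decimal("0")) + number
    for currency, count in missing_count.items():
        if count > 1:
            raise ValueError(f"{count} missing values for {currency}: no unique solution")
    return [
        (account, -residual.get(currency, Decimal("0")) if number is MISSING else number, currency)
        for account, number, currency in postings
    ]


filled = interpolate([
    ("Expenses:Food", Decimal("12.50"), "USD"),
    ("Assets:Cash", MISSING, "USD"),
])
print(filled[1])
```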
### `AP-ACCOUNTING-015` — Violating accounting identity in opening balance transactions <sub>(high)</sub>
Creating opening balance transactions where the total balance of summarized entries does not equal exactly zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be fundamentally incorrect with non-zero total assets and liabilities.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-129--beancount
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 19, 'total_functions': 0, 'total_stages': 6}
## Modules (6)
- [parsing](components/parsing.md): 4 classes
- [booking_(lot_matching)](components/booking_-lot_matching.md): 3 classes
- [transformation_(plugins)](components/transformation_-plugins.md): 3 classes
- [realization](components/realization.md): 3 classes
- [summarization](components/summarization.md): 3 classes
- [validation](components/validation.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 116
fatal_constraints_count: 38
non_fatal_constraints_count: 146
use_cases_count: 2
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
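A minimal sketch of next-bar execution, assuming signals are held in a plain list with one entry per bar: a signal computed on bar t is scheduled for bar t+1, and the final bar's signal has no bar to execute on. `schedule_next_bar` is a hypothetical helper.

```python
from typing import Optional


def schedule_next_bar(signals: list[Optional[str]]) -> list[Optional[str]]:
    """Shift each signal one bar forward so it executes on the following bar."""
    if not signals:
        return []
    # Bar 0 has nothing to execute; the last signal is dropped.
    return [None] + signals[:-1]


print(schedule_next_bar(["BUY", None, "SELL"]))
```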
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
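The format can be checked with a small parser. This is a sketch under the assumption that the three parts are separated by underscores and that only the first two separators are significant; `EntityId` and `parse_entity_id` are hypothetical names.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split an ID of the form entity_type_exchange_code into its three parts."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"SL-03 violated: malformed entity ID {entity_id!r}")
    return EntityId(*parts)


print(parse_entity_id("stock_sh_600000"))
```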
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
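The three-way semantics are easy to get wrong if a cell is tested for truthiness, because `None` would then be read as a sell. A minimal sketch of an explicit mapping, with `signal_from_filter` as a hypothetical helper:

```python
import math
from typing import Optional, Union


def signal_from_filter(value: Union[bool, float, None]) -> Optional[str]:
    """Map one filter_result cell to an action per SL-06."""
    if value is None:
        return None  # no action
    if isinstance(value, float) and math.isnan(value):
        return None  # NaN also means no action
    return "BUY" if value else "SELL"


print([signal_from_filter(v) for v in (True, False, None, float("nan"))])
```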
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
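PC-03 and PC-04 can also be expressed as one reusable function instead of two one-liners. The sketch below is an assumption-laden rewrite: `check_zvt_home` is a hypothetical name, and it uses a uniquely named temporary file rather than a fixed `.write_test` file so that concurrent checks do not collide.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def check_zvt_home(zvt_home: Optional[Path] = None) -> Path:
    """Return the data directory if it exists and is writable, else raise."""
    home = zvt_home or Path(os.environ.get("ZVT_HOME", Path.home() / ".zvt"))
    if not home.is_dir():
        raise FileNotFoundError(f"ZVT home not found: {home}")
    # Create and remove a uniquely named probe file to confirm write access.
    with tempfile.NamedTemporaryFile(dir=home, prefix=".write_test_"):
        pass
    return home
```

A missing directory raises `FileNotFoundError`, and a read-only one raises `PermissionError` from the probe file, which maps onto the two `on_fail` messages above.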
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **2**
## `KUC-101`
**Source**: `beancount/utils/test_utils.py`
Provides reusable testing utilities for beancount test scripts including temporary directory management and test file creation for integration testing.
## `KUC-102`
**Source**: `beancount/utils/test_utils_test.py`
Unit tests that validate the correctness of test utility functions including temporary directory cleanup and test file generation for beancount test scripts.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-ACCOUNTING-001` — Use exact-precision integer types for monetary representation
**From**: finance-bp-073--ledger, finance-bp-129--beancount · **Applicable to**: accounting
Both the Numscript ledger and the Beancount parser mandate exact numeric types, Decimal (beancount) or MonetaryInt based on big.Int (ledger), instead of floating-point. This pattern ensures no rounding errors accumulate in financial calculations, which is critical for audit compliance in global markets.
## `CW-ACCOUNTING-002` — Mandatory initialization sequence before execution
**From**: finance-bp-073--ledger · **Applicable to**: accounting
The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful state setup—always verify prerequisites before running financial logic.
## `CW-ACCOUNTING-003` — Dual idempotency key strategy
**From**: finance-bp-073--ledger · **Applicable to**: accounting
Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly succeed. Single-key approaches leave gaps in financial transaction safety.
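One possible reading of this strategy, sketched with an in-memory store: the key identifies the request, and a hash of the parameters detects a retry that reuses the key with different inputs. `IdempotencyStore`, its `submit` method, and the SHA-256-over-JSON hash are assumptions for illustration, not the ledger project's implementation.

```python
import hashlib
import json
from typing import Callable


class IdempotencyStore:
    """Hypothetical in-memory store keyed by idempotency key."""

    def __init__(self) -> None:
        self._seen: dict[str, tuple[str, dict]] = {}

    def submit(self, key: str, params: dict, execute: Callable[[dict], dict]) -> dict:
        digest = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key in self._seen:
            stored_digest, stored_result = self._seen[key]
            if stored_digest != digest:
                # Same key, different inputs: reject instead of silently replaying.
                raise ValueError("idempotency key reused with different parameters")
            return stored_result  # exact retry: replay the result, do not execute again
        result = execute(params)
        self._seen[key] = (digest, result)
        return result
```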
## `CW-ACCOUNTING-004` — Log-before-commit event sourcing pattern
**From**: finance-bp-073--ledger · **Applicable to**: accounting
In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical for regulatory compliance in global accounting.
## `CW-ACCOUNTING-005` — Read Committed isolation with FOR UPDATE locks
**From**: finance-bp-073--ledger · **Applicable to**: accounting
When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks. This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail due to insufficient funds), ensuring data integrity under concurrent load.
## `CW-ACCOUNTING-006` — Transitive closure for equivalence relationships
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting
When building commodity groups or substantially identical fund relationships, apply transitive closure to infer complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection and TLH calculations are complete and accurate across all declared relationships.
## `CW-ACCOUNTING-007` — Canonical representative selection for relationship groups
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting
When selecting a representative for a substantially identical fund group, always return the same representative ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where the same ticker gets different partners depending on which group member is queried.
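Transitive closure (CW-ACCOUNTING-006) and a stable representative (CW-ACCOUNTING-007) can be obtained together with a union-find structure. The sketch below is not fava_investor's code: `build_groups` is a hypothetical function, the fund names are placeholders, and choosing the alphabetically smallest ticker as representative is an assumption.

```python
def build_groups(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map every ticker to one canonical representative of its group."""
    parent: dict[str, str] = {}

    def find(ticker: str) -> str:
        parent.setdefault(ticker, ticker)
        while parent[ticker] != ticker:
            parent[ticker] = parent[parent[ticker]]  # path halving
            ticker = parent[ticker]
        return ticker

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the alphabetically smallest ticker represents the group.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    return {ticker: find(ticker) for ticker in list(parent)}


groups = build_groups([("FUND_B", "FUND_C"), ("FUND_A", "FUND_B"), ("FUND_X", "FUND_Y")])
print(groups)
```

Because the smaller root always wins a merge, every member of a group resolves to the same representative regardless of the order in which relationships were declared.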
## `CW-ACCOUNTING-008` — Immutable monetary objects with __slots__
**From**: finance-bp-129--beancount · **Applicable to**: accounting
Constructing Amount or Position objects using immutable Decimal values with __slots__ = () pattern prevents accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent throughout transaction processing and audit trails.
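The pattern can be sketched with a `NamedTuple` base and an empty `__slots__` on the subclass. `Money` below is a hypothetical class written for illustration and is not Beancount's own `Amount`.

```python
from decimal import Decimal
from typing import NamedTuple


class _MoneyFields(NamedTuple):
    number: Decimal
    currency: str


class Money(_MoneyFields):
    """Hypothetical immutable amount built on a tuple."""

    __slots__ = ()  # no per-instance __dict__, so no stray attributes can be attached

    def __str__(self) -> str:
        return f"{self.number} {self.currency}"


price = Money(Decimal("10.00"), "USD")
print(price)

try:
    price.number = Decimal("99.00")  # tuple fields are read-only
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```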
## `CW-ACCOUNTING-009` — Eliminate all MISSING values before presenting parsed data as complete
**From**: finance-bp-129--beancount · **Applicable to**: accounting
Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance calculations or realized/unrealized gains computation.
## `CW-ACCOUNTING-010` — Strict schema compatibility across class hierarchies
**From**: finance-bp-078--fava_investor, finance-bp-129--beancount · **Applicable to**: accounting
When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared for the base class, breaking wash sale detection and TLH recommendations.
FILE:references/components/booking_-lot_matching.md
# booking_(lot_matching) (3 classes)
## `Inventory.reduce`
`booking_(lot_matching)/inventory-reduce.py:0`
## `booking_method_STRICT`
`booking_(lot_matching)/booking-method-strict.py:0`
## `booking_method_fn`
`booking_(lot_matching)/booking-method-fn.py:0`
FILE:references/components/parsing.md
# parsing (4 classes)
## `Builder.build`
`parsing/builder-build.py:0`
## `OptDesc.convert`
`parsing/optdesc-convert.py:0`
## `booking_method`
`parsing/booking-method.py:0`
## `plugin`
`parsing/plugin.py:0`
FILE:references/components/realization.md
# realization (3 classes)
## `RealAccount.txn_postings`
`realization/realaccount-txn-postings.py:0`
## `Amount.__slots__`
`realization/amount-slots.py:0`
## `balance_reducer`
`realization/balance-reducer.py:0`
FILE:references/components/summarization.md
# summarization (3 classes)
## `AccountTypes.equity`
`summarization/accounttypes-equity.py:0`
## `summarize.open`
`summarization/summarize-open.py:0`
## `conversion_currency`
`summarization/conversion-currency.py:0`
FILE:references/components/transformation_-plugins.md
# transformation_(plugins) (3 classes)
## `DocumentError.check`
`transformation_(plugins)/documenterror-check.py:0`
## `PadError.check`
`transformation_(plugins)/paderror-check.py:0`
## `plugin_module`
`transformation_(plugins)/plugin-module.py:0`
FILE:references/components/validation.md
# validation (3 classes)
## `ValidationError.check`
`validation/validationerror-check.py:0`
## `validate_open_close`
`validation/validate-open-close.py:0`
## `extra_validations`
`validation/extra-validations.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-129-v5.3
version: v6.1
blueprint_id: finance-bp-129
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:01:04.739311+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- accounting
upgraded_from: finance-bp-129-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:35.880096+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-129--beancount/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-129--beancount/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-ACCOUNTING-001
title: Using floating-point arithmetic for monetary amounts
description: Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic
operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential
financial losses. This violates the fundamental requirement that monetary calculations must be exact.
project_source: finance-bp-073--ledger, finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-002
title: Skipping initialization calls before VM/script execution
description: Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized
or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking
financial operations entirely.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-003
title: Mixing different asset types in monetary operations
description: Performing addition, subtraction, or take operations on amounts with different asset types produces invalid
financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot
be combined, leading to corrupted account balances and failed reconciliations.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-004
title: Missing insufficient funds validation
description: Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond
permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues
in global markets.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-005
title: Non-atomic transaction commit/rollback
description: Processing database operations without atomic commit/rollback leaves partial state when failures occur. This
corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable
for global regulatory compliance.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-006
title: On-demand posting generation causing double-spending
description: Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent
funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics
and can result in significant financial losses.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-007
title: Log insertion after transaction commit breaking event sourcing
description: Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to
accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for
global financial compliance.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-008
title: Incomplete transaction log hash chaining
description: Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows
undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-009
title: Incorrect row data access patterns on query results
description: Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError since namedtuples
support attribute and positional access but not string keys. This breaks all module queries expecting attribute-style access, causing asset allocation,
tax loss harvesting, and other critical financial computations to fail.
project_source: finance-bp-078--fava_investor
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-010
title: Missing bidirectional inference for fund relationship declarations
description: When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads
to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax
savings for investors.
project_source: finance-bp-078--fava_investor
severity: medium
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-011
title: Wash sale comparison within substantially identical groups
description: Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings.
This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax
losses and offset capital gains.
project_source: finance-bp-078--fava_investor
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-012
title: Missing substantially identical tickers in wash sale queries
description: Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar
funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales
of the original position.
project_source: finance-bp-078--fava_investor
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-013
title: Using parsed entries with MISSING sentinel values for calculations
description: Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes
runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures,
compromising financial reporting accuracy.
project_source: finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-014
title: Underspecified interpolation with multiple missing values per currency
description: Having more than one missing value per currency group creates an underdetermined system with no unique solution
during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected
accounts.
project_source: finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-015
title: Violating accounting identity in opening balance transactions
description: Creating opening balance transactions where the total balance of summarized entries does not equal exactly
zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be
fundamentally incorrect with non-zero total assets and liabilities.
project_source: finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
cross_project_wisdom:
- wisdom_id: CW-ACCOUNTING-001
source_project: finance-bp-073--ledger, finance-bp-129--beancount
pattern_name: Use exact-precision numeric types for monetary representation
description: Both the Numscript ledger and the Beancount parser mandate using Decimal (beancount) or MonetaryInt based on big.Int
(ledger) instead of floating-point. This pattern ensures no rounding errors accumulate in financial calculations, critical
for audit compliance in global markets.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-002
source_project: finance-bp-073--ledger
pattern_name: Mandatory initialization sequence before execution
description: 'The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must
both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful
state setup—always verify prerequisites before running financial logic.'
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-003
source_project: finance-bp-073--ledger
pattern_name: Dual idempotency key strategy
description: 'Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey
prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly
succeed. Single-key approaches leave gaps in financial transaction safety.'
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-004
source_project: finance-bp-073--ledger
pattern_name: Log-before-commit event sourcing pattern
description: In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain
event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical
for regulatory compliance in global accounting.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-005
source_project: finance-bp-073--ledger
pattern_name: Read Committed isolation with FOR UPDATE locks
description: When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks.
This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail
due to insufficient funds), ensuring data integrity under concurrent load.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-006
source_project: finance-bp-078--fava_investor
pattern_name: Transitive closure for equivalence relationships
description: When building commodity groups or substantially identical fund relationships, apply transitive closure to infer
complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection
and TLH calculations are complete and accurate across all declared relationships.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-007
source_project: finance-bp-078--fava_investor
pattern_name: Canonical representative selection for relationship groups
description: When selecting a representative for a substantially identical fund group, always return the same representative
ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where
the same ticker gets different partners depending on which group member is queried.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-008
source_project: finance-bp-129--beancount
pattern_name: Immutable monetary objects with __slots__
description: Constructing Amount or Position objects using immutable Decimal values with the __slots__ = () pattern prevents
accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent
throughout transaction processing and audit trails.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-009
source_project: finance-bp-129--beancount
pattern_name: Eliminate all MISSING values before presenting parsed data as complete
description: Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All
MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance
calculations or realized/unrealized gains computation.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-010
source_project: finance-bp-078--fava_investor, finance-bp-129--beancount
pattern_name: Strict schema compatibility across class hierarchies
description: When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain
compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared
for the base class, breaking wash sale detection and TLH recommendations.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: beancount/utils/test_utils.py
business_problem: Provides reusable testing utilities for beancount test scripts including temporary directory management
and test file creation for integration testing.
intent_keywords:
- testing utilities
- tempdir
- test files
- mock repository
- integration testing
stage: testing
data_domain: internal
type: builtin_factor
- kuc_id: KUC-102
source_file: beancount/utils/test_utils_test.py
business_problem: Unit tests that validate the correctness of test utility functions including temporary directory cleanup
and test file generation for beancount test scripts.
intent_keywords:
- unit test
- validation
- test utilities
- tempdir cleanup
- test file generation
stage: testing
data_domain: internal
type: builtin_factor
component_capability_map:
project: finance-bp-129--beancount
scan_date: '2026-04-22'
stats:
total_files: 6
total_classes: 19
total_functions: 0
total_stages: 6
modules:
parsing:
class_count: 4
stage_id: parsing
stage_order: 1
responsibility: Tokenize and parse Beancount DSL files into directive data structures. Provides the foundation for all
downstream processing.
classes:
- name: Builder.build
file: parsing/builder-build.py
line: 0
kind: required_method
signature: ''
- name: OptDesc.convert
file: parsing/optdesc-convert.py
line: 0
kind: required_method
signature: ''
- name: booking_method
file: parsing/booking-method.py
line: 0
kind: replaceable_point
- name: plugin
file: parsing/plugin.py
line: 0
kind: replaceable_point
design_decision_count: 4
booking_(lot_matching):
class_count: 3
stage_id: booking
stage_order: 2
responsibility: Match inventory reductions to existing lots using configurable methods; infer missing posting amounts
via interpolation
classes:
- name: Inventory.reduce
file: booking_(lot_matching)/inventory-reduce.py
line: 0
kind: required_method
signature: ''
- name: booking_method_STRICT
file: booking_(lot_matching)/booking-method-strict.py
line: 0
kind: required_method
signature: ''
- name: booking_method_fn
file: booking_(lot_matching)/booking-method-fn.py
line: 0
kind: replaceable_point
design_decision_count: 5
transformation_(plugins):
class_count: 3
stage_id: transformation
stage_order: 3
responsibility: Apply user plugins and built-in transformations to synthesized entries, pad balances, check assertions
classes:
- name: DocumentError.check
file: transformation_(plugins)/documenterror-check.py
line: 0
kind: required_method
signature: ''
- name: PadError.check
file: transformation_(plugins)/paderror-check.py
line: 0
kind: required_method
signature: ''
- name: plugin_module
file: transformation_(plugins)/plugin-module.py
line: 0
kind: replaceable_point
design_decision_count: 4
realization:
class_count: 3
stage_id: realization
stage_order: 4
responsibility: Convert chronological list of directives into account tree with running balances for reporting
classes:
- name: RealAccount.txn_postings
file: realization/realaccount-txn-postings.py
line: 0
kind: required_method
signature: ''
- name: Amount.__slots__
file: realization/amount-slots.py
line: 0
kind: required_method
signature: ''
- name: balance_reducer
file: realization/balance-reducer.py
line: 0
kind: replaceable_point
design_decision_count: 4
summarization:
class_count: 3
stage_id: summarization
stage_order: 5
responsibility: Fold historical entries into balance sheet opening transactions; support period reporting
classes:
- name: AccountTypes.equity
file: summarization/accounttypes-equity.py
line: 0
kind: required_method
signature: ''
- name: summarize.open
file: summarization/summarize-open.py
line: 0
kind: required_method
signature: ''
- name: conversion_currency
file: summarization/conversion-currency.py
line: 0
kind: replaceable_point
design_decision_count: 3
validation:
class_count: 3
stage_id: validation
stage_order: 6
responsibility: Verify invariants hold after each transformation; ensure accounting rules are not violated
classes:
- name: ValidationError.check
file: validation/validationerror-check.py
line: 0
kind: required_method
signature: ''
- name: validate_open_close
file: validation/validate-open-close.py
line: 0
kind: required_method
signature: ''
- name: extra_validations
file: validation/extra-validations.py
line: 0
kind: replaceable_point
design_decision_count: 3
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.5154639175257731
evidence_invalid: 47
evidence_verified: 50
evidence_auto_fixed: 0
audit_coverage: 29/29 (100%)
audit_pass_rate: 7/29 (24%)
audit_fail_total: 7
audit_finance_universal:
pass: 4
warn: 8
fail: 4
audit_subdomain_totals:
pass: 3
warn: 7
fail: 3
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-129. Evidence verify ratio
= 51.5% and audit fail total = 7. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-129-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc: []
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Beancount Test Utilities Framework
positive_terms:
- testing utilities
- tempdir
- test files
- mock repository
- integration testing
data_domain: internal
negative_terms:
- trading strategy
- screening
- live trading
- data pipeline
- monitoring
- reporting
ambiguity_question: Are you looking for reusable testing utilities for your beancount project, or are you looking for
a specific trading, screening, or data processing use case?
- uc_id: UC-102
name: Test Utils Validation Suite
positive_terms:
- unit test
- validation
- test utilities
- tempdir cleanup
- test file generation
data_domain: internal
negative_terms:
- trading signals
- portfolio screening
- data ingestion
- live execution
- performance reporting
ambiguity_question: Are you looking for test coverage of beancount utilities, or do you need a specific trading, screening,
or analytical use case?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The buy/sell ordering check (SL-01) runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 116
fatal_constraints_count: 38
non_fatal_constraints_count: 146
use_cases_count: 2
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (fatal violation of finance-C-001).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 37 source groups: API Usage(1),
Architecture(1), Caching(1), Compatibility(1), Concurrency(1), Configuration(2), and 31 more.'
key_decisions: 116 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-GAP-013
type: B
summary: Batch update API is used for transforming links in Google Docs
- id: BD-GAP-007
type: T
summary: Index document discovery pattern uses a known index document to find all linked documentation
- id: BD-GAP-006
type: B/DK
summary: File-based caching is used for Google Drive API responses to avoid repeated downloads
- id: BD-GAP-008
type: B
summary: Dual VCS support (Git AND Mercurial) for extracting file modification years in copyright update
- id: BD-GAP-005
type: B
summary: threading.local storage is used to save function call returns during regexp matching
- id: BD-GAP-010
type: T
summary: Google Docs is used as the storage mechanism for configuration options
- id: BD-GAP-011
type: B
summary: Redirect file pattern is used to look up Google Doc IDs dynamically
- id: BD-GAP-002
type: B/DK
summary: MIME type parameter controls which document formats are downloaded from Google Drive
- id: BD-GAP-004
type: B
summary: Block-based document processing preserves blockquotes during DOCX-to-RST conversion
- id: BD-GAP-016
type: B
summary: Regexp-based file selection with ignore directories for copyright update operations
- id: BD-GAP-003
type: T
summary: reStructuredText (RST) is the target format for documentation conversion, not Markdown
- id: BD-GAP-009
type: T
summary: Year compression transforms sequential years into interval notation (2018-2020) in copyright notices
- id: BD-GAP-001
type: BA
summary: Google Drive API with service account authentication is used instead of OAuth user flow for document downloads
- id: BD-GAP-015
type: DK
summary: Test completeness is determined by file existence rather than code coverage analysis
- id: BD-GAP-012
type: B
summary: Benchmark execution is prevented when uncommitted local changes exist
- id: BD-GAP-014
type: B
summary: Dry-run mode is supported for link transformations to preview changes before applying
- id: BD-GAP-017
type: T
summary: Pandoc is used as the DOCX-to-RST conversion engine
- id: BD-028
type: B
summary: Accounts use hierarchical colon-separated naming with 5 standard types (Assets, Liabilities, Equity, Income,
Expenses)
- id: BD-029
type: B/BA
summary: Equity contains 'Opening-Balances' and 'Current-Earnings' standard sub-accounts
- id: BD-050
type: B
summary: Open entries mark accounts as active with optional currencies and booking method
- id: BD-051
type: B
summary: Close entries mark accounts as inactive after the specified date
- id: BD-063
type: B
summary: Use date + 1 day offset for balance check placement
- id: BD-064
type: B/BA
summary: Zero balance assertion for position verification
- id: BD-065
type: B/BA
summary: 'Metadata boolean ''closing: TRUE'' as trigger'
- id: BD-066
type: B
summary: Balance check date verification equals original + 1 day
- id: BD-067
type: B/BA
summary: Extra tolerance multiplier for dual constraint satisfaction
- id: BD-068
type: B/BA
summary: Expected proceeds = price * (-units) for short/long positions
- id: BD-069
type: B
summary: Proceeds accumulation for non-income accounts
- id: BD-070
type: B/DK
summary: Currency-by-currency inventory comparison
- id: BD-071
type: B/BA
summary: Absolute difference tolerance check
- id: BD-072
type: B/BA
summary: Tolerance inference per currency from postings
- id: BD-073
type: B
summary: Proceeds inventory accumulation using weight
- id: BD-074
type: B/RC
summary: Require all cost postings to have prices for validation
- id: BD-075
type: B
summary: Error when proceeds inventory has unmatched currencies
- id: BD-076
type: B
summary: Proceeds types include equity (for stock vesting)
- id: BD-077
type: B/BA
summary: No errors expected for balanced multi-leg sale
- id: BD-078
type: B/BA
summary: SellGainsError on unbalanced cash vs cost
- id: BD-079
type: B
summary: Dual error type on imbalance with missing expense
- id: BD-080
type: B
summary: Other currency (CAD) proceeds accepted as valid
- id: BD-081
type: B
summary: Zero price sale accepted as valid
- id: BD-039
type: B
summary: Weight calculation uses cost for lots, units for non-cost postings, explicit price overrides
- id: BD-026
type: B/BA
summary: Interpolation tolerance is 0.005 (0.5% of balance) for balance assertions
- id: BD-027
type: B
summary: Balance assertions use date-ordered preceding postings for interpolation
- id: BD-053
type: B/BA
summary: Pad entries create balance between two dates using interpolation from preceding entries
- id: BD-005
type: BA
summary: 'Two-pass booking: reductions first, then augmentations'
- id: BD-006
type: B
summary: CostSpec separates incomplete from resolved costs
- id: BD-007
type: BA
summary: Booking method dispatch via _BOOKING_METHODS dict
- id: BD-008
type: M/DK
summary: STRICT_WITH_SIZE fallback for size-exact matches
- id: BD-009
type: B
summary: MISSING sentinel for unfilled numbers (not None)
- id: BD-025
type: B/BA
summary: 'Booking method defaults to ''NONE'' (strict mode: no currency mixing without explicit cost)'
- id: BD-058
type: B/RC
summary: Position lot merging requires identical cost basis (number, currency, date, label)
- id: BD-059
type: B
summary: 'Inventory addition: same-lot positions combine, different lots coexist in inventory'
- id: BD-043
type: B
summary: 'Vesting example: 4-year vesting with 1-year cliff, monthly vesting thereafter'
- id: BD-044
type: B
summary: 'Trading simulation: sell biggest winner or biggest loser based on random 50/50 selection'
- id: BD-045
type: B
summary: 'Trading simulation: skip selling on days when buying (avoid same-day round-trip)'
- id: BD-046
type: B
summary: 'Trading simulation: skip lots without price movement when selecting sell candidates'
- id: BD-056
type: B
summary: Commission tracked as separate expense line (Expenses:Financial:Commissions) in trading
- id: BD-057
type: B
summary: Vesting calculation uses EXACT decimal arithmetic (Decimal type) for precision
- id: BD-082
type: B/BA
summary: 'INTERACTION: BD-001 × BD-003 → Parser line tracking enables secondary sort key for same-day ordering'
- id: BD-083
type: BA
summary: 'INTERACTION: BD-023 × BD-026 → Balanced postings invariant vs tolerance creates tolerance boundary ambiguity'
- id: BD-084
type: B/BA
summary: 'INTERACTION: BD-025 × BD-008 → Strict booking default vs STRICT_WITH_SIZE fallback creates behavioral inconsistency'
- id: BD-085
type: BA
summary: 'INTERACTION: BD-057 × BD-026 → Decimal precision vs tolerance existence reveals incomplete Decimal coverage'
- id: BD-086
type: B/RC
summary: 'INTERACTION: BD-006 × BD-005 × BD-031 → CostSpec incompleteness flows through two-pass booking to multi-lot
inventory'
- id: BD-087
type: B
summary: 'INTERACTION: BD-017 × BD-029 × BD-040 → Equity rollforward requires standard accounts and consistent date
ordering'
- id: BD-088
type: BA
summary: 'INTERACTION: BD-037 × BD-038 × BD-036 × BD-021 → Currency conversion chain creates systemic conversion dependency'
- id: BD-089
type: B/BA
summary: 'INTERACTION: BD-010 × BD-011 × BD-012 × BD-032 → Plugin pipeline ordering creates invariant check dependencies'
- id: BD-090
type: B/BA
summary: 'INTERACTION: BD-068 × BD-073 × BD-070 × BD-075 → Proceeds validation cascade from negation through currency
matching'
- id: BD-091
type: B/BA
summary: 'INTERACTION: BD-013 × BD-014 × BD-030 → Realization tree structure assumptions enable hierarchical aggregation'
- id: BD-092
type: B
summary: 'INTERACTION: BD-002 × BD-061 → Immutable directives enable safe post-modification invariant checking'
- id: BD-093
type: BA
summary: 'RISK CASCADE: Price lookup failure → multi-hop failure → constraint violation → validation error'
- id: BD-094
type: BA
summary: 'RISK CASCADE: Incomplete CostSpec → booking resolution failure → incorrect lot assignment → wrong cost basis'
- id: BD-095
type: BA
summary: 'RISK CASCADE: Plugin mode change → fixed ordering bypass → invariant check miss → silent corruption'
- id: BD-096
type: BA
summary: 'RISK CASCADE: Proceeds negation error → weighted accumulation wrong → per-currency pass → false validation
pass'
- id: BD-052
type: B
summary: Document directives provide optional source file references for entries
- id: BD-001
type: B/DK
summary: Lexer/Parser split using PLY (Python Lex-Yacc)
- id: BD-002
type: B
summary: Directives are immutable NamedTuples
- id: BD-003
type: B/RC
summary: Every directive requires a date; lineno is the secondary sort key
- id: BD-004
type: BA
summary: Options parsed per-file with aggregation for includes
- id: BD-GAP-018
type: DK
summary: 'Missing: Timezone explicit annotation + UTC normalization'
- id: BD-GAP-019
type: B
summary: 'Missing: Provider Priority & Credential Isolation'
- id: BD-GAP-020
type: RC
summary: 'Missing: Delinquency Definition (DPD 30/60/90)'
- id: BD-032
type: B
summary: 'Plugin pipeline runs in order: check_closing first, then other plugins, then close_tree last'
- id: BD-033
type: B
summary: sellgains plugin moves realized gains to income when cost basis exceeds proceeds
- id: BD-048
type: B
summary: check_closing plugin verifies closing entries match computed balances
- id: BD-049
type: B
summary: close_tree plugin removes empty account subtrees after all processing
- id: BD-024
type: B/RC
summary: Lots are identified by (account, currency, cost_spec) triple including number, currency, date, and label
- id: BD-031
type: B
summary: Inventory holds multiple lots per (account, currency) with set-based equality
- id: BD-036
type: B/BA
summary: Cost currency defaults to the same currency as the lot's units when no explicit cost currency specified
- id: BD-042
type: B/RC
summary: Cost label (optional) allows distinguishing lots with same date/currency/amount
- id: BD-047
type: B
summary: Inventory reduce uses (amount, currency) key for aggregation across lots
- id: BD-037
type: B
summary: Price lookup uses (base, quote) currency tuple as key for rate retrieval
- id: BD-038
type: B
summary: Implied price conversion via intermediate currency hops when direct rate unavailable
- id: BD-013
type: B
summary: RealAccount extends dict with account component keys
- id: BD-014
type: B
summary: Balance stored as Inventory per RealAccount
- id: BD-015
type: B
summary: Balances computed via Inventory.reduce with convert functions
- id: BD-016
type: B
summary: Amount uses __slots__ = () to prevent dynamic attributes
- id: BD-030
type: B/DK
summary: Realization creates complete tree from root 'Root' down through 4 account levels
- id: BD-034
type: B
summary: Summarization uses previous-day dates to maintain chronological ordering in reports
- id: BD-035
type: B
summary: Transfer entries use date minus 1 day to precede cutoff date for account transfers
- id: BD-040
type: B
summary: 'Clamp operation: income/expenses to equity at begin_date, summarize period, truncate after end_date, convert
at end'
- id: BD-041
type: B
summary: Balance assertions following transferred accounts are removed from the new account
- id: BD-054
type: B
summary: Date range filtering in exports uses half-open interval [begin, end)
- id: BD-055
type: B
summary: Treeify expands flat account lists into nested dict structure for hierarchical reporting
- id: BD-060
type: B
summary: Realization tree contains both aggregated balances and per-node lot lists
- id: BD-017
type: BA
summary: Income/expenses rolled to equity at period boundaries
- id: BD-018
type: B
summary: Conversion entries at open date for zero-priced items
- id: BD-019
type: M
summary: GetAccounts class uses getattr dispatch on entry class name
- id: BD-062
type: B
summary: 'Position scaling: negate position units to create selling lot from buying position'
- id: BD-010
type: BA
summary: 'Plugin processing mode: raw vs default'
- id: BD-011
type: B/DK
summary: Documents plugin is always prepended
- id: BD-012
type: B
summary: pad and balance are always appended as post plugins
- id: BD-020
type: B
summary: Open/close account lifecycle validation
- id: BD-021
type: B
summary: Currency constraints from Open declaration
- id: BD-022
type: BA/DK
summary: Extra validations injected at load time
- id: BD-023
type: B
summary: 'Double-entry accounting: every transaction must have balanced postings (sum to zero)'
- id: BD-061
type: B
summary: 'Invariant checking: verify_balance_interpolation called after every entry modification'
resources:
packages:
- name: click >=7.0
version_pin: latest
- name: python-dateutil >=2.6.0
version_pin: latest
- name: regex >=2022.9.13
version_pin: latest
- name: flex / winflexbison-bin >=2.6.4
version_pin: latest
- name: bison-bin / winflexbison-bin >=3.8.0
version_pin: latest
- name: meson >=1.2.1
version_pin: latest
- name: meson-python >=0.14.0
version_pin: latest
- name: pytest
version_pin: latest
- name: mypy
version_pin: latest
- name: types-python-dateutil
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install "click>=7.0"
- python3 -m pip install "python-dateutil>=2.6.0"
- python3 -m pip install "regex>=2022.9.13"
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When implementing monetary calculations in the parsing stage
action: use Decimal type from beancount.core.number instead of floating-point types
severity: fatal
kind: domain_rule
modality: must
consequence: Floating-point arithmetic produces rounding errors that accumulate in financial calculations, leading to
incorrect account balances and audit failures
stage_ids:
- parsing
- id: finance-C-002
when: When parsing directives from Beancount source files
action: require every directive to have a valid date field as primary identifier
severity: fatal
kind: domain_rule
modality: must
consequence: Directives without dates cannot be temporally ordered, causing incorrect balance calculations and non-deterministic
transaction sequencing
stage_ids:
- parsing
- id: finance-C-003
when: When encoding Beancount source files for parsing
action: use UTF-8 encoding exclusively for all input files
severity: fatal
kind: domain_rule
modality: must
consequence: Non-UTF-8 encoded files cause character decoding errors, preventing valid entries from being parsed and resulting
in lost transaction data
stage_ids:
- parsing
- id: finance-C-009
when: When using parser output directly for financial calculations
action: use parsed entries that contain MISSING sentinel values for balance or cost computations
severity: fatal
kind: operational_lesson
modality: must_not
consequence: MISSING values in postings cause runtime errors or silent zero-value calculations, resulting in incorrect
portfolio valuations and reconciliation failures
stage_ids:
- parsing
- id: finance-C-014
when: When presenting parsed data as completed ledger output
action: claim that parsed entries are complete and ready for financial reporting without running booking
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Presenting incomplete parsed entries as final results misleads users into using entries with unresolved MISSING
values and uninterpolated amounts for financial decisions
stage_ids:
- parsing
- id: finance-C-017
when: When processing inventory reduction postings
action: Match reductions against existing lots before applying interpolation
severity: fatal
kind: domain_rule
modality: must
consequence: Interpolation cannot succeed for reductions with missing price/cost because the booking method (FIFO/LIFO/etc.)
must first determine which lot is being reduced
stage_ids:
- booking
- id: finance-C-018
when: When validating transaction completeness after booking
action: Eliminate all MISSING values from posting units and costs
severity: fatal
kind: domain_rule
modality: must
consequence: Postings with incomplete amounts cannot be used for balance calculations or realized/unrealized gains computation
stage_ids:
- booking
- id: finance-C-019
when: When interpolating missing numbers in a currency group
action: Have more than one missing value per currency group
severity: fatal
kind: domain_rule
modality: must_not
consequence: Multiple missing values create an underdetermined system with no unique solution, causing InterpolationError
and transaction failure
stage_ids:
- booking
- id: finance-C-026
when: When implementing AVERAGE booking method
action: Claim support for AVERAGE method as it is not implemented
severity: fatal
kind: resource_boundary
modality: must_not
consequence: AVERAGE booking always returns an AmbiguousMatchError, so any code claiming AVERAGE support is incorrect
stage_ids:
- booking
- id: finance-C-034
when: When implementing a plugin function for beancount
action: return a tuple of (modified_entries, errors_list) from the plugin function
severity: fatal
kind: domain_rule
modality: must
consequence: Plugin return value mismatch causes loader to crash or corrupt entry list when extending errors
stage_ids:
- transformation
- id: finance-C-036
when: When implementing plugin function signatures
action: accept (entries, options_map, *optional_config) parameters in that exact order
severity: fatal
kind: domain_rule
modality: must
consequence: Plugin function signature mismatch causes TypeError when loader attempts to call the plugin callback with
(entries, options_map, *args)
stage_ids:
- transformation
- id: finance-C-049
when: When creating a RealAccount instance
action: pass a string account_name, not None or non-string value
severity: fatal
kind: domain_rule
modality: must
consequence: Passing None or non-string as account_name causes ValueError, breaking account tree construction and preventing
balance reporting for the entire account hierarchy
stage_ids:
- realization
- id: finance-C-050
when: When inserting a subaccount into a RealAccount tree
action: use string keys matching the hierarchical account naming convention
severity: fatal
kind: domain_rule
modality: must
consequence: Non-string keys or mismatched account names cause KeyError/ValueError, breaking the tree structure and corrupting
balance calculations for all child accounts
stage_ids:
- realization
- id: finance-C-051
when: When constructing Amount or Position objects
action: use Decimal for all monetary number values and the immutable __slots__ = () pattern
severity: fatal
kind: domain_rule
modality: must
consequence: Using float instead of Decimal for monetary values causes rounding errors in balance calculations, leading
to incorrect financial reports and potential compliance issues
stage_ids:
- realization
- id: finance-C-061
when: When summarizing entries to create opening balance transactions
action: Verify the total balance of summarized entries equals exactly zero
severity: fatal
kind: domain_rule
modality: must
consequence: The accounting identity will be violated, causing the balance sheet to be fundamentally incorrect with non-zero
total assets and liabilities
stage_ids:
- summarization
- id: finance-C-062
when: When computing balances for summarization before a cutoff date
action: Include only entries strictly before the date parameter
severity: fatal
kind: domain_rule
modality: must
consequence: Entries at the boundary date will be incorrectly included or excluded, causing duplicate or missing transactions
in the opening balances
stage_ids:
- summarization
- id: finance-C-063
when: When inserting synthesized entries at a period boundary
action: Insert transfer and summary entries on the day before the boundary date
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Open directives will appear after transactions, violating the requirement that Open entries precede all activity
for that account
stage_ids:
- summarization
- id: finance-C-064
when: When creating conversion entries for zero-priced items
action: Use ZERO as the price amount to maintain the balance invariant
severity: fatal
kind: domain_rule
modality: must
consequence: The balance invariant will be broken because conversion entries will not correctly offset the original positions,
causing phantom gains or losses
stage_ids:
- summarization
- id: finance-C-065
when: When executing the open() function for period opening
action: Execute conversions before clear, and clear before summarize in that exact order
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Income/expense accounts will still have balances when summarization runs, causing those accounts to incorrectly
appear in opening balance transactions
stage_ids:
- summarization
- id: finance-C-071
when: When closing income and expense accounts at period boundaries
action: Transfer the accumulated balances to the equity earnings account before summarization
severity: fatal
kind: domain_rule
modality: must
consequence: Income statement accounts will show residual balances that should have been closed to equity, causing the
balance sheet to not balance correctly
stage_ids:
- summarization
- id: finance-C-076
when: When using the clamp function to filter entries to a time period
action: Execute income/expense transfer before summarize, then truncate, then add conversion entries
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Income statement accounts will have residual balances and the period will not end with zero total balance
as required for period reporting
stage_ids:
- summarization
- id: finance-C-079
when: When implementing account lifecycle validation
action: Prevent duplicate Open or Close directives for the same account
severity: fatal
kind: domain_rule
modality: must
consequence: Duplicate Open/Close directives cause ambiguous account lifecycle, leading to incorrect balance calculations
and reports that mix up different account states
stage_ids:
- validation
- id: finance-C-080
when: When implementing account close validation
action: Verify Close directive date is strictly after its corresponding Open directive date
severity: fatal
kind: domain_rule
modality: must
consequence: Closing an account before it is opened creates invalid accounting state where transactions could reference
an account that doesn't exist, corrupting the ledger integrity
stage_ids:
- validation
- id: finance-C-081
when: When implementing balance assertion validation
action: Reject duplicate Balance entries with different amounts on the same (account, currency, date)
severity: fatal
kind: domain_rule
modality: must
consequence: Conflicting balance assertions for the same account on the same date create irreconcilable accounting state,
causing incorrect account balances and reporting errors
stage_ids:
- validation
- id: finance-C-083
when: When implementing transaction validation
action: Verify that each transaction's postings balance to zero within tolerance
severity: fatal
kind: domain_rule
modality: must
consequence: Unbalanced transactions violate double-entry accounting principles, resulting in incorrect ledger balances
and financial reports that do not sum correctly
stage_ids:
- validation
- id: finance-C-084
when: When implementing currency constraint validation
action: Enforce that postings only use currencies declared in the account's Open directive
severity: fatal
kind: domain_rule
modality: must
consequence: Postings with currencies not allowed for an account violate currency constraints, leading to incorrect inventory
tracking and mixing of incompatible currencies in the same account
stage_ids:
- validation
- id: finance-C-085
when: When implementing active account validation
action: Verify that all directive references to accounts occur within the account's open-close interval
severity: fatal
kind: domain_rule
modality: must
consequence: References to accounts outside their active period create invalid transactions that reference non-existent
or closed accounts, corrupting the accounting records
stage_ids:
- validation
- id: finance-C-088
when: When running transaction balance validation
action: Execute balance checks AFTER all user plugin transformations
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Running balance checks before plugins means unbalanced input transactions are rejected even when plugins
are designed to fix them, breaking valid workflows
stage_ids:
- validation
- id: finance-C-089
when: When calling the validation pipeline
action: Invoke validation AFTER booking and transformations are complete
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Validation before booking or transformations checks incomplete or incorrect data, producing false errors
that do not reflect the final state of the ledger
stage_ids:
- validation
- id: finance-C-095
when: When parsing produces directives with CostSpec (incomplete costs)
action: Convert all CostSpec instances to Cost instances during the booking stage using the account's configured booking
method
severity: fatal
kind: domain_rule
modality: must
consequence: Transaction balance calculations will be incorrect if CostSpec remains unresolved, causing wrong lot matching
and incorrect cost basis for assets
- id: finance-C-097
when: When booking stage produces entries with resolved costs
action: Verify entries remain sorted by date after booking completion using entry_sortkey
severity: fatal
kind: domain_rule
modality: must
consequence: Position calculations and balance checks will be incorrect if entries are processed out of chronological
order, violating accounting ledger order requirements
- id: finance-C-100
when: When plugin_processing_mode is 'default'
action: 'Execute plugins in exact order: PLUGINS_PRE first, then user plugins, then PLUGINS_AUTO, then PLUGINS_POST last'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Balance checks and padding will run at wrong time, allowing unbalanced transactions to pass validation or
padding to be applied incorrectly
- id: finance-C-102
when: When fully transformed directives are passed to validation stage
action: Verify that each Transaction's postings balance, with tolerance checking using inferred_tolerances from options_map
severity: fatal
kind: domain_rule
modality: must
consequence: Unbalanced transactions will appear in reports as valid, leading to incorrect financial records and potentially
wrong tax calculations
- id: finance-C-108
when: When implementing or writing code that creates or manipulates monetary amounts in beancount
action: Use the Decimal type (via the D() function) for all monetary numbers instead of Python float or int; never use floating-point
in an accounting system
severity: fatal
kind: domain_rule
modality: must
consequence: Floating-point arithmetic causes rounding errors that accumulate across transactions, leading to incorrect
account balances and misreported financial positions
- id: finance-C-110
when: When defining or using directives (Transaction, Open, Close, Balance, etc.) in beancount
action: Treat all directive instances as immutable; never mutate NamedTuple fields after construction, and use the meta dict
for metadata (filename, lineno) instead of modifying state
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Mutating a directive breaks immutability assumptions throughout the system, causing inconsistent balances
and unpredictable plugin behavior
- id: finance-C-113
when: When sorting or retrieving entries in beancount
action: Sort all directives using entry_sortkey(entry), which returns (date, directive_type_sort_order, lineno); never
sort by date alone or by file order alone
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect sort order breaks balance assertions, causes check directives to process after transactions on
the same day, and corrupts inventory calculations
- id: finance-C-114
when: When creating or accessing positions within an Inventory in beancount
action: 'Key all Inventory positions by a tuple of (currency: str, cost: Cost|None); use None cost for non-booked positions
and a Cost instance for booked lots'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect inventory keying causes positions to merge incorrectly, leading to wrong cost basis calculations
and distorted portfolio reports
- id: finance-C-143
when: When implementing or refactoring period-bound financial snapshot logic in beancount/ops/summarize.py
action: 'Execute clamp operation in the exact sequence: (1) move income/expenses to Equity:Current-Earnings at begin_date,
(2) summarize the period, (3) remove entries after end_date, (4) apply currency conversion at end_date'
severity: fatal
kind: domain_rule
modality: must
consequence: Reordering the clamp operation sequence—such as applying currency conversion before truncation—produces incorrect
period snapshots where gains/losses are valued at wrong exchange rates or include entries outside the specified period
derived_from_bd_id: BD-040
regular:
- id: finance-C-004
when: When validating user-defined options from Beancount source files
action: validate each option value using its designated converter function and raise ValueError on invalid inputs
severity: high
kind: domain_rule
modality: must
consequence: Invalid option values bypass validation and propagate through the system, causing unexpected behavior in
downstream processing stages like booking and transformation
stage_ids:
- parsing
- id: finance-C-005
when: When providing input files to the Beancount parser
action: provide files with non-absolute path names to the top-level load_file function
severity: high
kind: resource_boundary
modality: must_not
consequence: Relative file paths cause include directive resolution to fail unpredictably depending on current working
directory, resulting in missing entries or incorrect file loading
stage_ids:
- parsing
- id: finance-C-006
when: When loading plugin modules via the plugin directive
action: verify plugin modules define the __plugins__ tuple attribute to be recognized by the transformation system
severity: high
kind: resource_boundary
modality: must
consequence: Plugins without __plugins__ attribute are silently skipped during run_transformations, leaving entries unprocessed
by expected validation and transformation logic
stage_ids:
- parsing
- id: finance-C-007
when: When including additional files via include directives
action: resolve relative include paths against the directory of the file containing the directive
severity: high
kind: resource_boundary
modality: must
consequence: Incorrect include path resolution causes files to not be found, resulting in missing directives and incomplete
ledger data
stage_ids:
- parsing
- id: finance-C-008
when: When processing duplicate include file references
action: allow the same file to be parsed more than once in a single load operation
severity: high
kind: resource_boundary
modality: must_not
consequence: Duplicate file parsing creates duplicate directives with identical timestamps, causing double-counting of
transactions and incorrect financial reports
stage_ids:
- parsing
- id: finance-C-010
when: When implementing directive sorting for downstream processing
action: sort entries by date as primary key and use lineno as secondary sort key for same-day directives
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect sorting causes Balance directives to be evaluated after Transactions on the same day, breaking
the accounting invariant that balances apply at the beginning of the day
stage_ids:
- parsing
- id: finance-C-011
when: When building directive data structures
action: create directives as immutable NamedTuple instances to ensure safe sharing and caching
severity: high
kind: architecture_guardrail
modality: must
consequence: Mutable directive objects cause unexpected side effects when shared across plugins or cached, leading to
non-deterministic behavior and hard-to-debug inconsistencies
stage_ids:
- parsing
- id: finance-C-012
when: When processing include directives recursively
action: maintain a stack-based processing order where each included file is fully processed before moving to the next
severity: medium
kind: architecture_guardrail
modality: must
consequence: Non-stack-based include processing causes entries to be interleaved incorrectly, breaking the assumption
that entries from included files are contiguous in date order
stage_ids:
- parsing
- id: finance-C-013
when: When aggregating options from multiple include files
action: merge operating_currency lists and dcontext from included files into the top-level options map
severity: medium
kind: architecture_guardrail
modality: must
consequence: Unmerged operating_currency lists cause currency restrictions from included files to be ignored, allowing
invalid currency postings without error
stage_ids:
- parsing
- id: finance-C-015
when: When using parser output for real-time financial decisions
action: claim the parser provides real-time data synchronization with exchange systems
severity: high
kind: claim_boundary
modality: must_not
consequence: The parser only reads static Beancount source files; it does not connect to any external data source, so
presenting parsed data as real-time creates false confidence in data freshness
stage_ids:
- parsing
- id: finance-C-016
when: When implementing lot matching with incomplete cost specifications
action: Use MISSING sentinel (not None) for unfilled numbers in CostSpec
severity: high
kind: domain_rule
modality: must
consequence: MISSING sentinel propagates through the booking process and surfaces in clear error messages, whereas None
would silently cause type errors or incorrect matches
stage_ids:
- booking
- id: finance-C-020
when: When inferring price for a posting with cost specification
action: Attempt to infer price from the residual for cost-held postings
severity: high
kind: domain_rule
modality: must_not
consequence: Cost-based postings should use cost for value calculation, not price interpolation, leading to incorrect
lot valuation
stage_ids:
- booking
- id: finance-C-021
when: When creating a position with cost specification
action: Allow zero units with a non-None cost
severity: high
kind: domain_rule
modality: must_not
consequence: Zero units with cost creates an invalid lot that cannot be meaningfully tracked or valued
stage_ids:
- booking
- id: finance-C-022
when: When creating or interpolating a cost value
action: Allow negative cost numbers
severity: high
kind: domain_rule
modality: must_not
consequence: Negative cost values break financial calculations and PnL computations, leading to incorrect balance assertions
stage_ids:
- booking
- id: finance-C-023
when: When booking reductions against existing inventory
action: Verify reduction postings match existing lots with matching cost currency
severity: high
kind: architecture_guardrail
modality: must
consequence: Reducing inventory against lots with mismatched currencies creates phantom gains/losses in reports
stage_ids:
- booking
- id: finance-C-024
when: When handling ambiguous lot matching scenarios
action: Use configured booking method from _BOOKING_METHODS dispatch table
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect booking method selection leads to wrong lot matching, causing wrong cost basis for subsequent transactions
stage_ids:
- booking
- id: finance-C-025
when: When updating inventory balances after booking reductions
action: Update local balance tracking to avoid matching same lot twice in one transaction
severity: high
kind: architecture_guardrail
modality: must
consequence: Double-matching the same lot causes inventory to go negative or incorrect cost basis calculations
stage_ids:
- booking
- id: finance-C-027
when: When matching reductions against inventory balance
action: Match a reduction posting against positive lots (same-sign inventory)
severity: high
kind: domain_rule
modality: must_not
consequence: Matching reductions to positive lots creates invalid double-negative or double-positive positions
stage_ids:
- booking
- id: finance-C-028
when: When resolving augmentation postings with incomplete cost
action: Convert CostSpec to Cost only after interpolation completes
severity: high
kind: architecture_guardrail
modality: must
consequence: Converting CostSpec before interpolation prevents filling missing cost_per/cost_total values from the transaction
residual
stage_ids:
- booking
- id: finance-C-029
when: When converting CostSpec to Cost for augmenting postings
action: Verify that all required cost fields (number_per or number_total, currency, date) are resolved
severity: high
kind: domain_rule
modality: must
consequence: Incomplete Cost after conversion breaks position valuation and causes downstream errors in reports
stage_ids:
- booking
- id: finance-C-030
when: When using STRICT booking method
action: Reject ambiguous matches unless all matching lots sum exactly to the reduction amount
severity: high
kind: architecture_guardrail
modality: must
consequence: Ambiguous matches without exact sum lead to arbitrary lot selection, causing wrong tax lot calculations
stage_ids:
- booking
- id: finance-C-031
when: When booking with NONE method
action: Treat postings as augmentations without attempting inventory matching
severity: medium
kind: resource_boundary
modality: must
consequence: NONE method intentionally skips matching; forcing matching creates mixed inventories that violate account
conventions
stage_ids:
- booking
- id: finance-C-032
when: When inventory has insufficient lots to satisfy a reduction request
action: Report ReductionError or AmbiguousMatchError for insufficient lots
severity: high
kind: domain_rule
modality: must
consequence: Undetected insufficient lots create phantom inventory positions that cause incorrect balance assertions
stage_ids:
- booking
- id: finance-C-033
when: When processing same-day transactions affecting same account
action: Use local balance tracking that includes prior same-day postings
severity: high
kind: architecture_guardrail
modality: must
consequence: Without cumulative local balance, same-day reductions cannot match augmentations, causing phantom insufficient
lot errors
stage_ids:
- booking
- id: finance-C-035
when: When implementing a beancount plugin module
action: define the __plugins__ tuple with valid function names from that module
severity: high
kind: architecture_guardrail
modality: must
consequence: Plugin without __plugins__ attribute is silently skipped by the loader, causing transformation logic to never
execute
stage_ids:
- transformation
- id: finance-C-037
when: When plugins modify the entry list during transformation
action: preserve chronological ordering by using data.entry_sortkey for sorting new entries
severity: high
kind: architecture_guardrail
modality: must
consequence: Unsorted entries after plugin transformation cause incorrect balance calculations and validation errors in
subsequent stages
stage_ids:
- transformation
- id: finance-C-038
when: When a plugin raises a non-SystemExit exception during execution
action: allow the exception to propagate and stop processing; it must be caught and converted to a LoadError
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Uncaught plugin exception terminates the entire loader, preventing other plugins from running and losing
partial work
stage_ids:
- transformation
- id: finance-C-039
when: When plugin_processing_mode is set to 'default' (default behavior)
action: automatically prepend documents plugin and append pad/balance plugins to the plugin chain
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing documents plugin causes document directives to not be processed; missing pad/balance causes balance
assertions to fail
stage_ids:
- transformation
- id: finance-C-040
when: When plugin_processing_mode is set to 'raw'
action: only execute user-specified plugins without automatically running pre/post plugins
severity: high
kind: resource_boundary
modality: must
consequence: Setting 'raw' mode without explicitly loading pad/balance causes balance assertions to never be checked,
silently producing incorrect results
stage_ids:
- transformation
- id: finance-C-041
when: When implementing a plugin that synthesizes new entries (e.g., auto_accounts)
action: sort newly synthesized entries using data.entry_sortkey before returning them
severity: high
kind: operational_lesson
modality: must
consequence: Unsorted new entries cause temporal ordering violations where newer entries appear before older ones, corrupting
account balance calculations
stage_ids:
- transformation
- id: finance-C-042
when: When a plugin raises SystemExit during transformation
action: allow SystemExit to propagate immediately without catching or converting to error
severity: medium
kind: architecture_guardrail
modality: must
consequence: SystemExit should not be caught to allow intentional termination of beancount processing (e.g., bail out
on critical errors)
stage_ids:
- transformation
- id: finance-C-043
when: When configuring plugin processing in beancount files
action: use invalid values for plugin_processing_mode option (only 'raw' or 'default' are valid)
severity: high
kind: domain_rule
modality: must_not
consequence: Invalid plugin_processing_mode value causes assertion failure and loader crashes without processing any plugins
stage_ids:
- transformation
- id: finance-C-044
when: When importing plugin modules during transformation
action: allow ImportError to terminate processing - errors must be caught and reported as LoadError
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Uncaught ImportError from missing plugin module crashes the entire loader without reporting which plugin
failed
stage_ids:
- transformation
- id: finance-C-045
when: When combining multiple plugins using loader.combine_plugins()
action: verify the combined module has a __plugins__ attribute containing all functions from the source modules
severity: high
kind: architecture_guardrail
modality: must
consequence: Combined plugin without proper __plugins__ attribute is silently skipped, losing transformations from all
constituent plugins
stage_ids:
- transformation
- id: finance-C-046
when: When processing entries in transformation stage
action: verify entries are sorted after each plugin completes to maintain chronological order
severity: high
kind: domain_rule
modality: must
consequence: Missing sort after plugin transformation breaks temporal ordering causing incorrect balance calculations
and validation errors
stage_ids:
- transformation
- id: finance-C-047
when: When loading beancount files with plugin configuration
action: insert user-specified pythonpath entries at the front of sys.path before loading plugins
severity: medium
kind: operational_lesson
modality: must
consequence: Plugin modules in user pythonpath fail to import if pythonpath is not inserted first, causing plugin import
failures
stage_ids:
- transformation
- id: finance-C-048
when: When setting the plugin_processing_mode option
action: use 'default' mode for normal processing where pad/balance are needed, only use 'raw' when full control is required
severity: medium
kind: claim_boundary
modality: must
consequence: Using 'raw' mode without explicitly loading pad/balance creates false confidence that balances are being
checked when they are not
stage_ids:
- transformation
- id: finance-C-052
when: When reducing an Inventory balance
action: use the reducer function that matches the desired balance type (get_units for units, get_cost for cost basis,
get_weight for the balancing weight)
severity: high
kind: domain_rule
modality: must
consequence: Using the wrong reducer function produces incorrect balance types, causing financial reports to show wrong
units vs cost basis vs market value
stage_ids:
- realization
- id: finance-C-053
when: When iterating postings with balance in realize function
action: process entries in chronological order to maintain correct running balance
severity: high
kind: domain_rule
modality: must
consequence: Out-of-order entry processing corrupts running balance calculations, causing incorrect account balances and
invalid financial reports
stage_ids:
- realization
- id: finance-C-054
when: When calling realize function
action: pass entries that have been previously filtered and date-sorted
severity: high
kind: architecture_guardrail
modality: must
consequence: Realizing unsorted entries leads to incorrect account tree structure and wrong balance ordering in reports,
breaking the chronological integrity of financial data
stage_ids:
- realization
- id: finance-C-055
when: When building the RealAccount tree
action: create parent accounts as dict containers before adding child account nodes
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing parent accounts break the hierarchical tree structure, causing KeyError exceptions and preventing
balance aggregation across the account hierarchy
stage_ids:
- realization
- id: finance-C-056
when: When using balance_reducer function
action: provide a compatible reducer function accepting Position and returning Amount
severity: high
kind: resource_boundary
modality: must
consequence: Incompatible reducer functions cause type errors during balance computation, preventing the realization stage
from completing and generating account reports
stage_ids:
- realization
- id: finance-C-057
when: When computing account balance from postings
action: initialize balance as empty Inventory and accumulate positions incrementally
severity: high
kind: architecture_guardrail
modality: must
consequence: Improper balance initialization causes position data loss, leading to incorrect final balances that do not
reflect all transactions in the account
stage_ids:
- realization
- id: finance-C-058
when: When calling iterate_with_balance
action: pass TxnPosting wrappers instead of raw Posting instances
severity: high
kind: domain_rule
modality: must
consequence: Passing raw Posting instances causes assertion errors at line 420, breaking the iteration and preventing
balance accumulation
stage_ids:
- realization
- id: finance-C-059
when: When computing the total balance of parent accounts
action: aggregate balances from all child accounts using Inventory addition
severity: high
kind: architecture_guardrail
modality: must
consequence: Not aggregating child balances causes parent accounts to show incorrect balances that exclude subaccount
positions, breaking hierarchical balance reporting
stage_ids:
- realization
- id: finance-C-060
when: When realizing entries without postings
action: handle empty entry lists gracefully by returning root RealAccount with empty balances
severity: medium
kind: operational_lesson
modality: must
consequence: Not handling empty entry lists causes null reference errors, preventing the system from generating reports
when no transactions exist
stage_ids:
- realization
- id: finance-C-066
when: When generating opening balance entries for accounts with empty balances
action: Create opening balance transactions for accounts with zero inventory
severity: high
kind: domain_rule
modality: must_not
consequence: Synthetic transactions will be created for accounts with no activity, cluttering the balance sheet and potentially
causing downstream calculation errors
stage_ids:
- summarization
- id: finance-C-067
when: When transfer_balances() removes balance assertions after a cutoff
action: Remove Balance assertions for transferred accounts that occur after the transfer date
severity: high
kind: architecture_guardrail
modality: must
consequence: Balance assertions will fail because the account balance has been transferred away, causing validation errors
or incorrect balance checks
stage_ids:
- summarization
- id: finance-C-068
when: When summarizing entries that contain positions with cost basis
action: Include cost information in the summarized postings to preserve the original cost basis
severity: high
kind: domain_rule
modality: must
consequence: Positions will lose their cost basis information, causing incorrect average cost calculations and misreported
asset values
stage_ids:
- summarization
- id: finance-C-069
when: When using GetAccounts class to gather accounts from directives
action: Verify that all directive classes have corresponding handler methods defined
severity: high
kind: resource_boundary
modality: must
consequence: AttributeError will be raised for unknown directive types, causing the summarization pipeline to fail on
valid entries
stage_ids:
- summarization
- id: finance-C-070
when: When creating summarized entries for period reporting
action: Sort the combined entries (open, price, summarizing) by data.entry_sortkey before returning
severity: high
kind: architecture_guardrail
modality: must
consequence: Entries will not be in chronological order, violating the requirement that entries be sorted by date and
potentially causing incorrect balance calculations
stage_ids:
- summarization
- id: finance-C-072
when: When entries list is empty before calling summarization functions
action: Return the empty entries list immediately without processing
severity: high
kind: operational_lesson
modality: must
consequence: IndexError or other exceptions may be raised when processing empty lists, causing the pipeline to fail silently
or with cryptic errors
stage_ids:
- summarization
- id: finance-C-073
when: When computing conversion balances for entries with positions at cost
action: Use conversion_cost_balance (reduced by convert.get_cost) for creating conversion entries
severity: high
kind: domain_rule
modality: must
consequence: Conversion entries will be created for positions at cost rather than units, causing incorrect currency conversion
calculations
stage_ids:
- summarization
- id: finance-C-074
when: When summarizing entries and preserving price directives
action: Preserve only the last relevant price entry for each commodity before the cutoff date
severity: medium
kind: operational_lesson
modality: must
consequence: Multiple stale price entries will be retained, causing the price fetcher to query unnecessary historical
prices and potentially use outdated conversion rates
stage_ids:
- summarization
- id: finance-C-075
when: When presenting summarized entries as the basis for a balance sheet
action: Claim that the summarized entries represent actual original transactions
severity: high
kind: claim_boundary
modality: must_not
consequence: Stakeholders will be misled about the provenance of transactions; summarized entries are synthetic replacements,
not original ledger entries
stage_ids:
- summarization
- id: finance-C-077
when: When creating conversion entries with conversion_currency parameter
action: Use the specified conversion_currency for all zero-priced conversion postings
severity: high
kind: resource_boundary
modality: must
consequence: Conversion entries will use the wrong target currency, causing incorrect balance calculations when the ledger
contains multiple currencies
stage_ids:
- summarization
- id: finance-C-078
when: When handling entries with compress_unbooked option for NONE booking
action: Merge postings together to obtain accurate cost basis for accounts with NONE booking method
severity: medium
kind: operational_lesson
modality: must
consequence: Individual lot positions will cause misleading profit/loss calculations because positions are not properly
matched against existing cost bases
stage_ids:
- summarization
- id: finance-C-082
when: When implementing commodity declaration validation
action: Enforce uniqueness of Commodity directives per currency
severity: high
kind: domain_rule
modality: must
consequence: Duplicate Commodity directives for the same currency create ambiguous price information, causing incorrect
cost basis calculations and price lookups
stage_ids:
- validation
- id: finance-C-086
when: When loading and validating a ledger file
action: Collect all validation errors rather than stopping at the first error
severity: high
kind: domain_rule
modality: must
consequence: Stopping at the first error prevents users from seeing all issues at once, requiring multiple fix-run cycles
to discover all problems in the ledger
stage_ids:
- validation
- id: finance-C-087
when: When implementing extra validation injection
action: Support the extra_validations parameter for custom validation functions
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without extensible validations, users cannot add custom business rules, limiting the system's ability to
enforce domain-specific accounting policies
stage_ids:
- validation
- id: finance-C-090
when: When implementing account lifecycle exceptions
action: Allow Balance, Document, and Note directives after account closure
severity: medium
kind: domain_rule
modality: must
consequence: Rejecting legitimate Balance/Document/Note entries after close prevents users from verifying account closure
correctness or attaching late-arriving documents
stage_ids:
- validation
- id: finance-C-091
when: When implementing document path validation
action: Require all Document entries to have absolute file paths
severity: medium
kind: domain_rule
modality: must
consequence: Relative paths in Document directives cause file lookup failures when the working directory changes, breaking
document association functionality
stage_ids:
- validation
- id: finance-C-092
when: When implementing tolerance-based balance checking
action: Use tolerance when comparing transaction residual to zero
severity: high
kind: domain_rule
modality: must
consequence: Comparing without tolerance causes false errors for legitimate rounding differences, rejecting valid transactions
due to sub-cent precision differences
stage_ids:
- validation
- id: finance-C-093
when: When implementing data type validation
action: Check entry attribute data types match expected schema
severity: medium
kind: domain_rule
modality: should
consequence: Invalid data types in entries cause runtime errors during reporting or calculation, leading to crashes or
silent data corruption
stage_ids:
- validation
- id: finance-C-094
when: When configuring validation levels
action: Separate BASIC_VALIDATIONS from slow HARDCORE_VALIDATIONS
severity: low
kind: operational_lesson
modality: should
consequence: Running all validations including slow ones during development creates unnecessary performance overhead,
slowing down iteration cycles
stage_ids:
- validation
- id: finance-C-096
when: When booking method is specified in Open directive
action: Use the per-account Booking enum value from Open directive instead of the global option_map booking_method
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect lot matching method applied to account, leading to wrong cost basis and potentially incorrect capital
gains calculations
- id: finance-C-098
when: When BalanceError list is passed from booking stage
action: Halt processing when a BalanceError is encountered; errors must accumulate for downstream reporting
severity: high
kind: architecture_guardrail
modality: must_not
consequence: User will not see validation errors indicating mismatched balances, leading to undetected accounting errors
in final reports
- id: finance-C-099
when: When directives flow through plugin pipeline PLUGINS_PRE → user plugins → PLUGINS_AUTO → PLUGINS_POST
action: Maintain entries sorted by entry_sortkey after each plugin execution
severity: high
kind: architecture_guardrail
modality: must
consequence: Plugins may process entries in wrong chronological order, breaking accounting logic that depends on sequential
position updates
- id: finance-C-101
when: When user plugin raises an exception during transformation
action: Crash the entire loader; errors must be caught and accumulated for user reporting
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Entire ledger processing halts on single plugin error, preventing user from seeing all other errors or generating
partial reports
- id: finance-C-103
when: When accumulated errors from all previous stages are passed to validation
action: Combine all errors from the parsing, booking, and transformation stages before reporting to the user
severity: high
kind: architecture_guardrail
modality: must
consequence: User sees incomplete error list, missing critical issues from earlier stages that affect data integrity
- id: finance-C-104
when: When options map is passed from parsing to validation
action: Provide tolerance_multiplier and inferred_tolerance_default values for balance validation checks
severity: high
kind: domain_rule
modality: must
consequence: Balance checks will use incorrect tolerances, failing valid entries or accepting invalid ones, corrupting
financial records
- id: finance-C-105
when: When options map is passed from parsing to validation
action: Provide account_types configuration for validating active account references
severity: medium
kind: domain_rule
modality: must
consequence: Validation will incorrectly flag transactions on valid accounts as errors or fail to detect transactions
on closed accounts
- id: finance-C-106
when: When validated directives are passed to realization for report generation
action: Pass entries that have passed all validation checks; any failed transactions must still be included with appropriate
error flags
severity: high
kind: architecture_guardrail
modality: must
consequence: Reports will show incorrect balances if invalid entries are silently dropped, or users will miss critical
validation errors if entries are hidden
- id: finance-C-107
when: When validated directives are passed to realization
action: Use the display_context from options_map for formatting monetary amounts in reports
severity: medium
kind: resource_boundary
modality: must
consequence: Reports will display incorrect number precision, showing wrong decimal places that could mislead users about
actual account balances
- id: finance-C-109
when: When writing or loading beancount source files with numeric amounts
action: Use the D() function to parse numbers instead of calling the Decimal() constructor directly; D() handles comma
thousands separators and None values
severity: high
kind: domain_rule
modality: must
consequence: Numbers with comma thousand separators fail to parse correctly, causing ValueError exceptions and preventing
ledger file loading
- id: finance-C-111
when: When processing financial data with incomplete cost or number specifications
action: Use the MISSING sentinel class (not None or empty string) to represent incomplete/interpolatable fields — MISSING
is designed to appear correctly in error messages
severity: high
kind: domain_rule
modality: must
consequence: Using None instead of MISSING for incomplete data causes AttributeError or confusing TypeError messages when
interpolation is attempted
- id: finance-C-112
when: When adding tags or links to Transaction or other directive instances
action: Use EMPTY_SET (frozenset()) instead of None for absent tags or links — never use None as a placeholder for empty
collections
severity: high
kind: architecture_guardrail
modality: must
consequence: Using None for empty tags/links causes AttributeError when code iterates over tags or links, breaking plugin
processing and report generation
- id: finance-C-115
when: When validating or creating currency symbols in beancount
action: Match currency names against CURRENCY_RE regex — valid currencies are uppercase alphanumeric with optional dots/underscores/hyphens,
or forward-slash currency pairs like USD/EUR
severity: high
kind: domain_rule
modality: must
consequence: Invalid currency symbols accepted without validation cause parsing errors in downstream processing and generate
confusing error messages
- id: finance-C-116
when: When working with dates in beancount directives
action: Use datetime.date objects only (no time component) — beancount is a date-based accounting system with no support
for time-of-day timestamps
severity: high
kind: domain_rule
modality: must
consequence: Attempting to use datetime.datetime with time components causes type errors since all directive date fields
expect datetime.date without time
- id: finance-C-117
when: When booking ambiguous lots in an Inventory (multiple matching lots for a posting)
action: Apply the booking method declared on the account's Open directive (STRICT, STRICT_WITH_SIZE, AVERAGE, FIFO, LIFO,
HIFO, or NONE) — never arbitrarily pick a lot
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect lot selection causes wrong cost basis for asset sales, distorting realized gains/losses and violating
accounting consistency requirements
- id: finance-C-118
when: When presenting or reporting beancount's capabilities to users
action: Claim that beancount supports real-time financial transactions — beancount is a batch-processing text-file-based
double-entry bookkeeping language, not a real-time trading or payment system
severity: high
kind: claim_boundary
modality: must_not
consequence: Users build integrations expecting live transaction recording and immediate balance updates, but beancount
requires file reloads and produces stale data between processing runs
- id: finance-C-119
when: When presenting or reporting beancount's capabilities to users
action: Claim that beancount supports multi-user ledger systems — beancount is a single-user file-based accounting tool
with no concurrency controls, user authentication, or access control
severity: high
kind: claim_boundary
modality: must_not
consequence: Multiple users editing the same beancount file simultaneously cause data corruption from concurrent writes
with no conflict resolution mechanism
- id: finance-C-120
when: When presenting or reporting beancount's capabilities to users
action: Claim that beancount supports complex derivatives pricing — beancount handles basic cost basis tracking and currency
conversion but lacks option pricing models, Greeks calculations, or margin modeling
severity: high
kind: claim_boundary
modality: must_not
consequence: Users expecting derivatives analytics receive only basic position tracking, leading to incorrect risk assessment
and compliance failures for derivative portfolios
- id: finance-C-121
when: When presenting or reporting beancount's capabilities to users
action: Claim that beancount provides a GUI-based accounting interface — beancount is a text-file DSL with CLI tools and
a minimal web interface, not a point-and-click accounting application
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users expecting traditional GUI accounting workflows are misled about the technical skills required; beancount
requires text file editing and command-line operation
- id: finance-C-122
when: When reporting balance assertion failures or validation errors from beancount
action: Claim that beancount's error messages provide automatic correction suggestions — beancount detects errors but
requires manual file editing to resolve them; it does not have auto-fix capabilities
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users expecting auto-correction wait indefinitely for fixes that require manual text file edits, delaying
reconciliation and causing frustration
- id: finance-C-123
when: When loading and processing beancount files across different environments or timezones
action: Use consistent date handling since beancount operates exclusively on date-only values (datetime.date) with no
timezone component — no timezone normalization is needed because there is no time component
severity: medium
kind: domain_rule
modality: must
consequence: Attempting to apply timezone transformations to beancount dates causes type errors since all dates are datetime.date
without timezone information
- id: finance-C-124
when: When loading beancount files and the file or any included files have been modified
action: Invalidate the pickle cache and recompute the loaded entries; cache validation must check the modification times
of all included files, not just the top-level file
severity: high
kind: operational_lesson
modality: must
consequence: Stale cache causes beancount to return outdated entries and balances that don't reflect recent file changes,
leading to incorrect financial reports
- id: finance-C-125
when: When processing directive types that can appear after account closure
action: Allow Balance, Document, and Note directives to appear after their account's Close directive — these are exempt
from the general chronological ordering rule
severity: medium
kind: architecture_guardrail
modality: must
consequence: Rejecting valid post-close Balance/Document/Note directives causes validation errors on legitimate accounting
entries received after account closure
- id: finance-C-126
when: When implementing or refactoring cost tracking data structures
action: Maintain CostSpec as a separate type from Cost — CostSpec represents incomplete cost state (missing date, label,
or number) while Cost represents fully-specified resolved cost; do not merge these into a single type with nullable
fields
severity: high
kind: domain_rule
modality: must
consequence: Merging CostSpec and Cost into a single type with nullable fields creates ambiguity between intentional missing
values and programming errors, causing cost basis calculations to use incomplete specifications and produce incorrect
tax lot information
derived_from_bd_id: BD-006
- id: finance-C-127
when: When implementing number handling in cost or position calculations
action: Use MISSING sentinel for unfilled optional values instead of None — MISSING propagates through computations and
surfaces in error messages with meaningful context, making absence explicit rather than ambiguous
severity: high
kind: domain_rule
modality: must
consequence: Using None for missing values creates ambiguity between intentional absence and null-checking errors, causing
bugs to produce silent incorrect results instead of immediate failures with traceable context
derived_from_bd_id: BD-009
- id: finance-C-128
when: When implementing lot identification and merging logic for tax reporting
action: Identify lots using the complete tuple (account, currency, cost_number, cost_currency, cost_date, cost_label)
— any difference in these fields keeps positions as separate lots; only identical tuples are treated as the same lot
for merging
severity: high
kind: domain_rule
modality: must
consequence: Incomplete lot identification (e.g., omitting cost_date or cost_label) causes separate lots to merge incorrectly,
producing wrong cost basis calculations that lead to incorrect capital gains/losses for tax reporting
derived_from_bd_id: BD-024
- id: finance-C-129
when: When tracking positions with cost specifications for lot accounting
action: Include cost_label in lot identification when positions share identical cost_number, cost_currency, and cost_date
— cost_label (even when empty) is part of the lot identity tuple and must not be omitted for matching purposes
severity: high
kind: domain_rule
modality: must
consequence: Omitting cost_label from lot matching causes lots with identical cost specifications to merge incorrectly,
breaking tax lot accounting for securities that require separate lot tracking based on label differentiation
derived_from_bd_id: BD-042
- id: finance-C-130
when: When processing dates in ledger entries
action: Assume datetimes are timezone-aware without explicit annotation — the framework does not implement UTC normalization
for naive datetimes; without explicit timezone handling, comparisons and calculations across different system timezones
produce incorrect results
severity: high
kind: claim_boundary
modality: must_not
consequence: Without timezone annotation and UTC normalization, date comparisons and calculations produce inconsistent
results depending on system timezone settings, causing ledger entries to be processed incorrectly and financial calculations
to be wrong
derived_from_bd_id: BD-GAP-018
- id: finance-C-131
when: When parsing or creating ledger entries with date/time fields
action: Add explicit timezone annotation to all datetime fields and normalize to UTC during parsing; implement a UTC normalization
step before any date comparison or calculation operations to ensure consistent timezone handling
severity: high
kind: domain_rule
modality: must
consequence: Without UTC normalization, ledger entries processed in different timezones produce inconsistent calculations,
and cross-timezone reporting generates incorrect financial summaries that do not match when viewed from different locations
derived_from_bd_id: BD-GAP-018
- id: finance-C-132
when: When implementing plugin processing in beancount's plugin_pipeline stage
action: Execute the check_closing plugin before all other plugins and the close_tree plugin after all other plugins; maintain
this fixed plugin ordering so that validation occurs before cleanup operations
severity: high
kind: domain_rule
modality: must
consequence: Changing plugin execution order causes validation to run after potential entry modifications, resulting in
invariant checks that pass but don't reflect the final state — backtest results may appear valid but contain hidden
inconsistencies
derived_from_bd_id: BD-032
- id: finance-C-133
when: When implementing the sellgains plugin's P&L reclassification logic
action: Move only losses (where cost basis exceeds proceeds) to income — must NOT move gains (where proceeds exceed cost)
to income; gains remain in their original expense classification
severity: high
kind: domain_rule
modality: must_not
consequence: Reclassifying gains to income instead of keeping them as expenses creates accounting classification errors
that distort P&L reports and cause incorrect tax calculations
derived_from_bd_id: BD-033
- id: finance-C-134
when: When implementing summarization period date boundaries in report_generation
action: Use end_date minus 1 day (end_date - 1) as the summarization period boundary; ensure summarized entries are dated up
to (end_date - 1 day) so source entries on end_date appear chronologically after the summary
severity: high
kind: domain_rule
modality: must
consequence: Using end_date without the -1 offset causes period summaries to overlap with subsequent entries, creating
chronological confusion in reports where the same date appears in both summary and detail
derived_from_bd_id: BD-034
- id: finance-C-135
when: When implementing account transfer entry date assignments in report_generation
action: Date transfer entries exactly 1 day before the transfer date (transfer_date - 1) so that transferred balances
appear before the transfer cutoff point when filtering entries up to but not including the transfer date
severity: high
kind: domain_rule
modality: must
consequence: Using the actual transfer date instead of date-1 causes transfer entries to be excluded from balance calculations
filtered by transfer date, resulting in incorrect account balances at cutoff points
derived_from_bd_id: BD-035
- id: finance-C-136
when: When implementing price lookup or currency conversion logic in price_resolution
action: Use (base, quote) currency tuple as the lookup key for price retrieval — maintain explicit ordering as price lookups
are directional and (USD, CAD) must be treated as distinct from (CAD, USD)
severity: high
kind: domain_rule
modality: must
consequence: Using a single currency pair string without direction causes incorrect rate selection, converting currencies
at inverse rates and producing wrong valuations in portfolio reports
derived_from_bd_id: BD-037
- id: finance-C-137
when: When implementing currency conversion for currency pairs without direct exchange rates
action: Attempt multi-hop conversion through intermediate currencies when direct rate is unavailable — try each currency
as a potential intermediate step, returning the first valid conversion path found through the price graph
severity: high
kind: domain_rule
modality: must
consequence: Skipping multi-hop conversion when direct rates are missing causes valid conversions to fail silently, resulting
in incomplete portfolio valuations and missing transaction conversions
derived_from_bd_id: BD-038
- id: finance-C-138
when: When implementing booking/lot matching logic and users request strict cost specification
action: 'Be aware that strict booking mode has a fallback: STRICT_WITH_SIZE allows exact-size matches to bypass explicit
cost specification — document this behavior to users and do not promise complete strict enforcement'
severity: high
kind: operational_lesson
modality: must
consequence: Users relying on strict mode for regulatory compliance expect complete enforcement but get unexpected matches
via the STRICT_WITH_SIZE fallback, leading to incorrect lot selections and audit failures
derived_from_bd_id: BD-084
- id: finance-C-139
when: When implementing or modifying the plugin pipeline execution order
action: 'Preserve the fixed plugin ordering: check_closing must run first (after Documents prepend and before other plugins),
and close_tree must run last (after pad/balance append and invariant checking) — changing this order breaks invariant
detection at predictable stages'
severity: high
kind: domain_rule
modality: must
consequence: Altering plugin order causes invariant violations to be detected at unexpected stages or missed entirely,
resulting in data inconsistencies that pass validation but cause downstream calculation errors
derived_from_bd_id: BD-089
- id: finance-C-140
when: When implementing proceeds calculation or validation in lot matching and P&L computation
action: 'Implement validation checks at each cascade stage: verify negation produces positive proceeds (BD-068), verify
weighted accumulation matches expected totals (BD-073), validate currency matching per-currency (BD-070), and fail on
any unmatched currencies (BD-075) — subtle errors in earlier stages can pass through later validation'
severity: medium
kind: operational_lesson
modality: should
consequence: 'Errors in proceeds negation accumulate through the validation cascade: wrong proceeds pass currency-by-currency
validation due to tolerance settings, and only complete currency mismatches trigger errors, causing understated or overstated
P&L'
derived_from_bd_id: BD-090
- id: finance-C-141
when: When implementing account hierarchy or inventory tracking for hierarchical reporting
action: Maintain the complete 4-level account tree structure (Root → Assets/Income/Expenses/Liabilities → sub-accounts
→ components) with a nested dict for O(1) lookup, and Inventory objects per account for per-currency cost-basis segregation;
the aggregation chain from leaf to root depends on every level being present
severity: high
kind: domain_rule
modality: must
consequence: Missing any account level breaks hierarchical rollup, causing incomplete aggregations where sub-account balances
don't sum to parent totals and portfolio reports show incorrect totals
derived_from_bd_id: BD-091
- id: finance-C-142
when: When implementing or refactoring position averaging logic in beancount/core/convert.py
action: Use cost as weight for lots with cost specification, and units as weight for uncosted postings, with explicit
price overrides taking precedence when provided
severity: high
kind: domain_rule
modality: must
consequence: Changing the weight calculation logic causes average price computations to use incorrect weighting, leading
to misstated position values that diverge from actual cost basis and produce wrong P&L reports
derived_from_bd_id: BD-039
- id: finance-C-144
when: When implementing or refactoring account transfer logic in beancount/ops/summarize.py
action: Filter out any Balance directive on the destination (receiving) account after a transfer operation—the transfer
entry itself establishes the correct balance
severity: high
kind: domain_rule
modality: must
consequence: Retaining balance assertions on the destination account after transfer causes false validation errors when
the balance assertion value conflicts with the balance implicitly established by the transfer entry
derived_from_bd_id: BD-041
- id: finance-C-145
when: When implementing or refactoring inventory aggregation logic in beancount/core/inventory.py
action: Use (amount, currency) as the aggregation key during reduce operations to combine positions with identical amount
and currency regardless of individual cost basis
severity: high
kind: domain_rule
modality: must
consequence: Changing the aggregation key to cost basis or other fields causes positions with identical amount and currency
to remain uncombined, producing incorrect net position calculations and distorted portfolio reports
derived_from_bd_id: BD-047
- id: finance-C-146
when: When implementing or refactoring closing entry validation in beancount/plugins/check_closing.py
action: Verify that each Close directive's stated balance matches the computed balance from preceding entries—any discrepancy
must trigger an error
severity: high
kind: architecture_guardrail
modality: must
consequence: Removing the closing balance verification allows discrepancies between declared and actual closing balances
to go undetected, corrupting ledger integrity and producing financial reports that do not reflect actual account state
derived_from_bd_id: BD-048
- id: finance-C-147
when: When implementing or refactoring empty account tree cleanup in beancount/plugins/close_tree.py
action: Remove only accounts that have zero balance AND no subaccounts—preserve any account with meaningful balance or
child accounts
severity: medium
kind: architecture_guardrail
modality: must
consequence: Modifying the cleanup criteria to preserve empty accounts or remove accounts with children clutters the account
tree with meaningless branches, degrading report readability and query performance
derived_from_bd_id: BD-049
- id: finance-C-148
when: When implementing position adjustment or lot creation logic in the trading stage
action: Negate position units to create a selling lot from a buying position — the negated position creates a new lot
with negative units while the original lot remains unchanged until positions are netted by the booking method
severity: high
kind: domain_rule
modality: must
consequence: Without negation, sell transactions cannot properly reduce buying positions, causing duplicate lot creation
and incorrect cost-basis tracking that accumulates over multiple trades
derived_from_bd_id: BD-062
- id: finance-C-149
when: When implementing balance check logic in the check_closing stage
action: Place the balance check at a date + 1 day offset from the closing entry date so that all closing postings have
been applied before verification
severity: high
kind: domain_rule
modality: must
consequence: Same-day balance checks miss pending transactions in the closing batch, creating false balance discrepancies
that trigger incorrect error reports and mask actual reconciliation failures
derived_from_bd_id: BD-063
- id: finance-C-150
when: When implementing or validating balance check date offset in the check_closing stage
action: Verify that the balance verification date equals exactly original entry date + 1 calendar day — the boundary condition
ensures one full day separates closing entries from their verification
severity: high
kind: domain_rule
modality: must
consequence: Incorrect date offsets cause balance checks to run against incomplete closing batches or redundant post-verification
entries, producing unreliable reconciliation results that hide actual accounting errors
derived_from_bd_id: BD-066
- id: finance-C-151
when: When implementing proceeds accumulation logic in the sellgains stage
action: Track proceeds only for account types designated in proceed_types; exclude income accounts and non-proceeds categories
from inventory validation so that only cash-equivalent accounts participate in reconciliation
severity: high
kind: domain_rule
modality: must
consequence: Including income accounts in proceeds tracking creates artificial reconciliation mismatches and produces
incorrect gain/loss calculations that misstate taxable income and lead to IRS audit findings
derived_from_bd_id: BD-069
- id: finance-C-152
when: When implementing proceeds inventory accumulation in the sellgains stage
action: Accumulate proceeds using weighted conversion that accounts for position size — larger positions must contribute
proportionally more to enable accurate FIFO/LIFO matching for cost-basis calculations
severity: high
kind: domain_rule
modality: must
consequence: Simple sum accumulation ignores position weighting, causing incorrect cost-basis allocation in partial sales
that systematically misstates gains/losses and triggers incorrect tax reporting
derived_from_bd_id: BD-073
- id: finance-C-153
when: When implementing currency validation in the sellgains stage
action: Fail validation if any proceeds currencies cannot be matched to cost currencies — verify complete currency matching
for accurate multi-currency gain/loss reporting
severity: high
kind: domain_rule
modality: must
consequence: Partial currency matching leaves orphaned proceeds that create accounting inconsistencies, causing multi-currency
gain/loss reports to show phantom balances and incorrect per-currency totals
derived_from_bd_id: BD-075
- id: finance-C-154
when: When configuring proceed_types in the sellgains stage for equity compensation tracking
action: Include equity accounts in proceed_types to capture stock vesting, RSU vesting, and stock option exercises — these
equity compensation events must be properly tracked in proceeds calculations
severity: high
kind: domain_rule
modality: must
consequence: Excluding equity accounts misses RSU vesting and option exercise events, causing incomplete proceeds tracking
that misstates cost-basis and generates incorrect 1099-B reports for equity compensation
derived_from_bd_id: BD-076
- id: finance-C-155
when: When implementing error reporting for transactions with multiple validation failures in the sellgains stage
action: Report all applicable error types when a transaction has both a proceeds imbalance and a missing expense category;
emitting every error at once enables complete correction rather than sequential fixes
severity: high
kind: domain_rule
modality: must
consequence: Single error reporting hides secondary validation failures, requiring multiple fix-and-rerun cycles that
delay reconciliation and risk fixing symptoms rather than root causes
derived_from_bd_id: BD-079
- id: finance-C-156
when: When validating proceeds currencies for international brokerage accounts in the sellgains stage
action: Accept non-USD currencies (CAD, etc.) as valid proceeds when properly structured — validate foreign currency proceeds
against matching cost currencies to support international accounts
severity: high
kind: domain_rule
modality: must
consequence: Rejecting non-USD currencies causes valid international brokerage transactions to fail validation, blocking
reconciliation for CAD and other foreign currency accounts entirely
derived_from_bd_id: BD-080
- id: finance-C-157
when: When processing zero-priced transactions for corporate actions in the sellgains stage
action: Accept zero-priced transactions when properly structured — some corporate actions like stock splits or distributions
result in zero-value sales that must be captured for correct cost-basis calculations
severity: high
kind: domain_rule
modality: must
consequence: Rejecting zero-priced transactions causes legitimate corporate actions to fail, resulting in incomplete cost-basis
records that cannot properly reflect stock splits and create phantom cost-basis gaps
derived_from_bd_id: BD-081
- id: finance-C-158
when: When implementing period-bound reporting logic that rolls income/expenses to equity
action: Use standard equity accounts (Opening-Balances, Current-Earnings) from BD-029 and enforce correct date ordering
via BD-040's clamp operation when executing the equity rollforward
severity: high
kind: domain_rule
modality: must
consequence: Violating either the standard account requirement or date ordering causes period-bound reporting to fail
silently, producing incorrect equity balances that don't match source transactions
derived_from_bd_id: BD-087
- id: finance-C-159
when: When implementing post-modification invariant checking in the directive pipeline
action: Maintain immutability of NamedTuple directives as defined in BD-002 and verify entries passed to BD-061's invariant
checking remain unmodified during pipeline execution
severity: high
kind: domain_rule
modality: must
consequence: If directives become mutable, modifications could corrupt the entries being verified by invariant checking,
making the verification unreliable and allowing invalid data to propagate through the pipeline
derived_from_bd_id: BD-092
- id: finance-C-160
when: When configuring multiple data providers in production backtesting
action: Assume the framework implements provider priority selection or credential isolation between providers — these
capabilities are not present in the codebase; the framework uses no priority ordering and does not enforce credential
separation
severity: high
kind: claim_boundary
modality: must_not
consequence: Without provider priority handling, the system may select an incorrect or higher-cost data provider, causing
inconsistent data quality and increased operational costs in live trading
derived_from_bd_id: BD-GAP-019
- id: finance-C-161
when: When configuring multiple data providers in production backtesting
action: Implement provider priority configuration with a priority_rank field per provider and use credential isolation
by storing each provider's credentials in separate secure storage entries (e.g., provider_<name>_api_key, provider_<name>_secret)
instead of shared configuration
severity: high
kind: domain_rule
modality: must
consequence: Without explicit priority ranking, the system defaults to arbitrary provider selection, leading to inconsistent
data feeds and potential credential cross-contamination that can cause authentication failures or data leakage
derived_from_bd_id: BD-GAP-019
- id: finance-C-162
when: When implementing or modifying directive sorting and ordering logic in Beancount
action: Maintain date as the primary sort key and lineno (line number) as the secondary sort key for directives occurring
on the same date — preserve this two-level ordering guarantee
severity: high
kind: domain_rule
modality: must
consequence: Removing lineno as secondary sort key breaks deterministic ordering for same-day directives, causing balance
calculation inconsistencies when ledger files are reorganized or include ordering changes
derived_from_bd_id: BD-003
- id: finance-C-163
when: When implementing option parsing and aggregation logic for multi-file ledgers with includes
action: Verify that options are parsed per-file with proper aggregation, ensuring top-level file options dominate after
aggregation for predictable override behavior
severity: medium
kind: operational_lesson
modality: should
consequence: Incorrect option aggregation causes wrong operating_currency and display_context settings in included files,
leading to calculation errors when shared settings in included files conflict with main file settings
derived_from_bd_id: BD-004
- id: finance-C-164
when: When implementing or modifying the Beancount loader validation system
action: Preserve the ability to inject custom validation functions at load time — extra validation functions must execute
after standard validation completes and before data processing
severity: high
kind: architecture_guardrail
modality: must
consequence: Removing custom validation injection breaks domain-specific compliance checks such as regulatory rules and
personal accounting policies, allowing invalid entries to pass through the pipeline undetected
derived_from_bd_id: BD-022
- id: finance-C-165
when: When implementing position cost basis calculations for lot matching
action: Explicitly specify the booking method (FIFO, LIFO, MOST_RECENT, STRICT) when creating lots rather than relying
on framework defaults, as the booking method directly determines which lots are consumed first and affects realized
gains calculations
severity: medium
kind: operational_lesson
modality: should
consequence: Without explicit booking method specification, the framework uses its default method which may differ across
versions or configurations, producing inconsistent cost basis calculations that lead to materially different realized
gains or losses reports
derived_from_bd_id: BD-007
- id: finance-C-166
when: When processing positions with cost specifications
action: Explicitly specify cost currency in cost specifications when it differs from the position's units currency; when
cost currency is omitted, verify that the framework correctly infers that the cost currency matches the units currency
severity: high
kind: domain_rule
modality: must
consequence: Silent cost currency inference causes multi-currency portfolios to incorrectly value positions when cost
and units currencies differ, leading to balance sheet errors and potentially incorrect P&L calculations in foreign-denominated
holdings
derived_from_bd_id: BD-036
- id: finance-C-167
when: When implementing period boundary processing or generating balance sheets for arbitrary periods
action: Verify that income and expense accounts are properly closed and rolled into retained earnings at period boundaries
— verify P&L closure follows standard accounting practice for accurate period-relative balance sheets
severity: medium
kind: operational_lesson
modality: should
consequence: Without proper income/expense rollforward to retained earnings, period-relative balance sheets show incorrect
retained earnings balances, leading to wrong equity calculations and potentially misguided strategy decisions based
on faulty capital estimates
derived_from_bd_id: BD-017
- id: finance-C-168
when: When validating transaction balance assertions or implementing posting validation logic
action: Enforce balanced postings invariant (sum to zero) strictly at transaction creation time; apply tolerance-based
balance checking only during balance assertion interpolation, not during transaction posting validation — maintain clear
scope separation between BD-023 and BD-026
severity: high
kind: domain_rule
modality: must
consequence: Mixing invariant-level and tolerance-level validation scopes causes inconsistent balance checking behavior
— transactions may be accepted that fail strict invariant checks, or rejected that pass tolerance checks, leading to
silent data integrity issues
derived_from_bd_id: BD-083
- id: finance-C-169
when: When processing financial calculations involving external data ingestion or price lookups
action: Verify that all financial arithmetic operations use the Decimal type; convert values from external data feeds,
price lookups, and data ingestion pipelines to Decimal before calculations, and do not rely on floating-point
arithmetic for any monetary values
severity: high
kind: domain_rule
modality: must
consequence: Floating-point calculations for financial values introduce rounding errors that accumulate over multiple
transactions; strategies relying on precise cost-basis calculations may execute incorrectly due to sub-cent errors,
causing systematic profit/loss misreporting
derived_from_bd_id: BD-085
- id: finance-C-170
when: When caching Google Drive API responses using file-based caching
action: Assume the file-based cache handles data staleness automatically — the cache has no automatic invalidation mechanism
and persists indefinitely until manually deleted
severity: high
kind: claim_boundary
modality: must_not
consequence: Without automatic cache invalidation, backtests use stale data when source Google Drive files are updated;
this causes backtest-live inconsistency where strategy decisions are based on outdated financial data
derived_from_bd_id: BD-GAP-006
- id: finance-C-171
when: When caching Google Drive API responses in backtesting workflows
action: Implement explicit cache invalidation by checking file modification timestamps or API ETag headers; delete stale
cache entries when remote data is newer than cached version, or use TTL-based expiration
severity: high
kind: domain_rule
modality: must
consequence: Backtests using stale cached API responses will execute strategies based on outdated financial data, causing
incorrect allocation decisions and misleading performance metrics
derived_from_bd_id: BD-GAP-006
- id: finance-C-172
when: When implementing or modifying plugin mode configurations in the framework
action: Verify that plugin ordering invariants (BD-061) are maintained across all execution modes, especially when using
raw mode that bypasses built-in plugins (BD-010)
severity: high
kind: operational_lesson
modality: must
consequence: In raw mode, fixed plugin ordering (BD-011, BD-012) is bypassed, causing invariant checks (BD-061) to run
on incompletely-transformed entries and miss silent data corruption that standard mode would catch
derived_from_bd_id: BD-095
- id: finance-C-173
when: When implementing or modifying proceeds calculation logic, especially for short positions
action: Verify that unit negation logic for proceeds calculation (BD-068) produces correct signed values, and validate
weighted accumulation results (BD-073) against independent calculations before relying on per-currency tolerance checks
(BD-070, BD-071)
severity: high
kind: operational_lesson
modality: must
consequence: A negation error in proceeds calculation (BD-068) propagates through weighted accumulation (BD-073), and
the resulting incorrect totals may pass validation due to tolerances (BD-026, BD-071), allowing incorrect gain/loss
to be recorded
derived_from_bd_id: BD-096
- id: finance-C-174
when: When configuring Google Drive API authentication for document downloads in enterprise environments
action: Verify that service account has domain-wide delegation authority configured in GSuite Admin Console before using
service account authentication; if delegation is not available, implement OAuth2 user flow with proper token refresh
handling
severity: high
kind: operational_lesson
modality: must
consequence: Service account authentication silently fails in GSuite environments without domain-wide delegation authority,
causing document download failures that may crash the system or produce incomplete data
derived_from_bd_id: BD-GAP-001
- id: finance-C-175
when: When implementing account structure in beancount
action: 'Use hierarchical colon-separated account naming with one of the 5 standard type prefixes: Assets, Liabilities,
Equity, Income, or Expenses. Accounts must begin with these prefixes to be valid for account type classification and
proper balance sheet organization.'
severity: high
kind: domain_rule
modality: must
consequence: Breaking hierarchical account naming would disrupt account type classification, causing balance sheet and
income statement reports to fail or produce incorrect totals
derived_from_bd_id: BD-028
- id: finance-C-176
when: When implementing inventory equality comparison logic
action: Use set-based equality for inventory comparison — two inventories are equal only when they contain identical lot
sets (sorted position comparison by account and currency), regardless of acquisition order
severity: high
kind: domain_rule
modality: must
consequence: Using list-based or order-dependent comparison would cause inventories with identical lots in different orders
to be treated as unequal, breaking cost basis tracking and lot-level accounting accuracy
derived_from_bd_id: BD-031
- id: finance-C-177
when: When testing or validating multi-leg sale transactions
action: Verify that balanced multi-leg sales (where cash proceeds equal cost basis within tolerance) produce zero validation
errors and are accepted by the sellgains plugin
severity: medium
kind: operational_lesson
modality: should
consequence: Modifying validation to reject balanced transactions would incorrectly flag valid economic outcomes, causing
false accounting errors in correctly structured trades
derived_from_bd_id: BD-077
- id: finance-C-178
when: When processing sell transactions in the sellgains plugin
action: Reject transactions where cash proceeds do not match cost basis within tolerance — must trigger SellGainsError
for unbalanced cash vs cost
severity: high
kind: domain_rule
modality: must
consequence: Allowing mismatched cash flows to pass validation would create accounting errors that propagate through the
system, producing incorrect realized gains/losses calculations
derived_from_bd_id: BD-078
- id: finance-C-179
when: When implementing directive ordering for same-day transactions
action: Preserve parser source location tracking (PLY line numbers) as the secondary sort key for same-day directive ordering
— the sellgains plugin relies on deterministic lineno-based ordering when timestamps are equal
severity: high
kind: architecture_guardrail
modality: must
consequence: Removing or modifying parser line number tracking would make same-day directive ordering non-deterministic,
causing inconsistent validation results and potential silent ordering-dependent bugs in accounting
derived_from_bd_id: BD-082
- id: finance-C-180
when: When implementing Open directive logic for account initialization
action: Preserve the currencies field to limit which currencies can post to the account, and preserve the booking field
to specify lot merging strategy — both constraints are enforced from the Open date onward
severity: high
kind: domain_rule
modality: must
consequence: Modifying or removing the currencies or booking fields during Open directive processing changes account scope
and lot merging behavior, causing unexpected transaction rejections or incorrect position tracking
derived_from_bd_id: BD-050
- id: finance-C-181
when: When implementing Close directive logic for account lifecycle management
action: Verify that any transaction dated on or after an account's Close date triggers an error, preventing posthumous
entries to closed accounts
severity: high
kind: domain_rule
modality: must
consequence: Allowing transactions on or after the Close date creates posthumous entries that violate account lifecycle
semantics, causing data integrity issues and incorrect historical reporting
derived_from_bd_id: BD-051
- id: finance-C-182
when: When implementing Document directive processing or balance calculation logic
action: Use Document directives for any balance calculations or position tracking — they are purely informational for
audit trail and source file references only
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Incorporating Document directives into balance calculations creates phantom balances since Document merely
associates filenames with entry dates without affecting account balances
derived_from_bd_id: BD-052
- id: finance-C-183
when: When implementing date range filtering logic for export or report generation
action: Use half-open interval [begin, end) semantics — include entries on the begin date and exclude entries on the end
date, matching standard Python slice semantics
severity: high
kind: domain_rule
modality: must
consequence: Using closed intervals [begin, end] causes double-counting when exporting sequential periods, as entries
on period boundaries are counted in multiple exports simultaneously
derived_from_bd_id: BD-054
- id: finance-C-184
when: When implementing treeify logic for hierarchical report generation
action: Preserve the conversion of flat (account, balance) tuples to nested dict structure where each level corresponds
to account segments separated by colons — the hierarchical structure enables nested reporting aggregation
severity: high
kind: domain_rule
modality: must
consequence: Returning flat structure instead of nested dict breaks hierarchical aggregation and nested reporting, causing
incorrect or missing account segment rollups in financial reports
derived_from_bd_id: BD-055
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-129 / Beancount Test Utilities Framework
version: v5.3
intent_keywords:
- testing utilities
- tempdir
- test files
- mock repository
- integration testing
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
groups:
- group_id: all
name: All Capabilities
description: ''
emoji: 📦
uc_count: 2
ucs:
- uc_id: UC-101
name: Beancount Test Utilities Framework
short_description: Provides reusable testing utilities for beancount test scripts including temporary directory
management and test file creation for integration testing
sample_triggers:
- testing utilities
- tempdir
- test files
- uc_id: UC-102
name: Test Utils Validation Suite
short_description: Unit tests that validate the correctness of test utility functions including temporary directory
cleanup and test file generation for beancount test s
sample_triggers:
- unit test
- validation
- test utilities
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try beancount test utilities framework
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try test utils validation suite
auto_selected: true
- uc_id: UC-100
beginner_prompt: Try capability UC-100
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 2 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- Test Utils Validation Suite
- Beancount Test Utilities Framework
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
- Institutional fund holdings tracker via joinquant_fund_runner pattern
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
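Runs a backtest of the classic dual moving average crossover strategy, simulating signal generation and positions in an event-driven loop, and outputs a PyFolio performance report.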
---
name: backtrader-event-driven
description: |-
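Runs a backtest of the classic dual moving average crossover strategy, simulating signal generation and positions in an event-driven loop, and outputs a PyFolio performance report.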
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-086"
compiled_at: "2026-04-22T13:00:35.750880+00:00"
capability_markets: "multi-market"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
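# Backtrader Event-Driven Backtesting (backtrader-event-driven)
> Runs a backtest of the classic dual moving average crossover strategy, simulating signal generation and positions in an event-driven loop, and outputs a PyFolio performance report.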
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (2 total)
### SMA Crossover Backtester with PyFolio Analytics (`UC-101`)
Implements a classic dual moving average crossover trading strategy using backtrader, generating LONG/LONGSHORT signals when fast and slow SMAs cross,
**Triggers**: backtrader, SMA crossover, moving average
### OHLC Data Printer Utility (`UC-102`)
Provides a minimal backtrader strategy that logs and prints OHLC (Open, High, Low, Close) data in CSV format for debugging and verifying data feed int
**Triggers**: backtrader, data printing, OHLC logging
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
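Two of the locks above (`SL-03` and `SL-05`) can be checked mechanically before any order is built. The following is a minimal sketch, not the framework's own validator: the helper names and the regular expression are assumptions for illustration, and the pattern assumes lowercase entity type and exchange segments as in `stock_sh_600000`.

```python
import re

# Hypothetical helpers; only the rules themselves come from SL-03 and SL-05.
ENTITY_ID_RE = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9]+$")
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def is_valid_entity_id(entity_id: str) -> bool:
    """SL-03: entity_type_exchange_code, e.g. stock_sh_600000."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def has_exactly_one_sizing_field(signal: dict) -> bool:
    """SL-05: exactly one of the three sizing fields may be set (not None)."""
    return sum(signal.get(name) is not None for name in SIZING_FIELDS) == 1
```

A signal that sets both `position_pct` and `order_money`, or none of the three fields, would be rejected by the second check.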
## Top Anti-Patterns (25 total)
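- **`AP-ZVT-183`**: An inf/NaN ex-rights factor goes straight into multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: After a third-party data API hits its quota, the exception is swallowed and data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) vs. QFQ (forward-adjusted), causes factor drift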
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-086. Evidence verify ratio = 28.8% and audit fail total = 15. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or a decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-086` blueprint at 2026-04-22T13:00:35.750880+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - OHLC Data Printer Utility
  - SMA Crossover Backtester with PyFolio Analytics
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
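### `AP-QLIB-1930`: Backtest results do not depend on the model because a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>
When several Qlib models reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without reinitializing the dataset means every model actually uses the same prediction signals. The symptom: whether you use LightGBM, XGBoost, or a DNN, the backtest equity curves are identical. This is the most dangerous kind of anti-pattern, where the experiment appears to run but every conclusion is invalid.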
Source: https://github.com/microsoft/qlib/issues/1930
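### `AP-QLIB-2090`: Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so that the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state clearly that fit_end_time must be <= train_end.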
Source: https://github.com/microsoft/qlib/issues/2090
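The coupling described above can be asserted before a dataset is built. This is a minimal sketch under stated assumptions: `check_fit_window` is a hypothetical helper, not part of Qlib, and it assumes both values are ISO date strings with the train segment given as a `(start, end)` pair.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple[str, str]) -> None:
    """Raise if the normalizer's fit window runs past the end of the train segment."""
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train_end {train_end}: "
            "normalization statistics would include valid/test data"
        )
```

Calling it with a `fit_end_time` equal to the train end passes silently; a later date raises before any model is trained.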
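### `AP-QLIB-2036`: MACD factor formula error in the docs, where DEA is divided by CLOSE one extra time and the units become inconsistent <sub>(high)</sub>
The Alpha formula example in the official Qlib documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs markedly from the correct formula after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.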
Source: https://github.com/microsoft/qlib/issues/2036
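### `AP-QLIB-2184`: Custom A-share data imported without filling suspended days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that on suspended days the open/close/high/low/volume/factor fields are all NaN, so the framework can recognize and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspended days (common in data exported directly from Eastmoney/Wind), price momentum factors produce "false signals" during the suspension (price unchanged but factor nonzero). Qlib does not validate this convention, and the error flows silently into training data.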
Source: https://github.com/microsoft/qlib/issues/2184
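### `AP-QLIB-1892`: The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so the full A-share universe is fetched incompletely <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. If users follow the documented steps, the financial dataset covers only some stocks, and backtests based on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).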
Source: https://github.com/microsoft/qlib/issues/1892
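### `AP-QLIB-2097`: Whole-market instrument="all" runs out of memory on a 32GB machine, while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. On a 32GB machine a whole-market run hits OOM in the init_instance_by_config stage, and the error message does not point to memory. Users easily mistake it for a configuration error, when what is needed is batched loading or streaming feature computation.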
Source: https://github.com/microsoft/qlib/issues/2097
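### `AP-QLIB-1984`: The LightGBM model's label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.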
Source: https://github.com/microsoft/qlib/issues/1984
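### `AP-QLIB-1915`: After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.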
Source: https://github.com/microsoft/qlib/issues/1915
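One way to rule out the symbol-format half of this problem is to normalize symbols before running dump_bin. The sketch below assumes the target convention is the `SH600000` form and that only Shanghai/Shenzhen six-digit codes appear; `to_qlib_symbol` is a hypothetical helper, not a Qlib function.

```python
import re


def to_qlib_symbol(symbol: str) -> str:
    """Convert '600000.SH' style symbols to 'SH600000'; pass through 'SH600000' style."""
    symbol = symbol.strip().upper()
    match = re.fullmatch(r"(\d{6})\.(SH|SZ)", symbol)
    if match:
        code, exchange = match.groups()
        return f"{exchange}{code}"
    if re.fullmatch(r"(SH|SZ)\d{6}", symbol):
        return symbol
    # Fail loudly instead of letting an unknown format reach the binary dump.
    raise ValueError(f"unrecognized symbol format: {symbol!r}")
```

Raising on unknown formats is a deliberate choice here: a silent pass-through would recreate the "looks imported, fails later" situation described above.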
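### `AP-QLIB-1949`: The Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails to access the _backend_args attribute during initialization (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a D.features call raising a multi-layer nested exception, and users cannot tell from the stack trace whether the problem is the parallel backend or the data.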
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
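### `AP-VNPY-3691`: The bar generator's first bar has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the timestamp of the first bar it pushes is the minute of the current tick rather than the end of a complete N-minute period. Concretely: a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If a strategy filters directly on datetime.minute % 5 in on_bar, the first bar happens to pass, but it contains less than one full period of data and, when used for signal calculation, produces a wrong entry signal.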
Source: https://github.com/vnpy/vnpy/issues/3691
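### `AP-VNPY-3669`: Incremental saves of historical data in the Alpha module raise SchemaError because old and new DataFrame schemas are incompatible <sub>(medium)</sub>
When the vnpy Alpha module saves bar data to a Parquet file, it runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the existing file (with historical Int64 columns). Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (for example, volume is an integer in some feeds and a float in others), and there is no schema alignment step before concat. This affects every historical data build for backtesting with vnpy alpha.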
Source: https://github.com/vnpy/vnpy/issues/3669
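### `AP-VNPY-3685`: SpreadTrading's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() in the SpreadTrading module has an event loop conflict under Jupyter (asyncio already running), so parts of the backtest engine's logic do not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.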
Source: https://github.com/vnpy/vnpy/issues/3685
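### `AP-VNPY-3700`: The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system/conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.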
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
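### `AP-ZIPLINE-138`: Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by a factor of about 4 (Apple's cumulative split factor), and users mistake "price up 4x" for strategy return. The A-share case is worse: price jumps around ex-rights dates create huge "signals" in unadjusted data, drawing technical indicators into false breakout signals on the ex-rights date.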
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
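### `AP-ZIPLINE-235`: Orders fill at the current bar's close by default, which understates live slippage and inflates backtest returns <sub>(high)</sub>
After a signal fires on a bar, Zipline's default slippage model fills at the close of that same bar (current bar close fill). In live trading, the signal can only be filled near the open of the next bar (T+1 order execution). For A-share daily bars, backtesting at the close instead of the next day's open overstates daily returns by about 0.1-0.3% on average, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.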
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
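### `AP-ZIPLINE-190`: Setting the calendar start_session to a non-trading day triggers DateOutOfBounds, with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if start_session falls on a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last trading day before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.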
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
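### `AP-ZIPLINE-181`: With a stale asset db, Pipeline reports "no assets traded", sending users off to check the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted/newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.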
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
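### `AP-ZIPLINE-285`: week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function depend on the trading calendar's logic for the first/last day of the week, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation of the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot discover the missed schedule from the logs.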
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
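### `AP-ZIPLINE-240`: Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline internally requires all timestamps to be UTC-aware datetimes. When a user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it triggers AssertionError: Algorithm should have a utc datetime deep in algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data from local CST time hit this trap very easily; call tz_localize('UTC') explicitly when registering the bundle.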
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
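To illustrate the naive-versus-aware distinction without pandas, here is a standard-library sketch. It is not the `tz_localize('UTC')` call named above: it makes the different assumption that naive inputs are China Standard Time wall-clock values that should be converted to UTC, and `to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time is a fixed UTC+8 offset (no daylight saving time).
CST = timezone(timedelta(hours=8), name="CST")


def to_utc(ts: datetime, assumed_tz: timezone = CST) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted as assumed_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assumed_tz)
    return ts.astimezone(timezone.utc)
```

Whether to relabel a naive timestamp as UTC or to convert it from local time depends on what the source data actually records, so that choice should be made explicitly at the ingestion boundary.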
## zvt (6)
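### `AP-ZVT-183`: An inf/NaN ex-rights factor goes straight into multiplication, so price adjustment fails silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (a new stock's first day, or missing data) the factor is inf; when kdata.open itself is None (suspended day not filled) the multiplication raises TypeError. The result: the adjustment calculation for the whole entity is interrupted and all subsequent bars are lost, but the main flow only logs ERROR and does not stop, so users often do not know that data for many stocks is already corrupted.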
Source: https://github.com/zvtvz/zvt/issues/183
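The two failure modes above (a zero denominator and a None price) can both be guarded before any multiplication. This is a minimal sketch with hypothetical helper names, not ZVT's adjustment code; it returns None for unusable inputs so the caller can skip that bar explicitly.

```python
import math
from typing import Optional


def price_ratio(new: Optional[float], old: Optional[float]) -> Optional[float]:
    """new/old ratio; None when old is 0 or either value is missing."""
    # In plain Python, dividing by 0 raises ZeroDivisionError; with NumPy floats it yields inf.
    if new is None or old is None or old == 0:
        return None
    return new / old


def adjust_price(price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Return price * factor, or None when either input is missing, inf, or NaN."""
    if price is None or factor is None:
        return None
    if not math.isfinite(price) or not math.isfinite(factor):
        return None
    return price * factor
```

Counting how many bars come back as None per entity gives the visible signal that the logged-only ERROR path lacks.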
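### `AP-ZVT-179`: After a third-party data API hits its quota, the exception is swallowed and data goes missing silently <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch whole-market bars (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the limit is hit, that day's data for all remaining stocks is silently missing. If a backtest uses this incomplete database, factor results are systematically biased, with no warning.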
Source: https://github.com/zvtvz/zvt/issues/179
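### `AP-ZVT-161`: Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable count limit. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.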
Source: https://github.com/zvtvz/zvt/issues/161
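The batched-query fix mentioned above can be sketched with the standard-library `sqlite3` module. The table name `kdata`, its columns, and the chunk size of 500 are assumptions chosen for illustration; this is not ZVT's query layer.

```python
import sqlite3
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_rows(conn: sqlite3.Connection, entity_ids: Sequence[str], chunk_size: int = 500) -> list:
    """Query in batches so each statement stays below the bound-variable limit."""
    rows = []
    for chunk in chunked(entity_ids, chunk_size):
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(chunk)).fetchall())
    return rows
```

Keeping the chunk size well under 999 leaves room for any additional bound parameters, such as date filters, in the same statement.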
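### `AP-ZVT-129`: Wildcard imports hide API version changes, and enums such as AdjustType vanish unexpectedly <sub>(medium)</sub>
ZVT's documentation examples import all symbols with `from zvt import *`. After a ZVT upgrade refactors enums (for example moving AdjustType into a submodule), the wildcard import no longer includes that symbol and triggers AttributeError. Users assume an installation problem, when it is actually a breaking API change between versions that was not noted in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.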
Source: https://github.com/zvtvz/zvt/issues/129
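### `AP-ZVT-187`: The backtest engine does not terminate early on an empty result from the data layer, causing a cascading null-pointer-style crash <sub>(medium)</sub>
After load_data completes and the data is found to be empty, ZVT Trader does not exit early; it passes the empty DataFrame into the selector calculation, triggering a chain of NoneType failures downstream. The stack trace is deep and the root cause hard to locate, so users take it for a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.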
Source: https://github.com/zvtvz/zvt/issues/187
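A fail-fast guard placed right after data loading turns the deep cascade into one clear error. The sketch below uses a hypothetical `require_data` helper and only relies on `len()`, so it would work for a list of rows or a DataFrame alike; it is not part of ZVT.

```python
def require_data(rows, start: str, end: str):
    """Return rows unchanged, or raise immediately when nothing was loaded."""
    if rows is None or len(rows) == 0:
        raise ValueError(
            f"no data loaded for window {start}..{end}; "
            "check that the window lies within the range stored in the database"
        )
    return rows
```

Naming the requested window in the message points directly at the misconfigured start/end values identified above as the root cause.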
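### `AP-ZVT-183B`: Using the wrong K-line table, HFQ (backward-adjusted) vs. QFQ (forward-adjusted), causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users mix two tables when computing price momentum/moving-average factors (for example, unadjusted for the moving average and backward-adjusted for returns), which makes factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.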
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-086--backtrader
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 24, 'total_functions': 0, 'total_stages': 7}
## Modules (7)
- [data_ingestion](components/data_ingestion.md): 3 classes
- [data_filtering_&_resampling](components/data_filtering_-_resampling.md): 2 classes
- [indicator_computation](components/indicator_computation.md): 3 classes
- [strategy_logic](components/strategy_logic.md): 6 classes
- [order_execution_&_broker](components/order_execution_-_broker.md): 4 classes
- [analysis_&_reporting](components/analysis_-_reporting.md): 2 classes
- [cerebro_orchestration](components/cerebro_orchestration.md): 4 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 184
fatal_constraints_count: 30
non_fatal_constraints_count: 239
use_cases_count: 2
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
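- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at the close on the same day; (2) marking an indicator computed after the close of day T on that same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions should be set to zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (such as a momentum strategy's "take profit at the high") is obvious look-ahead, because the intraday high/low is only confirmed after the close. Fill prices may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: operations such as pandas rolling/shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in code and verify key time alignments with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio found across trials rises with the number of backtests, even when no strategy has real skill. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.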
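- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be done in-sample; out-of-sample data is only for final validation, and you must not keep tuning after "looking at" out-of-sample data multiple times (that turns out-of-sample into new in-sample and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended/missing data: suspended-day prices must not simply be forward-filled with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation during replay. Filter missing trading days explicitly at the factor computation layer and do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights adjustments applied late, or extreme prices from manual entry errors). Feeding it uncleaned into factor computation produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.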
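- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices worsen. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions, and every 10bps change in cost may flip the conclusion on profitability, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate actual share counts (rounded) under a fixed amount of capital rather than assume fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.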
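- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timezone consistency: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to a single timezone at the pipeline entry (storing in UTC and displaying in the market's local timezone is recommended); do not mix timezones partway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar; do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when updating history incrementally, query the database for the latest stored date and download only data after it. Re-downloading existing data and appending it produces rows with duplicate timestamps and breaks backtest ordering. Verify there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta calculations. Use a passive benchmark the strategy can actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes dividend reinvestment and understates the benchmark's holding return.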
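- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. A peak-to-trough difference in summed log returns is not a percentage loss (a 50% decline appears as a log return of about -0.69), so it misstates drawdown depth.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are better risk-adjusted return measures than the Sharpe Ratio: Sharpe assumes normally distributed returns, while returns in A-share/crypto markets are markedly left-skewed and fat-tailed, so it understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Performance attribution should be decomposed into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".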
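- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 counts as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing evidence of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the IC series at 1/5/10/20 days to identify the factor's best holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and is unsuited to monthly rebalancing, and vice versa. Using the wrong holding period seriously hurts a factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (rather than the traditional 1.96); or independent replication in different periods/markets; or a clear economic rationale. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have a negative net IC after trading costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor's half-life is a core parameter of signal strength and directly determines the best rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets most of the alpha dissipate within the holding period.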
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化(Industry Neutralization):因子值若不对行业均值中性化, 因子收益中会混入行业轮动收益,难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作:factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化(Market Cap Neutralization):小盘股效应(小盘跑赢大盘) 是金融史上最持久的 anomaly 之一,会污染几乎所有未中性化的因子。 若因子与市值高度相关,选股会系统性偏向小盘,收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化(Fama-MacBeth 回归或残差法)。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理(Winsorize/MAD):因子原始值通常含有极端值,极端值会扭曲 分组分析(如 Q1/Q10 十分位)。应对原始因子值做 Winsorize(截尾至 [1%, 99%] 或 3-sigma)或 MAD(中位数绝对偏差)缩尾,然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化(Factor Orthogonalization):当多个因子共同用于合成打分时, 高相关因子的合成等效于对单一因子过度权重,稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA,消除因子间的多重共线性。
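- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill strategy: NaN values in factor computation (suspensions, new listings, data gaps) need care. Filling with a mean computed over the full sample introduces lookahead bias (that mean includes future information), and dropping the affected stocks entirely creates survivorship bias. The correct approach is to fill with the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.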
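- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile Analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, not the long-only return. The monotonicity test across quantiles (long Q1, short Q5) is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha Decay Test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one market regime is unsuitable for all-weather deployment; show the IC time series in rolling 12M windows to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-Aware Selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate many wasted trading costs. Set a buffer zone at selection time: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (Bootstrap test): a quantile return spread (Q1-Q5) can be large in historical data and still be due to chance, so test significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtests (< 3 years) are especially unreliable.
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market factor portability: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures/crypto, requires independent IC validation; cross-market generality cannot be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not necessarily carry over to US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Time stability of factor validity: once-effective factors gradually fade through market learning and arbitrage (McLean & Pontiff 2016 find that factor returns decline by 58% on average after publication). Re-evaluate factor IC regularly (quarterly or annually) and promptly replace or down-weight factors that have stopped working.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: interest-rate cycles, business cycles, and market sentiment significantly affect factor validity. Value factors (low P/B) tend to work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the conditions under which it works best.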
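
As a rough illustration of `SHARED-BT-PERF-001`, `SHARED-BT-PERF-002`, and `SHARED-FR-IC-001`, the sketch below computes max drawdown from a portfolio value series, an annualized Sharpe ratio with an explicit annualization factor, and a Spearman rank IC with its ICIR. It uses only the Python standard library. All function names are hypothetical and not part of zvt or any other library named here, and ICIR is taken as mean(IC) / std(IC).

```python
import math
from statistics import mean, stdev


def max_drawdown(nav):
    """Largest peak-to-trough decline of a portfolio value series, as a positive fraction."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_sharpe(daily_returns, periods_per_year=252, daily_rf=0.0):
    """Daily Sharpe scaled by sqrt(periods_per_year); pass 365 for crypto."""
    excess = [r - daily_rf for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_ic(factor_values, next_returns):
    """Spearman rank correlation between factor values and next-period returns.

    Assumes both inputs vary; a constant input would divide by zero.
    """
    rx, ry = _ranks(factor_values), _ranks(next_returns)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def icir(ic_series):
    """Mean IC divided by the sample standard deviation of IC."""
    return mean(ic_series) / stdev(ic_series)
```

Passing `periods_per_year` explicitly keeps the annualization factor visible at every call site, which is the point of the cross-system comparison rule.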
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
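
A minimal sketch of how the "exactly one" rule could be enforced at construction time. `SignalSizing` is a hypothetical stand-in written for illustration; it is not zvt's actual `TradingSignal` class, and only the three field names come from the lock above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical container for the three mutually exclusive sizing fields."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        # Collect the names of the fields the caller actually set.
        provided = [
            name
            for name in ("position_pct", "order_money", "order_amount")
            if getattr(self, name) is not None
        ]
        if len(provided) != 1:
            raise ValueError(
                f"expected exactly one sizing field, got {provided or 'none'}"
            )
```

Checking `is not None` rather than truthiness means a deliberate `position_pct=0.0` still counts as a provided field.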
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
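
For reference, a plain-Python sketch of MACD with the locked parameters as defaults. It assumes the common recursive EMA with smoothing factor 2 / (span + 1), seeded with the first close, and defines the histogram as `diff - dea`; some platforms scale the histogram by 2, so this is not necessarily how zvt computes it. The function names are hypothetical.

```python
def ema(values, span):
    """Recursive EMA with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26, signal=9):
    """Return (diff, dea, hist) lists, each the same length as `close`."""
    fast_ema = ema(close, fast)
    slow_ema = ema(close, slow)
    diff = [f - s for f, s in zip(fast_ema, slow_ema)]  # fast EMA minus slow EMA
    dea = ema(diff, signal)                             # signal line
    hist = [d - e for d, e in zip(diff, dea)]           # unscaled histogram
    return diff, dea, hist
```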
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **2**
## `KUC-101`
**Source**: `samples/pyfolio2/backtrader-pyfolio.ipynb`
Implements a classic dual moving average crossover trading strategy using backtrader, generating LONG/LONGSHORT signals when fast and slow SMAs cross, with integrated PyFolio portfolio analytics for performance measurement.
## `KUC-102`
**Source**: `samples/pyfoliotest/backtrader-pyfolio.ipynb`
Provides a minimal backtrader strategy that logs and prints OHLC (Open, High, Low, Close) data in CSV format for debugging and verifying data feed integrity.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
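## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and runs multiple strategies and data sources in one cerebro.run() call. zvt's StockTrader currently binds only one set of factors per instance and lacks a unified orchestration layer for multi-strategy portfolios; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.
## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changes to strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with a call like cerebro.addanalyzer(SharpeRatio).
## `CW-BT-003` — Sizer: separated position sizing
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what fraction to buy per entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.
## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack limit orders and combined-order simulation; for high-frequency or live-trading scenarios, fuller order-type support would make backtests considerably more realistic.
## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (e.g. 1 min) into a higher timeframe (e.g. 1 day) on the fly and drive strategies in sync, or replay data tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks runtime resampling; borrowing this pattern would support backtests over arbitrary timeframe combinations without recording the data again.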
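## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a GUI that directly shows the strategy equity curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook required. zvt currently visualizes backtest results by calling Plotly through the draw_result method but has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.
## `CW-VN-004` — vnpy.alpha ML factor research Lab
**From**: vnpy · **Applicable to**: factor-research
vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.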
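## `CW-QL-001` — Point-in-Time database (prevents future-data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), which removes this source of look-ahead bias from backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection can be biased by financial data that had not yet been published; adopting the PIT pattern would substantially improve backtest credibility.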
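
The point-in-time idea can be sketched without any framework: keep a publication date next to each reporting period and filter on it at query time. The record layout, the `as_of` helper, and the sample figures below are illustrative assumptions; they are neither qlib's API nor zvt's schema.

```python
from datetime import date


def as_of(records, query_date):
    """Per entity, return the newest report that was already published on query_date."""
    latest = {}
    for rec in records:
        if rec["publish_date"] > query_date:
            continue  # not yet public at query time, so it must stay invisible
        # Prefer the later reporting period; for the same period, the later revision.
        key = (rec["report_period"], rec["publish_date"])
        current = latest.get(rec["entity_id"])
        if current is None or key > (current["report_period"], current["publish_date"]):
            latest[rec["entity_id"]] = rec
    return latest


# Made-up sample rows, for illustration only.
records = [
    {"entity_id": "stock_sh_600000", "report_period": date(2023, 12, 31),
     "publish_date": date(2024, 3, 28), "roe": 0.11},
    {"entity_id": "stock_sh_600000", "report_period": date(2024, 3, 31),
     "publish_date": date(2024, 4, 29), "roe": 0.03},
]

snapshot = as_of(records, date(2024, 4, 1))
print(snapshot["stock_sh_600000"]["report_period"])
```

On 2024-04-01 the Q1 report has a reporting period in the past but has not been published yet, so only the annual report is visible. Filtering on the reporting period alone would have leaked the Q1 figures.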
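## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records each training run's hyperparameters, features, metrics, and predictions and supports cross-experiment comparison and model versioning. zvt currently lacks ML experiment tracking, so each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment for quick reproduction and version comparison.
## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling intraday execution algorithms such as TWAP and VWAP rebalancing. zvt backtests currently assume daily-bar fills only and do not model execution algorithms; borrowing the nested framework would let zvt separate the question of which stocks to hold and when from the question of how to build positions with minimal market impact.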
FILE:references/components/analysis_-_reporting.md
# analysis_&_reporting (2 classes)
## `Analyzer.get_analysis`
`analysis_&_reporting/analyzer-get-analysis.py:0`
## `Output format`
`analysis_&_reporting/output-format.py:0`
FILE:references/components/cerebro_orchestration.md
# cerebro_orchestration (4 classes)
## `Cerebro.adddata`
`cerebro_orchestration/cerebro-adddata.py:0`
## `Cerebro.addstrategy`
`cerebro_orchestration/cerebro-addstrategy.py:0`
## `Cerebro.run`
`cerebro_orchestration/cerebro-run.py:0`
## `Execution mode`
`cerebro_orchestration/execution-mode.py:0`
FILE:references/components/data_filtering_-_resampling.md
# data_filtering_&_resampling (2 classes)
## `Resampler._process`
`data_filtering_&_resampling/resampler-process.py:0`
## `Resampling mode`
`data_filtering_&_resampling/resampling-mode.py:0`
FILE:references/components/data_ingestion.md
# data_ingestion (3 classes)
## `Cerebro.adddata`
`data_ingestion/cerebro-adddata.py:0`
## `CSVDataBase._loadline`
`data_ingestion/csvdatabase-loadline.py:0`
## `Data Source`
`data_ingestion/data-source.py:0`
FILE:references/components/indicator_computation.md
# indicator_computation (3 classes)
## `Indicator.next`
`indicator_computation/indicator-next.py:0`
## `Indicator.once`
`indicator_computation/indicator-once.py:0`
## `Indicator calculation`
`indicator_computation/indicator-calculation.py:0`
FILE:references/components/order_execution_-_broker.md
# order_execution_&_broker (4 classes)
## `BrokerBase.submit`
`order_execution_&_broker/brokerbase-submit.py:0`
## `BrokerBase.getvalue`
`order_execution_&_broker/brokerbase-getvalue.py:0`
## `BrokerBase.getposition`
`order_execution_&_broker/brokerbase-getposition.py:0`
## `Broker backend`
`order_execution_&_broker/broker-backend.py:0`
FILE:references/components/strategy_logic.md
# strategy_logic (6 classes)
## `Strategy.__init__`
`strategy_logic/strategy-init.py:0`
## `Strategy.next`
`strategy_logic/strategy-next.py:0`
## `Strategy.buy`
`strategy_logic/strategy-buy.py:0`
## `Strategy.sell`
`strategy_logic/strategy-sell.py:0`
## `Sizing logic`
`strategy_logic/sizing-logic.py:0`
## `Trade tracking`
`strategy_logic/trade-tracking.py:0`
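Manages large-scale time-series data storage and queries, supports efficient aggregation over billion-row datasets, provides lazy DataFrame loading and batch concatenation, and is compatible with multiple storage backends such as AWS S3.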
---
name: arcticdb-timeseries
description: |-
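Manages large-scale time-series data storage and queries, supports efficient aggregation over billion-row datasets, provides lazy DataFrame loading and batch concatenation, and is compatible with multiple storage backends such as AWS S3.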
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-103"
compiled_at: "2026-04-22T13:00:48.376963+00:00"
capability_markets: "multi-market"
capability_activities: "data-sourcing"
sop_version: "crystal-compilation-v6.1"
---
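# ArcticDB Time-Series Storage (arcticdb-timeseries)
> Manages large-scale time-series data storage and queries, supports efficient aggregation over billion-row datasets, provides lazy DataFrame loading and batch concatenation, and is compatible with multiple storage backends such as AWS S3.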
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (17 total)
### AWS S3 Configuration for Public Blockchain Data Access (`UC-101`)
Setting up AWS credentials to enable secure access to public blockchain data stored in S3, allowing integration with ArcticDB for time-series storage
**Triggers**: aws, s3, credentials
### Billion Row Challenge - Large Scale Data Performance (`UC-102`)
Demonstrates ArcticDB's ability to handle massive datasets (1 billion rows of temperature data) with efficient aggregation, serving as a performance b
**Triggers**: billion rows, large scale, performance
### Batch DataFrame Concatenation with Lazy Loading (`UC-103`)
Demonstrates efficient concatenation of multiple DataFrames stored in ArcticDB using lazy loading to minimize memory consumption during batch operatio
**Triggers**: concat, batch, lazy
For all **17** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-103. Evidence verify ratio = 79.0% and audit fail total = 19. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Full V6+ authority (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When full examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-103` blueprint at 2026-04-22T13:00:48.376963+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Batch DataFrame Concatenation with Lazy Loading', 'Billion Row Challenge - Large Scale Data Performance', 'AWS S3 Configuration for Public Blockchain Data Access', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-070--edgartools (2)
### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>
Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.
### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>
SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.
## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>
Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.
## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>
SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.
## finance-bp-079--akshare (4)
### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>
HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.
### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>
Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
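
One way to make both failure modes loud is to parse responses through a single guard. This is a minimal sketch using only the standard library; `parse_payload` and `EmptyPayloadError` are hypothetical names, not part of akshare.

```python
import json


class EmptyPayloadError(ValueError):
    """Raised when the response parses but carries no data."""


def parse_payload(text):
    """Parse a JSON response body, rejecting malformed and empty payloads."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Re-raise with context so the caller sees which stage failed.
        raise ValueError(f"malformed JSON response: {exc}") from exc
    if data is None or data == {} or data == []:
        raise EmptyPayloadError("response parsed but contained no data")
    return data
```

Callers can then catch `EmptyPayloadError` separately, for example to skip a symbol, while treating other `ValueError`s as real defects.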
### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>
Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.
### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>
Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
## finance-bp-103--ArcticDB (3)
### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>
ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.
### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>
Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.
### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>
Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.
## finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>
8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
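
A sketch of selecting the numbering scheme by filing date before extracting items. It assumes filings dated on or after 2004-08-23 use the new scheme, and the regular expressions are illustrative simplifications, not the patterns edgar-crawler uses.

```python
import re
from datetime import date

# Assumption: the new numbering applies from this date onward (inclusive).
NEW_SCHEME_START = date(2004, 8, 23)

ITEM_PATTERNS = {
    # Legacy items 1-12; the lookahead rejects new-style numbers such as "5.02".
    "legacy": re.compile(r"item\s+(1[0-2]|[1-9])(?!\.\d)\b", re.IGNORECASE),
    # New-style items 1.01-9.01.
    "current": re.compile(r"item\s+([1-9]\.\d{2})\b", re.IGNORECASE),
}


def item_scheme(filing_date):
    """Return which numbering scheme applies to a filing date."""
    return "legacy" if filing_date < NEW_SCHEME_START else "current"


def find_items(text, filing_date):
    """Return the item numbers found in `text` under the applicable scheme."""
    return ITEM_PATTERNS[item_scheme(filing_date)].findall(text)
```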
## finance-bp-128--yfinance (2)
### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>
Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.
### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>
Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-103--ArcticDB
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 32, 'total_functions': 0, 'total_stages': 9}
## Modules (9)
- [uri_parsing_and_storage_adapter_selection](components/uri_parsing_and_storage_adapter_selection.md): 3 classes
- [library_configuration_and_management](components/library_configuration_and_management.md): 3 classes
- [data_normalization](components/data_normalization.md): 3 classes
- [recursive_normalization_(nested_structures)](components/recursive_normalization_-nested_structures.md): 3 classes
- [write_operations](components/write_operations.md): 5 classes
- [read_operations](components/read_operations.md): 3 classes
- [query_processing](components/query_processing.md): 4 classes
- [version_and_snapshot_management](components/version_and_snapshot_management.md): 5 classes
- [storage_backend_(c++_layer)](components/storage_backend_-c-_layer.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 153
fatal_constraints_count: 40
non_fatal_constraints_count: 278
use_cases_count: 17
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (16)
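- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limit + exponential backoff retry: every external data API call must apply rate limiting and Exponential Backoff with Jitter. Retrying immediately after a 429/503 response is an anti-pattern that adds load on the server and triggers IP bans. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must cap concurrency (max_workers) rather than parallelize without limit. Free APIs (akshare, the tushare free tier) usually allow 1-3 concurrent requests; paid APIs also have concurrency caps (tushare uses a points system in which different point levels allow different concurrency). Exceeding the limit triggers 429 responses or IP bans. Control concurrency explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token / credential security: data-source API keys (a tushare token; akshare needs no token, but other commercial sources do) must not be hard-coded and must be read from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should be separated by a minimum interval (some akshare endpoints require ≥ 0.5s; the tushare free tier allows 200 calls per minute). A bare sleep in code is less precise than a Token Bucket algorithm; a mature library such as ratelimit or slowapi is recommended.
- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspension days: a suspended stock has no trade data during the suspension, leaving date gaps in the database. Do not forward-fill the missing dates wholesale (this fabricates volume); instead mark the rows is_suspended=True, set volume and turnover to 0, and carry the previous close as the price. Factor computation must filter out rows where is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: Historical data boundary for newly listed stocks: a new stock appears in the database from its first trading day and has no history before listing. If a factor's lookback window exceeds the number of days since listing, the factor values are all NaN. Record each stock's listing date (list_date) during collection and start collection from the listing date rather than a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data integrity for delisted stocks: delisted stocks can still be queried for pre-delisting history in mainstream sources (akshare/tushare), but have no data after the delisting date. Historical universes must include delisted stocks (otherwise survivorship bias results), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-Source Reconciliation: the same data point (e.g. the closing price) can differ slightly between sources (akshare/tushare/baostock) because of different price-adjustment methods, holiday handling, or ex-dividend adjustment timing. Run multi-source reconciliation checks in the pipeline, and when a difference exceeds a threshold (e.g. 0.1%), log an alert and confirm manually.
- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; force conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time versus natural time: the "date" of daily-bar data usually refers to a trading day (day T), while the "time" of news and announcement data is natural (wall-clock) time. When merging the two, map natural time to the next available trading day; otherwise an announcement made on day T is treated as already available during day T's session, which is a lookahead problem.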
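- **`SHARED-DS-TIME-003`** <sub>(medium)</sub>: Daylight saving time (DST) handling: when collecting US or European market data, the DST switch dates (March and November in the US; late March and late October in Europe) make the same HH:MM map to a different UTC time, and if unhandled this shifts that day's time series by 1 hour. Always store in UTC and convert to the market's local time zone for display.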
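- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data update scripts must be idempotent (repeated runs give the same result). If a script fails midway because of a network interruption, rerunning it must not create duplicate rows or gaps. Implementation: write to a staging table, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums / row counts): after every update, validate key fields: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A pipeline without automatic validation is a root cause of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: pipeline outputs should be versioned. When a source revises historical data (e.g. restated financials), keep the old version traceable rather than silently overwriting it, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that each stock's or asset's data coverage is complete and consistent with the trading calendar. Every stock should have one row per trading day (a suspension marker, not a missing row). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: frequently read static or slowly changing data (stock info, industry classifications, index constituents) should be cached locally to avoid repeating API calls on every run. Caches must set an expiry (TTL) to prevent use of stale industry classifications or outdated constituent lists.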
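
To make `SHARED-DS-TIME-002` concrete, the sketch below maps an announcement timestamp to the trading day on which it first becomes usable. It takes the conservative reading that an event is never usable on its own calendar date, even if it arrived before the open; a pipeline that tracks session times could relax that. The `next_trading_day` helper and the three-day calendar are hypothetical.

```python
from bisect import bisect_right
from datetime import date, datetime


def next_trading_day(event_time, trading_days):
    """First trading day strictly after the event's calendar date.

    `trading_days` must be a sorted list of dates. Returns None when the
    event falls on or after the last day in the calendar.
    """
    idx = bisect_right(trading_days, event_time.date())
    return trading_days[idx] if idx < len(trading_days) else None


# Illustrative calendar: Friday, then the following Monday and Tuesday.
calendar = [date(2024, 1, 12), date(2024, 1, 15), date(2024, 1, 16)]

print(next_trading_day(datetime(2024, 1, 12, 20, 30), calendar))  # Friday evening
print(next_trading_day(datetime(2024, 1, 13, 9, 0), calendar))    # Saturday morning
```

Both events map to Monday 2024-01-15, so neither can influence a signal computed from Friday's bar.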
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
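
A small validation sketch for this format, using the IDs that appear elsewhere in this document (`stock_sh_600000`, `stockus_nasdaq_AAPL`) as examples. `parse_entity_id` is a hypothetical helper, not a zvt function, and it assumes that only the code part may itself contain underscores.

```python
def parse_entity_id(entity_id):
    """Split 'entity_type_exchange_code' into its three parts or raise ValueError."""
    # maxsplit=2 keeps any further underscores inside the code part.
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"entity id {entity_id!r} does not match entity_type_exchange_code"
        )
    entity_type, exchange, code = parts
    return {"entity_type": entity_type, "exchange": exchange, "code": code}
```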
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
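
The three-valued semantics are easy to break with a plain truthiness check: `not None` is true, so a missing value would read as SELL, and `bool(float("nan"))` is true, so NaN would read as BUY. A hypothetical helper that keeps the three cases apart:

```python
import math


def action_for(filter_result):
    """Map a filter_result cell to 'BUY', 'SELL', or 'NO_ACTION'."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"
    return "BUY" if filter_result else "SELL"
```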
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **17**
## `KUC-101`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_aws_public_blockchain.ipynb`
Setting up AWS credentials to enable secure access to public blockchain data stored in S3, allowing integration with ArcticDB for time-series storage.
## `KUC-102`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_billion_row_challenge.ipynb`
Demonstrates ArcticDB's ability to handle massive datasets (1 billion rows of temperature data) with efficient aggregation, serving as a performance benchmark for large-scale data operations.
## `KUC-103`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_concat.ipynb`
Demonstrates efficient concatenation of multiple DataFrames stored in ArcticDB using lazy loading to minimize memory consumption during batch operations.
## `KUC-104`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_equity_analytics.ipynb`
Demonstrates downloading historical equity market data from yfinance and storing it in ArcticDB for analytics, enabling time-series analysis of stock prices and volumes across multiple symbols.
## `KUC-105`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_equity_options.ipynb`
Demonstrates storing and querying equity options data including expiry analysis and option Greeks, supporting options strategy research and analysis workflows.
## `KUC-106`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_lazydataframe.ipynb`
Demonstrates reading large datasets (10M-1B rows) efficiently using ArcticDB's lazy loading to reduce memory usage while selecting specific columns.
## `KUC-107`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_lmdb.ipynb`
Demonstrates basic storage operations (write, read, append, update) with ArcticDB using LMDB backend, including version management and subframe reading capabilities.
## `KUC-108`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_querybuilder.ipynb`
Demonstrates efficient querying of large datasets (up to 1B rows) with specific column selection, optimizing read performance by avoiding unnecessary data loading.
## `KUC-109`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_read_as_arrow.ipynb`
Demonstrates reading ArcticDB data into Arrow and Polars formats for interoperability with modern data science tooling, enabling seamless integration with downstream processing pipelines.
## `KUC-110`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_recursive_normalizers.ipynb`
Demonstrates storing complex nested data structures including DataFrames within dictionaries using recursive normalizers, enabling preservation of hierarchical data relationships.
## `KUC-111`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_resample.ipynb`
Demonstrates resampling high-frequency time-series data (12M rows at second frequency) to lower frequencies (1-minute) using built-in aggregation, optimizing storage and query performance.
## `KUC-112`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_demo_snapshots.ipynb`
Demonstrates creating and managing data snapshots for point-in-time recovery, enabling reproducibility and audit trails for time-series data in ArcticDB.
## `KUC-113`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_merge.ipynb`
Demonstrates merging new data with existing datasets using merge strategies (update, do_nothing) for handling price corrections and data synchronization in financial applications.
## `KUC-114`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_pythagorean_won_loss_formula_notebook.ipynb`
Demonstrates sports data analytics using the Pythagorean expectation formula to analyze team performance, including data storage, visualization, and OLS statistical modeling.
## `KUC-115`
**Source**: `docs/mkdocs/docs/notebooks/ArcticDB_staged_data_with_tokens.ipynb`
Demonstrates staging data from multiple concurrent writers before finalizing with tokens, enabling distributed data ingestion workflows with proper synchronization.
## `KUC-116`
**Source**: `docs/mkdocs/docs/notebooks/styling.py`
Provides styling functions for DataFrame visualization with custom themes, color schemes, and export capabilities for creating professional data presentations.
## `KUC-117`
**Source**: `docs/mkdocs/docs/technical/release_checks.py`
Provides automated release validation tests that verify basic ArcticDB functionality including library creation, data write/read operations, and library deletion.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block situation. Separating retry logic for transient network errors (TimeoutError, ConnectionError) from permanent errors (ValueError, KeyError) prevents resource waste and avoids masking underlying bugs.
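A minimal sketch of this retry policy, using only the standard library. `RateLimitError`, `fetch_with_backoff`, and the delay defaults are hypothetical names and values chosen for illustration; the `fetch` callable is assumed to raise `RateLimitError` when the server answers HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception: raised by the fetcher on HTTP 429."""


# Transient errors are retried; anything else (ValueError, KeyError, ...)
# propagates immediately so that real bugs are not hidden by retries.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except (RateLimitError, *TRANSIENT_ERRORS):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the function testable without real waiting; production callers can leave the default.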
## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
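One way to sketch the validation and conversion step is with `datetime.strptime`, which already rejects impossible calendar dates such as 29 February in a non-leap year. The function name `normalize_date` is an assumption for illustration.

```python
from datetime import datetime


def normalize_date(value: str) -> str:
    """Accept YYYY-MM-DD or YYYYMMDD and return ISO YYYY-MM-DD.

    Raises ValueError for any other shape or for impossible dates.
    """
    text = value.strip()
    # The explicit length check rejects non-padded forms like "2024-2-9",
    # which strptime would otherwise accept.
    for fmt, length in (("%Y-%m-%d", 10), ("%Y%m%d", 8)):
        if len(text) == length:
            try:
                return datetime.strptime(text, fmt).date().isoformat()
            except ValueError:
                break
    raise ValueError(f"invalid date: {value!r}")
```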
## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.
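The completeness check could look roughly like the sketch below. The fact layout is an assumption made for illustration: a plain `dict` whose `period` is either `{"instant": ...}` or `{"start": ..., "end": ...}`. Real XBRL parsers expose facts through their own objects.

```python
REQUIRED_ATTRS = ("concept", "value", "period", "unit")


def validate_fact(fact: dict) -> dict:
    """Return a copy of the fact with a derived period_type, or raise ValueError."""
    # None and "" count as missing; a numeric value of 0 is valid.
    missing = [k for k in REQUIRED_ATTRS if fact.get(k) in (None, "")]
    if missing:
        raise ValueError(f"XBRL fact missing attributes: {missing}")

    period = fact["period"]
    if "instant" in period:
        period_type = "instant"      # balance sheet style, point in time
    elif "start" in period and "end" in period:
        period_type = "duration"     # income statement style, over an interval
    else:
        raise ValueError(f"cannot classify period: {period!r}")
    return {**fact, "period_type": period_type}
```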
## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing
Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.
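As an illustration of threshold-based parser selection, the sketch below switches from an in-memory parse to `xml.etree.ElementTree.iterparse` once a file exceeds a configurable size. The function name and the element-counting task are assumptions; only the 10MB default comes from the text above.

```python
import os
import xml.etree.ElementTree as ET

STREAMING_THRESHOLD_BYTES = 10 * 1024 * 1024  # 10MB default


def count_elements(path, tag, threshold=STREAMING_THRESHOLD_BYTES):
    """Count <tag> elements; stream the file when it is larger than threshold.

    Returns (count, mode) where mode is "streaming" or "in-memory".
    """
    if os.path.getsize(path) > threshold:
        count = 0
        # iterparse yields each element once its closing tag has been read.
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == tag:
                count += 1
            elem.clear()  # drop children already processed to bound memory
        return count, "streaming"
    root = ET.parse(path).getroot()
    return sum(1 for _ in root.iter(tag)), "in-memory"
```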
## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.
## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing
Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.
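The ordering rule can be illustrated with a toy in-memory store. This is not ArcticDB's API: `ToyVersionedStore` and its methods are hypothetical, and the only point is that the mutable reference is published after every write-once key exists.

```python
class ToyVersionedStore:
    """Toy model: write-once atom keys plus a mutable version pointer."""

    def __init__(self):
        self.atoms = {}  # immutable once written
        self.refs = {}   # mutable pointers, updated last

    def _put_atom(self, key, value):
        if key in self.atoms:
            raise KeyError(f"atom key already written: {key}")
        self.atoms[key] = value

    def write(self, symbol, rows):
        version = self.refs.get(("VERSION_REF", symbol), -1) + 1
        # Step 1: write all atom keys for the new version.
        self._put_atom(("TABLE_DATA", symbol, version), tuple(rows))
        self._put_atom(("TABLE_INDEX", symbol, version), len(rows))
        self._put_atom(("VERSION", symbol, version), version)
        # Step 2: publish. Readers only ever follow VERSION_REF, so they
        # cannot observe a version whose atoms are incomplete.
        self.refs[("VERSION_REF", symbol)] = version
        return version

    def read(self, symbol):
        version = self.refs[("VERSION_REF", symbol)]
        return list(self.atoms[("TABLE_DATA", symbol, version)])
```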
## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
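A minimal sketch of the status check, written against a bare status code and body so it does not depend on any particular HTTP client. `RateLimitError`, `HTTPStatusError`, and `parse_json_response` are hypothetical names.

```python
import json


class RateLimitError(Exception):
    """Hypothetical exception for HTTP 429 so callers can back off."""


class HTTPStatusError(Exception):
    """Hypothetical exception for any other non-2xx status."""


def parse_json_response(status_code: int, body: str):
    """Validate the status code first, then parse the body as JSON."""
    if status_code == 429:
        raise RateLimitError("rate limited (HTTP 429)")
    if not 200 <= status_code < 300:
        # Error bodies are often HTML pages; never hand them to the JSON parser.
        raise HTTPStatusError(f"unexpected HTTP status {status_code}")
    return json.loads(body)
```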
## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.
FILE:references/components/data_normalization.md
# data_normalization (3 classes)
## `CompositeNormalizer.normalize`
`data_normalization/compositenormalizer-normalize.py:0`
## `DataFrameNormalizer.normalize`
`data_normalization/dataframenormalizer-normalize.py:0`
## `Data Format`
`data_normalization/data-format.py:0`
FILE:references/components/library_configuration_and_management.md
# library_configuration_and_management (3 classes)
## `Arctic.get_library`
`library_configuration_and_management/arctic-get-library.py:0`
## `LibraryOptions.__init__`
`library_configuration_and_management/libraryoptions-init.py:0`
## `Library Options`
`library_configuration_and_management/library-options.py:0`
FILE:references/components/query_processing.md
# query_processing (4 classes)
## `QueryBuilder.__getitem__`
`query_processing/querybuilder-getitem.py:0`
## `QueryBuilder.groupby`
`query_processing/querybuilder-groupby.py:0`
## `QueryBuilder.resample`
`query_processing/querybuilder-resample.py:0`
## `Query Optimization`
`query_processing/query-optimization.py:0`
FILE:references/components/read_operations.md
# read_operations (3 classes)
## `NativeVersionStore.read`
`read_operations/nativeversionstore-read.py:0`
## `NativeVersionStore.batch_read`
`read_operations/nativeversionstore-batch-read.py:0`
## `Output Format`
`read_operations/output-format.py:0`
FILE:references/components/recursive_normalization_-nested_structures.md
# recursive_normalization_(nested_structures) (3 classes)
## `Flattener.flatten`
`recursive_normalization_(nested_structures)/flattener-flatten.py:0`
## `Flattener.reconstruct`
`recursive_normalization_(nested_structures)/flattener-reconstruct.py:0`
## `Metastructure Version`
`recursive_normalization_(nested_structures)/metastructure-version.py:0`
FILE:references/components/storage_backend_-c-_layer.md
# storage_backend_(c++_layer) (3 classes)
## `PythonVersionStore.write`
`storage_backend_(c++_layer)/pythonversionstore-write.py:0`
## `PythonVersionStore.read`
`storage_backend_(c++_layer)/pythonversionstore-read.py:0`
## `Key-Value Backend`
`storage_backend_(c++_layer)/key-value-backend.py:0`
FILE:references/components/uri_parsing_and_storage_adapter_selection.md
# uri_parsing_and_storage_adapter_selection (3 classes)
## `Arctic.__init__`
`uri_parsing_and_storage_adapter_selection/arctic-init.py:0`
## `ArcticLibraryAdapter.supports_uri`
`uri_parsing_and_storage_adapter_selection/arcticlibraryadapter-supports-uri.py:0`
## `Storage Backend`
`uri_parsing_and_storage_adapter_selection/storage-backend.py:0`
FILE:references/components/version_and_snapshot_management.md
# version_and_snapshot_management (5 classes)
## `NativeVersionStore.list_versions`
`version_and_snapshot_management/nativeversionstore-list-versions.py:0`
## `NativeVersionStore.snapshot`
`version_and_snapshot_management/nativeversionstore-snapshot.py:0`
## `NativeVersionStore.delete`
`version_and_snapshot_management/nativeversionstore-delete.py:0`
## `NativeVersionStore.prune_previous_versions`
`version_and_snapshot_management/nativeversionstore-prune-previous-versio.py:0`
## `Version Pruning`
`version_and_snapshot_management/version-pruning.py:0`
FILE:references/components/write_operations.md
# write_operations (5 classes)
## `NativeVersionStore.write`
`write_operations/nativeversionstore-write.py:0`
## `NativeVersionStore.append`
`write_operations/nativeversionstore-append.py:0`
## `NativeVersionStore.update`
`write_operations/nativeversionstore-update.py:0`
## `NativeVersionStore.stage`
`write_operations/nativeversionstore-stage.py:0`
## `Write Mode`
`write_operations/write-mode.py:0`
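Volatility modeling and forecasting with GARCH-family models, with Sharpe ratio statistical inference and SPA model-comparison tests, applied to global market risk management.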
---
name: arch-garch-volatility
description: |-
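Volatility modeling and forecasting with GARCH-family models, with Sharpe ratio statistical inference and SPA model-comparison tests, applied to global market risk management.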
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-124"
compiled_at: "2026-04-22T13:01:01.570350+00:00"
capability_markets: "global"
capability_activities: "derivatives-pricing"
sop_version: "crystal-compilation-v6.1"
---
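# GARCH Volatility Models (arch-garch-volatility)
> Volatility modeling and forecasting with GARCH-family models, with Sharpe ratio statistical inference and SPA model-comparison tests, applied to global market risk management.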
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (9 total)
### Sharpe Ratio Bootstrap Statistical Inference (`UC-101`)
Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using bootstrap methods to quantify uncertainty in risk-adjusted performance metrics.
**Triggers**: bootstrap, sharpe ratio, statistical inference
### Multiple Model Comparison with SPA Test (`UC-102`)
Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA) test to determine if any models significantly outperform the benchmark.
**Triggers**: model comparison, SPA test, multiple models
### Oil Price Cointegration Analysis (`UC-103`)
Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting spread opportunities using Engle-Granger and Phillips-Ouliaris tests.
**Triggers**: cointegration, unit root, ADF test
For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (15 total)
- **`AP-DERIVATIVES-PRICING-001`**: Instrument NPV called without attached pricing engine
- **`AP-DERIVATIVES-PRICING-002`**: BSM forward price ignores dividend yield
- **`AP-DERIVATIVES-PRICING-003`**: Negative discount factors passed to log-domain interpolation
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-124. Evidence verify ratio = 47.2% and audit fail total = 32. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source of truth | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-124` blueprint at 2026-04-22T13:01:01.570350+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Oil Price Cointegration Analysis
  - Multiple Model Comparison with SPA Test
  - Sharpe Ratio Bootstrap Statistical Inference
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## FinancePy (finance-bp-101) (3)
### `AP-DERIVATIVES-PRICING-003` — Negative discount factors passed to log-domain interpolation <sub>(high)</sub>
When Numba-jitted interpolation functions perform log transformation on discount factors, negative or zero values cause domain errors. This occurs because log(-x) and log(0) are mathematically undefined. The consequence is runtime crashes in jitted functions and complete failure of discount curve interpolation, blocking all downstream pricing calculations.
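A guard of the following shape, run before any log transformation, turns the domain error into an explicit message. This is a plain-Python sketch with a hypothetical function name, not FinancePy's implementation.

```python
import math


def log_discount_factors(dfs):
    """Return ln(df) for each discount factor.

    Raises ValueError on zero, negative, NaN, or infinite inputs instead of
    letting math.log fail (or a jitted kernel crash) further downstream.
    """
    logs = []
    for i, df in enumerate(dfs):
        if not math.isfinite(df) or df <= 0.0:
            raise ValueError(
                f"discount factor at index {i} must be finite and > 0, got {df}"
            )
        logs.append(math.log(df))
    return logs
```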
### `AP-DERIVATIVES-PRICING-004` — Non-monotonic time points in discount curve interpolation <sub>(high)</sub>
Interpolation over non-monotonically increasing time points produces undefined behavior at crossing times, causing discount factors to be incorrectly computed where time values overlap. This corrupts the entire term structure because the bootstrap algorithm cannot determine which discount factor corresponds to which maturity. The consequence is incorrect present value calculations across all downstream products priced against the curve.
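The monotonicity requirement can be checked up front with a few lines. The helper below is a hypothetical sketch and is not taken from FinancePy.

```python
def require_strictly_increasing(times):
    """Return times as a list, or raise ValueError at the first violation.

    The comparison is written as `not (b > a)` so that NaN entries, equal
    neighbours, and decreasing neighbours are all rejected.
    """
    times = list(times)
    for i in range(1, len(times)):
        if not (times[i] > times[i - 1]):
            raise ValueError(
                f"times must be strictly increasing: "
                f"times[{i - 1}]={times[i - 1]} and times[{i}]={times[i]}"
            )
    return times
```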
### `AP-DERIVATIVES-PRICING-005` — Bootstrap calibration instruments not in maturity order <sub>(high)</sub>
When building yield curves from market instruments (deposits, FRAs, swaps), the instruments must be provided in strictly increasing maturity order. Out-of-order instruments cause the bootstrap algorithm to solve for discount factors at incorrect time points, corrupting the entire term structure. The consequence is wrong forward rates and discount factors that propagate into all priced instruments.
## QuantLib-SWIG (finance-bp-123) (4)
### `AP-DERIVATIVES-PRICING-001` — Instrument NPV called without attached pricing engine <sub>(high)</sub>
Calling NPV() on a derivatives instrument without first calling setPricingEngine() returns uninitialized garbage values or throws null pointer exceptions. This occurs because the Instrument class relies on the attached PricingEngine to perform actual valuation logic. The consequence is silently incorrect pricing results that appear valid, potentially leading to bad trading decisions.
### `AP-DERIVATIVES-PRICING-006` — Option Exercise type mismatches VanillaOption constructor <sub>(high)</sub>
VanillaOption requires both a StrikedTypePayoff and a matching Exercise object. Using the wrong Exercise type (e.g., AmericanExercise for a European option) causes compilation failures in C++ or runtime errors in SWIG bindings. The consequence is that the pricing system cannot initialize options, blocking all option pricing workflows.
### `AP-DERIVATIVES-PRICING-013` — Evaluation date not set before QuantLib term structure construction <sub>(medium)</sub>
QuantLib requires ql.Settings.instance().evaluationDate to be set before constructing yield term structures and instruments. Without an explicit evaluation date, the curve reference date becomes undefined, causing date calculations to fail or produce incorrect settlement dates. The consequence is wrong discount factors and NPV calculations across the entire portfolio.
### `AP-DERIVATIVES-PRICING-014` — Market quotes passed without QuoteHandle wrapper <sub>(medium)</sub>
QuantLib's observer pattern requires all market quotes to be wrapped in QuoteHandle before passing to rate helpers. Raw quote values bypass the observable notification mechanism, causing dependent instruments to never recalculate when market data updates. The consequence is stale pricing that doesn't reflect current market conditions.
## arch (finance-bp-124) (2)
### `AP-DERIVATIVES-PRICING-007` — NaN/inf values in ARCH model input data <sub>(high)</sub>
ARCH model estimation relies on recursive variance computations and scipy optimize. Non-finite input values (NaN, inf) cause optimizers to produce NaN results and recursive variance calculations to fail. The consequence is complete model estimation failure with meaningless outputs that appear valid, leading to incorrect volatility forecasts and risk misestimation.
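A pre-estimation check along these lines catches the problem before the data reaches the optimizer. It is a standard-library sketch with a hypothetical function name; it does not call the arch package.

```python
import math


def require_finite(returns):
    """Return the series as a list of floats, or raise if any value is NaN/inf."""
    values = [float(x) for x in returns]
    bad = [i for i, x in enumerate(values) if not math.isfinite(x)]
    if bad:
        raise ValueError(
            f"{len(bad)} non-finite value(s) in input, first at index {bad[0]}"
        )
    return values
```

Whether to drop, fill, or reject the offending observations is a modelling decision, so the sketch only reports them.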
### `AP-DERIVATIVES-PRICING-008` — ARCH parameter array concatenation in wrong order <sub>(high)</sub>
ARCHModel composes from three components (mean, volatility, distribution) and requires parameter arrays concatenated in fixed order: [mean_params, volatility_params, distribution_params]. Incorrect ordering causes _parse_parameters to assign wrong values to wrong components, producing mathematically invalid models (e.g., volatility parameters interpreted as distribution parameters). The consequence is invalid conditional variance forecasts.
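The fixed-order layout can be illustrated by partitioning a flat vector with cumulative offsets. This sketch is not arch's `_parse_parameters`; the function name and arguments are assumptions for illustration.

```python
from itertools import accumulate


def split_params(params, n_mean, n_vol, n_dist):
    """Split a flat vector laid out as [mean, volatility, distribution]."""
    counts = (n_mean, n_vol, n_dist)
    if len(params) != sum(counts):
        raise ValueError(
            f"expected {sum(counts)} parameters, got {len(params)}"
        )
    # Offsets mark where each component starts and ends, e.g. [0, 1, 4, 5].
    offsets = [0, *accumulate(counts)]
    mean, vol, dist = (
        list(params[offsets[i]:offsets[i + 1]]) for i in range(3)
    )
    return mean, vol, dist
```

For a constant mean, GARCH(1,1) volatility, and a Student's t distribution, the counts would be 1, 3, and 1, so a vector such as `[mu, omega, alpha, beta, nu]` splits into three lists in that order.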
## py_vollib (finance-bp-127) (6)
### `AP-DERIVATIVES-PRICING-002` — BSM forward price ignores dividend yield <sub>(high)</sub>
When calculating option prices on dividend-paying stocks using BSM, the forward price must be adjusted as F = S * exp((r-q)*t). Omitting the dividend yield adjustment (using F = S * exp(r*t)) causes systematic mispricing for all dividend-paying assets. The consequence is consistently wrong option prices that diverge from market prices, leading to arbitrage opportunities and trading losses.
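The adjustment itself is one line. The sketch below assumes a continuously compounded rate `r` and continuous dividend yield `q`, both annualized, with `t` in years; the function name is hypothetical.

```python
import math


def forward_price(spot, r, q, t):
    """Forward price F = S * exp((r - q) * t) under continuous carry."""
    return spot * math.exp((r - q) * t)
```

Setting `q = 0` recovers the unadjusted formula `F = S * exp(r * t)`, which overstates the forward for any asset that pays dividends.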
### `AP-DERIVATIVES-PRICING-009` — Zero or negative time-to-expiration in option pricing <sub>(high)</sub>
Option pricing formulas (Black-Scholes, Black model) compute sqrt(t) in the denominator. Zero time causes division by zero; negative time produces NaN in d1/d2 calculations. The consequence is invalid option prices (NaN, inf) that break downstream Greeks calculations and hedging workflows.
### `AP-DERIVATIVES-PRICING-010` — Black model applies spot price instead of forward price <sub>(high)</sub>
The Black model is designed for options on futures/forwards and expects the futures price F as input, not the spot price S. Using spot directly causes incorrect pricing because the Black formula treats its input as a forward price with zero drift under the pricing measure, whereas the spot price drifts at the cost of carry. The consequence is systematically wrong forward option prices.
### `AP-DERIVATIVES-PRICING-011` — Missing discount factor in Black model pricing <sub>(medium)</sub>
Black model pricing must apply time value discounting with deflater = exp(-r*t) to undiscounted option prices. Omitting the discount factor produces forward option prices that exceed their fair value by the risk-free compounding amount. The consequence is violation of time value of money principles and prices that cannot be used for fair valuation or hedging.
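A self-contained sketch of a discounted Black (1976) price follows. It is not py_vollib's code: the function names are hypothetical, the normal CDF is built from `math.erf`, and the input guards also reject zero or negative time to expiration.

```python
import math


def _norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_price(flag, F, K, t, r, sigma):
    """Discounted Black-76 price of a call ('c') or put ('p') on a forward F."""
    if flag not in ("c", "p"):
        raise ValueError(f"flag must be 'c' or 'p', got {flag!r}")
    if t <= 0 or sigma <= 0 or F <= 0 or K <= 0:
        raise ValueError("F, K, t and sigma must all be strictly positive")

    sqrt_t = math.sqrt(t)
    d1 = (math.log(F / K) + 0.5 * sigma ** 2 * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    if flag == "c":
        undiscounted = F * _norm_cdf(d1) - K * _norm_cdf(d2)
    else:
        undiscounted = K * _norm_cdf(-d2) - F * _norm_cdf(-d1)

    deflater = math.exp(-r * t)  # the discount factor the text requires
    return deflater * undiscounted
```

A quick sanity check is put-call parity on the forward: the call price minus the put price should equal `exp(-r * t) * (F - K)`.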
### `AP-DERIVATIVES-PRICING-012` — Invalid flag parameter ('c'/'p') passed to py_vollib without validation <sub>(medium)</sub>
py_vollib binary_flag dict only contains keys 'c' and 'p'. Passing any other flag value causes KeyError exception. The library lacks input validation and crashes on invalid inputs. The consequence is unhandled exceptions in production systems when flag values come from external sources with unexpected formats.
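Callers can normalize flags at the boundary before they reach the library. In the sketch below, `BINARY_FLAG` is a local mapping modeled on the description above (its +1/-1 values are an assumption), and the `call`/`put` aliases and function name are hypothetical.

```python
BINARY_FLAG = {"c": 1, "p": -1}          # local stand-in, values assumed
_ALIASES = {"call": "c", "put": "p"}     # hypothetical external spellings


def normalize_flag(raw):
    """Map an external flag value to 'c' or 'p', or raise ValueError."""
    if not isinstance(raw, str):
        raise ValueError(f"option flag must be a string, got {type(raw).__name__}")
    flag = raw.strip().lower()
    flag = _ALIASES.get(flag, flag)
    if flag not in BINARY_FLAG:
        raise ValueError(f"option flag must be 'c' or 'p', got {raw!r}")
    return flag
```

Raising `ValueError` with the offending input gives a clearer failure than a bare `KeyError` from a dictionary lookup deep inside the pricing call.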
### `AP-DERIVATIVES-PRICING-015` — Implied volatility computed without proper bounds validation <sub>(medium)</sub>
When computing implied volatility, option prices outside theoretical bounds (below intrinsic value or above maximum) must raise appropriate exceptions. Returning invalid IV values (negative volatility or extreme values) violates mathematical definitions and leads to incorrect pricing, risk calculations, and hedging ratios. The consequence is systemic pricing errors across all vol-dependent derivatives.
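The bounds check can be sketched for a European option on a forward: the discounted intrinsic value is the lower bound, and the discounted forward (call) or strike (put) is the upper bound. The exception and function names below are hypothetical and only loosely modeled on the behavior described above.

```python
import math


class PriceBelowIntrinsic(ValueError):
    """Hypothetical: price is under the discounted intrinsic value."""


class PriceAboveMaximum(ValueError):
    """Hypothetical: price is at or over the no-arbitrage upper bound."""


def check_price_bounds(flag, price, F, K, t, r):
    """Return (lower, upper) if an implied volatility can exist, else raise."""
    if flag not in ("c", "p"):
        raise ValueError(f"flag must be 'c' or 'p', got {flag!r}")
    discount = math.exp(-r * t)
    if flag == "c":
        lower, upper = discount * max(F - K, 0.0), discount * F
    else:
        lower, upper = discount * max(K - F, 0.0), discount * K
    if price < lower:
        raise PriceBelowIntrinsic(f"price {price} < intrinsic {lower}")
    if price >= upper:
        raise PriceAboveMaximum(f"price {price} >= maximum {upper}")
    return lower, upper
```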
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-124--arch
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 40, 'total_functions': 0, 'total_stages': 7}
## Modules (7)
- [data_input_&_validation](components/data_input_-_validation.md): 4 classes
- [model_specification](components/model_specification.md): 7 classes
- [parameter_estimation](components/parameter_estimation.md): 5 classes
- [forecasting_&_simulation](components/forecasting_-_simulation.md): 5 classes
- [unit_root_&_cointegration_testing](components/unit_root_-_cointegration_testing.md): 7 classes
- [bootstrap_&_multiple_comparison](components/bootstrap_-_multiple_comparison.md): 7 classes
- [results_reporting_&_visualization](components/results_reporting_-_visualization.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 107
fatal_constraints_count: 77
non_fatal_constraints_count: 151
use_cases_count: 9
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **9**
## `KUC-101`
**Source**: `examples/bootstrap_examples.ipynb`
Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using bootstrap methods to quantify uncertainty in risk-adjusted performance metrics.
## `KUC-102`
**Source**: `examples/multiple-comparison_examples.ipynb`
Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA) test to determine if any models significantly outperform the benchmark.
## `KUC-103`
**Source**: `examples/unitroot_cointegration_examples.ipynb`
Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting spread opportunities using Engle-Granger and Phillips-Ouliaris tests.
## `KUC-104`
**Source**: `examples/unitroot_examples.ipynb`
Tests for stationarity in credit spreads (BAA-AAA) using Augmented Dickey-Fuller tests to determine if mean-reversion trading strategies are applicable.
## `KUC-105`
**Source**: `examples/univariate_forecasting_with_exogenous_variables.ipynb`
Forecasts univariate time series using Autoregressive models with exogenous variables (ARX) to capture the impact of external factors on the target variable.
## `KUC-106`
**Source**: `examples/univariate_using_fixed_variance.ipynb`
Demonstrates how to specify a HARX mean model with fixed/external variance inputs and iteratively fit volatility models using the estimated conditional volatility.
## `KUC-107`
**Source**: `examples/univariate_volatility_forecasting.ipynb`
Forecasts future volatility of S&P 500 returns using GARCH models, including multi-step ahead forecasts and rolling window out-of-sample predictions.
## `KUC-108`
**Source**: `examples/univariate_volatility_modeling.ipynb`
Fits and compares different GARCH volatility model specifications (symmetric, asymmetric, power) with various error distributions to characterize S&P 500 return volatility dynamics.
## `KUC-109`
**Source**: `examples/univariate_volatility_scenarios.ipynb`
Generates multiple volatility scenarios for NASDAQ returns using simulation-based forecasting methods, useful for risk management and option pricing applications.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-DERIVATIVES-PRICING-001` — Strict input validation before financial calculations
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing
Both FinancePy and QuantLib-SWIG enforce strict validation of all input parameters before any financial computation. FinancePy validates day count types, date arguments, tolerance parameters, and max iterations. QuantLib-SWIG validates exercise types and swap direction enums. This pattern prevents corrupted calculations and provides clear error messages. Apply this pattern by validating all inputs at function entry points.
## `CW-DERIVATIVES-PRICING-002` — Bootstrap requires ordered instrument calibration
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing
Both FinancePy and QuantLib-SWIG require calibration instruments to be provided in strict maturity order for curve bootstrapping. FinancePy enforces monotonically increasing time points and validates instrument sequencing (deposits before FRAs before swaps). QuantLib-SWIG uses bootstrap helpers (DepositRateHelper, FraRateHelper, SwapRateHelper) that assume ordered inputs. This ensures the bootstrap algorithm solves for discount factors at mathematically correct time points.
## `CW-DERIVATIVES-PRICING-003` — Handle pattern for lazy evaluation chains
**From**: QuantLib-SWIG · **Applicable to**: derivatives-pricing
QuantLib-SWIG requires wrapping market data (quotes, term structures) in Handle objects to enable lazy evaluation and automatic recalculation. QuoteHandle for market quotes and Handle for term structures enable the observer pattern. When market data updates, all dependent instruments automatically recalculate. This pattern is essential for live pricing systems where prices must reflect current market conditions.
## `CW-DERIVATIVES-PRICING-004` — Parameter composition requires fixed ordering and partitioning
**From**: arch · **Applicable to**: derivatives-pricing
arch enforces a strict parameter composition pattern where mean, volatility, and distribution parameters must be concatenated in fixed order with explicit offset partitioning. The offsets array partitions the unified parameter vector into components. This pattern prevents parameter assignment errors that would corrupt model components. Apply this when composing financial models from multiple sub-components.
## `CW-DERIVATIVES-PRICING-005` — Strict mathematical constraint enforcement
**From**: arch, py_vollib · **Applicable to**: derivatives-pricing
Both arch and py_vollib enforce strict mathematical constraints: arch enforces volatility model stationarity constraints (A.dot(params) - b >= 0) for SLSQP optimization; py_vollib validates implied volatility is positive and option prices within intrinsic/maximum bounds. Violating these constraints produces mathematically invalid results. Always enforce domain constraints on all financial model parameters.
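To make the inequality form `A.dot(params) - b >= 0` concrete, the sketch below writes out illustrative matrices for a GARCH(1,1) model with parameters `[omega, alpha, beta]`. The matrices and function names are assumptions for illustration and are not copied from arch.

```python
def garch11_constraints():
    """Illustrative (A, b) for params = [omega, alpha, beta]."""
    A = [
        [1.0, 0.0, 0.0],    # omega >= 0
        [0.0, 1.0, 0.0],    # alpha >= 0
        [0.0, 0.0, 1.0],    # beta  >= 0
        [0.0, -1.0, -1.0],  # 1 - alpha - beta >= 0 (stationarity)
    ]
    b = [0.0, 0.0, 0.0, -1.0]
    return A, b


def satisfies(A, b, params):
    """True when every row of A.dot(params) - b is non-negative."""
    return all(
        sum(a * p for a, p in zip(row, params)) - bound >= 0.0
        for row, bound in zip(A, b)
    )
```

An optimizer such as SLSQP can take constraints in this linear inequality form, so the same `(A, b)` pair serves both for validation and for estimation.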
## `CW-DERIVATIVES-PRICING-006` — Forward price adjustment for dividend yield in BSM
**From**: py_vollib · **Applicable to**: derivatives-pricing
py_vollib demonstrates the correct BSM implementation: compute forward price F = S * exp((r-q)*t) to adjust for continuous dividend yield before passing to the pricing engine. This pattern is essential for all options on dividend-paying assets. Forgetting the dividend adjustment causes systematic mispricing for the entire equity derivatives book.
## `CW-DERIVATIVES-PRICING-007` — Monotonicity validation for interpolation arrays
**From**: FinancePy · **Applicable to**: derivatives-pricing
FinancePy enforces strictly monotonically increasing time arrays before interpolation operations. This prevents undefined behavior at crossing times and ensures each time point maps to exactly one discount factor. Apply this validation whenever implementing interpolation over financial time series (discount curves, volatility surfaces, forward rates).
## `CW-DERIVATIVES-PRICING-008` — Production vs reference implementation selection
**From**: py_vollib · **Applicable to**: derivatives-pricing
py_vollib explicitly distinguishes between ref_python (slow, educational) and production (fast, C-based lets_be_rational) implementations. Using the reference implementation in production causes 10-100x performance degradation. Always select the implementation tier that matches the use case: reference for testing and education, optimized for production trading systems.
FILE:references/components/bootstrap_-_multiple_comparison.md
# bootstrap_&_multiple_comparison (7 classes)
## `IIDBootstrap.conf_int`
`bootstrap_&_multiple_comparison/iidbootstrap-conf-int.py:0`
## `StationaryBootstrap.__init__`
`bootstrap_&_multiple_comparison/stationarybootstrap-init.py:0`
## `MCS.__init__`
`bootstrap_&_multiple_comparison/mcs-init.py:0`
## `SPA.__init__`
`bootstrap_&_multiple_comparison/spa-init.py:0`
## `Bootstrap type`
`bootstrap_&_multiple_comparison/bootstrap-type.py:0`
## `Confidence interval method`
`bootstrap_&_multiple_comparison/confidence-interval-method.py:0`
## `Multiple comparison procedure`
`bootstrap_&_multiple_comparison/multiple-comparison-procedure.py:0`
FILE:references/components/data_input_-_validation.md
# data_input_&_validation (4 classes)
## `ARCHModel.__init__`
`data_input_&_validation/archmodel-init.py:0`
## `ensure1d`
`data_input_&_validation/ensure1d.py:0`
## `to_array_1d`
`data_input_&_validation/to-array-1d.py:0`
## `Input type coercion`
`data_input_&_validation/input-type-coercion.py:0`
FILE:references/components/forecasting_-_simulation.md
# forecasting_&_simulation (5 classes)
## `ARCHModelResult.forecast`
`forecasting_&_simulation/archmodelresult-forecast.py:0`
## `VarianceForecast._analytic_forecast`
`forecasting_&_simulation/varianceforecast-analytic-forecast.py:0`
## `VarianceForecast._simulation_forecast`
`forecasting_&_simulation/varianceforecast-simulation-forecast.py:0`
## `Forecasting method`
`forecasting_&_simulation/forecasting-method.py:0`
## `Alignment`
`forecasting_&_simulation/alignment.py:0`
FILE:references/components/model_specification.md
# model_specification (7 classes)
## `ARCHModel.fit`
`model_specification/archmodel-fit.py:0`
## `ARCHModel.forecast`
`model_specification/archmodel-forecast.py:0`
## `GARCH.__init__`
`model_specification/garch-init.py:0`
## `HARX.__init__`
`model_specification/harx-init.py:0`
## `Mean model`
`model_specification/mean-model.py:0`
## `Volatility model`
`model_specification/volatility-model.py:0`
## `Distribution`
`model_specification/distribution.py:0`
FILE:references/components/parameter_estimation.md
# parameter_estimation (5 classes)
## `ARCHModel.fit`
`parameter_estimation/archmodel-fit.py:0`
## `ARCHModelResult.summary`
`parameter_estimation/archmodelresult-summary.py:0`
## `ARCHModelResult.conf_int`
`parameter_estimation/archmodelresult-conf-int.py:0`
## `Starting values`
`parameter_estimation/starting-values.py:0`
## `Covariance type`
`parameter_estimation/covariance-type.py:0`
FILE:references/components/results_reporting_-_visualization.md
# results_reporting_&_visualization (5 classes)
## `ARCHModelResult.summary`
`results_reporting_&_visualization/archmodelresult-summary.py:0`
## `ARCHModelResult.conf_int`
`results_reporting_&_visualization/archmodelresult-conf-int.py:0`
## `ARCHModelResult.arch_lm_test`
`results_reporting_&_visualization/archmodelresult-arch-lm-test.py:0`
## `WaldTestStatistic`
`results_reporting_&_visualization/waldteststatistic.py:0`
## `Output format`
`results_reporting_&_visualization/output-format.py:0`
FILE:references/components/unit_root_-_cointegration_testing.md
# unit_root_&_cointegration_testing (7 classes)
## `ADF.__init__`
`unit_root_&_cointegration_testing/adf-init.py:0`
## `UnitRootTest.stat`
`unit_root_&_cointegration_testing/unitroottest-stat.py:0`
## `CointegrationTestResult.stat`
`unit_root_&_cointegration_testing/cointegrationtestresult-stat.py:0`
## `DynamicOLS.__init__`
`unit_root_&_cointegration_testing/dynamicols-init.py:0`
## `Test statistic`
`unit_root_&_cointegration_testing/test-statistic.py:0`
## `Lag selection method`
`unit_root_&_cointegration_testing/lag-selection-method.py:0`
## `Covariance kernel`
`unit_root_&_cointegration_testing/covariance-kernel.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-124-v5.3
version: v6.1
blueprint_id: finance-bp-124
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:01:01.570350+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- derivatives-pricing
upgraded_from: finance-bp-124-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:34.223301+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-124--arch/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-124--arch/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-DERIVATIVES-PRICING-001
title: Instrument NPV called without attached pricing engine
description: Calling NPV() on a derivatives instrument without first calling setPricingEngine() returns uninitialized garbage
values or throws null pointer exceptions. This occurs because the Instrument class relies on the attached PricingEngine
to perform actual valuation logic. The consequence is silently incorrect pricing results that appear valid, potentially
leading to bad trading decisions.
project_source: QuantLib-SWIG (finance-bp-123)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-002
title: BSM forward price ignores dividend yield
description: When calculating option prices on dividend-paying stocks using BSM, the forward price must be adjusted as F
= S * exp((r-q)*t). Omitting the dividend yield adjustment (using F = S * exp(r*t)) causes systematic mispricing for all
dividend-paying assets. The consequence is consistently wrong option prices that diverge from market prices, leading to
arbitrage opportunities and trading losses.
project_source: py_vollib (finance-bp-127)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-003
title: Negative discount factors passed to log-domain interpolation
description: When Numba-jitted interpolation functions perform log transformation on discount factors, negative or zero
values cause domain errors. This occurs because log(-x) and log(0) are mathematically undefined. The consequence is runtime
crashes in jitted functions and complete failure of discount curve interpolation, blocking all downstream pricing calculations.
project_source: FinancePy (finance-bp-101)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-004
title: Non-monotonic time points in discount curve interpolation
description: Interpolation over non-monotonically increasing time points produces undefined behavior at crossing times,
causing discount factors to be incorrectly computed where time values overlap. This corrupts the entire term structure
because the bootstrap algorithm cannot determine which discount factor corresponds to which maturity. The consequence
is incorrect present value calculations across all downstream products priced against the curve.
project_source: FinancePy (finance-bp-101)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-005
title: Bootstrap calibration instruments not in maturity order
description: When building yield curves from market instruments (deposits, FRAs, swaps), the instruments must be provided
in strictly increasing maturity order. Out-of-order instruments cause the bootstrap algorithm to solve for discount factors
at incorrect time points, corrupting the entire term structure. The consequence is wrong forward rates and discount factors
that propagate into all priced instruments.
project_source: FinancePy (finance-bp-101)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-006
title: Option Exercise type mismatches VanillaOption constructor
description: VanillaOption requires both a StrikedTypePayoff and a matching Exercise object. Using the wrong Exercise type (e.g.,
AmericanExercise for European option) causes compilation failures in C++ or runtime errors in SWIG bindings. The consequence
is the pricing system cannot initialize options, blocking all option pricing workflows.
project_source: QuantLib-SWIG (finance-bp-123)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-007
title: NaN/inf values in ARCH model input data
description: ARCH model estimation relies on recursive variance computations and scipy optimize. Non-finite input values
(NaN, inf) cause optimizers to produce NaN results and recursive variance calculations to fail. The consequence is complete
model estimation failure with meaningless outputs that appear valid, leading to incorrect volatility forecasts and risk
misestimation.
project_source: arch (finance-bp-124)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-008
title: ARCH parameter array concatenation in wrong order
description: 'ARCHModel composes from three components (mean, volatility, distribution) and requires parameter arrays concatenated
in fixed order: [mean_params, volatility_params, distribution_params]. Incorrect ordering causes _parse_parameters to
assign wrong values to wrong components, producing mathematically invalid models (e.g., volatility parameters interpreted
as distribution parameters). The consequence is invalid conditional variance forecasts.'
project_source: arch (finance-bp-124)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-009
title: Zero or negative time-to-expiration in option pricing
description: Option pricing formulas (Black-Scholes, Black model) compute sqrt(t) in the denominator. Zero time causes division
by zero; negative time produces NaN in d1/d2 calculations. The consequence is invalid option prices (NaN, inf) that break
downstream Greeks calculations and hedging workflows.
project_source: py_vollib (finance-bp-127)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-010
title: Black model applies spot price instead of forward price
description: The Black model is designed for options on futures/forwards and expects futures price F as input, not spot
price S. Using spot directly causes incorrect pricing because the Black formula assumes the futures/forward price follows
driftless geometric Brownian motion under the pricing measure, whereas the spot price drifts at the risk-free rate less
any carry. The consequence is systematically wrong futures option prices.
project_source: py_vollib (finance-bp-127)
severity: high
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-011
title: Missing discount factor in Black model pricing
description: Black model pricing must apply time value discounting with deflater = exp(-r*t) to undiscounted option prices.
Omitting the discount factor produces forward option prices that exceed their fair value by the risk-free compounding
amount. The consequence is violation of time value of money principles and prices that cannot be used for fair valuation
or hedging.
project_source: py_vollib (finance-bp-127)
severity: medium
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-012
title: Flag value other than 'c' or 'p' passed to py_vollib without validation
description: py_vollib binary_flag dict only contains keys 'c' and 'p'. Passing any other flag value raises a KeyError.
The library lacks input validation and crashes on invalid inputs. The consequence is unhandled exceptions in production
systems when flag values come from external sources with unexpected formats.
project_source: py_vollib (finance-bp-127)
severity: medium
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-013
title: Evaluation date not set before QuantLib term structure construction
description: QuantLib requires ql.Settings.instance().evaluationDate to be set before constructing yield term structures
and instruments. Without an explicit evaluation date, the curve reference date becomes undefined, causing date calculations
to fail or produce incorrect settlement dates. The consequence is wrong discount factors and NPV calculations across the
entire portfolio.
project_source: QuantLib-SWIG (finance-bp-123)
severity: medium
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-014
title: Market quotes passed without QuoteHandle wrapper
description: QuantLib's observer pattern requires all market quotes to be wrapped in QuoteHandle before passing to rate
helpers. Raw quote values bypass the observable notification mechanism, causing dependent instruments to never recalculate
when market data updates. The consequence is stale pricing that doesn't reflect current market conditions.
project_source: QuantLib-SWIG (finance-bp-123)
severity: medium
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
- id: AP-DERIVATIVES-PRICING-015
title: Implied volatility computed without proper bounds validation
description: When computing implied volatility, option prices outside theoretical bounds (below intrinsic value or above
maximum) must raise appropriate exceptions. Returning invalid IV values (negative volatility or extreme values) violates
mathematical definitions and leads to incorrect pricing, risk calculations, and hedging ratios. The consequence is systemic
pricing errors across all vol-dependent derivatives.
project_source: py_vollib (finance-bp-127)
severity: medium
applicable_to_tags:
markets:
- global
activities:
- derivatives-pricing
_source_file: anti-patterns/derivatives-pricing.yaml
cross_project_wisdom:
- wisdom_id: CW-DERIVATIVES-PRICING-001
source_project: FinancePy, QuantLib-SWIG
pattern_name: Strict input validation before financial calculations
description: Both FinancePy and QuantLib-SWIG enforce strict validation of all input parameters before any financial computation.
FinancePy validates day count types, date arguments, tolerance parameters, and max iterations. QuantLib-SWIG validates
exercise types and swap direction enums. This pattern prevents corrupted calculations and provides clear error messages.
Apply this pattern by validating all inputs at function entry points.
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-002
source_project: FinancePy, QuantLib-SWIG
pattern_name: Bootstrap requires ordered instrument calibration
description: Both FinancePy and QuantLib-SWIG require calibration instruments to be provided in strict maturity order for
curve bootstrapping. FinancePy enforces monotonically increasing time points and validates instrument sequencing (deposits
before FRAs before swaps). QuantLib-SWIG uses bootstrap helpers (DepositRateHelper, FraRateHelper, SwapRateHelper) that
assume ordered inputs. This ensures the bootstrap algorithm solves for discount factors at mathematically correct time
points.
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-003
source_project: QuantLib-SWIG
pattern_name: Handle pattern for lazy evaluation chains
description: QuantLib-SWIG requires wrapping market data (quotes, term structures) in Handle objects to enable lazy evaluation
and automatic recalculation. QuoteHandle for market quotes and Handle for term structures enable the observer pattern.
When market data updates, all dependent instruments automatically recalculate. This pattern is essential for live pricing
systems where prices must reflect current market conditions.
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-004
source_project: arch
pattern_name: Parameter composition requires fixed ordering and partitioning
description: arch enforces a strict parameter composition pattern where mean, volatility, and distribution parameters must
be concatenated in fixed order with explicit offset partitioning. The offsets array partitions the unified parameter vector
into components. This pattern prevents parameter assignment errors that would corrupt model components. Apply this when
composing financial models from multiple sub-components.
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-005
source_project: arch, py_vollib
pattern_name: Strict mathematical constraint enforcement
description: 'Both arch and py_vollib enforce strict mathematical constraints: arch enforces volatility model stationarity
constraints (A.dot(params) - b >= 0) for SLSQP optimization; py_vollib validates implied volatility is positive and option
prices within intrinsic/maximum bounds. Violating these constraints produces mathematically invalid results. Always enforce
domain constraints on all financial model parameters.'
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-006
source_project: py_vollib
pattern_name: Forward price adjustment for dividend yield in BSM
description: 'py_vollib demonstrates the correct BSM implementation: compute forward price F = S * exp((r-q)*t) to adjust
for continuous dividend yield before passing to the pricing engine. This pattern is essential for all options on dividend-paying
assets. Forgetting the dividend adjustment causes systematic mispricing for the entire equity derivatives book.'
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-007
source_project: FinancePy
pattern_name: Monotonicity validation for interpolation arrays
description: FinancePy enforces strictly monotonically increasing time arrays before interpolation operations. This prevents
undefined behavior at crossing times and ensures each time point maps to exactly one discount factor. Apply this validation
whenever implementing interpolation over financial time series (discount curves, volatility surfaces, forward rates).
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
- wisdom_id: CW-DERIVATIVES-PRICING-008
source_project: py_vollib
pattern_name: Production vs reference implementation selection
description: py_vollib explicitly distinguishes between ref_python (slow, educational) and production (fast, C-based lets_be_rational)
implementations. Using the reference implementation in production causes 10-100x performance degradation. Always select
the appropriate implementation tier based on use case requirements—reference for testing/education, optimized for production
trading systems.
applicable_to_activity: derivatives-pricing
_source_file: cross-project-wisdom/derivatives-pricing.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: examples/bootstrap_examples.ipynb
business_problem: Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using bootstrap
methods to quantify uncertainty in risk-adjusted performance metrics.
intent_keywords:
- bootstrap
- sharpe ratio
- statistical inference
- confidence intervals
- stationary bootstrap
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-102
source_file: examples/multiple-comparison_examples.ipynb
business_problem: Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA) test to
determine if any models significantly outperform the benchmark.
intent_keywords:
- model comparison
- SPA test
- multiple models
- benchmark comparison
- superior predictive ability
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-103
source_file: examples/unitroot_cointegration_examples.ipynb
business_problem: Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting
spread opportunities using Engle-Granger and Phillips-Ouliaris tests.
intent_keywords:
- cointegration
- unit root
- ADF test
- Engle-Granger
- oil prices
- mean reversion
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-104
source_file: examples/unitroot_examples.ipynb
business_problem: Tests for stationarity in credit spreads (BAA-AAA) using Augmented Dickey-Fuller tests to determine if
mean-reversion trading strategies are applicable.
intent_keywords:
- unit root
- ADF test
- stationarity
- credit spread
- mean reversion
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-105
source_file: examples/univariate_forecasting_with_exogenous_variables.ipynb
business_problem: Forecasts univariate time series using Autoregressive models with exogenous variables (ARX) to capture
the impact of external factors on the target variable.
intent_keywords:
- ARX
- exogenous variables
- forecasting
- autoregressive
- regression
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-106
source_file: examples/univariate_using_fixed_variance.ipynb
business_problem: Demonstrates how to specify a HARX mean model with fixed/external variance inputs and iteratively fit
volatility models using the estimated conditional volatility.
intent_keywords:
- fixed variance
- HARX
- volatility modeling
- GARCH
- VIX
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-107
source_file: examples/univariate_volatility_forecasting.ipynb
business_problem: Forecasts future volatility of S&P 500 returns using GARCH models, including multi-step ahead forecasts
and rolling window out-of-sample predictions.
intent_keywords:
- GARCH
- volatility forecasting
- S&P 500
- conditional variance
- out-of-sample
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-108
source_file: examples/univariate_volatility_modeling.ipynb
business_problem: Fits and compares different GARCH volatility model specifications (symmetric, asymmetric, power) with
various error distributions to characterize S&P 500 return volatility dynamics.
intent_keywords:
- GARCH
- volatility modeling
- S&P 500
- model comparison
- asymmetric GARCH
- t-distribution
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-109
source_file: examples/univariate_volatility_scenarios.ipynb
business_problem: Generates multiple volatility scenarios for NASDAQ returns using simulation-based forecasting methods,
useful for risk management and option pricing applications.
intent_keywords:
- volatility scenarios
- simulation
- NASDAQ
- GARCH
- scenario analysis
- risk management
stage: factor_computation
data_domain: financial_data
type: research_analysis
component_capability_map:
project: finance-bp-124--arch
scan_date: '2026-04-22'
stats:
total_files: 7
total_classes: 40
total_functions: 0
total_stages: 7
modules:
data_input_&_validation:
class_count: 4
stage_id: data_input
stage_order: 1
responsibility: Accept time series data (numpy/pandas), validate finiteness, convert to contiguous float64 arrays for
core computation. Provides type flexibility while ensuring memory layout efficiency for downstream numeric operations.
classes:
- name: ARCHModel.__init__
file: data_input_&_validation/archmodel-init.py
line: 0
kind: required_method
signature: ''
- name: ensure1d
file: data_input_&_validation/ensure1d.py
line: 0
kind: required_method
signature: ''
- name: to_array_1d
file: data_input_&_validation/to-array-1d.py
line: 0
kind: required_method
signature: ''
- name: Input type coercion
file: data_input_&_validation/input-type-coercion.py
line: 0
kind: replaceable_point
design_decision_count: 2
model_specification:
class_count: 7
stage_id: model_specification
stage_order: 2
responsibility: Allow users to compose mean model + volatility process + distribution as pluggable components. Each
component implements a common interface for unified likelihood computation and forecasting.
classes:
- name: ARCHModel.fit
file: model_specification/archmodel-fit.py
line: 0
kind: required_method
signature: ''
- name: ARCHModel.forecast
file: model_specification/archmodel-forecast.py
line: 0
kind: required_method
signature: ''
- name: GARCH.__init__
file: model_specification/garch-init.py
line: 0
kind: required_method
signature: ''
- name: HARX.__init__
file: model_specification/harx-init.py
line: 0
kind: required_method
signature: ''
- name: Mean model
file: model_specification/mean-model.py
line: 0
kind: replaceable_point
- name: Volatility model
file: model_specification/volatility-model.py
line: 0
kind: replaceable_point
- name: Distribution
file: model_specification/distribution.py
line: 0
kind: replaceable_point
design_decision_count: 5
parameter_estimation:
class_count: 5
stage_id: parameter_estimation
stage_order: 3
responsibility: Jointly estimate mean+volatility+distribution parameters via constrained maximum likelihood. Uses SLSQP
with bounds and inequality constraints derived from stationarity/coerciveness requirements.
classes:
- name: ARCHModel.fit
file: parameter_estimation/archmodel-fit.py
line: 0
kind: required_method
signature: ''
- name: ARCHModelResult.summary
file: parameter_estimation/archmodelresult-summary.py
line: 0
kind: required_method
signature: ''
- name: ARCHModelResult.conf_int
file: parameter_estimation/archmodelresult-conf-int.py
line: 0
kind: required_method
signature: ''
- name: Starting values
file: parameter_estimation/starting-values.py
line: 0
kind: replaceable_point
- name: Covariance type
file: parameter_estimation/covariance-type.py
line: 0
kind: replaceable_point
design_decision_count: 4
forecasting_&_simulation:
class_count: 5
stage_id: forecasting
stage_order: 4
responsibility: Generate multi-step volatility/mean forecasts using analytic formulas, simulation, or bootstrap. Handles
alignment (origin vs target), reindexing, and exogenous regressors.
classes:
- name: ARCHModelResult.forecast
file: forecasting_&_simulation/archmodelresult-forecast.py
line: 0
kind: required_method
signature: ''
- name: VarianceForecast._analytic_forecast
file: forecasting_&_simulation/varianceforecast-analytic-forecast.py
line: 0
kind: required_method
signature: ''
- name: VarianceForecast._simulation_forecast
file: forecasting_&_simulation/varianceforecast-simulation-forecast.py
line: 0
kind: required_method
signature: ''
- name: Forecasting method
file: forecasting_&_simulation/forecasting-method.py
line: 0
kind: replaceable_point
- name: Alignment
file: forecasting_&_simulation/alignment.py
line: 0
kind: replaceable_point
design_decision_count: 3
unit_root_&_cointegration_testing:
class_count: 7
stage_id: unitroot_testing
stage_order: 5
responsibility: Test time series for stationarity (unit roots) and cross-series cointegration relationships. Supports
ADF, DFGLS, PhillipsPerron, KPSS, ZivotAndrews, VarianceRatio, Engle-Granger, Phillips-Ouliaris, DOLS, FMOLS.
classes:
- name: ADF.__init__
file: unit_root_&_cointegration_testing/adf-init.py
line: 0
kind: required_method
signature: ''
- name: UnitRootTest.stat
file: unit_root_&_cointegration_testing/unitroottest-stat.py
line: 0
kind: required_method
signature: ''
- name: CointegrationTestResult.stat
file: unit_root_&_cointegration_testing/cointegrationtestresult-stat.py
line: 0
kind: required_method
signature: ''
- name: DynamicOLS.__init__
file: unit_root_&_cointegration_testing/dynamicols-init.py
line: 0
kind: required_method
signature: ''
- name: Test statistic
file: unit_root_&_cointegration_testing/test-statistic.py
line: 0
kind: replaceable_point
- name: Lag selection method
file: unit_root_&_cointegration_testing/lag-selection-method.py
line: 0
kind: replaceable_point
- name: Covariance kernel
file: unit_root_&_cointegration_testing/covariance-kernel.py
line: 0
kind: replaceable_point
design_decision_count: 1
bootstrap_&_multiple_comparison:
class_count: 7
stage_id: bootstrap_inference
stage_order: 6
responsibility: Time-series bootstrap for standard errors/confidence intervals; multiple comparison procedures (MCS,
StepM, SPA) for model selection. Supports block bootstrap (circular, stationary, moving) and independent resampling.
classes:
- name: IIDBootstrap.conf_int
file: bootstrap_&_multiple_comparison/iidbootstrap-conf-int.py
line: 0
kind: required_method
signature: ''
- name: StationaryBootstrap.__init__
file: bootstrap_&_multiple_comparison/stationarybootstrap-init.py
line: 0
kind: required_method
signature: ''
- name: MCS.__init__
file: bootstrap_&_multiple_comparison/mcs-init.py
line: 0
kind: required_method
signature: ''
- name: SPA.__init__
file: bootstrap_&_multiple_comparison/spa-init.py
line: 0
kind: required_method
signature: ''
- name: Bootstrap type
file: bootstrap_&_multiple_comparison/bootstrap-type.py
line: 0
kind: replaceable_point
- name: Confidence interval method
file: bootstrap_&_multiple_comparison/confidence-interval-method.py
line: 0
kind: replaceable_point
- name: Multiple comparison procedure
file: bootstrap_&_multiple_comparison/multiple-comparison-procedure.py
line: 0
kind: replaceable_point
design_decision_count: 4
results_reporting_&_visualization:
class_count: 5
stage_id: results_reporting
stage_order: 7
responsibility: Format and display estimation results (summary tables, R², AIC/BIC, parameter table with std_err/tvalues/pvalues),
residual diagnostics (ARCH-LM test), and visualization (hedgehog plots, residual plots).
classes:
- name: ARCHModelResult.summary
file: results_reporting_&_visualization/archmodelresult-summary.py
line: 0
kind: required_method
signature: ''
- name: ARCHModelResult.conf_int
file: results_reporting_&_visualization/archmodelresult-conf-int.py
line: 0
kind: required_method
signature: ''
- name: ARCHModelResult.arch_lm_test
file: results_reporting_&_visualization/archmodelresult-arch-lm-test.py
line: 0
kind: required_method
signature: ''
- name: WaldTestStatistic
file: results_reporting_&_visualization/waldteststatistic.py
line: 0
kind: required_method
signature: ''
- name: Output format
file: results_reporting_&_visualization/output-format.py
line: 0
kind: replaceable_point
design_decision_count: 1
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.4722222222222222
evidence_invalid: 38
evidence_verified: 34
evidence_auto_fixed: 0
audit_coverage: 53/53 (100%)
audit_pass_rate: 3/53 (5%)
audit_fail_total: 32
audit_finance_universal:
pass: 1
warn: 9
fail: 10
audit_subdomain_totals:
pass: 2
warn: 9
fail: 22
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-124. Evidence verify ratio
= 47.2% and audit fail total = 32. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
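# Worked check of EQ-01 against the declared figures above (arithmetic only, no new data).
# Treating the ratio as verified / (verified + invalid) is an inference from the declared numbers:
#   evidence_verify_ratio = 34 / (34 + 38) = 34 / 72 ≈ 0.4722
#   0.4722 < 0.5, so the EQ-01 trigger condition holds for this crystal.
# The audit tallies also add up: pass 1 + 2 = 3, warn 9 + 9 = 18, fail 10 + 22 = 32, and 3 + 18 + 32 = 53.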
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-124-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc:
- UC-101
- UC-102
- UC-103
- UC-104
- UC-105
- UC-106
- UC-107
- UC-108
- UC-109
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Sharpe Ratio Bootstrap Statistical Inference
positive_terms:
- bootstrap
- sharpe ratio
- statistical inference
- confidence intervals
- stationary bootstrap
data_domain: financial_data
negative_terms:
- GARCH
- volatility forecasting
- unit root
- cointegration
- model comparison
ambiguity_question: Are you looking to compute statistical inference on performance metrics like Sharpe Ratio using bootstrap
methods?
- uc_id: UC-102
name: Multiple Model Comparison with SPA Test
positive_terms:
- model comparison
- SPA test
- multiple models
- benchmark comparison
- superior predictive ability
data_domain: financial_data
negative_terms:
- GARCH
- volatility forecasting
- unit root
- cointegration
- bootstrap
ambiguity_question: Do you want to test whether multiple models can significantly beat a benchmark predictor?
- uc_id: UC-103
name: Oil Price Cointegration Analysis
positive_terms:
- cointegration
- unit root
- ADF test
- Engle-Granger
- oil prices
- mean reversion
data_domain: financial_data
negative_terms:
- GARCH
- volatility forecasting
- bootstrap
- model comparison
ambiguity_question: Are you testing whether two related financial series (like crude oil prices) move together in a long-run
equilibrium?
- uc_id: UC-104
name: Credit Spread Stationarity Testing
positive_terms:
- unit root
- ADF test
- stationarity
- credit spread
- mean reversion
data_domain: financial_data
negative_terms:
- cointegration
- GARCH
- volatility forecasting
- bootstrap
ambiguity_question: Are you testing whether a time series (like credit spreads) is stationary or has a unit root?
- uc_id: UC-105
name: ARX Forecasting with Exogenous Variables
positive_terms:
- ARX
- exogenous variables
- forecasting
- autoregressive
- regression
data_domain: financial_data
negative_terms:
- GARCH
- volatility modeling
- cointegration
- unit root
ambiguity_question: Do you want to forecast a time series while accounting for the effect of external/exogenous variables?
- uc_id: UC-106
name: HARX Volatility Modeling with Fixed Variance
positive_terms:
- fixed variance
- HARX
- volatility modeling
- GARCH
- VIX
data_domain: financial_data
negative_terms:
- cointegration
- unit root
- exogenous variables
- bootstrap
ambiguity_question: Do you want to model volatility using a pre-specified or externally computed variance series?
- uc_id: UC-107
name: S&P 500 GARCH Volatility Forecasting
positive_terms:
- GARCH
- volatility forecasting
- S&P 500
- conditional variance
- out-of-sample
data_domain: financial_data
negative_terms:
- cointegration
- unit root
- bootstrap
- model comparison
ambiguity_question: Do you need to forecast future volatility levels for the S&P 500 or similar assets?
- uc_id: UC-108
name: S&P 500 GARCH Volatility Model Comparison
positive_terms:
- GARCH
- volatility modeling
- S&P 500
- model comparison
- asymmetric GARCH
- t-distribution
data_domain: financial_data
negative_terms:
- cointegration
- unit root
- bootstrap
- exogenous variables
ambiguity_question: Are you fitting GARCH volatility models to characterize return variance dynamics?
- uc_id: UC-109
name: NASDAQ Volatility Scenario Generation
positive_terms:
- volatility scenarios
- simulation
- NASDAQ
- GARCH
- scenario analysis
- risk management
data_domain: financial_data
negative_terms:
- cointegration
- unit root
- bootstrap
- model comparison
ambiguity_question: Do you need to generate multiple simulated volatility scenarios for stress testing or risk analysis?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
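# Illustrative reading of SL-02 and SL-05 (a sketch, not the library's code; the 1d level and the
# 86400-second value for level.to_second() are assumptions made for this example):
#   SL-02: happen_timestamp = 2024-01-02 00:00:00, level = 1d
#          due_timestamp    = happen_timestamp + 86400 s = 2024-01-03 00:00:00 (next bar, no look-ahead)
#   SL-05: let set_count be the number of non-null values among (position_pct, order_money, order_amount);
#          a signal is valid only when set_count == 1, and a set_count of 0, 2, or 3 is a fatal violation.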
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 107
fatal_constraints_count: 77
non_fatal_constraints_count: 151
use_cases_count: 9
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (finance-C-001 fatal violation).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 21 source groups: arch.bootstrap(2),
arch.covariance(2), arch.unitroot(11), arch.unitroot.critical_values.simulation(2), arch.univariate(11), bandwidth_selection(1),
and 15 more.'
key_decisions: 107 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-030
type: B
summary: Use Politis-White optimal block length formula for Stationary and Circular Block Bootstrap
- id: BD-054
type: B/BA
summary: Use Stationary Bootstrap with geometric block length distribution
- id: BD-035
type: B
summary: Use Bartlett kernel with automatic bandwidth for long-run covariance in KPSS
- id: BD-053
type: B
summary: Use Quadratic Spectral kernel for Andrews-optimal long-run covariance
- id: BD-031
type: B/BA
summary: Use BIC as default lag selection method for ADF test
- id: BD-032
type: B/BA
summary: Use MacKinnon critical value regression surface for ADF/PP p-values
- id: BD-033
type: B/BA
summary: Use automatic max_lags formula 12*(nobs/100)^(1/4) for ADF when not specified
- id: BD-041
type: B
summary: Use Elliott-Rothenberg-Stock GLS detrending for DFGLS test
- id: BD-042
type: B/RC
summary: Use Engle-Granger two-step cointegration test on cross-sectional regression residuals
- id: BD-043
type: B
summary: Use Dynamic OLS with leads and lags for cointegrating vector estimation
- id: BD-044
type: B/BA
summary: Use Newey-West automatic bandwidth for KPSS with Hobijn et al. formula
- id: BD-045
type: B/BA
summary: Use Zivot-Andrews structural break test with single-break assumption
- id: BD-046
type: B/BA
summary: Use Phillips-Ouliaris Za/Zt/Pu/Pz tests with kernel-based long-run covariance
- id: BD-047
type: B
summary: Use Variance Ratio test with heteroskedasticity-robust inference for random walk
- id: BD-055
type: B/BA
summary: Use OLS with t-stat threshold of 1.645 for lag selection in t-stat method
- id: BD-052
type: B
summary: Use Weighted Least Squares for critical value surface estimation in simulations
- id: BD-057
type: B
summary: Use 250,000 simulations for critical value surface estimation
- id: BD-034
type: B
summary: Use GARCH recursion with power transformation for variance bounds
- id: BD-036
type: B
summary: Use EWMA recursion with lambda=0.94 for RiskMetrics2006 variance
- id: BD-037
type: B/BA
summary: Use Student's t distribution with kurtosis-based starting values for ARCH models
- id: BD-038
type: B/BA
summary: Use 0.94^i exponential decay backcast for GARCH initialization
- id: BD-039
type: B
summary: Use FIGARCH with ARCH(infinity) representation for long-memory volatility
- id: BD-040
type: B/BA
summary: Use EGARCH with log-variance for asymmetric volatility modeling
- id: BD-048
type: B/BA
summary: Use Skew Student's t with Hansen (1994) parameterization for asymmetric returns
- id: BD-049
type: B/RC
summary: Use Generalized Error Distribution with nu>1 for flexible tail modeling
- id: BD-050
type: B/BA
summary: Use HAR (Heterogeneous Autoregressive) model for financial volatility forecasting
- id: BD-051
type: B
summary: Use scipy.optimize.minimize with L-BFGS-B for ARCH model maximum likelihood
- id: BD-056
type: B/BA
summary: Use APARCH with delta=1 for TARCH specification and power parameter
- id: BD-028
type: B
summary: Auto-bandwidth selection for KPSS test
- id: BD-007
type: B/BA
summary: sqrt(T) as default bootstrap block size
- id: BD-008
type: B/BA
summary: Stationary Bootstrap as default for MCS
- id: BD-009
type: B/BA
summary: 1000 bootstrap replications for MCS
- id: BD-013
type: B/BA
summary: 2500 bootstrap replications for Sharpe ratio
- id: BD-019
type: B
summary: Two-sided p-values using normal SF
- id: BD-026
type: B/BA
summary: 'Bootstrap confidence intervals: ''basic'' and ''percentile'' methods'
- id: BD-022
type: B/BA
summary: EWMA decay parameter 0.94 for variance bounds
- id: BD-023
type: B
summary: 'Variance bounds: [var/1e6, var*1e6] with floor/ceiling'
- id: BD-027
type: B/BA
summary: 'Parametric constraints: alpha[i] > 0, beta[i] > 0, sum < 1'
- id: BD-003
type: B/BA
summary: 100 * pct_change() for returns calculation
- id: BD-GAP-001
type: DK
summary: 'Missing: as-of vs processing time'
- id: BD-GAP-002
type: DK
summary: 'Missing: Point-in-Time data availability'
- id: BD-GAP-003
type: DK
summary: 'Missing: Stale data detection and expiry'
- id: BD-GAP-004
type: DK
summary: 'Missing: Model/data version snapshot binding'
- id: BD-GAP-005
type: B
summary: 'Missing: Currency/unit explicit annotation'
- id: BD-GAP-006
type: RC
summary: 'Missing: Settlement and delivery time'
- id: BD-GAP-007
type: RC
summary: 'Missing: Price and quantity precision'
- id: BD-GAP-008
type: B
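summary: 'Missing: Covariance matrix PSD repair strategy'
- id: BD-GAP-009
type: B
summary: 'Missing: Covariance estimator selection and shrinkage'
- id: BD-GAP-010
type: B
summary: 'Missing: VaR/CVaR confidence level and window'
- id: BD-GAP-011
type: DK
summary: 'Missing: Versioned writes and snapshot semantics'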
- id: BD-GAP-012
type: DK
summary: 'Missing: Implement explicit timezone annotation policy: all DatetimeIndex inputs must be UTC-normalized
with explicit tz_localize before processing; add a validate_timezone() helper'
- id: BD-GAP-013
type: M
summary: 'Missing: Add Hessian condition number check before np.linalg.inv() in arch/univariate/base.py:979 and
in cointegration module; warn or regularize if cond > 1e10'
- id: BD-GAP-014
type: B
summary: 'Missing: Add PSD (positive semi-definite) validation to kernel covariance estimator output in arch/covariance/kernel.py;
symmetrize + eigenfloor any non-PSD estimates'
- id: BD-GAP-015
type: M
summary: 'Missing: Add explicit DataScaleWarning behavior description for poorly-scaled data: document the 1-1000
scale recommendation and add a rescale helper'
- id: BD-GAP-016
type: M
summary: 'Missing: Add optional ConvergenceDiagnosis object that stores iteration history, log-likelihood path,
and parameter trajectory for post-hoc convergence quality assessment'
- id: BD-GAP-017
type: B
summary: 'Missing: Add explicit annualized_volatility() helper with configurable compounding convention (252, 365,
simple); clarify that all volatilities are in frequency-of-data units'
- id: BD-GAP-018
type: B
summary: 'Missing: Add backtest validation framework: automatic train/test split with historical VaR/CVaR/realized
PnL tracking for volatility models'
- id: BD-GAP-019
type: DK
summary: 'Missing: as-of vs processing time'
- id: BD-GAP-020
type: DK
summary: 'Missing: Point-in-Time data availability'
- id: BD-GAP-021
type: DK
summary: 'Missing: Stale data detection and expiry'
- id: BD-GAP-022
type: M/DK
summary: 'Missing: Day count convention'
- id: BD-GAP-023
type: B
summary: 'Missing: Currency/unit explicit annotation'
- id: BD-GAP-024
type: RC
summary: 'Missing: Settlement and delivery time'
- id: BD-GAP-025
type: RC
summary: 'Missing: Price and quantity precision'
- id: BD-060
type: BA/DK
summary: GARCH power=2.0 defaults to standard GARCH; power!=2.0 blocks analytic forecasts
- id: BD-061
type: B/BA
summary: ConstantVariance() and Normal() are hardcoded ARCHModel defaults
- id: BD-066
type: BA/M
summary: Backcast uses exponential decay tau=min(75, nobs) with 0.94^weight
- id: BD-069
type: BA
summary: hold_back=0 by default; all observations are used in estimation
- id: BD-072
type: T
summary: BCa confidence intervals require equal-length datasets across args/kwargs
- id: BD-004
type: B/BA
summary: Analytic forecast method as default
- id: BD-005
type: B/BA
summary: 1000 simulations for simulation/bootstrap forecasting
- id: BD-014
type: B/DK
summary: Rolling window forecasts with 20 replications
- id: BD-020
type: B
summary: 3-d array for multiple exogenous variable forecasts
- id: BD-029
type: B/BA
summary: Simulation-based forecasting for multi-step GARCH with power!=2
- id: BD-073
type: BA/DK
summary: 'INTERACTION: BD-002 (Normal distribution) × BD-017 (Student''s T distribution) → CONTRADICTION: Gaussian default
vs heavy-tail reality'
- id: BD-074
type: BA/DK
summary: 'INTERACTION: BD-003 (100*pct_change returns) × BD-012 (Sharpe annualization 12x) → HIDDEN DEPENDENCY: Return
scaling propagates to performance metrics'
- id: BD-075
type: BA/DK
summary: 'INTERACTION: BD-001 (GARCH(1,1)) × BD-015 (GJR-GARCH leverage) × BD-040 (EGARCH log-variance) → RISK CASCADE:
Asymmetric volatility models cascade through VaR estimation'
- id: BD-076
type: BA/M
summary: 'INTERACTION: BD-010 (AIC for ADF) × BD-031 (BIC for ADF) → CONTRADICTION: Conflicting lag selection defaults
across codebase'
- id: BD-077
type: BA/DK
summary: 'INTERACTION: BD-004 (Analytic forecast) × BD-029 (Simulation for power!=2) × BD-060 (power blocks analytic)
→ RISK CASCADE: Power specification determines forecast method availability'
- id: BD-078
type: BA/M
summary: 'INTERACTION: BD-006 (EWMA backcast tau=75) × BD-022 (lambda=0.94 decay) × BD-036 (RiskMetrics lambda=0.94)
→ AMPLIFICATION: Consistent EWMA parameters amplify initialization sensitivity'
- id: BD-079
type: BA/M
summary: 'INTERACTION: BD-007 (sqrt(T) block size) × BD-008 (Stationary Bootstrap) × BD-030 (Politis-White optimal block)
→ AMPLIFICATION: Multiple block length selection mechanisms interact'
- id: BD-080
type: BA/DK
summary: 'INTERACTION: BD-021 (Bollerslev-Wooldridge robust SE) × BD-027 (GARCH constraints alpha+beta<1) → HIDDEN DEPENDENCY:
Robust inference requires correctly specified volatility model'
- id: BD-081
type: BA
summary: 'INTERACTION: BD-069 (hold_back=0 default) × BD-014 (20 rolling windows) → CONTRADICTION: Full-sample estimation
vs out-of-sample validation requirements'
- id: BD-082
type: BA
summary: 'INTERACTION: BD-032 (MacKinnon critical values) × BD-042 (Engle-Granger cointegration) × BD-046 (Phillips-Ouliaris
tests) → RISK CASCADE: Critical value surface accuracy cascades through each unit roo'
- id: BD-065
type: B/BA
summary: CircularBlockBootstrap inherits from IIDBootstrap with block_length override
- id: BD-006
type: B/BA
summary: EWMA backcast with lambda=0.94 and tau=75
- id: BD-062
type: DK
summary: 'Parameter ordering: [mean_params, vol_params, dist_params] with computed offsets'
- id: BD-063
type: DK
summary: 'Loglikelihood computation order: resid -> sigma2 -> distribution.loglikelihood'
- id: BD-070
type: DK
summary: 'rescale threshold: variance must be in [0.1, 10000) to avoid rescaling'
- id: BD-001
type: B/BA
summary: GARCH(1,1) as default volatility model
- id: BD-002
type: B/BA
summary: Normal (Gaussian) distribution as default error distribution
- id: BD-010
type: B/BA
summary: AIC as default lag selection criterion for ADF
- id: BD-011
type: B/BA
summary: Constant as default ADF deterministic trend
- id: BD-015
type: B/BA
summary: GJR-GARCH with o=1 captures leverage effect
- id: BD-016
type: B/BA
summary: TARCH (power=1.0) models absolute volatility
- id: BD-017
type: B/BA
summary: Student's T distribution for heavy-tailed returns
- id: BD-018
type: B
summary: R-squared adjusted for degrees of freedom
- id: BD-024
type: B/BA
summary: Fixed parameters via fix() method for counterfactuals
- id: BD-058
type: T
summary: fit() MUST be called before forecast() on ARCHModelResult
- id: BD-059
type: T
summary: Bootstrap clone() requires fresh fit data - old fit indices persist
- id: BD-068
type: T
summary: fit() closed-form path requires Normal() dist AND ConstantVariance volatility
- id: BD-025
type: B
summary: 'Horizon naming: ''h.1'', ''h.2'', ... for forecast columns'
- id: BD-021
type: B
summary: Bollerslev-Wooldridge robust covariance estimator
- id: BD-064
type: M/DK
summary: 'Strategy pattern: VolatilityProcess/Distribution are pluggable strategy objects'
- id: BD-067
type: M/DK
summary: Starting values search uses fixed grid of (alpha, gamma, beta) tuples
- id: BD-071
type: B
summary: Numba JIT compilation fallback to Python in _cov_kernel
- id: BD-012
type: B/BA
summary: Sharpe Ratio annualized with 12 multiplier
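# Worked examples for two of the numeric defaults listed above (illustrative inputs; nobs = 500 and the
# price pair are assumptions, and any rounding applied to max_lags is not specified in this list):
#   BD-033: max_lags = 12 * (500 / 100)^(1/4) = 12 * 5^0.25 ≈ 12 * 1.4953 ≈ 17.94
#   BD-003: a price move from 100.0 to 102.0 gives 100 * pct_change() = 100 * (102.0 - 100.0) / 100.0 = 2.0,
#           so returns are expressed in percent, not as decimal fractions.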
resources:
packages:
- name: pandas
version_pin: ==1.5.3
- name: numpy
version_pin: ==1.24.4
- name: matplotlib
version_pin: '>=2'
- name: requests
version_pin: ==2.31.0
- name: scipy
version_pin: '>=1.3.0'
- name: scikit-learn
version_pin: '>1.4.2'
- name: pytest
version_pin: '>=8.3'
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When implementing data input for ARCH model initialization
action: validate that input data contains only finite values using np.all(np.isfinite) before any numeric computation
severity: fatal
kind: domain_rule
modality: must
consequence: Optimizers and recursive variance computations will produce NaN/inf results, causing the entire model estimation
to fail silently with meaningless outputs
stage_ids:
- data_input
- id: finance-C-002
when: When implementing data input for ARCH model initialization
action: convert all input data to contiguous float64 arrays using np.ascontiguousarray before storing in self._y
severity: fatal
kind: domain_rule
modality: must
consequence: Non-contiguous arrays or non-float64 types will cause buffer errors in Cython/Numba optimized recursive computations,
leading to segmentation faults or incorrect variance calculations
stage_ids:
- data_input
- id: finance-C-009
when: When initializing ARCH models with data
action: pass None as input data without raising RuntimeError when attempting to fit the model
severity: fatal
kind: domain_rule
modality: must_not
consequence: Fitting attempt with no data will cause cryptic errors in scipy optimize or segfault in Cython recursions
stage_ids:
- data_input
- id: finance-C-013
when: When implementing an ARCH model with custom components
action: verify input data y contains only finite values without NaN or inf
severity: fatal
kind: domain_rule
modality: must
consequence: NaN or inf values in the input data cause the model to fail silently or produce invalid likelihood computations
during optimization, leading to incorrect parameter estimates
stage_ids:
- model_specification
- id: finance-C-014
when: When plugging in a volatility component to ARCHModel
action: verify the volatility parameter inherits from VolatilityProcess abstract base class
severity: fatal
kind: domain_rule
modality: must
consequence: Using a non-VolatilityProcess subclass causes TypeError during initialization, and incompatible volatility
processes will fail during variance computation and forecasting
stage_ids:
- model_specification
- id: finance-C-015
when: When plugging in a distribution component to ARCHModel
action: verify the distribution parameter inherits from Distribution abstract base class
severity: fatal
kind: domain_rule
modality: must
consequence: Using a non-Distribution subclass causes TypeError during initialization, and incompatible distributions
will fail during log-likelihood computation
stage_ids:
- model_specification
- id: finance-C-016
when: When implementing custom volatility or distribution classes
action: implement the constraints() method returning (A, b) arrays where A.dot(params) - b >= 0
severity: fatal
kind: domain_rule
modality: must
consequence: Missing or incorrect constraints implementation causes optimization to use invalid parameter regions, producing
mathematically invalid volatility models (e.g., negative variances)
stage_ids:
- model_specification
- id: finance-C-019
when: When implementing custom volatility processes
action: provide compute_variance() method that fills sigma2 array with conditional variances
severity: fatal
kind: domain_rule
modality: must
consequence: Missing or incorrect compute_variance implementation causes the likelihood function to fail, making parameter
estimation impossible
stage_ids:
- model_specification
- id: finance-C-020
when: When implementing custom distribution classes
action: provide loglikelihood() method for likelihood evaluation during optimization
severity: fatal
kind: domain_rule
modality: must
consequence: Missing loglikelihood implementation causes the optimization to fail during parameter estimation, as log-likelihood
is the objective function for SLSQP
stage_ids:
- model_specification
- id: finance-C-029
when: When composing ARCHModel from three components
action: 'concatenate parameter arrays in the fixed order: [mean_params, volatility_params, distribution_params]'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect parameter ordering causes _parse_parameters to assign wrong values to each component, leading to
mathematically invalid models (e.g., volatility parameters interpreted as mean parameters)
stage_ids:
- model_specification
- id: finance-C-030
when: When constructing constraints for ARCH model fitting
action: stack constraint matrices from mean model, volatility, and distribution in parameter order
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect constraint stacking causes the optimizer to enforce wrong constraints on wrong parameters, producing
invalid or non-stationary models
stage_ids:
- model_specification
- id: finance-C-031
when: When constructing starting values for ARCH model fitting
action: concatenate starting values from mean model, volatility (computed from resids), and distribution (computed from
std_resids)
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect starting values concatenation causes the optimizer to use wrong initial values for wrong parameters,
leading to poor convergence or wrong solutions
stage_ids:
- model_specification
- id: finance-C-035
when: When implementing SLSQP constrained optimization
action: enforce inequality constraints a.dot(params) - b >= 0 for all parameters
severity: fatal
kind: domain_rule
modality: must
consequence: Volatility model parameters violating stationarity constraints produce invalid conditional variances, causing
downstream risk misestimation and potential trading losses
stage_ids:
- parameter_estimation
- id: finance-C-036
when: When computing conditional variance in optimization loop
action: verify sigma2 (conditional variance) >= 0 for all observations
severity: fatal
kind: domain_rule
modality: must
consequence: Negative variance values cause loglikelihood to produce NaN, invalidating parameter estimates and causing
downstream computations to fail silently
stage_ids:
- parameter_estimation
- id: finance-C-040
when: When constructing the unified parameter vector
action: use offsets array to partition parameters into mean|volatility|distribution components
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect parameter partitioning causes wrong component parameters to be passed to mean model, volatility
model, and distribution, producing invalid results
stage_ids:
- parameter_estimation
- id: finance-C-041
when: When assembling inequality constraints for joint estimation
action: combine constraints from mean model, volatility model, and distribution in correct order
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect constraint ordering places volatility constraints in wrong parameter positions, allowing invalid
parameters that violate stationarity or positivity requirements
stage_ids:
- parameter_estimation
- id: finance-C-051
when: When estimating volatility persistence close to 1.0
action: check that persistence = sum(alpha) + sum(gamma)/2 + sum(beta) < 1 for stationarity
severity: fatal
kind: domain_rule
modality: must
consequence: Persistence >= 1 violates covariance stationarity, producing non-mean-reverting variance that explodes over
time, invalidating long-horizon forecasts
stage_ids:
- parameter_estimation
- id: finance-C-052
when: When implementing ARCH model forecasting code
action: verify forecast variances are finite and non-negative throughout the forecast horizon
severity: fatal
kind: domain_rule
modality: must
consequence: Non-finite or negative variance forecasts indicate mathematical errors in the ARCH recursion, producing invalid
statistical inferences and potentially misleading risk estimates
stage_ids:
- forecasting
- id: finance-C-054
when: When validating forecast horizon parameter
action: require horizon to be a positive integer (>= 1)
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid horizon values cause undefined forecast behavior or silent data corruption in downstream risk calculations
stage_ids:
- forecasting
- id: finance-C-055
when: When forecasting with EGARCH volatility models
action: use analytic method for horizons greater than 1
severity: fatal
kind: resource_boundary
modality: must_not
consequence: EGARCH variance evolves in logarithms, not in squares. Analytic multi-step formulas require the variance
to evolve in squares, so applying them to EGARCH produces mathematically invalid forecasts
stage_ids:
- forecasting
- id: finance-C-063
when: When validating forecasting method parameter
action: accept only 'analytic', 'simulation', or 'bootstrap' as valid method values
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Invalid method strings cause undefined behavior or silent fallback to incorrect forecasting algorithm
stage_ids:
- forecasting
- id: finance-C-068
when: When comparing backtested forecast performance to live trading
action: claim backtest returns equal expected live trading returns
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Backtest results exclude transaction costs, slippage, liquidity constraints, and market impact that affect
live execution. Claiming equivalence misleads risk assessment
stage_ids:
- forecasting
- id: finance-C-070
when: When implementing unit root tests that use lag selection
action: enforce non-negative integer lags by raising ValueError for negative lags
severity: fatal
kind: domain_rule
modality: must
consequence: Negative lag values produce invalid regression specifications with wrong degrees of freedom, causing misleading
test statistics and invalid statistical inference
stage_ids:
- unitroot_testing
- id: finance-C-071
when: When running ADF, DFGLS, or KPSS tests
action: validate trend against the test-specific supported trends before computation
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid trend specification causes wrong statistical distribution assumptions, leading to incorrect critical
values and p-values that invalidate test conclusions
stage_ids:
- unitroot_testing
- id: finance-C-072
when: When implementing ADF or DFGLS test regression
action: verify sample size exceeds minimum requirement of 3 + trend_order + lag_len observations
severity: fatal
kind: domain_rule
modality: must
consequence: Insufficient observations cause singular or near-singular regression matrices, leading to unstable or undefined
test statistics
stage_ids:
- unitroot_testing
- id: finance-C-073
when: When computing test statistics for unit root tests
action: verify the statistic is finite and within the interpolation bounds of critical value tables
severity: fatal
kind: domain_rule
modality: must
consequence: Non-finite or out-of-range statistics produce degenerate p-values (0.0 or 1.0) that miss actual stationarity
patterns or create false rejections
stage_ids:
- unitroot_testing
- id: finance-C-074
when: When implementing VarianceRatio test
action: enforce lags parameter to be an integer >= 2 before computation
severity: fatal
kind: domain_rule
modality: must
consequence: Lags less than 2 produce undefined multi-period variance ratios, causing division by zero or mathematically
invalid test statistics
stage_ids:
- unitroot_testing
- id: finance-C-075
when: When implementing ZivotAndrews test trim parameter
action: validate trim is a float in range [0, 1/3] to ensure a valid break period calculation
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid trim values cause incorrect break point exclusion regions, leading to structural break misdetection
and invalid unit root conclusions
stage_ids:
- unitroot_testing
- id: finance-C-076
when: When implementing DFGLS test
action: use trend values other than 'c' or 'ct'
severity: fatal
kind: resource_boundary
modality: must_not
consequence: Unsupported trends produce GLS-detrending coefficients outside valid ranges (-7.0 for 'c', -13.5 for 'ct'),
causing undefined test statistics
stage_ids:
- unitroot_testing
- id: finance-C-077
when: When implementing cointegration tests
action: verify y and x have identical number of observations before cross-sectional regression
severity: fatal
kind: domain_rule
modality: must
consequence: Misaligned observation counts produce incorrect cointegrating vectors and residuals, leading to spurious
cointegration conclusions
stage_ids:
- unitroot_testing
- id: finance-C-080
when: When implementing Engle-Granger cointegration test
action: limit the number of cross-sectional variables (num_x) to range [1, 12]
severity: fatal
kind: resource_boundary
modality: must
consequence: Cross-sectional variables outside [1,12] lack pre-computed critical value tables, causing KeyError or invalid
cointegration inference
stage_ids:
- unitroot_testing
- id: finance-C-081
when: When using MacKinnon critical value functions
action: use regression='ctt' with dist_type='dfgls' since DFGLS only supports 'c' and 'ct'
severity: fatal
kind: resource_boundary
modality: must_not
consequence: Invalid regression-dist_type combination causes KeyError when accessing non-existent critical value table
entries
stage_ids:
- unitroot_testing
- id: finance-C-083
when: When computing automatic bandwidth for KPSS test
action: require at least 2 observations in the input series
severity: fatal
kind: domain_rule
modality: must
consequence: Single observation series causes division by zero or undefined bandwidth, leading to crashes in KPSS test
execution
stage_ids:
- unitroot_testing
- id: finance-C-084
when: When running PhillipsPerron test
action: allow zero or negative regression coefficient standard error
severity: fatal
kind: domain_rule
modality: must_not
consequence: Zero variance indicates constant-value series or perfect multicollinearity, producing undefined PP test statistics
stage_ids:
- unitroot_testing
- id: finance-C-085
when: When implementing ZivotAndrews test
action: validate regressor matrix rank to detect singular matrices from constant regions
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Singular regressor matrices cause undefined OLS estimates, producing NaN test statistics and invalid structural
break conclusions
stage_ids:
- unitroot_testing
- id: finance-C-093
when: When implementing confidence interval calculation using conf_int method
action: use size parameter strictly between 0 and 1
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid CI size produces undefined behavior or runtime ValueError, breaking statistical inference and producing
meaningless intervals that cannot be interpreted for decision-making
stage_ids:
- bootstrap_inference
- id: finance-C-094
when: When implementing confidence interval calculation using conf_int method
action: use tail parameter as one of 'two', 'lower', or 'upper'
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid tail parameter causes ValueError and fails to produce one-sided or two-sided confidence intervals
needed for directional hypothesis testing
stage_ids:
- bootstrap_inference
- id: finance-C-095
when: When implementing Model Confidence Set (MCS) with multiple_comparison module
action: provide losses array with at least two columns (models)
severity: fatal
kind: domain_rule
modality: must
consequence: MCS with fewer than 2 models cannot compute pairwise comparisons, resulting in ValueError and failure to
produce any model confidence set output
stage_ids:
- bootstrap_inference
- id: finance-C-096
when: When implementing BCa (bias-corrected and accelerated) confidence interval method
action: verify empirical probability p is strictly between 0 and 1
severity: fatal
kind: domain_rule
modality: must
consequence: BCa fails when empirical probability is 0 or 1 (extreme statistics), causing RuntimeError and preventing
bias correction for distributions not well-approximated by normal in finite samples
stage_ids:
- bootstrap_inference
- id: finance-C-097
when: When implementing bootstrap-based forecasting using _bootstrap_forecast
action: verify start index includes more than 100 observations
severity: fatal
kind: domain_rule
modality: must
consequence: Bootstrap forecast with fewer than 100 observations produces unreliable standard errors and confidence intervals,
invalidating volatility forecasts for risk management decisions
stage_ids:
- bootstrap_inference
- id: finance-C-098
when: When implementing bootstrap confidence intervals
action: validate confidence interval size as strictly between 0 and 1
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid CI size causes ValueError and prevents computation of statistically valid confidence intervals needed
for parameter uncertainty quantification
stage_ids:
- bootstrap_inference
- id: finance-C-103
when: When implementing MCS or SPA
action: require all input arrays to have the same number of elements along axis 0
severity: fatal
kind: resource_boundary
modality: must
consequence: Input arrays of unequal length are silently misaligned during bootstrap resampling, producing incorrect standard errors and
invalid confidence intervals that appear valid
stage_ids:
- bootstrap_inference
- id: finance-C-106
when: When implementing SPA p-value calculation
action: require the pvalue argument to be strictly between 0 and 1 when computing critical values
severity: fatal
kind: operational_lesson
modality: must
consequence: Invalid p-value causes ValueError and prevents computation of critical values needed for model selection
decisions
stage_ids:
- bootstrap_inference
- id: finance-C-107
when: When implementing bootstrap-based model comparison
action: call compute() before accessing pvalues or included/excluded model sets
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Accessing results before compute() causes RuntimeError and prevents retrieval of MCS/SPA results
stage_ids:
- bootstrap_inference
- id: finance-C-109
when: When implementing bootstrap data validation
action: verify every input is a NumPy ndarray, pandas DataFrame, or pandas Series
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Unsupported data types cause TypeError and prevent bootstrap from resampling data for inference
stage_ids:
- bootstrap_inference
- id: finance-C-110
when: When implementing PRNG for bootstrap
action: use NumPy Generator or RandomState as PRNG source
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Invalid PRNG type causes TypeError and prevents bootstrap from generating random indices for resampling
stage_ids:
- bootstrap_inference
- id: finance-C-116
when: When computing t-statistics for model parameters
action: compute tvalues as the ratio of params divided by std_err
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect t-statistics will lead to wrong hypothesis testing conclusions, causing invalid statistical inference
about parameter significance
stage_ids:
- results_reporting
- id: finance-C-117
when: When computing p-values for parameter t-statistics
action: compute pvalues using two-sided normal distribution survival function on absolute tvalues
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect p-values will cause wrong conclusions about parameter significance, leading to improper model specification
decisions
stage_ids:
- results_reporting
- id: finance-C-118
when: When constructing parameter confidence intervals
action: compute confidence intervals using normal distribution quantile with specified alpha coverage
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect confidence interval coverage will misrepresent the precision of parameter estimates, violating
statistical guarantees
stage_ids:
- results_reporting
- id: finance-C-119
when: When computing ARCH-LM test statistic
action: compute the ARCH-LM statistic as nobs multiplied by the R-squared of the auxiliary regression
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect ARCH-LM statistic will produce wrong diagnostic conclusions about remaining heteroskedasticity
in model residuals
stage_ids:
- results_reporting
- id: finance-C-120
when: When computing standard errors from parameter covariance
action: extract standard errors as the square root of diagonal elements of the parameter covariance matrix
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect standard errors propagate to all downstream inference, affecting t-statistics, p-values, and confidence
intervals
stage_ids:
- results_reporting
- id: finance-C-121
when: When computing model fit statistics
action: compute AIC as negative two times loglikelihood plus two times number of parameters
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect AIC will lead to wrong model selection decisions when comparing different ARCH specifications
stage_ids:
- results_reporting
- id: finance-C-122
when: When computing Schwarz/Bayesian Information Criteria
action: compute BIC as negative two times loglikelihood plus log of observations times number of parameters
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect BIC will cause improper model selection, potentially choosing over-parameterized or under-fitted
models
stage_ids:
- results_reporting
- id: finance-C-123
when: When computing adjusted R-squared
action: compute adjusted R-squared using the degrees of freedom correction formula
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect adjusted R-squared will misrepresent model explanatory power after accounting for parameter count
stage_ids:
- results_reporting
- id: finance-C-124
when: When displaying model estimation summary
action: display parameter table with columns for coefficient, standard error, t-statistic, p-value, and confidence interval
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing columns in summary output will prevent users from performing valid statistical inference on model
parameters
stage_ids:
- results_reporting
- id: finance-C-125
when: When displaying fit statistics in summary
action: display R-squared, adjusted R-squared, log-likelihood, AIC, and BIC in the summary header
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing fit statistics will prevent users from assessing model quality and performing model comparison
stage_ids:
- results_reporting
- id: finance-C-128
when: When running ARCH-LM test for residual diagnostics
action: require at least 3 non-nan observations for valid test results
severity: fatal
kind: domain_rule
modality: must
consequence: ARCH-LM test with insufficient observations produces unreliable test statistics and misleading diagnostic
conclusions
stage_ids:
- results_reporting
- id: finance-C-130
when: When computing R-squared for model fit assessment
action: handle implicit constant detection to ensure the total sum of squares is computed correctly
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect R-squared when model contains implicit constant leads to wrong assessment of model explanatory
power
stage_ids:
- results_reporting
- id: finance-C-135
when: When testing ARCH-LM with default lag selection
action: compute default lags using the formula ceil(12 * (nobs/100)^(1/4)) bounded by half the sample size
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect default lag selection will produce either under-powered or over-fitted ARCH-LM tests, leading to
wrong diagnostic conclusions
stage_ids:
- results_reporting
- id: finance-C-136
when: When initializing an ARCHModel with input data
action: Convert y to a contiguous float64 array and validate that all values are finite (no NaN or inf)
severity: fatal
kind: domain_rule
modality: must
consequence: Non-finite values (NaN/inf) in the input data will cause variance computations to produce NaN, leading to
failed optimization and meaningless model results
stage_ids:
- data_input
- model_specification
- id: finance-C-137
when: When converting input data to the internal representation
action: Verify input y is converted to 1D float64 contiguous array via to_array_1d
severity: fatal
kind: domain_rule
modality: must
consequence: Multi-dimensional or non-contiguous arrays will cause index errors in variance recursions and parameter estimation
stage_ids:
- data_input
- model_specification
- id: finance-C-139
when: When combining starting values from mean, volatility, and distribution
action: 'Concatenate starting values in the correct order: mean params, volatility params, distribution params'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect parameter ordering will cause parameter parsing (_parse_parameters) to assign wrong values to mean/volatility/distribution,
producing invalid models
stage_ids:
- model_specification
- parameter_estimation
- id: finance-C-140
when: When combining bounds from mean, volatility, and distribution
action: 'Extend bounds list in the same order as parameters: mean bounds first, then volatility bounds, then distribution
bounds'
severity: fatal
kind: domain_rule
modality: must
consequence: Misaligned bounds will cause SLSQP optimizer to enforce wrong constraints on wrong parameters, potentially
producing invalid parameter values
stage_ids:
- model_specification
- parameter_estimation
- id: finance-C-141
when: When constructing linear constraints from all model components
action: Block-diagonalize constraint matrix A so each component's constraints only affect its own parameters
severity: fatal
kind: domain_rule
modality: must
consequence: Non-block-diagonal constraints will incorrectly constrain unrelated parameters, causing optimization to fail
or produce wrong parameter values
stage_ids:
- model_specification
- parameter_estimation
- id: finance-C-144
when: When passing fitted parameters from estimation to forecasting
action: Parse params using _parse_parameters to extract mean/volatility/distribution parameter subsets
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Using raw params array without parsing will pass wrong parameter subsets to variance recursions, producing
incorrect forecasts
stage_ids:
- parameter_estimation
- forecasting
- id: finance-C-159
when: When initializing an ARCHModel with input data (y parameter)
action: Provide only finite values in the data array — NaN and inf values are not permitted and cause a ValueError
severity: fatal
kind: domain_rule
modality: must
consequence: NaN or inf values in the input time series cause the model's loglikelihood computation to produce NaN results,
corrupting all parameter estimates and forecasts
- id: finance-C-160
when: When implementing a new VolatilityProcess subclass
action: Verify that all computed conditional variance values (sigma2) are non-negative throughout the variance recursion;
values below the lower var_bounds are clamped up, and values above the upper bound are log-adjusted
severity: fatal
kind: domain_rule
modality: must
consequence: Negative sigma2 values cause the distribution loglikelihood to receive invalid inputs (e.g., sqrt of negative
for Normal), producing NaN loglikelihood and corrupted parameter estimates
- id: finance-C-161
when: When constructing variance bounds (var_bounds) for any volatility model
action: Format var_bounds as a 2-column array of shape (nobs, 2) where column 0 is the lower bound and column 1 is the
upper bound for each observation
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrectly formatted var_bounds causes the bounds_check function to misread lower/upper bounds, allowing
invalid sigma2 values to pass through and corrupt the loglikelihood
- id: finance-C-162
when: When providing a user-supplied backcast value to the volatility model's backcast_transform
action: Verify the backcast value is strictly positive — negative backcast values cause a ValueError
severity: fatal
kind: domain_rule
modality: must
consequence: Negative backcast causes the volatility recursion to start with an invalid initial sigma2 value, producing
invalid loglikelihood values and corrupted estimates
- id: finance-C-163
when: When implementing a new VolatilityProcess or Distribution subclass
action: Return constraint arrays (a, b) where parameters satisfy a.dot(params) - b >= 0 for every row of a; this linear
constraint format is required by the SLSQP optimizer
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrectly formatted constraint arrays cause SLSQP to receive invalid constraints, producing undefined optimization
behavior and potentially invalid parameter estimates
- id: finance-C-171
when: When implementing a new mean model, volatility model, or distribution
action: 'Verify the concatenated parameter array follows the fixed ordering: [mean_params, vol_params, dist_params], using
the pre-computed offsets array to slice each sub-parameter set'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect parameter ordering causes _parse_parameters to return wrong slices for mean, volatility, and distribution
parameters, producing invalid loglikelihood values and corrupted estimates
- id: finance-C-172
when: When computing the loglikelihood in the ARCHModel._loglikelihood method
action: 'Follow the fixed three-step computation order: (1) compute resids from the mean model, (2) compute sigma2 using
volatility.compute_variance(), (3) call distribution.loglikelihood() with the computed resids and sigma2'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Skipping or reordering any step produces an incorrect loglikelihood value, leading to wrong parameter estimates
during optimization
- id: finance-C-173
when: When calling forecast() on an ARCHModelResult
action: Call fit() first to produce an ARCHModelResult with estimated params — forecast() requires the params attribute
which is only populated after successful fit() execution
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Calling forecast() before fit() causes an AttributeError because params is None, preventing any forecast
generation
- id: finance-C-174
when: When implementing a new VolatilityProcess subclass that is NOT ConstantVariance
action: Set closed_form = False; only ConstantVariance has closed_form = True, and all other volatility processes must explicitly
set closed_form = False or accept the default
severity: fatal
kind: architecture_guardrail
modality: must_not
consequence: Setting closed_form = True on a non-ConstantVariance volatility process causes the fit() method to incorrectly
enter the closed-form path, producing mathematically invalid parameter estimates
- id: finance-C-175
when: When using the Normal/Gaussian distribution with any ARCH volatility model
action: Set num_params = 0 on the Normal distribution — Normal has no additional parameters beyond those estimated by
the mean and volatility models
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrectly setting num_params > 0 on the Normal distribution disrupts the parameter offset calculations
in fit(), causing parameter slicing errors and wrong estimates
- id: finance-C-210
when: When processing price and quantity data in financial calculations
action: Assume infinite precision for monetary calculations using native float types — floating-point representation causes
rounding errors in price aggregation and P&L computation
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Using float for price and quantity causes accumulated rounding errors in high-frequency trading; a 0.01 cent
error per trade compounds to significant P&L discrepancies in live trading
derived_from_bd_id: BD-GAP-007
- id: finance-C-230
when: When setting APARCH model parameters in volatility estimation
action: Change the delta parameter to values other than 1 when intending TARCH specification — delta=1 recovers standard
absolute return specification required for threshold ARCH
severity: fatal
kind: architecture_guardrail
modality: must_not
consequence: Setting delta!=1 fundamentally alters the APARCH power transformation, breaking the TARCH nesting property
and producing incorrect asymmetric volatility estimates that misrepresent downside risk
derived_from_bd_id: BD-056
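The float-precision rule in finance-C-210 above can be illustrated with a minimal sketch that uses only the Python standard library. The helper names `total_float` and `total_decimal` are hypothetical and belong to no library referenced in this file; the sketch assumes prices and quantities are available as decimal strings.

```python
from decimal import Decimal


def total_float(price: float, qty: float, n_trades: int) -> float:
    """Accumulate notional value with native floats (the pattern to avoid)."""
    total = 0.0
    for _ in range(n_trades):
        total += price * qty  # each addition can introduce a rounding error
    return total


def total_decimal(price: str, qty: str, n_trades: int) -> Decimal:
    """Accumulate notional value with Decimal, built from strings."""
    # Constructing Decimal from a string avoids inheriting float rounding.
    p, q = Decimal(price), Decimal(qty)
    total = Decimal("0")
    for _ in range(n_trades):
        total += p * q  # exact decimal arithmetic for these inputs
    return total


if __name__ == "__main__":
    print(total_float(0.1, 1.0, 10) == 1.0)
    print(total_decimal("0.1", "1", 10) == Decimal("1.0"))
```

Ten trades at a price of 0.1 should total exactly 1.0. The float accumulation does not compare equal to 1.0, while the Decimal accumulation does.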
regular:
- id: finance-C-003
when: When implementing input type detection in ARCH model
action: track whether the original input was a pandas DataFrame or Series using isinstance(y, (pd.DataFrame, pd.Series))
severity: high
kind: architecture_guardrail
modality: must
consequence: Results reporting will return incorrect types (numpy array instead of Series), breaking user API expectations
and causing downstream type errors
stage_ids:
- data_input
- id: finance-C-004
when: When implementing data input for ARCH model initialization
action: store the original input data unchanged in _y_original before any transformation for results reporting
severity: high
kind: architecture_guardrail
modality: must
consequence: Results and forecasts will be reported using transformed/scaled data instead of original user input, making
results unintelligible to users
stage_ids:
- data_input
- id: finance-C-005
when: When implementing input coercion in ARCH library
action: handle various input types (Series, DataFrame, numpy arrays, lists, scalars) by converting to consistent 1D format
using ensure1d
severity: high
kind: resource_boundary
modality: must
consequence: Incompatible input types will raise unexpected TypeErrors, preventing users from using common data formats
like pandas Series or numpy arrays
stage_ids:
- data_input
- id: finance-C-006
when: When implementing array conversion in ARCH library
action: verify that all converted arrays are 1D with float64 dtype using to_array_1d before downstream numeric operations
severity: high
kind: resource_boundary
modality: must
consequence: Multi-dimensional or non-float64 arrays will cause shape mismatches in matrix operations and optimize.outer
calls, producing incorrect log-likelihood values
stage_ids:
- data_input
- id: finance-C-007
when: When implementing rescale logic in ARCH model fitting
action: apply scale factor consistently to original data when rescaling is triggered, then update model state via _scale_changed()
severity: high
kind: architecture_guardrail
modality: must
consequence: Parameter estimates and forecasts will be on wrong scale, making results meaningless for users who expect
outputs in original data units
stage_ids:
- data_input
- id: finance-C-008
when: When implementing data validation in ARCH model
action: raise ValueError immediately when encountering non-1D-reshapable input, with clear message indicating dimensionality
requirement
severity: high
kind: domain_rule
modality: must
consequence: Multi-dimensional data will silently produce incorrect results in variance calculations, leading to misleading
ARCH parameter estimates
stage_ids:
- data_input
- id: finance-C-010
when: When implementing automatic data rescaling
action: warn users when data variance is outside [0.1, 10000) range to prevent numerical instability in optimization
severity: medium
kind: operational_lesson
modality: should
consequence: Poorly scaled data causes optimizer convergence failure or excessive iterations, wasting computational resources
and producing unreliable parameter estimates
stage_ids:
- data_input
- id: finance-C-011
when: When implementing ARCH model forecasting
action: claim that forecast outputs equal actual realized returns or that backtest returns predict live trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: ARCH forecasts represent conditional variance estimates based on historical patterns; presenting these as
predictions of actual returns violates fundamental statistical principles and may mislead users into financial losses
stage_ids:
- data_input
- id: finance-C-012
when: When handling pandas Series input with ensure1d
action: preserve the Series name attribute during conversion when series=True, or set name from provided name parameter
severity: medium
kind: architecture_guardrail
modality: must
consequence: Results will have unlabeled or incorrectly labeled output Series, making downstream data analysis and debugging
difficult
stage_ids:
- data_input
- id: finance-C-017
when: When implementing custom volatility processes
action: provide starting_values() method returning valid initial parameter values
severity: high
kind: domain_rule
modality: must
consequence: Invalid or poorly-chosen starting values cause the SLSQP optimizer to fail convergence or converge to local
optima, producing suboptimal parameter estimates
stage_ids:
- model_specification
- id: finance-C-018
when: When implementing custom volatility processes
action: provide bounds() method returning list of (lower, upper) tuples for each parameter
severity: high
kind: domain_rule
modality: must
consequence: Missing bounds causes the optimizer to use unbounded parameter search, potentially producing numerically
unstable or invalid parameter values
stage_ids:
- model_specification
- id: finance-C-021
when: When using the arch library for volatility modeling
action: claim real-time trading capability since arch is a pure backtesting and forecasting framework
severity: high
kind: claim_boundary
modality: must_not
consequence: Claiming live trading capability when arch only provides estimation and simulation leads to operational misuse
and potential financial losses from attempting to deploy estimation-only code in production trading
stage_ids:
- model_specification
- id: finance-C-022
when: When presenting ARCH model estimation results
action: present backtest simulation results as equivalent to live trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Simulated backtest returns systematically differ from live trading due to execution slippage, transaction
costs, market impact, and liquidity constraints not captured in the estimation framework
stage_ids:
- model_specification
- id: finance-C-023
when: When estimating ARCH models on financial time series
action: claim parameter estimates are the true population parameters
severity: medium
kind: claim_boundary
modality: must_not
consequence: ARCH model parameters are estimated via maximum likelihood on finite samples, introducing estimation uncertainty.
Standard errors and confidence intervals must be reported to avoid overstating precision
stage_ids:
- model_specification
- id: finance-C-024
when: When initializing ARCHModel with default components
action: use ConstantVariance as the default volatility process since it has closed-form estimation
severity: medium
kind: resource_boundary
modality: must
consequence: Using non-ConstantVariance volatility without explicit specification causes the model to require iterative
optimization, increasing computation time and potential convergence issues
stage_ids:
- model_specification
- id: finance-C-025
when: When initializing ARCHModel with default components
action: use Normal distribution as the default since it has no shape parameters (closed-form fit available)
severity: medium
kind: resource_boundary
modality: must
consequence: Using heavy-tailed distributions (StudentsT, SkewStudent) without explicit selection may cause optimization
to fail if starting values are poorly chosen
stage_ids:
- model_specification
- id: finance-C-026
when: When optimizing ARCH model parameters
action: use SLSQP optimizer since it supports both bound constraints and linear inequality constraints
severity: high
kind: resource_boundary
modality: must
consequence: Using optimizers without proper constraint support (L-BFGS-B, Nelder-Mead) cannot enforce ARCH parameter
constraints, producing mathematically invalid models
stage_ids:
- model_specification
- id: finance-C-027
when: When estimating ARCH models with HAR or other lag-based mean models
action: set hold_back parameter to exclude pre-sample observations that would cause look-ahead bias
severity: high
kind: operational_lesson
modality: must
consequence: HAR models use historical average calculations that can include pre-sample data if hold_back is not set,
causing look-ahead bias where information not yet available affects current estimates
stage_ids:
- model_specification
- id: finance-C-028
when: When using closed-form estimation path
action: verify volatility has closed_form=True AND distribution has num_params=0 AND volatility is ConstantVariance
severity: medium
kind: operational_lesson
modality: must
consequence: Failing to meet all three conditions forces the model through iterative optimization instead of closed-form
estimation, significantly increasing computation time
stage_ids:
- model_specification
- id: finance-C-032
when: When implementing volatility process starting values
action: compute volatility starting values using residuals from the mean model fit
severity: high
kind: architecture_guardrail
modality: must
consequence: Using raw data instead of residuals for volatility starting values produces incorrect initial variance estimates,
potentially causing divergence in optimization
stage_ids:
- model_specification
- id: finance-C-033
when: When implementing distribution starting values
action: compute distribution starting values using standardized residuals from volatility
severity: high
kind: architecture_guardrail
modality: must
consequence: Using non-standardized residuals for distribution starting values produces incorrect shape parameter initialization,
especially for heavy-tailed distributions
stage_ids:
- model_specification
- id: finance-C-034
when: When validating user-provided starting values
action: check starting values satisfy both bounds and constraint inequalities before optimization
severity: high
kind: architecture_guardrail
modality: must
consequence: Starting values outside bounds or violating constraints cause the optimizer to either fail immediately or
produce invalid intermediate results
stage_ids:
- model_specification
- id: finance-C-037
when: When estimating parameter covariance matrix
action: verify the parameter covariance matrix is positive definite
severity: high
kind: domain_rule
modality: must
consequence: Non-positive-definite covariance matrix produces invalid standard errors, t-statistics, and confidence intervals,
corrupting statistical inference
stage_ids:
- parameter_estimation
- id: finance-C-038
when: When constructing residuals in parameter estimation
action: produce mean-zero residuals by subtracting the conditional mean
severity: high
kind: domain_rule
modality: must
consequence: Non-zero mean residuals bias volatility estimation, as the ARCH variance equation assumes mean-zero shocks,
leading to systematic risk mismeasurement
stage_ids:
- parameter_estimation
- id: finance-C-039
when: When implementing SLSQP optimization for ARCH models
action: skip convergence status validation based on a 'looks reasonable' assessment
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Non-zero optimizer status indicates the solution may be suboptimal or infeasible, producing biased parameter
estimates that corrupt volatility forecasts
stage_ids:
- parameter_estimation
- id: finance-C-042
when: When setting up backcasting for variance recursion
action: use EWMA(0.94) with tau=75 for backcast computation
severity: medium
kind: resource_boundary
modality: must
consequence: Incorrect backcast values bias initial variance estimates, affecting convergence speed and potentially producing
suboptimal parameter estimates
stage_ids:
- parameter_estimation
- id: finance-C-043
when: When executing likelihood computation in ARCH estimation
action: use Numba JIT compilation with nopython=True for speed, with graceful pure Python fallback
severity: medium
kind: resource_boundary
modality: must
consequence: Without JIT compilation, likelihood evaluation becomes prohibitively slow for large datasets, making estimation
impractical
stage_ids:
- parameter_estimation
- id: finance-C-044
when: When the optimizer returns non-zero convergence status
action: emit ConvergenceWarning to alert user about potential optimization difficulty
severity: high
kind: operational_lesson
modality: must
consequence: Silent acceptance of non-converged optimization produces unreliable parameter estimates that may not represent
the true optimum, misleading downstream risk calculations
stage_ids:
- parameter_estimation
- id: finance-C-045
when: When providing custom starting values for optimization
action: validate starting values satisfy both bounds and inequality constraints before use
severity: high
kind: operational_lesson
modality: must
consequence: Invalid starting values cause optimization to start from an infeasible point, potentially converging to invalid
parameters or failing to converge
stage_ids:
- parameter_estimation
- id: finance-C-046
when: When estimating ARCH models with SLSQP optimizer
action: tolerate non-zero convergence status as common in ARCH estimation due to constraint boundaries
severity: medium
kind: operational_lesson
modality: must
consequence: Treating non-zero status as fatal error prevents valid estimates from being returned when optimizer reaches
constraint boundaries (common in volatility models)
stage_ids:
- parameter_estimation
- id: finance-C-047
when: When presenting ARCH model estimation results
action: claim that backtested volatility forecasts equal expected live trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: ARCH volatility estimates are conditional on historical data; structural breaks, regime changes, and market
conditions cause live performance to diverge from backtested results
stage_ids:
- parameter_estimation
- id: finance-C-048
when: When computing parameter covariance using numerical derivatives
action: invert the numerical Hessian blindly when the Hessian is near-singular
severity: high
kind: domain_rule
modality: must_not
consequence: Near-singular Hessian indicates model identification issues; blind matrix inversion produces unreliable standard
errors and invalid inference
stage_ids:
- parameter_estimation
- id: finance-C-049
when: When the optimizer status is non-zero but parameters look reasonable
action: dismiss the convergence warning as 'false alarm' without investigation
severity: high
kind: rationalization_guard
modality: must_not
consequence: Dismissing convergence warnings based on superficial parameter inspection ignores constraint boundary conditions
that invalidate stationarity guarantees
stage_ids:
- parameter_estimation
- id: finance-C-050
when: When constraints appear satisfied and optimization completes
action: skip validation that all inequality constraints a.dot(params) - b >= 0 hold for the final parameters
severity: medium
kind: rationalization_guard
modality: must_not
consequence: Optimizer may return parameters at constraint boundaries that technically satisfy a.dot(params) - b >= 0
but produce numerically unstable or invalid variance forecasts
stage_ids:
- parameter_estimation
- id: finance-C-053
when: When implementing multi-step ARCH variance forecasting
action: verify first-horizon multi-step variance equals one-step variance due to ARCH model structure
severity: high
kind: domain_rule
modality: must
consequence: Violation indicates incorrect implementation of ARCH recursion, causing forecast variance to diverge from
true model dynamics at h=1
stage_ids:
- forecasting
- id: finance-C-057
when: When using bootstrap forecasting method
action: require at least 10 initial observations and horizon/start ratio less than 20%
severity: high
kind: resource_boundary
modality: must
consequence: Bootstrap with insufficient burn-in or excessive extrapolation ratio produces unreliable variance estimates
with high sampling bias
stage_ids:
- forecasting
- id: finance-C-058
when: When setting simulation count for Monte Carlo forecasting
action: use default of 1000 simulations to balance Monte Carlo error against computation time
severity: medium
kind: resource_boundary
modality: should
consequence: Insufficient simulations increase variance of forecast estimates; excessive simulations waste computation
without meaningful accuracy gains
stage_ids:
- forecasting
- id: finance-C-059
when: When forecasting with FixedVariance volatility process
action: rely on the simulation method to produce meaningful variance forecasts
severity: high
kind: resource_boundary
modality: must_not
consequence: FixedVariance process returns NaN for all forecast paths when using simulation method, as the variance is
predetermined and cannot be simulated
stage_ids:
- forecasting
- id: finance-C-060
when: When forecasting with ARCHInMean models
action: attempt to generate forecasts as this model variant does not support prediction
severity: high
kind: resource_boundary
modality: must_not
consequence: ARCHInMean models raise NotImplementedError because the ARCH-in-mean specification makes multi-step forecasting
mathematically undefined
stage_ids:
- forecasting
- id: finance-C-061
when: When validating simulation output shape
action: verify simulated_paths/variances have shape (horizons x simulations x reindex_dim)
severity: high
kind: operational_lesson
modality: must
consequence: Incorrect simulation shape causes downstream indexing errors or silent misalignment between forecast paths
and their variance estimates
stage_ids:
- forecasting
- id: finance-C-062
when: When forecasting models with exogenous regressors
action: limit horizon to 1 or accept NaN-filled columns for multi-step forecasts
severity: medium
kind: operational_lesson
modality: must
consequence: Multi-step exogenous variable forecasts require aligned out-of-sample values that are typically unavailable,
producing NaN columns
stage_ids:
- forecasting
- id: finance-C-065
when: When validating forecasting capability before computation
action: call _check_forecasting_method to verify method compatibility with model type and horizon
severity: high
kind: architecture_guardrail
modality: must
consequence: Skipping method validation allows unsupported forecast types (e.g., EGARCH analytic multi-step) to reach
computation stage
stage_ids:
- forecasting
- id: finance-C-066
when: When using bootstrap method for variance forecasting
action: wrap standardized residual sampling through BootstrapRng to enable simulation-based path generation
severity: high
kind: architecture_guardrail
modality: must
consequence: Direct bootstrap sampling without BootstrapRng wrapper produces non-reproducible results and breaks the RNG
interface contract for simulation paths
stage_ids:
- forecasting
- id: finance-C-067
when: When presenting simulation-based forecast results
action: claim exact reproducibility without setting random_state or providing an identical random seed
severity: medium
kind: claim_boundary
modality: must_not
consequence: Simulation paths contain inherent Monte Carlo randomness; presenting them as deterministic produces misleading
risk estimates
stage_ids:
- forecasting
- id: finance-C-069
when: When using simulation or bootstrap methods
action: recognize that forecast variance estimates contain Monte Carlo sampling error
severity: medium
kind: claim_boundary
modality: must
consequence: With finite simulations (default 1000), variance estimates have standard error proportional to 1/sqrt(simulations).
Claims of point precision are statistically invalid
stage_ids:
- forecasting
- id: finance-C-078
when: When implementing PhillipsPerron automatic lag selection
action: use the formula 12 * (nobs/100)^(1/4) as the default when lags is None
severity: high
kind: resource_boundary
modality: must
consequence: Non-standard lag selection produces inconsistent long-run variance estimates and invalid PP test statistics
across implementations
stage_ids:
- unitroot_testing
- id: finance-C-079
when: When implementing PhillipsPerron test
action: allow lag parameter to exceed available observations for covariance estimation
severity: high
kind: resource_boundary
modality: must_not
consequence: Excessive lags relative to observations cause ill-conditioned long-run covariance matrices, producing unreliable
PP test statistics
stage_ids:
- unitroot_testing
- id: finance-C-082
when: When running unit root tests with automatic lag selection
action: warn users when max_lags is large relative to sample size due to performance impact
severity: medium
kind: operational_lesson
modality: should
consequence: Large lag search spaces with many observations cause slow computation without proportional statistical benefit
stage_ids:
- unitroot_testing
- id: finance-C-088
when: When implementing unit root test base class
action: make _check_specification and _compute_statistic abstract methods requiring subclass implementation
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing abstract method enforcement allows instantiation of incomplete test classes that lack core computation
logic
stage_ids:
- unitroot_testing
- id: finance-C-089
when: When using unit root test results for trading decisions
action: claim unit root test results as predictions of future price behavior
severity: high
kind: claim_boundary
modality: must_not
consequence: Unit root tests are statistical hypothesis tests on historical data, not forecasts; presenting stationarity
conclusions as trading signals misleads stakeholders
stage_ids:
- unitroot_testing
- id: finance-C-090
when: When presenting cointegration test results
action: claim cointegration implies causal trading relationships
severity: high
kind: claim_boundary
modality: must_not
consequence: Cointegration only indicates statistical equilibrium relationships; presenting it as evidence of profitable
pairs trading without proper risk management overstates the capability
stage_ids:
- unitroot_testing
- id: finance-C-091
when: When using critical values for unit root tests
action: claim asymptotic critical values are exact for finite samples
severity: medium
kind: claim_boundary
modality: must_not
consequence: MacKinnon critical values are asymptotic approximations; presenting them as precise thresholds for small
samples overstates test accuracy and may lead to incorrect conclusions
stage_ids:
- unitroot_testing
- id: finance-C-092
when: When running multiple unit root tests on the same series
action: select the test with most favorable p-value without pre-specification justification
severity: medium
kind: claim_boundary
modality: should_not
consequence: Multiple testing without correction inflates Type I error rate; selecting favorable results misleads about
statistical evidence for stationarity
stage_ids:
- unitroot_testing
- id: finance-C-099
when: When initializing bootstrap for confidence intervals
action: use default method='basic' for confidence interval calculation
severity: medium
kind: resource_boundary
modality: must
consequence: Using non-basic methods without understanding tradeoffs may produce incorrect coverage; basic is simplest
and matches default behavior expected by the framework
stage_ids:
- bootstrap_inference
- id: finance-C-100
when: When initializing bootstrap for confidence intervals
action: use default sampling='nonparametric'
severity: medium
kind: resource_boundary
modality: must
consequence: Nonparametric is the safest default; parametric/semiparametric require specific assumptions that may not
hold
stage_ids:
- bootstrap_inference
- id: finance-C-101
when: When initializing bootstrap for confidence intervals
action: use at least 1000 bootstrap replications for stable estimates
severity: high
kind: resource_boundary
modality: must
consequence: Fewer than 1000 reps produces high-variance confidence intervals with poor coverage, leading to unreliable
statistical inference and potentially wrong conclusions
stage_ids:
- bootstrap_inference
- id: finance-C-102
when: When implementing MCS or SPA multiple comparison procedures
action: default block_size to sqrt(T) when not provided
severity: high
kind: resource_boundary
modality: must
consequence: Wrong block size invalidates time-series bootstrap standard errors; sqrt(T) is theoretically justified for
block bootstraps
stage_ids:
- bootstrap_inference
- id: finance-C-104
when: When initializing MCS
action: default MCS test size to 0.05 (5% significance level)
severity: low
kind: resource_boundary
modality: should
consequence: Non-standard significance levels may not be appropriate for model selection; 0.05 is conventional
stage_ids:
- bootstrap_inference
- id: finance-C-105
when: When implementing BCa confidence intervals
action: compute acceleration parameter using jackknife estimation
severity: high
kind: operational_lesson
modality: must
consequence: BCa without proper jackknife acceleration produces biased confidence intervals that fail to achieve nominal
coverage
stage_ids:
- bootstrap_inference
- id: finance-C-108
when: When initializing any bootstrap class for reproducibility
action: pass seed parameter (int, Generator, or RandomState) to enable reproducible results
severity: high
kind: architecture_guardrail
modality: must
consequence: Without seed, each run produces different bootstrap replicates, preventing reproducible inference and making
results impossible to verify
stage_ids:
- bootstrap_inference
- id: finance-C-111
when: When implementing bootstrap state management
action: use reset() to restore initial state or reset with new seed
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without proper reset, bootstrap continues from current state causing reproducibility issues in sequential
inference
stage_ids:
- bootstrap_inference
- id: finance-C-112
when: When using bootstrap confidence intervals
action: claim bootstrap CI coverage is exact for finite samples
severity: high
kind: claim_boundary
modality: must_not
consequence: Bootstrap confidence intervals are asymptotically valid; claiming exact finite-sample coverage is misleading
and violates statistical theory
stage_ids:
- bootstrap_inference
- id: finance-C-113
when: When using MCS model confidence set
action: claim MCS produces guaranteed model rankings
severity: high
kind: claim_boundary
modality: must_not
consequence: MCS produces a confidence set of models, not rankings; claiming a guaranteed ranking is statistically incorrect
stage_ids:
- bootstrap_inference
- id: finance-C-114
when: When using SPA test of Superior Predictive Ability
action: claim SPA p-values have exact finite-sample distribution
severity: high
kind: claim_boundary
modality: must_not
consequence: SPA uses bootstrap p-values that are asymptotically calibrated; exact finite-sample distribution is unknown
stage_ids:
- bootstrap_inference
- id: finance-C-115
when: When implementing MCS with identical loss values
action: handle standard deviation of 0 in loss differences with warning
severity: high
kind: operational_lesson
modality: must
consequence: Identical losses produce zero variance, causing division by zero, a RuntimeWarning, and an invalid MCS
computation
stage_ids:
- bootstrap_inference
- id: finance-C-126
when: When displaying model results summary
action: organize parameters into separate tables for Mean Model, Volatility Model, and Distribution
severity: high
kind: architecture_guardrail
modality: must
consequence: Flat parameter listing without model component separation obscures model structure and complicates interpretation
stage_ids:
- results_reporting
- id: finance-C-127
when: When outputting results using statsmodels Summary
action: use statsmodels SimpleTable and Summary classes for consistent output formatting
severity: high
kind: architecture_guardrail
modality: must
consequence: Non-standard output formatting will break compatibility with Jupyter notebooks and the econometrics ecosystem
stage_ids:
- results_reporting
- id: finance-C-129
when: When optimizer indicates failed convergence
action: display convergence warning in summary output with optimizer message
severity: high
kind: architecture_guardrail
modality: must
consequence: Silent convergence failure will produce unreliable parameter estimates that appear valid but are actually
suboptimal
stage_ids:
- results_reporting
- id: finance-C-131
when: When presenting ARCH-LM test results to users
action: claim that a significant ARCH-LM test indicates adequate model specification
severity: high
kind: claim_boundary
modality: must_not
consequence: A significant ARCH-LM p-value means remaining ARCH effects exist, indicating the model is misspecified -
claiming otherwise misleads users
stage_ids:
- results_reporting
- id: finance-C-132
when: When presenting R-squared values from ARCH model estimation
action: claim that high R-squared indicates good volatility forecasting ability
severity: medium
kind: claim_boundary
modality: must_not
consequence: R-squared measures mean model fit, not volatility model adequacy; ARCH models are estimated for volatility
forecasting, not point prediction
stage_ids:
- results_reporting
- id: finance-C-133
when: When visualizing forecast results with hedgehog plots
action: align forecast spines with actual historical values at the forecast origin
severity: high
kind: domain_rule
modality: must
consequence: Misaligned hedgehog plot spines will mislead users about the timing and accuracy of forecasts relative to
actual observations
stage_ids:
- results_reporting
- id: finance-C-134
when: When using matplotlib for visualization
action: handle matplotlib version compatibility for date plotting methods
severity: medium
kind: resource_boundary
modality: should
consequence: Incompatibility with matplotlib version < 3.10 will cause date axis plotting failures in hedgehog and residual
plots
stage_ids:
- results_reporting
- id: finance-C-138
when: When passing data between data_input and model_specification
action: Preserve the hold_back parameter when excluding initial observations from estimation
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect hold_back handling will cause the first hold_back observations to be incorrectly included or excluded
from parameter estimation
stage_ids:
- data_input
- model_specification
- id: finance-C-142
when: When computing variance bounds for use in loglikelihood
action: Pass variance_bounds computed from residuals to every variance recursion call to prevent NaN values in the loglikelihood
severity: high
kind: domain_rule
modality: must
consequence: Without variance bounds, extreme variance values will produce -inf loglikelihood, causing optimizer to fail
or produce invalid parameters
stage_ids:
- model_specification
- parameter_estimation
- id: finance-C-143
when: When validating user-provided starting values
action: Check starting values satisfy both bounds AND linear constraints before passing to optimizer
severity: high
kind: domain_rule
modality: must
consequence: Invalid starting values that violate constraints will cause SLSQP to fail immediately or produce undefined
behavior in optimization
stage_ids:
- model_specification
- parameter_estimation
- id: finance-C-145
when: When passing backcast and var_bounds to forecasting
action: Use the same backcast and variance_bounds computed during estimation, not recomputed values
severity: high
kind: architecture_guardrail
modality: must
consequence: Recomputing backcast/var_bounds may produce slightly different values, breaking alignment between in-sample
fit and out-of-sample forecasts
stage_ids:
- parameter_estimation
- forecasting
- id: finance-C-146
when: When computing multi-step variance forecasts
action: Verify that horizon is >= 1 and that the requested variance forecasting method is supported by the volatility model
severity: high
kind: domain_rule
modality: must
consequence: Using unsupported forecasting method (e.g., analytic for EGARCH multi-step) will raise ValueError or produce
mathematically incorrect forecasts
stage_ids:
- parameter_estimation
- forecasting
- id: finance-C-147
when: When passing ARCHModelResult to results reporting
action: Populate residuals array with NaN in positions outside estimation window (first_obs:last_obs)
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without NaN padding, users cannot distinguish observations excluded from estimation from actual zero residuals,
causing misinterpretation of results
stage_ids:
- parameter_estimation
- results_reporting
- id: finance-C-148
when: When reporting ARCHModelResult summary
action: Report both fit_start and fit_stop indices to indicate which observations were used in estimation
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without explicit fit window reporting, users may incorrectly analyze residuals or apply forecasts to wrong
time periods
stage_ids:
- parameter_estimation
- results_reporting
- id: finance-C-149
when: When passing bootstrap results to results reporting
action: Return confidence intervals with shape (2, num_params) where row 0 is lower bounds and row 1 is upper bounds
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect confidence interval shape will cause downstream reporting to display wrong bounds or raise dimension
errors
stage_ids:
- bootstrap_inference
- results_reporting
- id: finance-C-150
when: When computing Model Confidence Set (MCS) p-values
action: Return p-values in DataFrame format with model indices as rows
severity: high
kind: architecture_guardrail
modality: must
consequence: Wrong format will cause downstream model selection to fail or select wrong models
stage_ids:
- bootstrap_inference
- results_reporting
- id: finance-C-151
when: When computing SPA (Reality Check) p-values
action: Return three p-values (lower, consistent, upper) to account for the test's one-sided nature
severity: high
kind: domain_rule
modality: must
consequence: Single p-value ignores the SPA's multiple-testing correction, leading to incorrect model selection decisions
stage_ids:
- bootstrap_inference
- results_reporting
- id: finance-C-152
when: When computing unit root test statistics
action: Store stat, pvalue, and critical_values (dict with keys 1%, 5%, 10%) in the result object
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing critical values will prevent users from making decisions using critical-value-based thresholds instead
of p-values
stage_ids:
- unitroot_testing
- results_reporting
- id: finance-C-153
when: When returning cointegration test results
action: Include cointegrating_vector in the result so users can implement the discovered relationship
severity: high
kind: architecture_guardrail
modality: must
consequence: Without the cointegrating vector, users cannot implement the discovered long-run equilibrium relationship
between variables
stage_ids:
- unitroot_testing
- results_reporting
- id: finance-C-154
when: When validating unit root test lags parameter
action: Reject negative lag values as they are mathematically invalid
severity: high
kind: domain_rule
modality: must
consequence: Negative lags will produce undefined behavior in the test regression or raise cryptic errors downstream
stage_ids:
- unitroot_testing
- results_reporting
- id: finance-C-155
when: When passing forecast results to reporting
action: Return mean, variance, and residual_variance as DataFrames aligned by index
severity: high
kind: architecture_guardrail
modality: must
consequence: Misaligned forecasts will cause incorrect visualization and summary statistics in results reporting
stage_ids:
- forecasting
- results_reporting
- id: finance-C-156
when: When creating hedgehog forecast plot data
action: Pad forecasts with NaN for dates before the earliest forecastable date to prevent look-ahead bias in visualization
severity: medium
kind: domain_rule
modality: must
consequence: Without NaN padding, the hedgehog plot will display incorrect forecast trajectories that include in-sample
information
stage_ids:
- forecasting
- results_reporting
- id: finance-C-157
when: When forecasting with align='target'
action: Align forecasts so that column h contains h-step ahead forecast from time t-h, matching evaluation methodology
severity: high
kind: domain_rule
modality: must
consequence: Misaligned target forecasts will show incorrect alignment between realizations and forecasts, invalidating
forecast evaluation metrics
stage_ids:
- forecasting
- results_reporting
- id: finance-C-158
when: When using bootstrap forecasting method
action: Require at least 10 initial observations and horizon/start ratio < 0.2 for valid bootstrap
severity: high
kind: resource_boundary
modality: must
consequence: Insufficient initial observations for bootstrap will produce unstable variance estimates and unreliable forecasts
stage_ids:
- forecasting
- results_reporting
- id: finance-C-164
when: When using a GARCH volatility model with power not equal to 2.0
action: Request analytic forecasting method for horizon > 1 — only 'simulation' or 'bootstrap' are valid forecasting methods
for non-square-power GARCH models
severity: high
kind: domain_rule
modality: must_not
consequence: Using method='analytic' with power!=2.0 at horizon > 1 raises a ValueError and produces no forecast, forcing
re-estimation or re-configuration
- id: finance-C-165
when: When using an EGARCH volatility model
action: Request analytic forecasting method for horizon > 1 — EGARCH does not support analytic multi-step variance forecasts
severity: high
kind: domain_rule
modality: must_not
consequence: Using method='analytic' with EGARCH at horizon > 1 raises a ValueError, preventing multi-step forecast generation
- id: finance-C-166
when: When determining the estimation path in ARCHModel.fit()
action: 'The closed-form estimation path is only taken when all three conditions hold simultaneously: volatility.closed_form=True,
distribution.num_params=0, and isinstance(volatility, ConstantVariance) — if any condition fails, use the general SLSQP
optimization path'
severity: high
kind: architecture_guardrail
modality: must
consequence: Attempting to use the closed-form path without all conditions met causes incorrect parameter estimates or
AttributeErrors, as the closed-form formulas are specific to the ConstantVariance + Normal combination
- id: finance-C-167
when: When calling the forecast() method on an ARCHModel or ARCHModelResult
action: 'Use only one of the three explicitly supported ForecastingMethod values: ''analytic'', ''simulation'', or ''bootstrap'';
any other string raises a ValueError'
severity: high
kind: architecture_guardrail
modality: must
consequence: Using an unsupported forecasting method string raises a ValueError in the function call chain, preventing
forecast generation
- id: finance-C-168
when: When configuring unit root tests (ADF, DFGLS, Phillips-Perron, KPSS, Zivot-Andrews, VarianceRatio)
action: 'Use only one of the four explicitly supported trend specifications: ''n'' (no constant), ''c'' (constant only),
''ct'' (constant and time trend), or ''ctt'' (constant, time trend, and squared time trend)'
severity: high
kind: architecture_guardrail
modality: must
consequence: Using an unsupported trend specification causes a TypeError or produces statistically incorrect test results
with wrong degrees of freedom
- id: finance-C-169
when: When configuring parameter covariance estimation in ARCHModel.fit()
action: Use only 'robust' or 'classic' for the cov_type parameter — 'robust' uses the sandwich estimator with numerical
derivatives, 'classic' uses the inverse Hessian
severity: high
kind: architecture_guardrail
modality: must
consequence: Using an unsupported cov_type value causes incorrect standard errors and invalid inference (wrong t-statistics,
p-values, and confidence intervals)
- id: finance-C-170
when: When calling forecast() on an ARCHModel or ARCHModelResult
action: Use only 'origin' or 'target' for the align parameter — 'origin' aligns forecasts by their information origin
time, 'target' aligns by the forecast target time
severity: high
kind: architecture_guardrail
modality: must
consequence: Using an unsupported align value raises a ValueError and prevents forecast computation
- id: finance-C-176
when: When initializing any bootstrap class (IIDBootstrap, CircularBlockBootstrap, StationaryBootstrap, MovingBlockBootstrap)
action: 'Pass the index parameter as one of the three supported types: an Int64Array1D, a tuple of Int64Array1D, or a
tuple of (list of Int64Array1D, dict of Int64Array1D) — matching the BootstrapIndexT union type'
severity: high
kind: architecture_guardrail
modality: must
consequence: Passing an unsupported index type causes the bootstrap to produce invalid resampled indices, corrupting all
bootstrap confidence intervals, p-values, and covariance estimates
- id: finance-C-177
when: When estimating any ARCH model and the variance of input residuals is outside [0.1, 10000.0)
action: Allow automatic rescaling of the data or provide explicit rescale=True — the _check_scale function automatically
rescales data outside this range by powers of 10 to avoid numerical issues in the optimizer
severity: medium
kind: operational_lesson
modality: should
consequence: Data with variance outside [0.1, 10000) causes the SLSQP optimizer to converge slowly or fail to find the
optimal parameters, producing suboptimal or invalid estimates
- id: finance-C-178
when: When presenting or reporting results from this package to users
action: Claim that the package supports real-time streaming data analysis — it is a batch statistical estimation library
that operates on static historical time series
severity: high
kind: claim_boundary
modality: must_not
consequence: Users attempt to integrate the package into real-time trading pipelines expecting live data ingestion, leading
to system failures when the package cannot process streaming data
- id: finance-C-179
when: When presenting or reporting this system's capabilities
action: Claim support for high-frequency trading systems requiring sub-second latency — the package performs batch maximum-likelihood
estimation unsuitable for latency-critical applications
severity: high
kind: claim_boundary
modality: must_not
consequence: Users deploy the package in HFT contexts where sub-second decision-making is required, causing severe financial
losses due to estimation latency
- id: finance-C-180
when: When presenting or reporting this system's capabilities
action: Claim support for multivariate volatility models — the package only implements univariate ARCH/GARCH variants;
users requiring multivariate volatility must use the rpy2 port of the R package 'rmgarch'
severity: high
kind: claim_boundary
modality: must_not
consequence: Users attempt to model multivariate volatility correlations using the univariate package, producing incorrect
risk estimates and wrong portfolio allocation decisions
- id: finance-C-181
when: When presenting or reporting this system's capabilities
action: Claim support for structural break detection beyond unit root tests — the package's unit root tests cannot detect
multiple structural breaks; users requiring this should use dedicated structural change packages
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users rely on unit root tests for structural break detection, missing multiple breaks that invalidate the
entire time-series model specification
- id: finance-C-182
when: When presenting or reporting this system's capabilities
action: Claim drag-and-drop GUI interfaces — the package is a Python API-only library with no graphical user interface;
users requiring GUI access must build their own wrappers
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users expect a graphical interface and cannot use the package's Python API, wasting development resources
on attempting to find a non-existent GUI
- id: finance-C-183
when: When presenting or reporting this system's capabilities
action: Claim that this package supports live trading — it is a pure backtesting and statistical estimation library with
no exchange connectivity, order execution, or portfolio management capabilities
severity: high
kind: claim_boundary
modality: must_not
consequence: Users connect the package directly to a brokerage expecting automated trade execution, causing unintended
market orders to be placed with real capital
- id: finance-C-184
when: When presenting or reporting this system's backtested or estimated returns to users
action: Claim that estimated model parameters or historical backtest results equal expected future performance — past
ARCH model estimates do not predict future volatility or returns
severity: high
kind: claim_boundary
modality: must_not
consequence: Users make live capital allocation decisions based on inflated historical estimates, leading to severe underperformance
when market regimes shift away from the estimation period
- id: finance-C-185
when: When presenting or reporting volatility forecasts to users
action: Claim that model-based forecasts fully account for market microstructure costs — forecasts ignore market impact,
bid-ask spread, financing costs, slippage, and execution delays
severity: high
kind: claim_boundary
modality: must_not
consequence: Users incorporate ARCH forecasts directly into live trading strategies without adjusting for execution costs,
producing strategies that appear profitable in backtests but lose money in live trading after costs
- id: finance-C-186
when: When implementing GARCH volatility model configuration in arch package
action: Use GARCH(1,1) with power=2.0 as the default configuration — verify default p=1, q=1 parameters are used unless
explicit model selection is performed; for asymmetric volatility, use GJR-GARCH; for log-volatility use EGARCH
severity: high
kind: domain_rule
modality: must
consequence: Using non-standard GARCH parameters (p>1, q>1) without sufficient data causes unreliable volatility estimates,
leading to incorrect risk forecasts and poor hedging decisions in live trading
derived_from_bd_id: BD-001
- id: finance-C-187
when: When using GARCH model with non-standard power parameter
action: Use simulation-based forecasting when power != 2.0 — analytic forecast methods are only available for standard
GARCH (power=2.0); verify forecast method is explicitly set to simulation if power differs from default
severity: high
kind: domain_rule
modality: must
consequence: Using analytic forecasting with power != 2.0 produces incorrect forecast values since closed-form solutions
do not exist for non-standard power specifications, causing systematic mispricing in option hedging and VaR calculations
derived_from_bd_id: BD-060
- id: finance-C-188
when: When configuring backcast parameters for GARCH model initialization
action: Verify backcast tau calculation matches min(75, nobs) formula — for samples smaller than 75, tau should equal
nobs; for larger samples, tau should equal 75; changing decay factor from 0.94 affects backcast smoothness and initial
variance estimates
severity: medium
kind: operational_lesson
modality: should
consequence: Incorrect backcast tau calculation produces biased initial variance estimates, affecting the accuracy of
short-term volatility forecasts which are critical for intraday risk management and option pricing
derived_from_bd_id: BD-066
- id: finance-C-189
when: When testing for cointegration between two financial time series
action: Apply Engle-Granger two-step cointegration test using OLS regression followed by ADF test on residuals — this
method applies to bivariate relationships only; use MacKinnon critical values for significance determination
severity: high
kind: domain_rule
modality: must
consequence: Using univariate time series methods for cointegration testing incorrectly identifies or misses cointegrating
relationships, causing pairs trading strategies to trade on spurious relationships or miss profitable opportunities
derived_from_bd_id: BD-042
- id: finance-C-190
when: When calculating returns for downstream performance metrics
action: Verify that 100 * pct_change() scaling matches the expected annualization multiplier (12x for percentage returns)
used in Sharpe ratio and other risk-adjusted performance calculations
severity: medium
kind: operational_lesson
modality: should
consequence: Using log returns without adjusting the annualization multiplier from 12x to sqrt(12) causes systematic mismeasurement
of risk-adjusted performance, making strategies appear more or less attractive than they actually are
derived_from_bd_id: BD-003
- id: finance-C-191
when: When computing Sharpe ratio or other risk-adjusted performance metrics
action: 'Verify the annualization multiplier matches the return scaling convention: use 12x for percentage returns (100*pct_change)
or sqrt(12) for log returns; document the dependency explicitly in strategy analysis code'
severity: medium
kind: operational_lesson
modality: should
consequence: Mismatch between return scaling convention and annualization assumption creates systematic mismeasurement
of risk-adjusted performance, causing strategies to appear more or less attractive than their true performance
derived_from_bd_id: BD-074
- id: finance-C-192
when: When implementing optimization or fitting methods for ARCH models
action: Assume the framework provides comprehensive convergence diagnostics beyond scipy status codes — the current convergence_flag
only returns status codes without iteration history, log-likelihood path, or parameter trajectory data
severity: high
kind: claim_boundary
modality: must_not
consequence: Without trajectory data, local optima issues in complex ARCH models cannot be diagnosed post-hoc, leading
to unreliable parameter estimates being used in production strategies
derived_from_bd_id: BD-GAP-016
- id: finance-C-193
when: When fitting complex ARCH models requiring convergence quality assessment
action: 'Implement or use a ConvergenceDiagnosis object that stores: iteration history, log-likelihood path, and parameter
trajectory for post-hoc assessment of convergence quality to diagnose local optima issues'
severity: high
kind: domain_rule
modality: must
consequence: Fitting complex ARCH models without convergence diagnostics prevents diagnosis of local optima issues, causing
unreliable parameter estimates to be used in production strategies
derived_from_bd_id: BD-GAP-016
- id: finance-C-194
when: When running simulation/bootstrap forecasting
action: Verify that 1000 simulations (Monte Carlo SE ≈ 3.2%) provides sufficient precision for the intended use case;
for extreme quantile estimation such as VaR at 99%, increase to 10000+ simulations to achieve stable tail estimates
severity: medium
kind: operational_lesson
modality: should
consequence: Using 1000 simulations for extreme quantile estimation produces unstable VaR estimates with high variance,
leading to either excessive capital reserves or underestimation of tail risk
derived_from_bd_id: BD-005
- id: finance-C-195
when: When initializing variance in EWMA volatility models
action: Verify that lambda=0.94 decay rate and tau=75 observation window (~3 months of daily data) align with your data
frequency and volatility characteristics before using default backcast values
severity: medium
kind: operational_lesson
modality: should
consequence: Default EWMA parameters may not match asset-specific volatility dynamics, causing systematic initialization
bias that propagates through all forecasts
derived_from_bd_id: BD-006
- id: finance-C-196
when: When selecting power specification for APARCH or TARCH volatility models
action: Verify power parameter before selecting forecast method; power!=2 automatically switches from analytic to simulation-based
forecasting with different computational cost and replication requirements
severity: high
kind: domain_rule
modality: must
consequence: Selecting non-quadratic power models without understanding the forecast method switch causes unexpected runtime
increases and potentially insufficient simulation replications for stable tail estimates
derived_from_bd_id: BD-077
- id: finance-C-197
when: When relying on Bollerslev-Wooldridge robust standard errors for inference
action: Assume robust SE corrects inference under a misspecified conditional variance; robust SE is reliable only when the
variance specification is approximately correct, and misspecification such as ignoring leverage effects makes it unreliable
despite appearing to correct the standard errors
severity: high
kind: operational_lesson
modality: must_not
consequence: Robust standard errors provide false confidence when the GARCH specification is misspecified, leading to
invalid hypothesis tests and potentially wrong conclusions about coefficient significance
derived_from_bd_id: BD-080
- id: finance-C-198
when: When selecting bootstrap block length for time series resampling
action: Verify sqrt(T) rule-of-thumb against Politis-White optimal block calculation; for highly persistent series, larger
blocks than sqrt(T) may be needed to account for long memory
severity: medium
kind: operational_lesson
modality: should
consequence: Using sqrt(T) default block length for highly persistent series inflates bootstrap variance and produces
unreliable confidence intervals, leading to incorrect statistical inference
derived_from_bd_id: BD-079
- id: finance-C-199
when: When implementing or refactoring RiskMetrics2006 variance calculations
action: Maintain lambda=0.94 as the fixed decay factor for EWMA recursion, as this is the RiskMetrics 2006 industry standard
for balancing responsiveness and stability in variance estimation
severity: high
kind: domain_rule
modality: must
consequence: Changing lambda from the RiskMetrics 2006 standard of 0.94 breaks comparability with industry benchmarks
and produces variance estimates that do not reflect the intended balance between responsiveness and stability, potentially
leading to misaligned risk management decisions
derived_from_bd_id: BD-036
- id: finance-C-200
when: When implementing volatility model selection for time series with long-memory characteristics
action: Use GARCH instead of FIGARCH for series exhibiting long-memory — GARCH cannot capture hyperbolic decay in volatility
autocorrelation; FIGARCH(1,d,1) with fractional differencing parameter d in (0,0.5) is required
severity: high
kind: domain_rule
modality: must_not
consequence: Substituting GARCH for FIGARCH when modeling long-memory volatility causes the model to miss the characteristic
hyperbolic decay pattern, leading to materially incorrect variance forecasts that distort risk estimates and hedging
ratios
derived_from_bd_id: BD-039
- id: finance-C-201
when: When implementing DFGLS unit root test for stationarity detection
action: Apply GLS detrending (ERS 1996) before Dickey-Fuller regression in the DFGLS variant — standard OLS detrending
must not be used as it provides materially lower test power
severity: high
kind: domain_rule
modality: must
consequence: Using OLS detrending instead of GLS detrending in DFGLS reduces test power below the designed efficiency
gains, causing higher rates of failing to detect actual unit roots and leading to false conclusions about stationarity
derived_from_bd_id: BD-041
- id: finance-C-202
when: When implementing cointegrating vector estimation for bivariate or multivariate relationships
action: Add leads and lags of differenced regressors (Dynamic OLS) to address endogeneity in the cointegrating regression
— static OLS without augmentation must not be used
severity: high
kind: domain_rule
modality: must
consequence: Using static OLS without lead/lag augmentation introduces endogeneity bias that violates the super-consistency
property of cointegration estimators, producing inconsistent coefficient estimates that invalidate the identified long-run
relationship
derived_from_bd_id: BD-043
- id: finance-C-203
when: When implementing volatility calculations in backtesting or production code
action: Assume the framework provides a built-in annualized_volatility() helper function with configurable compounding
convention — no such standardized helper exists in the current framework
severity: high
kind: claim_boundary
modality: must_not
consequence: Without a standardized annualization helper, users apply inconsistent formulas, leading to incorrect risk
estimates and strategy comparisons that diverge from live trading results
derived_from_bd_id: BD-GAP-017
- id: finance-C-204
when: When implementing volatility calculations in backtesting or production code
action: Implement an explicit annualized_volatility() helper that accepts configurable compounding convention (252 for
daily trading days, 365 for calendar days, simple for no compounding) and documents that input volatility is in frequency-of-data
units
severity: high
kind: domain_rule
modality: must
consequence: Without explicit annualization, users apply inconsistent formulas causing systematic risk mis-estimation
that compounds over time in live trading
derived_from_bd_id: BD-GAP-017
- id: finance-C-205
when: When implementing GARCH model evaluation workflows in backtesting
action: Assume the framework provides a standardized backtest validation framework with automatic train/test splits and
VaR/CVaR/realized PnL tracking — no such framework exists in the current implementation
severity: high
kind: claim_boundary
modality: must_not
consequence: Without standardized backtest methodology, users implement ad-hoc validation that fails to detect GARCH forecast
failures, leading to live trading losses from unvalidated volatility predictions
derived_from_bd_id: BD-GAP-018
- id: finance-C-206
when: When implementing GARCH model evaluation workflows in backtesting
action: Implement a backtest validation framework that includes automatic train/test split for time series, historical
VaR/CVaR tracking against realized PnL, and diagnostic plots for volatility model evaluation
severity: high
kind: domain_rule
modality: must
consequence: Without standardized backtest validation, GARCH forecast failures go undetected until live trading, causing
significant financial losses from incorrect volatility predictions
derived_from_bd_id: BD-GAP-018
- id: finance-C-207
when: When implementing or extending GARCH model parameter extraction and constraint application
action: Use hardcoded parameter indices without computed offsets — parameter ordering follows [mean_params, vol_params,
dist_params] with dynamically computed offsets during fit()
severity: high
kind: claim_boundary
modality: must_not
consequence: Direct index access assuming fixed parameter ordering breaks custom GARCH variants; constraint application
fails silently causing invalid optimization results
derived_from_bd_id: BD-GAP-015
- id: finance-C-208
when: When implementing or extending GARCH model parameter extraction and constraint application
action: Always use dynamically computed offsets from the model to access parameter indices; validate that offsets remain
within bounds before parameter extraction
severity: high
kind: domain_rule
modality: must
consequence: Without using computed offsets, custom GARCH variants with non-standard parameter counts cause index out-of-bounds
errors or silently corrupted parameter values
derived_from_bd_id: BD-GAP-015
- id: finance-C-209
when: When configuring starting values for GARCH optimization
action: Validate GARCH starting values against stationarity constraints (alpha >= 0, gamma >= 0, beta >= 0, alpha + gamma
+ beta < 1) before passing to optimizer
severity: high
kind: domain_rule
modality: must
consequence: The fixed grid search may produce invalid starting values for non-standard GARCH variants; using invalid
starting values causes optimizer divergence or convergence to incorrect parameters
derived_from_bd_id: BD-067
- id: finance-C-211
when: When using ARCH library for volatility calculations in production backtesting
action: Assume the framework handles stale data detection and expiry — the ARCH library does not implement data freshness
validation; stale data may be processed as current without warning
severity: high
kind: claim_boundary
modality: must_not
consequence: Without stale data detection, stale price data is processed as current, causing PnL calculations to be incorrect
and potentially resulting in significant financial losses or reporting errors in production systems
derived_from_bd_id: BD-GAP-003
- id: finance-C-212
when: When implementing data ingestion in ARCH library production backtesting
action: Implement data timestamp validation using validate_timestamp() helper — check data timestamps against current
time and reject data older than the configured staleness threshold (e.g., 5 minutes for intraday data)
severity: high
kind: domain_rule
modality: must
consequence: Stale data processed as current leads to incorrect PnL calculations and reporting errors in production backtesting
systems, potentially causing significant financial losses
derived_from_bd_id: BD-GAP-003
- id: finance-C-213
when: When running backtests or production strategies with ARCH library
action: Assume the framework automatically maintains model and data version binding — the framework does not implement
snapshot binding; different runs may silently use different data versions without tracking
severity: high
kind: claim_boundary
modality: must_not
consequence: Without version snapshot binding, backtest results become non-reproducible because different executions may
load different data versions without any tracking or warning, making strategy audits impossible
derived_from_bd_id: BD-GAP-004
- id: finance-C-214
when: When running backtests or production strategies with ARCH library
action: Capture and persist data hashes (e.g., hashlib.sha256 on source data) and model version identifiers for each backtest
run, storing them alongside results in a version manifest file
severity: high
kind: domain_rule
modality: must
consequence: Without storing version snapshots, backtest results cannot be reproduced or audited; different data versions
may silently change strategy performance and invalidate historical comparisons
derived_from_bd_id: BD-GAP-004
- id: finance-C-215
when: When implementing custom data provider integration or attempting to write external datasets to ARCH library
action: Assume the framework supports versioned writes and snapshot semantics for data persistence — the framework only
supports built-in datasets (sp500, cpu, realized volatility) without version control or atomic write guarantees
severity: high
kind: claim_boundary
modality: must_not
consequence: Without versioned writes, concurrent data updates can corrupt datasets, and snapshot semantics cannot guarantee
data consistency across parallel operations or strategy executions
derived_from_bd_id: BD-GAP-011
- id: finance-C-216
when: When implementing custom data provider integration with ARCH library
action: Use external database transactions or file versioning systems (e.g., git LFS, versioned S3 buckets) for custom
dataset writes — implement atomic write patterns using database transactions or write-to-temp-then-rename operations
severity: high
kind: domain_rule
modality: must
consequence: Without version control and atomic writes, data corruption can occur during concurrent updates; strategies
may execute on inconsistent snapshots, leading to unpredictable backtest results
derived_from_bd_id: BD-GAP-011
- id: finance-C-217
when: When processing DatetimeIndex inputs with timezone-naive indices in arch/utility/array.py:259-276
action: Assume the framework handles timezone conversion automatically — timezone-naive indices are silently stripped
to GMT without warning, causing subtle off-by-one errors in quarterly and monthly data
severity: high
kind: claim_boundary
modality: must_not
consequence: Timezone-naive date indices are silently stripped to GMT, leading to off-by-one errors that accumulate over
time in quarterly and monthly data, corrupting statistical analysis and strategy signals
derived_from_bd_id: BD-GAP-012
- id: finance-C-218
when: When processing DatetimeIndex inputs before passing to ARCH library functions
action: UTC-normalize all DatetimeIndex inputs using tz_localize(tz='UTC') before processing, and apply the validate_timezone()
helper to verify that all date indices carry explicit UTC timezone information
severity: high
kind: domain_rule
modality: must
consequence: Without explicit timezone normalization, timezone-naive indices silently default to GMT, causing subtle off-by-one
errors that corrupt quarterly and monthly statistical analysis used for strategy decisions
derived_from_bd_id: BD-GAP-012
- id: finance-C-219
when: When implementing or modifying KPSS stationarity test logic in arch/unitroot/unitroot.py
action: Use automatic bandwidth selection for KPSS test via auto_bandwidth() function — the auto_bandwidth function minimizes
asymptotic mean squared error of the variance estimator; do not replace with fixed bandwidth formulas (T^0.5, T^0.4)
unless sample size is small, series is trending, or exhibits heavy-tailed distributions
severity: high
kind: domain_rule
modality: must
consequence: Replacing auto-bandwidth with fixed bandwidth may cause KPSS test to reject stationarity incorrectly or accept
non-stationary series as stationary, leading to incorrect trading strategy signals and financial losses
derived_from_bd_id: BD-028
- id: finance-C-220
when: When processing monetary values or quantitative data in backtesting
action: Assume the framework provides explicit currency/unit annotation for data fields — the framework does not implement
currency/unit annotation; numeric values lack metadata about their denomination or measurement unit
severity: high
kind: claim_boundary
modality: must_not
consequence: Without explicit currency/unit annotation, mixed-currency portfolios or unit-mismatched data cause silent
conversion errors that accumulate over time, leading to incorrect portfolio valuations and risk calculations in production
derived_from_bd_id: BD-GAP-005
- id: finance-C-221
when: When defining portfolio data structures or processing multi-currency positions
action: Add explicit currency/unit metadata to every monetary field (e.g., currency_code='CNY', unit='share', scale=1)
in data schemas and validate unit consistency before calculations
severity: high
kind: operational_lesson
modality: must
consequence: Explicit currency/unit annotation prevents silent unit mismatches that cause portfolio value miscalculations
when strategies operate across multiple currencies or asset classes
derived_from_bd_id: BD-GAP-005
- id: finance-C-222
when: When estimating covariance matrices for portfolio optimization or risk calculation
action: Assume the framework automatically fixes non-positive semi-definite (PSD) covariance matrices — the framework
does not implement PSD correction; eigenvalue adjustment, Higham's method, or shrinkage is not applied automatically
severity: high
kind: claim_boundary
modality: must_not
consequence: Non-PSD covariance matrices cause portfolio optimizers to fail or produce invalid allocations with negative
variances, leading to incorrect risk estimates and potentially catastrophic trading decisions
derived_from_bd_id: BD-GAP-008
- id: finance-C-223
when: When performing covariance matrix estimation in portfolio construction
action: Apply PSD correction using eigenvalue clipping (set negative eigenvalues to 0, rescale trace) or Higham's method
before passing covariance matrix to portfolio optimizer
severity: high
kind: domain_rule
modality: must
consequence: Without PSD correction, portfolio optimization fails for ill-conditioned covariance matrices, causing strategies
to produce unstable or infeasible allocations
derived_from_bd_id: BD-GAP-008
- id: finance-C-224
when: When estimating covariance matrices for portfolios with N > T (high-dimensional case)
action: Select Ledoit-Wolf shrinkage estimator or OAS (Oracle Approximating Shrinkage) with target=diagonal, and configure
shrinkage intensity α in range [0.2, 0.4] based on cross-validation
severity: high
kind: operational_lesson
modality: must
consequence: Shrinkage estimators provide finite-sample bias correction that improves portfolio out-of-sample performance
by 5-15% in high-dimensional cases compared to raw sample covariance
derived_from_bd_id: BD-GAP-009
- id: finance-C-225
when: When configuring VaR/CVaR risk metrics for regulatory or internal risk management
action: Set confidence_level=0.99 for regulatory VaR (Basel requirements) or 0.95 for internal models, and configure lookback_window=252
(1 year) for daily VaR or 60 for monthly; validate parameters against risk mandate before backtesting
severity: high
kind: domain_rule
modality: must
consequence: Explicit VaR/CVaR parameter configuration ensures regulatory compliance and accurate tail risk estimation
aligned with the specific risk management mandate
derived_from_bd_id: BD-GAP-010
- id: finance-C-226
when: When implementing volatility forecasting using realized volatility (RV) data
action: Use HAR (Heterogeneous Autoregressive) model with predetermined lags of 1, 5, and 22 days representing daily,
weekly, and monthly components respectively
severity: high
kind: domain_rule
modality: must
consequence: Using alternative volatility models like standard GARCH without understanding HAR's multi-scale advantage
produces systematically different volatility forecasts, leading to incorrect risk assessments and suboptimal hedging
decisions in live trading
derived_from_bd_id: BD-050
- id: finance-C-227
when: When implementing bootstrap confidence intervals or hypothesis tests for time series
action: Use Stationary Bootstrap with geometric block length distribution, where the expected block length equals 1/p (p =
success probability), to ensure stationarity of bootstrap samples
severity: high
kind: domain_rule
modality: must
consequence: Using fixed block length bootstrap methods like Circular Block Bootstrap or Moving Block Bootstrap introduces
non-stationarity in bootstrap samples, causing confidence intervals to be systematically miscalibrated and hypothesis
tests to have incorrect rejection rates
derived_from_bd_id: BD-054
- id: finance-C-228
when: When implementing t-stat based lag selection for statistical tests
action: Apply t-stat threshold of |1.645| (10% two-sided significance) for lag elimination; remove lags where absolute
t-stat < 1.645 and continue until all remaining lags meet the threshold
severity: high
kind: domain_rule
modality: must
consequence: Changing the t-stat threshold alters the lag selection aggressiveness; using a stricter threshold (e.g.,
1.96 for 5%) retains fewer lags while using a looser threshold retains more lags, directly affecting model specification
and test power
derived_from_bd_id: BD-055
- id: finance-C-229
when: When implementing or configuring APARCH volatility models in ARCH library
action: Maintain delta=1 when using APARCH for TARCH specification; the power parameter controls volatility asymmetry
where negative shocks produce larger volatility increases (absolute return specification)
severity: high
kind: domain_rule
modality: must
consequence: Without delta=1, APARCH no longer nests TARCH specification, eliminating the asymmetric volatility mechanism
that captures negative shock amplification, causing systematically biased volatility forecasts
derived_from_bd_id: BD-056
- id: finance-C-231
when: When configuring ARCHModel instances for volatility estimation
action: Explicitly specify volatility model (ConstantVariance, EGARCH, GARCH, etc.) and distribution (Normal, StudentT,
etc.) in constructor parameters rather than relying on hardcoded defaults; verify default path matches intended estimation
severity: medium
kind: operational_lesson
modality: should
consequence: Using hardcoded defaults without verification may result in ConstantVariance+Normal being silently used,
causing incorrect volatility and distribution specifications that corrupt estimation results and invalidate downstream
risk metrics
derived_from_bd_id: BD-061
- id: finance-C-232
when: When refactoring bootstrap implementations in arch library
action: Preserve the inheritance relationship between CircularBlockBootstrap and IIDBootstrap; verify block_length parameter
override is maintained as the inheritance enables clone behavior and shared sampling interface
severity: high
kind: domain_rule
modality: must
consequence: Breaking the inheritance relationship between CircularBlockBootstrap and IIDBootstrap breaks the circular
block bootstrap implementation, causing sampling failures and invalid statistical inference in bootstrap-based confidence
intervals
derived_from_bd_id: BD-065
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-124 / Sharpe Ratio Bootstrap Statistical Inference
version: v5.3
intent_keywords:
- bootstrap
- sharpe ratio
- statistical inference
- confidence intervals
- stationary bootstrap
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
groups:
- group_id: all
name: All Capabilities
description: ''
emoji: 📦
uc_count: 9
ucs:
- uc_id: UC-101
name: Sharpe Ratio Bootstrap Statistical Inference
short_description: Computes statistical inference (confidence intervals, standard errors) for the Sharpe Ratio using
bootstrap methods to quantify uncertainty in risk-ad
sample_triggers:
- bootstrap
- sharpe ratio
- statistical inference
- uc_id: UC-102
name: Multiple Model Comparison with SPA Test
short_description: Compares 500 predictive models against a benchmark using the Superior Predictive Ability (SPA)
test to determine if any models significantly outperfor
sample_triggers:
- model comparison
- SPA test
- multiple models
- uc_id: UC-103
name: Oil Price Cointegration Analysis
short_description: Tests for cointegration relationships between WTI and Brent crude oil prices to identify mean-reverting
spread opportunities using Engle-Granger and P
sample_triggers:
- cointegration
- unit root
- ADF test
- uc_id: UC-104
name: Credit Spread Stationarity Testing
short_description: Tests for stationarity in credit spreads (BAA-AAA) using Augmented Dickey-Fuller tests to determine
if mean-reversion trading strategies are applicabl
sample_triggers:
- unit root
- ADF test
- stationarity
- uc_id: UC-105
name: ARX Forecasting with Exogenous Variables
short_description: Forecasts univariate time series using Autoregressive models with exogenous variables (ARX) to
capture the impact of external factors on the target va
sample_triggers:
- ARX
- exogenous variables
- forecasting
- uc_id: UC-106
name: HARX Volatility Modeling with Fixed Variance
short_description: Demonstrates how to specify a HARX mean model with fixed/external variance inputs and iteratively
fit volatility models using the estimated conditiona
sample_triggers:
- fixed variance
- HARX
- volatility modeling
- uc_id: UC-107
name: S&P 500 GARCH Volatility Forecasting
short_description: Forecasts future volatility of S&P 500 returns using GARCH models, including multi-step ahead
forecasts and rolling window out-of-sample predictions
sample_triggers:
- GARCH
- volatility forecasting
- S&P 500
- uc_id: UC-108
name: S&P 500 GARCH Volatility Model Comparison
short_description: 'Fits and compares different GARCH volatility model specifications (symmetric, asymmetric, power)
with various error distributions to characterize S&P '
sample_triggers:
- GARCH
- volatility modeling
- S&P 500
- uc_id: UC-109
name: NASDAQ Volatility Scenario Generation
short_description: Generates multiple volatility scenarios for NASDAQ returns using simulation-based forecasting
methods, useful for risk management and option pricing a
sample_triggers:
- volatility scenarios
- simulation
- NASDAQ
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try sharpe ratio bootstrap statistical inference
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try multiple model comparison with spa test
auto_selected: true
- uc_id: UC-103
beginner_prompt: Try oil price cointegration analysis
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 9 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- Oil Price Cointegration Analysis
- Multiple Model Comparison with SPA Test
- Sharpe Ratio Bootstrap Statistical Inference
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
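Generates synthetic transaction data in AMLSim format, converting transaction logs into simulated datasets for testing anti-money-laundering detection systems; supports splitting accounts by bank ID, merging outputs from multiple sources, and generating transaction network graphs.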
---
name: aml-data-generator
description: |-
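Generates synthetic transaction data in AMLSim format, converting transaction logs into simulated datasets for testing anti-money-laundering detection systems; supports splitting accounts by bank ID, merging outputs from multiple sources, and generating transaction network graphs.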
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-060"
compiled_at: "2026-04-22T13:00:18.242568+00:00"
capability_markets: "global"
capability_activities: "regtech-compliance"
sop_version: "crystal-compilation-v6.1"
---
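# AML Data Generation (aml-data-generator)
> Generates synthetic transaction data in AMLSim format, converting transaction logs into simulated datasets for testing anti-money-laundering detection systems; supports splitting accounts by bank ID, merging outputs from multiple sources, and generating transaction network graphs.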
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (13 total)
### Convert Logs to AML Simulation Data (`UC-101`)
Convert transaction log files into synthetic AML simulation data for testing anti-money laundering detection systems
**Triggers**: convert logs, synthetic data, AML simulation
### Split Accounts by Bank ID (`UC-102`)
Partition account CSV files by bank identifier for bank-specific analysis and processing
**Triggers**: split accounts, bank ID, partition data
### Combine AML Simulation Outputs (`UC-103`)
Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
**Triggers**: combine outputs, merge data, AMLSim aggregation
For all **13** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
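
As a rough illustration of how two of the locks above could be checked before any order is generated, the sketch below validates the `SL-03` entity ID shape and the `SL-05` exactly-one-sizing-field rule. The function names and the regular expression are assumptions made for this example only; the authoritative definitions live in references/LOCKS.md.

```python
import re
from typing import Optional

# Assumed pattern for "entity_type_exchange_code", e.g. stock_sh_600000.
# The real rule may be stricter; this is only an illustrative approximation.
ENTITY_ID_PATTERN = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")


def check_entity_id(entity_id: str) -> bool:
    """Return True if the ID has three underscore-separated parts (SL-03 sketch)."""
    return ENTITY_ID_PATTERN.fullmatch(entity_id) is not None


def check_exactly_one_sizing(
    position_pct: Optional[float] = None,
    order_money: Optional[float] = None,
    order_amount: Optional[float] = None,
) -> bool:
    """Return True if exactly one sizing field is provided (SL-05 sketch)."""
    provided = [v for v in (position_pct, order_money, order_amount) if v is not None]
    return len(provided) == 1
```

A caller would typically halt on a `False` result, matching the "halt" column in the table.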
## Top Anti-Patterns (15 total)
- **`AP-REGTECH-001`**: Missing attribute initialization on data structures
- **`AP-REGTECH-002`**: Self-loops in transaction graphs violate domain rules
- **`AP-REGTECH-003`**: Unvalidated floating-point inputs cause runtime crashes
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-060. Evidence verify ratio = 15.9% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-060` blueprint at 2026-04-22T13:00:18.242568+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Combine AML Simulation Outputs
  - Split Accounts by Bank ID
  - Convert Logs to AML Simulation Data
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## finance-bp-060--AMLSim (1)
### `AP-REGTECH-011` — Mismatched configuration parameters across coupled components <sub>(medium)</sub>
When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence: AML typology patterns placed on wrong accounts, invalidating simulation results.
## finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest (1)
### `AP-REGTECH-002` — Self-loops in transaction graphs violate domain rules <sub>(high)</sub>
When generating directed transaction graphs or AML typologies, allowing source == destination edges creates self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology validation.
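
One simple way to guard against this anti-pattern is to filter the edge list before it reaches graph construction. The sketch below assumes edges are plain `(source, destination)` tuples; `drop_self_loops` is a hypothetical helper name, not part of AMLSim.

```python
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def drop_self_loops(edges: Iterable[Edge]) -> List[Edge]:
    """Keep only edges whose source differs from their destination."""
    return [(src, dst) for src, dst in edges if src != dst]
```

Depending on the pipeline, raising an error instead of silently dropping the edge may be preferable, since a self-loop usually points to a bug in candidate selection.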
## finance-bp-060--AMLSim, finance-bp-071--opensanctions (1)
### `AP-REGTECH-001` — Missing attribute initialization on data structures <sub>(high)</sub>
When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes (e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.
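
A minimal sketch of defensive initialization follows. It assumes accounts are loaded as dictionaries and that `normal_models` (the attribute named above) should default to an empty list; `init_account` and `REQUIRED_LIST_ATTRS` are hypothetical names chosen for this example.

```python
from typing import Any, Dict

# Assumed set of list-valued attributes that downstream code reads unconditionally.
REQUIRED_LIST_ATTRS = ("normal_models",)


def init_account(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with every required list attribute present."""
    account = dict(record)
    for attr in REQUIRED_LIST_ATTRS:
        # A fresh list per account avoids sharing one mutable default.
        account.setdefault(attr, [])
    return account
```

Running every loaded record through one initializer keeps the read path and the initialization path in agreement about which attributes exist.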
## finance-bp-062--ifrs9 (3)
### `AP-REGTECH-005` — Incorrect amortization windows violate IFRS 9 compliance <sub>(high)</sub>
Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence: regulatory non-compliance and materially incorrect loan loss provisions.
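
Read literally, the rule above maps each stage to an iteration count. The sketch below encodes only that mapping; the function name is hypothetical, and the handling of loans with fewer than 12 months remaining is not addressed by the text, so it is left out here.

```python
def amortization_iterations(stage: int, tenor_months: int) -> int:
    """Iteration count per IFRS 9 stage, as described in the anti-pattern text."""
    if tenor_months < 1:
        raise ValueError("tenor_months must be at least 1")
    if stage == 1:
        # 12-month window, expressed as 11 zero-indexed iterations.
        return 11
    if stage in (2, 3):
        # Full remaining tenor.
        return tenor_months - 1
    raise ValueError(f"unknown IFRS 9 stage: {stage}")
```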
### `AP-REGTECH-010` — Incorrect cumulative PD ordering corrupts lifetime ECL term structure <sub>(high)</sub>
Using cumprod(1-conPD) without shift(1) and fillna(1) produces corrupted first-period survival probability. This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements. Consequence: systematically incorrect provisions across all remaining tenor periods.
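To illustrate why the shift matters, here is a minimal pure-Python sketch of the same ordering. It assumes `conPD` is a list of per-period conditional default probabilities and that the marginal PD of a period equals the survival probability at the start of that period times its conditional PD; the function name `marginal_pds` is hypothetical, and the source presumably implements this with pandas rather than `itertools`.

```python
from itertools import accumulate
from operator import mul


def marginal_pds(conditional_pds: list[float]) -> list[float]:
    """Marginal PD per period from per-period conditional PDs."""
    # Survival to the END of each period: (1-p1), (1-p1)(1-p2), ...
    survival_after = list(accumulate((1.0 - p for p in conditional_pds), mul))
    # Survival to the START of each period: shift by one and fill the
    # first slot with 1.0 (the plain-Python analogue of shift(1).fillna(1)).
    survival_before = [1.0] + survival_after[:-1]
    return [s * p for s, p in zip(survival_before, conditional_pds)]


con_pd = [0.1, 0.2]
marginal = marginal_pds(con_pd)
# The first period keeps its full conditional PD because nothing has
# defaulted yet; later periods are scaled by the surviving share.
```

Without the shift, the first period would be multiplied by its own survival factor, and the marginal PDs would no longer sum to one minus the final survival probability.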
### `AP-REGTECH-015` — Missing EAD component in ECL formula produces incomplete provisions <sub>(high)</sub>
IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning and reporting processes.
## finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest (2)
### `AP-REGTECH-003` — Unvalidated floating-point inputs cause runtime crashes <sub>(high)</sub>
When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN values. Consequence: entire model crashes before simulation or corrupted downstream calculations.
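A bounds check in front of the inverse normal CDF might look like the sketch below. It uses the standard library's `statistics.NormalDist().inv_cdf` as a stand-in for the `norm.ppf` call named above; the function name `safe_inverse_normal` and the `eps` margin are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def safe_inverse_normal(p: float, eps: float = 1e-9) -> float:
    """Inverse standard-normal CDF that rejects values outside (eps, 1 - eps)."""
    if not math.isfinite(p):
        raise ValueError(f"probability must be finite, got {p!r}")
    if not (eps < p < 1.0 - eps):
        raise ValueError(f"probability {p!r} is outside the open interval (0, 1)")
    return NormalDist().inv_cdf(p)


z_median = safe_inverse_normal(0.5)  # the median of the standard normal
```

Rejecting the value up front keeps an infinite or NaN quantile from entering downstream calculations; whether to raise or to clip to the margin is a modelling decision this sketch does not make for you.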
### `AP-REGTECH-004` — Division by zero in financial calculations produces inf/NaN <sub>(high)</sub>
When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators (total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations, corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.
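One way to guard these ratios is a small helper that returns an explicit sentinel instead of dividing. This is a sketch only: `safe_ratio` is a hypothetical name, and the `1e-9` tolerance is borrowed from `CW-REGTECH-005` rather than from the projects' code.

```python
from typing import Optional


def safe_ratio(
    numerator: float,
    denominator: float,
    default: Optional[float] = None,
    eps: float = 1e-9,
) -> Optional[float]:
    """Return numerator / denominator, or `default` when the denominator is ~0."""
    if abs(denominator) < eps:
        return default
    return numerator / denominator


ddp = safe_ratio(3, 12)      # 3 downgrades out of 12 observations
empty = safe_ratio(0, 0)     # no observations: caller must handle None
```

Returning `None` forces the caller to decide what an empty bucket means, which is usually safer than letting `inf` or `NaN` flow into later steps.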
## finance-bp-067--firesale_stresstest (4)
### `AP-REGTECH-006` — Wrong leverage formula in threshold-based decisions <sub>(high)</sub>
Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values. This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue operating with negative equity, or healthy banks unnecessarily deleverage.
### `AP-REGTECH-007` — Confusing deleveraging buffer threshold with insolvency threshold <sub>(high)</sub>
Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence: excessive bank failures amplify systemic contagion.
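The leverage definition and the two thresholds described in `AP-REGTECH-006` and `AP-REGTECH-007` can be combined into one decision function. The sketch below assumes equity is computed as assets minus liabilities; the 3% and 4% values come from the text above, while the names `BankAction`, `leverage`, and `decide_action` are hypothetical.

```python
from enum import Enum

INSOLVENCY_THRESHOLD = 0.03  # below this leverage the bank defaults
BUFFER_THRESHOLD = 0.04      # below this leverage the bank deleverages


class BankAction(Enum):
    DEFAULT = "default"
    DELEVERAGE = "deleverage"
    NONE = "none"


def leverage(assets: float, liabilities: float) -> float:
    """Equity-to-assets ratio (E/A), assuming equity = assets - liabilities."""
    if assets <= 0:
        raise ValueError("assets must be positive to compute leverage")
    return (assets - liabilities) / assets


def decide_action(assets: float, liabilities: float) -> BankAction:
    """Check insolvency first, then the deleveraging buffer."""
    lev = leverage(assets, liabilities)
    if lev < INSOLVENCY_THRESHOLD:
        return BankAction.DEFAULT
    if lev < BUFFER_THRESHOLD:
        return BankAction.DELEVERAGE
    return BankAction.NONE
```

Keeping the two thresholds as separate named constants makes it hard to collapse the buffer zone by accident, and testing insolvency first keeps an insolvent bank out of the deleveraging branch.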
### `AP-REGTECH-013` — Order-dependent execution creates first-mover advantage bias <sub>(medium)</sub>
Without separating step() and act() phases, first-acting banks sell assets before others decide, creating systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable systemic risk estimates that understate contagion for late-acting banks.
### `AP-REGTECH-014` — Immediate asset sales cause double-selling and undefined state <sub>(medium)</sub>
Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect cash transfers in market clearing.
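The two anti-patterns above share one remedy: collect every bank's intention first, then settle all sales together. The sketch below is illustrative only; `SaleBuffer`, `queue_sale`, `clear`, and the linear `impact_per_unit` price impact are assumptions, and how they map onto the model's `step()` and `act()` phases is not specified by the source.

```python
from dataclasses import dataclass, field


@dataclass
class SaleBuffer:
    """Collects sell intentions; the price does not move until clear()."""
    price: float = 1.0
    impact_per_unit: float = 0.001  # hypothetical linear price impact
    pending: list[tuple[str, float]] = field(default_factory=list)

    def queue_sale(self, bank: str, quantity: float) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.pending.append((bank, quantity))

    def clear(self) -> dict[str, float]:
        """Apply the total price impact once, then pay every seller that price."""
        total = sum(quantity for _, quantity in self.pending)
        self.price = max(0.0, self.price * (1.0 - self.impact_per_unit * total))
        proceeds: dict[str, float] = {}
        for bank, quantity in self.pending:
            proceeds[bank] = proceeds.get(bank, 0.0) + quantity * self.price
        self.pending.clear()
        return proceeds


def run_round(intentions: list[tuple[str, float]]) -> dict[str, float]:
    buffer = SaleBuffer()
    for bank, quantity in intentions:  # phase 1: every bank decides
        buffer.queue_sale(bank, quantity)
    return buffer.clear()              # phase 2: all sales settle together


first = run_round([("A", 100.0), ("B", 50.0)])
second = run_round([("B", 50.0), ("A", 100.0)])
```

Because the price impact is computed from the total queued quantity, both orderings pay each bank the same amount, which removes the first-mover advantage.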
## finance-bp-071--opensanctions (3)
### `AP-REGTECH-008` — Cache keys omit request body for state-changing methods <sub>(high)</sub>
Using only the URL as the cache fingerprint for POST/PATCH requests means that requests with different bodies receive the same cached response. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence: missed sanctions matches, or false positives from stale entity data.
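A fingerprint that covers the method and body as well as the URL could be composed as follows. This is a sketch using `hashlib` from the standard library; `request_fingerprint` is a hypothetical helper and not the opensanctions implementation, and authentication headers are left out for brevity.

```python
import hashlib
from typing import Optional


def request_fingerprint(method: str, url: str, body: Optional[bytes] = None) -> str:
    """Hash method, URL, and body into one cache key."""
    digest = hashlib.sha256()
    for part in (method.upper().encode("utf-8"), url.encode("utf-8"), body or b""):
        # Length-prefix each part so that ("ab", "c") and ("a", "bc") differ.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


key_a = request_fingerprint("POST", "https://example.org/search", b'{"q": "alice"}')
key_b = request_fingerprint("POST", "https://example.org/search", b'{"q": "bob"}')
```

The two keys differ even though the URL is identical, so the second request cannot be served the first request's cached response.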
### `AP-REGTECH-009` — ID collision in entity construction creates false sanctions matches <sub>(high)</sub>
When constructing entity IDs from source identifiers, insufficient identifying attributes cause different real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned entity's ID matches an innocent entity, causing false positive compliance alerts.
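One hedged way to reduce collisions is to hash every available discriminating attribute into the ID instead of using the document number alone. The helper below is hypothetical (`make_entity_id` is not claimed to be the project's API) and uses `hashlib` from the standard library.

```python
import hashlib
from typing import Optional


def make_entity_id(dataset: str, *parts: Optional[str]) -> str:
    """Build an ID from a dataset prefix plus all non-empty identifying parts."""
    cleaned = [p.strip() for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one identifying attribute is required")
    # Join with a separator that cannot appear in ordinary identifiers.
    digest = hashlib.sha1("\x1f".join(cleaned).encode("utf-8")).hexdigest()
    return f"{dataset}-{digest}"


passport_id = make_entity_id("xx-sanctions", "passport", "123456")
national_id = make_entity_id("xx-sanctions", "national-id", "123456")
```

Including the identifier type keeps a passport number and a national ID number with the same digits from collapsing into one entity.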
### `AP-REGTECH-012` — Reverse property assignment corrupts entity construction <sub>(medium)</sub>
Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting to add values to stub properties instead of forward properties raises that exception, aborting entity construction. Consequence: entities lost from output, incomplete compliance datasets.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-060--AMLSim
**Scan date**: 2026-04-22
**Stats**: 5 files, 20 classes, 0 functions, 5 stages
## Modules (5)
- [graph_construction](components/graph_construction.md): 5 classes
- [alert_pattern_generation](components/alert_pattern_generation.md): 3 classes
- [log_conversion](components/log_conversion.md): 5 classes
- [alert_validation](components/alert_validation.md): 5 classes
- [data_combination](components/data_combination.md): 2 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
  business_decisions_count: 114
  fatal_constraints_count: 54
  non_fatal_constraints_count: 129
  use_cases_count: 13
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: run `python3 -m pip install zvt`, then run `python3 -m zvt.init_dirs` to initialize the data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: run the recorder first: `python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000` (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: run `python3 -m zvt.init_dirs`
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: check directory permissions with `chmod u+w ~/.zvt`, or set the `ZVT_HOME` environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **13**
## `KUC-101`
**Source**: `scripts/convert_logs.py`
Convert transaction log files into synthetic AML simulation data for testing anti-money laundering detection systems
## `KUC-102`
**Source**: `scripts/split_accounts_bank.py`
Partition account CSV files by bank identifier for bank-specific analysis and processing
## `KUC-103`
**Source**: `scripts/combine_data.py`
Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
## `KUC-104`
**Source**: `scripts/transaction_graph_generator.py`
Generate the base transaction network graph used as input for AML simulation, defining account relationships and transaction patterns
## `KUC-105`
**Source**: `scripts/generate_scalefree.py`
Generate scale-free network graphs using Kronecker graph algorithm for research on network topology and distribution analysis
## `KUC-106`
**Source**: `scripts/visualize/plot_alert_pattern_subgraphs.py`
Visualize alert pattern subgraphs showing which accounts and transactions are involved in each generated alert for debugging and validation
## `KUC-107`
**Source**: `scripts/visualize/plot_distributions.py`
Generate statistical distribution plots (degree, amount, frequency) from transaction graphs for analysis and reporting
## `KUC-108`
**Source**: `scripts/amlsim/random_amount.py`
Generate random transaction amounts within configurable min/max bounds for transaction simulation
## `KUC-109`
**Source**: `scripts/amlsim/nominator.py`
Select appropriate accounts for different transaction types (fan-in, fan-out, single, mutual, periodical) based on network degree thresholds
## `KUC-110`
**Source**: `scripts/amlsim/rounded_amount.py`
Generate rounded transaction amounts (e.g., 100, 500, 1000) to simulate realistic human transaction patterns
## `KUC-111`
**Source**: `scripts/amlsim/normal_model.py`
Define and manage normal (non-suspicious) account behavior models including main accounts and member accounts for transaction simulation
## `KUC-112`
**Source**: `scripts/validation/network_analytics.py`
Load AMLSim outputs and analyze transaction network characteristics including degree distribution, connected components, and graph properties
## `KUC-113`
**Source**: `scripts/validation/validate_alerts.py`
Validate generated alerts against expected alert parameters to ensure AML simulation produces correct alert patterns and amounts
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-REGTECH-001` — Input bounds validation before statistical computation
**From**: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts > 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
## `CW-REGTECH-002` — Graph/topology invariant verification before construction
**From**: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees) = sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before expensive graph construction operations. Apply to any bipartite or directed graph generation.
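Two of the invariants named above can be checked with a few lines of standard-library Python. The function names and the tolerance are assumptions for illustration; the degree-sequence divisibility check is omitted because the source does not state the divisor.

```python
import math
from typing import Sequence


def check_degree_sequence(in_degrees: Sequence[int], out_degrees: Sequence[int]) -> None:
    """Every directed edge adds one in-degree and one out-degree, so sums must match."""
    if len(in_degrees) != len(out_degrees):
        raise ValueError("in-degree and out-degree sequences differ in length")
    if sum(in_degrees) != sum(out_degrees):
        raise ValueError(
            f"sum(in-degrees)={sum(in_degrees)} != sum(out-degrees)={sum(out_degrees)}"
        )


def check_row_stochastic(matrix: Sequence[Sequence[float]], tol: float = 1e-9) -> None:
    """Each row of a transition matrix must be non-negative and sum to 1.0."""
    for i, row in enumerate(matrix):
        if any(p < 0 for p in row):
            raise ValueError(f"row {i} contains a negative probability")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {i} sums to {sum(row)!r}, expected 1.0")


check_degree_sequence([1, 2, 0], [2, 0, 1])
check_row_stochastic([[0.9, 0.1], [0.25, 0.75]])
```

Both checks run in linear time over their input, so they cost little compared with building the graph or matrix they protect.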
## `CW-REGTECH-003` — Regulatory amortization window discipline
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance
IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations), full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window logic explicitly rather than reusing a single loop variable across stages.
## `CW-REGTECH-004` — Fingerprint composition must include all request dimensions
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance
Cache keys must include every request parameter that affects response content: URL, HTTP method, authentication headers, and, for state-changing methods, the request body. Serving POST requests with different bodies from the same cache entry is a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.
## `CW-REGTECH-005` — Floating-point zero-equivalence with explicit epsilon tolerance
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Rounding error in IEEE 754 floating-point arithmetic means that values which should be zero are often only approximately zero, so exact zero comparisons fail in financial calculations. Always use eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This prevents division-by-zero crashes and incorrect cash transfers.
## `CW-REGTECH-006` — Stage classification threshold ordering enforcement
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance
IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering and document bucket-to-stage mapping explicitly.
## `CW-REGTECH-007` — Initialization-before-use dependency ordering
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration, CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.
## `CW-REGTECH-008` — Sufficient entity ID collision prevention
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance
Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number) to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive sanctions matches. Include the maximum available discriminating attributes in ID construction.
## `CW-REGTECH-009` — Hub selection with candidate removal before addition
**From**: finance-bp-060--AMLSim · **Applicable to**: regtech-compliance
When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment across overlapping patterns. Apply to any allocation algorithm with candidate pooling.
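The remove-before-add ordering can be sketched generically with a candidate pool held in a set. The names below (`assign_pattern`, `candidates`) are hypothetical stand-ins and do not reproduce AMLSim's `remove_typology_candidate` or `add_node` signatures.

```python
import random


def assign_pattern(
    candidates: set[str], n_members: int, rng: random.Random
) -> tuple[str, list[str]]:
    """Pick a hub and its members so that no account is assigned twice."""
    hub = rng.choice(sorted(candidates))   # sorted for reproducible draws
    candidates.remove(hub)                 # remove the hub from the pool first...
    members = rng.sample(sorted(candidates), n_members)  # ...then draw members
    candidates.difference_update(members)  # members leave the pool as well
    return hub, members


pool = {"acct-1", "acct-2", "acct-3", "acct-4"}
hub, members = assign_pattern(pool, n_members=2, rng=random.Random(0))
```

Because the hub leaves the pool before members are drawn, it cannot select itself, and removing the members afterwards keeps a later pattern from reusing them.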
## `CW-REGTECH-010` — Insolvency detection before operational decisions
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate operational decisions on prior insolvency state.
FILE:references/components/alert_pattern_generation.md
# alert_pattern_generation (3 classes)
## `TransactionGenerator.add_aml_typology`
`alert_pattern_generation/transactiongenerator-add-aml-typology.py:0`
## `AMLTypology.add_transaction`
`alert_pattern_generation/amltypology-add-transaction.py:0`
## `AlertPattern`
`alert_pattern_generation/alertpattern.py:0`
FILE:references/components/alert_validation.md
# alert_validation (5 classes)
## `AlertValidator.validate_all`
`alert_validation/alertvalidator-validate-all.py:0`
## `satisfies_params`
`alert_validation/satisfies-params.py:0`
## `is_cycle`
`alert_validation/is-cycle.py:0`
## `is_scatter_gather`
`alert_validation/is-scatter-gather.py:0`
## `PatternValidator`
`alert_validation/patternvalidator.py:0`
FILE:references/components/data_combination.md
# data_combination (2 classes)
## `Combiner.combine`
`data_combination/combiner-combine.py:0`
## `Combiner.merge_schemas`
`data_combination/combiner-merge-schemas.py:0`
FILE:references/components/graph_construction.md
# graph_construction (5 classes)
## `TransactionGenerator.generate_normal_transactions`
`graph_construction/transactiongenerator-generate-normal-tra.py:0`
## `TransactionGenerator.build_normal_models`
`graph_construction/transactiongenerator-build-normal-models.py:0`
## `Nominator.place_normal_models`
`graph_construction/nominator-place-normal-models.py:0`
## `AmountGenerator`
`graph_construction/amountgenerator.py:0`
## `NormalModelType`
`graph_construction/normalmodeltype.py:0`
FILE:references/components/log_conversion.md
# log_conversion (5 classes)
## `LogConverter.convert`
`log_conversion/logconverter-convert.py:0`
## `Schema.get_tx_row`
`log_conversion/schema-get-tx-row.py:0`
## `Schema.get_account_row`
`log_conversion/schema-get-account-row.py:0`
## `FakerLocale`
`log_conversion/fakerlocale.py:0`
## `OutputFormat`
`log_conversion/outputformat.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-060-v5.3
version: v6.1
blueprint_id: finance-bp-060
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:18.242568+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- regtech-compliance
upgraded_from: finance-bp-060-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:11.565905+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-060--AMLSim/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-060--AMLSim/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-REGTECH-001
title: Missing attribute initialization on data structures
description: 'When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes
(e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures
assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.'
project_source: finance-bp-060--AMLSim, finance-bp-071--opensanctions
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-002
title: Self-loops in transaction graphs violate domain rules
description: 'When generating directed transaction graphs or AML typologies, allowing source == destination edges creates
self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering
pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology
validation.'
project_source: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-003
title: Unvalidated floating-point inputs cause runtime crashes
description: 'When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against
acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN
values. Consequence: entire model crashes before simulation or corrupted downstream calculations.'
project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-004
title: Division by zero in financial calculations produces inf/NaN
description: 'When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators
(total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations,
corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.'
project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-005
title: Incorrect amortization windows violate IFRS 9 compliance
description: 'Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full
remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence:
regulatory non-compliance and materially incorrect loan loss provisions.'
project_source: finance-bp-062--ifrs9
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-006
title: Wrong leverage formula in threshold-based decisions
description: 'Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values.
This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue
operating with negative equity, or healthy banks unnecessarily deleverage.'
project_source: finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-007
title: Confusing deleveraging buffer threshold with insolvency threshold
description: 'Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using
the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence:
excessive bank failures amplify systemic contagion.'
project_source: finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-008
title: Cache keys omit request body for state-changing methods
description: 'Using only the URL for cache fingerprints on POST/PATCH requests means different request bodies return identical
cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence:
sanctions matches missed or false positives from stale entity data.'
project_source: finance-bp-071--opensanctions
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-009
title: ID collision in entity construction creates false sanctions matches
description: 'When constructing entity IDs from source identifiers, insufficient identifying attributes cause different
real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned
entity''s ID matches an innocent entity, causing false positive compliance alerts.'
project_source: finance-bp-071--opensanctions
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-010
title: Incorrect cumulative PD ordering corrupts lifetime ECL term structure
description: 'Using cumprod(1-conPD) without shift(1) and fillna(1) produces corrupted first-period survival probability.
This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements.
Consequence: systematically incorrect provisions across all remaining tenor periods.'
project_source: finance-bp-062--ifrs9
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-011
title: Mismatched configuration parameters across coupled components
description: 'When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts
using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence:
AML typology patterns placed on wrong accounts, invalidating simulation results.'
project_source: finance-bp-060--AMLSim
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-012
title: Reverse property assignment corrupts entity construction
description: 'Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting
to add values to stub properties instead of forward properties raises that exception, aborting entity construction. Consequence:
entities lost from output, incomplete compliance datasets.'
project_source: finance-bp-071--opensanctions
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-013
title: Order-dependent execution creates first-mover advantage bias
description: 'Without separating step() and act() phases, first-acting banks sell assets before others decide, creating
systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable
systemic risk estimates that understate contagion for late-acting banks.'
project_source: finance-bp-067--firesale_stresstest
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-014
title: Immediate asset sales cause double-selling and undefined state
description: 'Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same
asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect
cash transfers in market clearing.'
project_source: finance-bp-067--firesale_stresstest
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-015
title: Missing EAD component in ECL formula produces incomplete provisions
description: 'IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation
is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning
and reporting processes.'
project_source: finance-bp-062--ifrs9
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
cross_project_wisdom:
- wisdom_id: CW-REGTECH-001
source_project: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
pattern_name: Input bounds validation before statistical computation
description: Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce
infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts
> 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-002
source_project: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
pattern_name: Graph/topology invariant verification before construction
description: 'Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees)
= sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before
expensive graph construction operations. Apply to any bipartite or directed graph generation.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-003
source_project: finance-bp-062--ifrs9
pattern_name: Regulatory amortization window discipline
description: 'IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations),
full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window
logic explicitly rather than reusing a single loop variable across stages.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-004
source_project: finance-bp-071--opensanctions
pattern_name: Fingerprint composition must include all request dimensions
description: 'Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication
headers, and request body for state-changing methods. Serving POST requests with different bodies from the same cache entry is
a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-005
source_project: finance-bp-067--firesale_stresstest
pattern_name: Floating-point zero-equivalence with explicit epsilon tolerance
description: IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use
eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This
prevents division-by-zero crashes and incorrect cash transfers.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-006
source_project: finance-bp-062--ifrs9
pattern_name: Stage classification threshold ordering enforcement
description: 'IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying
thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering
and document bucket-to-stage mapping explicitly.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-007
source_project: finance-bp-067--firesale_stresstest
pattern_name: Initialization-before-use dependency ordering
description: 'Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration,
CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError
that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-008
source_project: finance-bp-071--opensanctions
pattern_name: Sufficient entity ID collision prevention
description: Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number)
to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive
sanctions matches. Include the maximum available discriminating attributes in ID construction.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-009
source_project: finance-bp-060--AMLSim
pattern_name: Hub selection with candidate removal before addition
description: When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for
each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment
across overlapping patterns. Apply to any allocation algorithm with candidate pooling.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-010
source_project: finance-bp-067--firesale_stresstest
pattern_name: Insolvency detection before operational decisions
description: Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging
decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate
operational decisions on prior insolvency state.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: scripts/convert_logs.py
business_problem: Convert transaction log files into synthetic AML simulation data for testing anti-money laundering detection
systems
intent_keywords:
- convert logs
- synthetic data
- AML simulation
- generate transaction logs
- test data generation
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-102
source_file: scripts/split_accounts_bank.py
business_problem: Partition account CSV files by bank identifier for bank-specific analysis and processing
intent_keywords:
- split accounts
- bank ID
- partition data
- bank filtering
- account grouping
stage: data_collection
data_domain: holding_data
type: data_pipeline
- kuc_id: KUC-103
source_file: scripts/combine_data.py
business_problem: Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
intent_keywords:
- combine outputs
- merge data
- AMLSim aggregation
- consolidate simulation results
- dataset assembly
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-104
source_file: scripts/transaction_graph_generator.py
business_problem: Generate the base transaction network graph used as input for AML simulation, defining account relationships
and transaction patterns
intent_keywords:
- transaction graph
- network generation
- graph topology
- AMLSim input
- account relationships
stage: data_collection
data_domain: trading_data
type: data_pipeline
- kuc_id: KUC-105
source_file: scripts/generate_scalefree.py
business_problem: Generate scale-free network graphs using Kronecker graph algorithm for research on network topology and
distribution analysis
intent_keywords:
- scale-free
- Kronecker graph
- network topology
- degree distribution
- graph generation research
stage: network_generation
data_domain: market_data
type: research_analysis
- kuc_id: KUC-106
source_file: scripts/visualize/plot_alert_pattern_subgraphs.py
business_problem: Visualize alert pattern subgraphs showing which accounts and transactions are involved in each generated
alert for debugging and validation
intent_keywords:
- alert visualization
- subgraph plot
- alert debugging
- pattern inspection
- AMLSim validation
stage: validation
data_domain: trading_data
type: monitoring
- kuc_id: KUC-107
source_file: scripts/visualize/plot_distributions.py
business_problem: Generate statistical distribution plots (degree, amount, frequency) from transaction graphs for analysis
and reporting
intent_keywords:
- distribution plot
- statistics
- degree distribution
- amount analysis
- transaction visualization
stage: validation
data_domain: trading_data
type: reporting
- kuc_id: KUC-108
source_file: scripts/amlsim/random_amount.py
business_problem: Generate random transaction amounts within configurable min/max bounds for transaction simulation
intent_keywords:
- random amount
- transaction generator
- random number
- amount range
- simulation utility
stage: factor_computation
data_domain: trading_data
type: builtin_factor
- kuc_id: KUC-109
source_file: scripts/amlsim/nominator.py
business_problem: Select appropriate accounts for different transaction types (fan-in, fan-out, single, mutual, periodical)
based on network degree thresholds
intent_keywords:
- account selection
- nominator
- transaction routing
- fan-in fan-out
- network degree
stage: factor_computation
data_domain: holding_data
type: builtin_factor
- kuc_id: KUC-110
source_file: scripts/amlsim/rounded_amount.py
business_problem: Generate rounded transaction amounts (e.g., 100, 500, 1000) to simulate realistic human transaction patterns
intent_keywords:
- rounded amount
- realistic transaction
- human pattern
- currency rounding
- simulation utility
stage: factor_computation
data_domain: trading_data
type: builtin_factor
- kuc_id: KUC-111
source_file: scripts/amlsim/normal_model.py
business_problem: Define and manage normal (non-suspicious) account behavior models including main accounts and member accounts
for transaction simulation
intent_keywords:
- normal model
- behavior model
- account group
- main account
- member account
stage: factor_computation
data_domain: holding_data
type: builtin_factor
- kuc_id: KUC-112
source_file: scripts/validation/network_analytics.py
business_problem: Load AMLSim outputs and analyze transaction network characteristics including degree distribution, connected
components, and graph properties
intent_keywords:
- network analysis
- graph analytics
- validation
- topology analysis
- degree analysis
stage: validation
data_domain: trading_data
type: monitoring
- kuc_id: KUC-113
source_file: scripts/validation/validate_alerts.py
business_problem: Validate generated alerts against expected alert parameters to ensure AML simulation produces correct
alert patterns and amounts
intent_keywords:
- validate alerts
- alert verification
- simulation accuracy
- alert parameters
- SAR validation
stage: validation
data_domain: trading_data
type: monitoring
component_capability_map:
project: finance-bp-060--AMLSim
scan_date: '2026-04-22'
stats:
total_files: 5
total_classes: 20
total_functions: 0
total_stages: 5
modules:
graph_construction:
class_count: 5
stage_id: graph_construction
stage_order: 1
responsibility: Builds a directed transaction graph from account lists and degree sequences using configuration-model
random graphs. This is the foundation layer that creates the network topology for all downstream processing.
classes:
- name: TransactionGenerator.generate_normal_transactions
file: graph_construction/transactiongenerator-generate-normal-tra.py
line: 0
kind: required_method
signature: ''
- name: TransactionGenerator.build_normal_models
file: graph_construction/transactiongenerator-build-normal-models.py
line: 0
kind: required_method
signature: ''
- name: Nominator.place_normal_models
file: graph_construction/nominator-place-normal-models.py
line: 0
kind: required_method
signature: ''
- name: AmountGenerator
file: graph_construction/amountgenerator.py
line: 0
kind: replaceable_point
- name: NormalModelType
file: graph_construction/normalmodeltype.py
line: 0
kind: replaceable_point
design_decision_count: 5
alert_pattern_generation:
class_count: 3
stage_id: alert_pattern_generation
stage_order: 2
responsibility: Injects suspicious AML typology patterns (fan-in, fan-out, cycle, scatter-gather) into the base transaction
graph. These represent the ground-truth alerts that validation will later detect.
classes:
- name: TransactionGenerator.add_aml_typology
file: alert_pattern_generation/transactiongenerator-add-aml-typology.py
line: 0
kind: required_method
signature: ''
- name: AMLTypology.add_transaction
file: alert_pattern_generation/amltypology-add-transaction.py
line: 0
kind: required_method
signature: ''
- name: AlertPattern
file: alert_pattern_generation/alertpattern.py
line: 0
kind: replaceable_point
design_decision_count: 4
log_conversion:
class_count: 5
stage_id: log_conversion
stage_order: 3
responsibility: Transforms simulator output into standardized database schema format (Neo4j, JanusGraph). Applies Faker-generated
names, computes party relationships, and formats timestamps.
classes:
- name: LogConverter.convert
file: log_conversion/logconverter-convert.py
line: 0
kind: required_method
signature: ''
- name: Schema.get_tx_row
file: log_conversion/schema-get-tx-row.py
line: 0
kind: required_method
signature: ''
- name: Schema.get_account_row
file: log_conversion/schema-get-account-row.py
line: 0
kind: required_method
signature: ''
- name: FakerLocale
file: log_conversion/fakerlocale.py
line: 0
kind: replaceable_point
- name: OutputFormat
file: log_conversion/outputformat.py
line: 0
kind: replaceable_point
design_decision_count: 3
alert_validation:
class_count: 5
stage_id: alert_validation
stage_order: 4
responsibility: Validates that generated alert patterns match their expected typology parameters. Checks account counts,
amounts, periods, and structural properties like cycle ordering and scatter-gather chronology.
classes:
- name: AlertValidator.validate_all
file: alert_validation/alertvalidator-validate-all.py
line: 0
kind: required_method
signature: ''
- name: satisfies_params
file: alert_validation/satisfies-params.py
line: 0
kind: required_method
signature: ''
- name: is_cycle
file: alert_validation/is-cycle.py
line: 0
kind: required_method
signature: ''
- name: is_scatter_gather
file: alert_validation/is-scatter-gather.py
line: 0
kind: required_method
signature: ''
- name: PatternValidator
file: alert_validation/patternvalidator.py
line: 0
kind: replaceable_point
design_decision_count: 4
data_combination:
class_count: 2
stage_id: data_combination
stage_order: 5
responsibility: Merges multiple simulation outputs into a single dataset. Aggregates degrees and appends output CSVs
for multi-simulation batch runs, enabling large-scale dataset creation.
classes:
- name: Combiner.combine
file: data_combination/combiner-combine.py
line: 0
kind: required_method
signature: ''
- name: Combiner.merge_schemas
file: data_combination/combiner-merge-schemas.py
line: 0
kind: required_method
signature: ''
design_decision_count: 1
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.1590909090909091
evidence_invalid: 74
evidence_verified: 14
evidence_auto_fixed: 0
audit_coverage: 38/38 (100%)
audit_pass_rate: 1/38 (2%)
audit_fail_total: 22
audit_finance_universal:
pass: 1
warn: 9
fail: 10
audit_subdomain_totals:
pass: 0
warn: 6
fail: 12
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-060. Evidence verify ratio
= 15.9% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-060-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc:
- UC-108
- UC-109
- UC-110
- UC-111
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Convert Logs to AML Simulation Data
positive_terms:
- convert logs
- synthetic data
- AML simulation
- generate transaction logs
- test data generation
data_domain: mixed
negative_terms:
- live trading
- real-time data
- production alerts
- screening
ambiguity_question: Are you generating synthetic test data for simulation, or processing real transaction logs for analysis?
- uc_id: UC-102
name: Split Accounts by Bank ID
positive_terms:
- split accounts
- bank ID
- partition data
- bank filtering
- account grouping
data_domain: holding_data
negative_terms:
- alert generation
- transaction simulation
- network graph
ambiguity_question: Do you need to split existing account data by bank, or are you looking for transaction graph generation?
- uc_id: UC-103
name: Combine AML Simulation Outputs
positive_terms:
- combine outputs
- merge data
- AMLSim aggregation
- consolidate simulation results
- dataset assembly
data_domain: mixed
negative_terms:
- live trading
- real-time processing
- screening alerts
ambiguity_question: Are you combining simulation outputs into one dataset, or running the simulation itself?
- uc_id: UC-104
name: Generate Transaction Graph
positive_terms:
- transaction graph
- network generation
- graph topology
- AMLSim input
- account relationships
data_domain: trading_data
negative_terms:
- visualize graph
- plot distributions
- alert analysis
ambiguity_question: Do you need to create/generate a new transaction network, or analyze/visualize an existing one?
- uc_id: UC-105
name: Generate Scale-Free Network Graph
positive_terms:
- scale-free
- Kronecker graph
- network topology
- degree distribution
- graph generation research
data_domain: market_data
negative_terms:
- AML simulation
- alert generation
- transaction data
ambiguity_question: Are you generating mathematical network graphs for research, or creating transaction networks for
AML simulation?
- uc_id: UC-106
name: Plot Alert Pattern Subgraphs
positive_terms:
- alert visualization
- subgraph plot
- alert debugging
- pattern inspection
- AMLSim validation
data_domain: trading_data
negative_terms:
- generate alerts
- create transactions
- distributions
ambiguity_question: Are you visualizing existing alerts, or generating new transaction patterns and alerts?
- uc_id: UC-107
name: Plot Transaction Distributions
positive_terms:
- distribution plot
- statistics
- degree distribution
- amount analysis
- transaction visualization
data_domain: trading_data
negative_terms:
- alert generation
- transaction simulation
- network construction
ambiguity_question: Are you plotting statistics from existing transaction data, or generating new transactions for simulation?
- uc_id: UC-108
name: Random Amount Generator
positive_terms:
- random amount
- transaction generator
- random number
- amount range
- simulation utility
data_domain: trading_data
negative_terms:
- fixed amount
- rounded amount
- real data
ambiguity_question: Do you need random amounts with uniform distribution, or rounded/specific amounts for transactions?
- uc_id: UC-109
name: Account Nominator for Transaction Selection
positive_terms:
- account selection
- nominator
- transaction routing
- fan-in fan-out
- network degree
data_domain: holding_data
negative_terms:
- alert generation
- visualization
- data loading
ambiguity_question: Are you selecting accounts for transaction routing, or generating/analyzing alerts?
- uc_id: UC-110
name: Rounded Amount Generator
positive_terms:
- rounded amount
- realistic transaction
- human pattern
- currency rounding
- simulation utility
data_domain: trading_data
negative_terms:
- random precise
- exact amount
- real data
ambiguity_question: Do you need realistic rounded amounts, or precise random amounts for transactions?
- uc_id: UC-111
name: Normal Account Behavior Model
positive_terms:
- normal model
- behavior model
- account group
- main account
- member account
data_domain: holding_data
negative_terms:
- SAR
- suspicious activity
- alert
ambiguity_question: Are you defining normal transaction behavior patterns, or working with suspicious activity (SAR) alerts?
- uc_id: UC-112
name: Analyze Transaction Networks
positive_terms:
- network analysis
- graph analytics
- validation
- topology analysis
- degree analysis
data_domain: trading_data
negative_terms:
- generate network
- create transactions
- simulation
ambiguity_question: Are you analyzing existing network properties, or generating new transaction networks?
- uc_id: UC-113
name: Validate AML Simulation Alerts
positive_terms:
- validate alerts
- alert verification
- simulation accuracy
- alert parameters
- SAR validation
data_domain: trading_data
negative_terms:
- generate alerts
- create transactions
- visualization
ambiguity_question: Are you validating that alerts match expected parameters, or generating new alerts?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to
the user. The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 114
fatal_constraints_count: 54
non_fatal_constraints_count: 129
use_cases_count: 13
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (fatal violation finance-C-001).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels silently produce empty target
lists that look like an absence of signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions:
- id: BD-062
type: B/DK
summary: Graphviz layout for alert subgraph visualization
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 33 source groups: account_attribute(1),
account_classification(1), account_config(1), account_initialization(1), alert_pattern_generation(17), alert_validation(10),
and 27 more.'
key_decisions: 113 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-035
type: B/BA
summary: Gender assigned with 50/50 probability (Male/Female)
- id: BD-036
type: B/BA
summary: Account type assigned 50/50 (individual vs organization)
- id: BD-043
type: B/BA
summary: 'Initial balance range: min=50000, max=100000'
- id: BD-028
type: B/BA
summary: Account balance generated with uniform distribution between min_balance and max_balance
- id: BD-006
type: B
summary: AML typologies use hub accounts as main nodes
- id: BD-007
type: B
summary: Accounts removed from hub pool after being selected
- id: BD-008
type: BA/M
summary: Alert types encoded as integer model IDs
- id: BD-024
type: B/BA
summary: Transaction amounts rounded to psychologically appealing values (multiples of 10, 100, 1000)
- id: BD-025
type: B/BA
summary: 'Step size selection: find power of ten giving 7-30 slots in range'
- id: BD-046
type: B/BA
summary: 'Fan-in pattern: multiple originators send to single main account'
- id: BD-047
type: B
summary: 'Fan-out pattern: single main account sends to multiple beneficiaries'
- id: BD-048
type: B
summary: 'Bipartite pattern: split accounts evenly between originators and beneficiaries'
- id: BD-049
type: B
summary: 'Stack pattern: divide accounts into thirds for originator/intermediate/beneficiary'
- id: BD-050
type: B/BA
summary: 'Cycle pattern: transactions form ring using modulo arithmetic, margin decrements amount'
- id: BD-051
type: B/BA
summary: 'Scatter-gather: split at midpoint date, scatter (orig->mid) then gather (mid->bene)'
- id: BD-052
type: B/BA
summary: 'Gather-scatter: collect from origins to mid at midpoint, then distribute to beneficiaries'
- id: BD-060
type: B/RC
summary: Random amount generation using uniform distribution
- id: BD-069
type: DK/B
summary: Nominator uses circular iterator pattern with manual index wrapping - next_node_id() resets index to 0 on IndexError
- id: BD-074
type: M/DK
summary: Schema classes use factory pattern via get_*_row() methods - row builders take **attrs for extensible columns
- id: BD-077
type: DK
summary: Nominator state machine uses increment_type_index() round-robin across types - assumes balanced workload but
allows type starvation
- id: BD-082
type: BA/DK
summary: RoundedAmount implements adaptive step size algorithm (7-30 slots per range) - non-uniform distribution favoring
round numbers
- id: BD-012
type: B
summary: Validation uses graph-theoretic properties rather than regex/text matching
- id: BD-013
type: BA
summary: Ordered patterns check chronological sequencing of transactions
- id: BD-014
type: B
summary: Scatter-gather requires intermediate amounts to decrease
- id: BD-018
type: B
summary: In-degree and out-degree sequences must have equal sums
- id: BD-019
type: B/BA
summary: Total accounts must be multiple of degree sequence length
- id: BD-030
type: B/DK
summary: SAR flag marks accounts involved in suspicious activity reports
- id: BD-053
type: B/BA
summary: 'Alert validation checks: number of accounts, amount range, period range'
- id: BD-054
type: B
summary: 'Cycle pattern validation: single cycle, chronological ordering, unique amounts'
- id: BD-055
type: B
summary: 'Scatter-gather validation: intermediate degree=1, amounts decrease, chronological order'
- id: BD-064
type: B
summary: Alert is_sar checked with > 0 comparison (sar_id > 0)
- id: BD-037
type: B/BA
summary: Powerlaw distribution fitting for degree distribution visualization
- id: BD-GAP-001
type: T
summary: Transaction generator uses INI configuration files to define test scenarios, enabling non-technical users to
create fraud test data without modifying code
- id: BD-031
type: B
summary: External (inter-bank) transactions allowed when multiple banks exist and bank_id is empty
- id: BD-GAP-002
type: B/BA
summary: Suspicious account classification uses boolean flags (country_risk, business_risk) rather than continuous risk
scores, forcing discrete categorization
- id: BD-GAP-003
type: B/BA
summary: AML rule engine combines multiple indicators (amount, frequency, country, business) into single rule definitions,
treating them as conjunction requirements
- id: BD-GAP-005
type: BA
summary: Fraud patterns are explicitly typed (fan_in, fan_out, dense, mixed, stack) rather than emerging from configuration,
encoding domain expertise about common laundering techniques
- id: BD-044
type: B/BA
summary: Cash-in normal interval=100, fraud interval=50; cash-out reversed
- id: BD-045
type: B/BA
summary: Cash-in normal amount=50-100, fraud=500-1000; cash-out reversed
- id: BD-017
type: B
summary: Environment variable RANDOM_SEED overrides config file random seed
- id: BD-056
type: B/BA
summary: Degree threshold of 4 for hub account selection
- id: BD-015
type: BA
summary: Schema loaded from first input and reused for all
- id: BD-033
type: B
summary: Transaction deduplication using (orig_id, dest_id, type, amount, date) tuple
- id: BD-034
type: B/DK
summary: Faker library (en_US locale) generates account names and addresses
- id: BD-063
type: B/DK
summary: Address retry loop ensures valid US address format
- id: BD-GAP-004
type: B
summary: Transaction network generation models hub accounts as high-degree vertices with preferential attachment, reflecting
real-world concentration of transaction volume
- id: BD-067
type: BA
summary: DEFAULT_MARGIN_RATIO=0.1 encodes business assumption that intermediaries retain 10% of funds in cycle/scatter-gather
patterns
- id: BD-073
type: DK
summary: 'base_date inconsistency: conf.json and convert_logs.py use ''2017-01-01'' but network_analytics.py uses ''1970-01-01'''
- id: BD-078
type: BA/M
summary: schedule_id defaults to 1 for normal models (hardcoded) vs AML typologies using schedule parameter from CSV
- id: BD-083
type: DK
summary: 'degree_threshold test/production mismatch: conf.json uses threshold=10 but test fixtures use threshold=3'
- id: BD-058
type: B/DK
summary: Active edge marking for normal model subgraph edges
- id: BD-084
type: B/BA
summary: 'INTERACTION: BD-066 × BD-072 → Initialization sequence violations cause Nominator AttributeError cascades'
- id: BD-085
type: BA
summary: 'INTERACTION: BD-073 × BD-038 → Inconsistent base dates (2017-01-01 vs 1970-01-01) corrupt temporal calculations
across pipeline boundaries'
- id: BD-086
type: B/BA
summary: 'INTERACTION: BD-083 × BD-003 → Test/production threshold mismatch causes false confidence in hub detection
validation'
- id: BD-087
type: B/BA
summary: 'INTERACTION: BD-006 × BD-007 → Hub main node selection conflicts with account pool depletion under high alert
volumes'
- id: BD-088
type: BA
summary: 'INTERACTION: BD-012 × BD-079 → Graph-theoretic validation amplifies maintenance burden and detection divergence
risk'
- id: BD-089
type: BA
summary: 'INTERACTION: BD-021 × BD-050 × BD-051 → Margin ratio creates detectable signature across cycle and scatter-gather
patterns'
- id: BD-090
type: B
summary: 'INTERACTION: BD-080 × BD-018 → Graph construction constraints formalize flow conservation requirements'
- id: BD-091
type: BA
summary: 'INTERACTION: BD-009 × BD-015 → Schema-driven mapping enables multi-format support but assumes consistency
across combined data'
- id: BD-092
type: B/BA
summary: 'RISK CASCADE: BD-066 → BD-071 → BD-027 → BD-003 → BD-006 → BD-046/BD-047 → BD-005/BD-059 → Alert pipeline
failure'
- id: BD-093
type: BA/M
summary: 'RISK CASCADE: BD-073 → BD-010 → BD-029 → BD-053 → BD-013 → Incorrect temporal validation'
- id: BD-094
type: B/BA
summary: 'CONTRADICTION: BD-015 assumes schema consistency while BD-009 enables schema evolution - these create conflicting
requirements'
- id: BD-095
type: BA/M
summary: 'CONTRADICTION: BD-078 hardcodes schedule_id=1 for normal models while AML typologies use dynamic CSV scheduling'
- id: BD-001
type: B
summary: Directed configuration model avoids self-loops by swapping IDs
- id: BD-002
type: B
summary: Degree sequences are repeated to fill total account count
- id: BD-003
type: B/BA
summary: Hub nodes defined by degree_threshold crossing either in OR out degree
- id: BD-004
type: BA
summary: Nominator uses degree-based candidate sorting
- id: BD-005
type: BA
summary: Fan breakdown algorithm can steal nodes from existing clumps
- id: BD-016
type: B
summary: Use directed configuration model to generate transaction graphs from degree sequences
- id: BD-039
type: B
summary: Weakly connected components analyzed for network structure
- id: BD-040
type: B/BA
summary: Clustering coefficient computed at intervals (default 30 steps) for performance
- id: BD-GAP-006
type: DK
summary: 'Missing: Timezone explicit annotation + UTC normalization'
- id: BD-GAP-007
type: M
summary: 'Missing: Convergence criteria explicit declaration'
- id: BD-GAP-008
type: DK
summary: 'Missing: Point-in-Time data availability'
- id: BD-GAP-009
type: DK
summary: 'Missing: Stale data detection and expiry policy'
- id: BD-GAP-010
type: B
summary: 'Missing: Train/test time split integrity'
- id: BD-GAP-011
type: DK
summary: 'Missing: Model and data version snapshot binding'
- id: BD-GAP-012
type: RC
summary: 'Missing: Settlement and delivery time convention'
- id: BD-GAP-013
type: B
summary: 'Missing: Fuzzy matching algorithm and thresholds (Jaro-Winkler/Levenshtein)'
- id: BD-GAP-014
type: RC
summary: 'Missing: False-positive rate monitoring and model governance'
- id: BD-GAP-015
type: B
summary: 'Missing: Implement immutable audit logging with cryptographic hash chains and append-only storage'
- id: BD-GAP-016
type: RC
summary: 'Missing: Add Decimal type for all currency amounts (balance, transaction amounts) instead of float/double'
- id: BD-GAP-017
type: B
summary: 'Missing: Implement jurisdiction-specific CTR/SAR threshold configuration with audit trail'
- id: BD-GAP-018
type: DK
summary: 'Missing: Add run_id/experiment_id for reproducible simulation snapshots'
- id: BD-GAP-019
type: M
summary: 'Missing: Convergence criteria explicit declaration'
- id: BD-020
type: B/BA
summary: Hub accounts selected as accounts with degree >= degree_threshold
- id: BD-070
type: B/BA
summary: ResultGraphLoader overrides count_hub_accounts() but calls super() then extends - inheritance creates dual
counting behavior
- id: BD-068
type: T
summary: degree_threshold MUST be consistent between TransactionGenerator and Nominator - both receive identical value
at construction
- id: BD-071
type: RC
summary: Each account node MUST have 'normal_models' list attribute initialized at add_account() time for Nominator
graph lookups
- id: BD-076
type: DK/B
summary: fan_in/fan_out candidates are mutually exclusive after first assignment - node removed from opposite list on
first use
- id: BD-080
type: T
summary: 'Directed graph degree sequences MUST satisfy: sum(in_deg) == sum(out_deg) and num_accounts % len(sequence)
== 0'
- id: BD-009
type: BA
summary: Schema drives all column mappings via dataType annotations
- id: BD-010
type: B/DK
summary: Days converted to UTC ISO 8601 via base_date offset
- id: BD-011
type: BA
summary: SAR accounts extracted via org_type lookup
- id: BD-038
type: B/BA
summary: Base date (2017-01-01) plus days offset for transaction timestamps
- id: BD-027
type: B/BA
summary: Nominator uses degree_threshold to determine fan_in/fan_out candidates
- id: BD-057
type: B
summary: Nominator tracks remaining/used counts per type for model assignment
- id: BD-059
type: B/BA
summary: 'Fan breakdown candidates: subtract existing fan nodes, fill if below threshold'
- id: BD-026
type: B
summary: 'Normal model types: single, fan_in, fan_out, forward, mutual, periodical'
- id: BD-065
type: B/BA
summary: Normal model type count initialized from normalModels.csv
- id: BD-061
type: B/BA
summary: Normal model schedule_id defaults to 2
- id: BD-066
type: B/BA
summary: 'TransactionGenerator init sequence MUST be: set_num_accounts -> generate_normal_transactions -> load_account_list
-> load_normal_models -> build_normal_models -> set_main_acct_candidates -> load_alert'
- id: BD-072
type: B
summary: remove_typology_candidate MUST be called BEFORE add_node in all AML typology generators - order matters for
hub accounting
- id: BD-075
type: BA
summary: scatter_gather pattern requires scatter_date < gather_date AND scatter_amount > gather_amount - two independent
ordering constraints
- id: BD-081
type: B
summary: normal_models list must be written AFTER mark_active_edges sets edge attributes - active flag drives CSV export
filter
- id: BD-041
type: B/BA
summary: Simulation total_steps=150, base_date=2017-01-01, random_seed=0
- id: BD-079
type: M
summary: validation/ module implements independent alert pattern detection (is_cycle, is_scatter_gather, is_gather_scatter)
mirroring graph generator
- id: BD-029
type: B/BA
summary: Transaction dates distributed uniformly within [start_date, end_date] inclusive
- id: BD-021
type: B/BA
summary: Default margin ratio of 0.1 (10%) for intermediate accounts
- id: BD-042
type: B/BA
summary: 'Transaction amount range: min=100, max=1000'
- id: BD-032
type: B/BA
summary: Cash transactions identified by type in CASH_TYPES set ("CASH-IN", "CASH-OUT")
- id: BD-022
type: B
summary: 'AML typology types: fan_in, fan_out, cycle, bipartite, stack, random, scatter_gather, gather_scatter'
- id: BD-023
type: B
summary: 'Alert type ID mapping: fan_out=1, fan_in=2, cycle=3, bipartite=4, stack=5, random=6, scatter_gather=7, gather_scatter=8'
resources:
packages:
- name: numpy
version_pin: latest
- name: networkx
version_pin: latest
- name: matplotlib
version_pin: latest
- name: pygraphviz
version_pin: latest
- name: powerlaw
version_pin: latest
- name: python-dateutil
version_pin: latest
- name: Faker
version_pin: latest
- name: MASON
version_pin: latest
- name: JSON in Java
version_pin: latest
- name: WebGraph
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install numpy
- python3 -m pip install networkx
- python3 -m pip install matplotlib
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When implementing directed_configuration_model graph generation
action: Enforce sum of in-degrees equals sum of out-degrees before edge creation
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid degree sequences will produce an inconsistent directed graph where some nodes have unmatched incoming/outgoing
edges, corrupting the transaction network topology for AML analysis
stage_ids:
- graph_construction
- id: finance-C-002
when: When loading account lists via load_account_list_param
action: Initialize normal_models as empty list for every account node
severity: fatal
kind: domain_rule
modality: must
consequence: Missing normal_models attribute causes KeyError when Nominator methods attempt to access it during fan_in_breakdown
and fan_out_breakdown operations, breaking the entire normal model generation pipeline
stage_ids:
- graph_construction
- id: finance-C-003
when: When loading raw account lists via load_account_list_raw
action: Initialize normal_models as empty list for every account node attribute dictionary
severity: fatal
kind: domain_rule
modality: must
consequence: Raw account loading path does not include normal_models initialization, causing KeyError when downstream
Nominator code attempts to append to the missing attribute during normal model construction
stage_ids:
- graph_construction
- id: finance-C-004
when: When constructing directed graphs from degree sequences
action: Swap IDs to eliminate self-loops when source equals destination after shuffling
severity: fatal
kind: domain_rule
modality: must
consequence: Self-loops in the transaction graph would represent accounts sending money to themselves, which violates
AML domain requirements and corrupts downstream fan-in/fan-out pattern analysis
stage_ids:
- graph_construction
- id: finance-C-005
when: When parsing degree distribution CSV files
action: Verify in-degree sequence length equals out-degree sequence length
severity: fatal
kind: domain_rule
modality: must
consequence: Mismatched sequence lengths produce a graph where the number of accounts with incoming edges differs from
those with outgoing edges, corrupting the bipartite degree sequence matching for directed configuration model
stage_ids:
- graph_construction
- id: finance-C-006
when: When scaling degree sequences to match account count
action: Require num_accounts to be evenly divisible by degree sequence length
severity: fatal
kind: domain_rule
modality: must
consequence: Non-divisible account count causes incomplete graph scaling where some accounts lack degree assignments,
resulting in orphaned nodes with undefined transaction patterns in the AML simulation
stage_ids:
- graph_construction
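The three degree-sequence checks in finance-C-001, finance-C-005 and finance-C-006 can be combined into one pre-flight function. The following is a minimal sketch, not the project's own code: the function name `validate_degree_sequences` and its return shape are hypothetical, and the repetition step follows BD-002 (sequences are repeated to fill the account count).

```python
def validate_degree_sequences(in_deg, out_deg, num_accounts):
    """Check the degree-sequence invariants, then expand both sequences.

    Hypothetical helper: raises ValueError on the first violated invariant
    and returns the sequences repeated to cover num_accounts nodes.
    """
    if len(in_deg) == 0 or len(in_deg) != len(out_deg):
        # finance-C-005: both sequences must describe the same accounts
        raise ValueError("in-degree and out-degree sequences differ in length")
    if sum(in_deg) != sum(out_deg):
        # finance-C-001: every outgoing stub needs a matching incoming stub
        raise ValueError("sum of in-degrees must equal sum of out-degrees")
    if num_accounts % len(in_deg) != 0:
        # finance-C-006: the sequence must tile the account set exactly
        raise ValueError("num_accounts must be a multiple of sequence length")
    repeats = num_accounts // len(in_deg)
    return list(in_deg) * repeats, list(out_deg) * repeats


expanded_in, expanded_out = validate_degree_sequences([1, 2, 0], [2, 0, 1], 6)
```

Running the checks before any edge is created keeps a bad input file from producing a partially built graph.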
- id: finance-C-008
when: When instantiating TransactionGenerator and Nominator classes
action: Pass identical degree_threshold value to both TransactionGenerator and Nominator
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Mismatched degree_threshold causes Nominator to identify hub accounts using different criteria than TransactionGenerator,
leading to incorrect fan-in/fan-out candidate selection and corrupted AML pattern generation
stage_ids:
- graph_construction
- id: finance-C-014
when: When loading account data from aggregated CSV files
action: Expand degree sequence entries by the repeat count before graph construction
severity: fatal
kind: domain_rule
modality: must
consequence: Without proper expansion, degree sequences remain at sample size causing graph topology to be incorrect for
the full account set, with accounts receiving incorrect transaction pattern assignments
stage_ids:
- graph_construction
- id: finance-C-015
when: When implementing scatter_gather pattern generation
action: verify scatter transactions occur before gather transactions (scatter_date < gather_date)
severity: fatal
kind: domain_rule
modality: must
consequence: Scatter-gather pattern validation will fail if scatter_date >= gather_date, breaking the chronological ordering
required for AML typology verification
stage_ids:
- alert_pattern_generation
- id: finance-C-016
when: When implementing scatter_gather pattern generation
action: verify scatter_amount exceeds gather_amount for each intermediate account
severity: fatal
kind: domain_rule
modality: must
consequence: Validation will reject scatter-gather patterns if scatter_amount <= gather_amount, as the margin must be
retained by intermediate accounts
stage_ids:
- alert_pattern_generation
- id: finance-C-017
when: When loading margin_ratio configuration
action: verify margin_ratio value is within the valid range [0.0, 1.0]
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid margin_ratio will cause ValueError during pattern generation, preventing any AML typology from being
placed in the transaction graph
stage_ids:
- alert_pattern_generation
- id: finance-C-018
when: When implementing cycle pattern generation
action: verify cycle transactions are chronologically ordered with decreasing amounts
severity: fatal
kind: domain_rule
modality: must
consequence: Validation will reject cycle patterns if transaction dates are not strictly increasing or amounts are not
strictly decreasing, breaking the expected money laundering funnel pattern
stage_ids:
- alert_pattern_generation
- id: finance-C-019
when: When adding transaction edges in AML typologies
action: create self-loops where originator equals beneficiary account
severity: fatal
kind: domain_rule
modality: must_not
consequence: Self-loops are not valid transaction patterns for AML detection systems and will cause ValueError to be raised
during edge creation
stage_ids:
- alert_pattern_generation
- id: finance-C-020
when: When creating AML typology patterns
action: call remove_typology_candidate BEFORE add_node for each selected account
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Reversing this order causes hub self-selection and duplicate account assignment across overlapping alert
patterns, corrupting the generated transaction graph
stage_ids:
- alert_pattern_generation
- id: finance-C-021
when: When selecting hub accounts for AML typologies
action: validate hub pool is non-empty before calling add_main_acct
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Calling add_main_acct with empty hub pool raises ValueError and stops all further typology generation, preventing
alert pattern placement
stage_ids:
- alert_pattern_generation
- id: finance-C-025
when: When generating scatter_gather patterns
action: apply margin_ratio to intermediate account amounts correctly (gather_amount = scatter_amount - scatter_amount
* margin_ratio)
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect margin application violates the scatter_amount > gather_amount invariant required for validation,
causing pattern rejection
stage_ids:
- alert_pattern_generation
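The margin rule in finance-C-025, together with the range check in finance-C-017, can be sketched as below. This is an illustration under assumptions: the function name `gather_amount` is hypothetical, and `Decimal` is used here only to keep the arithmetic exact (the blueprint lists Decimal support as missing in BD-GAP-016).

```python
from decimal import Decimal


def gather_amount(scatter_amount, margin_ratio):
    """Amount an intermediate account forwards after retaining its margin.

    Implements gather_amount = scatter_amount - scatter_amount * margin_ratio.
    Inputs are converted through str() so float literals do not leak
    binary rounding error into the Decimal result.
    """
    scatter = Decimal(str(scatter_amount))
    ratio = Decimal(str(margin_ratio))
    if not (Decimal("0") <= ratio <= Decimal("1")):
        # finance-C-017: margin_ratio must lie in [0.0, 1.0]
        raise ValueError(f"margin_ratio out of range: {margin_ratio}")
    return scatter - scatter * ratio


forwarded = gather_amount(1000, "0.1")  # default ratio from BD-021
```

Note that a ratio of exactly 0 passes the range check but makes the forwarded amount equal to the scattered amount, so the strict `scatter_amount > gather_amount` check in finance-C-016 would still reject the pattern.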
- id: finance-C-038
when: When converting simulator day offsets to timestamps
action: Append 'Z' suffix to mark UTC timezone in ISO 8601 format
severity: fatal
kind: domain_rule
modality: must
consequence: Database imports fail or misattribute transaction times to wrong timezone, causing incorrect AML alert sequencing
and compliance violations
stage_ids:
- log_conversion
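The day-offset conversion described in finance-C-038 and BD-010 can be sketched as follows. The name `days2date` appears in finance-C-041, but the signature and the module-level `BASE_DATE` constant here are assumptions for illustration; the base date value comes from BD-038.

```python
from datetime import datetime, timedelta

# Base date from BD-038; treated as midnight UTC in this sketch.
BASE_DATE = datetime(2017, 1, 1)


def days2date(days, base_date=BASE_DATE):
    """Convert a simulator day offset to an ISO 8601 UTC timestamp string."""
    moment = base_date + timedelta(days=days)
    # The trailing 'Z' marks the timestamp as UTC (finance-C-038).
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + "Z"


first_step = days2date(0)
one_month_in = days2date(31)
```

Passing the base date as a parameter makes the mismatch recorded in BD-073 visible at the call site: the same offset yields different timestamps for `2017-01-01` and `1970-01-01`.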
- id: finance-C-039
when: When parsing SAR flag from input CSV files
action: Convert SAR flag to lowercase string 'true'/'false' for consistent CSV output
severity: fatal
kind: domain_rule
modality: must
consequence: Alert filtering logic in downstream analytics fails silently because case-sensitive comparisons miss SAR
transactions, causing compliance detection gaps
stage_ids:
- log_conversion
- id: finance-C-041
when: When outputting transaction rows with date valueType
action: Apply days2date conversion to all date-typed columns before writing CSV rows
severity: fatal
kind: domain_rule
modality: must
consequence: CSV columns contain raw day integers instead of ISO timestamps, causing database schema violations and failed
imports for Neo4j/JanusGraph
stage_ids:
- log_conversion
- id: finance-C-042
when: When parsing alert transactions for SAR extraction
action: Verify alert_id exists in self.reports dictionary before calling get_reason()
severity: fatal
kind: domain_rule
modality: must
consequence: Python raises AttributeError when accessing get_reason() on None, causing transaction conversion to abort
and leaving incomplete CSV outputs
stage_ids:
- log_conversion
- id: finance-C-044
when: When converting raw transaction logs to CSV format
action: Execute convert_alert_members() before convert_acct_tx() to populate self.reports
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Alert transaction extraction fails with NoneType errors because reports dictionary is empty, preventing SAR
case generation
stage_ids:
- log_conversion
- id: finance-C-045
when: When loading schema.json for column mapping
action: Parse dataType annotations to determine field roles (account_id, timestamp, sar_flag, alert_id)
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Schema-driven field mapping fails, causing wrong columns to populate critical identifiers and preventing
join operations across CSV outputs
stage_ids:
- log_conversion
- id: finance-C-053
when: When validating cycle alert patterns
action: check that the alert subgraph contains exactly one closed loop detectable by nx.simple_cycles
severity: fatal
kind: domain_rule
modality: must
consequence: Cycle patterns with zero or multiple closed loops will pass validation incorrectly, causing invalid AML typologies
to be treated as legitimate alerts
stage_ids:
- alert_validation
- id: finance-C-054
when: When validating ordered scatter-gather alert patterns
action: check that scatter_date is chronologically before gather_date for each intermediate account
severity: fatal
kind: domain_rule
modality: must
consequence: Scatter-gather patterns with transactions in reverse chronological order will be incorrectly validated, breaking
the fundamental fan-out then fan-in structure of the AML typology
stage_ids:
- alert_validation
- id: finance-C-055
when: When validating ordered scatter-gather alert patterns
action: check that scatter_amount exceeds gather_amount for each intermediate account to verify margin extraction
severity: fatal
kind: domain_rule
modality: must
consequence: Scatter-gather patterns where intermediate accounts do not receive margin will be incorrectly validated,
failing to detect money laundering via fee extraction
stage_ids:
- alert_validation
- id: finance-C-056
when: When validating ordered cycle patterns
action: check that cycle transaction amounts are strictly monotonically decreasing in chronological order
severity: fatal
kind: domain_rule
modality: must
consequence: Cycle patterns with unordered transaction amounts will be incorrectly validated, breaking the margin extraction
chain in circular fund movements
stage_ids:
- alert_validation
- id: finance-C-057
when: When validating ordered cycle patterns
action: check that cycle transaction dates are chronologically ordered and successor edge connects from predecessor's
beneficiary
severity: fatal
kind: domain_rule
modality: must
consequence: Cycle patterns with unordered transaction dates or broken chain connections will be incorrectly validated,
failing to represent legitimate circular fund flow
stage_ids:
- alert_validation
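The ordered-cycle checks in finance-C-056 and finance-C-057 can be sketched in plain Python. This is not the validation module's own `is_cycle`; the function name `is_ordered_cycle` and the `(orig, dest, amount, date)` tuple layout are assumptions, and dates are any comparable values such as day offsets.

```python
def is_ordered_cycle(edges):
    """Return True if the edges form one chronologically ordered cycle.

    edges: list of (orig, dest, amount, date) tuples (hypothetical layout).
    """
    if len(edges) < 2:
        return False
    ordered = sorted(edges, key=lambda edge: edge[3])
    for prev, curr in zip(ordered, ordered[1:]):
        if not prev[3] < curr[3]:
            return False  # dates must be strictly increasing
        if not prev[2] > curr[2]:
            return False  # amounts must be strictly decreasing
        if prev[1] != curr[0]:
            return False  # successor must start at predecessor's beneficiary
    if ordered[-1][1] != ordered[0][0]:
        return False  # last transfer must return to the first originator
    originators = [edge[0] for edge in ordered]
    # Each account sends exactly once, so the ring is a single simple cycle.
    return len(set(originators)) == len(originators)


valid_ring = [("a", "b", 1000, 1), ("b", "c", 900, 2), ("c", "a", 810, 3)]
ring_ok = is_ordered_cycle(valid_ring)
```

Because this check duplicates rules that the generator also encodes, any change here needs the matching change on the generation side (finance-C-065).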
- id: finance-C-063
when: When validating gather-scatter patterns
action: check that gather transactions complete before scatter transactions commence chronologically
severity: fatal
kind: domain_rule
modality: must
consequence: Gather-scatter patterns where scatter occurs before gather completes violate the fundamental fan-in then
fan-out structure of this AML typology
stage_ids:
- alert_validation
- id: finance-C-064
when: When validating gather-scatter patterns
action: check that scatter amounts do not exceed the average gathered amount per beneficiary account
severity: fatal
kind: domain_rule
modality: must
consequence: Gather-scatter patterns where scatter amounts exceed gathered amounts indicate impossible fund flows that
should not pass validation
stage_ids:
- alert_validation
- id: finance-C-065
when: When modifying alert pattern validation rules
action: modify validation rules in isolation without synchronizing changes to transaction_graph_generator.py
severity: fatal
kind: architecture_guardrail
modality: must_not
consequence: Desynchronization between generation and validation rules will cause valid generated patterns to fail validation
or invalid patterns to pass
stage_ids:
- alert_validation
- id: finance-C-066
when: When loading alert parameter CSV files
action: 'parse all required columns: count, type, schedule_id, min_accounts, max_accounts, min_amount, max_amount, min_period,
max_period, bank_id, is_sar'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing column indices will cause KeyError exceptions during parameter loading, preventing alert validation
from executing
stage_ids:
- alert_validation
- id: finance-C-067
when: When loading alert transaction CSV files
action: construct a directed graph with edges containing amount and date attributes for each transaction
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Directed graph without proper edge attributes will cause KeyError exceptions during pattern validation when
accessing date or amount properties
stage_ids:
- alert_validation
- id: finance-C-072
when: When validating alert patterns against typology specifications
action: only pass validation if the alert subgraph matches at least one parameter set with matching alert_type
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Alert patterns matched against wrong typology parameters will produce incorrect validation results, compromising
the integrity of generated simulation data
stage_ids:
- alert_validation
- id: finance-C-079
when: When combining multiple simulation outputs into a single dataset
action: use only input simulations that share the same schema structure as the first input
severity: fatal
kind: domain_rule
modality: must
consequence: Combined CSV files will have mismatched column counts and names, causing downstream alert validation and
ML training pipelines to fail with column index errors
stage_ids:
- data_combination
- id: finance-C-080
when: When appending output data from each input simulation
action: offset all account IDs by the cumulative account ID offset from previous simulations
severity: fatal
kind: domain_rule
modality: must
consequence: Account IDs will collide across combined simulations, causing referential integrity failures when transactions
reference accounts that appear in multiple simulations
stage_ids:
- data_combination
- id: finance-C-081
when: When appending output data from each input simulation
action: offset all transaction IDs by the cumulative transaction ID offset from previous simulations
severity: fatal
kind: domain_rule
modality: must
consequence: Transaction IDs will duplicate across combined simulations, breaking alert-to-transaction joins and creating
false-positive SAR identifications
stage_ids:
- data_combination
- id: finance-C-082
when: When appending output data from each input simulation
action: offset all alert IDs by the cumulative alert ID offset from previous simulations
severity: fatal
kind: domain_rule
modality: must
consequence: Alert IDs will duplicate across combined simulations, causing alert_members and alert_transactions to join
incorrectly and corrupt suspicious activity reports
stage_ids:
- data_combination
- id: finance-C-083
when: When combining transaction outputs from multiple simulations
action: offset both orig_id and dest_id (account references) by the cumulative account ID offset
severity: fatal
kind: domain_rule
modality: must
consequence: Transaction sender/receiver references will point to wrong accounts across simulation boundaries, corrupting
the transaction graph and breaking downstream graph analytics
stage_ids:
- data_combination
- id: finance-C-084
when: When combining alert member outputs from multiple simulations
action: offset account_id references within alert_members by the cumulative account ID offset
severity: fatal
kind: domain_rule
modality: must
consequence: Alert-to-account mappings will reference incorrect accounts, causing investigators to examine wrong accounts
when reviewing alerts
stage_ids:
- data_combination
- id: finance-C-085
when: When combining alert transaction outputs from multiple simulations
action: offset tx_id, orig_id, and dest_id references within alert_transactions by cumulative offsets
severity: fatal
kind: domain_rule
modality: must
consequence: Alert transactions will reference non-existent transactions and accounts, breaking the link between suspicious
activity alerts and the underlying transaction records
stage_ids:
- data_combination
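# Illustrative sketch for finance-C-081 through finance-C-085 (Python-style pseudocode, not taken
# from the AMLSim source; every name below is hypothetical). It assumes IDs within one simulation
# are zero-based and contiguous, so each offset is the count of IDs already written:
#   acct_off = tx_off = alert_off = 0
#   for sim in simulations:
#       for tx in sim.transactions:
#           write_tx(tx.tx_id + tx_off, tx.orig_id + acct_off, tx.dest_id + acct_off)
#       for m in sim.alert_members:
#           write_member(m.alert_id + alert_off, m.account_id + acct_off)
#       acct_off += sim.num_accounts
#       tx_off += sim.num_transactions
#       alert_off += sim.num_alerts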
- id: finance-C-088
when: When writing output CSV headers for combined files
action: use the output schema column names (acct_names, tx_names, alert_acct_names, alert_tx_names)
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Column headers in combined CSVs will not match the schema definition, causing downstream parsers to misidentify
columns and corrupt data loading
stage_ids:
- data_combination
- id: finance-C-096
when: When configuring the degree sequence for directed graph generation
action: Verify the sum of in-degrees equals the sum of out-degrees
severity: fatal
kind: domain_rule
modality: must
consequence: Directed configuration model will raise NetworkXError, causing the entire transaction graph generation pipeline
to fail
- id: finance-C-098
when: When outputting alert members CSV from alert_pattern_generation
action: Include the alertID column that uniquely identifies each AML typology
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Log converter cannot link alert transactions to their corresponding typology members, breaking the SAR reporting
chain
- id: finance-C-100
when: When generating hub account candidates for AML typologies
action: Select accounts with degree exceeding the degree_threshold configuration parameter
severity: fatal
kind: operational_lesson
modality: must
consequence: Alert generation will fail with ValueError when no hub accounts exist, halting simulation
- id: finance-C-103
when: When combining multiple simulation outputs in data_combination
action: Offset account IDs by the maximum ID from previously combined simulations
severity: fatal
kind: domain_rule
modality: must
consequence: Account ID collisions will cause incorrect transaction linkage in downstream analysis, producing invalid
money laundering patterns
- id: finance-C-104
when: When combining multiple simulation outputs in data_combination
action: Offset alert IDs by the maximum alert ID from previously combined simulations
severity: fatal
kind: domain_rule
modality: must
consequence: Alert ID collisions will merge distinct SAR cases in the alert database, corrupting compliance investigation
workflows
- id: finance-C-105
when: When mapping transaction originator and beneficiary IDs during combination
action: Apply account ID offset to both orig_id and dest_id fields in transactions
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Transaction sender/receiver relationships will be incorrectly attributed, breaking transaction graph topology
for AML analysis
- id: finance-C-120
when: When generating directed graphs from degree sequences
action: Validate that sum of in-degrees equals sum of out-degrees before graph construction
severity: fatal
kind: domain_rule
modality: must
consequence: NetworkXError raised during graph generation causes simulation failure; uncaught exception crashes the pipeline
and loses all generated data
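# Illustrative check for finance-C-096 and finance-C-120 (a sketch, assuming in_deg and out_deg are
# plain lists of integers; both names and the seed variable are hypothetical).
# networkx.directed_configuration_model raises NetworkXError when the two sums differ, so checking
# first gives a clearer failure before any graph is built:
#   if sum(in_deg) != sum(out_deg):
#       raise ValueError("in-degree and out-degree sums differ")
#   g = networkx.directed_configuration_model(in_deg, out_deg, seed=seed)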
- id: finance-C-121
when: When loading degree sequences for directed graph generation
action: Validate that the total number of accounts is divisible by the degree sequence length
severity: fatal
kind: domain_rule
modality: must
consequence: ValueError raised when degree sequence cannot evenly tile the account graph; simulation fails to initialize
the transaction network
- id: finance-C-130
when: When using AMLSim in any production or compliance context
action: Treat synthetic AML alerts as regulatory-grade findings or use them to satisfy AML compliance obligations
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Non-compliant AML program may face regulatory sanctions, fines, or enforcement actions from financial regulators;
synthetic data does not satisfy reporting requirements
- id: finance-C-131
when: When deploying AMLSim for real-time financial operations
action: Connect AMLSim outputs to real-time transaction processing, payment systems, or live financial infrastructure
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Synthetic transaction data injected into live systems may trigger incorrect fraud alerts, freeze legitimate
customer accounts, or corrupt financial databases with fabricated records
- id: finance-C-138
when: When implementing account creation logic in AML transaction graph simulation
action: Initialize 'normal_models' as an empty list attribute for each account node at add_account() time — accounts must
have this attribute before any Nominator graph operations
severity: fatal
kind: domain_rule
modality: must
consequence: Accounts added without normal_models initialization cause AttributeError during Nominator operation when
pattern generators attempt to extend the list, breaking graph construction and preventing alert generation
derived_from_bd_id: BD-071
- id: finance-C-160
when: When implementing timestamp conversion and temporal validation logic
action: Mix epoch-based timestamps (Unix epoch 1970-01-01) with date-string-based timestamps (2017-01-01 base) in temporal
validation; all timestamps must use one consistent reference date throughout the pipeline, from generation (BD-073)
through conversion (BD-010), distribution logic (BD-029), alert validation (BD-053), and chronological ordering (BD-013)
severity: fatal
kind: domain_rule
modality: must_not
consequence: Transactions generated with 2017-01-01 base dates are interpreted relative to the 1970-01-01 Unix epoch,
so period range validation produces incorrect results that either accept invalid patterns or reject valid
ones, corrupting downstream analytics
derived_from_bd_id: BD-093
- id: finance-C-161
when: When validating transaction temporal ranges against configured time periods
action: Implement centralized date constant management — use a single source of truth for base_date (e.g., BASE_DATE =
datetime(2017, 1, 1)) imported consistently across timestamp generation (BD-073), UTC conversion (BD-010), uniform distribution
(BD-029), alert validation (BD-053), and chronological ordering (BD-013) modules
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Without centralized date management, the base_date inconsistency (2017-01-01 vs 1970-01-01) propagates through
each transformation stage, causing period validation to incorrectly compare timestamps against the wrong epoch and produce
systematically wrong results
derived_from_bd_id: BD-093
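# Illustrative sketch for finance-C-160 and finance-C-161 (hypothetical function name, not the
# AMLSim API). One shared constant is the only reference date used to turn a day offset into a date:
#   from datetime import datetime, timedelta
#   BASE_DATE = datetime(2017, 1, 1)
#   def days_to_date(days):
#       return BASE_DATE + timedelta(days=days)
#   # days_to_date(0) is 2017-01-01 and days_to_date(31) is 2017-02-01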
regular:
- id: finance-C-007
when: When using AMLSim for transaction graph generation
action: Use networkx version other than 1.11 for large graph generation
severity: high
kind: resource_boundary
modality: must_not
consequence: NetworkX version 2.* exhibits severe performance degradation when creating large transaction graphs, causing
exponential slowdown in graph generation for datasets with 10K+ accounts
stage_ids:
- graph_construction
- id: finance-C-009
when: When implementing hub node identification logic
action: Identify hub accounts using OR semantics for in/out degree threshold crossing
severity: high
kind: architecture_guardrail
modality: must
consequence: Using AND instead of OR semantics excludes pure senders or pure receivers from hub identification, breaking
the AML typology design where both fan-in aggregators and fan-out distributors serve as main accounts
stage_ids:
- graph_construction
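# Illustrative sketch for finance-C-009 (hypothetical names; g is a networkx DiGraph, and whether
# the comparison is strict or inclusive is an assumption, not confirmed by this document):
#   hubs = [n for n in g.nodes()
#           if g.in_degree(n) >= degree_threshold or g.out_degree(n) >= degree_threshold]
# Replacing "or" with "and" would drop pure fan-in and pure fan-out accounts from the hub list.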
- id: finance-C-010
when: When validating transaction graph generation outputs
action: Verify that at least one hub account exists before proceeding to model building
severity: high
kind: operational_lesson
modality: must
consequence: Proceeding without hub accounts causes AML typology generation to fail when trying to assign main accounts,
requiring users to reconfigure degree_threshold with no clear error message
stage_ids:
- graph_construction
- id: finance-C-011
when: When generating directed configuration model graphs
action: Use the same random seed across TransactionGenerator and Nominator for reproducibility
severity: medium
kind: operational_lesson
modality: must
consequence: Different random seeds cause shuffled degree lists to produce different graph topologies between graph generation
and model assignment, breaking reproducibility of AML simulation runs
stage_ids:
- graph_construction
- id: finance-C-012
when: When presenting AMLSim generated data as research or compliance evidence
action: Claim generated transaction graphs represent real financial transaction data
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting synthetic AML simulation data as real transactions violates research integrity and could lead
to regulatory compliance violations if used in actual AML investigations without proper disclosure
stage_ids:
- graph_construction
- id: finance-C-013
when: When evaluating graph generation quality or AML detection accuracy
action: Assume backtest performance on synthetic data predicts live AML detection effectiveness
severity: high
kind: claim_boundary
modality: must_not
consequence: Synthetic transaction patterns may not capture real-world evasion techniques, data quality issues, or temporal
dynamics, leading to over-optimistic evaluation of detection algorithms that fail on actual financial crime data
stage_ids:
- graph_construction
- id: finance-C-022
when: When generating alert subgroups
action: assign sequential alert_id values and store subgraph under correct alert_id key
severity: high
kind: architecture_guardrail
modality: must
consequence: Alert IDs in transaction log must match alert_members.csv for joinability in downstream validation; mismatched
IDs break data integrity for alert correlation
stage_ids:
- alert_pattern_generation
- id: finance-C-023
when: When placing AML typology accounts
action: use hub accounts (high-degree vertices) as main accounts for pattern centroids
severity: high
kind: architecture_guardrail
modality: must
consequence: Non-hub main accounts create highly anomalous patterns that stand out artificially, defeating the purpose
of realistic AML simulation
stage_ids:
- alert_pattern_generation
- id: finance-C-024
when: When implementing ordered pattern types
action: verify transaction dates fall within the generated start_date and end_date range
severity: high
kind: domain_rule
modality: must
consequence: Out-of-range dates cause validation failures and create invalid temporal patterns that do not match the intended
alert typology period
stage_ids:
- alert_pattern_generation
- id: finance-C-026
when: When generating cycle patterns
action: apply margin_ratio to transfer amounts sequentially through each account in the cycle
severity: high
kind: domain_rule
modality: must
consequence: Without sequential margin deduction, cycle amounts would remain constant instead of decreasing, violating
the expected money laundering funnel behavior
stage_ids:
- alert_pattern_generation
- id: finance-C-027
when: When selecting accounts for AML typology members
action: allow hub accounts to be selected as main accounts for multiple patterns
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Hub account reuse across patterns causes overlapping suspicious activity that inflates detection metrics
and creates duplicate SAR assignments
stage_ids:
- alert_pattern_generation
- id: finance-C-028
when: When running AMLSim with large transaction graphs
action: use networkx version 2.* or later
severity: high
kind: resource_boundary
modality: must_not
consequence: NetworkX 2.* has significant performance issues with large graph creation, causing excessive runtime or memory
exhaustion during transaction graph generation
stage_ids:
- alert_pattern_generation
- id: finance-C-029
when: When configuring AML typology generation
action: verify sufficient hub account candidates exist relative to pattern count
severity: high
kind: operational_lesson
modality: must
consequence: Insufficient hub accounts relative to alert pattern count causes ValueError at check_hub_exists and stops
all pattern generation; solution requires lowering degree_threshold
stage_ids:
- alert_pattern_generation
- id: finance-C-030
when: When generating external-bank AML patterns
action: verify sub-bank has sufficient candidate accounts before attempting selection
severity: medium
kind: operational_lesson
modality: must
consequence: Pattern generation silently returns without placing the pattern if insufficient accounts exist in the target
bank, causing incomplete alert coverage
stage_ids:
- alert_pattern_generation
- id: finance-C-031
when: When presenting AMLSim output data
action: claim synthetic AML patterns represent real-world money laundering behavior
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting synthetic transaction patterns as real AML cases misleads stakeholders about the system's actual
detection capability on genuine suspicious activity
stage_ids:
- alert_pattern_generation
- id: finance-C-032
when: When using AMLSim validation results
action: present validation success rates as indicators of real-world detection performance
severity: medium
kind: claim_boundary
modality: must_not
consequence: AMLSim validates that generated patterns match their parameters, but this does not guarantee equivalent detection
rates on real financial crime patterns which have different characteristics
stage_ids:
- alert_pattern_generation
- id: finance-C-033
when: When loading typology pattern names
action: verify typology name is one of the supported alert_types keys
severity: medium
kind: domain_rule
modality: must
consequence: Unknown typology names are skipped with a warning but the pattern count for that row is not retried, potentially
leaving alert coverage below intended levels
stage_ids:
- alert_pattern_generation
- id: finance-C-034
when: When marking accounts involved in alert patterns
action: set IS_SAR_KEY attribute to True for every vertex participating in alert typologies
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing IS_SAR_KEY flag causes SAR account list generation to miss alerted accounts, breaking downstream
compliance reporting requirements
stage_ids:
- alert_pattern_generation
- id: finance-C-035
when: When specifying external-bank typology requirements
action: require at least 2 banks to exist when bank_id is empty in pattern configuration
severity: medium
kind: operational_lesson
modality: must
consequence: Attempting external transactions without multiple banks causes KeyError when checking if bank exists, terminating
pattern generation
stage_ids:
- alert_pattern_generation
- id: finance-C-036
when: When implementing bipartite and stack patterns
action: calculate originator and beneficiary account counts correctly (num_orig_accts = num_accounts // 2 for bipartite,
num_accounts // 3 for stack)
severity: high
kind: domain_rule
modality: must
consequence: Incorrect account count allocation causes insufficient accounts for one partition, breaking the expected
multi-layer transaction structure
stage_ids:
- alert_pattern_generation
- id: finance-C-037
when: When implementing gather_scatter pattern
action: accumulate amounts from origin accounts and distribute equal amounts to beneficiary accounts
severity: high
kind: domain_rule
modality: must
consequence: Non-equal distribution breaks the expected gather-scatter money flow pattern and causes validation failures
stage_ids:
- alert_pattern_generation
- id: finance-C-040
when: When configuring the base_date parameter
action: Set base_date to '2017-01-01' to match hardcoded fallback in days2date calculation
severity: high
kind: domain_rule
modality: must
consequence: Transaction timestamps drift by years, causing all AML alert correlations to reference wrong date ranges
and invalidating historical pattern analysis
stage_ids:
- log_conversion
- id: finance-C-043
when: When determining account organization type for SAR routing
action: Return 'INDIVIDUAL' for account type 'I' and 'COMPANY' for all other types
severity: high
kind: architecture_guardrail
modality: must
consequence: SAR accounts misrouted to wrong entity tables, causing party enrichment queries to return empty results for
legitimate SAR investigations
stage_ids:
- log_conversion
- id: finance-C-046
when: When generating Faker-based personal attributes
action: Use 'en_US' locale for consistent US-style name and address generation
severity: medium
kind: resource_boundary
modality: should
consequence: Mixed locale attributes cause address parsing failures and inconsistent naming conventions across account
records
stage_ids:
- log_conversion
- id: finance-C-047
when: When validating transaction log row integrity
action: Skip rows with fewer columns than the expected header to prevent index-out-of-bounds errors
severity: high
kind: domain_rule
modality: must
consequence: CSV reader raises IndexError on malformed rows, causing transaction conversion to crash with incomplete output
stage_ids:
- log_conversion
- id: finance-C-048
when: When presenting AMLSim converted outputs
action: Claim synthetic transaction data represents real-world AML patterns or compliance-ready alerts
severity: high
kind: claim_boundary
modality: must_not
consequence: Regulatory bodies may take enforcement action if synthetic data is presented as validated AML intelligence
without proper disclosure
stage_ids:
- log_conversion
- id: finance-C-049
when: When outputting Faker-generated personal information
action: Present Faker-generated names and SSNs as real personal identification data
severity: medium
kind: claim_boundary
modality: must_not
consequence: Data misuse if synthetic personal data is mistaken for actual PII, violating data handling policies and privacy
expectations
stage_ids:
- log_conversion
- id: finance-C-050
when: When handling is_sar boolean to string conversion
action: Write 'YES'/'NO' strings to IS_SAR column in sar_accounts.csv (not 'true'/'false')
severity: high
kind: domain_rule
modality: must
consequence: SAR filtering in downstream dashboards fails because 'YES'/'NO' values are expected but 'true'/'false' are
written, so no SAR alerts are detected
stage_ids:
- log_conversion
- id: finance-C-051
when: When reading prior_sar_count boolean field from accounts CSV
action: Map prior_sar_count boolean through AccountDataTypeLookup.inputType before writing to output
severity: high
kind: architecture_guardrail
modality: must
consequence: SAR history field mismatches schema expectations, causing account risk scoring algorithms to receive invalid
boolean values
stage_ids:
- log_conversion
- id: finance-C-052
when: When generating Python Faker instance for name anonymization
action: Seed Faker with a deterministic value (Faker.seed(0)) for reproducible name generation
severity: medium
kind: operational_lesson
modality: must
consequence: Different Faker outputs across runs cause non-deterministic account names, breaking regression tests and
reproducibility requirements
stage_ids:
- log_conversion
- id: finance-C-058
when: When validating alert subgraph structures
action: check that the number of accounts falls within the specified min_accounts to max_accounts range
severity: high
kind: domain_rule
modality: must
consequence: Alert patterns with incorrect account counts will be incorrectly validated, causing the generated simulation
to deviate from specified typology parameters
stage_ids:
- alert_validation
- id: finance-C-059
when: When validating alert subgraph structures
action: check that the initial transaction amount falls within the specified min_amount to max_amount range
severity: high
kind: domain_rule
modality: must
consequence: Alert patterns with incorrect transaction amounts will be incorrectly validated, causing AML typologies to
violate financial thresholds specified in simulation parameters
stage_ids:
- alert_validation
- id: finance-C-060
when: When validating alert subgraph structures
action: check that the transaction period falls within the specified min_period to max_period range
severity: high
kind: domain_rule
modality: must
consequence: Alert patterns with incorrect transaction periods will be incorrectly validated, causing temporal characteristics
of AML typologies to deviate from simulation parameters
stage_ids:
- alert_validation
- id: finance-C-061
when: When implementing or extending pattern validation logic
action: introduce custom validation rules that diverge from the graph-theoretic property-based approach
severity: high
kind: domain_rule
modality: must_not
consequence: Text-based or regex matching approaches are less robust than graph-theoretic validation and may produce false
positives or negatives in pattern matching
stage_ids:
- alert_validation
- id: finance-C-062
when: When validating scatter-gather patterns
action: check that intermediate accounts have exactly one incoming edge and one outgoing edge (in-degree 1 and out-degree 1)
severity: high
kind: domain_rule
modality: must
consequence: Intermediate accounts with incorrect vertex degrees indicate malformed scatter-gather structures that should
not pass validation
stage_ids:
- alert_validation
- id: finance-C-068
when: When parsing schedule_id from alert parameter CSV
action: 'convert schedule_id to boolean ordered flag: schedule_id > 0 means ordered, schedule_id == 0 means unordered'
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect conversion of schedule_id will cause ordered vs unordered validation checks to be applied incorrectly,
either missing required checks or adding invalid ones
stage_ids:
- alert_validation
- id: finance-C-069
when: When running the AlertValidator class
action: validate alerts before alert_transactions.csv has been generated by the transaction simulator
severity: high
kind: resource_boundary
modality: must_not
consequence: Attempting to validate non-existent transaction files will cause FileNotFoundError and validation will fail
without producing results
stage_ids:
- alert_validation
- id: finance-C-070
when: When validating individual alerts via AlertValidator.validate_single
action: raise KeyError if the requested alert_id does not exist in the loaded alert graphs
severity: medium
kind: operational_lesson
modality: must
consequence: Silent failure to handle non-existent alert IDs may cause misleading validation results in batch processing
stage_ids:
- alert_validation
- id: finance-C-071
when: When validating alert subgraph structures
action: extract the initial amount from the transaction occurring on the start_date (earliest transaction)
severity: high
kind: domain_rule
modality: must
consequence: Using the wrong transaction for initial amount comparison will cause amount range validation to fail for
valid patterns or pass for invalid ones
stage_ids:
- alert_validation
- id: finance-C-073
when: When reporting validation results
action: log both successful matches with parameter line number and failed matches with mismatch reason
severity: medium
kind: operational_lesson
modality: must
consequence: Missing diagnostic information will make it difficult to debug validation failures and identify which parameter
constraints were violated
stage_ids:
- alert_validation
- id: finance-C-074
when: When calculating transaction period for alert validation
action: compute period as the number of days between start_date and end_date inclusive
severity: high
kind: domain_rule
modality: must
consequence: Incorrect period calculation (e.g., exclusive end_date) will cause valid patterns to fail or invalid patterns
to pass validation
stage_ids:
- alert_validation
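# Illustrative sketch for finance-C-074 (hypothetical names; start_date and end_date are
# datetime.date values). The inclusive period adds one day to the plain difference:
#   period = (end_date - start_date).days + 1
#   # 2017-01-01 through 2017-01-10 gives 10, where the exclusive difference would give 9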
- id: finance-C-075
when: When validating alert patterns
action: claim that validation results prove real-world AML detection effectiveness
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting synthetic simulation validation as evidence of real-world AML detection capability misrepresents
the system's limitations
stage_ids:
- alert_validation
- id: finance-C-076
when: When generating validation reports
action: present validation results as proof of financial crime detection capability
severity: medium
kind: claim_boundary
modality: must_not
consequence: AML typology pattern validation only confirms synthetic data generation parameters, not the system's ability
to detect actual money laundering
stage_ids:
- alert_validation
- id: finance-C-077
when: When interpreting validation failure messages
action: dismiss validation failures as simulation artifacts rather than investigating root causes
severity: high
kind: rationalization_guard
modality: must_not
consequence: Attributing validation failures to simulation quirks without investigation may mask genuine bugs in pattern
generation or validation logic
stage_ids:
- alert_validation
- id: finance-C-078
when: When extending AML typology support
action: skip adding corresponding validation logic for newly added pattern types
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Unvalidated pattern types will allow invalid synthetic data to be generated, compromising the integrity of
downstream ML training and evaluation
stage_ids:
- alert_validation
- id: finance-C-086
when: When using the combine_data script for batch combination runs
action: provide an even number of command-line arguments (InputConfJSON and Repetitions pairs)
severity: high
kind: domain_rule
modality: must
consequence: Script will exit with error code 1 and no data combination occurs, leaving incomplete datasets
stage_ids:
- data_combination
- id: finance-C-087
when: When aggregating degree statistics across multiple simulations
action: accumulate degree counts from each simulation using Counter addition
severity: high
kind: domain_rule
modality: must
consequence: Degree distribution statistics will be incomplete, causing graph analysis tools to miscalculate node connectivity
and miss high-degree suspicious accounts
stage_ids:
- data_combination
- id: finance-C-089
when: When processing the first alert member row in each simulation
action: initialize last_alert_id to 0 before processing if it is None
severity: high
kind: architecture_guardrail
modality: must
consequence: Alert ID offsetting will use None as offset, causing TypeError exceptions or silent ID corruption
stage_ids:
- data_combination
- id: finance-C-090
when: When skipping CSV header rows during data combination
action: call next(reader) once before processing each input CSV file
severity: high
kind: architecture_guardrail
modality: must
consequence: Header rows will be included as data rows, corrupting aggregated statistics and causing type conversion errors
stage_ids:
- data_combination
- id: finance-C-091
when: When validating combined dataset outputs for research purposes
action: claim that combined synthetic data represents real-world transaction patterns
severity: high
kind: claim_boundary
modality: must_not
consequence: Research results trained on synthetic AMLSim data will not generalize to real AML detection, potentially
wasting investigation resources on patterns that do not exist in actual financial crime
stage_ids:
- data_combination
- id: finance-C-092
when: When combining simulations that were generated with different random seeds
action: expect the combined dataset to maintain temporal ordering across simulation boundaries
severity: medium
kind: resource_boundary
modality: must_not
consequence: Transaction timestamps from later simulations may overlap with or precede those from earlier simulations,
breaking time-series analysis assumptions
stage_ids:
- data_combination
- id: finance-C-093
when: When using combine_data.py for very large-scale dataset creation
action: load entire output CSV files into memory during append operations
severity: medium
kind: resource_boundary
modality: should_not
consequence: Memory consumption will grow linearly with combined dataset size, potentially causing OutOfMemoryError for
multi-million row combinations
stage_ids:
- data_combination
- id: finance-C-094
when: When interpreting combined alert outputs for downstream AML analysis
action: assume that alert_id uniqueness alone guarantees cross-simulation alert attribution
severity: medium
kind: claim_boundary
modality: must_not
consequence: Alert type, schedule_id, and bank_id fields from different simulations may reference the same conceptual
alert pattern with different IDs after offset, causing analysis tools to miss related alerts
stage_ids:
- data_combination
- id: finance-C-095
when: When combining simulation outputs with repetitions parameter
action: load each input simulation configuration exactly N times as specified by the repetitions argument
severity: high
kind: operational_lesson
modality: must
consequence: Combined dataset will have incorrect simulation count, skewing statistical properties and reducing dataset
diversity
stage_ids:
- data_combination
- id: finance-C-097
when: When passing account IDs from graph_construction to alert_pattern_generation
action: Allow duplicate account IDs across different banks within the same simulation
severity: high
kind: domain_rule
modality: must_not
consequence: Alert validation will produce false matches when comparing transaction subgraphs against parameter definitions
- id: finance-C-099
when: When converting transaction timestamps from days to ISO format
action: Use the base_date configuration parameter as the reference epoch (2017-01-01 default)
severity: high
kind: architecture_guardrail
modality: must
consequence: Alert validation will compute incorrect transaction periods, causing false negatives in pattern matching
- id: finance-C-101
when: When reading alert transactions CSV in alert_validation
action: Parse date strings with ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)
severity: high
kind: architecture_guardrail
modality: must
consequence: Date parsing will raise ValueError, preventing validation from executing on any alert subgraph
- id: finance-C-102
when: When loading alert transaction subgraphs for validation
action: Construct NetworkX DiGraph with edge attributes containing both amount and date fields
severity: high
kind: architecture_guardrail
modality: must
consequence: Pattern validation functions will raise KeyError when accessing edge attributes for cycle/scatter-gather
checks
- id: finance-C-106
when: When referencing degree sequences during alert validation
action: Use degree.csv from the same simulation run as the alert parameter file
severity: medium
kind: operational_lesson
modality: must
consequence: Structural validation will compare alerts against mismatched degree distributions, producing false validation
failures
- id: finance-C-107
when: When using Python NetworkX library for graph operations
action: Use networkx version 2.x, which has performance issues with large-scale graph creation
severity: high
kind: resource_boundary
modality: must_not
consequence: Graph construction will become extremely slow or run out of memory for large transaction networks (10K+ accounts)
- id: finance-C-108
when: When configuring the number of members in AML typologies
action: Specify member count greater than 1 to avoid degenerate single-account patterns
severity: high
kind: operational_lesson
modality: must
consequence: Typology generation will raise ValueError for insufficient member count, breaking the alert generation pipeline
- id: finance-C-109
when: When presenting backtest simulation results
action: Claim that simulated transaction patterns represent real-world money laundering behavior
severity: high
kind: claim_boundary
modality: must_not
consequence: Compliance teams may make incorrect regulatory decisions based on unrealistic synthetic data
- id: finance-C-110
when: When validating alert patterns against simulation parameters
action: Assume that generated alerts perfectly match parameter specifications; random sampling introduces variation
severity: medium
kind: claim_boundary
modality: must_not
consequence: Validation will report false mismatches for edge cases in random amount generation and temporal scheduling
- id: finance-C-114
when: When generating synthetic transaction data for AML analysis
action: Present the generated synthetic data as real-world financial transaction data or claim it reflects actual banking
activity
severity: high
kind: claim_boundary
modality: must_not
consequence: Users or organizations may use synthetic data in regulatory submissions or compliance reports, misrepresenting
the nature of the dataset and violating financial reporting standards
- id: finance-C-115
when: When using AMLSim for compliance or regulatory purposes
action: Claim that AMLSim-generated alerts or SAR flags are equivalent to real Suspicious Activity Reports or regulatory
compliance findings
severity: high
kind: claim_boundary
modality: must_not
consequence: Regulatory filings based on synthetic alerts may be rejected by authorities, leading to compliance violations
and potential legal liability for the filing organization
- id: finance-C-116
when: When integrating AMLSim into operational transaction monitoring systems
action: Use AMLSim outputs as inputs to real-time transaction monitoring, alerting, or blocking systems
severity: high
kind: claim_boundary
modality: must_not
consequence: Real-time monitoring systems receiving synthetic data may generate false alerts, fail to detect actual suspicious
activity, or block legitimate transactions based on simulated patterns
- id: finance-C-117
when: When interpreting simulation results for machine learning model training
action: Claim that ML detection models trained on AMLSim synthetic data will perform equivalently on real-world transaction
data without validation
severity: high
kind: claim_boundary
modality: must_not
consequence: ML models may exhibit significant performance degradation when deployed on real data, leading to missed detections
of actual money laundering activity and regulatory non-compliance
- id: finance-C-118
when: When converting transaction logs to CSV outputs
action: Output SAR flag values as lowercase string 'true' or 'false' (matching the schema specification)
severity: high
kind: domain_rule
modality: must
consequence: Downstream alert-processing systems expecting lowercase boolean strings may fail to correctly identify SAR-flagged
transactions, causing incorrect compliance categorization
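A minimal sketch of the lowercase SAR flag rule in finance-C-118. The helper name `format_sar_flag` is hypothetical and not part of AMLSim; it only illustrates why Python's default boolean formatting should not be written to the CSV directly.

```python
def format_sar_flag(is_sar: bool) -> str:
    """Return the SAR flag as the lowercase string 'true' or 'false'."""
    # str(True) yields "True" (capitalized), which would not match the
    # lowercase values the rule above asks for, so map explicitly.
    return "true" if is_sar else "false"


row = {"tran_id": 1, "is_sar": format_sar_flag(True)}  # illustrative column names
```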
- id: finance-C-119
when: When representing in-memory transaction graphs
action: Use NetworkX DiGraph class for all in-memory graph representations (accounts as nodes, transactions as directed
edges)
severity: high
kind: architecture_guardrail
modality: must
consequence: Using MultiDiGraph for the main transaction graph may cause duplicate edge handling inconsistencies, while
using undirected graphs loses transaction directionality critical for AML typology detection
- id: finance-C-122
when: When configuring the AMLSim system
action: Set degree_threshold identically in both TransactionGenerator and Nominator instances
severity: high
kind: architecture_guardrail
modality: must
consequence: Mismatched degree thresholds cause incorrect identification of main account candidates; fan-in/fan-out patterns
are misclassified, corrupting AML typology simulation results
- id: finance-C-123
when: When initializing account nodes in the transaction graph
action: Initialize each account vertex with a 'normal_models' list attribute
severity: high
kind: architecture_guardrail
modality: must
consequence: KeyError raised when Nominator methods attempt to access 'normal_models' attribute for filtering; AML typology
assignment fails for accounts without initialized normal_models
- id: finance-C-124
when: When assigning AML typology roles to account candidates
action: Remove assigned nodes from the opposite candidate list (fan-in assigned nodes must be removed from fan-out candidates)
severity: medium
kind: architecture_guardrail
modality: must
consequence: Same account may be assigned multiple conflicting AML typology roles; simulation generates invalid nested
or circular transaction patterns that do not match parameter definitions
- id: finance-C-125
when: When initializing the TransactionGenerator for simulation
action: 'Execute initialization methods in the specified order: set_num_accounts -> generate_normal_transactions -> load_account_list
-> load_normal_models -> build_normal_models -> set_main_acct_candidates -> load_alert_patterns -> mark_active_edges'
severity: high
kind: architecture_guardrail
modality: must
consequence: Dependency violations cause AttributeError or KeyError exceptions; for example, generating transactions before
setting account count creates mismatched graph topology
- id: finance-C-126
when: When interpreting timestamp values in simulator outputs
action: Treat all timestamp values as day offsets from base_date (default 2017-01-01), not as absolute dates or Unix
timestamps
severity: high
kind: domain_rule
modality: must
consequence: Misinterpretation of day offsets as Unix timestamps produces dates in year 1970 or beyond year 4000; misinterpretation
as absolute dates produces incorrect temporal ordering of transactions
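As a hedged illustration of finance-C-126, the conversion from a day offset to a calendar date can be sketched as follows. The function name `offset_to_date` is an assumption for this example; only the default base date 2017-01-01 comes from the rule itself.

```python
from datetime import date, timedelta

# Default base date named in the rule; a given run may configure another value.
BASE_DATE = date(2017, 1, 1)


def offset_to_date(day_offset: int, base_date: date = BASE_DATE) -> date:
    """Interpret a simulator timestamp as a number of days after base_date."""
    return base_date + timedelta(days=day_offset)


print(offset_to_date(0))   # 2017-01-01
print(offset_to_date(31))  # 2017-02-01
```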
- id: finance-C-127
when: When joining transaction and alert member datasets
action: Verify Alert IDs in transaction log match those in alert_members.csv for joinability
severity: high
kind: domain_rule
modality: must
consequence: SQL or pandas join operations fail to match alert transactions with alert members; downstream compliance
analysis cannot correlate transactions to suspicious accounts
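One way to check joinability before running the join in finance-C-127 is to compare the two sets of alert IDs. This is a sketch using plain Python collections; the function name and the dictionary keys are hypothetical, and how the IDs are read from the transaction log and alert_members.csv is left to the caller.

```python
def unmatched_alert_ids(tx_alert_ids, member_alert_ids):
    """Return alert IDs that appear on only one side of the intended join."""
    tx_ids = set(tx_alert_ids)
    member_ids = set(member_alert_ids)
    return {
        "only_in_transactions": tx_ids - member_ids,
        "only_in_members": member_ids - tx_ids,
    }


diff = unmatched_alert_ids([1, 2, 3], [2, 3, 4])
# Both sets should be empty before the datasets are joined.
print(diff)
```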
- id: finance-C-128
when: When configuring AMLSim Python dependencies
action: Use networkx version 1.11 specifically (version 2.* is not supported due to performance issues with large graph
creation)
severity: high
kind: resource_boundary
modality: must
consequence: Using networkx 2.* causes severe performance degradation or out-of-memory errors when generating transaction
graphs with thousands of accounts; simulation may not complete
- id: finance-C-129
when: When creating base transaction graphs from degree sequences
action: Use MultiDiGraph as intermediate representation in directed_configuration_model, then convert to DiGraph for TransactionGenerator
severity: medium
kind: architecture_guardrail
modality: must
consequence: Skipping MultiDiGraph intermediate step may cause NetworkX API incompatibilities; duplicate edges in MultiDiGraph
are lost when converted to simple DiGraph, affecting transaction multiplicity
- id: finance-C-132
when: When validating alert transaction subgraphs
action: Match generated alert subgraphs against parameter definitions to detect structural inconsistencies
severity: medium
kind: operational_lesson
modality: must
consequence: Undetected inconsistencies between generated patterns and parameter files produce invalid typologies; ML
training data contains incorrectly structured transaction sequences
- id: finance-C-133
when: When implementing or refactoring the directed transaction graph generation logic
action: Maintain the self-loop avoidance logic that swaps IDs to prevent self-referential edges in the generated graph
severity: high
kind: domain_rule
modality: must
consequence: Removing self-loop swap logic causes artificial self-loops in transaction graphs, distorting AML pattern
analysis and producing unrealistic account-to-account relationships that bias detection algorithms toward false positives
or negatives
derived_from_bd_id: BD-001
- id: finance-C-134
when: When implementing hub node identification logic in AML transaction graph analysis
action: Use OR semantics when checking if degree_threshold is crossed (check if in_degree >= threshold OR out_degree >=
threshold); must NOT use AND semantics that require both in and out degree to exceed threshold
severity: high
kind: domain_rule
modality: must
consequence: Using AND semantics for hub detection excludes legitimate one-sided hub accounts (high senders or high receivers
only), reducing AML pattern coverage and missing detection opportunities for one-sided transaction patterns common in
layering and structuring schemes
derived_from_bd_id: BD-003
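The OR semantics in finance-C-134 can be sketched as a single predicate. The function `is_hub` is a hypothetical helper written for illustration, not the framework's own code.

```python
def is_hub(in_degree: int, out_degree: int, degree_threshold: int) -> bool:
    """Treat an account as a hub if either degree reaches the threshold."""
    # OR, not AND: a one-sided account (heavy receiver or heavy sender only)
    # still qualifies as a hub.
    return in_degree >= degree_threshold or out_degree >= degree_threshold


print(is_hub(in_degree=12, out_degree=0, degree_threshold=10))  # True
print(is_hub(in_degree=3, out_degree=3, degree_threshold=10))   # False
```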
- id: finance-C-135
when: When implementing fan-in or fan-out alert pattern generation in the Nominator
action: Verify candidate sorting uses degree-based selection (out-degree for fan-in collection points, in-degree for fan-out
distribution points) — verify that high-activity nodes are prioritized as aggregation points rather than using random
selection
severity: medium
kind: operational_lesson
modality: should
consequence: Using random selection instead of degree-based sorting creates unrealistic aggregation points with no outbound
capability, generating AML alerts that appear anomalous to reviewers and reducing backtest fidelity for pattern detection
systems
derived_from_bd_id: BD-004
- id: finance-C-136
when: When implementing amount rounding logic for transaction generation
action: Implement the adaptive step size algorithm (7-30 slots per range) to create non-uniform distribution favoring
round numbers — verify step_size is between 7 and 30, and amounts align to step boundaries
severity: medium
kind: operational_lesson
modality: should
consequence: Using uniform distribution or step sizes below 7 produces unrealistic transaction amounts that lack the natural
clustering around round figures, causing generated AML alerts to appear artificial and fail pattern authenticity validation
derived_from_bd_id: BD-082
- id: finance-C-137
when: When modifying pattern detection logic (cycle, scatter_gather, gather_scatter) in either the graph generator or
validation module
action: Verify identical pattern detection logic is maintained in both validation/validate_alerts.py and the graph generator
— apply changes to both modules simultaneously to maintain detection consistency
severity: high
kind: architecture_guardrail
modality: must
consequence: Modifying pattern detection in only one module creates divergence where validation flags patterns the generator
missed or vice versa, causing inconsistent alert classification and breaking the independent verification capability
derived_from_bd_id: BD-079
- id: finance-C-139
when: When performing cross-module date arithmetic involving logs and analytics
action: Normalize base_date to a single consistent value before performing date arithmetic across modules; do not mix
conf.json/convert_logs.py (2017-01-01) with network_analytics.py (1970-01-01) without explicit conversion
severity: high
kind: operational_lesson
modality: must
consequence: Using inconsistent base dates across modules produces incorrect duration calculations, causing transaction
age and risk scoring errors that accumulate silently across pipeline boundaries
derived_from_bd_id: BD-073
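A minimal sketch of the explicit conversion that finance-C-139 asks for, assuming both modules express time as whole-day offsets from their respective base dates. The constant and function names are hypothetical.

```python
from datetime import date

LOG_BASE = date(2017, 1, 1)    # base date cited for conf.json / convert_logs.py
EPOCH_BASE = date(1970, 1, 1)  # base date cited for network_analytics.py


def rebase_offset(day_offset: int, from_base: date, to_base: date) -> int:
    """Re-express a day offset measured from from_base as an offset from to_base."""
    return day_offset + (from_base - to_base).days


# Day 0 of the simulation, expressed as days since 1970-01-01.
print(rebase_offset(0, LOG_BASE, EPOCH_BASE))  # 17167
```

Converting every offset to one base before subtracting keeps duration calculations consistent across modules.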
- id: finance-C-140
when: When implementing scatter-gather pattern validation logic
action: Validate scatter-gather patterns with degree exactly 1 for intermediate nodes (neither sending nor receiving additional
transactions), monotonically decreasing amounts through the chain, and chronological transaction order within each phase
severity: high
kind: domain_rule
modality: must
consequence: Loose validation accepts malformed scatter-gather patterns that don't represent real money laundering schemes,
causing false positive alerts that waste investigation resources and dilute detection signal
derived_from_bd_id: BD-055
- id: finance-C-141
when: When implementing model assignment logic for AML typology simulation
action: Track remaining and used counts per typology type to verify specified model quantities match allocation, preventing
over or under-assignment of patterns to simulation accounts
severity: high
kind: architecture_guardrail
modality: must
consequence: Random assignment without per-type counters produces uncontrolled pattern distributions unsuitable for testing,
causing validation failures and unreliable detection algorithm assessment
derived_from_bd_id: BD-057
- id: finance-C-142
when: When implementing suspicious activity report (SAR) status checking logic
action: Check alert is_sar status using sar_id > 0 comparison (positive integer), where sar_id equals 0 indicates no SAR
filed and positive values indicate filed report IDs
severity: high
kind: domain_rule
modality: must
consequence: Using zero check (sar_id == 0) instead of positive integer check incorrectly marks null-SAR accounts as having
filed suspicious activity reports, violating database nullable integer semantics and causing compliance violations
derived_from_bd_id: BD-064
- id: finance-C-143
when: When implementing AML typology graph generation with hub accounting
action: Call remove_typology_candidate BEFORE add_node in each typology generator - this ordering ensures hub accounting
tracks candidates before node registration
severity: high
kind: domain_rule
modality: must
consequence: Reversing the order causes hub accounts to be miscounted and alerts to reference unregistered nodes, corrupting
the transaction graph structure and breaking alert correlation logic
derived_from_bd_id: BD-072
- id: finance-C-144
when: When implementing normal model account population for CSV export
action: Populate normal_models list AFTER mark_active_edges sets edge attributes - the active flag drives CSV export filter
and must be set before population
severity: high
kind: domain_rule
modality: must
consequence: Writing normal_models before mark_active_edges includes inactive accounts in exports, causing data quality
issues where CSV files contain accounts without valid transaction patterns
derived_from_bd_id: BD-081
- id: finance-C-145
when: When configuring cash transaction amount ranges for AML simulation
action: Set cash-in amounts with normal range 50-100 and fraud range 500-1000 (10x normal), and reverse the ranges for
cash-out - these thresholds create multi-dimensional fraud signatures essential for detection
severity: high
kind: domain_rule
modality: must
consequence: Using uniform amount ranges for both normal and fraud transactions eliminates the characteristic volume increase
signature, making transactions indistinguishable from legitimate cash activity and breaking detection algorithms
derived_from_bd_id: BD-045
- id: finance-C-146
when: When implementing fan-in pattern generation for structuring detection
action: Configure fan-in pattern with multiple originators sending to a single main account - this models smurfing schemes
where individuals make sub-threshold deposits to avoid reporting
severity: high
kind: domain_rule
modality: must
consequence: Using fan-out pattern (single originator to multiple destinations) instead reverses the money flow direction,
causing detection algorithms to look for opposite convergence patterns and miss actual structuring activity
derived_from_bd_id: BD-046
- id: finance-C-147
when: When implementing cycle pattern generation for sophisticated laundering detection
action: Form transactions into ring structures using modulo arithmetic for deterministic paths, and decrement amounts
at each hop via margin extraction so that final amounts differ from initial
severity: high
kind: domain_rule
modality: must
consequence: Random cycle paths without modulo arithmetic or missing margin decrements cause funds to return unchanged
to origin, misrepresenting laundering fund degradation through layering stages
derived_from_bd_id: BD-050
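The ring construction in finance-C-147 can be sketched as below. This is an illustration under stated assumptions, not the generator's actual code: the function name `build_cycle` is hypothetical, and the margin is assumed to be deducted as a fixed fraction of the current amount at each hop (0.1 mirrors the DEFAULT_MARGIN_RATIO value mentioned elsewhere in this list).

```python
def build_cycle(accounts, initial_amount, margin_ratio=0.1):
    """Return (sender, receiver, amount) edges forming a ring over accounts."""
    n = len(accounts)
    if n < 2:
        raise ValueError("a cycle needs at least two accounts")
    edges = []
    amount = float(initial_amount)
    for i in range(n):
        src = accounts[i]
        dst = accounts[(i + 1) % n]  # modulo wraps the last hop back to the origin
        edges.append((src, dst, round(amount, 2)))
        amount -= amount * margin_ratio  # margin extracted before the next hop
    return edges


for edge in build_cycle(["A", "B", "C"], 1000):
    print(edge)
```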
- id: finance-C-148
when: When implementing scatter-gather pattern generation with temporal segmentation
action: Split scatter-gather at midpoint date with scatter phase (originators to intermediaries) executing before gather
phase (intermediaries to beneficiaries) - this creates two-phase temporal signature
severity: high
kind: domain_rule
modality: must
consequence: Implementing single-phase patterns instead of two-phase scatter-gather eliminates the temporal evasion dimension,
causing detection systems to miss timing-based evasion techniques that rely on phase delays
derived_from_bd_id: BD-051
- id: finance-C-149
when: When implementing gather-scatter pattern generation with reversed phase order
action: Execute gather phase (originators to intermediaries) first, then scatter phase (intermediaries to beneficiaries)
- the phase order is critical for creating mirror pattern to scatter-gather
severity: high
kind: domain_rule
modality: must
consequence: Reversing to scatter-first order makes the pattern identical to scatter-gather, creating a detection blind
spot where collection-first schemes are not identified regardless of phase order
derived_from_bd_id: BD-052
- id: finance-C-150
when: When implementing graph construction logic in amlsim.nominator (Nominator stage)
action: 'Maintain flow conservation invariants: total in-degree must equal total out-degree across all vertices, and num_accounts
% len(sequence) == 0 must hold; graph construction must fail-fast if these constraints are violated'
severity: high
kind: domain_rule
modality: must
consequence: Violating flow conservation invariants causes Nominator failures (BD-071) and prevents directed graph generation
entirely; backtest pipeline halts without generating transaction networks
derived_from_bd_id: BD-090
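A fail-fast check for the invariants in finance-C-150 might look like the sketch below. It reads the degree condition as a comparison of the totals over the whole sequence, which is an assumption of this example, and the function name is hypothetical.

```python
def check_degree_sequences(in_degrees, out_degrees, num_accounts):
    """Raise ValueError as soon as a flow conservation invariant is violated."""
    if not in_degrees or len(in_degrees) != len(out_degrees):
        raise ValueError("degree sequences must be non-empty and of equal length")
    if sum(in_degrees) != sum(out_degrees):
        raise ValueError("total in-degree must equal total out-degree")
    if num_accounts % len(in_degrees) != 0:
        raise ValueError("num_accounts must be a multiple of the sequence length")


check_degree_sequences([1, 2, 0], [0, 1, 2], num_accounts=6)  # passes silently
```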
- id: finance-C-151
when: When implementing multi-jurisdiction AML compliance reporting
action: Assume the framework provides configurable CTR/SAR threshold handling per jurisdiction — the framework uses hardcoded
thresholds that cannot accommodate jurisdictional variations
severity: high
kind: claim_boundary
modality: must_not
consequence: Hardcoded CTR/SAR thresholds prevent deployment across multiple jurisdictions with different regulatory requirements,
causing compliance violations in production environments where thresholds differ from the hardcoded values
derived_from_bd_id: BD-GAP-017
- id: finance-C-152
when: When configuring AML threshold parameters for compliance reporting
action: Implement jurisdiction-specific CTR/SAR threshold configuration with audit trail — externalize thresholds to configuration
files with jurisdiction codes and maintain change history for regulatory audit purposes
severity: high
kind: domain_rule
modality: must
consequence: Without configurable thresholds, organizations cannot meet multi-jurisdiction AML requirements where CTR
limits vary (e.g., FinCEN $3000 vs UK £500) and regulators require documented threshold changes
derived_from_bd_id: BD-GAP-017
- id: finance-C-153
when: When initializing the TransactionGraphGenerator component
action: 'Execute initialization sequence exactly as: set_num_accounts -> generate_normal_transactions -> load_account_list
-> load_normal_models -> build_normal_models -> set_main_acct_candidates -> load_alert_patterns -> mark_active_edges'
severity: high
kind: domain_rule
modality: must
consequence: Violating the initialization order causes Nominator graph lookups to fail when normal_models lists are missing
or accounts are uninitialized, leading to AttributeError cascades in the alert generation pipeline
derived_from_bd_id: BD-066
- id: finance-C-154
when: When using ResultGraphLoader.count_hub_accounts() for analytics reporting
action: Verify that the dual counting behavior (base + extension) is expected for the use case — callers should not assume
this returns a simple hub account count as it includes both parent implementation and extended analytics counting
severity: medium
kind: operational_lesson
modality: should
consequence: Callers expecting a single hub account count will misinterpret the inflated value from dual counting, causing
metric discrepancies in downstream reporting and potentially incorrect AML alert prioritization
derived_from_bd_id: BD-070
- id: finance-C-155
when: When testing hub detection patterns at different threshold values
action: Verify test configurations match production threshold values — validate that tests run with threshold=10 (production
value) to guarantee correct behavior for hub-based pattern assignment
severity: high
kind: operational_lesson
modality: must
consequence: Tests passing at threshold=3 do not guarantee correct behavior at threshold=10, creating false confidence
where insufficient candidate pools for pattern assignment go undetected until production
derived_from_bd_id: BD-086
- id: finance-C-156
when: When running alert generation under high volume conditions
action: Monitor hub pool depletion rates and verify fallback behavior produces acceptable results — when hub pool exhausts,
the fallback to lower-degree accounts may violate realism requirements for pattern blending
severity: high
kind: operational_lesson
modality: must
consequence: Under high alert volumes, hub pool depletion causes fallback to lower-degree accounts that violate the realism
requirement, creating obvious anomalies that real-world AML systems would detect and reject
derived_from_bd_id: BD-087
- id: finance-C-157
when: When combining simulation runs with different schema versions
action: Combine data from runs with varying schema versions without schema validation — BD-015 enforces consistency while
BD-009 enables evolution, creating silent misinterpretation when schemas differ
severity: high
kind: domain_rule
modality: must_not
consequence: Schema evolution enabled by BD-009 combines with BD-015 consistency enforcement, causing silent data misinterpretation
when simulation runs with different schema versions are combined
derived_from_bd_id: BD-094
- id: finance-C-158
when: When implementing suspicious account classification for tiered AML monitoring
action: Verify that boolean risk flags (country_risk, business_risk) are sufficient for the AML rule engine — if nuanced
risk levels are needed, the architecture requires redesign as the system only supports discrete thresholds
severity: medium
kind: operational_lesson
modality: should
consequence: Boolean risk classification forces discrete categorization that breaks when nuanced risk levels (medium-high)
are required for tiered monitoring, potentially missing suspicious activity that falls between binary thresholds
derived_from_bd_id: BD-GAP-002
- id: finance-C-159
when: When implementing hub account detection logic using degree threshold
action: Verify that degree_threshold=4 matches the actual statistical outliers in degree distribution for the specific
dataset being analyzed; adjust threshold based on the actual degree distribution rather than using the default value
blindly
severity: medium
kind: operational_lesson
modality: should
consequence: Using degree_threshold=4 without verification may identify incorrect hub accounts; in money laundering detection,
misidentified hubs cause both false positives (unnecessary investigations) and false negatives (missed consolidation
points), violating FATF compliance requirements
derived_from_bd_id: BD-020
- id: finance-C-162
when: When using the framework's default margin ratio parameter for transaction amount generation
action: Verify that DEFAULT_MARGIN_RATIO=0.1 matches the actual intermediary fee structure in the target laundering scenario,
and adjust to reflect specific layering scheme economics if needed
severity: medium
kind: operational_lesson
modality: should
consequence: Using 10% margin creates detectable decrement patterns across multi-hop chains; if actual intermediary fees
differ, the generated transaction amounts will exhibit unrealistic margins that either over or understate laundering
costs, compromising detection validation
derived_from_bd_id: BD-021
- id: finance-C-163
when: When implementing transaction amount generation logic
action: Verify that transaction amount rounding follows psychologically appealing patterns (multiples of 10, 100, 1000)
as configured, and confirm the rounding strategy matches the target scenario's behavioral assumptions
severity: medium
kind: operational_lesson
modality: should
consequence: Rounding to round numbers creates realistic launderer behavior patterns that avoid obvious structuring thresholds;
removing this rounding produces either unnaturally distributed amounts or constant-amount chains that fail to represent
real transaction patterns
derived_from_bd_id: BD-024
- id: finance-C-164
when: When implementing normal model subgraph edge generation
action: Mark subgraph edges as active when they represent current-period transactions — active edges must be distinguishable
from dormant historical edges to enable downstream pattern detection filtering
severity: high
kind: domain_rule
modality: must
consequence: Without active edge marking, dormant historical transactions incorrectly match against current-period alert
patterns, causing false positive alerts that trigger unnecessary investigator review and dilute detection system effectiveness
derived_from_bd_id: BD-058
- id: finance-C-165
when: When implementing SAR account extraction logic during log conversion
action: Use org_type lookup to classify SAR accounts before schema routing — verify individual and organizational SAR
accounts are routed to their respective schemas to comply with reporting requirements
severity: high
kind: domain_rule
modality: must
consequence: Failing to classify SAR accounts by org_type causes schema routing violations where individual accounts receive
organizational schemas or vice versa, resulting in non-compliant SAR reports that regulatory authorities will reject
derived_from_bd_id: BD-011
- id: finance-C-166
when: When implementing alert validation logic that checks transaction patterns for AML detection
action: Verify that the validation framework enforces strict chronological ordering of transactions — verify transaction
sequence is validated as a temporal dependency, not just as data presence
severity: high
kind: operational_lesson
modality: must
consequence: Without chronological ordering enforcement, AML typologies like layering sequences are not detected correctly;
alerts for time-sensitive patterns generate false negatives, allowing suspicious transactions to pass undetected
derived_from_bd_id: BD-013
- id: finance-C-167
when: When routing normal model alerts through the scheduling system
action: Assume normal model alerts use the same dynamic CSV scheduling as AML typology patterns — normal model distribution
is hardcoded to schedule_id=1 regardless of CSV parameters
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Hardcoded schedule_id=1 prevents multi-schedule simulation scenarios where normal activity distribution differs;
analysts cannot route normal model alerts to alternative schedules, limiting backtesting flexibility for schedule-dependent
strategies
derived_from_bd_id: BD-095
- id: finance-C-168
when: When implementing schedule routing configuration for pattern distribution
action: Use dynamic CSV scheduling configuration for AML typology patterns while acknowledging normal models require hardcoded
schedule_id=1 — do not attempt to override normal model schedule routing via CSV
severity: medium
kind: domain_rule
modality: should
consequence: Attempting to route normal model alerts through dynamic CSV causes routing conflicts; normal model alerts
always default to schedule 1, so configuration changes for normal models in CSV have no effect
derived_from_bd_id: BD-095
- id: finance-C-169
when: When processing transaction timestamps during graph_construction
action: Assume the framework handles timezone conversion or UTC normalization automatically — timestamps are not explicitly
annotated with timezone and may be treated as naive
severity: high
kind: claim_boundary
modality: must_not
consequence: Without explicit timezone annotation, transactions across multiple timezones are incorrectly sequenced in
the graph; UTC-based systems may misalign events by hours, causing cycle detection algorithms to miss or incorrectly
flag temporal patterns
derived_from_bd_id: BD-GAP-006
- id: finance-C-170
when: When constructing transaction graphs from multiple data sources with timestamps
action: Annotate all timestamps with explicit timezone identifiers and normalize to UTC before graph construction; convert
local timestamps using source timezone metadata and store as UTC-aware datetime objects
severity: high
kind: domain_rule
modality: must
consequence: Missing UTC normalization causes cross-timezone transaction graphs to have incorrect temporal ordering; alerts
relying on chronological sequences may trigger at wrong times or miss detection windows entirely
derived_from_bd_id: BD-GAP-006
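The normalization step in finance-C-170 can be sketched with the standard library. The helper `to_utc` is hypothetical, and it assumes the source timezone is available as a fixed UTC offset in hours; sources with daylight saving rules would need a full timezone database instead.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local: datetime, utc_offset_hours: float) -> datetime:
    """Attach the source offset to a naive timestamp and convert it to UTC."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive datetime without tzinfo")
    source_tz = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=source_tz).astimezone(timezone.utc)


# 08:00 local time at UTC+8 is midnight UTC on the same day.
print(to_utc(datetime(2017, 1, 1, 8, 0), 8))
```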
- id: finance-C-171
when: When selecting historical data snapshots for graph_construction
action: Assume the framework provides point-in-time data availability — historical queries return current-state data,
not the state that existed at the query timestamp
severity: high
kind: claim_boundary
modality: must_not
consequence: Without point-in-time data, backtests use current entity states that include future changes unknown at the
historical timestamp; this introduces look-ahead bias where alerts reference accounts or entities modified after the
backtest date
derived_from_bd_id: BD-GAP-008
- id: finance-C-172
when: When running historical backtests or validating alerts against past timestamps
action: Query data using point-in-time semantics — use temporal query methods that return the entity state as it existed
at the specified timestamp, filtering out records created or modified after that point
severity: high
kind: domain_rule
modality: must
consequence: Using current-state data for historical backtests causes false positive alerts; entities that were valid
at the historical timestamp but were subsequently closed or flagged appear as suspicious when they were not at that
time
derived_from_bd_id: BD-GAP-008
- id: finance-C-173
when: When implementing pattern validation logic for AML alert detection
action: Use graph-theoretic algorithms (such as NetworkX simple_cycles for cycle detection) rather than regex or text-based
pattern matching — validate patterns based on transaction graph structure
severity: high
kind: domain_rule
modality: must
consequence: Regex-based validation can be evaded by simple field value changes or formatting variations; suspicious transactions
that modify field contents bypass detection while still exhibiting structurally suspicious patterns
derived_from_bd_id: BD-012
- id: finance-C-174
when: When combining multiple data inputs in the data_combination pipeline
action: Verify that all combined inputs share the same schema structure before processing; if schemas differ, the framework
will silently load schema from the first input only and may misinterpret subsequent data fields
severity: high
kind: operational_lesson
modality: must
consequence: Silent schema mismatch causes the framework to load structure from the first input only, potentially misinterpreting
field names and types in subsequent inputs and corrupting the combined dataset without raising errors
derived_from_bd_id: BD-015
- id: finance-C-175
when: When using the framework's DEFAULT_MARGIN_RATIO parameter for transaction cycle simulation
action: Verify that DEFAULT_MARGIN_RATIO=0.1 (10% fund retention) matches the actual regulatory requirement for intermediaries
in cycle/scatter-gather patterns, and adjust if the mandated retention ratio differs in the target jurisdiction
severity: medium
kind: operational_lesson
modality: should
consequence: Hardcoded 0.1 margin ratio causes the simulation to under-flag or over-flag transaction cycles if the actual
regulatory retention requirement differs, leading to validation results that don't match compliance expectations
derived_from_bd_id: BD-067
- id: finance-C-176
when: When processing data in the graph_construction stage
action: Assume the framework implements stale data detection or automatic data expiry — the framework does not include
staleness checks; expired or outdated data is processed as current without warning
severity: high
kind: claim_boundary
modality: must_not
consequence: Without stale data detection, the framework processes outdated data as current, causing downstream analysis
to use stale values and producing unreliable results in production systems
derived_from_bd_id: BD-GAP-009
- id: finance-C-177
when: When managing data feeds in the graph_construction stage
action: Implement a data staleness policy with configurable TTL (time-to-live) — add a timestamp or version field to each
data record, and mark records as expired when current_time - timestamp exceeds the configured TTL threshold
severity: high
kind: domain_rule
modality: must
consequence: Without a staleness policy, stale data continues to flow through the pipeline causing downstream systems
to make decisions based on outdated information
derived_from_bd_id: BD-GAP-009
- id: finance-C-178
when: When managing model and data artifacts in production systems
action: Assume the framework enforces model-data version consistency — the framework does not implement snapshot binding
between model versions and their corresponding training/inference data versions
severity: high
kind: claim_boundary
modality: must_not
consequence: Without version snapshot binding, models trained on old data can run against new data without validation,
causing prediction quality degradation that accumulates silently in production
derived_from_bd_id: BD-GAP-011
- id: finance-C-179
when: When registering or loading model artifacts in the graph_construction stage
action: Implement version snapshot binding by storing model_version and data_version metadata together in the artifact
registry, and validate that loaded model artifacts' data_version matches the target dataset's version before inference
severity: high
kind: domain_rule
modality: must
consequence: Without version binding, models trained on outdated data continue serving predictions against new data distributions,
causing prediction quality degradation that remains undetected until significant business impact occurs
derived_from_bd_id: BD-GAP-011
- id: finance-C-180
when: When generating synthetic transaction data with cycle patterns or scatter-gather patterns for AML system training
action: Introduce randomized margin ratios instead of fixed DEFAULT_MARGIN_RATIO=0.1; vary margin ratio stochastically
(e.g., uniform[0.05, 0.15] or normally distributed) to prevent uniform 10% decrement signature detection
severity: high
kind: operational_lesson
modality: must
consequence: Fixed 10% margin ratio creates uniform decrement signature across cycle and scatter-gather patterns; adversaries
can identify synthetic data origin by the consistent 0.1 ratio, compromising AML system training validity
derived_from_bd_id: BD-089
- id: finance-C-181
when: When combining data from multiple input sources or simulation runs in the fraud detection pipeline
action: Verify that all combined inputs share the same schema version before processing; implement schema validation
checks that detect drift between the first-loaded schema and subsequent inputs
severity: medium
kind: operational_lesson
modality: should
consequence: When inputs have different schema versions, the framework silently applies the first-loaded schema to all
combined data, misinterpreting fields in subsequent inputs and causing silent data corruption in aggregated alerts
derived_from_bd_id: BD-091
- id: finance-C-182
when: When implementing graph analysis algorithms for money laundering detection
action: Use weakly connected component analysis to identify isolated transaction clusters representing distinct money
laundering networks — do not replace with strongly connected components alone
severity: high
kind: architecture_guardrail
modality: must
consequence: Replacing weakly connected components with strongly connected components misses direction-agnostic connectivity
patterns in undirected graph views, causing isolated shell company networks and segmented operations to remain invisible
to detection algorithms
derived_from_bd_id: BD-039
- id: finance-C-183
when: When implementing money laundering pattern detection in transaction graphs
action: Use deterministic fan-out pattern where a single main account sends to multiple beneficiaries — do not replace
with random distribution recipients
severity: high
kind: domain_rule
modality: must
consequence: Replacing deterministic fan-out with random distribution breaks the reproducible test case structure and
misses the single-source multi-destination anomalies that model the final laundering distribution stage
derived_from_bd_id: BD-047
- id: finance-C-184
when: When implementing peer-to-peer layering pattern detection in transaction graphs
action: Use even split between originators and beneficiaries in bipartite patterns — do not use uneven splits that create
obvious hub accounts
severity: high
kind: domain_rule
modality: must
consequence: Using uneven splits creates obvious hub accounts detectable by simple degree thresholds, breaking the balanced
bipartite subgraphs that obscure the overall laundering flow by distributing activity symmetrically
derived_from_bd_id: BD-048
- id: finance-C-185
when: When implementing three-tier layering pattern generation in transaction graphs
action: Divide accounts into equal thirds for originator, intermediate, and beneficiary roles — do not use variable tier
sizes
severity: high
kind: domain_rule
modality: must
consequence: Using variable tiers blurs the distinct role boundaries between placement, layering, and integration stages,
causing the recognizable tiered structures representing classic three-tier laundering to become unrecognizable
derived_from_bd_id: BD-049
- id: finance-C-186
when: When implementing alert validation for cycle pattern detection
action: 'Enforce cycle-specific validation constraints: single cycle topology, chronological transaction ordering, and
unique transaction amounts — do not use generic validation that lacks topological and temporal constraints'
severity: high
kind: architecture_guardrail
modality: must
consequence: Using generic validation produces malformed synthetic cycles that do not match real-world ring structure
characteristics, causing false-positive detections in money laundering cycle alerts
derived_from_bd_id: BD-054
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-060 / Convert Logs to AML Simulation Data
version: v5.3
intent_keywords:
- convert logs
- synthetic data
- AML simulation
- generate transaction logs
- test data generation
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: auto-grouped by UC.type (5 distinct values, balanced distribution)
groups:
- group_id: data_pipeline
name: Data Pipeline
description: ''
emoji: 📊
uc_count: 4
ucs:
- uc_id: UC-101
name: Convert Logs to AML Simulation Data
short_description: Convert transaction log files into synthetic AML simulation data for testing anti-money laundering
detection systems
sample_triggers:
- convert logs
- synthetic data
- AML simulation
- uc_id: UC-102
name: Split Accounts by Bank ID
short_description: Partition account CSV files by bank identifier for bank-specific analysis and processing
sample_triggers:
- split accounts
- bank ID
- partition data
- uc_id: UC-103
name: Combine AML Simulation Outputs
short_description: Aggregate multiple AMLSim output files into a consolidated dataset for comprehensive analysis
sample_triggers:
- combine outputs
- merge data
- AMLSim aggregation
- uc_id: UC-104
name: Generate Transaction Graph
short_description: Generate the base transaction network graph used as input for AML simulation, defining account
relationships and transaction patterns
sample_triggers:
- transaction graph
- network generation
- graph topology
- group_id: research_analysis
name: Research Analysis
description: ''
emoji: 📦
uc_count: 1
ucs:
- uc_id: UC-105
name: Generate Scale-Free Network Graph
short_description: Generate scale-free network graphs using Kronecker graph algorithm for research on network topology
and distribution analysis
sample_triggers:
- scale-free
- Kronecker graph
- network topology
- group_id: monitoring
name: Monitoring
description: ''
emoji: 📦
uc_count: 3
ucs:
- uc_id: UC-106
name: Plot Alert Pattern Subgraphs
short_description: Visualize alert pattern subgraphs showing which accounts and transactions are involved in each
generated alert for debugging and validation
sample_triggers:
- alert visualization
- subgraph plot
- alert debugging
- uc_id: UC-112
name: Analyze Transaction Networks
short_description: Load AMLSim outputs and analyze transaction network characteristics including degree distribution,
connected components, and graph properties
sample_triggers:
- network analysis
- graph analytics
- validation
- uc_id: UC-113
name: Validate AML Simulation Alerts
short_description: Validate generated alerts against expected alert parameters to ensure AML simulation produces
correct alert patterns and amounts
sample_triggers:
- validate alerts
- alert verification
- simulation accuracy
- group_id: reporting
name: Reporting
description: ''
emoji: 📋
uc_count: 1
ucs:
- uc_id: UC-107
name: Plot Transaction Distributions
short_description: Generate statistical distribution plots (degree, amount, frequency) from transaction graphs for
analysis and reporting
sample_triggers:
- distribution plot
- statistics
- degree distribution
- group_id: builtin_factor
name: Builtin Factor
description: ''
emoji: 🧮
uc_count: 4
ucs:
- uc_id: UC-108
name: Random Amount Generator
short_description: Generate random transaction amounts within configurable min/max bounds for transaction simulation
sample_triggers:
- random amount
- transaction generator
- random number
- uc_id: UC-109
name: Account Nominator for Transaction Selection
short_description: Select appropriate accounts for different transaction types (fan-in, fan-out, single, mutual,
periodical) based on network degree thresholds
sample_triggers:
- account selection
- nominator
- transaction routing
- uc_id: UC-110
name: Rounded Amount Generator
short_description: Generate rounded transaction amounts (e.g., 100, 500, 1000) to simulate realistic human transaction
patterns
sample_triggers:
- rounded amount
- realistic transaction
- human pattern
- uc_id: UC-111
name: Normal Account Behavior Model
short_description: Define and manage normal (non-suspicious) account behavior models including main accounts and
member accounts for transaction simulation
sample_triggers:
- normal model
- behavior model
- account group
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try converting logs to AML simulation data
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try splitting accounts by bank ID
auto_selected: true
- uc_id: UC-103
beginner_prompt: Try combining AML simulation outputs
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 13 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- Combine AML Simulation Outputs
- Split Accounts by Bank ID
- Convert Logs to AML Simulation Data
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
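Analyzes the predictive power and forward-return characteristics of alpha factors, and generates quantile-return, IC, and turnover reports to support factor research and event analysis for quantitative strategies.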
---
name: alphalens-factor-analysis
description: |-
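Analyzes the predictive power and forward-return characteristics of alpha factors, and generates quantile-return, IC, and turnover reports to support factor research and event analysis for quantitative strategies.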
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-120"
compiled_at: "2026-04-22T13:00:58.879278+00:00"
capability_markets: "multi-market"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
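# Alphalens Factor Analysis (alphalens-factor-analysis)
> Analyzes the predictive power and forward-return characteristics of alpha factors, and generates quantile-return, IC, and turnover reports to support factor research and event analysis for quantitative strategies.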
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (6 total)
### Documentation Deployment (`UC-101`)
Automated build and deployment of project documentation to ensure consistent and reproducible documentation releases
**Triggers**: docs, deploy, build
### Sphinx Documentation Configuration (`UC-102`)
Configures the Sphinx documentation system with extensions for Python API documentation, Jupyter notebooks, and mathematical expressions
**Triggers**: sphinx, config, documentation
### PyFolio Portfolio Integration (`UC-106`)
Combines Alphalens factor analysis with PyFolio portfolio analytics to evaluate factor-derived portfolio performance, risk metrics, and tearsheet gene
**Triggers**: pyfolio, integration, portfolio
For all **6** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
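
Three of the locks above (SL-01, SL-03, SL-05) are simple enough to check mechanically. The following is a minimal sketch, not the framework's own validator: the function names, the plain-dict signal shape, and the exact character classes in the entity ID pattern are assumptions made for illustration.

```python
import re

# Assumed reading of SL-03: three non-empty parts joined by underscores,
# e.g. stock_sh_600000. The allowed characters are an assumption.
ENTITY_ID_RE = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9.]+$")

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id: str) -> bool:
    """SL-03: entity IDs follow entity_type_exchange_code."""
    return bool(ENTITY_ID_RE.match(entity_id))


def check_signal_sizing(signal: dict) -> bool:
    """SL-05: exactly one sizing field is set (not None)."""
    return sum(signal.get(name) is not None for name in SIZING_FIELDS) == 1


def check_sell_before_buy(actions: list) -> bool:
    """SL-01: within one cycle, no 'sell' may appear after a 'buy'."""
    seen_buy = False
    for action in actions:
        if action == "buy":
            seen_buy = True
        elif action == "sell" and seen_buy:
            return False
    return True
```

A caller would run these before submitting a cycle's signals and halt on the first `False`, matching the "halt" column in the table.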
## Top Anti-Patterns (25 total)
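- **`AP-ZVT-183`**: An inf/NaN ex-rights factor fed directly into multiplication makes price adjustment fail silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API exceeds its quota are swallowed, leaving data silently missing
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) versus QFQ (forward-adjusted), causes factor-computation drift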
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-120. Evidence verify ratio = 55.2% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-120` blueprint at 2026-04-22T13:00:58.879278+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Event Study Analysis', 'Sphinx Documentation Configuration', 'Documentation Deployment', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
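### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>
When multiple models in Qlib reuse the same already-fitted DatasetH instance, the normalization parameters inside the dataset (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same set of prediction signals. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.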
Source: https://github.com/microsoft/qlib/issues/1930
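### `AP-QLIB-2090` — Dual configuration of fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range used to fit the normalizer) and segments.train (which sets the model training range). A common mistake is to let fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.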
Source: https://github.com/microsoft/qlib/issues/2090
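
The coupling between the two ranges can be checked before any training starts. The sketch below works on plain dictionaries that mirror the `fit_end_time` and `segments.train` settings named above. The helper name and the sample dates are hypothetical, and it is not part of Qlib.

```python
from datetime import date


def normalizer_leaks_future(handler_kwargs: dict, segments: dict) -> bool:
    """Return True when the normalizer fit window ends after the train segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    return fit_end > train_end


# Hypothetical config values, for illustration only.
handler_kwargs = {"fit_start_time": "2008-01-01", "fit_end_time": "2016-12-31"}
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}

# Here the fit window extends through the valid segment, so the check flags it.
leaky = normalizer_leaks_future(handler_kwargs, segments)
```

Running such a check at config-load time turns a silent bias into an explicit failure.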
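### `AP-QLIB-2036` — Error in the documented MACD factor formula: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has already been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. After cross-sectional standardization, a MACD factor built from this documented formula differs significantly from the correct one, and its IC drops. Formula errors at the documentation level like this get copied verbatim into production factor libraries by many users.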
Source: https://github.com/microsoft/qlib/issues/2036
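
The unit problem can be seen by rescaling prices: a dimensionless factor should not change when every price is multiplied by a constant. The sketch below assumes DIF is defined as (EMA12 - EMA26) / CLOSE, as the description implies. The `ema` helper and the synthetic price series are illustrative only and do not use Qlib's expression engine.

```python
def ema(values: list, n: int) -> list:
    """Exponential moving average with smoothing factor 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def dea_variants(close: list) -> tuple:
    """Return (dea, dea_over_close) at the last bar."""
    fast, slow = ema(close, 12), ema(close, 26)
    dif = [(f - s) / c for f, s, c in zip(fast, slow, close)]  # dimensionless
    dea = ema(dif, 9)
    return dea[-1], dea[-1] / close[-1]


# Synthetic upward-drifting prices, then the same series quoted at 4x the price.
close = [10.0 + 0.1 * i + (0.3 if i % 7 == 0 else 0.0) for i in range(60)]
scaled = [4.0 * c for c in close]

dea, dea_over_close = dea_variants(close)
dea_s, dea_over_close_s = dea_variants(scaled)
# dea is unchanged by the rescaling; dea_over_close shrinks by the factor 4.
```

Because the extra division makes the factor depend on the price level, high-priced and low-priced stocks are no longer comparable in the cross-section.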
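### `AP-QLIB-2184` — Custom A-share data imported without filling suspension days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>
By Qlib's convention, the open/close/high/low/volume/factor fields should all be NaN on suspension days so that the framework can recognize and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price-momentum factors show "false signals" during the suspension (the price is unchanged but the factor is nonzero). Qlib does not validate this convention, so the error flows silently into the training data.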
Source: https://github.com/microsoft/qlib/issues/2184
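
A pre-import pass can enforce the NaN convention. This is a minimal sketch over plain dictionaries: the `suspended` flag is a hypothetical column, since real exports mark suspensions in different ways, and the helper is not a Qlib function.

```python
import math

PRICE_FIELDS = ("open", "close", "high", "low", "volume", "factor")


def mask_suspended(rows: list) -> list:
    """Return copies of rows with the price fields set to NaN on suspended days.

    The 'suspended' flag is a hypothetical column used for illustration.
    """
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        if row.get("suspended"):
            for field in PRICE_FIELDS:
                new_row[field] = math.nan
        masked.append(new_row)
    return masked


rows = [
    {"date": "2024-01-02", "close": 10.0, "volume": 1200.0, "suspended": False},
    {"date": "2024-01-03", "close": 10.0, "volume": 0.0, "suspended": True},
]
cleaned = mask_suspended(rows)
```

Only the suspended row is changed; trading days pass through untouched.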
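### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API and fetches an incomplete A-share universe <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to obtain the Shanghai and Shenzhen stock list. The function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset that covers only some stocks, and backtests based on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).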
Source: https://github.com/microsoft/qlib/issues/1892
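### `AP-QLIB-2097` — Whole-market instrument="all" hits OOM on a 32GB machine, while CSI300 runs fine <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix of the specified universe into memory at once. Memory usage with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by about 16x. On a 32GB machine, a whole-market run goes OOM in the init_instance_by_config stage, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.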
Source: https://github.com/microsoft/qlib/issues/2097
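### `AP-QLIB-1984` — The LightGBM model's label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label input, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch and instead goes straight into LightGBM training, where it raises a semantically unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.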
Source: https://github.com/microsoft/qlib/issues/1984
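### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, DataHandler's processors trigger a Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths misleads users into believing the data was imported correctly.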
Source: https://github.com/microsoft/qlib/issues/1915
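### `AP-QLIB-1949` — Multiprocessing backend on Colab/Linux conflicts with Qlib's ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails to access the _backend_args attribute at initialization (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is that a D.features call raises a multi-level nested exception, and users cannot tell from the stack trace whether the problem lies in the parallel backend or in the data.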
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
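### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the timestamp of the first bar it pushes is "the minute of the current tick" rather than "the end time of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If a strategy filters directly with datetime.minute % 5 in on_bar, the first bar happens to pass the filter but contains less than one full period of data, and using it for signal computation produces a wrong entry signal.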
Source: https://github.com/vnpy/vnpy/issues/3691
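### `AP-VNPY-3669` — Schema incompatibility between new and old DataFrames during incremental saves of Alpha-module historical data causes SchemaError <sub>(medium)</sub>
When saving K-line data to a Parquet file, vnpy's Alpha module runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the existing file (with historical Int64 columns). polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (for example, volume is an integer in some market-data feeds and a float in others), and there is no schema-alignment step before concat. This affects every historical-data build flow that uses vnpy alpha for backtesting.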
Source: https://github.com/vnpy/vnpy/issues/3669
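### `AP-VNPY-3685` — The spread-trading module's run_backtesting() fails silently in Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, the SpreadTrading module's run_backtesting() hits an event-loop conflict in Jupyter (asyncio already running), so part of the backtesting engine's logic does not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, but Jupyter's event loop is incompatible with it, making this a hidden trap where "backtest results look correct but are actually incomplete".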
Source: https://github.com/vnpy/vnpy/issues/3685
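### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy version is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system/conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer versions of scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In quant research environments where multiple tools coexist, vnpy's install script is a common source of "global environment pollution".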
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
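### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users into misjudging strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted prices (raw price) rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by roughly 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy returns. The A-share case is more severe: price jumps around ex-rights dates form huge "signals" in unadjusted data, leading technical indicators to generate false breakout signals on ex-rights days.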
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
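### `AP-ZIPLINE-235` — Fills default to the current bar's close, underestimating live slippage and inflating backtest returns <sub>(high)</sub>
After a signal triggers on a bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading, the signal can only be filled near the next bar's open (T+1 order execution). Taking A-share daily bars as an example, backtesting with close-price fills overestimates daily returns by about 0.1-0.3% on average compared with next-day-open fills, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.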
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
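
The difference between the two fill conventions can be made concrete with a few lines of plain Python. This is an illustrative sketch, not Zipline's slippage API: the `fill_prices` helper and the bar values are hypothetical.

```python
from typing import Optional


def fill_prices(bars: list, signal_index: int) -> tuple:
    """Return (same_bar_close_fill, next_bar_open_fill) for a signal on one bar.

    The second value is None when the signal fires on the last bar, because
    there is no next bar to fill on.
    """
    same_bar = bars[signal_index]["close"]
    nxt = signal_index + 1
    next_bar: Optional[float] = bars[nxt]["open"] if nxt < len(bars) else None
    return same_bar, next_bar


# Hypothetical daily bars, for illustration only.
bars = [
    {"open": 10.00, "close": 10.20},
    {"open": 10.35, "close": 10.30},
]
optimistic, realistic = fill_prices(bars, signal_index=0)
```

In this made-up case a buy signal on the first bar fills at 10.20 under the same-bar convention but at 10.35 under next-bar execution, which is the convention SL-02 requires.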
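### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (such as New Year's Day, 1998-01-01), Zipline's Calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The error message only shows the start date of the trading calendar and does not suggest "change it to the first trading day". In the A-share case, when using the SSE/SZSE calendar, if start_date falls on the day after the last trading day before Spring Festival (a holiday), the same kind of error is triggered and debugging is very costly.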
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
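
A simple workaround is to snap the requested start date to the first session on or after it before passing it on. The sketch below assumes the caller already has a sorted list of session dates; the calendar excerpt and the helper name are illustrative, not taken from Zipline.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions, start):
    """Return the earliest session >= start; `sessions` must be sorted."""
    i = bisect_left(sessions, start)
    if i == len(sessions):
        raise ValueError(f"{start} is after the last known session {sessions[-1]}")
    return sessions[i]


# Illustrative calendar excerpt: 1998-01-01 is a holiday, 01-03/04 a weekend.
sessions = [date(1997, 12, 31), date(1998, 1, 2), date(1998, 1, 5)]
start = first_session_on_or_after(sessions, date(1998, 1, 1))
```

Here the holiday start date resolves to 1998-01-02, the next session in the excerpt.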
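### `AP-ZIPLINE-181` — A stale asset db makes Pipeline report "no assets traded", sending users to check the data range instead <sub>(high)</sub>
Zipline's asset database (SQLite) records the start and end trading dates of every stock. If an old Quandl or self-built bundle is used without re-ingesting, backtesting a newer date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if only price data is refreshed after a new download and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering silently excludes them, which introduces survivorship bias.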
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
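### `AP-ZIPLINE-285` — week_start()/week_end() fail silently on custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last session of a week. On non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation used for the NYSE calendar, so the schedule either never fires or fires on the wrong date. In the A-share case, with the SSE calendar, a week that falls inside a long holiday such as Spring Festival can cause week_start to skip the whole holiday week without rebalancing, and the logs give no indication that the schedule did not fire.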
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
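
To check which dates a weekly schedule should fire on, the first session of each week can be derived directly from the session list. This is a standalone sketch, not Zipline code; the session dates approximate a Spring Festival gap and are illustrative.

```python
from datetime import date, timedelta


def first_session_per_week(sessions):
    """Return the first session of each ISO week that has at least one session."""
    firsts = {}
    for day in sorted(sessions):
        key = day.isocalendar()[:2]  # (ISO year, ISO week)
        firsts.setdefault(key, day)
    return list(firsts.values())


# Illustrative sessions: Mon-Thu before a holiday, no sessions the following
# week, then a full Mon-Fri week.
sessions = [date(2024, 2, 5) + timedelta(days=i) for i in range(4)]
sessions += [date(2024, 2, 19) + timedelta(days=i) for i in range(5)]
week_starts = first_session_per_week(sessions)
```

Comparing this list with the dates on which a scheduled function actually ran makes a skipped holiday week visible, since the week without sessions simply has no entry.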
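### `AP-ZIPLINE-240` — Backtest dates must be in UTC; a naive datetime triggers a deep AssertionError <sub>(medium)</sub>
Zipline requires every timestamp to be a UTC-aware datetime. When a naive datetime (no timezone, such as pd.Timestamp('2020-01-01')) is passed in, no error is raised at the entry point; instead, AssertionError: Algorithm should have a utc datetime fires deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data in local CST time hit this trap easily and need to call tz_localize('UTC') explicitly when registering the bundle.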
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
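
The normalization step can be sketched with the standard library alone. This example assumes the naive timestamps represent China Standard Time (UTC+8) wall-clock values and converts them to UTC; whether to convert or merely label values as UTC depends on how the source data was stored, so treat `to_utc` as a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), "CST")  # China Standard Time, no DST


def to_utc(ts, assume_tz=CST):
    """Attach `assume_tz` to naive timestamps, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


close_local = datetime(2020, 1, 2, 15, 0)  # naive: 15:00 local market close
close_utc = to_utc(close_local)
```

Doing this once at the ingestion boundary keeps every downstream timestamp timezone-aware.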
## zvt (6)
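### `AP-ZVT-183` — inf/NaN adjustment factors enter multiplication directly, so price adjustment fails silently <sub>(high)</sub>
ZVT computes the forward-adjustment factor qfq_factor as the ratio of new to old price. When old==0 (first day of a new listing, or missing data) the factor is inf; when kdata.open is None (a suspension day that was not filled) the multiplication raises TypeError. As a result, adjustment for the whole entity is interrupted and all subsequent bars are lost, but the main flow only logs ERROR and keeps going, so users often do not know that data for many stocks is already corrupted.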
Source: https://github.com/zvtvz/zvt/issues/183
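
A guard of the following shape keeps undefined factors out of the multiplication. It is a sketch in plain Python with hypothetical helper names, not ZVT's implementation; note that plain Python raises ZeroDivisionError on division by zero, whereas numpy/pandas would yield inf, so the zero check is explicit.

```python
import math
from typing import Optional


def safe_adjust_factor(new_price: Optional[float],
                       old_price: Optional[float]) -> Optional[float]:
    """Return new/old, or None when the ratio is undefined or non-finite."""
    if new_price is None or old_price is None:
        return None
    if old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def apply_factor(price: Optional[float],
                 factor: Optional[float]) -> Optional[float]:
    """Multiply only when both operands are usable; otherwise propagate None."""
    if price is None or factor is None:
        return None
    return price * factor
```

Returning None for a single bad bar lets the caller skip or flag that bar instead of aborting adjustment for the whole entity.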
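### `AP-ZVT-179` — Exceptions are swallowed after a third-party data API quota is exceeded, and data goes missing silently <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the limit is hit that day's data for every remaining stock is silently missing. A backtest on the incomplete database produces systematically biased factor values with no warning.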
Source: https://github.com/zvtvz/zvt/issues/179
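
One way to avoid the silent gap is to stop at the first quota error and return the list of entities that were not fetched. In this sketch `QuotaExceeded`, `fetch_all`, and the `fetch_one` callback are hypothetical names; a real integration would map the provider's quota error onto such an exception.

```python
class QuotaExceeded(Exception):
    """Hypothetical marker for a provider-side daily quota error."""


def fetch_all(entity_ids, fetch_one):
    """Fetch every entity; stop at the first quota error and report the gap.

    Returns (results, missing). `missing` lists every entity that was not
    fetched, so the caller can alert or resume instead of continuing silently.
    """
    results, missing = {}, []
    for i, entity_id in enumerate(entity_ids):
        try:
            results[entity_id] = fetch_one(entity_id)
        except QuotaExceeded:
            # Later calls would fail the same way, so record them all as missing.
            missing = list(entity_ids[i:])
            break
    return results, missing
```

A non-empty `missing` list is the signal to block any backtest that would read that day's data.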
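### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers a "too many SQL variables" error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once exceeds the SQLite limit SQLITE_MAX_VARIABLE_NUMBER, whose default is 999 in SQLite builds before 3.32.0. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on bound variables. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.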
Source: https://github.com/zvtvz/zvt/issues/161
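
Batching the IN clause can be sketched with the standard `sqlite3` module. The table name `kdata`, its columns, and the chunk size of 500 are assumptions made for this illustration and do not reflect ZVT's actual schema.

```python
import sqlite3


def query_in_chunks(conn, entity_ids, chunk_size=500):
    """Run one IN (...) query per chunk so no statement exceeds the bind limit."""
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sz_{i:06d}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 1.0) for e in ids])
rows = query_in_chunks(conn, ids)
```

Keeping the chunk size below 999 works on both older and newer SQLite builds.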
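### `AP-ZVT-129` — Wildcard imports hide API changes between versions, and enums such as AdjustType vanish <sub>(medium)</sub>
ZVT's documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors enums (for example, moving AdjustType into a submodule), the wildcard import no longer includes the symbol and an AttributeError is raised. Users assume the installation is broken, but the real cause is a breaking API change between versions that the CHANGELOG does not mention, and the wildcard import hides where the symbol came from. Import the enum classes explicitly.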
Source: https://github.com/zvtvz/zvt/issues/129
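### `AP-ZVT-187` — Backtest engine does not stop early on an empty data-layer result, causing a cascade of None-type crashes <sub>(medium)</sub>
When ZVT Trader finds that the data is empty after load_data completes, it does not exit early. It passes the empty DataFrame to the selector computation, which triggers a chain of NoneType failures downstream. The stack trace is deep and the root cause is hard to locate, so users assume the strategy logic is at fault. The actual root cause is a misconfigured time window (start/end outside the range covered by the database) with no effective validation.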
Source: https://github.com/zvtvz/zvt/issues/187
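### `AP-ZVT-183B` — Misusing the HFQ (backward-adjusted) and QFQ (forward-adjusted) bar tables causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users who mix two tables when computing price momentum or moving-average factors (for example, moving averages on unadjusted prices and returns on backward-adjusted prices) get jumps in factor values around ex-rights dates. ZVT performs no cross-table consistency check on adjustment type, so the mix passes silently.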
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-120--alphalens-reloaded
**Scan date**: 2026-04-22
**Stats**: {'total_files': 4, 'total_classes': 32, 'total_functions': 0, 'total_stages': 4}
## Modules (4)
- [data_preparation_&_alignment](components/data_preparation_-_alignment.md): 8 classes
- [performance_&_risk_metrics](components/performance_-_risk_metrics.md): 9 classes
- [plotting_&_visualization](components/plotting_-_visualization.md): 8 classes
- [tear_sheet_reporting](components/tear_sheet_reporting.md): 7 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 136
fatal_constraints_count: 31
non_fatal_constraints_count: 129
use_cases_count: 6
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
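- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms are (1) computing a signal from the close and filling at that same day's close; (2) stamping an indicator computed after the close of day T on that same bar; (3) assuming fills at the day's high or low. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions must be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order, TRAIN < VALID < TEST. Do not use random k-fold splits, which mix future data into the training set. Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (for example, a momentum strategy that takes profit at the high) is plain lookahead, because the intraday high and low are only confirmed after the close. Fill prices may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas operations such as rolling and shift easily introduce subtle one-period offsets. Document the observation time of every series in code and verify key alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests even when the true Sharpe ratio is zero, so the best observed result is inflated. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.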
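- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical returns. The backtest universe must use point-in-time snapshots.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample; out-of-sample data is for final validation only. Do not look at out-of-sample data repeatedly and keep tuning, which turns it into new in-sample data and repeats the overfitting.
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Filling suspended or missing data: do not simply forward-fill prices on suspension days with the previous close, because that lets zero-volume days take part in factor computation and signal generation during the backtest. Filter out missing trading days explicitly at the factor-computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data can contain source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it into factor computation without cleaning produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.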
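- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not acceptable. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies, where round-trip costs often consume 50%+ of gross returns.
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions; every 10 bps change in cost can flip the profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual (rounded) number of shares held under a fixed amount of capital rather than assume fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight; simulate this rounding effect in the backtest.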
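- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Consistent timestamp time zones: mixing UTC and local time when merging data sources is a common source of data corruption. Convert all timestamps to one time zone at the pipeline entry (UTC for storage and the market's local time zone for display are recommended), and do not mix time zones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices plus weekly factors), reindex/merge against an explicit trading calendar. Do not outer-join and then fillna, which creates fake rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary checks for incremental updates: when updating historical data incrementally, query the database for the latest stored date and download only data after it. Re-downloading existing data and appending it creates rows with duplicate timestamps and breaks the time ordering of the backtest. Verify that there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta. Use a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). Price indices exclude dividend reinvestment and understate the benchmark's holding return.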
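- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. Summing log returns and reading the result as a percentage misstates the drawdown depth, because a log return differs from the corresponding simple return (a log drawdown of -0.693 is a simple drawdown of -50%).
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) for equities (252 trading days) or × sqrt(365) for crypto (365 days). Defaults differ between systems, so confirm the annualization factor before comparing across systems; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, whereas A-share and crypto returns are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. Report Sortino (downside volatility only) and Calmar (annualized return / max drawdown) alongside Sharpe, and do not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should be decomposed into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot distinguish a good strategy from one that simply had the right beta in a favorable market.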
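- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is treated as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's best holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-horizon factor and does not suit monthly rebalancing, and vice versa. Using the wrong holding period seriously hurts live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has reported 300+ "significant" factors, many of which are false discoveries under multiple testing. A factor should satisfy at least one of: t-stat > 3.0 (rather than the conventional 1.96); independent replication in other periods or markets; or a clear economic rationale. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover can have negative net IC after transaction costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover is ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the best rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-horizon factor monthly lets most of the alpha dissipate within the holding period.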
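- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns absorb industry-rotation returns, and it becomes hard to tell whether the factor itself or industry exposure drove the return. The operation is factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and returns come from size exposure rather than the factor. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes that distort quantile analysis (such as Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or clip with MAD (median absolute deviation) before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and reduces signal diversity. Orthogonalize the factors before combining (Gram-Schmidt or PCA) to remove multicollinearity.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data filling: filling NaN in factor computation (suspensions, new listings, data gaps) with a mean that draws on the full sample introduces lookahead bias (the mean itself contains future information), while dropping the rows entirely introduces survivorship bias. The correct approach is to fill with the cross-sectional median (the median across all stocks that day, which does not depend on the future) or to exclude the stock for that day.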
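- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: evaluate a factor primarily by the long-short spread (top minus bottom) between Q1/Q5 (quintiles) or Q1/Q10 (deciles), not by long-only returns. The monotonicity test (long Q1, short Q5) is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, and range-bound markets) is important evidence of robustness. A factor whose IC holds in only one market regime is not suitable for all-weather deployment; show the IC time series in rolling 12M segments to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank changes trigger rebalancing and generate a lot of pointless transaction cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of quantile returns (bootstrap test): even a large historical quantile spread (Q1-Q5) may be due to chance, so test it with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtests (< 3 years) are especially unreliable.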
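- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying US-equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually fail as the market learns and arbitrages them away (McLean & Pontiff 2016 find an average post-publication decay of 58%). Re-evaluate factor IC periodically (quarterly or yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction with the macro environment: interest-rate cycles, economic cycles, and market sentiment significantly affect factor validity. Value factors (low P/B) are more effective when rates rise; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the conditions in which the factor performs best.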
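
As an illustration of the IC definition in `SHARED-FR-IC-001`, the sketch below computes a Spearman rank IC for one cross-section and an ICIR over a series of ICs, using only the standard library. It assumes no tied values, takes ICIR as mean IC over the sample standard deviation of IC, and uses made-up inputs; all function names are hypothetical.

```python
from statistics import mean, stdev


def ranks(values):
    """Rank values from 1..n; this sketch assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor, next_returns):
    """Spearman rank correlation between factor values and next-period returns."""
    return pearson(ranks(factor), ranks(next_returns))


def icir(ic_series):
    """Mean IC divided by the sample standard deviation of IC."""
    return mean(ic_series) / stdev(ic_series)


# One illustrative cross-section: factor order matches next-period return order.
ic = rank_ic([0.1, 0.4, 0.2, 0.3], [-0.01, 0.03, 0.00, 0.02])
```

In practice `rank_ic` would be evaluated once per date and the resulting series passed to `icir`.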
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
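
Locks such as `SL-03` and `SL-05` lend themselves to small validation helpers. The sketch below is one possible shape in plain Python; the regular expression is an assumed reading of `entity_type_exchange_code` (lowercase type and exchange, alphanumeric code), and neither function belongs to ZVT.

```python
import re

# Assumed pattern for SL-03: <entity_type>_<exchange>_<code>
ENTITY_ID = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9.]+")


def is_valid_entity_id(entity_id: str) -> bool:
    """Check the entity_type_exchange_code shape, e.g. 'stock_sh_600000'."""
    return ENTITY_ID.fullmatch(entity_id) is not None


def check_signal_sizing(position_pct=None, order_money=None, order_amount=None):
    """Raise ValueError unless exactly one sizing field is set (SL-05)."""
    given = [v for v in (position_pct, order_money, order_amount) if v is not None]
    if len(given) != 1:
        raise ValueError(f"expected exactly one sizing field, got {len(given)}")
```

Running such checks before a signal reaches the trader turns a fatal lock violation into an immediate, readable error.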
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **6**
## `KUC-101`
**Source**: `docs/deploy.py`
Automated build and deployment of project documentation to ensure consistent and reproducible documentation releases.
## `KUC-102`
**Source**: `docs/source/conf.py`
Configures the Sphinx documentation system with extensions for Python API documentation, Jupyter notebooks, and mathematical expressions.
## `KUC-103`
**Source**: `docs/source/notebooks/event_study.ipynb, src/alphalens/examples/event_study.ipynb`
Identifies and analyzes specific market events (e.g., price crossing thresholds) to study their predictive power and forward return characteristics.
## `KUC-104`
**Source**: `docs/source/notebooks/intraday_factor.ipynb, src/alphalens/examples/intraday_factor.ipynb`
Analyzes factors across multiple market sectors (11 GICS sectors) to evaluate cross-sector factor performance and sector-specific factor behavior.
## `KUC-105`
**Source**: `docs/source/notebooks/overview.ipynb, src/alphalens/examples/overview.ipynb`
Provides a comprehensive introduction to Alphalens capabilities for factor analysis, including data preparation, factor computation, and performance visualization.
## `KUC-106`
**Source**: `docs/source/notebooks/pyfolio_integration.ipynb, src/alphalens/examples/pyfolio_integration.ipynb`
Combines Alphalens factor analysis with PyFolio portfolio analytics to evaluate factor-derived portfolio performance, risk metrics, and tearsheet generation.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
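## `CW-BT-001` — Cerebro as a unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies and data sources in one cerebro.run(). zvt's StockTrader currently binds one set of factors per instance and lacks a unified layer for orchestrating multiple strategies; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.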
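## `CW-BT-002` — Pluggable Analyzer-based performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with a call like cerebro.addanalyzer(SharpeRatio).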
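## `CW-BT-003` — Sizer: position sizing as a separate concern
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what fraction to buy on each entry) into a separate Sizer that is fully decoupled from signal logic; it ships FixedSize, PercentSizer, and others, and users can write their own. zvt currently has no explicit Sizer concept, and position control is scattered across hooks such as Trader.on_profit_control; a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.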
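## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration, fuller order types would make backtests much more realistic.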
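## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording bars at each level and lacks runtime resampling; borrowing this pattern would support backtests on any combination of time granularities without recording the data again.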
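## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a GUI that directly shows the strategy's equity curve, max drawdown, daily P&L, and trade details, with no need for Jupyter Notebook. zvt currently visualizes backtest results by calling Plotly through the draw_result method and has no unified backtest report page; borrowing this pattern would allow packaging a ready-to-use strategy performance dashboard.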
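## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow covering data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's only ML entry point today is MaStockMLMachine, and it lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.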
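## `CW-QL-001` — Point-in-time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), which eliminates this source of look-ahead bias from backtests. zvt currently timestamps financial data by reporting period and has no publication-date dimension, so stock selection can potentially use financial reports from the future; introducing a PIT model would make backtests considerably more trustworthy.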
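
The point-in-time idea can be sketched without any framework: store the publication date next to the reporting period and filter on it at query time. The records, field layout, and `latest_known` helper below are illustrative assumptions and do not reflect qlib's or zvt's data model.

```python
from datetime import date

# (report_period_end, publish_date, eps): illustrative records only
reports = [
    (date(2023, 9, 30), date(2023, 10, 28), 1.10),
    (date(2023, 12, 31), date(2024, 3, 30), 1.45),
]


def latest_known(reports, as_of):
    """Return the most recent report already published on `as_of`, else None."""
    visible = [r for r in reports if r[1] <= as_of]
    return max(visible, key=lambda r: r[0]) if visible else None
```

A query dated 2024-01-15 sees only the Q3 record here, even though the Q4 reporting period has already ended, because the Q4 report was not yet published.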
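## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically record the hyperparameters, features, metrics, and predictions of every training run and support cross-experiment comparison and model versioning. zvt currently has no ML experiment tracking, and each rerun overwrites the previous result; borrowing the Recorder pattern would persist the parameters and results of every factor experiment for quick reproduction and version comparison.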
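## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing). The two layers are optimized independently and can run together, enabling intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently use only daily-bar fill assumptions and do not model execution algorithms; borrowing the nested framework would let zvt separate two questions: which stocks to hold and when, and how to build the position with minimal market impact.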
FILE:references/components/data_preparation_-_alignment.md
# data_preparation_&_alignment (8 classes)
## `utils.get_clean_factor_and_forward_returns`
`data_preparation_&_alignment/utils-get-clean-factor-and-forward-retur.py:0`
## `utils.compute_forward_returns`
`data_preparation_&_alignment/utils-compute-forward-returns.py:0`
## `utils.quantize_factor`
`data_preparation_&_alignment/utils-quantize-factor.py:0`
## `utils.demean_forward_returns`
`data_preparation_&_alignment/utils-demean-forward-returns.py:0`
## `utils.infer_trading_calendar`
`data_preparation_&_alignment/utils-infer-trading-calendar.py:0`
## `forward_returns_computation`
`data_preparation_&_alignment/forward-returns-computation.py:0`
## `binning_strategy`
`data_preparation_&_alignment/binning-strategy.py:0`
## `zscore_filter`
`data_preparation_&_alignment/zscore-filter.py:0`
FILE:references/components/performance_-_risk_metrics.md
# performance_&_risk_metrics (9 classes)
## `performance.factor_information_coefficient`
`performance_&_risk_metrics/performance-factor-information-coefficie.py:0`
## `performance.factor_weights`
`performance_&_risk_metrics/performance-factor-weights.py:0`
## `performance.factor_returns`
`performance_&_risk_metrics/performance-factor-returns.py:0`
## `performance.factor_alpha_beta`
`performance_&_risk_metrics/performance-factor-alpha-beta.py:0`
## `performance.mean_return_by_quantile`
`performance_&_risk_metrics/performance-mean-return-by-quantile.py:0`
## `performance.factor_rank_autocorrelation`
`performance_&_risk_metrics/performance-factor-rank-autocorrelation.py:0`
## `IC_computation`
`performance_&_risk_metrics/ic-computation.py:0`
## `weighting_scheme`
`performance_&_risk_metrics/weighting-scheme.py:0`
## `portfolio_type`
`performance_&_risk_metrics/portfolio-type.py:0`
FILE:references/components/plotting_-_visualization.md
# plotting_&_visualization (8 classes)
## `plotting.plot_ic_ts`
`plotting_&_visualization/plotting-plot-ic-ts.py:0`
## `plotting.plot_ic_hqq`
`plotting_&_visualization/plotting-plot-ic-hqq.py:0`
## `plotting.plot_quantile_returns_bar`
`plotting_&_visualization/plotting-plot-quantile-returns-bar.py:0`
## `plotting.plot_cumulative_returns`
`plotting_&_visualization/plotting-plot-cumulative-returns.py:0`
## `plotting.plot_turnover_table`
`plotting_&_visualization/plotting-plot-turnover-table.py:0`
## `plotting.plot_event_returns`
`plotting_&_visualization/plotting-plot-event-returns.py:0`
## `plotting_style`
`plotting_&_visualization/plotting-style.py:0`
## `context`
`plotting_&_visualization/context.py:0`
FILE:references/components/tear_sheet_reporting.md
# tear_sheet_reporting (7 classes)
## `tears.create_full_tear_sheet`
`tear_sheet_reporting/tears-create-full-tear-sheet.py:0`
## `tears.create_summary_tear_sheet`
`tear_sheet_reporting/tears-create-summary-tear-sheet.py:0`
## `tears.create_returns_tear_sheet`
`tear_sheet_reporting/tears-create-returns-tear-sheet.py:0`
## `tears.create_information_tear_sheet`
`tear_sheet_reporting/tears-create-information-tear-sheet.py:0`
## `tears.create_turnover_tear_sheet`
`tear_sheet_reporting/tears-create-turnover-tear-sheet.py:0`
## `tears.create_event_returns_tear_sheet`
`tear_sheet_reporting/tears-create-event-returns-tear-sheet.py:0`
## `tear_sheet_type`
`tear_sheet_reporting/tear-sheet-type.py:0`
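Fetches financial data for the Chinese A-share market, including real-time quotes, historical bars, financial statements, funds, and futures, with queries across instruments such as stocks, bonds, and options.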
---
name: akshare-financial-data
description: |-
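Fetches financial data for the Chinese A-share market, including real-time quotes, historical bars, financial statements, funds, and futures, with queries across instruments such as stocks, bonds, and options.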
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-079"
compiled_at: "2026-04-22T13:00:30.352072+00:00"
capability_markets: "cn-astock"
capability_activities: "data-sourcing"
sop_version: "crystal-compilation-v6.1"
---
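# AkShare Financial Data (akshare-financial-data)
> Fetches financial data for the Chinese A-share market, including real-time quotes, historical bars, financial statements, funds, and futures, with queries across instruments such as stocks, bonds, and options.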
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (1 total)
### Sphinx Documentation Configuration for Akshare (`UC-101`)
Sets up the Sphinx documentation builder with Chinese language support (via ctex), Markdown parsing via recommonmark, and automatic version string ext
**Triggers**: documentation, sphinx, docs build
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-079. Evidence verify ratio = 30.6% and audit fail total = 41. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
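| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |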
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-079` blueprint at 2026-04-22T13:00:30.352072+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Sphinx Documentation Configuration for Akshare', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-070--edgartools (2)
### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>
Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.
### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>
SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.
## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>
Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.
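
A minimal client-side throttle helps stay under limits like these. The sketch below spaces calls at a fixed minimum interval (0.5 s corresponds to 120 calls per minute); `MinIntervalThrottle` is a hypothetical class, and the injectable `clock` and `sleep` arguments exist only to make it testable.

```python
import time


class MinIntervalThrottle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each request keeps the request rate bounded; handling an active block (for example, backing off for several minutes rather than retrying immediately) would need additional logic not shown here.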
## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>
SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.
## finance-bp-079--akshare (4)
### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>
HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.
### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>
Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
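
Both failure modes can be turned into one explicit error at the parsing boundary. This is a sketch using the standard `json` module; `parse_payload` is a hypothetical helper, and treating `null`, `{}`, and `[]` as errors is an assumption that fits endpoints expected to always return data.

```python
import json


def parse_payload(text: str):
    """Parse a JSON body and reject malformed or empty payloads explicitly."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON response: {exc}") from exc
    if payload in (None, {}, []):
        raise ValueError("empty JSON response")
    return payload
```

Callers can then catch a single `ValueError` and decide whether to retry, skip, or alert, instead of building an empty DataFrame from a payload that only looks valid.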
### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>
Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.
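
A small, strict mapping function makes symbol errors fail fast. The rule below is a deliberately simplified assumption (codes starting with 6 map to `sh`, codes starting with 0 or 3 map to `sz`); other boards and instrument types are rejected rather than guessed, and `to_prefixed_symbol` is a hypothetical helper, not an AkShare function.

```python
def to_prefixed_symbol(code: str) -> str:
    """Map a 6-digit A-share code to a prefixed symbol such as 'sh600000'."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    if code[0] == "6":
        return "sh" + code
    if code[0] in "03":
        return "sz" + code
    # Anything else is outside this simplified rule set.
    raise ValueError(f"no exchange rule for code {code!r}")
```

Raising on unknown prefixes avoids storing rows under a wrong ticker.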
### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>
Column mapping constants whose length does not match the columns actually returned by the API cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价 "latest price", 涨跌幅 "percent change") with an exactly matching column count.
## finance-bp-103--ArcticDB (3)
### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>
ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.
### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>
Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.
### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>
Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.
## finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>
8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
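
Selecting the numbering scheme from the filing date can be sketched as follows. The cutoff date comes from the text above; treating a filing dated exactly 2004-08-23 as using the new scheme is an assumption of this sketch, and `item_scheme` is a hypothetical helper.

```python
from datetime import date

NEW_SCHEME_START = date(2004, 8, 23)  # cutoff date stated above


def item_scheme(filing_date: date) -> str:
    """Pick the 8-K item numbering scheme to match against for a filing."""
    if filing_date < NEW_SCHEME_START:
        return "legacy (1-12)"
    return "current (1.01-9.01)"
```

An extractor would call this once per filing and load the matching set of item patterns before parsing.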
## finance-bp-128--yfinance (2)
### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>
Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.
### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>
Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-079--akshare
**Scan date**: 2026-04-22
**Stats**: {'total_files': 10, 'total_classes': 18, 'total_functions': 0, 'total_stages': 10}
## Modules (10)
- [http_request_layer](components/http_request_layer.md): 3 classes
- [source-specific_data_acquisition](components/source-specific_data_acquisition.md): 3 classes
- [html_table_extraction](components/html_table_extraction.md): 2 classes
- [json_response_parsing](components/json_response_parsing.md): 1 class
- [column_name_standardization](components/column_name_standardization.md): 2 classes
- [data_type_conversion](components/data_type_conversion.md): 1 class
- [paginated_data_fetching](components/paginated_data_fetching.md): 1 class
- [trading_calendar_validation](components/trading_calendar_validation.md): 2 classes
- [price_adjustment_processing](components/price_adjustment_processing.md): 1 class
- [realized_volatility_calculation](components/realized_volatility_calculation.md): 2 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 158
fatal_constraints_count: 30
non_fatal_constraints_count: 198
use_cases_count: 1
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (47)
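- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share equities settle T+1: shares bought on day T can be sold no earlier than T+1, while cash from a sale on day T can be reused for purchases the same day. A backtest that does not lock positions for T+1 overstates turnover and win rate, and is especially misleading for intraday-reversal strategies.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: Main-board stocks in Shanghai and Shenzhen have a daily price limit of ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up, sell-side liquidity disappears and buy orders cannot fill; when it is locked at limit-down, buy-side liquidity disappears and sell orders cannot fill. A backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: STAR Market and ChiNext (after the August 2020 reform) have a ±20% daily limit on normal trading days; the Beijing Stock Exchange has ±30%; newly listed stocks have no price limit for their first 5 trading days. A backtest that applies a single ±10% filter to all stocks wrongly excludes or wrongly includes fills on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily limit and very poor liquidity, so fill assumptions for normal stocks must not be reused. Leaving historical ST stocks (those eventually delisted) out of the backtest introduces survivorship bias; including them without applying the ST price limit mis-simulates fills.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the price is set by the maximum-volume principle rather than continuous matching. A backtest that assumes an immediate full fill at the open or close price understates real slippage risk, most visibly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Trading suspensions: during a long A-share suspension (which could last months before 2018), the capital in the position is locked and cannot be rebalanced; backtests commonly ignore this opportunity cost. Filter out suspended days (volume == 0 or is_suspended == True) before computing factors, and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: Newly listed stocks have no price limit for their first 5 trading days (first-day gains can exceed 300%) and lack a full history, so moving-average, volatility, and turnover factors cannot be computed. Filter out stocks listed for fewer than N trading days (typically 60-252) before computing factors.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share programmatic trading rules (effective 2025-07-07): an account that submits or cancels ≥ 300 orders per second, or ≥ 20,000 orders per day, is classified as high-frequency trading and must report to the exchange. An AI-generated quant strategy that exceeds these rates cannot run compliantly, so flag this at strategy design time.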
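- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a book adjustment, not a real loss. Adjustment choices: unadjusted prices create spurious strategy losses; forward-adjusted (qfq) prices embed future dividend information in historical prices (look-ahead bias); backward-adjusted (hfq) prices accumulate from the listing date and are the best choice for quantitative backtests.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A-share listed companies disclose financial reports with statutory delays: annual reports by April 30 of the following year, semi-annual reports by August 31, and quarterly reports by April 30 (Q1) and October 31 (Q3). When a backtest uses financial data, the availability time must be the actual disclosure date (announcement_date), not the end of the accounting period; otherwise it introduces point-in-time look-ahead bias.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Dividends, bonus shares, capitalization issues, and rights issues increase share capital after the ex-date: the historical share count stays the same while the price shrinks proportionally. If the backtest system does not adjust the position's share count in step, it records a spurious loss or gain on the ex-date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Block trades versus auction trades: a block trade can print at up to a 10% discount to the market price (main board), but that price does not affect the next day's opening auction. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal close price and volume calculations.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly, the pool of lendable stocks is limited (mostly large-cap blue chips; lendable small and mid caps are extremely scarce), and the securities-lending rate is far higher than the margin-financing rate. A backtest that assumes any stock can be shorted produces a strategy that cannot be executed, and live results will diverge sharply from the backtest.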
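- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30%, with a warning line at 28%. When foreign ownership reaches 28%, the Stock Exchange of Hong Kong suspends new buy orders for the stock until the ratio falls back to 26%. Strategies heavily weighted in stocks favored by foreign investors (consumer and pharmaceutical leaders) need to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% disclosure rule: once a single investor holds more than 5% of a listed company's issued shares, it must report to the CSRC and the exchange and make a public announcement within 3 days, and may not trade the stock during that period or for 2 days after the announcement. A quantitative stock-selection system that ignores this rule will be forced to stop buying once a heavy position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: Mutual fund "double ten" rule: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same manager may together hold no more than 10% of a company's issued shares. A quantitative stock-selection portfolio deployed in a mutual fund must add these compliance caps as hard optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider trading boundary: every input to an AI-assisted quant system must come from publicly disclosed information. Automated trading triggered through non-public channels (private data services, inside information, advance knowledge of a restructuring) constitutes insider trading under Articles 80-83 of the Securities Law and the Guidelines for the Determination of Insider Trading.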
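- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using current A-share index constituents (for example, today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated in 2020-2024 (a record 41 per year), so this bias keeps growing. Use historical point-in-time snapshots.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index rebalancing effect: CSI 300, CSI 500, and similar indices rebalance every six months (June and December). Added stocks usually rise noticeably between the announcement date and the effective date (forced buying by passive funds), and removed stocks do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing windows.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, their holdings overlap heavily, and a market shock triggers collective selling and a stampede. The February 2024 A-share quant crisis is the typical case (the small-cap index fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share quant hedging strategies commonly go long or short IF/IC/IM stock index futures to hedge systematic risk. But A-share index futures have long traded at a discount (futures price < spot), and the annualized IC discount can reach 10-20%. A backtest that considers only price returns and ignores the futures discount or premium badly overstates a hedged strategy's net return.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The A-share monthly momentum factor points the opposite way from US equities: the best performers over the past month will most likely reverse in the next month (a reversal effect, not momentum). Broker research (Huatai, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.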
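- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors, who tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange confirms the effect in more than 90% of individual accounts. An AI assistant should not cater to the "hold the loser until it breaks even" instinct; it should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are retail-dominated (individual accounts make up more than 80% of trading volume) and herding is strong: retail investors tend to follow the crowd, which causes irrational price swings (for example, the leveraged bull and bear market of 2015). Quant strategies should avoid using volume rankings, popularity rankings, or other indicators that may amplify herding signals as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: The overconfidence effect (Barber & Odean 2000) is more severe among A-share retail investors: retail annual turnover exceeds 500%, and institutions significantly outperform retail over the long run. High-turnover strategies often have lower net returns after transaction costs. The AI should not encourage frequent trading; it should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and the 1-3 days after the holiday) and the turn-of-month effect (trading days 1-5 of the month outperform mid-month and month-end) have academic support (Nanjing University of Finance and Economics, among others). Strategies should lower signal confidence in these calendar windows or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small-cap and micro-cap stocks average only a few million yuan in daily turnover, so large buy or sell orders cause severe price impact and real strategy capacity may be only tens of millions of yuan. Backtest results must not be extrapolated to hundreds of millions in capital; add a cap on participation as a share of volume to the backtest.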
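- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share transaction cost structure (after the August 2023 adjustment): stamp duty 0.05%, charged on sells only; commission about 0.01% each way (minimum 5 yuan); transfer fee (Shanghai) 0.001%; slippage and impact cost for small caps 0.1%-0.5% per trade. Annualized returns from a backtest that ignores costs are misleading, especially for high-frequency and high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact is usually missing from backtests entirely, yet it can be the largest cost in live trading. Buying 1 million yuan of an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size rather than a linear relationship; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders, directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): a shareholder holding 5% or more who sells through centralized auction must disclose the reduction plan 15 trading days in advance and may sell no more than 1% of total shares within 3 months. Selling pressure from unlocked shares is a systematic risk factor specific to A-shares; a backtest that ignores the unlock calendar understates the holding risk of the affected stocks.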
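- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: A-share trading days do not match the ordinary calendar: statutory holiday swaps create make-up workdays (Saturdays worked) on which the exchanges stay closed, and there have been unscheduled market-wide halts (for example, the circuit-breaker early closes on 2016-01-04 and 2016-01-07). Deriving A-share trading days from a generic weekday calendar produces errors; use an A-share trading calendar (such as exchange_calendars or the tushare trading-calendar endpoint).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After a delisting, an A-share stock code may be reused by a new listing (rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or the listing date range, can confuse historical data with the current stock. Take particular care in long-horizon backtests.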
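- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting plus exponential backoff: every external data API call must apply rate-limit control and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern that increases server load and triggers IP bans. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must limit concurrency (max_workers) and must not run unbounded in parallel. Free APIs (akshare, the tushare free tier) usually allow 1-3 concurrent requests; paid APIs also have concurrency caps (tushare uses a points system, with different point levels mapping to different concurrency). Exceeding the limit triggers 429 responses or IP bans. Control it explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token and credential safety: data-source API keys (the tushare token; akshare needs no token, but other commercial sources do) must not be hard-coded and must be read from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should insert a minimum interval between calls (some akshare endpoints require ≥ 0.5 s; the tushare free tier allows 200 calls per minute). A bare sleep is less precise than a token-bucket algorithm; a mature library such as ratelimit or slowapi is recommended.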
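- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspended days: a suspended stock has no trades during the suspension, so the database shows date gaps. Do not forward-fill the missing dates (this creates fake volume). Mark them with is_suspended=True, set volume and turnover to 0, and keep the previous close as the price. Factor computation must filter out rows with is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: History boundary for newly listed stocks: a new stock appears in the database from its listing day and has no data before that. If a factor's lookback period exceeds the number of days since listing, every factor value is NaN. Record each stock's listing date (list_date) at collection time and start collection from the listing date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream sources (akshare, tushare) still return history for delisted stocks up to the delisting date, but nothing after it. The historical universe must include delisted stocks (otherwise survivorship bias), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same value (such as the close price) can differ slightly between sources (akshare, tushare, baostock) because of different adjustment methods, holiday handling, or ex-dividend adjustment timing. Add a multi-source reconciliation check to the pipeline; when a difference exceeds the threshold (for example, 0.1%), log an alert and confirm manually.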
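- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp, not varchar or int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; force the conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time versus calendar time: the "date" of daily bar data usually refers to a trading day (day T), while the "time" of news and announcement data is calendar time. When merging the two, map calendar time to the next available trading day; otherwise an announcement made on day T is treated as already usable during the day-T session, which is look-ahead.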
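- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: a data update script must be idempotent (repeated runs give the same result). If the script fails midway because of a network interruption, rerunning it must not create duplicate rows or data gaps. Implementation: write to a staging table, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums and row counts): after every update, validate the key fields: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A data pipeline without automated validation is where silent data rot begins.
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: the output of a data pipeline should be versioned. When a source revises historical data (such as restated financials), keep the old version traceable rather than silently overwriting it, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to the trading calendar: after collection, verify that coverage for every stock or asset is complete and consistent with the trading calendar. Every stock should have one row for every trading day (a suspension flag, not a missing row). Checking the NaN ratio through pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: static or slowly changing data that is read often (stock information, industry classification, index constituents) should be cached locally to avoid repeating API calls on every run. The cache must have an expiry time (TTL) so that stale industry classifications or outdated constituent lists are not used.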
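
As a worked example of `SHARED-CN-ASTOCK-COST-001`, the sketch below adds up the explicit costs of one trade using the rates quoted in that constraint. `a_share_trade_cost` is a hypothetical helper name; slippage and market impact are deliberately left out because they depend on order size and liquidity.

```python
def a_share_trade_cost(amount: float, side: str, is_shanghai: bool = True) -> float:
    """Estimate explicit costs in yuan for one trade worth `amount` yuan.

    Rates follow SHARED-CN-ASTOCK-COST-001; real broker commissions vary.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    commission = max(amount * 0.0001, 5.0)                     # ~0.01% each way, 5 yuan minimum
    stamp_duty = amount * 0.0005 if side == "sell" else 0.0    # 0.05%, sells only
    transfer_fee = amount * 0.00001 if is_shanghai else 0.0    # 0.001%, Shanghai only
    return commission + stamp_duty + transfer_fee
```

For a 100,000 yuan round trip on a Shanghai stock this gives roughly 11 yuan on the buy and 61 yuan on the sell, before slippage. On small orders the 5 yuan commission floor dominates.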
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
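
A minimal sketch of a format check for `SL-03`, assuming the three parts are joined by underscores as in `stock_sh_600000`. `EntityId` and `parse_entity_id` are hypothetical names for illustration, not ZVT functions.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its parts, rejecting anything else."""
    parts = entity_id.split("_", 2)   # the code itself may contain underscores
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return EntityId(*parts)
```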
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
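
The "exactly one of" rule in `SL-05` can be enforced at construction time. The class below is a hypothetical stand-in used only to show the check; it is not ZVT's `TradingSignal`.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SignalSizing:
    """Hypothetical stand-in holding the three mutually exclusive sizing fields."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        provided = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(f"exactly one of {SIZING_FIELDS} is required, got {provided}")
```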
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **1**
## `KUC-101`
**Source**: `docs/conf.py`
Sets up the Sphinx documentation builder with Chinese language support (via ctex), Markdown parsing via recommonmark, and automatic version string extraction from the akshare package for consistent documentation.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses; retrying immediately on a rate limit error makes the block worse. Keep retries for transient network errors (TimeoutError, ConnectionError) separate from permanent errors (ValueError, KeyError): retrying a permanent error wastes resources and masks the underlying bug.
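
A minimal sketch of this pattern using full-jitter backoff. `RateLimitError` and `call_with_retry` are hypothetical names; the defaults (4 retries, 1 s base, 60 s cap) are illustrative choices, not values taken from akshare or edgar-crawler.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's HTTP layer on HTTP 429."""


# Only these are retried; ValueError, KeyError, and the like propagate at once.
TRANSIENT_ERRORS = (RateLimitError, TimeoutError, ConnectionError)


def call_with_retry(fn, max_retries=4, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn(), retrying transient errors with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Full jitter: wait a random time up to the capped exponential bound.
            sleep(random.uniform(0.0, min(cap, base * 2 ** attempt)))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.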
## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
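
A minimal sketch of strict validation and normalization with the standard library. `normalize_date` is a hypothetical helper name; `datetime.strptime` does the leap-year and month-end checks, and the regular expressions reject unpadded input such as `2024-1-5` that `strptime` would otherwise accept.

```python
import re
from datetime import datetime

# Accepted input shapes and the matching strptime formats.
_FORMATS = (
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{8}"), "%Y%m%d"),
)


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if it is not a real date."""
    for pattern, fmt in _FORMATS:
        if pattern.fullmatch(text):
            # strptime raises ValueError for impossible dates such as 2023-02-29.
            return datetime.strptime(text, fmt).date().isoformat()
    raise ValueError(f"unsupported date format: {text!r}")
```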
## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.
## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing
Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.
## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.
## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing
Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.
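
The ordering can be illustrated with a toy in-memory store. This is a sketch of the idea only: `ToyVersionedStore` and its methods are hypothetical and do not reflect ArcticDB's actual key layout or API.

```python
class ToyVersionedStore:
    """Immutable atoms plus mutable references, published in that order."""

    def __init__(self):
        self.atoms = {}   # (symbol, version) -> data, never modified once written
        self.refs = {}    # symbol -> (symbol, version), the only mutable pointer

    def commit(self, symbol, version, data):
        atom_key = (symbol, version)
        if atom_key in self.atoms:
            raise KeyError(f"atom {atom_key} already written; atoms are immutable")
        self.atoms[atom_key] = data    # step 1: the data becomes durable first
        self.refs[symbol] = atom_key   # step 2: only then is it made visible

    def read_latest(self, symbol):
        # A reader that follows the reference always finds a complete atom.
        return self.atoms[self.refs[symbol]]
```

If the writer stops between the two steps, readers still see the previous version rather than a reference that points at missing data.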
## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
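
A minimal sketch of the check, written against a plain status code and body so it does not depend on any HTTP client. `RateLimitError`, `HTTPStatusError`, and `parse_json_response` are hypothetical names.

```python
import json


class RateLimitError(Exception):
    """Raised on HTTP 429 so callers can back off and retry."""


class HTTPStatusError(Exception):
    """Raised on any other non-2xx status."""


def parse_json_response(status_code: int, body: str):
    """Validate the status code first, then parse the body as JSON."""
    if status_code == 429:
        raise RateLimitError("HTTP 429: rate limited")
    if not 200 <= status_code < 300:
        # Error pages are often HTML; never hand them to the JSON parser.
        raise HTTPStatusError(f"unexpected HTTP status {status_code}")
    return json.loads(body)
```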
## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.
FILE:references/components/column_name_standardization.md
# column_name_standardization (2 classes)
## `set_df_columns`
`column_name_standardization/set-df-columns.py:0`
## `Column naming convention`
`column_name_standardization/column-naming-convention.py:0`
FILE:references/components/data_type_conversion.md
# data_type_conversion (1 class)
## `N/A`
`data_type_conversion/n-a.py:0`
FILE:references/components/html_table_extraction.md
# html_table_extraction (2 classes)
## `N/A`
`html_table_extraction/n-a.py:0`
## `HTML parser`
`html_table_extraction/html-parser.py:0`
FILE:references/components/http_request_layer.md
# http_request_layer (3 classes)
## `AkshareConfig`
`http_request_layer/akshareconfig.py:0`
## `ProxyContext`
`http_request_layer/proxycontext.py:0`
## `HTTP client`
`http_request_layer/http-client.py:0`
FILE:references/components/json_response_parsing.md
# json_response_parsing (1 class)
## `N/A`
`json_response_parsing/n-a.py:0`
FILE:references/components/paginated_data_fetching.md
# paginated_data_fetching (1 class)
## `fetch_paginated_data`
`paginated_data_fetching/fetch-paginated-data.py:0`
FILE:references/components/price_adjustment_processing.md
# price_adjustment_processing (1 class)
## `stock_zh_a_daily`
`price_adjustment_processing/stock-zh-a-daily.py:0`
FILE:references/components/realized_volatility_calculation.md
# realized_volatility_calculation (2 classes)
## `volatility_yz_rv`
`realized_volatility_calculation/volatility-yz-rv.py:0`
## `Volatility estimator`
`realized_volatility_calculation/volatility-estimator.py:0`
FILE:references/components/source-specific_data_acquisition.md
# source-specific_data_acquisition (3 classes)
## `TLSAdapter`
`source-specific_data_acquisition/tlsadapter.py:0`
## `DataApi`
`source-specific_data_acquisition/dataapi.py:0`
## `Data source`
`source-specific_data_acquisition/data-source.py:0`
FILE:references/components/trading_calendar_validation.md
# trading_calendar_validation (2 classes)
## `get_rank_sum_daily`
`trading_calendar_validation/get-rank-sum-daily.py:0`
## `Calendar source`
`trading_calendar_validation/calendar-source.py:0`
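MlFinLab provides advanced financial machine learning implementations, including information-driven bars (tick/volume/dollar/imbalance bars), fractional differentiation, and backtesting tools, supporting multi-market factor research and strategy validation.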
---
name: advanced-financial-ml
description: |-
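MlFinLab provides advanced financial machine learning implementations, including information-driven bars (tick/volume/dollar/imbalance bars), fractional differentiation, and backtesting tools, supporting multi-market factor research and strategy validation.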
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-115"
compiled_at: "2026-04-22T13:00:55.567727+00:00"
capability_markets: "multi-market"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
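# Advanced Financial Machine Learning (advanced-financial-ml)
> MlFinLab provides advanced financial machine learning implementations, including information-driven bars (tick/volume/dollar/imbalance bars), fractional differentiation, and backtesting tools, supporting multi-market factor research and strategy validation.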
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (1 total)
### Sphinx Documentation Configuration (`UC-101`)
How to configure and generate project documentation using Sphinx autodoc and extensions for API documentation coverage
**Triggers**: documentation, sphinx, autodoc
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (25 total)
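- **`AP-ZVT-183`**: An inf/NaN adjustment factor is multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Misuse of the HFQ (backward-adjusted) and QFQ (forward-adjusted) K-line tables causes factor drift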
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-115. Evidence verify ratio = 43.7% and audit fail total = 34. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Complete V6+ source of truth | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-115` blueprint at 2026-04-22T13:00:55.567727+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Sphinx Documentation Configuration', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
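### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>
When several Qlib models reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model effectively uses the same prediction signal. The symptom is an identical backtest equity curve whether the model is LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.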
Source: https://github.com/microsoft/qlib/issues/1930
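### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (the range used to fit the normalizer) and segments.train (the range used to train the model). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state that fit_end_time must be <= train_end.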
Source: https://github.com/microsoft/qlib/issues/2090
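
A minimal sketch of a pre-flight check for this coupling. It works on plain ISO date strings and does not import Qlib; `check_fit_window` is a hypothetical helper, and the argument names simply mirror the `fit_end_time` and `segments.train` settings described above.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple[str, str]) -> None:
    """Raise if the normalizer would be fitted on data past the end of training."""
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )
```

Running this before building the dataset turns a silent leak into an immediate configuration error.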
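### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from the documented formula differs significantly from the correct one after cross-sectional standardization, and its IC drops. Formula errors at the documentation level get copied directly into production factor libraries by many users.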
Source: https://github.com/microsoft/qlib/issues/2036
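### `AP-QLIB-2184` — Custom A-share data imported without NaN on suspended days, as the convention requires, adds noise to downstream factors <sub>(high)</sub>
By Qlib convention, the open/close/high/low/volume/factor fields should all be NaN on suspended days so that the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspended days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce false signals during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into training data.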
Source: https://github.com/microsoft/qlib/issues/2184
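### `AP-QLIB-1892` — The PIT (point-in-time) financial data collector depends on an external stock-list API and fetches an incomplete A-share universe <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to fetch the Shanghai and Shenzhen stock list. The function depends on the Eastmoney API, which often returns only part of the list rather than all 5000+ stocks, and the function raises ValueError directly when the list is incomplete. A user who follows the documented steps ends up with a financial dataset that covers only some stocks, and backtests on PIT financial factors carry severe survivorship bias (stocks that were never collected are implicitly excluded).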
Source: https://github.com/microsoft/qlib/issues/1892
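### `AP-QLIB-2097` — Whole-market instrument="all" runs out of memory on a 32GB machine while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the full feature matrix for the chosen universe into memory at once. The memory footprint of instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. A 32GB machine running the whole market hits OOM in the init_instance_by_config stage, and the error message does not point to memory. Users easily mistake it for a configuration error; the actual fix is batched loading or streaming feature computation.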
Source: https://github.com/microsoft/qlib/issues/2097
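### `AP-QLIB-1984` — The LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label data, but the ndim of a Series taken from the DataFrame is always 1, so the condition is always False. Multi-label training therefore skips the squeeze branch, goes straight into LightGBM training, and raises a semantically unclear error deeper in the stack. Users who try a custom multi-label task cannot trace the error message back to this root cause.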
Source: https://github.com/microsoft/qlib/issues/1984
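### `AP-QLIB-1915` — After dump_bin on custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (for example, 600000.SH versus SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processor raises Length mismatch during align/reindex, while D.features succeeds because it bypasses the processor. This inconsistency between the two paths leads users to believe the data was imported correctly.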
Source: https://github.com/microsoft/qlib/issues/1915
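### `AP-QLIB-1949` — The Colab/Linux multiprocessing backend conflicts with Qlib's ParallelExt and makes DataHandler unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel through joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a deeply nested exception from D.features, and the stack trace does not tell the user whether the problem lies in the parallel backend or in the data.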
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
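### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with the minute of the current tick rather than the end of a complete N-minute period. Concretely, a tick at 09:59 triggers an incomplete 5-minute bar (which should not be pushed until 10:04). If the strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass, but it contains less than a full period of data and generates a wrong entry signal.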
Source: https://github.com/vnpy/vnpy/issues/3691
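### `AP-VNPY-3669` — Incremental saves of Alpha module history fail with SchemaError when old and new DataFrame schemas differ <sub>(medium)</sub>
When the vnpy Alpha module saves K-line data to a Parquet file, it passes newly downloaded data (which may contain Float64 columns) and the existing file (historical Int64 columns) directly to polars.concat. Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that field types differ across data sources and versions (for example, volume is an integer in some feeds and a float in others) and there is no schema-alignment step before concat. This affects every historical-data build for backtests that use vnpy alpha.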
Source: https://github.com/vnpy/vnpy/issues/3669
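### `AP-VNPY-3685` — The spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event loop conflict under Jupyter (asyncio already running). Part of the backtest engine logic does not run, no exception is raised, and the returned statistics look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop. This is a hidden trap where backtest results look correct but are incomplete.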
Source: https://github.com/vnpy/vnpy/issues/3685
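### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system or conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch releases). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where several tools coexist, vnpy's install script is a common source of global environment pollution.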
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
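### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices on the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data and cause technical indicators to report false breakouts on the ex-date.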
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
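### `AP-ZIPLINE-235` — Default fills at the current bar's close understate live slippage and inflate backtest returns <sub>(high)</sub>
After a signal fires on a bar, Zipline's default slippage model fills at the close of that same bar (current bar close fill). In live trading, the signal can only fill near the open of the next bar (T+1 order execution). On A-share daily bars, backtesting at the close overstates daily returns by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Configure the slippage model explicitly as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.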
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
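
The difference between same-bar and next-bar fills can be sketched without Zipline. `next_bar_fills` is a hypothetical helper that pairs each signal with the open of the following bar; it is not Zipline's slippage API.

```python
def next_bar_fills(signals: list[bool], opens: list[float]) -> list[tuple[int, float]]:
    """Return (bar index, fill price) pairs: a signal on bar i fills at the open of bar i+1."""
    if len(signals) != len(opens):
        raise ValueError("signals and opens must have the same length")
    fills = []
    # The last bar is skipped: a signal there has no next bar to fill on.
    for i, fired in enumerate(signals[:-1]):
        if fired:
            fills.append((i + 1, opens[i + 1]))
    return fills
```

Filling at `opens[i + 1]` instead of the close of bar `i` is also what `SL-02` (next-bar execution) requires.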
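### `AP-ZIPLINE-190` — A start_session on a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The message shows only the first day of the trading calendar and does not suggest changing the value to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last trading day before Spring Festival (a holiday) triggers the same error, and debugging is very costly.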
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
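### `AP-ZIPLINE-181` — With a stale asset db, Pipeline reports "no assets traded" and sends users looking at the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records the start and end trading dates of every stock. If an old Quandl or custom bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a newer date range. In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes them, creating survivorship bias.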
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
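### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function depend on the trading calendar's start-of-week and end-of-week logic. Under non-US calendars (such as ASX or SSE), that logic is incompatible with the offset calculation used for the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in a week that contains a long holiday such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the log gives the user no sign that the schedule did not fire.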
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
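### `AP-ZIPLINE-240` — Backtest dates must be in UTC; a naive datetime triggers a deep AssertionError <sub>(medium)</sub>
Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone, such as pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it raises "AssertionError: Algorithm should have a utc datetime" deep inside algorithm execution, where the stack is hard to trace. A-share developers importing data from local CST time hit this trap easily; call tz_localize('UTC') explicitly when registering the bundle.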
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
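
A minimal sketch of an entry-point guard that fails early instead of deep in the run. It uses only the standard library; `require_utc` is a hypothetical helper, not part of Zipline.

```python
from datetime import datetime, timezone


def require_utc(ts: datetime) -> datetime:
    """Reject naive datetimes and convert aware ones to UTC."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError(f"naive datetime {ts!r}: attach a timezone before backtesting")
    return ts.astimezone(timezone.utc)
```

A timestamp recorded in China Standard Time (UTC+8) keeps its instant and only changes representation, so 09:30 CST becomes 01:30 UTC.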
## zvt (6)
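### `AP-ZVT-183` — An inf/NaN adjustment factor is multiplied in directly, so price adjustment fails silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (a new stock's first day, or missing data) the factor is inf; when kdata.open is itself None (a suspended day that was never filled) the multiplication raises TypeError. The result is that adjustment for the whole entity stops and all later K-lines are lost, but the main flow only logs ERROR and carries on, so users often do not know that data for many stocks is corrupted.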
Source: https://github.com/zvtvz/zvt/issues/183
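
A minimal sketch of guarding both failure modes described above. `adjust_factor` and `apply_factor` are hypothetical helpers, not ZVT functions; they return `None` for unusable inputs so the caller can count and report the skipped rows instead of losing the whole entity.

```python
import math
from typing import Optional


def adjust_factor(new_price: Optional[float], old_price: Optional[float]) -> Optional[float]:
    """Return new/old, or None when the ratio would be undefined, inf, or NaN."""
    if new_price is None or old_price is None:
        return None
    if not (math.isfinite(new_price) and math.isfinite(old_price)) or old_price == 0:
        return None
    return new_price / old_price


def apply_factor(price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Multiply only when both operands are present and finite."""
    if price is None or factor is None:
        return None
    if not (math.isfinite(price) and math.isfinite(factor)):
        return None
    return price * factor
```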
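### `AP-ZVT-179` — Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to pull K-lines for the whole market in bulk (4000+ stocks), it hits JoinQuant's daily query limit (the error reports that the daily maximum query count has been exceeded). ZVT catches the exception and moves on to the next entity, so the day's data for every stock after the limit is hit goes missing silently. A backtest on that incomplete database produces systematically biased factor values with no warning.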
Source: https://github.com/zvtvz/zvt/issues/179
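
One way to avoid the silent gap is to stop at the first quota error and report what is still pending. The sketch below is hypothetical: `QuotaExceededError`, `IncompleteFetchError`, and `fetch_all` are illustrative names, and `fetch_one` stands for whatever provider call the caller wraps.

```python
class QuotaExceededError(Exception):
    """Hypothetical error a provider wrapper raises when the daily quota is hit."""


class IncompleteFetchError(Exception):
    """Carries what was fetched and what is still pending."""

    def __init__(self, fetched, pending):
        super().__init__(f"quota exhausted: {len(fetched)} fetched, {len(pending)} pending")
        self.fetched = fetched
        self.pending = pending


def fetch_all(entity_ids, fetch_one):
    """Fetch every entity, stopping loudly when the quota runs out."""
    fetched = {}
    for index, entity_id in enumerate(entity_ids):
        try:
            fetched[entity_id] = fetch_one(entity_id)
        except QuotaExceededError as exc:
            # Every later entity would fail the same way, so stop and report.
            raise IncompleteFetchError(fetched, list(entity_ids[index:])) from exc
    return fetched
```

The `pending` list lets a later run resume from the first entity that was not fetched.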
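### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the "too many SQL variables" error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable-count limit. The correct fix is to query in batches, which early ZVT versions did not do.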
Source: https://github.com/zvtvz/zvt/issues/161
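The batching fix can be sketched with the standard-library `sqlite3` module. The table name `kdata`, its columns, and the helper `select_in_chunks` are assumptions made for illustration; the chunk size of 500 is chosen only to stay below the 999-variable limit.

```python
import sqlite3


def select_in_chunks(conn, entity_ids, chunk_size=500):
    """Query rows for many entity IDs without exceeding SQLite's variable limit."""
    rows = []
    for i in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[i:i + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # one bound variable per ID
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# In-memory demo with 5000 hypothetical entities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sz_{n:06d}" for n in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 1.0) for e in ids])
rows = select_in_chunks(conn, ids)
```

Only the placeholders are interpolated into the SQL string; the IDs themselves are still passed as bound parameters.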
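### `AP-ZVT-129` — Wildcard imports hide API changes between versions; enums such as AdjustType disappear without explanation <sub>(medium)</sub>
ZVT documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors the enums (for example moving AdjustType into a submodule), the wildcard import no longer includes that symbol and an AttributeError is triggered. Users take it for an installation problem, when in fact it is a breaking API change between versions that was not noted in the CHANGELOG, and the wildcard import hides where the symbol actually comes from. Import enum classes explicitly.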
Source: https://github.com/zvtvz/zvt/issues/129
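### `AP-ZVT-187` — Backtest engine does not stop early on empty data-layer results, causing a cascade of NoneType crashes <sub>(medium)</sub>
After load_data completes and the data turns out to be empty, ZVT Trader does not exit early; it passes the empty DataFrame into the selector computation, which triggers a chain of NoneType failures downstream. The stack is deep and the root cause is hard to locate, so users assume the strategy logic is at fault. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.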
Source: https://github.com/zvtvz/zvt/issues/187
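### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users computing price momentum or moving-average factors mix two tables (for example unadjusted prices for the moving average and backward-adjusted prices for returns), which makes factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.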
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-115--mlfinlab
**Scan date**: 2026-04-22
**Stats**: {'total_files': 12, 'total_classes': 58, 'total_functions': 0, 'total_stages': 12}
## Modules (12)
- [data_ingestion_&_bar_construction](components/data_ingestion_-_bar_construction.md): 4 classes
- [event_filtering_&_sampling](components/event_filtering_-_sampling.md): 3 classes
- [triple_barrier_labeling_&_meta-labeling](components/triple_barrier_labeling_-_meta-labeling.md): 7 classes
- [sample_weighting_&_uniqueness](components/sample_weighting_-_uniqueness.md): 5 classes
- [feature_engineering_&_importance](components/feature_engineering_-_importance.md): 6 classes
- [model_training_with_sequential_bootstrap](components/model_training_with_sequential_bootstrap.md): 5 classes
- [bet_sizing](components/bet_sizing.md): 5 classes
- [backtesting_&_statistics](components/backtesting_-_statistics.md): 6 classes
- [correlation_&_codependence_analysis](components/correlation_-_codependence_analysis.md): 4 classes
- [clustering_&_network_generation](components/clustering_-_network_generation.md): 6 classes
- [synthetic_data_generation](components/synthetic_data_generation.md): 5 classes
- [structural_break_detection](components/structural_break_detection.md): 2 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
  business_decisions_count: 131
  fatal_constraints_count: 76
  non_fatal_constraints_count: 250
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
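- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, information that only becomes known after t must not be used. The most common forms: (1) computing a signal from the close and filling at that same day's close; (2) stamping an indicator computed after day T's close on that same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions must be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series data splits for ML/DL models must follow time order, TRAIN < VALID < TEST. Random k-fold splitting must not be used (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (for example a momentum strategy that "takes profit at the high") is plain lookahead, because the intraday high/low is only confirmed after the close. Fills may only use the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record the "observation time" of every series explicitly in code and verify key time-alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests even when the true Sharpe ratio is zero, so the best observed Sharpe must be discounted accordingly. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.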
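- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use historical point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in sample. Out-of-sample data is for final validation only and must not be "looked at" repeatedly followed by further tuning (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended or missing data: prices on suspension days must not simply be forward-filled with the previous close, because that lets zero-volume days take part in factor computation and signal generation. Filter missing trading days explicitly at the factor-computation layer instead of filling them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it uncleaned into factor computation produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry point.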
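- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default must not be used. Performance metrics from a backtest that ignores costs are misleading, and high-turnover strategies are hit hardest (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. At minimum configure a fixed spread or proportional slippage; large orders should use a volume-share model (for example no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net strategy returns are extremely sensitive to cost assumptions: every 10 bps change in cost may flip the profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual (rounded) number of shares held under a fixed amount of capital rather than assume fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.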
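- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Consistent timestamp timezones: when merging multiple data sources, mixing UTC with local time is a common source of data corruption. All timestamps must be converted to one timezone at the pipeline entry point (UTC for storage and the market's local timezone for display are recommended); different timezones must not be mixed partway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not use an outer join followed by fillna, which creates fake data rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary checks for incremental updates: when updating historical data incrementally, query the latest stored date from the database and download only data after that date. Re-downloading existing data and appending it produces rows with duplicate timestamps and breaks the backtest's time order. Verify that there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta calculations. Choose a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes dividend reinvestment and understates the benchmark's holding return.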
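- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Summing log returns and reading the result as a percentage misstates drawdown depth, because the log return and the simple return of the same decline differ (a 50% loss is a log return of about -0.693); convert with exp(x) - 1 before reporting.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar ratio and Sortino ratio are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, while returns in the A-share and crypto markets are markedly left-skewed with fat tails, so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should be decomposed into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot tell "a good strategy" apart from "a tailwind market where the beta happened to be right".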
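- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns; ICIR = mean(IC) / std(IC). |IC| > 0.05 counts as preliminary evidence of predictive power, and ICIR > 0.5 counts as stable. Reporting backtest performance without computing IC is a typical case of missing evidence for factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and is unsuitable for monthly rebalancing, and vice versa. Using the wrong holding period seriously damages the factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires a t-stat > 3.0 (rather than the traditional 1.96), or independent replication in different periods/markets, or a clear economic rationale. Factors that meet none of these conditions are very likely products of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have a negative net IC after transaction costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor's half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-term factor monthly lets much of the alpha dissipate during the holding period.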
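- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns are mixed with industry-rotation returns, and it is hard to tell whether the factor itself or the industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and the return comes from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes, which distort quantile analysis (such as Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and reduces signal diversity. Apply Gram-Schmidt orthogonalization or PCA before combining to remove multicollinearity between factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: NaN values in factor computation (suspensions, new listings, data gaps) introduce lookahead bias if filled with a mean computed over the full sample (that mean itself contains future information), and introduce survivorship bias if dropped entirely. The correct approach is to fill with the cross-sectional median (the median over all stocks on that day, which does not depend on the future) or to exclude the stock for that day.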
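- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom spread) of Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, rather than the long-only return. The "monotonicity" test of long Q1 / short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound markets) is important evidence of robustness. A factor whose IC holds only in one particular market state is unsuitable for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate large, useless transaction costs. Set a buffer zone in stock selection and rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile return spread (Q1-Q5 spread) may be due to chance even when it is large in historical data, and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.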
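- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability checks: a factor that works in one market does not necessarily work in another. Applying a US equity factor directly to A-shares, or an equity factor to futures or crypto, requires independent IC validation; cross-market generality must not be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually fail as the market learns and arbitrage sets in (McLean & Pontiff 2016 show an average decay of 58% after publication). Re-evaluate factor IC regularly (quarterly or yearly) and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macroeconomic environment: the interest-rate cycle, the business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound markets. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor performs best.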
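
Several of the constraints above reduce to short calculations. The sketch below illustrates three of them in plain Python: max drawdown computed from a net-value series (`SHARED-BT-PERF-001`), Sharpe annualization with an explicit factor (`SHARED-BT-PERF-002`), and rounding a target position down to whole lots (`SHARED-BT-COST-004`). All function names are hypothetical and the inputs are made-up numbers.

```python
import math
from statistics import mean, stdev


def max_drawdown(nav):
    """Largest peak-to-trough decline of a net-value series, as a positive fraction."""
    peak, worst = nav[0], 0.0
    for value in nav:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_sharpe(returns, periods_per_year=252, rf_per_period=0.0):
    """Per-period Sharpe ratio scaled by sqrt(periods_per_year)."""
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def round_to_lot(target_shares, lot=100):
    """Round a target share count down to a whole number of lots."""
    return int(target_shares // lot) * lot


dd = max_drawdown([100, 120, 60, 90])      # peak 120, trough 60
shares = round_to_lot(1234.7)              # A-share lot size of 100
daily = [0.01, -0.01, 0.02, 0.0]
sharpe_equity = annualized_sharpe(daily, periods_per_year=252)
sharpe_crypto = annualized_sharpe(daily, periods_per_year=365)
```

The same daily series yields different annualized Sharpe values under the 252-day and 365-day conventions, which is why the factor has to be stated when comparing systems.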
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
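
Two of the locks above, `SL-03` (entity ID format) and `SL-05` (exactly one sizing field per signal), are easy to check before any order logic runs. This is an illustrative sketch; the regular expression and both function names are assumptions rather than ZVT's own validation code.

```python
import re

# Assumed shape of entity_type_exchange_code, e.g. stock_sh_600000.
ENTITY_ID = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9]+$")


def check_entity_id(entity_id):
    """Split a well-formed entity ID into (entity_type, exchange, code)."""
    if not ENTITY_ID.match(entity_id):
        raise ValueError(f"entity ID does not match entity_type_exchange_code: {entity_id!r}")
    entity_type, exchange, code = entity_id.split("_", 2)
    return entity_type, exchange, code


def check_signal_sizing(position_pct=None, order_money=None, order_amount=None):
    """Return the name of the single sizing field that is set, or raise."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    given = [name for name, value in fields.items() if value is not None]
    if len(given) != 1:
        raise ValueError(f"exactly one sizing field required, got: {given}")
    return given[0]


parts = check_entity_id("stock_sh_600000")
sizing = check_signal_sizing(position_pct=0.2)
```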
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **1**
## `KUC-101`
**Source**: `docs/source/conf.py`
How to configure and generate project documentation using Sphinx autodoc and extensions for API documentation coverage.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
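## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies over multiple data sources. zvt's StockTrader currently binds one set of factors per instance and lacks a unified orchestration layer for multi-strategy portfolios; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.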
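## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users obtain a risk-adjusted return report with a call such as cerebro.addanalyzer(SharpeRatio).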
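## `CW-BT-003` — Sizer separates position management
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position management (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.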
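## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration, fuller order types would make backtests considerably more realistic.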
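## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive strategies in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests over any combination of time granularities without recording the data again.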
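## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a graphical interface that directly shows the strategy's net-value curve, max drawdown, daily P&L, and trade details, with no need for Jupyter Notebook. zvt currently visualizes backtest results by calling Plotly through the draw_result method and has no unified backtest report page; borrowing this pattern would allow packaging a ready-to-use strategy performance dashboard.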
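## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy 4.0's vnpy.alpha.lab provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signals → backtest" pipeline and lower the barrier to ML experiments.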
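## `CW-QL-001` — Point-in-Time database (prevents future data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was actually knowable at t (report publication delays and revision history are handled correctly), which removes look-ahead bias from backtests. zvt currently uses the reporting period as the timestamp of financial data and lacks a "publication date" dimension, so stock selection may be biased by financial reports from the future; introducing a PIT model would substantially improve backtest credibility.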
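The point-in-time idea can be illustrated without any framework: each record carries both its reporting period and its publication date, and a query as of time t only sees records published on or before t. The records, field names, and `latest_known` helper below are hypothetical and do not reflect qlib's or zvt's schemas.

```python
from datetime import date

# Made-up financial records: the report for period P only becomes visible on publish_date.
reports = [
    {"entity_id": "stock_sh_600000", "report_period": date(2023, 9, 30),
     "publish_date": date(2023, 10, 28), "roe": 0.07},
    {"entity_id": "stock_sh_600000", "report_period": date(2023, 12, 31),
     "publish_date": date(2024, 4, 27), "roe": 0.08},
]


def latest_known(reports, entity_id, as_of):
    """Return the newest report that had been published by `as_of`, or None."""
    visible = [
        r for r in reports
        if r["entity_id"] == entity_id and r["publish_date"] <= as_of
    ]
    return max(visible, key=lambda r: r["report_period"], default=None)


# In mid-January 2024 the annual report is not yet public, so Q3 is the latest.
jan = latest_known(reports, "stock_sh_600000", date(2024, 1, 15))
may = latest_known(reports, "stock_sh_600000", date(2024, 5, 1))
```

Filtering on `report_period` instead of `publish_date` would expose the annual figures in January, which is exactly the leak described above.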
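## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of every model training run and supports cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous result; borrowing the Recorder pattern would persist the parameters and results of every factor experiment and support quick reproduction and version comparison.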
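## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing). The two layers are optimized independently and can run together, enabling intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution-algorithm modeling; borrowing the nested framework would let zvt separate "which stocks to hold and when" from "how to build the position with minimal market impact".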
FILE:references/components/backtesting_-_statistics.md
# backtesting_&_statistics (6 classes)
## `CampbellBacktesting.haircut_sharpe_ratios`
`backtesting_&_statistics/campbellbacktesting-haircut-sharpe-ratio.py:0`
## `sharpe_ratio`
`backtesting_&_statistics/sharpe-ratio.py:0`
## `probabilistic_sharpe_ratio`
`backtesting_&_statistics/probabilistic-sharpe-ratio.py:0`
## `deflated_sharpe_ratio`
`backtesting_&_statistics/deflated-sharpe-ratio.py:0`
## `drawdown_and_time_under_water`
`backtesting_&_statistics/drawdown-and-time-under-water.py:0`
## `Sharpe adjustment method`
`backtesting_&_statistics/sharpe-adjustment-method.py:0`
FILE:references/components/bet_sizing.md
# bet_sizing (5 classes)
## `M2N.fit`
`bet_sizing/m2n-fit.py:0`
## `bet_size_probability`
`bet_sizing/bet-size-probability.py:0`
## `bet_size_dynamic`
`bet_sizing/bet-size-dynamic.py:0`
## `bet_size_reserve`
`bet_sizing/bet-size-reserve.py:0`
## `Sizing function`
`bet_sizing/sizing-function.py:0`
FILE:references/components/clustering_-_network_generation.md
# clustering_&_network_generation (6 classes)
## `MST.create_mst`
`clustering_&_network_generation/mst-create-mst.py:0`
## `PMFG.create_pmfg`
`clustering_&_network_generation/pmfg-create-pmfg.py:0`
## `ALMST.create_almst`
`clustering_&_network_generation/almst-create-almst.py:0`
## `get_feature_clusters`
`clustering_&_network_generation/get-feature-clusters.py:0`
## `optimal_hierarchical_cluster`
`clustering_&_network_generation/optimal-hierarchical-cluster.py:0`
## `Network type`
`clustering_&_network_generation/network-type.py:0`
FILE:references/components/correlation_-_codependence_analysis.md
# correlation_&_codependence_analysis (4 classes)
## `get_dependence_matrix`
`correlation_&_codependence_analysis/get-dependence-matrix.py:0`
## `get_mutual_info`
`correlation_&_codependence_analysis/get-mutual-info.py:0`
## `optimal_transport_dependence`
`correlation_&_codependence_analysis/optimal-transport-dependence.py:0`
## `Dependence metric`
`correlation_&_codependence_analysis/dependence-metric.py:0`
FILE:references/components/data_ingestion_-_bar_construction.md
# data_ingestion_&_bar_construction (4 classes)
## `MicrostructuralFeaturesGenerator.generate_features`
`data_ingestion_&_bar_construction/microstructuralfeaturesgenerator-generat.py:0`
## `Bar threshold calculation`
`data_ingestion_&_bar_construction/bar-threshold-calculation.py:0`
## `Imbalance metric`
`data_ingestion_&_bar_construction/imbalance-metric.py:0`
## `Bar type`
`data_ingestion_&_bar_construction/bar-type.py:0`
FILE:references/components/event_filtering_-_sampling.md
# event_filtering_&_sampling (3 classes)
## `cusum_filter`
`event_filtering_&_sampling/cusum-filter.py:0`
## `z_score_filter`
`event_filtering_&_sampling/z-score-filter.py:0`
## `Filter type`
`event_filtering_&_sampling/filter-type.py:0`
FILE:references/components/feature_engineering_-_importance.md
# feature_engineering_&_importance (6 classes)
## `FractionalDifferentiation.frac_diff_ffd`
`feature_engineering_&_importance/fractionaldifferentiation-frac-diff-ffd.py:0`
## `mean_decrease_impurity`
`feature_engineering_&_importance/mean-decrease-impurity.py:0`
## `mean_decrease_accuracy`
`feature_engineering_&_importance/mean-decrease-accuracy.py:0`
## `get_orthogonal_features`
`feature_engineering_&_importance/get-orthogonal-features.py:0`
## `Fractional differentiation method`
`feature_engineering_&_importance/fractional-differentiation-method.py:0`
## `Importance metric`
`feature_engineering_&_importance/importance-metric.py:0`
FILE:references/components/model_training_with_sequential_bootstrap.md
# model_training_with_sequential_bootstrap (5 classes)
## `SequentiallyBootstrappedBaggingClassifier.fit`
`model_training_with_sequential_bootstrap/sequentiallybootstrappedbaggingclassifie.py:0`
## `ml_cross_val_score`
`model_training_with_sequential_bootstrap/ml-cross-val-score.py:0`
## `PurgedKFold.split`
`model_training_with_sequential_bootstrap/purgedkfold-split.py:0`
## `CombinatorialPurgedKFold.split`
`model_training_with_sequential_bootstrap/combinatorialpurgedkfold-split.py:0`
## `Cross-validation generator`
`model_training_with_sequential_bootstrap/cross-validation-generator.py:0`
FILE:references/components/sample_weighting_-_uniqueness.md
# sample_weighting_&_uniqueness (5 classes)
## `get_weights_by_return`
`sample_weighting_&_uniqueness/get-weights-by-return.py:0`
## `get_weights_by_time_decay`
`sample_weighting_&_uniqueness/get-weights-by-time-decay.py:0`
## `seq_bootstrap`
`sample_weighting_&_uniqueness/seq-bootstrap.py:0`
## `get_ind_matrix`
`sample_weighting_&_uniqueness/get-ind-matrix.py:0`
## `Weighting scheme`
`sample_weighting_&_uniqueness/weighting-scheme.py:0`
FILE:references/components/structural_break_detection.md
# structural_break_detection (2 classes)
## `get_sadf`
`structural_break_detection/get-sadf.py:0`
## `Break detection model`
`structural_break_detection/break-detection-model.py:0`
FILE:references/components/synthetic_data_generation.md
# synthetic_data_generation (5 classes)
## `sample_from_dvine`
`synthetic_data_generation/sample-from-dvine.py:0`
## `sample_from_cvine`
`synthetic_data_generation/sample-from-cvine.py:0`
## `generate_hcmb_mat`
`synthetic_data_generation/generate-hcmb-mat.py:0`
## `sample_from_corrgan`
`synthetic_data_generation/sample-from-corrgan.py:0`
## `Generation method`
`synthetic_data_generation/generation-method.py:0`
FILE:references/components/triple_barrier_labeling_-_meta-labeling.md
# triple_barrier_labeling_&_meta-labeling (7 classes)
## `apply_pt_sl_on_t1`
`triple_barrier_labeling_&_meta-labeling/apply-pt-sl-on-t1.py:0`
## `get_events`
`triple_barrier_labeling_&_meta-labeling/get-events.py:0`
## `get_bins`
`triple_barrier_labeling_&_meta-labeling/get-bins.py:0`
## `add_vertical_barrier`
`triple_barrier_labeling_&_meta-labeling/add-vertical-barrier.py:0`
## `drop_labels`
`triple_barrier_labeling_&_meta-labeling/drop-labels.py:0`
## `Vertical barrier`
`triple_barrier_labeling_&_meta-labeling/vertical-barrier.py:0`
## `Labeling approach`
`triple_barrier_labeling_&_meta-labeling/labeling-approach.py:0`
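Models asset-backed securities deal structures, simulates mortgage pool cash flows, tranched bond repayment, and waterfall allocation, and analyzes tranche yield and risk performance.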
---
name: abs-cashflow-modeling
description: |-
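  Models asset-backed securities deal structures, simulates mortgage pool cash flows, tranched bond repayment, and waterfall allocation, and analyzes tranche yield and risk performance.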
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-076"
  compiled_at: "2026-04-22T13:00:28.210602+00:00"
  capability_markets: "global"
  capability_activities: "insurance-actuarial"
  sop_version: "crystal-compilation-v6.1"
---
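# ABS Cash Flow Modeling (abs-cashflow-modeling)
> Models asset-backed securities deal structures, simulates mortgage pool cash flows, tranched bond repayment, and waterfall allocation, and analyzes tranche yield and risk performance.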
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (40 total)
### Basic ABS Deal Model (`UC-001`)
Model a basic asset-backed securities deal with mortgage pool, bonds, fees, and waterfall to analyze cashflows and tranche performance
**Triggers**: basic deal, ABS, mortgage pool
### Adjustable Rate Mortgage Pool (`UC-002`)
Model an adjustable rate mortgage pool with LIBOR-based floating rates and periodic resets
**Triggers**: ARM, adjustable rate, LIBOR
### Bond Step-Up Rate (`UC-003`)
Model bonds with scheduled rate step-ups at specific dates for ABS deal structuring
**Triggers**: step-up, bond rate, scheduled increase
For all **40** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (15 total)
- **`AP-INSURANCE-001`**: Implicit numeric format assumptions without validation
- **`AP-INSURANCE-002`**: Triangle axis construction with invalid temporal ordering
- **`AP-INSURANCE-003`**: Cumulative/incremental triangle representation misuse
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-076. Evidence verify ratio = 37.8% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-076` blueprint at 2026-04-22T13:00:28.210602+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Bond Step-Up Rate', 'Adjustable Rate Mortgage Pool', 'Basic ABS Deal Model', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## finance-bp-063--chainladder-python (4)
### `AP-INSURANCE-002` — Triangle axis construction with invalid temporal ordering <sub>(high)</sub>
Development dates are created without verifying they are strictly greater than origin dates, or development lags are calculated with incorrect formulas (e.g., using wrong divisor for monthly difference). This creates logically impossible triangle cells where development <= origin, corrupting the fundamental data structure and producing wrong loss development patterns.
### `AP-INSURANCE-003` — Cumulative/incremental triangle representation misuse <sub>(high)</sub>
Link ratios are computed on incremental triangles instead of cumulative form, or cum_to_incr/incr_to_cum conversions are not properly inverse-applied. This produces link ratios near 1.0 regardless of actual claims development, leading to misleading development factors and incorrect IBNR estimates.
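To make the cumulative/incremental distinction concrete, the sketch below converts one origin row between the two forms and computes age-to-age link ratios on the cumulative form only. It is plain Python with hypothetical helper names and made-up numbers, not chainladder's API.

```python
from itertools import accumulate


def incr_to_cum(row):
    """Convert incremental amounts per development period to cumulative amounts."""
    return list(accumulate(row))


def cum_to_incr(row):
    """Inverse of incr_to_cum: recover incremental amounts from cumulative ones."""
    return [row[0]] + [later - earlier for earlier, later in zip(row, row[1:])]


def link_ratios(cum_row):
    """Age-to-age factors; only meaningful on a cumulative row."""
    return [later / earlier for earlier, later in zip(cum_row, cum_row[1:])]


incremental = [100.0, 50.0, 25.0]          # made-up paid claims per period
cumulative = incr_to_cum(incremental)      # [100.0, 150.0, 175.0]
ratios = link_ratios(cumulative)           # first factor is 150 / 100
round_trip = cum_to_incr(cumulative)
```

Checking that `cum_to_incr(incr_to_cum(row)) == row` is a cheap guard against applying a conversion twice or in the wrong direction.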
### `AP-INSURANCE-004` — Including incomplete latest diagonal in development analysis <sub>(high)</sub>
Link ratio computation includes the latest diagonal which contains incomplete/in-progress development data. Without excluding this diagonal via valuation_date filtering, development factor estimation uses partial data that biases IBNR estimates. The latest diagonal must be excluded to capture true historical development patterns.
### `AP-INSURANCE-015` — Triangle grain transformation with incompatible parameters <sub>(medium)</sub>
Triangle grain() method is called without setting is_cumulative attribute, or origin grain is made finer than development grain. These produce invalid triangular data structures with misaligned periods and undefined behavior, corrupting actuarial reserving calculations.
## finance-bp-064--insurance_python (2)
### `AP-INSURANCE-005` — EIOPA calibration workflow violations <sub>(high)</sub>
Smith-Wilson calibration workflow is violated in multiple ways: calibration step is skipped before extrapolation, different alpha values are used for calibration vs extrapolation, or convergence point T uses incorrect formula. These violations produce mathematically inconsistent rate curves where observed points do not match market data and extrapolated rates violate EIOPA specifications.
### `AP-INSURANCE-006` — Missing iteration bounds causing infinite loops <sub>(high)</sub>
Root-finding algorithms like bisection for alpha calibration lack maxIter parameters. When the algorithm fails to converge (e.g., no sign change in Galfa at interval bounds), the application freezes indefinitely, causing service disruption. This is especially critical in regulatory compliance workflows where calibration must complete.
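A bounded bisection that rejects an interval without a sign change and stops after a fixed number of iterations might look like the following sketch. The function name and default tolerances are illustrative choices, and the example root (the square root of 2) stands in for the alpha calibration target.

```python
def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] by bisection, with an explicit iteration cap."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        # Fail fast instead of looping forever on an interval with no bracket.
        raise ValueError("no sign change on [lo, hi]; cannot bracket a root")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    raise RuntimeError(f"bisection did not converge in {max_iter} iterations")


root = bisect_root(lambda x: x * x - 2.0, 0.0, 2.0)
```

Both failure modes raise an exception the caller can handle, so a calibration that cannot converge ends with an error rather than a frozen process.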
## finance-bp-064--insurance_python, finance-bp-126--lifelines (1)
### `AP-INSURANCE-007` — Invalid financial/mathematical constraints not validated <sub>(high)</sub>
Correlation coefficients outside [-1,1], non-positive-semidefinite covariance matrices, negative durations, or entry times >= duration are not validated before use. These cause Cholesky decomposition failures, imaginary values in sqrt(1-rho²), or logically impossible scenarios, producing NaN prices or corrupted at-risk calculations.
## finance-bp-065--pyliferisk (4)
### `AP-INSURANCE-008` — None values propagated to arithmetic operations <sub>(high)</sub>
Critical parameters like interest rate i are passed as None to actuarial calculations. In pyliferisk, Actuarial.__init__ with i=None causes TypeError in (1/(1+i)) and commutation arrays remain empty. Bare except clauses catch these TypeErrors and silently return 0, masking the fundamental issue and producing incorrect but seemingly valid results.
### `AP-INSURANCE-009` — Stub function implementations and duplicate definitions <sub>(high)</sub>
Critical insurance functions like deferred temporary annuities are implemented as empty stubs (only 'pass' statement) or have duplicate definitions where the second shadows the first. This causes functions to return None instead of calculated values, breaking increasing annuity and premium calculations silently in production.
### `AP-INSURANCE-010` — Dispatcher routing to undefined functions <sub>(medium)</sub>
Complex function dispatchers (like annuity()) handle many parameter combinations but call functions that do not exist (e.g., qtaaxn, qtaxn). This causes NameError at runtime when specific parameter combinations are requested, preventing deferred temporary increasing annuity calculations entirely.
### `AP-INSURANCE-014` — Actuarial convention violations in life table construction <sub>(high)</sub>
Life tables violate standard actuarial conventions: using incorrect radix (not 100000), failing to append 0 to lx array for complete extinction, or using wrong payment adjustment formula for fractional annuities. These violations scale all derived quantities (dx, ex, reserves, premiums) incorrectly.
## finance-bp-065--pyliferisk, finance-bp-064--insurance_python (1)
### `AP-INSURANCE-001` — Implicit numeric format assumptions without validation <sub>(high)</sub>
Data formats like per-mille qx values or rate-to-price conversions are applied implicitly without validation. In pyliferisk, qx values stored as per-mille (qx*1000) are used directly as probabilities yielding 1000x errors. In insurance_python, rates are converted to prices using p=(1+r)^(-M) without verifying input format. This causes material miscalculations in reserve and premium calculations.
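One way to make the unit explicit is to require the caller to state it and then range-check the result. The function name and the `per_mille` flag below are assumptions for illustration, not part of either library:

```python
def qx_to_probability(qx_values, per_mille):
    """Convert mortality rates to probabilities in [0, 1].

    per_mille=True means each input is qx * 1000 and is divided by 1000.
    Hypothetical helper; the flag name is an assumption.
    """
    scale = 1000.0 if per_mille else 1.0
    probabilities = [q / scale for q in qx_values]
    for index, p in enumerate(probabilities):
        if not 0.0 <= p <= 1.0:
            raise ValueError(
                f"qx at index {index} is {p}, outside [0, 1]; check the input unit"
            )
    return probabilities
```

Passing per-mille values with `per_mille=False` then fails loudly as soon as a value exceeds 1, rather than producing a 1000x error downstream.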
## finance-bp-126--lifelines (3)
### `AP-INSURANCE-011` — Survival function monotonicity not enforced <sub>(high)</sub>
Non-parametric survival curve estimators do not verify that S(t) is monotonically non-increasing across timeline values. Violations produce mathematically invalid survival curves where probability of survival increases over time, or S(0) is not initialized to 1.0, breaking interpretation as probability distribution.
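The invariants above can be checked after estimation with a few lines. This is a sketch, assuming the survival values are given as a plain list ordered by increasing time with the first entry at t = 0; `check_survival_curve` is a hypothetical name, not a lifelines function:

```python
def check_survival_curve(survival, tol=1e-12):
    """Validate S(t) values listed in increasing order of t (first entry is t = 0)."""
    if not survival:
        raise ValueError("survival curve is empty")
    if abs(survival[0] - 1.0) > tol:
        raise ValueError(f"S(0) must be 1.0, got {survival[0]}")
    for k in range(1, len(survival)):
        if not 0.0 <= survival[k] <= 1.0:
            raise ValueError(f"S at position {k} is {survival[k]}, outside [0, 1]")
        if survival[k] > survival[k - 1] + tol:
            raise ValueError(
                f"S increases at position {k}: {survival[k - 1]} -> {survival[k]}"
            )
    return True
```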
### `AP-INSURANCE-012` — Input data corruption via inplace operations <sub>(medium)</sub>
User-provided DataFrames are modified inplace using .pop() operations without first creating a copy. This permanently corrupts user data by removing columns, violating data isolation principles and potentially affecting downstream analysis on the original data.
### `AP-INSURANCE-013` — Interval censoring bounds not validated <sub>(medium)</sub>
Lower and upper bounds for interval-censored data are not validated, allowing upper_bound < lower_bound. Invalid interval bounds produce undefined survival probability calculations, potentially negative time intervals in the likelihood function, and corrupt NPMLE estimation.
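A sketch of the missing bounds check, assuming the bounds arrive as two equal-length sequences of floats and that an infinite upper bound marks right-censoring. `validate_intervals` is a hypothetical helper, not a lifelines function:

```python
import math


def validate_intervals(lower_bounds, upper_bounds):
    """Return (lower, upper) pairs after checking each interval is well formed."""
    if len(lower_bounds) != len(upper_bounds):
        raise ValueError("lower_bounds and upper_bounds must have the same length")
    for index, (lo, hi) in enumerate(zip(lower_bounds, upper_bounds)):
        if math.isnan(lo) or math.isnan(hi):
            raise ValueError(f"interval {index} contains NaN")
        if lo < 0:
            raise ValueError(f"interval {index} has a negative lower bound {lo}")
        if hi < lo:
            raise ValueError(f"interval {index} has upper bound {hi} < lower bound {lo}")
    return list(zip(lower_bounds, upper_bounds))
```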
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-076--AbsBox
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 29, 'total_functions': 0, 'total_stages': 8}
## Modules (8)
- [deal_definition](components/deal_definition.md): 4 classes
- [component_transformation](components/component_transformation.md): 7 classes
- [deal_execution_(api)](components/deal_execution_-api.md): 4 classes
- [result_parsing](components/result_parsing.md): 4 classes
- [asset_type_system](components/asset_type_system.md): 3 classes
- [input_validation](components/input_validation.md): 2 classes
- [report_generation](components/report_generation.md): 2 classes
- [root_finding](components/root_finding.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 122
fatal_constraints_count: 52
non_fatal_constraints_count: 210
use_cases_count: 40
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
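As an illustration of the `entity_type_exchange_code` format, here is a small parser. `parse_entity_id` and `EntityId` are hypothetical names for this sketch and are not zvt APIs:

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id):
    """Split an ID such as 'stock_sh_600000' into its three parts."""
    parts = entity_id.split("_", 2)  # at most three parts; the code keeps any later '_'
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"entity id {entity_id!r} does not match entity_type_exchange_code"
        )
    return EntityId(*parts)
```

For example, `parse_entity_id("stock_sh_600000")` yields `entity_type="stock"`, `exchange="sh"`, `code="600000"`, while a vendor-style symbol such as `600000.SH` is rejected.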
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
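The exactly-one rule can be enforced before a signal is built. The function below is a sketch with a hypothetical name; it only reuses the three field names from the lock and does not reproduce zvt's `TradingSignal` constructor:

```python
def check_signal_sizing(position_pct=None, order_money=None, order_amount=None):
    """Return the single (name, value) sizing field, or raise if zero or several are set."""
    candidates = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "exactly one of position_pct, order_money, order_amount is required; "
            f"got {sorted(provided) or 'none'}"
        )
    return next(iter(provided.items()))
```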
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **40**
## `KUC-001`
**Source**: `docs/source/deal_sample/test01.py`
Model a basic asset-backed securities deal with mortgage pool, bonds, fees, and waterfall to analyze cashflows and tranche performance
## `KUC-002`
**Source**: `docs/source/deal_sample/arm_sample.py`
Model an adjustable rate mortgage pool with LIBOR-based floating rates and periodic resets
## `KUC-003`
**Source**: `docs/source/deal_sample/bondStepUp.py`
Model bonds with scheduled rate step-ups at specific dates for ABS deal structuring
## `KUC-004`
**Source**: `docs/source/deal_sample/test10.py`
Incorporate interest rate swap to hedge floating rate exposure in ABS deal
## `KUC-005`
**Source**: `docs/source/deal_sample/conditionAgg.py`
Implement conditional aggregation rules in waterfall that trigger based on pool status
## `KUC-006`
**Source**: `docs/source/deal_sample/fee1.py`
Calculate fees based on period, pool balance percentages, and tiered tables in ABS deals
## `KUC-007`
**Source**: `docs/source/deal_sample/fireTrigger.py`
Implement trigger mechanisms that fire events in waterfall based on performance conditions
## `KUC-008`
**Source**: `docs/source/deal_sample/float_bond.py`
Model ABS deal with floating rate bonds tied to SOFR index
## `KUC-009`
**Source**: `docs/source/deal_sample/multi_pool.py`
Model ABS deal with multiple pools containing different asset types (mortgage and loan) with separate assumptions
## `KUC-010`
**Source**: `docs/source/deal_sample/payPrinSeq.py`
Structure sequential principal payments across multiple bond tranches
## `KUC-011`
**Source**: `docs/source/deal_sample/rateCap.py`
Implement interest rate cap to limit floating rate exposure in ABS deal
## `KUC-012`
**Source**: `docs/source/deal_sample/resec.py`
Model re-securitization where bonds from underlying deals become assets in a new structure
## `KUC-013`
**Source**: `docs/source/deal_sample/stepup_sample.py`
Model bonds with conditional step-up rates that increase after specified dates
## `KUC-014`
**Source**: `docs/source/deal_sample/test02.py`
Implement multiple waterfall phases (amortizing, accelerated) with different payment priorities
## `KUC-015`
**Source**: `docs/source/deal_sample/test04.py`
Split pool income (interest/principal) proportionally across multiple accounts
## `KUC-016`
**Source**: `docs/source/deal_sample/test05.py`
Model insurance or liquidation provider supporting interest payments when pool income is insufficient
## `KUC-017`
**Source**: `docs/source/deal_sample/test08.py`
Model GNMA (Ginnie Mae) mortgage-backed deal with custom ARM loans, guarantor fees, and servicer fees
## `KUC-018`
**Source**: `docs/source/deal_sample/ysoc.py`
Implement yield supplement overcollateralization to bridge yield gap between low-rate assets and higher-rate bonds
## `KUC-019`
**Source**: `docs/source/deal_sample/test13.py`
Model assets with pre-defined projected cashflows rather than individual loan calculations
## `KUC-020`
**Source**: `docs/source/nbsample/pool_multiScenario.ipynb`
Run single pool through multiple CDR/CPR scenarios to compare default and prepayment impacts
## `KUC-021`
**Source**: `docs/source/nbsample/multiAsset.ipynb`
Run multiple asset pools (Mortgage, Loan) with separate assumptions and inspect pool balances
## `KUC-022`
**Source**: `docs/source/nbsample/single_mortgage.ipynb`
Project cashflows for individual mortgage with various CDR scenarios
## `KUC-023`
**Source**: `docs/source/nbsample/single_loan.ipynb`
Model individual loan with SOFR-based floating rate and rate assumption scenarios
## `KUC-024`
**Source**: `docs/source/nbsample/How-to-price-Balloon-Mortgage.ipynb`
Price balloon mortgages and analyze impact of default assumptions on pricing
## `KUC-025`
**Source**: `docs/source/nbsample/bond_pricing.ipynb`
Price bonds using discount curve to determine present value of cashflows
## `KUC-026`
**Source**: `docs/source/nbsample/firstLoss.ipynb`
Calculate first loss position and equity tranche absorption using root finder
## `KUC-027`
**Source**: `docs/source/nbsample/triggers.ipynb`
Monitor default rate triggers and cumulative defaults over deal life
## `KUC-028`
**Source**: `docs/source/nbsample/HowDealEnded.ipynb`
Model deal call options and determine deal termination conditions
## `KUC-029`
**Source**: `docs/source/nbsample/Irr_002.ipynb`
Calculate IRR for equity tranche with target return and incentive fee structure
## `KUC-030`
**Source**: `docs/source/nbsample/masterTrust.ipynb`
Model master trust with multiple sub-tranches (A-1, A-2) under same series
## `KUC-031`
**Source**: `docs/source/nbsample/comboSensitivity.ipynb`
Run combined scenarios with different deal structures and pool assumptions
## `KUC-032`
**Source**: `docs/source/nbsample/InspectSample.ipynb`
Inspect and extract intermediate waterfall variables for debugging deal logic
## `KUC-033`
**Source**: `docs/source/nbsample/re_securitization_example.ipynb`
Model complete re-securitization with child deals, parent deal, and asset pooling from bond proceeds
## `KUC-034`
**Source**: `docs/source/nbsample/revolving_buy_multiple_pools.ipynb`
Model revolving credit structure that purchases multiple pools of assets over time
## `KUC-035`
**Source**: `docs/source/nbsample/warehouse.ipynb`
Model warehouse facility with funding period before term deal issuance
## `KUC-036`
**Source**: `docs/source/nbsample/SRT_Example_Native_Prod.ipynb`
Model synthetic risk transfer where credit risk is transferred via derivatives rather than asset transfer
## `KUC-037`
**Source**: `docs/source/nbsample/PoolAndTag.ipynb`
Run pool analysis with tag-based filtering and multiple assumption scenarios
## `KUC-038`
**Source**: `docs/source/nbsample/MultiIntBond.ipynb`
Model bond with multiple interest components (multipliers and separate rate types)
## `KUC-039`
**Source**: `docs/source/nbsample/structuring-lease-doc.ipynb`
Structure ABS deal backed by lease assets with rental income collections
## `KUC-040`
**Source**: `docs/source/nbsample/WhyByTerm.ipynb`
Apply time-varying assumptions by term periods for CPR and other parameters
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-INSURANCE-001` — Validate input data format and type before computation
**From**: finance-bp-063--chainladder-python, finance-bp-126--lifelines · **Applicable to**: insurance-actuarial
Both triangle construction and survival analysis require strict input validation: numeric types for triangle columns, valid event indicators (0/1), no NaN/Inf values, and correct temporal ordering. This prevents downstream numerical failures and ensures mathematical validity of actuarial computations.
## `CW-INSURANCE-002` — Initialize probability distributions to boundary values
**From**: finance-bp-065--pyliferisk, finance-bp-126--lifelines · **Applicable to**: insurance-actuarial
Survival probability S(0) must equal 1.0 and life table lx must start at standard radix (100000) and end at 0. Properly initializing boundary values ensures actuarial quantities have correct scale and interpretation as probability distributions.
## `CW-INSURANCE-003` — Include iteration limits in numerical root-finding
**From**: finance-bp-064--insurance_python · **Applicable to**: insurance-actuarial
Bisection and other root-finding algorithms must include maxIter parameters and verify interval contains valid root (sign change). This prevents infinite loops when calibration fails, ensuring service availability in regulatory compliance workflows.
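A minimal bisection sketch with both safeguards: a sign-change check on the bracket and an iteration cap. The names `bisect_root`, `tol`, and `max_iter` are chosen for this example and are not taken from insurance_python:

```python
def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] by bisection, with an iteration limit."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) have the same sign; no bracketed root")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if f_mid == 0.0 or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid  # root lies in the left half
        else:
            lo, f_lo = mid, f_mid  # root lies in the right half
    raise RuntimeError(f"bisection did not converge within {max_iter} iterations")
```

Raising on non-convergence, rather than returning the last midpoint, keeps a failed calibration from being mistaken for a result.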
## `CW-INSURANCE-004` — Avoid bare except clauses that mask TypeErrors
**From**: finance-bp-065--pyliferisk · **Applicable to**: insurance-actuarial
Bare except clauses that catch all exceptions including TypeError and return default values (0 or None) mask fundamental parameter errors. Use specific exception handling and validate inputs upfront to fail fast with clear error messages.
## `CW-INSURANCE-005` — Preserve standard radix and extinction conventions in life tables
**From**: finance-bp-065--pyliferisk · **Applicable to**: insurance-actuarial
Life insurance calculations rely on industry-standard conventions: radix of 100000 at age 0 and lx[-1]=0 for complete extinction. Deviating from these conventions scales all derived quantities incorrectly and breaks interoperability with other actuarial systems.
## `CW-INSURANCE-006` — Ensure workflow step ordering and parameter consistency
**From**: finance-bp-063--chainladder-python, finance-bp-064--insurance_python · **Applicable to**: insurance-actuarial
Multi-step algorithms (triangle transformations, Smith-Wilson calibration) require strict step ordering: compute calibration vector before extrapolation, use consistent alpha values throughout. Violating workflow order produces undefined or mathematically inconsistent results.
## `CW-INSURANCE-007` — Validate probability bounds for confidence intervals
**From**: finance-bp-126--lifelines · **Applicable to**: insurance-actuarial
Confidence interval bounds must be constrained to [0,1] for probability estimates. Use fillna and formula constraints to ensure CI bounds remain valid probability ranges, preventing invalid statistical inference from actuarial models.
## `CW-INSURANCE-008` — Validate matrix properties before decomposition
**From**: finance-bp-065--pyliferisk, finance-bp-064--insurance_python · **Applicable to**: insurance-actuarial
Positive semi-definite matrices must be verified before Cholesky decomposition. Invalid matrices cause math domain errors or invalid correlated samples. Similarly, correlation coefficients must be validated to [-1,1] bounds before use in sqrt(1-rho²).
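For the two-variable case, the bounds check sits directly in front of the square root. This is a sketch with a hypothetical helper name; it covers only the scalar correlation case, not a full positive semi-definite test for larger matrices:

```python
import math


def correlate_pair(z1, z2, rho):
    """Turn two independent standard-normal draws into a pair with correlation rho."""
    if math.isnan(rho) or not -1.0 <= rho <= 1.0:
        raise ValueError(f"correlation must lie in [-1, 1], got {rho}")
    # Two-dimensional Cholesky factor of [[1, rho], [rho, 1]].
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2
```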
## `CW-INSURANCE-009` — Make defensive copies of input DataFrames
**From**: finance-bp-126--lifelines · **Applicable to**: insurance-actuarial
User-provided DataFrames should be copied before inplace modifications (.pop(), .drop()). This preserves user data integrity and prevents side effects from leaking into caller code, maintaining data isolation principles.
## `CW-INSURANCE-010` — Exclude incomplete diagonals from historical analysis
**From**: finance-bp-063--chainladder-python · **Applicable to**: insurance-actuarial
The latest diagonal in claims triangles contains incomplete development data from the current period. Excluding this diagonal via valuation_date filtering ensures development factors capture only completed, reliable historical patterns for unbiased IBNR estimation.
FILE:references/components/asset_type_system.md
# asset_type_system (3 classes)
## `mkAsset`
`asset_type_system/mkasset.py:0`
## `mkAssumpType`
`asset_type_system/mkassumptype.py:0`
## `Asset Classification`
`asset_type_system/asset-classification.py:0`
FILE:references/components/component_transformation.md
# component_transformation (7 classes)
## `mkDate`
`component_transformation/mkdate.py:0`
## `mkAsset`
`component_transformation/mkasset.py:0`
## `mkBndComp`
`component_transformation/mkbndcomp.py:0`
## `mkAction`
`component_transformation/mkaction.py:0`
## `mkWaterfall`
`component_transformation/mkwaterfall.py:0`
## `Asset Type Builder`
`component_transformation/asset-type-builder.py:0`
## `Waterfall Phase Tag`
`component_transformation/waterfall-phase-tag.py:0`
FILE:references/components/deal_definition.md
# deal_definition (4 classes)
## `Generic.__init__`
`deal_definition/generic-init.py:0`
## `SPV.__init__`
`deal_definition/spv-init.py:0`
## `mkDeal`
`deal_definition/mkdeal.py:0`
## `Deal Locale`
`deal_definition/deal-locale.py:0`
FILE:references/components/deal_execution_-api.md
# deal_execution_(api) (4 classes)
## `API.run`
`deal_execution_(api)/api-run.py:0`
## `API.connect`
`deal_execution_(api)/api-connect.py:0`
## `Run Mode`
`deal_execution_(api)/run-mode.py:0`
## `Response Locale`
`deal_execution_(api)/response-locale.py:0`
FILE:references/components/input_validation.md
# input_validation (2 classes)
## `vDate`
`input_validation/vdate.py:0`
## `isListOfDict`
`input_validation/islistofdict.py:0`
FILE:references/components/report_generation.md
# report_generation (2 classes)
## `toHtml`
`report_generation/tohtml.py:0`
## `Report Format`
`report_generation/report-format.py:0`
FILE:references/components/result_parsing.md
# result_parsing (4 classes)
## `Generic.read`
`result_parsing/generic-read.py:0`
## `SPV.read`
`result_parsing/spv-read.py:0`
## `readBondStmt`
`result_parsing/readbondstmt.py:0`
## `Response Locale`
`result_parsing/response-locale.py:0`
FILE:references/components/root_finding.md
# root_finding (3 classes)
## `mkTweak`
`root_finding/mktweak.py:0`
## `mkStop`
`root_finding/mkstop.py:0`
## `Target Metric`
`root_finding/target-metric.py:0`
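A-share quant lab: a one-stop workflow built on the zvt framework for data collection, factor research, and backtest execution. Covers 31 scenarios, including institutional holdings, financial statements, index constituents, and MACD/MA/volume timing. China A-shares only.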
---
name: a-stock-quant-lab
description: |-
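A-share quant lab: a one-stop workflow built on the zvt framework for data collection, factor research, and backtest execution.
Covers 31 scenarios, including institutional holdings, financial statements, index constituents, and MACD/MA/volume timing. China A-shares only.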
license: MIT-0
compatibility: Python 3.12+, uv package manager. Network access to eastmoney / joinquant / baostock / akshare for data fetch.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-009"
compiled_at: "2026-04-20T07:34:47.524525+00:00"
capability_markets: "cn-astock"
capability_activities: "data-sourcing, backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
openclaw:
emoji: "📈"
skillKey: a-stock-quant-lab
category: finance
primaryEnv: python
requires:
bins: ["python3", "uv"]
---
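# A-Share Quant Lab (a-stock-quant-lab)
> Say "follow institutional holdings" or "MACD backtest" and I will write and run the code on zvt directly, so you do not have to dig through the docs. US stock data quality is mediocre and not recommended.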
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (31 total)
### Actor Data Recorder (`UC-101`)
Collects institutional investor holdings and top 10 free float shareholders on a weekly schedule for tracking major player positions
**Triggers**: institutional investor, top holders, actor data
### Financial Statement Recorder (`UC-102`)
Collects fundamental financial data including balance sheets, income statements, and cash flow statements from eastmoney on a weekly basis
**Triggers**: financial statements, balance sheet, income statement
### Index Data Recorder (`UC-103`)
Collects index metadata, index compositions (SZ1000, SZ2000, growth, value indices), and daily index price data
**Triggers**: index data, index composition, SZ1000
For all **31** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (47 total)
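- **`AP-ZVT-183`**: An ex-rights adjustment factor of inf/NaN is multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API's rate limit is exceeded are swallowed, so data goes missing silently
- **`AP-ZVT-200`**: After a token expires, data queries return an empty DataFrame instead of raising an error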
All 47 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-009. Evidence verify ratio = 55.0% and audit fail total = 36. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full source of truth | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 47 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-009` blueprint at 2026-04-20T07:34:47.524525+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Index Data Recorder', 'Financial Statement Recorder', 'Actor Data Recorder', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **47**
## qlib (12)
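### `AP-QLIB-1930` — Backtest results independent of the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>
When several Qlib models reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model effectively uses the same set of prediction signals. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.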
Source: https://github.com/microsoft/qlib/issues/1930
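### `AP-QLIB-2090` — Dual configuration of fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.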
Source: https://github.com/microsoft/qlib/issues/2090
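The constraint fit_end_time <= train_end can be asserted before a dataset is built. The sketch below assumes both dates are available as ISO strings; `check_fit_window` is a hypothetical helper and not a Qlib function:

```python
from datetime import date


def check_fit_window(fit_end_time, train_end):
    """Raise if the normalizer's fit window extends past the training segment."""
    fit_end = date.fromisoformat(fit_end_time)
    train_end_date = date.fromisoformat(train_end)
    if fit_end > train_end_date:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end_date}; "
            "normalization statistics would include validation/test data"
        )
    return True
```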
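### `AP-QLIB-2036` — MACD factor formula error in the documentation: DEA is divided by CLOSE one extra time, making the dimensions inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA a dimension of 1/price. A MACD factor built from this documented formula differs markedly from the correct formula after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.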
Source: https://github.com/microsoft/qlib/issues/2036
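### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields should all be NaN, so the framework can recognize and skip those days during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price momentum factors show "false signals" during the suspension (the price does not change but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.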
Source: https://github.com/microsoft/qlib/issues/2184
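### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai and Shenzhen stock list. That function depends on the Eastmoney API, which often returns only part of the list rather than the full 5000+ stocks, and the function raises ValueError outright when the list is incomplete. Users who follow the documented steps end up with a financial dataset that covers only some stocks, and backtests based on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).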
Source: https://github.com/microsoft/qlib/issues/1892
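### `AP-QLIB-2097` — Whole-market instrument="all" hits OOM on a 32GB machine while CSI300 works fine <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the full feature matrix for the specified universe into memory at once. Memory use with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. A 32GB machine running the whole market hits OOM directly in the init_instance_by_config stage, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.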
Source: https://github.com/microsoft/qlib/issues/2097
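### `AP-QLIB-1974` — data_collector still calls the Eastmoney A-share API for the stock list even with --region US <sub>(medium)</sub>
During download_data, Qlib's Yahoo data collector calls the Eastmoney API (_get_eastmoney) to fetch the full stock list as a base regardless of the --region argument, then supplements it with Yahoo Finance data. On international networks the Eastmoney endpoint is unreachable, so a proxy or VPN is required even when the US region is specified. This implicit dependency has never been documented and is a typical design trap of A-share data infrastructure being treated as the global default.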
Source: https://github.com/microsoft/qlib/issues/1974
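### `AP-QLIB-1984` — The LightGBM model's label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label input, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.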
Source: https://github.com/microsoft/qlib/issues/1984
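### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler's processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.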
Source: https://github.com/microsoft/qlib/issues/1915
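### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails to access the _backend_args attribute during initialization (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is that D.features calls throw multi-level nested exceptions, and users cannot tell from the stack trace whether the problem lies in the parallel backend or in the data.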
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (11)
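### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, so the first period's signal is wrong <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with "the minute of the current tick" rather than "the end time of a complete N-minute period". Concretely, a tick at 09:59 triggers an incomplete 5-minute bar (which should not have been pushed until 10:04). If a strategy filters directly on datetime.minute % 5 in on_bar, that first bar happens to pass, but it holds less than a full period of data and produces a wrong entry signal when used for signal computation.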
Source: https://github.com/vnpy/vnpy/issues/3691
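A strategy-side guard is to compute the first complete window itself and ignore bars before it. The sketch below assumes windows are aligned to the hour and that `n_minutes` divides 60; `first_full_window` is a hypothetical helper, not part of vnpy:

```python
from datetime import datetime, timedelta


def first_full_window(first_tick, n_minutes):
    """Return (window_start, last_minute) of the first complete N-minute window.

    Assumes hour-aligned windows and that n_minutes divides 60.
    """
    minute_start = first_tick.replace(second=0, microsecond=0)
    offset = minute_start.minute % n_minutes
    if offset == 0 and first_tick == minute_start:
        window_start = minute_start  # data begins exactly on a boundary
    else:
        # Skip the partial window and start at the next boundary.
        window_start = minute_start + timedelta(minutes=n_minutes - offset)
    last_minute = window_start + timedelta(minutes=n_minutes - 1)
    return window_start, last_minute
```

With a first tick at 09:59:30 and `n_minutes=5`, the first complete window runs from 10:00 through the 10:04 minute, matching the case described above.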
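### `AP-VNPY-3705` — Orders priced beyond the limit are cancelled automatically by CTP, and vnpy logs nothing: a "silent failure" <sub>(high)</sub>
Under A-share/futures limit-up and limit-down price limits, an out-of-limit order is cancelled directly on the CTP side (OnRtnOrder statusMsg="50:已撤单被拒绝SHFE:价格跌破跌停板", roughly "50: cancelled, rejected by SHFE: price below the limit-down price") instead of triggering the OnRspOrderInsert rejection callback. vnpy's CTP Gateway logs rejections only in onRspOrderInsert and does not parse or distinguish the cancellation reasons carried by OnRtnOrder. If strategy developers rely on logs to monitor order failures, out-of-limit orders vanish silently, leaving live positions far from what was expected.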
Source: https://github.com/vnpy/vnpy/issues/3705
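### `AP-VNPY-3669` — Schema incompatibility between new and old DataFrames raises SchemaError when the Alpha module saves historical data incrementally <sub>(medium)</sub>
When vnpy's Alpha module saves bar data to Parquet files, it runs polars.concat directly on the newly downloaded data (which may contain Float64 columns) and the existing file (historical Int64 columns). polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (e.g. volume is an integer in some feeds and a float in others) and there is no schema alignment step before the concat. This affects every historical-data build flow that uses vnpy alpha for backtesting.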
Source: https://github.com/vnpy/vnpy/issues/3669
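### `AP-VNPY-3707` — C++ null-pointer crash on CTP Gateway logout: reconnecting or switching accounts terminates the process <sub>(high)</sub>
When vnpy_ctp calls close() to log out, the C++ MdApi/TdApi does not check for null pointers and has a fairly high chance of triggering a segmentation fault that crashes the whole Python process. Affected scenarios include frequent login/logout during strategy testing, switching between simulated and live accounts, and reconnecting after a server shutdown. The crash raises no Python exception and cannot be caught with try/except, making it one of the most dangerous stability traps in live trading.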
Source: https://github.com/vnpy/vnpy/issues/3707
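### `AP-VNPY-3685` — The spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running), so part of the backtest engine's logic does not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; it is a subtle trap where backtest results look correct but are actually incomplete.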
Source: https://github.com/vnpy/vnpy/issues/3685
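### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system/conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.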
Source: https://github.com/vnpy/vnpy/issues/3700
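### `AP-VNPY-3715` — An order object containing braces in a loguru format string triggers KeyError and crashes the logging system <sub>(high)</sub>
vnpy's engine.py uses an f-string to format order.__dict__ and passes the result to loguru via write_log. When a field name in the order (such as gateway_name) happens to match a loguru format placeholder, loguru parses it as a template variable and raises KeyError, crashing the whole logging thread. In live trading, a logging crash means all subsequent order/trade records are lost, which makes this a high-severity trap in production.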
Source: https://github.com/vnpy/vnpy/issues/3715
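The mechanism can be reproduced with the standard library alone. The sketch below uses `str.format` as a stand-in, assuming the logger treats the message as a `str.format`-style template as the description above implies; it does not call loguru, and `escape_braces` is a hypothetical helper:

```python
def escape_braces(text):
    """Double every brace so str.format-style templating leaves the text unchanged."""
    return text.replace("{", "{{").replace("}", "}}")


# A message built from a dict repr contains literal braces.
order_fields = {"gateway_name": "CTP", "volume": 1}
raw_message = f"order update: {order_fields}"

try:
    raw_message.format()  # braces are parsed as a replacement field
    raw_failed = False
except (KeyError, IndexError, ValueError):
    raw_failed = True

# After escaping, templating round-trips to the original text.
safe_message = escape_braces(raw_message).format()
```

Passing dynamic values as separate arguments instead of embedding them in the template avoids the problem without any escaping.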
## zipline (13)
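### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and tutorial charts lead users to misjudge strategy returns <sub>(high)</sub>
The Zipline tutorial uses an AAPL price chart as its demo, but the bundle stores raw, unadjusted prices rather than prices adjusted for splits and dividends. The charted history differs from adjusted market prices by roughly a factor of 4 (which the source attributes to Apple's stock splits), and users mistake the apparent fourfold price move for strategy return. The problem is worse for A-shares: the price jump across an ex-rights date shows up in unadjusted data as a huge "signal" and makes technical indicators report false breakouts on that date.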
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
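The effect of a split on raw prices can be shown with a few lines of arithmetic. The prices and the 4-for-1 ratio below are made-up illustration values, and the helper is a sketch rather than Zipline's adjustment code.

```python
def back_adjust_for_split(closes, split_index, ratio):
    """Divide every price before split_index by ratio so the series is continuous."""
    return [p / ratio if i < split_index else p for i, p in enumerate(closes)]


def simple_returns(prices):
    """Period-over-period simple returns."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


# Hypothetical closes; a 4-for-1 split takes effect at index 2.
raw = [400.0, 404.0, 101.5, 102.0]
adjusted = back_adjust_for_split(raw, split_index=2, ratio=4.0)

raw_jump = simple_returns(raw)[1]            # looks like a crash of about -75%
adjusted_move = simple_returns(adjusted)[1]  # the real move is under 1%
```

Any indicator fed the raw series sees the artificial jump as a price event.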
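### `AP-ZIPLINE-235` — Default fill at the signal bar's close understates live slippage and inflates backtest returns <sub>(high)</sub>
As reported in the linked issue, Zipline's default slippage model fills an order at the close of the same bar that triggered the signal (current-bar close fill). In live trading the order can only fill around the open of the next bar (next-bar execution). For A-share daily bars, backtesting at the close overstates daily return by roughly 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Configure the slippage model explicitly: VolumeShareSlippage with a sensible volume_limit, or FixedSlippage.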
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
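The gap between the two fill assumptions is easy to quantify on toy data. This sketch is independent of Zipline; the bars are hypothetical and `trade_return` is an illustrative helper.

```python
def trade_return(bars, signal_index, exit_index, fill="next_open"):
    """Return of a long trade whose signal is computed at the close of bars[signal_index]."""
    if fill == "same_close":
        entry = bars[signal_index]["close"]      # optimistic: fill on the signal bar
    elif fill == "next_open":
        entry = bars[signal_index + 1]["open"]   # realistic: fill on the next bar
    else:
        raise ValueError(f"unknown fill mode: {fill}")
    return bars[exit_index]["close"] / entry - 1.0


bars = [
    {"open": 10.00, "close": 10.00},  # signal fires at this close
    {"open": 10.20, "close": 10.30},  # price gaps up overnight
    {"open": 10.30, "close": 10.50},  # exit at this close
]

optimistic = trade_return(bars, 0, 2, fill="same_close")
realistic = trade_return(bars, 0, 2, fill="next_open")
```

Here the same-bar assumption books the overnight gap as profit that a live order could not have captured.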
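### `AP-ZIPLINE-190` — A non-trading-day start_session raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When a bundle is registered or an algorithm is run with a start_session that falls on a non-trading day (for example New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The message only shows the first session of the trading calendar and does not suggest moving the date to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that lands on a holiday (such as the day after the last session before Spring Festival) triggers the same error and is costly to debug.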
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
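Snapping a requested start date to the first session on or after it avoids the error. The sketch below works on a sorted list of session dates; the calendar excerpt is hypothetical, and in practice the sessions would come from your trading-calendar library.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions: list[date], day: date) -> date:
    """Return the earliest session that is not before `day` (sessions must be sorted)."""
    i = bisect_left(sessions, day)
    if i == len(sessions):
        raise ValueError(f"{day} is after the last known session {sessions[-1]}")
    return sessions[i]


# Hypothetical calendar excerpt around a holiday and a weekend.
sessions = [date(1998, 1, 2), date(1998, 1, 5), date(1998, 1, 6)]
start = first_session_on_or_after(sessions, date(1998, 1, 1))
```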
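### `AP-ZIPLINE-182` — An empty or None symbol in a CSV bundle violates a SQLite constraint and silently aborts the whole ingest <sub>(medium)</sub>
During ingest, Zipline's csvdir bundle parses each CSV file name into a symbol and writes it to the equity_symbol_mappings table. If a file name does not follow Zipline's convention (for example it contains Chinese characters or carries an exchange suffix such as .SH), the symbol is parsed as an empty string or None and triggers sqlite3.IntegrityError: NOT NULL constraint failed. The error fires near the end of ingest, the data already written is rolled back, and the whole bundle is unusable. This is common with A-share data named like 000001.SZ.csv; strip the suffix from file names beforehand.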
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/182
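The file-name preprocessing can be sketched as a pure function. The naming pattern (`six digits`.`SH|SZ|BJ`.csv) is an assumption about how the A-share files are named, and `plain_symbol_name` is a hypothetical helper, not part of Zipline.

```python
import re
from pathlib import Path

# Assumed naming pattern for A-share CSV files, e.g. 000001.SZ.csv
_A_SHARE_NAME = re.compile(r"^(?P<code>\d{6})\.(?P<exch>SH|SZ|BJ)\.csv$", re.IGNORECASE)


def plain_symbol_name(path: Path) -> str:
    """Map '000001.SZ.csv' to '000001.csv'; leave other names untouched."""
    match = _A_SHARE_NAME.match(path.name)
    return f"{match.group('code')}.csv" if match else path.name
```

Dropping the suffix can make two files collide when the same six-digit code exists on both exchanges, so check for duplicates before renaming.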
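### `AP-ZIPLINE-181` — A stale asset db makes Pipeline report "no assets traded" and sends users chasing the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records the start and end trading dates of every stock. With an old Quandl or custom bundle that has not been re-ingested, backtesting a newer date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if only the price data is refreshed after a new download and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks stay stale, Pipeline filtering quietly excludes those stocks, and the result carries survivorship bias.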
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
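### `AP-ZIPLINE-285` — week_start()/week_end() fail silently on custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last session of a week. On non-US calendars such as ASX or SSE, that logic is incompatible with the offset calculation built around the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case with the SSE calendar, in weeks that contain a long holiday such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the logs give the user no sign that the schedule did not fire.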
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
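### `AP-ZIPLINE-240` — Backtest dates must be UTC; a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline requires every timestamp to be a UTC-aware datetime internally. When the user passes a naive datetime (no timezone, for example pd.Timestamp('2020-01-01')), Zipline does not reject it at the entry point. Instead, AssertionError: Algorithm should have a utc datetime is raised deep inside algorithm execution, where the long stack makes it hard to locate. A-share developers importing data in local CST time hit this easily; call tz_localize('UTC') explicitly when registering the bundle.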
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
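For intraday timestamps recorded in local China time, labeling and converting are different operations. The standard-library sketch below converts rather than relabels; the fixed UTC+8 offset and the `to_utc` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), "CST")  # China Standard Time as a fixed UTC+8 offset


def to_utc(ts: datetime, assume_tz: timezone = CST) -> datetime:
    """Attach assume_tz to naive timestamps, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


close_local = datetime(2020, 1, 2, 15, 0)  # naive: 15:00 market close in local time
close_utc = to_utc(close_local)
```

Labeling the same naive value as UTC without conversion would keep the clock reading 15:00, which is eight hours off for intraday data; for daily sessions stamped at midnight the relabeling approach is the usual convention.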
## zvt (11)
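### `AP-ZVT-183` — inf/NaN adjustment factors go straight into multiplication, so price adjustment fails silently <sub>(high)</sub>
ZVT computes the forward-adjustment factor qfq_factor as the new/old price ratio. When old == 0 (a new listing's first day, or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. As a result the adjustment for the whole entity aborts and all of its later bars are lost, yet the main flow only logs an ERROR and continues, so users often do not know that a large number of stocks have corrupted data.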
Source: https://github.com/zvtvz/zvt/issues/183
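A defensive version of the ratio computation returns an explicit "no factor" value instead of inf, NaN or an exception. This is a sketch with hypothetical helper names, not ZVT's implementation.

```python
import math
from typing import Optional


def safe_adjust_factor(new_price: Optional[float], old_price: Optional[float]) -> Optional[float]:
    """Return new/old, or None when the ratio is undefined or not finite."""
    if new_price is None or old_price is None:
        return None  # unfilled suspension day or missing row
    if old_price == 0:
        return None  # would otherwise divide by zero
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def apply_factor(price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Multiply only when both operands are usable; propagate None otherwise."""
    if price is None or factor is None:
        return None
    return price * factor
```

Rows that come back as None can then be counted and reported per entity, so a damaged series is visible instead of silently truncated.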
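### `AP-ZVT-179` — Exceptions from a third-party data API over quota are swallowed, and data goes missing silently <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily query quota (the error reads 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so every stock processed after the quota is hit silently lacks that day's data. A backtest run on the incomplete database yields systematically biased factor values, with no warning.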
Source: https://github.com/zvtvz/zvt/issues/179
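### `AP-ZVT-200` — After a token expires, data queries return an empty DataFrame instead of an error <sub>(high)</sub>
When a JoinQuant or Eastmoney token expires, ZVT's record_data does not raise. It parses the error message returned by the API (for example "error: token无效", token invalid) as DataFrame column names and ends up with a zero-row table. The update logic then concludes there is no new data and skips the entity, so the database stops updating for a long time without a single error in the logs. Users only find out that the data has been stale for months when backtest results look wrong.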
Source: https://github.com/zvtvz/zvt/issues/200
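### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once exceeds SQLite's limit SQLITE_MAX_VARIABLE_NUMBER=999 (the default in SQLite builds before 3.32.0). Raising max_allowed_packet has no effect because it is a MySQL parameter; the root cause is SQLite's cap on bound variables. The correct fix is to query in batches, which early ZVT versions did not do.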
Source: https://github.com/zvtvz/zvt/issues/161
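Batching the IN clause keeps every statement under the variable limit. The sketch below uses the standard `sqlite3` module with an in-memory table; the table name, columns and `fetch_closes` helper are hypothetical and do not mirror ZVT's schema.

```python
import sqlite3


def fetch_closes(conn: sqlite3.Connection, entity_ids: list[str], batch_size: int = 500) -> list[tuple]:
    """Query in batches so no statement binds more than batch_size variables."""
    rows: list[tuple] = []
    for start in range(0, len(entity_ids), batch_size):
        batch = entity_ids[start:start + batch_size]
        placeholders = ",".join("?" for _ in batch)
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


# Hypothetical in-memory table with 5000 entities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT PRIMARY KEY, close REAL)")
ids = [f"stock_sz_{i:06d}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])
rows = fetch_closes(conn, ids)
```

A batch size of 500 stays below the older 999 limit with room for additional bound parameters such as date filters.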
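### `AP-ZVT-129` — Wildcard imports hide API changes between versions; enums such as AdjustType vanish without explanation <sub>(medium)</sub>
ZVT's documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors an enum (for example moving AdjustType into a submodule), the wildcard import no longer includes that symbol and an AttributeError follows. Users assume the installation is broken, when in fact a breaking API change between versions was not noted in the CHANGELOG and the wildcard import obscured where the symbol came from. Import enum classes explicitly.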
Source: https://github.com/zvtvz/zvt/issues/129
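### `AP-ZVT-187` — The backtest engine does not stop early on an empty data layer, causing a cascading null-reference crash <sub>(medium)</sub>
When ZVT Trader finds the data empty after load_data completes, it does not exit early. It passes the empty DataFrame to the selector, which sets off a chain of failing NoneType operations. The stack is deep and the root cause is hard to locate, so users assume the strategy logic is at fault. The actual cause is a misconfigured data time window (start/end outside the range covered by the database) that nothing validates effectively.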
Source: https://github.com/zvtvz/zvt/issues/187
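### `AP-ZVT-184` — After swapping in the sample historical data, a provider directory mismatch makes updates fail <sub>(low)</sub>
ZVT offers a downloadable snapshot of its historical database, but the documentation does not say that it must be placed in a specific zvt_home subdirectory that matches the provider name. When users put the data in the wrong directory and run record_data, the framework sees an empty local database, starts a full fetch from scratch, and runs into API quota or permission errors again. The implicit binding between database path and provider is a common blind spot.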
Source: https://github.com/zvtvz/zvt/issues/184
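### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) bar table makes factor values drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted) and Stock1dQfqKdata (forward-adjusted). Users who mix two of them when computing price momentum or moving-average factors (for example unadjusted prices for the moving average and backward-adjusted prices for returns) get jumps in factor values around ex-rights dates. ZVT does not check that adjustment types are consistent across tables, so the mix passes silently.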
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: zvt
**Scan date**: 2026-04-20
**Stats**: {'total_files': 325, 'total_classes': 424, 'total_functions': 571, 'total_business_decision_candidates': 147}
## Modules (14)
- [factors](components/factors.md): 54 classes
- [recorders](components/recorders.md): 90 classes
- [trader](components/trader.md): 22 classes
- [domain](components/domain.md): 114 classes
- [api](components/api.md): 2 classes
- [contract](components/contract.md): 53 classes
- [broker](components/broker.md): 6 classes
- [ml](components/ml.md): 5 classes
- [tag](components/tag.md): 42 classes
- [trading](components/trading.md): 19 classes
- [common](components/common.md): 9 classes
- [misc](components/misc.md): 2 classes
- [informer](components/informer.md): 4 classes
- [samples](components/samples.md): 2 classes
## Data Flow Hints (6)
- {'from': 'EntitySchema (contract/schema.py)', 'to': 'Recorder (contract/recorder.py)', 'how': 'Recorder.data_schema = EntitySchema; RecorderManager registers recorders per entity'}
- {'from': 'Recorder', 'to': 'Domain DB (SQLAlchemy models in domain/)', 'how': 'Recorder.run() calls schema.query_data() / session.add() via zvt storage layer'}
- {'from': 'Domain DB', 'to': 'Factor (contract/factor.py)', 'how': 'Factor.__init__ reads entity_schema; Factor.compute() loads data via TechnicalFactor/TransformerFactor'}
- {'from': 'Factor', 'to': 'TargetSelector (factors/target_selector.py)', 'how': 'TargetSelector aggregates multiple Factors; filters entities by score'}
- {'from': 'TargetSelector', 'to': 'Trader (trader/)', 'how': 'Trader consumes TargetSelector.run() result to make buy/sell decisions'}
- {'from': 'Trader', 'to': 'SimAccount / Broker', 'how': 'Trader places orders via Account.order(); SimAccount simulates fills'}
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 188
fatal_constraints_count: 34
non_fatal_constraints_count: 124
use_cases_count: 31
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (86)
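- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share stocks trade under the T+1 rule: shares bought on day T can be sold on T+1 at the earliest, while cash from a sale on day T can be used to buy again the same day. A backtest framework that does not lock positions for T+1 overstates turnover and win rate, and it especially undermines the realism of intraday reversal strategies.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: Shanghai and Shenzhen main-board stocks have a daily price limit of ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up there are no sellers to buy from, and when it is locked at limit-down there are no buyers to sell to; a backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: STAR Market and ChiNext stocks (the latter after the August 2020 reform) have a ±20% daily limit on normal trading days; Beijing Stock Exchange stocks have ±30%; newly listed stocks have no price limit for their first 5 trading days. A backtest that applies one ±10% filter to every stock wrongly drops or wrongly includes fills on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST and *ST stocks have a ±5% daily limit and very poor liquidity, so fill assumptions for normal stocks must not be applied to them. Leaving historical ST stocks (which eventually delisted) out of a backtest creates survivorship bias; including them without their separate price limit simulates fills incorrectly.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the trade price is set by the maximum-volume principle rather than by continuous matching. Assuming an immediate full fill at the open or close understates real slippage risk, most visibly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Suspensions: while an A-share is in a long suspension (which could last months before 2018), the capital in the position is locked and cannot be rebalanced, an opportunity cost that backtests generally ignore. Filter out suspension days before factor computation (volume == 0 or is_suspended == True) and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: Newly listed stocks have no price limit for their first 5 trading days (first-day gains can exceed 300%) and lack a full history, so moving-average, volatility and turnover factors cannot be computed. Before factor computation, filter out stocks listed for fewer than N trading days (typically 60-252).
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share program-trading rules (effective 7 July 2025): a single account that submits or cancels at least 300 orders per second, or at least 20,000 orders per day, is classified as high-frequency trading and must report to the exchange. An AI-generated quant strategy whose frequency exceeds these thresholds cannot run compliantly, so flag this at strategy design time.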
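- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a bookkeeping adjustment, not a real loss. Choice of adjustment: unadjusted prices inflate the strategy's losses; forward adjustment embeds future dividend information in historical prices (lookahead bias); backward adjustment accumulates from the listing date and is the best choice for quantitative backtesting.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: Financial reports of A-share listed companies are disclosed with statutory delays: annual reports by 30 April of the following year, semiannual reports by 31 August, and quarterly reports by 30 April (Q1) and 31 October (Q3). When a backtest uses financial data, the point at which the data becomes available must be the actual disclosure date (announcement_date), not the end of the accounting period; otherwise it introduces point-in-time lookahead bias.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Bonus shares, capitalization issues and rights issues increase share capital after the ex-date: the historical share count stays unchanged while the price shrinks proportionally. If the backtest system does not adjust the position's share count in step, it shows a spurious loss or gain on the ex-date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Block trades versus auction trades: on the main board a block trade may price up to 10% below the market price, but that price does not affect the next day's auction open. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal calculation of close price and volume.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly, the pool of lendable securities is limited (mainly large-cap blue chips; lending for small and mid caps is extremely scarce), and securities-lending rates are far higher than margin-loan rates. A backtest that assumes any stock can be shorted produces unexecutable strategies whose live results diverge sharply from the backtest.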
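- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30% with a warning line at 28%. Once foreign ownership reaches 28%, the Hong Kong exchange suspends new buy orders in that stock until the ratio falls back to 26%. A strategy that is heavy in stocks favored by foreign investors (consumer and healthcare leaders) needs to monitor the foreign-ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: The 5% disclosure rule: an investor who comes to hold more than 5% of a listed company's issued shares must report to the CSRC and the exchange and make an announcement within 3 days, and must not trade the stock during that period or within 2 days after the announcement. A quantitative stock-selection system that ignores this rule will be forced to stop buying once a heavy position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: The "double 10%" rule for mutual funds: a single fund may not hold more than 10% of its net assets in one stock, and all funds under one fund manager together may not hold more than 10% of a company's issued shares. A quant portfolio deployed in a mutual fund must enforce these compliance caps as optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: every input to an AI-assisted quant system must come from publicly disclosed information. Automated trading triggered through non-public channels (private data services, inside tips, advance knowledge of a restructuring) constitutes insider trading and falls under Articles 80-83 of the Securities Law and the Guidelines for the Recognition of Insider Trading Conduct.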
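- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using today's A-share index constituents (for example the current CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. Delistings accelerated during 2020-2024 (a record 41 per year), so this bias keeps growing. Use point-in-time historical snapshots.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index rebalancing effect: CSI 300, CSI 500 and similar indices rebalance twice a year (June and December). Stocks being added typically rise noticeably between the announcement date and the effective date as passive funds are forced to buy, and removed stocks do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing window.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, their holdings overlap heavily and they sell together in a market shock, creating a stampede. The A-share quant crisis of February 2024 is the textbook case (small-cap indices fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share quant hedging strategies commonly use IF/IC/IM stock-index futures to hedge systematic risk. These futures have long traded at a discount (futures price below spot), and the annualized IC discount can reach 10-20%. A backtest that only counts price return and ignores the futures discount or premium seriously overstates the hedged strategy's net return.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor in A-shares points the opposite way from US equities: the best performers of the past month are likely to reverse in the following month (a reversal effect, not momentum). Broker research (Huatai Securities, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares loses money systematically.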
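- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors, who tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange finds the effect in more than 90% of individual accounts. An AI assistant should not go along with the "hold the loser until it breaks even" instinct; it should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are dominated by retail investors (individual accounts generate more than 80% of trading volume) and herding is strong: retail investors follow the crowd, which causes irrational price swings (such as the leveraged boom and bust of 2015). Quant strategies should avoid using volume rankings, popularity rankings and similar indicators that may amplify herd signals as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: their average annual turnover exceeds 500%, and institutions significantly outperform them over the long run. High-turnover strategies often have lower net returns after trading costs. The AI should not encourage frequent trading; it should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (a tendency to rise in the 5 days before and 1-3 days after the holiday) and the turn-of-month effect (trading days 1-5 of a month outperform the middle and end of the month) have academic empirical support (for example from Nanjing University of Finance and Economics). Strategies should lower signal confidence in these calendar windows or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity: A-share small and micro caps trade only a few million yuan a day on average, so large buy or sell orders cause severe price impact and real strategy capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to capital in the hundreds of millions; add a cap on participation as a share of volume to the backtest.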
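- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share trading cost structure (after the August 2023 adjustment): stamp duty 0.05%, sell side only; commission about 0.01% on each side (minimum 5 yuan); transfer fee 0.001% (Shanghai); slippage and impact cost of 0.1%-0.5% per trade for small caps. Annualized returns from a backtest that ignores costs are misleading, most of all for high-frequency and high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact cost is usually missing entirely from backtests, yet it can be the largest cost in live trading. A 1 million yuan buy in an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size rather than a linear relation; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders, directors, supervisors and senior executives (CSRC Order No. 224, May 2024): a shareholder holding 5% or more who sells through centralized auction trading must disclose the reduction plan 15 trading days in advance and may not sell more than 1% of total shares within 3 months. Selling pressure from lock-up expiries is a systematic risk factor specific to A-shares, and a backtest that ignores the lock-up calendar understates the holding risk of the affected stocks.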
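- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the civil calendar: statutory holidays come with make-up working days (working Saturdays) on which the market stays closed, and trading can also be interrupted ad hoc (for example the circuit-breaker early closes on 4 and 7 January 2016). Deriving A-share trading days from a generic weekday calendar produces errors; use a dedicated A-share trading calendar (such as exchange_calendars or tushare's trading-day API).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After delisting, an A-share ticker may be reused by a new stock (rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or the listing date range, can confuse historical data with the current stock; take particular care in long-horizon backtests.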
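- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting with exponential-backoff retries: every external data API call must apply rate-limit control and retry with exponential backoff and jitter. Retrying immediately after a 429/503 response is an anti-pattern that adds load on the server and can trigger an IP ban. Use at most 3-5 retries, a backoff base of 1-2 seconds and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must bound concurrency (max_workers) and never run unbounded in parallel. Free APIs (akshare, the free tushare tier) usually allow 1-3 concurrent requests; paid APIs have concurrency caps too (tushare is points-based, with different caps per points level). Exceeding the limit triggers 429 responses or an IP ban. Control it explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token and credential safety: data-source API keys (a tushare token; akshare needs none, but other commercial sources do) must not be hard-coded and must be read from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can cost money.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should be separated by a minimum interval (some akshare endpoints require ≥ 0.5 s; the free tushare tier allows 200 calls per minute). A bare sleep in code is less precise than a token-bucket algorithm; prefer a mature library such as ratelimit or slowapi.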
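- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Handling missing data on suspension days: a suspended stock has no trades during the suspension, which leaves date gaps in the database. Do not forward-fill the missing dates (that creates fake volume); mark them with is_suspended=True, set volume and turnover to 0, and keep the previous close as the price. Factor computation must filter out rows with is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: History boundary for new listings: a new stock appears in the database from its first trading day and has no history before that. If a factor's lookback exceeds the number of days since listing, every factor value is NaN. Record each stock's listing date (list_date) during collection and start collection from that date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream sources (akshare, tushare) still serve the pre-delisting history of delisted stocks, but there is no data after the delisting date. A historical universe must include delisted stocks (otherwise it has survivorship bias), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data point (such as a close price) can differ slightly across sources (akshare, tushare, baostock) because of different adjustment methods, holiday handling or dividend-adjustment timing. Run a multi-source reconciliation check in the pipeline, and when the difference exceeds a threshold (for example 0.1%) log an alert and confirm manually.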
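- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: store timestamps in one data type (timestamp, not varchar or int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing and merges; convert at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time versus wall-clock time: the date of a daily bar is a trading day (day T), whereas the time of a news item or announcement is wall-clock time. When merging the two, map wall-clock time to the next available trading day; otherwise an announcement made on day T is treated as already usable during that day's session, which is a lookahead problem.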
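- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data update scripts must be idempotent (repeated runs give the same result). If a script fails midway because of a network interruption, rerunning it must not create duplicate rows or gaps. Implementation: write to a staging table first, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums and row counts): after every update, validate key fields. Is the row count within the expected range, are prices positive, are dates continuous (no missing trading days)? A data pipeline without automated checks is where silent rot starts.
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: the pipeline's output data should be versioned. When a source revises historical data (such as restated financials), keep the old version traceable instead of overwriting it silently, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that the data coverage of every stock or asset is complete and consistent with the trading calendar. Each stock should have one row on every trading day (flagged as suspended, not missing). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: cache static or rarely updated data that is read often (stock info, industry classification, index constituents) locally to avoid repeating API calls on every run. Every cache entry needs a time-to-live (TTL) so that stale industry classifications or outdated constituent lists are not used.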
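- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at that same close; (2) stamping an indicator computed after day T's close onto the same bar; (3) using the day's high or low as the assumed fill price. Signal computation and fill time must line up: compute the signal after the close of day T and fill at the open of T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warm-up period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. Require the indicator's warmup_period to equal the longest lookback, and keep positions at zero during warm-up.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order, TRAIN < VALID < TEST; random k-fold splitting is not allowed because it mixes future data into the training set. Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Fill assumptions at the open, high or low: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (for example a momentum strategy that "takes profit at the high") is plain lookahead, because the intraday high and low are only known after the close. The fill price may only be the open or the previous close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas rolling/shift and similar operations easily introduce subtle one-period shifts. Document the observation time of every series in code, and verify key time-alignment relations with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum in-sample Sharpe ratio grows with the number of trials, so the best configuration looks better the more configurations are tried. Replace exhaustive in-sample parameter search with walk-forward validation, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.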
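- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed or merged, and systematically overstates the strategy's historical return. The backtest universe must be a point-in-time universe.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be done in sample, and out-of-sample data is for final validation only. Do not keep tuning after repeated looks at out-of-sample data, which turns it into new in-sample data and repeats the overfitting.
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Filling suspended or missing data: do not simply forward-fill suspension-day prices with the previous close, because zero-volume days then take part in factor computation and signal generation. Filter out missing trading days explicitly in the factor layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain source errors (late ex-rights adjustments, extreme prices from manual entry errors). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry point.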
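- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty or transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not acceptable. Performance metrics from a backtest that ignores costs are misleading, and high-turnover strategies suffer most (round-trip costs often eat 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; for large orders use a volume-share model (for example no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must appear in the backtest performance report and be analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return becomes extremely sensitive to the cost assumption: every 10 bps change in cost can flip the strategy's profit-or-loss conclusion, so a cost sensitivity analysis is mandatory.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual whole-share holdings under a fixed amount of capital instead of assuming fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes achievable holdings deviate from target weights; simulate the rounding effect in the backtest.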
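- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timezone consistency: mixing UTC and local time when merging multiple sources is a common cause of data corruption. Convert every timestamp to one timezone at the pipeline entry point (storing in UTC and displaying in the market's local timezone is recommended), and never mix timezones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices with weekly factors), reindex/merge against an explicit trading calendar. Do not outer-join and then fillna, which creates fake rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: an incremental update must query the latest stored date from the database and download only data after that date. Re-downloading existing data and appending it creates duplicate timestamps and breaks the time order in the backtest. Check for duplicates before and after each update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta estimates. Use a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that cannot be bought directly (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark's holding return.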
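- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Differences of summed log returns are not percentage drawdowns (a log return of about -0.223 corresponds to a 20% loss), so reading them as percentages misstates drawdown depth.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) for equities (252 trading days) or × sqrt(365) for crypto (365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, whereas returns in A-shares and crypto are markedly left-skewed with fat tails, so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) rather than rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Performance attribution should decompose return into alpha (active return), beta (market return), factor-exposure return (style/sector) and idiosyncratic return (stock selection). Without attribution, a backtest cannot tell a good strategy from one that merely had the right beta in a favorable market.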
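- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 counts as preliminary evidence of predictive power and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing evidence for factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to find the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-horizon factor and does not suit monthly rebalancing, and vice versa. Using the wrong holding period seriously hurts live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has reported 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires a t-stat > 3.0 (not the traditional 1.96), or independent replication in different periods or markets, or a clear economic rationale. A factor that meets none of these is very likely a product of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover can have negative net IC after trading costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-horizon factor monthly lets much of the alpha dissipate within the holding period.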
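- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor return is mixed with industry-rotation return and it is hard to tell whether the factor itself or the industry exposure drove the result. The operation is factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates nearly every factor that is not neutralized. If a factor is highly correlated with market cap, selection tilts systematically toward small caps and the return comes from size exposure rather than the factor. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes that distort group analysis (such as Q1/Q10 deciles). Winsorize the raw values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into one score, combining highly correlated factors effectively overweights a single factor and dilutes signal diversity. Orthogonalize the factors with Gram-Schmidt or PCA before combining them to remove multicollinearity.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data filling: filling factor NaNs (suspensions, new listings, data gaps) with a mean that is computed using later data introduces lookahead bias (that mean contains future information), and dropping the rows altogether creates survivorship bias. The correct approach is to fill with the same-day cross-sectional median (the median across all stocks that day, which does not depend on the future) or to exclude the stock for that day.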
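- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: evaluate a factor primarily by the long-short spread (top minus bottom) between Q1/Q5 quintile or Q1/Q10 decile groups, not by the long-only return. The monotonicity test behind a long-Q1, short-Q5 portfolio is the core evidence of factor validity: monotonic increase or decrease > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds in only one market regime is not suitable for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: stocks ranked near the middle (49th-51st percentile) trigger rebalancing on small rank changes and generate a lot of wasted trading cost. Set a buffer zone in selection so that a stock is swapped only when its rank changes by more than a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile spread (Q1-Q5) can be large in historical data and still be due to chance; test its significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest samples (< 3 years) are especially unreliable.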
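- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying a US equity factor to A-shares, or an equity factor to futures or crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked fade as the market learns and arbitrages them away (McLean & Pontiff 2016 document an average decay of 58% after publication). Re-evaluate factor IC regularly (quarterly or yearly), and replace or down-weight factors that have stopped working.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Factor interaction with the macro environment: the interest-rate cycle, business cycle and market sentiment strongly affect factor validity. The value factor (low P/B) works better when rates are rising; the momentum factor works better in trending markets and fails in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the conditions in which the factor does best.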
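
As one worked instance of the constraints above, the cost structure in `SHARED-CN-ASTOCK-COST-001` can be turned into a small fee function. This is a minimal sketch: the rates are the ones quoted in that constraint, the function name and signature are hypothetical, slippage and impact are left out, and actual broker schedules may differ.

```python
def a_share_trade_cost(
    notional: float,
    side: str,
    exchange: str = "SH",
    commission_rate: float = 0.0001,     # about 0.01% per side
    min_commission: float = 5.0,         # 5 yuan floor per order
    stamp_duty_rate: float = 0.0005,     # 0.05%, sell side only
    transfer_fee_rate: float = 0.00001,  # 0.001%, applied to Shanghai here
) -> float:
    """Explicit fees in yuan for one order, excluding slippage and market impact."""
    if side not in ("buy", "sell"):
        raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")
    commission = max(notional * commission_rate, min_commission)
    stamp_duty = notional * stamp_duty_rate if side == "sell" else 0.0
    transfer_fee = notional * transfer_fee_rate if exchange == "SH" else 0.0
    return commission + stamp_duty + transfer_fee


buy_cost = a_share_trade_cost(100_000, "buy")    # commission 10 + transfer fee 1
sell_cost = a_share_trade_cost(100_000, "sell")  # adds 50 yuan of stamp duty
small_buy = a_share_trade_cost(10_000, "buy")    # the 5 yuan floor dominates
```

On small orders the commission floor raises the effective rate well above the headline 0.01%, which matters for strategies that split capital across many positions.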
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
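A lightweight check for this format can run before any query is issued. The pattern below is an assumption inferred from IDs that appear in this document (such as `stock_sh_600000` and `index_sz_399311`); it is not ZVT's own validator.

```python
import re
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


# Assumed shape: lowercase type, lowercase exchange, alphanumeric code.
_ENTITY_ID = re.compile(r"^([a-z]+)_([a-z]+)_([A-Za-z0-9]+)$")


def parse_entity_id(entity_id: str) -> EntityId:
    """Split an entity ID into its three parts or raise ValueError."""
    match = _ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not in entity_type_exchange_code form: {entity_id!r}")
    return EntityId(*match.groups())
```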
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
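The exactly-one rule is easy to enforce at construction time. The helper below is a hypothetical sketch that only reuses the three field names from this lock; it does not reproduce ZVT's TradingSignal class.

```python
from typing import Optional


def validate_sizing(
    position_pct: Optional[float] = None,
    order_money: Optional[float] = None,
    order_amount: Optional[float] = None,
) -> str:
    """Return the name of the single sizing field that was set, else raise."""
    given = {
        name: value
        for name, value in (
            ("position_pct", position_pct),
            ("order_money", order_money),
            ("order_amount", order_amount),
        )
        if value is not None
    }
    if len(given) != 1:
        raise ValueError(
            f"exactly one sizing field is required, got {sorted(given) or 'none'}"
        )
    return next(iter(given))
```

Comparing against `None` rather than truthiness keeps an explicit `0.0` distinguishable from an omitted field.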
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **41**
## `KUC-DR-001`
**Source**: `examples/data_runner/actor_runner.py`
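Every Wednesday at 01:00, batch-collect institutional investor holdings, top-ten free-float shareholders and shareholder summary data to support later quantitative analysis of changes in institutional holdings.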
**Inputs**:
- {'data_provider': 'em'}
- {'entity_provider': 'em'}
- {'cron_schedule': 'hour=1, minute=0, day_of_week=2'}
**Components**:
- StockInstitutionalInvestorHolder
- StockTopTenFreeHolder
- StockActorSummary
- run_data_recorder
- BackgroundScheduler
**Parameters**:
```
{'day_data': True, 'sleeping_time': None}
```
**Validation**:
```
After the run, StockActorSummary.query_data() returns a non-empty DataFrame whose timestamps are updated to the latest reporting period.
```
## `KUC-DR-002`
**Source**: `examples/data_runner/finance_runner.py`
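Every Friday at 01:00, sync the four financial tables (income statement, balance sheet, cash flow statement, financial factors) for the whole A-share market, keeping the local database in step with the Eastmoney source and providing raw data for fundamental stock selection.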
**Inputs**:
- {'data_provider': 'eastmoney'}
- {'entity_provider': 'eastmoney'}
- {'cron_schedule': 'hour=1, minute=0, day_of_week=5'}
**Components**:
- Stock
- StockDetail
- FinanceFactor
- BalanceSheet
- IncomeStatement
- CashFlowStatement
- run_data_recorder
- BackgroundScheduler
**Parameters**:
```
{'day_data': True}
```
**Validation**:
```
FinanceFactor.query_data(limit=5) returns records that include the latest reporting period, and CashFlowStatement.query_data() is not empty.
```
## `KUC-DR-003`
**Source**: `examples/data_runner/index_runner.py`
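Maintain basic information on A-share indices and their constituent lists (CNI 1000/2000/Growth/Value), and after 16:20 on each trading day sync daily bars for the important indices, providing base data for sector-rotation analysis.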
**Inputs**:
- {'data_provider': '"exchange" / "em"'}
- {'index_ids': ['index_sz_399311', 'index_sz_399303', 'index_sz_399370', 'index_sz_399371']}
- {'important_index_codes': 'IMPORTANT_INDEX constant'}
**Components**:
- Index
- Index1dKdata
- IndexStock
- run_data_recorder
- BackgroundScheduler
**Parameters**:
```
{'day_data': True, 'entity_provider': 'exchange'}
```
**Validation**:
```
Index1dKdata.query_data(codes=IMPORTANT_INDEX) returns close-price records for the current day.
```
## `KUC-DR-004`
**Source**: `examples/data_runner/joinquant_fund_runner.py`
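Every Saturday, collect mutual fund basic information, fund holding details and stock valuations (PE/PB) from JoinQuant to support analysis of heavily held fund positions and value-timing strategies.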
**Inputs**:
- {'data_provider': 'joinquant'}
- {'entity_provider': 'joinquant'}
**Components**:
- Fund
- FundStock
- StockValuation
- run_data_recorder
**Parameters**:
```
{'sleeping_time': 0, 'day_data': True}
```
**Validation**:
```
FundStock.query_data(limit=5) returns records containing the fund code and the held stock code.
```
## `KUC-DR-005`
**Source**: `examples/data_runner/joinquant_kdata_runner.py`
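After 15:30 on each trading day, collect A-share daily backward-adjusted bars and the trading calendar from JoinQuant, keeping the local historical price database complete as the full data basis for technical factor computation.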
**Inputs**:
- {'data_provider': 'joinquant'}
- {'entity_provider': 'joinquant'}
**Components**:
- Stock
- StockTradeDay
- Stock1dHfqKdata
- run_data_recorder
**Parameters**:
```
{'force_update': False, 'day_data': True, 'sleeping_time': 0}
```
**Validation**:
```
Stock1dHfqKdata.query_data(entity_id="stock_sz_000001", limit=5) returns the close price of the latest trading day.
```
## `KUC-DR-006`
**Source**: `examples/data_runner/kdata_runner.py`
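Main daily whole-market recording flow for A-shares and Hong Kong stocks: limit-up data, index bars, block bars (concept/industry), A-share backward-adjusted bars and Hong Kong (southbound Stock Connect) bars, plus an email notification for new blocks.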
**Inputs**:
- {'data_provider': 'em'}
- {'entity_provider': 'em'}
- {'sleeping_time': 0}
**Components**:
- LimitUpInfo
- Index
- Index1dKdata
- Block
- Block1dKdata
- Stock
- Stock1dHfqKdata
- Stockhk
- Stockhk1dHfqKdata
- get_entity_ids_by_filter
- EmailInformer
- run_data_recorder
**Parameters**:
```
{'ignore_delist': True, 'ignore_st': False, 'ignore_new_stock': False, 'return_unfinished': True, 'force_update': False}
```
**Validation**:
```
Stock1dHfqKdata.query_data(day_data=True) and LimitUpInfo.query_data() both have records for the current day, and the new-block notification email is received.
```
## `KUC-DR-007`
**Source**: `examples/data_runner/kdata_runner.py`
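Collect the limit-up reasons of limit-up stocks and count the recent hot limit-up themes (sorted by frequency), producing a theme heat ranking to assist short-term trade reviews.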
**Inputs**:
- {'days_ago': '20 / 5'}
- {'limit': 15}
**Components**:
- LimitUpInfo
- get_hot_topics
- EmailInformer
**Parameters**:
```
{'reason_split_char': '+'}
```
**Validation**:
```
get_hot_topics(days_ago=5) returns a non-empty dict whose keys are theme names and whose values are occurrence counts.
```
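The counting step in this use case can be sketched with `collections.Counter`, using the `reason_split_char` and `limit` values listed above. `count_topics` is a hypothetical stand-in written for illustration; it is not ZVT's `get_hot_topics`, and the sample reasons are made up.

```python
from collections import Counter


def count_topics(reasons: list[str], split_char: str = "+", limit: int = 15) -> dict[str, int]:
    """Count how often each theme appears across limit-up reason strings."""
    counter: Counter = Counter()
    for reason in reasons:
        for topic in reason.split(split_char):
            topic = topic.strip()
            if topic:
                counter[topic] += 1
    # most_common keeps the result ordered by frequency, highest first.
    return dict(counter.most_common(limit))


sample = ["AI+robotics", "AI", "robotics+chips+AI"]
hot = count_topics(sample)
```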
## `KUC-DR-008`
**Source**: `examples/data_runner/kdata_runner.py`
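Collect news headlines for the whole A-share market, group the stocks tied to each theme by configurable keywords, and distinguish long-running hot themes from newly hot and fading ones to assist thematic investment decisions.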
**Inputs**:
- {'hot_words_config': 'hot.json (topic: keyword list)'}
- {'days_ago': '20 / 5'}
- {'threshold': 3}
**Components**:
- StockNews
- run_data_recorder
- get_hot_topics
- group_stocks_by_topic
- EmailInformer
**Parameters**:
```
{'sleeping_time': 2, 'force_update': False}
```
**Validation**:
```
The report_hot_topics() email contains three sections, "一直热门" (always hot), "+++", and "---", and each is non-empty.
```
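One plausible way to derive the three email sections is a set comparison between the long-window and short-window topic lists. The sketch below assumes that reading: "+++" is taken to mean topics present only in the short window and "---" topics present only in the long window. The function name `compare_topics` and the sample data are hypothetical.

```python
def compare_topics(long_window, short_window):
    """Split topics into always-hot, newly hot (+++), and fading (---)."""
    long_set, short_set = set(long_window), set(short_window)
    return {
        "always_hot": sorted(long_set & short_set),  # hot in both windows
        "+++": sorted(short_set - long_set),         # assumed: newly hot
        "---": sorted(long_set - short_set),         # assumed: fading
    }


# Hypothetical top-topic counts for a 20-day and a 5-day window.
topics_20d = {"AI": 12, "robotics": 9, "lithium": 7}
topics_5d = {"AI": 4, "robotics": 3, "low-altitude economy": 3}
sections = compare_topics(topics_20d, topics_5d)
print(sections)
```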
## `KUC-DR-009`
**Source**: `examples/data_runner/sina_data_runner.py`
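Collects A-share block (concept/industry) metadata and block money flow from Sina, giving a money-flow view that complements the Eastmoney data source.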
**Inputs**:
- {'data_provider': 'sina'}
**Components**:
- Block
- BlockMoneyFlow
- run_data_recorder
**Parameters**:
```
{'day_data': True}
```
**Validation**:
```
BlockMoneyFlow.query_data(provider="sina", limit=5) returns records containing the main_net_inflow field.
```
## `KUC-DR-010`
**Source**: `examples/data_runner/trading_runner.py`
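At 18:00 on each trading day, collects the Dragon-Tiger list (龙虎榜), picks out well-known hot-money seats with a high win rate over the past year, then filters for stocks that those seats traded within the last 30 days and whose same-day turnover and turnover rate meet the thresholds, and emails the result for manual follow-up.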
**Inputs**:
- {'data_provider': 'em'}
- {'entity_provider': 'em'}
- {'look_back_days': 400}
- {'recent_days': 30}
- {'dep_rate_threshold': 5}
- {'turnover_threshold': 300000000}
- {'turnover_rate_threshold': 0.02}
**Components**:
- DragonAndTiger
- Stock1dHfqKdata
- get_big_players
- EmailInformer
- run_data_recorder
**Parameters**:
```
{'sleeping_time': 2, 'day_data': True}
```
**Validation**:
```
DragonAndTiger.query_data(limit=5) has records for the current day, and the email subject contains "report 龙虎榜".
```
## `KUC-FA-001`
**Source**: `examples/factors/boll_factor.py`
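Computes Bollinger Bands for individual A-share stocks, marks breakouts above the upper band or below the lower band, and plots price against the bands to support mean-reversion and trend-following strategies.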
**Inputs**:
- {'entity_ids': ['stock_sz_000338', 'stock_sh_601318']}
- {'provider': 'em'}
- {'start_timestamp': '2019-01-01'}
- {'data_level': '"1d" / "30m"'}
**Components**:
- BollTransformer (custom Transformer using ta.volatility.BollingerBands)
- BollFactor (extends TechnicalFactor)
- Stock1dHfqKdata / Stock30mHfqKdata
**Parameters**:
```
{'window': 20, 'window_dev': 2, 'output_columns': ['bb_bbm', 'bb_bbh', 'bb_bbl', 'bb_bbhi', 'bb_bbli', 'bb_bbw', 'bb_bbp'], 'filter_result': 'bb_bbli - bb_bbhi (1 = price at the lower band, -1 = price at the upper band)'}
```
**Validation**:
```
factor.draw(show=True) opens a chart with price plus the three Bollinger bands; factor.result_df holds three states: True/False/None.
```
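The band arithmetic and the `bb_bbli - bb_bbhi` signal can be sketched without the `ta` library. This is a simplified stand-in for `ta.volatility.BollingerBands`, not the BollTransformer itself: the function name `bollinger` is hypothetical, the population standard deviation is an assumption of the sketch, and only a subset of the output columns is computed.

```python
from statistics import fmean, pstdev


def bollinger(closes, window=20, window_dev=2):
    """Return one dict per bar; bars without a full window get None."""
    rows = []
    for i, close in enumerate(closes):
        if i + 1 < window:
            rows.append(None)
            continue
        chunk = closes[i + 1 - window : i + 1]
        mid = fmean(chunk)
        dev = pstdev(chunk)  # population std: an assumption of this sketch
        high = mid + window_dev * dev
        low = mid - window_dev * dev
        hi_flag = 1 if close > high else 0  # close above the upper band
        lo_flag = 1 if close < low else 0   # close below the lower band
        rows.append({
            "bb_bbm": mid, "bb_bbh": high, "bb_bbl": low,
            "bb_bbhi": hi_flag, "bb_bbli": lo_flag,
            # 1 = below the lower band, -1 = above the upper band, 0 = inside
            "filter_result": lo_flag - hi_flag,
        })
    return rows


# Flat prices, then one spike up and one spike down.
closes = [10.0] * 20 + [13.0, 7.0]
rows = bollinger(closes, window=20, window_dev=2)
print([r["filter_result"] for r in rows if r is not None])
```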
## `KUC-FA-002`
**Source**: `examples/factors/fundamental_selector.py`
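Screens for "core assets" on several fundamental dimensions: high ROE, strong cash flow, low financial leverage, growth, and low accounts receivable (receivables <= 30% of total current assets), yielding a stock pool for long-term value investing.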
**Inputs**:
- {'start_timestamp': '2016-01-01'}
- {'end_timestamp': 'current date string'}
- {'codes': 'null (all A-shares)'}
**Components**:
- FundamentalSelector (extends TargetSelector)
- GoodCompanyFactor (uses FinanceFactor data)
- GoodCompanyFactor (uses BalanceSheet with an accounts_receivable filter)
- BalanceSheet
**Parameters**:
```
{'provider': 'eastmoney', 'col_period_threshold': 'null (the second factor sets no column-period threshold)', 'accounts_receivable_max': '0.3 * total_current_assets'}
```
**Validation**:
```
selector.get_targets("2019-06-30") returns a non-empty entity_id list; a manual check finds typical high-ROE leaders (e.g., Kweichow Moutai, Gree Electric).
```
## `KUC-FA-003`
**Source**: `examples/factors/tech_factor.py`
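Combines MACD golden cross / bullish trend, bullish moving-average alignment (5/120/250-day), and turnover and turnover-rate filters to identify "bull stocks rising on heavy volume", providing entry signals for daily trend-following strategies.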
**Inputs**:
- {'entity_ids': 'mid- and large-cap stock pool (pre-filtered by get_middle_and_big_stock)'}
- {'start_timestamp': '2019-01-01'}
- {'adjust_type': 'AdjustType.hfq'}
**Components**:
- BullAndUpFactor (extends MacdFactor)
- CrossMaTransformer (windows=[5, 120, 250])
- MacdFactor
**Parameters**:
```
{'turnover_threshold': 400000000, 'turnover_rate_threshold': 0.02, 'ma_windows': [5, 120, 250]}
```
**Validation**:
```
factor.result_df["filter_result"] contains True entries; the report_bull() email lists the qualifying targets.
```
## `KUC-RP-001`
**Source**: `examples/reports/report_bull.py`
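At 18:00 on each trading day, automatically screens A-shares and blocks that meet the "bull stock" conditions (MACD golden cross, bullish trend, sufficient volume), emails the results by category, and syncs them to an Eastmoney watchlist group to support daily stock picking.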
**Inputs**:
- {'target_date': 'obtained automatically via get_latest_kdata_date()'}
- {'entity_ids': 'get_middle_and_big_stock(timestamp)'}
- {'adjust_type': 'AdjustType.hfq (stocks) / AdjustType.qfq (blocks)'}
**Components**:
- BullAndUpFactor
- report_targets
- get_middle_and_big_stock
- EmailInformer
**Parameters**:
```
{'turnover_threshold': 300000000, 'turnover_rate_threshold': 0.02, 'start_timestamp': '2019-01-01', 'em_group': 'bull股票', 'em_group_over_write': False, 'filter_by_volume': False}
```
**Validation**:
```
The email subject contains "bull股票" and the body contains stock codes; the Eastmoney "bull股票" group has the corresponding records.
```
## `KUC-RP-002`
**Source**: `examples/reports/report_core_compay.py`
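Every Saturday, screens core assets with the fundamental multi-factor model (FundamentalSelector), attaches changes in fund and QFII holding ratios, sends an email, and syncs the Eastmoney "core" watchlist group, providing weekly picks for long-term allocation.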
**Inputs**:
- {'start_timestamp': '2016-01-01'}
- {'end_timestamp': 'current date'}
- {'subscriber_emails': 'subscriber_emails.json file'}
**Components**:
- FundamentalSelector
- TargetSelector
- StockActorSummary
- get_entities
- add_to_eastmoney
- EmailInformer
**Parameters**:
```
{'actor_type': 'ActorType.raised_fund / ActorType.qfii', 'em_group': 'core'}
```
**Validation**:
```
The email contains the screening result (with institutional holding ratios); if there is no result, "no targets" is sent.
```
## `KUC-RP-003`
**Source**: `examples/reports/report_tops.py`
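At 17:00 each day, computes the strongest A-share stocks over the short term (highest recent gains) and the medium term; at 17:30, computes the strongest industry and concept blocks (ranked by N-day gain); also pushes the short- and medium-term strongest Hong Kong Southbound Connect stocks, supporting block-rotation and momentum strategies.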
**Inputs**:
- {'periods': 'short-term=[last N days], medium-term=[30,50]'}
- {'top_count': '10 (blocks)'}
- {'turnover_threshold': '100000000 (Hong Kong stocks)'}
**Components**:
- get_top_stocks
- report_top_entities
- get_top_performance_entities_by_periods
- Block
- BlockCategory
- inform
- EmailInformer
**Parameters**:
```
{'return_type': 'TopType.positive', 'ignore_new_stock': 'false / true', 'adjust_type': 'AdjustType.hfq / null', 'em_group_over_write': True}
```
**Validation**:
```
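The emails carry the subjects "短期最强" (short-term strongest), "中期最强" (medium-term strongest), "最强行业" (strongest industries), and "最强概念" (strongest concepts), each listing top_count targets.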
```
## `KUC-RP-004`
**Source**: `examples/reports/report_vol_up.py`
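Screens A-shares (split into large-cap and small-cap groups) and Hong Kong stocks that break above the half-year or one-year moving average on rising volume, identifying moving-average breakout patterns to support medium-term trend entries.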
**Inputs**:
- {'windows': [120, 250]}
- {'up_intervals': 60}
- {'over_mode': 'or'}
- {'turnover_threshold': '100000000 (Hong Kong stocks)'}
**Components**:
- VolumeUpMaFactor
- get_top_stocks (return_type="small_vol_up" / "big_vol_up")
- report_targets
- inform
- EmailInformer
**Parameters**:
```
{'adjust_type': 'AdjustType.hfq', 'start_timestamp': '2021-01-01', 'filter_by_volume': False}
```
**Validation**:
```
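The email title contains "放量突破(半)年线" (volume breakout above the half-year/one-year line), and targets are sent in two emails split by large and small market cap.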
```
## `KUC-RP-005`
**Source**: `examples/reports/__init__.py`
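Identifies financially risky stocks: declining revenue or profit, low current or quick ratio, high receivables plus high inventory plus high goodwill, or accounts receivable exceeding half of net profit; used to avoid high-risk names or to screen short candidates.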
**Inputs**:
- {'the_date': 'current date (default)'}
- {'income_yoy': '-0.1 (year-over-year decline threshold)'}
- {'profit_yoy': -0.1}
- {'entity_ids': 'null (all A-shares)'}
**Components**:
- FinanceFactor
- BalanceSheet
- IncomeStatement
- risky_company (custom function)
**Parameters**:
```
{'current_ratio_min': 0.7, 'quick_ratio_min': 0.5, 'start_offset_days': 130}
```
**Validation**:
```
risky_company() returns a list of high-risk stock codes; a manual check finds known financial blow-up stocks among them.
```
## `KUC-RS-001`
**Source**: `examples/research/dragon_and_tiger.py`
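Analyzes historical Dragon-Tiger list data to identify the hot-money seats ("big players") with the highest win rate over the past year (~400 days) and computes each seat's historical win rate for different holding periods (3/5/10 days), giving a statistical basis for seat-following strategies.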
**Inputs**:
- {'provider': 'em'}
- {'start_timestamp': 'date_time_by_interval(end_timestamp, -400)'}
- {'end_timestamp': 'date_time_by_interval(current_date(), -60)'}
- {'intervals': [3, 5, 10]}
**Components**:
- DragonAndTiger
- get_big_players
- get_player_success_rate
**Validation**:
```
get_player_success_rate() returns a DataFrame with seat names and win rates for several holding periods; well-known hot-money seats appear (e.g., "国泰君安证券股份有限公司上海江苏路证券营业部").
```
## `KUC-RS-002`
**Source**: `examples/research/top_dragon_tiger.py`
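For the top 30 gainers of each month, traces Dragon-Tiger list records during the month of the gain and counts which seats frequently take part in monthly strong stocks, revealing "smart money" institutional behavior patterns.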
**Inputs**:
- {'data_provider': 'em'}
- {'start_timestamp': '2021-01-01'}
- {'end_timestamp': '2022-01-01'}
**Components**:
- get_top_performance_by_month
- get_players
- DragonTigerFactor (extends TechnicalFactor, overlaying seat annotations)
**Parameters**:
```
{'direction': 'in', 'top_count_per_month': 30}
```
**Validation**:
```
top_dragon_and_tiger() returns the merged player_df, sorted by the two-level (entity_id, timestamp) index, in which recurring well-known seats are visible.
```
## `KUC-RS-003`
**Source**: `examples/research/top_tags.py`
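Measures the market-cap distribution of each month's top 30 gainers to test the "small-cap effect" hypothesis, and records the market cap and score of each monthly strong stock at that point in time, providing empirical evidence for designing stock-selection rules.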
**Inputs**:
- {'data_provider': 'em'}
- {'start_timestamp': '2020-01-01'}
- {'end_timestamp': '2021-01-01'}
**Components**:
- get_top_performance_by_month
- Stock1dHfqKdata
- top_tags (custom function)
**Parameters**:
```
{'list_days': 250}
```
**Validation**:
```
top_tags() returns a list of records with {entity_id, timestamp, cap, score}; the analysis checks the hypothesis that 90% of the market caps fall below 10 billion (100亿).
```
## `KUC-ML-001`
**Source**: `examples/ml/sgd.py`
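Uses an SGD classifier on moving-average features to predict the next-period price behavior of an A-share stock (up/down/sideways), trains and predicts through a standardization pipeline, and plots predictions against the actual K-line.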
**Inputs**:
- {'data_provider': 'em'}
- {'entity_ids': ['stock_sz_000001']}
- {'label_method': 'behavior_cls'}
**Components**:
- MaStockMLMachine
- SGDClassifier (sklearn)
- StandardScaler
- make_pipeline
**Parameters**:
```
{'max_iter': 1000, 'tol': '1e-3'}
```
**Validation**:
```
machine.draw_result(entity_id="stock_sz_000001") shows the prediction chart; accuracy can be assessed from the DataFrame returned by machine.predict().
```
## `KUC-ML-002`
**Source**: `examples/ml/sgd.py`
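Uses an SGD regressor on moving-average features to predict the next-period return of an A-share stock directly (a continuous value), for comparison with the classification mode and to gauge the predictive power of a linear model.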
**Inputs**:
- {'data_provider': 'em'}
- {'entity_ids': ['stock_sz_000001']}
- {'label_method': 'raw'}
**Components**:
- MaStockMLMachine
- SGDRegressor (sklearn)
- StandardScaler
- make_pipeline
**Parameters**:
```
{'max_iter': 1000, 'tol': '1e-3'}
```
**Validation**:
```
machine.draw_result(entity_id="stock_sz_000001") shows the regression prediction line; prediction error is assessed with MSE/MAE.
```
## `KUC-IN-001`
**Source**: `examples/intent/intent.py`
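Compares the relative performance of the Shanghai Composite and a US index since 2000 (normalized to a common base), showing long-run correlation and divergence between the Chinese and US markets to support macro timing. The source description names the Dow Jones, but the configured entity `indexus_us_SPX` is the S&P 500.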
**Inputs**:
- {'entity_ids': ['index_sh_000001', 'indexus_us_SPX']}
- {'start_timestamp': '2000-01-01'}
- {'scale_value': 100}
**Components**:
- Index
- Indexus
- Index1dKdata
- Indexus1dKdata
- compare
**Validation**:
```
compare() opens a line chart with the two series overlaid; the Y axis shows normalized values (base = 100).
```
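The common-base normalization behind `scale_value=100` amounts to dividing each series by its first value. The sketch below shows that arithmetic only. It is not zvt's `compare()`, the function name `rebase` is hypothetical, and the sample prices are made up.

```python
def rebase(values, scale_value=100):
    """Scale a series so that its first value equals scale_value."""
    if not values:
        return []
    base = values[0]
    if base == 0:
        raise ValueError("first value must be non-zero to rebase")
    return [v / base * scale_value for v in values]


# Made-up closing prices for two indices on the same dates.
index_a = [2000.0, 2200.0, 1800.0]
index_b = [1400.0, 1470.0, 1540.0]
rebased_a = rebase(index_a)
rebased_b = rebase(index_b)
print(rebased_a, rebased_b)
```

After rebasing, both series start at 100, so their later values read directly as percentage performance relative to the start date.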
## `KUC-IN-002`
**Source**: `examples/intent/intent.py`
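Compares US Treasury yields (2-year/5-year) with the US stock index (`indexus_us_SPX`, the S&P 500; the source description says Dow Jones) to test the "high rates suppress equities" hypothesis, supporting asset allocation across Fed policy cycles.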
**Inputs**:
- {'entity_ids': ['country_galaxy_US', 'indexus_us_SPX']}
- {'start_timestamp': '1990-01-01'}
**Components**:
- TreasuryYield
- Indexus1dKdata
- compare
**Parameters**:
```
{'scale_value': None, 'schema_map_columns': {'TreasuryYield': ['yield_2', 'yield_5'], 'Indexus1dKdata': ['close']}}
```
**Validation**:
```
compare() shows a multi-series line chart in which the inverse relationship between rates and the index is visible.
```
## `KUC-IN-003`
**Source**: `examples/intent/intent.py`
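Compares Jiangxi Copper stock with Shanghai copper futures (normalized) to test the "resource stocks track commodity prices" hypothesis and to spot divergences between the stock price and the commodity price.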
**Inputs**:
- {'entity_ids': ['stock_sh_600362', 'future_shfe_CU']}
- {'start_timestamp': '2005-01-01'}
- {'scale_value': 100}
**Components**:
- compare
**Validation**:
```
compare() shows two normalized lines in which the copper stock and copper futures move in close correlation.
```
## `KUC-IN-004`
**Source**: `examples/intent/intent.py`
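Compares the price trends of three industrial metals (copper, aluminum, rebar) to identify rotation and divergence among them, as a reference for cross-commodity arbitrage.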
**Inputs**:
- {'entity_ids': ['future_shfe_CU', 'future_shfe_AL', 'future_shfe_RB']}
- {'start_timestamp': '2009-04-01'}
- {'scale_value': 100}
**Components**:
- compare
**Validation**:
```
compare() shows three normalized lines in which periods of divergence are visible.
```
## `KUC-IN-005`
**Source**: `examples/intent/intent.py`
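Compares the Nasdaq, the S&P 500, and the US Dollar Index since 2015 to study how dollar strength weighs on US equities, supporting overseas asset allocation.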
**Inputs**:
- {'entity_ids': ['indexus_us_NDX', 'indexus_us_SPX', 'indexus_us_UDI']}
- {'start_timestamp': '2015-01-01'}
- {'scale_value': 100}
**Components**:
- Indexus1dKdata
- compare
**Parameters**:
```
{'schema_map_columns': {'Indexus1dKdata': ['close']}}
```
**Validation**:
```
compare() shows three lines for analyzing the divergence between the Nasdaq and the Dollar Index.
```
## `KUC-IN-006`
**Source**: `examples/intent/intent.py`
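Compares the USD/CNY exchange rate with the Shanghai Composite to study how currency depreciation affects A-share capital flows, supporting assessment of foreign-capital outflow risk.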
**Inputs**:
- {'entity_ids': ['index_sh_000001', 'currency_forex_USDCNYC']}
- {'start_timestamp': '2005-01-01'}
- {'scale_value': 100}
**Components**:
- Currency1dKdata
- Index1dKdata
- compare
**Parameters**:
```
{'schema_map_columns': {'Currency1dKdata': ['close'], 'Index1dKdata': ['close']}}
```
**Validation**:
```
compare() shows two lines in which the periodic inverse correlation between the exchange rate and the index is visible.
```
## `KUC-TR-001`
**Source**: `examples/trader/ma_trader.py`
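The simplest moving-average crossover backtest: buy when the 5-day MA crosses above the 10-day MA and sell when it crosses below. It measures the historical return of the dual-MA strategy on one or more A-share stocks and serves as the most basic trend-following baseline.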
**Inputs**:
- {'codes': ['000338']}
- {'level': 'IntervalLevel.LEVEL_1DAY'}
- {'start_timestamp': '2019-01-01'}
- {'end_timestamp': '2019-06-30'}
- {'windows': [5, 10]}
**Components**:
- CrossMaFactor
- MyMaTrader (extends StockTrader)
**Parameters**:
```
{'need_persist': False, 'trader_name': '000338_ma_trader'}
```
**Validation**:
```
After trader.run() completes, the zvt database holds trade records for trader_name; trader.draw_result() shows the net-value curve.
```
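The crossover rule itself (fast MA crossing above or below the slow MA) can be sketched independently of zvt's `CrossMaFactor`. The helper names `sma` and `cross_signals` are hypothetical, and the demo uses windows of 2 and 3 on made-up prices so the result is easy to trace by hand. The entry's own windows are `[5, 10]`.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def cross_signals(closes, short=5, long=10):
    """Return (bar_index, 'buy' | 'sell') for each moving-average crossover."""
    fast, slow = sma(closes, short), sma(closes, long)
    signals = []
    for i in range(1, len(closes)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue  # not enough history yet
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals.append((i, "buy"))   # fast MA crosses above slow MA
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals.append((i, "sell"))  # fast MA crosses below slow MA
    return signals


# Made-up prices: fall, rise, then fall again.
closes = [10, 9, 8, 7, 6, 7, 8, 9, 10, 9, 8, 7, 6]
signals = cross_signals(closes, short=2, long=3)
print(signals)
```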
## `KUC-TR-002`
**Source**: `examples/trader/ma_trader.py`
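Backtest of a combined strategy, MACD bull-market filter plus MA crossover: long positions are held only when the major trend is up (BullFactor), reducing the risk of going long in a bear market and testing whether the trend filter improves the strategy.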
**Inputs**:
- {'codes': ['000338']}
- {'level': 'IntervalLevel.LEVEL_1DAY'}
- {'start_timestamp': '2019-01-01'}
- {'adjust_type': 'hfq'}
**Components**:
- BullFactor
- MyBullTrader (extends StockTrader)
**Validation**:
```
trader.run() completes, and the net-value curve shows a smaller drawdown than the pure MA strategy.
```
## `KUC-TR-003`
**Source**: `examples/trader/macd_day_trader.py`
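A complete framework example of a daily MACD golden-cross strategy, showing how to override lifecycle methods in StockTrader: profit/stop-loss control (on_profit_control), open/close hooks (on_trading_open/close), and batch handling of trading signals (on_trading_signals).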
**Inputs**:
- {'start_timestamp': '2019-01-01'}
- {'end_timestamp': '2020-01-01'}
- {'provider': 'joinquant'}
- {'level': 'IntervalLevel.LEVEL_1DAY'}
**Components**:
- GoldCrossFactor
- MacdDayTrader (extends StockTrader)
**Parameters**:
```
{'start_offset_days': -50}
```
**Validation**:
```
trader.run() raises no error; each hook method is called correctly (verifiable by adding logging in the overrides).
```
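For readers unfamiliar with the signal, a MACD golden cross is the bar where the DIFF line (fast EMA minus slow EMA) moves above its own EMA (DEA). The sketch below is a plain-Python illustration under common 12/26/9 defaults, which are an assumption here. It is not zvt's `GoldCrossFactor`, and the names `ema` and `macd_gold_cross` are hypothetical.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_gold_cross(closes, fast=12, slow=26, n=9):
    """Return bar indices where DIFF crosses above DEA."""
    diff = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    dea = ema(diff, n)
    return [
        i for i in range(1, len(closes))
        if diff[i - 1] <= dea[i - 1] and diff[i] > dea[i]
    ]


# Made-up prices: 30 falling bars followed by 30 rising bars.
closes = [100 - i for i in range(30)] + [71 + 2 * i for i in range(30)]
crosses = macd_gold_cross(closes)
print(crosses)
```

On this series the only upward cross should appear shortly after the turn, once the rising fast EMA pulls DIFF above the lagging DEA.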
## `KUC-TR-004`
**Source**: `examples/trader/macd_week_and_day_trader.py`
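Dual-timeframe (weekly plus daily) MACD strategy: a long position is opened only when the weekly and daily charts both show a golden cross, reducing false signals and testing whether multi-timeframe confirmation improves signal quality.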
**Inputs**:
- {'start_timestamp': '2019-01-01'}
- {'end_timestamp': '2020-01-01'}
- {'provider': 'joinquant'}
**Components**:
- GoldCrossFactor (LEVEL_1WEEK)
- GoldCrossFactor (LEVEL_1DAY)
- MultipleLevelTrader (extends StockTrader)
**Parameters**:
```
{'on_targets_selected_from_levels': 'override to customize how multi-level selections are merged'}
```
**Validation**:
```
trader.run() completes; compared with the pure daily strategy, there are fewer trades but a higher win rate.
```
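The "both levels must agree" rule reduces to a set intersection over the targets selected at each level. The following is a minimal sketch of that merge, not the body of `on_targets_selected_from_levels`. The function name `merge_levels` and the level keys are hypothetical.

```python
def merge_levels(targets_by_level):
    """Keep only the entities selected at every level."""
    selections = [set(ids) for ids in targets_by_level.values()]
    if not selections:
        return set()
    return set.intersection(*selections)


# Hypothetical golden-cross selections per level.
targets = {
    "1wk": ["stock_sz_000338", "stock_sh_601318"],
    "1d": ["stock_sz_000338"],
}
long_targets = merge_levels(targets)
print(long_targets)
```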
## `KUC-TR-005`
**Source**: `examples/trader/dragon_and_tiger_trader.py`
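Tracks institutional seats on the Dragon-Tiger list: a buy signal is generated when the "机构专用" (institution-only) seat appears on the list for a stock, testing the historical effectiveness of a "follow the institutional seat" strategy.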
**Inputs**:
- {'start_timestamp': '2020-01-01'}
- {'end_timestamp': '2022-05-01'}
- {'provider': 'em'}
**Components**:
- DragonTigerFactor (extends Factor; data source is DragonAndTiger)
- MyTrader (extends StockTrader)
**Parameters**:
```
{'filter': 'DragonAndTiger.dep1 == "机构专用"'}
```
**Validation**:
```
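trader.run() completes, and the opening times in the trade records match the dates on which institutions appeared on the Dragon-Tiger list.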
```
## `KUC-TR-006`
**Source**: `examples/trader/follow_ii_trader.py`
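Goes long or closes positions following changes in public-fund holdings: buy when the quarterly reported fund holding ratio rises by more than 5%, sell when it falls by more than 50%; tests this "follow fund heavy-holding changes" logic on a single stock (Moutai).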
**Inputs**:
- {'code': '600519'}
- {'start_timestamp': '2002-01-01'}
- {'end_timestamp': '2021-01-01'}
- {'adjust_type': 'AdjustType.qfq'}
**Components**:
- FollowIITrader (extends StockTrader, overrides on_time)
- StockActorSummary
- Stock1dKdata
**Parameters**:
```
{'actor_type': 'ActorType.raised_fund', 'long_threshold': 0.05, 'short_threshold': -0.5, 'profit_threshold': None}
```
**Validation**:
```
trader.run() completes; the net-value curve should show a clearly positive return over Moutai's 2010-2021 rise.
```
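The two thresholds map onto a small decision function. The sketch assumes `change_ratio` is the quarter-over-quarter change in the fund holding ratio, which is this sketch's reading of the entry. The function name `holding_signal` is hypothetical and is not part of `FollowIITrader`.

```python
def holding_signal(change_ratio, long_threshold=0.05, short_threshold=-0.5):
    """Map a change in fund holding ratio to 'buy', 'sell', or 'hold'."""
    if change_ratio > long_threshold:
        return "buy"   # funds added more than 5%
    if change_ratio < short_threshold:
        return "sell"  # funds cut more than 50%
    return "hold"


# Made-up quarterly changes.
decisions = [holding_signal(c) for c in (0.08, 0.02, -0.6)]
print(decisions)
```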
## `KUC-TR-007`
**Source**: `examples/trader/keep_run_trader.py`
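Multi-factor portfolio strategy over rolling 40-day windows: every 40 days the stock pool is recomputed (the intersection of the top 30% by volume and the top 30% by institutional holdings), combined with a weekly bull-market check and a daily golden-cross entry, to test what dynamic pool selection adds to the combined strategy.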
**Inputs**:
- {'start': '2019-01-01'}
- {'end': '2021-01-01'}
- {'interval': 40}
- {'vol_pct': 0.3}
- {'ii_pct': 0.3}
**Components**:
- MultipleLevelTrader (BullFactor LEVEL_1WEEK + GoldCrossFactor LEVEL_1DAY)
- get_top_volume_entities
- get_top_fund_holding_stocks
- split_time_interval
- clear_trader
**Parameters**:
```
{'keep_history': True, 'draw_result': False, 'rich_mode': False, 'trader_name': 'keep_run_trader'}
```
**Validation**:
```
After all time windows have been processed, the zvt database holds the complete keep_run_trader trade history.
```
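The rolling-window loop depends on cutting the backtest range into consecutive 40-day pieces. Below is a stdlib sketch of such a splitter. It is a hypothetical stand-in named `split_interval`, not zvt's `split_time_interval`, whose exact boundaries and return type may differ.

```python
from datetime import date, timedelta


def split_interval(start, end, days=40):
    """Yield consecutive (window_start, window_end) pairs covering [start, end]."""
    cursor = start
    while cursor <= end:
        # Each window spans `days` calendar days; the last one may be shorter.
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


windows = list(split_interval(date(2019, 1, 1), date(2021, 1, 1), days=40))
print(len(windows), windows[0], windows[-1])
```

In a loop of this shape, the stock pool would be recomputed at the start of each window and the trader run once per window with `keep_history` enabled.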
## `KUC-QU-001`
**Source**: `examples/query_snippet.py`
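Demonstrates JSON-field queries on StockTags: uses the SQLite JSON_EXTRACT function to filter A-shares precisely by sub-tag (e.g., "低空经济", low-altitude economy), enabling efficient retrieval when tags are stored as multi-valued JSON.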
**Inputs**:
- {'sub_tag': '低空经济'}
**Components**:
- StockTags
- func.json_extract (SQLAlchemy)
**Parameters**:
```
{'json_path': '$."{tag}"'}
```
**Validation**:
```
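query_json() returns a DataFrame of stocks carrying the sub-tag; the row count is close to the number of constituents in Eastmoney's "低空经济" (low-altitude economy) block.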
```
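The `$."{tag}"` path can be tried directly against SQLite with the standard-library `sqlite3` module. The table `stock_tags` and its rows below are made up for illustration and do not reproduce the StockTags schema. The sketch also assumes the SQLite build includes the JSON functions, which recent builds do by default.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one JSON object of sub-tags per stock.
conn.execute("CREATE TABLE stock_tags (code TEXT, sub_tags TEXT)")
rows = [
    ("000001", json.dumps({"低空经济": "drone supplier"}, ensure_ascii=False)),
    ("000002", json.dumps({"新能源": "battery maker"}, ensure_ascii=False)),
]
conn.executemany("INSERT INTO stock_tags VALUES (?, ?)", rows)

tag = "低空经济"
path = f'$."{tag}"'  # quoted key, as in the json_path parameter above
matched = conn.execute(
    "SELECT code FROM stock_tags WHERE json_extract(sub_tags, ?) IS NOT NULL",
    (path,),
).fetchall()
print(matched)
```

Quoting the key inside the path keeps the lookup exact even when a tag name contains characters that a bare `$.key` path would not accept.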
## `KUC-QU-002`
**Source**: `examples/query_snippet.py`
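Quickly finds coverage gaps in the current tag library: lists the codes of currently listed A-shares that have no tags yet, providing automatic gap discovery for tag maintenance.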
**Inputs**:
- {'provider': 'em'}
- {'ignore_delist': True}
- {'ignore_st': True}
**Components**:
- StockTags
- get_entity_ids_by_filter
**Validation**:
```
get_stocks_without_tag() returns a non-empty list; every code can be found on Eastmoney and has no tag record.
```
## `KUC-QU-003`
**Source**: `examples/tag_utils.py`
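Automatically generates default industry tags for a batch of A-share stocks (via an industry-block-to-topic mapping table), solving the cold-start problem of missing tags when many new stocks are added.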
**Inputs**:
- {'codes': 'list of untagged stock codes'}
- {'provider': 'em'}
**Components**:
- BlockStock
- Block
- industry_to_tag (industry-to-topic mapping function)
- build_default_tags
**Parameters**:
```
{'block_category': 'industry'}
```
**Validation**:
```
build_default_tags(codes) returns a list of dicts with code/name/tag/reason fields; stocks without industry information trigger a printed warning.
```
## `KUC-QU-004`
**Source**: `examples/utils.py`
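Groups the selected stocks by news headline according to the hot-word config (hot.json), assigning them to topics (e.g., "华为" (Huawei), "新能源" (new energy)) so a reviewer can quickly see the hot-topic context of the selection.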
**Inputs**:
- {'entities': 'list of selected stocks'}
- {'hot_words_config': 'hot.json (topic: keyword list)'}
- {'days_ago': 60}
- {'threshold': 3}
**Components**:
- StockNews
- group_stocks_by_topic
- msg_group_stocks_by_topic
**Parameters**:
```
{'entity_ids': 'extracted automatically from entities'}
```
**Validation**:
```
msg_group_stocks_by_topic() returns a string grouped by topic, in the format "^^^^^^ topic(N) ^^^^^^".
```
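A keyword-based grouping of this kind can be sketched as follows. The sketch assumes `threshold` is the minimum number of matching headlines a stock needs to join a topic, which is an interpretation rather than something the entry states. The names `group_stocks` and `format_groups` and the sample data are hypothetical.

```python
def group_stocks(news_by_stock, hot_words, threshold=3):
    """Map each topic to the stocks whose headlines match it often enough."""
    grouped = {}
    for topic, keywords in hot_words.items():
        for stock, titles in news_by_stock.items():
            # Count headlines that mention at least one keyword of the topic.
            hits = sum(1 for title in titles if any(k in title for k in keywords))
            if hits >= threshold:
                grouped.setdefault(topic, []).append(stock)
    return grouped


def format_groups(grouped):
    """Render groups in the '^^^^^^ topic(N) ^^^^^^' style."""
    blocks = []
    for topic, stocks in grouped.items():
        blocks.append(f"^^^^^^ {topic}({len(stocks)}) ^^^^^^\n" + " ".join(stocks))
    return "\n".join(blocks)


hot_words = {"new energy": ["battery", "solar"]}
news = {
    "stock_a": ["battery plant opens", "solar order won", "battery output up"],
    "stock_b": ["quarterly results", "solar pilot"],
}
message = format_groups(group_stocks(news, hot_words, threshold=3))
print(message)
```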
## `KUC-QU-005`
**Source**: `examples/migration.py`
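Demonstrates registering a custom data schema with zvt using Pydantic plus the SQLAlchemy Mixin, letting teams extend the local database with custom entities and JSON fields without modifying the framework core.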
**Inputs**:
- {'custom_schema': 'User (with added_col: String, json_col: JSON)'}
- {'db_name': 'test'}
- {'providers': ['zvt']}
**Components**:
- Mixin
- register_schema
- get_db_session
- UserModel (Pydantic BaseModel)
**Parameters**:
```
{'declarative_base': 'ZvtInfoBase'}
```
**Validation**:
```
UserModel.validate(user) raises no error, showing that the SQLAlchemy ORM object converts cleanly to the Pydantic model.
```
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **13**
## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as the single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies over multiple data sources. zvt's StockTrader currently binds one set of factors per instance and lacks a unified layer for orchestrating multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.
## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; with this pattern, a call such as cerebro.addanalyzer(SharpeRatio) would yield a risk-adjusted return report.
## `CW-BT-003` — Sizer: separating position sizing
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what fraction to buy per entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.
## `CW-BT-004` — Full set of Order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading scenarios, richer order types would make backtests considerably more realistic.
## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-level data (e.g., 1 min) into a higher level (e.g., 1 day) on the fly while driving the strategy in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks runtime resampling; borrowing this pattern would support backtests over arbitrary combinations of time granularity without recording the data again.
## `CW-VN-001` — Event-driven engine (EventEngine)
**From**: vnpy · **Applicable to**: live-trading
The core of vnpy is an asynchronous event bus (EventEngine): quote pushes, order updates, and fill notifications all flow between Apps and Gateways as event messages, which naturally lets live trading and backtesting share one code path. zvt's data flow is currently synchronous batch pulling and has no event-driven architecture; when connecting to live quote pushes (e.g., a WebSocket tick stream), an event-driven model could reduce latency substantially.
## `CW-VN-002` — Gateway: unified interface across exchanges
**From**: vnpy · **Applicable to**: live-trading
vnpy's Gateway layer wraps dozens of trading interfaces, such as CTP futures, XTP securities, and IB, behind one interface; the strategy layer calls only generic buy/sell/cancel methods and need not know the underlying protocols. zvt's data recording currently depends on specific providers (em/joinquant), and there is no unified live-trading Gateway; a Gateway abstraction would let zvt's factor and selection logic connect directly to live order placement.
## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a GUI that directly shows the strategy's net-value curve, maximum drawdown, daily P&L, and trade details, with no Jupyter Notebook required. zvt currently visualizes backtest results through the draw_result method, which calls Plotly, but has no unified backtest report page; borrowing this pattern, an out-of-the-box strategy performance dashboard could be packaged.
## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.
## `CW-VN-005` — Spread trading module
**From**: vnpy · **Applicable to**: live-trading
vnpy supports user-defined spreads (e.g., futures calendar-spread arbitrage, A-share versus H-share premium arbitrage), computing spread quotes in real time and automatically triggering spread-strategy orders. zvt's compare() currently only visualizes comparisons and lacks spread-signal computation and trade execution; borrowing the spread module would extend zvt to statistical arbitrage scenarios (e.g., the AH premium, index versus constituent arbitrage).
## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently uses the reporting period as the timestamp of financial data and lacks a "publication date" dimension, so stock selection can be biased by financial reports that were not yet published; introducing a PIT model would make backtests considerably more credible.
## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of each training run and supports cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous result; borrowing the Recorder pattern would persist the parameters and results of each factor experiment, supporting quick reproduction and version comparison.
## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently yet run together, enabling intraday execution algorithms (e.g., TWAP or VWAP rebalancing). zvt backtests currently make only daily-level fill assumptions and do not model execution algorithms; borrowing the nested framework would let zvt separate two questions: which stocks to hold and when, and how to build the position at minimum impact cost.
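To make the execution-layer idea in `CW-QL-003` concrete, the sketch below splits one daily parent order into near-equal child orders, which is the core of a TWAP schedule. It is a toy illustration, not qlib's or zvt's API. The function name `twap_slices` and the 100-share lot size are assumptions of the sketch.

```python
def twap_slices(total_shares, n_slices, lot_size=100):
    """Split a parent order into near-equal child orders in whole lots."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    lots = total_shares // lot_size  # shares below one full lot are dropped
    base, extra = divmod(lots, n_slices)
    # The first `extra` slices carry one additional lot each.
    return [(base + (1 if i < extra else 0)) * lot_size for i in range(n_slices)]


# The daily decision layer wants 1,000 shares; execute in 3 time slices.
child_orders = twap_slices(1000, 3)
print(child_orders)
```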
FILE:references/components/api.md
# api (2 classes)
## `WindowMethod`
`api/stats.py:28`
## `TopType`
`api/stats.py:34`
FILE:references/components/broker.md
# broker (6 classes)
## `QmtContext`
`broker/qmt/context.py:8`
## `TraderError`
`broker/qmt/errors.py:2`
> Base class for exceptions in this module.
## `QmtError`
`broker/qmt/errors.py:8`
## `PositionOverflowError`
`broker/qmt/errors.py:13`
## `MyXtQuantTraderCallback`
`broker/qmt/qmt_account.py:26`
## `QmtStockAccount`
`broker/qmt/qmt_account.py:107`
FILE:references/components/common.md
# common (9 classes)
## `OrderByType`
`common/query_models.py:9`
## `TimeUnit`
`common/query_models.py:14`
## `AbsoluteTimeRange`
`common/query_models.py:23`
## `RelativeTimeRage`
`common/query_models.py:28`
## `TimeRange`
`common/query_models.py:33`
## `PositionType`
`common/trading_models.py:8`
## `BuyParameter`
`common/trading_models.py:17`
## `SellParameter`
`common/trading_models.py:25`
## `TradingResult`
`common/trading_models.py:30`
FILE:references/components/contract.md
# contract (53 classes)
## `IntervalLevel`
`contract/__init__.py:5`
> Repeated fixed time interval, e.g, 5m, 1d.
## `AdjustType`
`contract/__init__.py:121`
> split-adjusted type for :class:`~.zvt.contract.schema.TradableEntity` quotes
## `ActorType`
`contract/__init__.py:138`
## `TradableType`
`contract/__init__.py:159`
## `Exchange`
`contract/__init__.py:203`
## `StatefulService`
`contract/base_service.py:10`
> Base service with state could be stored in state_schema
## `OneStateService`
`contract/base_service.py:65`
> StatefulService which saving all states in one object
## `EntityStateService`
`contract/base_service.py:87`
> StatefulService which saving one state one entity
## `Registry`
`contract/context.py:13`
> Class storing zvt registering meta
## `Bean`
`contract/data_type.py:4`
## `ChartType`
`contract/drawer.py:20`
> Chart type enum
## `Rect`
`contract/drawer.py:45`
> rect struct with left-bottom(x0, y0), right-top(x1, y1)
## `Draw`
`contract/drawer.py:61`
## `Drawable`
`contract/drawer.py:231`
## `StackedDrawer`
`contract/drawer.py:296`
## `Drawer`
`contract/drawer.py:407`
## `TargetType`
`contract/factor.py:22`
## `Indicator`
`contract/factor.py:28`
## `Transformer`
`contract/factor.py:34`
## `Accumulator`
`contract/factor.py:82`
## `Scorer`
`contract/factor.py:163`
## `FactorMeta`
`contract/factor.py:181`
## `Factor`
`contract/factor.py:188`
## `ScoreFactor`
`contract/factor.py:667`
## `CustomModel`
`contract/model.py:7`
## `MixinModel`
`contract/model.py:11`
## `NormalData`
`contract/normal_data.py:6`
## `DataListener`
`contract/reader.py:16`
## `DataReader`
`contract/reader.py:40`
## `Meta`
`contract/recorder.py:71`
## `Recorder`
`contract/recorder.py:91`
## `EntityEventRecorder`
`contract/recorder.py:147`
## `TimeSeriesDataRecorder`
`contract/recorder.py:245`
## `FixedCycleDataRecorder`
`contract/recorder.py:612`
## `TimestampsDataRecorder`
`contract/recorder.py:712`
## `RouteRegistry`
`contract/route_registry.py:28`
> Maps (provider, db_name) or (provider, data_schema) to storage_id.
## `Mixin`
`contract/schema.py:34`
> Base class of schema.
## `NormalMixin`
`contract/schema.py:326`
## `Entity`
`contract/schema.py:333`
## `TradableEntity`
`contract/schema.py:348`
> tradable entity
## `ActorEntity`
`contract/schema.py:534`
## `NormalEntityMixin`
`contract/schema.py:538`
## `Portfolio`
`contract/schema.py:545`
> composition of tradable entities
## `PortfolioStock`
`contract/schema.py:580`
## `PortfolioStockHistory`
`contract/schema.py:596`
## `TradableMeetActor`
`contract/schema.py:613`
## `ActorMeetTradable`
`contract/schema.py:626`
## `StorageBackend`
`contract/storage.py:38`
> Abstract storage backend. Decouples physical storage from domain/read/record logic.
## `SqliteStorageBackend`
`contract/storage.py:65`
> SQLite storage backend. Default path: {data_path}/{provider}/{provider}_{db_name}.db
## `StateMixin`
`contract/zvt_info.py:11`
## `RecorderState`
`contract/zvt_info.py:19`
> Schema for storing recorder state
## `TaggerState`
`contract/zvt_info.py:27`
> Schema for storing tagger state
## `FactorState`
`contract/zvt_info.py:35`
> Schema for storing factor state
FILE:references/components/domain.md
# domain (114 classes)
## `BlockCategory`
`domain/__init__.py:5`
## `IndexCategory`
`domain/__init__.py:14`
## `ReportPeriod`
`domain/__init__.py:48`
## `CompanyType`
`domain/__init__.py:59`
## `ActorMeta`
`domain/actor/actor_meta.py:12`
## `StockTopTenFreeHolder`
`domain/actor/stock_actor.py:11`
## `StockTopTenHolder`
`domain/actor/stock_actor.py:25`
## `StockInstitutionalInvestorHolder`
`domain/actor/stock_actor.py:39`
## `StockActorSummary`
`domain/actor/stock_actor.py:53`
## `LimitUpInfo`
`domain/emotion/emotion.py:11`
## `LimitDownInfo`
`domain/emotion/emotion.py:46`
## `Emotion`
`domain/emotion/emotion.py:63`
## `DividendFinancing`
`domain/fundamental/dividend_financing.py:11`
## `DividendDetail`
`domain/fundamental/dividend_financing.py:32`
## `SpoDetail`
`domain/fundamental/dividend_financing.py:49`
## `RightsIssueDetail`
`domain/fundamental/dividend_financing.py:60`
## `BalanceSheet`
`domain/fundamental/finance.py:11`
## `IncomeStatement`
`domain/fundamental/finance.py:460`
## `CashFlowStatement`
`domain/fundamental/finance.py:629`
## `FinanceFactor`
`domain/fundamental/finance.py:831`
## `ManagerTrading`
`domain/fundamental/trading.py:11`
## `HolderTrading`
`domain/fundamental/trading.py:37`
## `BigDealTrading`
`domain/fundamental/trading.py:53`
## `MarginTrading`
`domain/fundamental/trading.py:71`
## `DragonAndTiger`
`domain/fundamental/trading.py:91`
## `StockValuation`
`domain/fundamental/valuation.py:11`
## `EtfValuation`
`domain/fundamental/valuation.py:38`
## `Economy`
`domain/macro/macro.py:11`
## `TreasuryYield`
`domain/macro/monetary.py:11`
## `Block`
`domain/meta/block_meta.py:14`
## `BlockStock`
`domain/meta/block_meta.py:21`
## `Blockus`
`domain/meta/blockus_meta.py:14`
## `BlockusStockus`
`domain/meta/blockus_meta.py:21`
## `CBond`
`domain/meta/cbond_meta.py:13`
## `Country`
`domain/meta/country_meta.py:13`
## `Currency`
`domain/meta/currency_meta.py:12`
## `Etf`
`domain/meta/etf_meta.py:15`
## `EtfStock`
`domain/meta/etf_meta.py:26`
## `Fund`
`domain/meta/fund_meta.py:14`
## `FundStock`
`domain/meta/fund_meta.py:53`
## `Future`
`domain/meta/future_meta.py:11`
## `Index`
`domain/meta/index_meta.py:14`
## `IndexStock`
`domain/meta/index_meta.py:26`
## `Indexhk`
`domain/meta/indexhk_meta.py:14`
## `Indexus`
`domain/meta/indexus_meta.py:14`
## `Stock`
`domain/meta/stock_meta.py:14`
## `StockDetail`
`domain/meta/stock_meta.py:32`
## `Stockhk`
`domain/meta/stockhk_meta.py:13`
## `Stockus`
`domain/meta/stockus_meta.py:14`
## `HkHolder`
`domain/misc/holder.py:11`
## `TopTenTradableHolder`
`domain/misc/holder.py:29`
## `TopTenHolder`
`domain/misc/holder.py:52`
## `InstitutionalInvestorHolder`
`domain/misc/holder.py:75`
## `BlockMoneyFlow`
`domain/misc/money_flow.py:14`
## `StockMoneyFlow`
`domain/misc/money_flow.py:48`
## `IndexMoneyFlow`
`domain/misc/money_flow.py:82`
## `StockSummary`
`domain/misc/overall.py:14`
## `MarginTradingSummary`
`domain/misc/overall.py:33`
## `CrossMarketSummary`
`domain/misc/overall.py:56`
## `StockNews`
`domain/misc/stock_news.py:11`
## `StockHotTopic`
`domain/misc/stock_news.py:28`
## `KdataCommon`
`domain/quotes/__init__.py:7`
## `TickCommon`
`domain/quotes/__init__.py:33`
## `BlockKdataCommon`
`domain/quotes/__init__.py:60`
## `IndexKdataCommon`
`domain/quotes/__init__.py:64`
## `IndexhkKdataCommon`
`domain/quotes/__init__.py:68`
## `IndexusKdataCommon`
`domain/quotes/__init__.py:72`
## `EtfKdataCommon`
`domain/quotes/__init__.py:76`
## `StockKdataCommon`
`domain/quotes/__init__.py:83`
## `StockusKdataCommon`
`domain/quotes/__init__.py:90`
## `StockhkKdataCommon`
`domain/quotes/__init__.py:97`
## `FutureKdataCommon`
`domain/quotes/__init__.py:102`
## `CurrencyKdataCommon`
`domain/quotes/__init__.py:113`
## `Block1dKdata`
`domain/quotes/block/block_1d_kdata.py:11`
## `Block1monKdata`
`domain/quotes/block/block_1mon_kdata.py:11`
## `Block1wkKdata`
`domain/quotes/block/block_1wk_kdata.py:11`
## `Currency1dKdata`
`domain/quotes/currency/currency_1d_kdata.py:11`
## `Etf1dKdata`
`domain/quotes/etf/etf_1d_kdata.py:11`
## `Future1dKdata`
`domain/quotes/future/future_1d_kdata.py:11`
## `Index1dKdata`
`domain/quotes/index/index_1d_kdata.py:11`
## `Index1mKdata`
`domain/quotes/index/index_1m_kdata.py:12`
## `Index1wkKdata`
`domain/quotes/index/index_1wk_kdata.py:11`
## `Indexhk1dKdata`
`domain/quotes/indexhk/indexhk_1d_kdata.py:11`
## `Indexus1dKdata`
`domain/quotes/indexus/indexus_1d_kdata.py:11`
## `Stock15mHfqKdata`
`domain/quotes/stock/stock_15m_hfq_kdata.py:11`
## `Stock15mKdata`
`domain/quotes/stock/stock_15m_kdata.py:11`
## `Stock1dHfqKdata`
`domain/quotes/stock/stock_1d_hfq_kdata.py:11`
## `Stock1dKdata`
`domain/quotes/stock/stock_1d_kdata.py:11`
## `Stock1hHfqKdata`
`domain/quotes/stock/stock_1h_hfq_kdata.py:11`
## `Stock1hKdata`
`domain/quotes/stock/stock_1h_kdata.py:11`
## `Stock1mHfqKdata`
`domain/quotes/stock/stock_1m_hfq_kdata.py:11`
## `Stock1mKdata`
`domain/quotes/stock/stock_1m_kdata.py:11`
## `Stock1monHfqKdata`
`domain/quotes/stock/stock_1mon_hfq_kdata.py:11`
## `Stock1monKdata`
`domain/quotes/stock/stock_1mon_kdata.py:11`
## `Stock1wkHfqKdata`
`domain/quotes/stock/stock_1wk_hfq_kdata.py:11`
## `Stock1wkKdata`
`domain/quotes/stock/stock_1wk_kdata.py:11`
## `Stock30mHfqKdata`
`domain/quotes/stock/stock_30m_hfq_kdata.py:11`
## `Stock30mKdata`
`domain/quotes/stock/stock_30m_kdata.py:11`
## `Stock4hHfqKdata`
`domain/quotes/stock/stock_4h_hfq_kdata.py:11`
## `Stock4hKdata`
`domain/quotes/stock/stock_4h_kdata.py:11`
## `Stock5mHfqKdata`
`domain/quotes/stock/stock_5m_hfq_kdata.py:11`
## `Stock5mKdata`
`domain/quotes/stock/stock_5m_kdata.py:11`
## `StockQuote`
`domain/quotes/stock/stock_quote.py:12`
## `Stock1mQuote`
`domain/quotes/stock/stock_quote.py:32`
## `StockQuoteLog`
`domain/quotes/stock/stock_quote_log.py:11`
## `Stockhk1dHfqKdata`
`domain/quotes/stockhk/stockhk_1d_hfq_kdata.py:11`
## `Stockhk1dKdata`
`domain/quotes/stockhk/stockhk_1d_kdata.py:11`
## `StockhkQuote`
`domain/quotes/stockhk/stockhk_quote.py:12`
## `Stockhk1mQuote`
`domain/quotes/stockhk/stockhk_quote.py:36`
## `Stockus1dHfqKdata`
`domain/quotes/stockus/stockus_1d_hfq_kdata.py:11`
## `Stockus1dKdata`
`domain/quotes/stockus/stockus_1d_kdata.py:11`
## `StockusQuote`
`domain/quotes/stockus/stockus_quote.py:12`
## `Stockus1mQuote`
`domain/quotes/stockus/stockus_quote.py:36`
## `StockTradeDay`
`domain/quotes/trade_day.py:10`
FILE:references/components/factors.md
# factors (54 classes)
## `RankScorer`
`factors/algorithm.py:141`
## `MaTransformer`
`factors/algorithm.py:150`
## `IntersectTransformer`
`factors/algorithm.py:193`
## `MaAndVolumeTransformer`
`factors/algorithm.py:224`
## `MacdTransformer`
`factors/algorithm.py:269`
## `QuantileScorer`
`factors/algorithm.py:311`
## `FactorRequestModel`
`factors/factor_models.py:12`
## `TradingSignalModel`
`factors/factor_models.py:20`
## `FactorResultModel`
`factors/factor_models.py:31`
## `FinanceBaseFactor`
`factors/fundamental/finance_factor.py:13`
## `GoodCompanyFactor`
`factors/fundamental/finance_factor.py:77`
## `MaStatsFactorCommon`
`factors/ma/domain/common.py:7`
## `Stock1dMaFactor`
`factors/ma/domain/stock_1d_ma_factor.py:11`
## `Stock1dMaStatsFactor`
`factors/ma/domain/stock_1d_ma_stats_factor.py:10`
## `MaFactor`
`factors/ma/ma_factor.py:24`
## `CrossMaFactor`
`factors/ma/ma_factor.py:93`
## `VolumeUpMaFactor`
`factors/ma/ma_factor.py:107`
## `CrossMaVolumeFactor`
`factors/ma/ma_factor.py:233`
## `MaStatsAccumulator`
`factors/ma/ma_stats_factor.py:24`
## `MaStatsFactor`
`factors/ma/ma_stats_factor.py:71`
## `TFactor`
`factors/ma/ma_stats_factor.py:147`
## `TopBottomTransformer`
`factors/ma/top_bottom_factor.py:17`
## `TopBottomFactor`
`factors/ma/top_bottom_factor.py:34`
## `MacdFactor`
`factors/macd/macd_factor.py:11`
## `BullFactor`
`factors/macd/macd_factor.py:24`
## `KeepBullFactor`
`factors/macd/macd_factor.py:30`
## `LiveOrDeadFactor`
`factors/macd/macd_factor.py:46`
## `GoldCrossFactor`
`factors/macd/macd_factor.py:56`
## `Direction`
`factors/shape.py:17`
## `Fenxing`
`factors/shape.py:28`
## `FactorStateEncoder`
`factors/shape.py:220`
## `TradeType`
`factors/target_selector.py:16`
## `SelectMode`
`factors/target_selector.py:25`
## `TargetSelector`
`factors/target_selector.py:30`
## `TechnicalFactor`
`factors/technical_factor.py:11`
## `TopStocks`
`factors/top_stocks.py:38`
## `CrossMaTransformer`
`factors/transformers.py:26`
## `SpecificTransformer`
`factors/transformers.py:42`
## `FallBelowTransformer`
`factors/transformers.py:55`
## `FactorStateEncoder`
`factors/zen/base_factor.py:38`
## `ZenState`
`factors/zen/base_factor.py:64`
## `ZenAccumulator`
`factors/zen/base_factor.py:152`
## `ZenFactor`
`factors/zen/base_factor.py:619`
## `ZenFactorCommon`
`factors/zen/domain/common.py:7`
## `Index1dZenFactor`
`factors/zen/domain/index_1d_zen_factor.py:10`
## `Stock1dZenFactor`
`factors/zen/domain/stock_1d_zen_factor.py:10`
## `Stock1wkZenFactor`
`factors/zen/domain/stock_1wk_zen_factor.py:10`
## `ZhongshuRange`
`factors/zen/zen_factor.py:28`
## `ZhongshuLevel`
`factors/zen/zen_factor.py:42`
## `ZhongshuDistance`
`factors/zen/zen_factor.py:60`
## `Zhongshu`
`factors/zen/zen_factor.py:81`
## `ZenState`
`factors/zen/zen_factor.py:118`
## `TrendingFactor`
`factors/zen/zen_factor.py:253`
## `ShakingFactor`
`factors/zen/zen_factor.py:338`
FILE:references/components/informer.md
# informer (4 classes)
## `Informer`
`informer/informer.py:17`
## `EmailInformer`
`informer/informer.py:22`
## `WechatInformer`
`informer/informer.py:95`
## `QiyeWechatBot`
`informer/informer.py:162`
FILE:references/components/misc.md
# misc (2 classes)
## `TimeMessage`
`misc/misc_models.py:7`
## `ZhDate`
`misc/zhdate.py:11`
FILE:references/components/ml.md
# ml (5 classes)
## `BehaviorCategory`
`ml/lables.py:5`
## `RelativePerformance`
`ml/lables.py:12`
## `MLMachine`
`ml/ml.py:46`
## `StockMLMachine`
`ml/ml.py:208`
## `MaStockMLMachine`
`ml/ml.py:212`
FILE:references/components/recorders.md
# recorders (90 classes)
## `ApiWrapper`
`recorders/eastmoney/common.py:15`
## `EastmoneyApiWrapper`
`recorders/eastmoney/common.py:101`
## `BaseEastmoneyRecorder`
`recorders/eastmoney/common.py:106`
## `EastmoneyTimestampsDataRecorder`
`recorders/eastmoney/common.py:140`
## `EastmoneyPageabeDataRecorder`
`recorders/eastmoney/common.py:163`
## `EastmoneyMoreDataRecorder`
`recorders/eastmoney/common.py:201`
## `DividendDetailRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_dividend_detail_recorder.py:7`
## `DividendFinancingRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_dividend_financing_recorder.py:7`
## `RightsIssueDetailRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_rights_issue_detail_recorder.py:10`
## `SPODetailRecorder`
`recorders/eastmoney/dividend_financing/eastmoney_spo_detail_recorder.py:9`
## `BaseChinaStockFinanceRecorder`
`recorders/eastmoney/finance/base_china_stock_finance_recorder.py:36`
## `ChinaStockBalanceSheetRecorder`
`recorders/eastmoney/finance/eastmoney_balance_sheet_recorder.py:433`
## `ChinaStockCashFlowRecorder`
`recorders/eastmoney/finance/eastmoney_cash_flow_recorder.py:176`
## `ChinaStockFinanceFactorRecorder`
`recorders/eastmoney/finance/eastmoney_finance_factor_recorder.py:144`
## `ChinaStockIncomeStatementRecorder`
`recorders/eastmoney/finance/eastmoney_income_statement_recorder.py:158`
## `EastmoneyActorRecorder`
`recorders/eastmoney/holder/eastmoney_stock_actor_recorder.py:10`
## `TopTenHolderRecorder`
`recorders/eastmoney/holder/eastmoney_top_ten_holder_recorder.py:9`
## `TopTenTradableHolderRecorder`
`recorders/eastmoney/holder/eastmoney_top_ten_tradable_holder_recorder.py:6`
## `EastmoneyBlockRecorder`
`recorders/eastmoney/meta/eastmoney_block_meta_recorder.py:14`
## `EastmoneyBlockStockRecorder`
`recorders/eastmoney/meta/eastmoney_block_meta_recorder.py:52`
## `EastmoneyStockRecorder`
`recorders/eastmoney/meta/eastmoney_stock_meta_recorder.py:13`
## `EastmoneyStockDetailRecorder`
`recorders/eastmoney/meta/eastmoney_stock_meta_recorder.py:18`
## `HolderTradingRecorder`
`recorders/eastmoney/trading/eastmoney_holder_trading_recorder.py:7`
## `ManagerTradingRecorder`
`recorders/eastmoney/trading/eastmoney_manager_trading_recorder.py:7`
## `EMStockActorSummaryRecorder`
`recorders/em/actor/em_stock_actor_summary_recorder.py:28`
## `EMStockIIRecorder`
`recorders/em/actor/em_stock_ii_recorder.py:39`
## `EMStockTopTenFreeRecorder`
`recorders/em/actor/em_stock_top_ten_free_recorder.py:16`
## `EMStockTopTenRecorder`
`recorders/em/actor/em_stock_top_ten_recorder.py:16`
## `EMTreasuryYieldRecorder`
`recorders/em/macro/em_treasury_yield_recorder.py:12`
## `EMBlockRecorder`
`recorders/em/meta/em_block_meta_recorder.py:10`
## `EMBlockStockRecorder`
`recorders/em/meta/em_block_meta_recorder.py:21`
## `EMCBondRecorder`
`recorders/em/meta/em_cbond_meta_recorder.py:9`
## `EMCurrencyRecorder`
`recorders/em/meta/em_currency_meta_recorder.py:9`
## `EMFutureRecorder`
`recorders/em/meta/em_future_meta_recorder.py:9`
## `EMIndexRecorder`
`recorders/em/meta/em_index_meta_recorder.py:9`
## `EMIndexhkRecorder`
`recorders/em/meta/em_indexhk_meta_recorder.py:9`
## `EMIndexusRecorder`
`recorders/em/meta/em_indexus_meta_recorder.py:9`
## `EMStockRecorder`
`recorders/em/meta/em_stock_meta_recorder.py:13`
## `EMStockhkRecorder`
`recorders/em/meta/em_stockhk_meta_recorder.py:12`
## `EMStockusRecorder`
`recorders/em/meta/em_stockus_meta_recorder.py:12`
## `EMStockNewsRecorder`
`recorders/em/misc/em_stock_news_recorder.py:12`
## `BaseEMStockKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:33`
## `EMStockKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:170`
## `EMStockusKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:200`
## `EMStockhkKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:208`
## `EMIndexhkKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:216`
## `EMIndexKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:223`
## `EMIndexusKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:230`
## `EMBlockKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:237`
## `EMFutureKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:244`
## `EMCurrencyKdataRecorder`
`recorders/em/quotes/em_kdata_recorder.py:251`
## `EMDragonAndTigerRecorder`
`recorders/em/trading/em_dragon_and_tiger_recorder.py:12`
## `ChinaETFListSpider`
`recorders/exchange/exchange_etf_meta_recorder.py:18`
## `ExchangeIndexRecorder`
`recorders/exchange/exchange_index_recorder.py:9`
## `ExchangeIndexStockRecorder`
`recorders/exchange/exchange_index_stock_recorder.py:15`
## `ExchangeStockMetaRecorder`
`recorders/exchange/exchange_stock_meta_recorder.py:15`
## `ExchangeStockSummaryRecorder`
`recorders/exchange/exchange_stock_summary_recorder.py:13`
## `JqChinaEtfValuationRecorder`
`recorders/joinquant/fundamental/jq_etf_valuation_recorder.py:12`
## `MarginTradingRecorder`
`recorders/joinquant/fundamental/jq_margin_trading_recorder.py:14`
## `JqChinaStockValuationRecorder`
`recorders/joinquant/fundamental/jq_stock_valuation_recorder.py:14`
## `JqChinaFundRecorder`
`recorders/joinquant/meta/jq_fund_meta_recorder.py:15`
## `JqChinaFundStockRecorder`
`recorders/joinquant/meta/jq_fund_meta_recorder.py:72`
## `BaseJqChinaMetaRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:15`
## `JqChinaStockRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:44`
## `JqChinaEtfRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:58`
## `JqChinaStockEtfPortfolioRecorder`
`recorders/joinquant/meta/jq_stock_meta_recorder.py:70`
## `StockTradeDayRecorder`
`recorders/joinquant/meta/jq_trade_day_recorder.py:11`
## `JoinquantHkHolderRecorder`
`recorders/joinquant/misc/jq_hk_holder_recorder.py:20`
## `JoinquantIndexMoneyFlowRecorder`
`recorders/joinquant/misc/jq_index_money_flow_recorder.py:12`
## `JoinquantStockMoneyFlowRecorder`
`recorders/joinquant/misc/jq_stock_money_flow_recorder.py:17`
## `CrossMarketSummaryRecorder`
`recorders/joinquant/overall/jq_cross_market_recorder.py:9`
## `MarginTradingSummaryRecorder`
`recorders/joinquant/overall/jq_margin_trading_recorder.py:14`
## `StockSummaryRecorder`
`recorders/joinquant/overall/jq_stock_summary_recorder.py:21`
## `JqChinaIndexKdataRecorder`
`recorders/joinquant/quotes/jq_index_kdata_recorder.py:18`
## `JqChinaStockKdataRecorder`
`recorders/joinquant/quotes/jq_stock_kdata_recorder.py:17`
## `JqkaLimitUpRecorder`
`recorders/jqka/emotion/JqkaEmotionRecorder.py:24`
## `JqkaLimitDownRecorder`
`recorders/jqka/emotion/JqkaEmotionRecorder.py:99`
## `JqkaEmotionRecorder`
`recorders/jqka/emotion/JqkaEmotionRecorder.py:163`
## `QmtIndexRecorder`
`recorders/qmt/index/qmt_index_recorder.py:16`
## `QMTStockRecorder`
`recorders/qmt/meta/qmt_stock_meta_recorder.py:9`
## `BaseQmtKdataRecorder`
`recorders/qmt/quotes/qmt_kdata_recorder.py:17`
## `QMTStockKdataRecorder`
`recorders/qmt/quotes/qmt_kdata_recorder.py:169`
## `SinaBlockRecorder`
`recorders/sina/meta/sina_block_recorder.py:15`
## `SinaChinaBlockStockRecorder`
`recorders/sina/meta/sina_block_recorder.py:59`
## `SinaBlockMoneyFlowRecorder`
`recorders/sina/money_flow/sina_block_money_flow_recorder.py:17`
## `SinaStockMoneyFlowRecorder`
`recorders/sina/money_flow/sina_stock_money_flow_recorder.py:12`
## `ChinaETFDayKdataRecorder`
`recorders/sina/quotes/sina_etf_kdata_recorder.py:16`
## `ChinaIndexDayKdataRecorder`
`recorders/sina/quotes/sina_index_kdata_recorder.py:15`
## `WBCountryRecorder`
`recorders/wb/wb_country_recorder.py:9`
## `WBEconomyRecorder`
`recorders/wb/wb_economy_recorder.py:10`
FILE:references/components/samples.md
# samples (2 classes)
## `MyMaTrader`
`samples/stock_traders.py:8`
## `MyBullTrader`
`samples/stock_traders.py:27`
FILE:references/components/tag.md
# tag (42 classes)
## `StockPoolType`
`tag/common.py:5`
## `DynamicPoolType`
`tag/common.py:11`
## `InsertMode`
`tag/common.py:16`
## `TagType`
`tag/common.py:21`
## `TagStatsQueryType`
`tag/common.py:28`
## `TagInfoModel`
`tag/tag_models.py:12`
## `CreateTagInfoModel`
`tag/tag_models.py:18`
## `IndustryInfoModel`
`tag/tag_models.py:23`
## `MainTagIndustryRelation`
`tag/tag_models.py:30`
## `BuildMainTagIndustryRelationModel`
`tag/tag_models.py:35`
## `MainTagSubTagRelation`
`tag/tag_models.py:41`
## `BuildMainTagSubTagRelationModel`
`tag/tag_models.py:46`
## `ChangeMainTagModel`
`tag/tag_models.py:52`
## `StockTagsModel`
`tag/tag_models.py:57`
## `SimpleStockTagsModel`
`tag/tag_models.py:71`
## `QueryStockTagsModel`
`tag/tag_models.py:85`
## `QuerySimpleStockTagsModel`
`tag/tag_models.py:89`
## `BatchSetStockTagsModel`
`tag/tag_models.py:93`
## `TagParameter`
`tag/tag_models.py:100`
## `StockTagOptions`
`tag/tag_models.py:109`
## `SetStockTagsModel`
`tag/tag_models.py:119`
## `StockPoolModel`
`tag/tag_models.py:151`
## `StockPoolInfoModel`
`tag/tag_models.py:156`
## `CreateStockPoolInfoModel`
`tag/tag_models.py:161`
## `StockPoolsModel`
`tag/tag_models.py:173`
## `CreateStockPoolsModel`
`tag/tag_models.py:178`
## `QueryStockTagStatsModel`
`tag/tag_models.py:193`
## `StockTagDetailsModel`
`tag/tag_models.py:224`
## `StockTagStatsModel`
`tag/tag_models.py:258`
## `ActivateSubTagsModel`
`tag/tag_models.py:269`
## `ActivateSubTagsResultModel`
`tag/tag_models.py:273`
## `IndustryInfo`
`tag/tag_schemas.py:12`
## `MainTagInfo`
`tag/tag_schemas.py:21`
## `SubTagInfo`
`tag/tag_schemas.py:28`
## `HiddenTagInfo`
`tag/tag_schemas.py:38`
## `StockTags`
`tag/tag_schemas.py:45`
> Schema for storing stock tags
## `StockSystemTags`
`tag/tag_schemas.py:70`
## `StockPoolInfo`
`tag/tag_schemas.py:100`
## `StockPools`
`tag/tag_schemas.py:106`
## `TagStats`
`tag/tag_schemas.py:115`
## `Tagger`
`tag/tagger.py:16`
## `StockTagger`
`tag/tagger.py:40`
FILE:references/components/trader.md
# trader (22 classes)
## `TradingSignalType`
`trader/__init__.py:11`
## `OrderType`
`trader/__init__.py:20`
## `TradingSignal`
`trader/__init__.py:39`
## `TradingListener`
`trader/__init__.py:77`
## `AccountService`
`trader/__init__.py:94`
## `TraderError`
`trader/errors.py:2`
> Base class for exceptions in this module.
## `InvalidOrderParamError`
`trader/errors.py:8`
## `NotEnoughMoneyError`
`trader/errors.py:13`
## `NotEnoughPositionError`
`trader/errors.py:18`
## `InvalidOrderError`
`trader/errors.py:23`
## `WrongKdataError`
`trader/errors.py:28`
## `SimAccountService`
`trader/sim_account.py:25`
## `Trader`
`trader/trader.py:26`
## `StockTrader`
`trader/trader.py:535`
## `AccountStatsReader`
`trader/trader_info_api.py:69`
## `OrderReader`
`trader/trader_info_api.py:119`
## `PositionModel`
`trader/trader_models.py:7`
## `AccountStatsModel`
`trader/trader_models.py:32`
## `TraderInfo`
`trader/trader_schemas.py:13`
> Trader info
## `AccountStats`
`trader/trader_schemas.py:33`
> Daily account stats
## `Position`
`trader/trader_schemas.py:63`
## `Order`
`trader/trader_schemas.py:97`
FILE:references/components/trading.md
# trading (19 classes)
## `ExecutionStatus`
`trading/common.py:5`
## `KdataRequestModel`
`trading/trading_models.py:19`
## `KdataModel`
`trading/trading_models.py:28`
## `TSRequestModel`
`trading/trading_models.py:36`
## `TSModel`
`trading/trading_models.py:42`
## `QuoteStatsModel`
`trading/trading_models.py:49`
## `QueryStockQuoteSettingModel`
`trading/trading_models.py:70`
## `BuildQueryStockQuoteSettingModel`
`trading/trading_models.py:75`
## `QueryTagQuoteModel`
`trading/trading_models.py:87`
## `QueryStockQuoteModel`
`trading/trading_models.py:92`
## `StockQuoteModel`
`trading/trading_models.py:102`
## `TagQuoteStatsModel`
`trading/trading_models.py:140`
## `StockQuoteStatsModel`
`trading/trading_models.py:156`
## `TradingPlanModel`
`trading/trading_models.py:173`
## `BuildTradingPlanModel`
`trading/trading_models.py:193`
## `QueryTradingPlanModel`
`trading/trading_models.py:214`
## `TagQuoteStats`
`trading/trading_schemas.py:12`
## `TradingPlan`
`trading/trading_schemas.py:24`
## `QueryStockQuoteSetting`
`trading/trading_schemas.py:44`