Tang Weigang

@clawhub-tangweigang-jpg-8679fec286

82 prompts

0 upvotes received

0 contributions

Joined 3 months ago

82 contributions in the last year

Freqtrade Crypto Bot

Skill

Load multi-exchange OHLCV historical data with the Freqtrade framework and run strategy backtest analysis.

---
name: freqtrade-crypto-bot
description: |-
  Load multi-exchange OHLCV historical data with the Freqtrade framework and run strategy backtest analysis.
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-085"
  compiled_at: "2026-04-22T13:00:34.948027+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# Freqtrade Crypto Backtesting (freqtrade-crypto-bot)

> Load multi-exchange OHLCV historical data with the Freqtrade framework and run strategy backtest analysis.

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (1 total)

### Strategy Analysis Template (`UC-101`)
Users need a template to load historical market data and analyze trading strategy performance using Freqtrade's configuration and history loading capabilities before live deployment.
**Triggers**: strategy analysis, backtesting template, historical data loading

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
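
The execute trigger above can be sketched as a small predicate. This is only an illustration under assumptions: `should_execute`, `POSITIVE_TERMS`, and `ACTION_VERBS` are hypothetical names, the positive terms are taken from the trigger list of `UC-101`, and the real `intent_router` may match differently.

```python
import re

# Assumed term lists: copied from the UC-101 triggers and the action verbs
# quoted in the execute trigger. The real intent_router config may differ.
POSITIVE_TERMS = ("strategy analysis", "backtesting template", "historical data loading")
ACTION_VERBS = ("run", "execute", "跑", "执行", "backtest", "fetch", "collect")


def _has_verb(text: str, verb: str) -> bool:
    """ASCII verbs must match as whole words; CJK verbs match as substrings."""
    if verb.isascii():
        return re.search(rf"\b{re.escape(verb)}\b", text) is not None
    return verb in text


def should_execute(user_text: str) -> bool:
    """True only when the text matches an intent term AND contains an action verb."""
    text = user_text.lower()
    matches_intent = any(term in text for term in POSITIVE_TERMS)
    has_action = any(_has_verb(text, verb) for verb in ACTION_VERBS)
    return matches_intent and has_action
```

Whole-word matching keeps a phrase such as "backtesting template" from counting as the verb "backtest" on its own, so a question about the template does not start an execution.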

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
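
Three of the locks above (`SL-03`, `SL-05`, `SL-06`) are simple enough to check mechanically. The following is a minimal sketch, not the project's implementation: the `TradingSignal` dataclass here is a hypothetical stand-in with only the fields the table names, and `is_valid_entity_id` and `action_for` are illustrative helpers.

```python
import math
from dataclasses import dataclass
from typing import Optional


def is_valid_entity_id(entity_id: str) -> bool:
    """SL-03: entity_type_exchange_code, e.g. stock_sh_600000."""
    parts = entity_id.split("_", 2)
    return len(parts) == 3 and all(parts)


@dataclass
class TradingSignal:
    """Hypothetical stand-in used only to illustrate SL-03 and SL-05."""
    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        if not is_valid_entity_id(self.entity_id):
            raise ValueError(f"SL-03 violated: bad entity id {self.entity_id!r}")
        sizing = (self.position_pct, self.order_money, self.order_amount)
        if sum(value is not None for value in sizing) != 1:
            raise ValueError("SL-05 violated: set exactly one sizing field")


def action_for(filter_result) -> Optional[str]:
    """SL-06: True=BUY, False=SELL, None/NaN=NO ACTION."""
    if filter_result is None:
        return None
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return None
    return "buy" if filter_result else "sell"
```

Raising at construction time mirrors the "halt" behavior in the table: an invalid signal never reaches the execution step.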

## Top Anti-Patterns (25 total)

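- **`AP-ZVT-183`**: Ex-rights adjustment factors of inf/NaN enter the multiplication directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API hits its quota, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift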

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-085. Evidence verify ratio = 43.3% and audit fail total = 1. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
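| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative content (source-of-truth) | Required reading when a behavior or decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed from other projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |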

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-085` blueprint at 2026-04-22T13:00:34.948027+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Strategy Analysis Template', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

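### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>

When several models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without reinitializing the dataset means every model effectively uses the same prediction signal. The symptom is that the backtest net-value curves are identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous anti-pattern of the "experiments appear to run, but every conclusion is invalid" kind.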

Source: https://github.com/microsoft/qlib/issues/1930

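### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (the range the normalizer is fitted on) and segments.train (the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.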

Source: https://github.com/microsoft/qlib/issues/2090
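
A guard for the rule "fit_end_time must be <= train_end" could look like the sketch below. `check_fit_window` is a hypothetical helper, not part of Qlib, and it assumes both values are ISO date strings as they typically appear in a workflow config.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple[str, str]) -> None:
    """Raise if the normalizer would be fitted on data past the training segment."""
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )


# Passes silently: the normalizer sees exactly the training range.
check_fit_window("2014-12-31", ("2008-01-01", "2014-12-31"))
```

Running the check before building the dataset turns a silent leak into an immediate configuration error.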

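### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from the documented formula differs significantly from the correct one after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.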

Source: https://github.com/microsoft/qlib/issues/2036

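### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>

Qlib's convention is that the open/close/high/low/volume/factor fields should all be NaN on suspension days, so the framework can identify and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price momentum factors show "false signals" during the suspension (price unchanged but factor non-zero). Qlib does not validate this convention, so the error flows silently into the training data.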

Source: https://github.com/microsoft/qlib/issues/2184

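### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. The function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset that covers only some stocks, and backtests based on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).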

Source: https://github.com/microsoft/qlib/issues/1892

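### `AP-QLIB-2097` — Full-market instrument="all" goes OOM on a 32GB machine, while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage differs by about 16x between instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks). A 32GB machine running the full market goes OOM during init_instance_by_config, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error, when what is needed is batched loading or streaming feature computation.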

Source: https://github.com/microsoft/qlib/issues/2097

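### `AP-QLIB-1984` — The LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label input, but the ndim of a Series taken from the DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper down. Users trying a custom multi-label task cannot trace the error message back to this root cause.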

Source: https://github.com/microsoft/qlib/issues/1984

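### `AP-QLIB-1915` — After dump_bin on custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processor triggers Length mismatch during align/reindex, while D.features succeeds because it bypasses the processor. This inconsistency between the two paths leads users to believe the data was imported correctly.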

Source: https://github.com/microsoft/qlib/issues/1915

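### `AP-QLIB-1949` — The Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is that D.features calls raise multi-level nested exceptions, and users cannot tell from the stack trace whether the problem is the parallel backend or the data.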

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

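### `AP-VNPY-3691` — The bar generator's first bar timestamp is misaligned, producing a wrong signal for the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is timestamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (it should wait until 10:04). If the strategy filters in on_bar directly with datetime.minute % 5, the first bar happens to pass, but it holds less than a full period of data and produces a wrong entry signal when used for signal computation.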

Source: https://github.com/vnpy/vnpy/issues/3691

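### `AP-VNPY-3669` — Incremental history saves in the Alpha module fail with SchemaError when old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When the vnpy Alpha module saves bar data to a Parquet file, it calls polars.concat directly on newly downloaded data (which may contain Float64 columns) and the existing file (historical Int64 columns). polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (e.g. volume is an integer in some feeds and a float in others), and there is no schema alignment step before concat. This affects every historical data build workflow that uses vnpy alpha for backtesting.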

Source: https://github.com/vnpy/vnpy/issues/3669

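### `AP-VNPY-3685` — The spread trading module's run_backtesting() errors silently in Jupyter, so results cannot be trusted <sub>(high)</sub>

run_backtesting() in the vnpy 4.10 SpreadTrading module has an event loop conflict in Jupyter (asyncio already running), so part of the backtest engine's logic does not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop. It is a hidden trap where "the backtest results look correct but are actually incomplete".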

Source: https://github.com/vnpy/vnpy/issues/3685

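### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs directly into the system/conda base environment and forcibly downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so the dependency boundaries are opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of "global environment pollution".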

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

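### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users into misjudging strategy returns <sub>(high)</sub>

The Zipline tutorial uses an AAPL price chart as a demo, but the bundle stores unadjusted prices (raw price), not prices adjusted for splits and dividends. The historical prices shown differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data and lure technical indicators into false breakout signals on ex-rights days.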

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

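### `AP-ZIPLINE-235` — Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>

After a signal fires on a bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading, the signal can only be filled near the next bar's open (T+1 order execution). For A-share daily bars, backtesting at the close overstates daily return by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.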

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
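
The next-bar fill rule can be sketched in plain Python, independent of Zipline. `next_bar_fills` is a hypothetical helper; it assumes each signal was computed from data up to and including that bar's close, and it ignores slippage and costs to keep the timing visible.

```python
def next_bar_fills(bars, signals):
    """Fill each signal at the open of the following bar.

    bars    -- list of dicts with "open" and "close" prices, oldest first
    signals -- list aligned with bars: True=buy, False=sell, None=no action
    Returns a list of (fill_bar_index, fill_price, action) tuples.
    """
    fills = []
    # The last bar has no successor, so a signal on it cannot be filled yet.
    for t in range(len(bars) - 1):
        signal = signals[t]
        if signal is None:
            continue
        action = "buy" if signal else "sell"
        fills.append((t + 1, bars[t + 1]["open"], action))
    return fills


bars = [
    {"open": 10.0, "close": 11.0},
    {"open": 11.5, "close": 12.0},
    {"open": 12.2, "close": 12.0},
]
fills = next_bar_fills(bars, [True, None, False])
```

In this example the buy signal from bar 0 fills at 11.5 (bar 1's open) rather than at bar 0's close of 11.0, and the sell signal on the final bar stays pending.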

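### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm in Zipline, if the start_session parameter happens to be a non-trading day (e.g. New Year's Day, 1998-01-01), calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message shows only the trading calendar's start date and does not suggest changing the value to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last trading day before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.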

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

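### `AP-ZIPLINE-181` — After the asset db goes stale, Pipeline reports "no assets traded", misleading users into checking the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only the price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.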

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

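### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's week-start/week-end logic, but in non-US calendars (such as ASX or SSE) that logic is incompatible with the offset computation of the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long consecutive holidays such as Spring Festival, week_start may skip the whole holiday week without rebalancing, and users cannot spot the unfired schedule in the logs.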

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

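### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires all timestamps to be UTC-aware datetimes. When a user passes a naive datetime (no timezone info, e.g. pd.Timestamp('2020-01-01')), no error is raised at the entry point. Instead, AssertionError: Algorithm should have a utc datetime is triggered deep in algorithm execution, with a stack too deep to locate easily. A-share developers importing data from local CST time hit this trap very easily and need an explicit tz_localize('UTC') when registering the bundle.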

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
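
One way to avoid the naive-datetime trap is to normalize every timestamp at the pipeline entry. The sketch below uses only the standard library; `to_utc` is a hypothetical helper, and treating naive inputs as China Standard Time with a fixed UTC+8 offset is an assumption that should be adjusted to the actual data source.

```python
from datetime import datetime, timedelta, timezone

# Assumption: naive timestamps come from an A-share source in China Standard
# Time, modeled here as a fixed UTC+8 offset.
CST = timezone(timedelta(hours=8), "CST")


def to_utc(ts: datetime, assume_tz: timezone = CST) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted in assume_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


# 15:00 CST (the A-share close) becomes 07:00 UTC on the same day.
close_utc = to_utc(datetime(2020, 1, 2, 15, 0))
```

Doing the conversion once, at ingestion, means downstream code can assert that every timestamp is already UTC-aware instead of failing deep inside the backtest.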

## zvt (6)

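### `AP-ZVT-183` — Ex-rights adjustment factors of inf/NaN enter the multiplication directly, so price adjustment fails silently <sub>(high)</sub>

When computing forward-adjustment factors, ZVT derives qfq_factor from the new/old price ratio. When old==0 (a new stock's first day, or missing data) the factor is inf; when kdata.open itself is None (a suspension day that was not filled) the multiplication raises TypeError. As a result, the adjustment computation for the whole entity is interrupted and all subsequent bars are lost, but the main flow only logs ERROR without stopping, so users often do not know that data for many stocks is corrupted.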

Source: https://github.com/zvtvz/zvt/issues/183
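
A defensive version of the factor multiplication might look like the following sketch. It is not ZVT's code: `safe_adjust` and `adjust_bars` are hypothetical helpers that skip a bad bar and report it, instead of aborting the whole entity or letting inf/NaN propagate.

```python
import math


def safe_adjust(price, new_price, old_price):
    """Return price * (new_price / old_price), or None if the factor is unusable."""
    if price is None or new_price is None or old_price is None:
        return None  # e.g. an unfilled suspension day
    if old_price == 0:
        return None  # would produce an inf factor
    factor = new_price / old_price
    if not math.isfinite(factor):
        return None  # NaN or inf inputs
    return price * factor


def adjust_bars(opens, new_price, old_price):
    """Adjust a list of open prices; also return the indices that were skipped."""
    adjusted, skipped = [], []
    for index, price in enumerate(opens):
        value = safe_adjust(price, new_price, old_price)
        if value is None:
            skipped.append(index)
        adjusted.append(value)
    return adjusted, skipped
```

Returning the skipped indices alongside the data gives the caller something concrete to log or fail on, rather than a bare ERROR line that is easy to miss.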

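### `AP-ZVT-179` — Exceptions are swallowed after a third-party data API hits its quota, so data goes missing silently <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch full-market bars (4000+ stocks), it hits JoinQuant's daily maximum query count limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and continues with the next entity, so once the limit is hit the day's data for all remaining stocks goes missing silently. If a backtest uses that incomplete database, factor results carry systematic bias, with no warning.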

Source: https://github.com/zvtvz/zvt/issues/179
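
The alternative to swallowing the quota error is to stop and report exactly which entities are missing. This sketch is illustrative only: `QuotaExceeded` is a hypothetical exception a provider wrapper would raise, and `fetch_one` stands in for whatever call actually downloads one entity's bars.

```python
class QuotaExceeded(Exception):
    """Hypothetical error raised by a provider wrapper when the daily quota is hit."""


def fetch_all(entity_ids, fetch_one):
    """Fetch bars per entity; stop at the first quota error and report the gap.

    Returns (results, missing), where missing lists every entity that was
    not fetched, including the one that triggered the quota error.
    """
    results, missing = {}, []
    for index, entity_id in enumerate(entity_ids):
        try:
            results[entity_id] = fetch_one(entity_id)
        except QuotaExceeded:
            # Every later request would fail the same way, so stop here.
            missing.extend(entity_ids[index:])
            break
    return results, missing
```

A caller can then refuse to run a backtest while `missing` is non-empty, or persist the list and resume the download the next day.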

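### `AP-ZVT-161` — Full-market SQLite batch factor computation triggers the too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT concatenates all entity_ids into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect, because the root cause is SQLite's variable count limit. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.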

Source: https://github.com/zvtvz/zvt/issues/161
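
Batching the IN clause can be sketched with the standard `sqlite3` module. The table name `kdata` and its columns are assumptions for illustration, not ZVT's schema, and the chunk size of 900 is chosen to stay under the 999 limit named above (newer SQLite builds ship with a higher default, so the exact ceiling depends on the build).

```python
import sqlite3


def query_in_chunks(conn, entity_ids, chunk_size=900):
    """Run the IN query in batches so no statement exceeds the variable limit."""
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        # Hypothetical table and columns, used only for this example.
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# Small in-memory demonstration with 2500 entities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT PRIMARY KEY, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(2500)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(eid, 10.0) for eid in ids])
rows = query_in_chunks(conn, ids)
```

With 2500 ids and a chunk size of 900, the helper issues three statements and concatenates their results.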

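### `AP-ZVT-129` — Wildcard imports hide API version changes, and enums such as AdjustType mysteriously disappear <sub>(medium)</sub>

ZVT's documentation examples use `from zvt import *` to import all symbols. After a ZVT version upgrade refactors enums (e.g. moving AdjustType into a submodule), the wildcard import no longer includes the symbol and triggers AttributeError. Users think it is an installation problem, when in fact an API breaking change between versions was not noted in the CHANGELOG and the wildcard import hid the symbol's actual origin. Import enum classes explicitly.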

Source: https://github.com/zvtvz/zvt/issues/129

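### `AP-ZVT-187` — The backtest engine does not terminate early on an empty data-layer result, causing a cascading null-pointer crash <sub>(medium)</sub>

When ZVT Trader finds the data empty after load_data completes, it does not exit early. It passes the empty DataFrame into the selector computation, triggering a chain of crashes from subsequent NoneType operations. The stack trace is deep and the root cause is hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range the database covers) with no effective validation.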

Source: https://github.com/zvtvz/zvt/issues/187

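### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When computing price momentum or moving-average factors, users mix two tables (e.g. unadjusted prices for the moving average, backward-adjusted prices for returns), so factor values jump around ex-rights dates. ZVT performs no cross-table consistency check on adjustment type, so the mix passes silently.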

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-085--freqtrade
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 46, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [data_ingestion_&_history_management](components/data_ingestion_-_history_management.md): 6 classes
- [strategy_analysis_&_signal_generation](components/strategy_analysis_-_signal_generation.md): 8 classes
- [freqai_ml_training_&_inference](components/freqai_ml_training_-_inference.md): 6 classes
- [order_execution_&_trade_management](components/order_execution_-_trade_management.md): 7 classes
- [backtesting_engine](components/backtesting_engine.md): 5 classes
- [hyperoptimization](components/hyperoptimization.md): 5 classes
- [rpc_communication](components/rpc_communication.md): 5 classes
- [configuration_loading_&_validation](components/configuration_loading_-_validation.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 174
  fatal_constraints_count: 77
  non_fatal_constraints_count: 202
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

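- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms are: (1) computing a signal from the close and filling at the close on the same day; (2) marking an indicator computed after day T's close on that same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after day T's close and fill at day T+1's open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period handling: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions should be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL time-series data splits must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splitting (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (e.g. "take profit at the high" in a momentum strategy) is obvious lookahead, because the intraday high/low is only confirmed after the close. Fill prices may only use the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Data alignment offset (off-by-one): pandas rolling/shift and similar operations easily introduce subtle one-period offset errors. Document the "observation time" of each series in code and verify key time-alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) showed that the expected maximum in-sample Sharpe ratio rises with the number of backtests tried even when the true Sharpe is zero, so the best in-sample result becomes less credible as trials accumulate. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.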
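- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, which systematically overstates the strategy's historical returns. The backtest universe must use historical point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample, and out-of-sample data is for final validation only. Do not keep tuning after repeatedly "looking at" out-of-sample data (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill strategy for suspensions and missing data: do not simply forward-fill suspension-day prices with the previous close, because that lets "zero volume" days take part in factor computation and signal generation during the backtest. Filter missing trading days explicitly at the factor computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme value contamination: raw market data may contain data-source errors (e.g. ex-rights not adjusted in time, extreme prices from manual entry errors). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.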
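- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; zero-cost defaults are not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then suffer severe losses live as fill prices degrade. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to cost assumptions: every 10bps change in cost may flip the strategy's profit/loss conclusion, so a cost sensitivity analysis is mandatory.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual share counts held under a fixed capital amount (rounded to whole lots), rather than assuming fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the actual holdable quantity deviate from the target weight, so the rounding effect should be simulated in the backtest.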
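- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone unification: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to one timezone at the pipeline entry (UTC for storage and the market's local timezone for display are recommended), and do not mix timezones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, or fake rows will be created on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when incrementally updating historical data, query the database for the latest stored date and download only the data after it. Re-downloading existing data and appending it produces rows with duplicate timestamps and breaks the backtest's time ordering. Verify there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta calculations. Choose a passive benchmark the strategy could actually invest in (e.g. an HS300 ETF) rather than a price index that cannot be invested in directly (e.g. the HS300 index). A price index excludes dividend reinvestment and understates the benchmark's holding return.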
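- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. A running sum of log returns misstates drawdown depth, because log returns differ from simple returns and the gap widens on large declines.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems, so confirm the annualization factor before comparing across systems; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are better than Sharpe Ratio as risk-adjusted return metrics: Sharpe assumes normally distributed returns, while return distributions in A-share and crypto markets are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. Quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where beta happened to be right".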
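- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 counts as preliminary evidence of predictive power, and ICIR > 0.5 counts as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor, unsuited to monthly rebalancing, and vice versa. Using the wrong holding period seriously damages the factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires a t-stat > 3.0 (rather than the traditional 1.96), or independent replication across periods or markets, or a clear economic rationale. Factors that meet none of these conditions are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, net IC after transaction costs may be negative. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate during the holding period.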
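- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry rotation returns, making it hard to tell whether the factor itself or industry exposure drove the return. The operation is: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection leans systematically toward small caps, and the return comes from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values, which distort group analysis (e.g. Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Apply Gram-Schmidt orthogonalization or PCA before combining to remove multicollinearity among factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing data fill strategy: filling NaN values in factor computation (suspension, new listing, data gap) with the cross-sectional mean introduces lookahead bias (the mean itself contains future information), and dropping them entirely produces survivorship bias. The correct approach is to fill with the cross-sectional median (the median over all stocks that day, which does not depend on the future) or to exclude the stock for that day.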
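- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom spread) of Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, rather than simple long-only return. The "monotonicity" test of long Q1 / short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, and range-bound markets) is important evidence of robustness. A factor whose IC works only in one particular market state is unsuited to all-weather deployment. Show the IC time series in segments (rolling 12M) to identify periods where the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate a lot of wasted transaction cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile return spread (Q1-Q5 spread), even if large on historical data, may be due to chance and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.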
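- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability validation: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Time stability of factor validity: factors that once worked gradually fail because of market learning and arbitrage (McLean & Pontiff 2016 showed that factors decay by 58% on average after publication). Re-evaluate factor IC periodically (quarterly or annually), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: interest rate cycles, business cycles, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's optimal environment.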
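
Three of the formulas quoted in the constraints above (`SHARED-BT-PERF-001`, `SHARED-BT-PERF-002`, `SHARED-FR-IC-004`) are short enough to write out. This is a standard-library sketch with hypothetical function names; the Sharpe helper ignores the risk-free rate, which is a simplifying assumption.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Daily Sharpe scaled by sqrt(periods): 252 for equities, 365 for crypto.

    Assumes a zero risk-free rate for simplicity.
    """
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)


def max_drawdown(net_values):
    """Largest peak-to-trough decline, computed on the net value series."""
    peak = net_values[0]
    worst = 0.0
    for value in net_values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def net_ic(ic, turnover, cost_per_turn):
    """Turnover-adjusted IC: net_IC = IC - turnover * cost_per_turn."""
    return ic - turnover * cost_per_turn


returns = [0.01, -0.005, 0.007, 0.002, -0.003]
equity_sharpe = annualized_sharpe(returns, 252)
crypto_sharpe = annualized_sharpe(returns, 365)
drawdown = max_drawdown([1.0, 1.2, 0.9, 1.1])
```

The same return series yields a crypto Sharpe that is sqrt(365/252) times the equity figure, which is why the annualization factor must be confirmed before comparing results across systems.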

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
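
`PC-03` and `PC-04` can also be expressed as a reusable function rather than one-liners. The sketch below mirrors the logic of those two commands using only the standard library; `zvt_home` and `check_writable` are hypothetical helper names.

```python
import os
from pathlib import Path


def zvt_home() -> Path:
    """Resolve the data directory the same way PC-03 does: ZVT_HOME or ~/.zvt."""
    return Path(os.environ.get("ZVT_HOME", Path.home() / ".zvt"))


def check_writable(directory: Path) -> bool:
    """PC-03 + PC-04: the directory must exist and accept a probe file."""
    if not directory.is_dir():
        return False
    probe = directory / ".write_test"
    try:
        probe.touch()
        probe.unlink()
    except OSError:
        return False
    return True
```

Calling `check_writable(zvt_home())` returns a boolean instead of raising, so the on_fail hint can be shown without a traceback.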

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **1**

## `KUC-101`
**Source**: `freqtrade/templates/strategy_analysis_example.ipynb`

Users need a template to load historical market data and analyze trading strategy performance using Freqtrade's configuration and history loading capabilities before live deployment.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

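## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies and data sources in one cerebro.run(). zvt's StockTrader currently binds only one set of factors per instantiation and lacks a unified multi-strategy orchestration layer. Borrowing the Cerebro pattern would let users combine several Trader instances in one runner for comparison.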

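## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface. Borrowing this pattern would let users get a risk-adjusted return report with cerebro.addanalyzer(SharpeRatio).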

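## `CW-BT-003` — Sizer: position sizing separated from signals
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic. FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control. Introducing a Sizer interface would let strategy signals and money management rules evolve independently and be reused in combination.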

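## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports rich order types including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market fills and lack limit orders and combined order simulation. For high-frequency or live trading integration, fuller order types would greatly improve backtest realism.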

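## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-level data (e.g. 1 min) into a higher level (e.g. 1 day) in real time and drive the strategy in sync, or replay tick by tick to simulate how OHLC forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording bars at each level and lacks dynamic resampling at runtime. Borrowing this pattern would support backtests at any combination of time granularities without recording data repeatedly.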

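## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy's net value curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page. Borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.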

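## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow of data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso/LightGBM/MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework. Borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.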

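## `CW-QL-001` — Point-in-Time database (prevents future data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data genuinely known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so there is a potential bias from selecting stocks with future financial report data. Introducing the PIT pattern would greatly improve backtest credibility.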

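## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of each model training run and supports cross-experiment comparison and model version management. zvt currently lacks an ML experiment tracking mechanism, and each rerun overwrites the previous results. Borrowing the Recorder pattern would persist the parameters and results of each factor experiment, supporting quick reproduction and version comparison.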

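## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing). The two layers are optimized independently and can run in combination, enabling intraday optimal execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution algorithm modeling. Borrowing the nested framework would let zvt separate the questions of "which stocks to hold and when" and "how to build the position with minimal market impact".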

FILE:references/components/backtesting_engine.md
# backtesting_engine (5 classes)

## `Backtesting.backtest_loop`
`backtesting_engine/backtesting-backtest-loop.py:0`

## `Backtesting._run_funding_fees`
`backtesting_engine/backtesting-run-funding-fees.py:0`

## `trade_list_to_dataframe`
`backtesting_engine/trade-list-to-dataframe.py:0`

## `fill_model`
`backtesting_engine/fill-model.py:0`

## `protections`
`backtesting_engine/protections.py:0`

FILE:references/components/configuration_loading_-_validation.md
# configuration_loading_&_validation (4 classes)

## `Configuration.load_config`
`configuration_loading_&_validation/configuration-load-config.py:0`

## `Configuration.deep_merge_dicts`
`configuration_loading_&_validation/configuration-deep-merge-dicts.py:0`

## `CONF_SCHEMA.validate`
`configuration_loading_&_validation/conf-schema-validate.py:0`

## `config_validation`
`configuration_loading_&_validation/config-validation.py:0`

FILE:references/components/data_ingestion_-_history_management.md
# data_ingestion_&_history_management (6 classes)

## `Exchange._init_subclasses`
`data_ingestion_&_history_management/exchange-init-subclasses.py:0`

## `IDataHandler.get_file_extension`
`data_ingestion_&_history_management/idatahandler-get-file-extension.py:0`

## `DataProvider.get_df`
`data_ingestion_&_history_management/dataprovider-get-df.py:0`

## `load_data`
`data_ingestion_&_history_management/load-data.py:0`

## `data_handler_implementation`
`data_ingestion_&_history_management/data-handler-implementation.py:0`

## `exchange_adapter`
`data_ingestion_&_history_management/exchange-adapter.py:0`

FILE:references/components/freqai_ml_training_-_inference.md
# freqai_ml_training_&_inference (6 classes)

## `IFreqaiModel.train`
`freqai_ml_training_&_inference/ifreqaimodel-train.py:0`

## `IFreqaiModel.predict`
`freqai_ml_training_&_inference/ifreqaimodel-predict.py:0`

## `FreqaiDataKitchen.check_if_new_training_required`
`freqai_ml_training_&_inference/freqaidatakitchen-check-if-new-training-.py:0`

## `FreqaiDataDrawer.load_historic_predictions_from_disk`
`freqai_ml_training_&_inference/freqaidatadrawer-load-historic-predictio.py:0`

## `prediction_model`
`freqai_ml_training_&_inference/prediction-model.py:0`

## `compute_device`
`freqai_ml_training_&_inference/compute-device.py:0`

FILE:references/components/hyperoptimization.md
# hyperoptimization (5 classes)

## `Hyperopt.run`
`hyperoptimization/hyperopt-run.py:0`

## `HyperOptimizer.hyperopt_pickle_magic`
`hyperoptimization/hyperoptimizer-hyperopt-pickle-magic.py:0`

## `IHyperOptLoss.__call__`
`hyperoptimization/ihyperoptloss-call.py:0`

## `optimizer`
`hyperoptimization/optimizer.py:0`

## `loss_function`
`hyperoptimization/loss-function.py:0`

FILE:references/components/order_execution_-_trade_management.md
# order_execution_&_trade_management (7 classes)

## `FreqtradeBot.execute_entry`
`order_execution_&_trade_management/freqtradebot-execute-entry.py:0`

## `FreqtradeBot.handle_trade`
`order_execution_&_trade_management/freqtradebot-handle-trade.py:0`

## `Trade.calc_profit`
`order_execution_&_trade_management/trade-calc-profit.py:0`

## `Trade.adjust_trade_position`
`order_execution_&_trade_management/trade-adjust-trade-position.py:0`

## `order_type`
`order_execution_&_trade_management/order-type.py:0`

## `stoploss_placement`
`order_execution_&_trade_management/stoploss-placement.py:0`

## `position_sizing`
`order_execution_&_trade_management/position-sizing.py:0`

FILE:references/components/rpc_communication.md
# rpc_communication (5 classes)

## `RPC._rpc_force_entry`
`rpc_communication/rpc-rpc-force-entry.py:0`

## `RPC._rpc_force_exit`
`rpc_communication/rpc-rpc-force-exit.py:0`

## `RPC._ws_request_analyzed_df`
`rpc_communication/rpc-ws-request-analyzed-df.py:0`

## `RPCManager.start`
`rpc_communication/rpcmanager-start.py:0`

## `rpc_transport`
`rpc_communication/rpc-transport.py:0`

FILE:references/components/strategy_analysis_-_signal_generation.md
# strategy_analysis_&_signal_generation (8 classes)

## `IStrategy.populate_indicators`
`strategy_analysis_&_signal_generation/istrategy-populate-indicators.py:0`

## `IStrategy.populate_entry_trend`
`strategy_analysis_&_signal_generation/istrategy-populate-entry-trend.py:0`

## `IStrategy.populate_exit_trend`
`strategy_analysis_&_signal_generation/istrategy-populate-exit-trend.py:0`

## `IStrategy.get_entry_signal`
`strategy_analysis_&_signal_generation/istrategy-get-entry-signal.py:0`

## `IStrategy.should_exit`
`strategy_analysis_&_signal_generation/istrategy-should-exit.py:0`

## `strategy_implementation`
`strategy_analysis_&_signal_generation/strategy-implementation.py:0`

## `pairlist_filter`
`strategy_analysis_&_signal_generation/pairlist-filter.py:0`

## `custom_callbacks`
`strategy_analysis_&_signal_generation/custom-callbacks.py:0`


Firesale Stress Test

Skill

Run a system-level bank stress test: compute CET1 and leverage ratios from real EBA 2018 data and simulate balance-sheet resilience under a fire-sale scenario.

---
name: firesale-stress-test
description: |-
  Run a system-level bank stress test: compute CET1 and leverage ratios from real EBA 2018 data and simulate balance-sheet resilience under a fire-sale scenario.
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-067"
  compiled_at: "2026-04-22T13:00:22.380878+00:00"
  capability_markets: "global"
  capability_activities: "regtech-compliance"
  sop_version: "crystal-compilation-v6.1"
---
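# Bank Stress Test (firesale-stress-test)

> Run a system-level bank stress test: compute CET1 and leverage ratios from real EBA 2018 data and simulate balance-sheet resilience under a fire-sale scenario.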

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (0 total)

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (15 total)

- **`AP-REGTECH-001`**: Missing attribute initialization on data structures
- **`AP-REGTECH-002`**: Self-loops in transaction graphs violate domain rules
- **`AP-REGTECH-003`**: Unvalidated floating-point inputs cause runtime crashes

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-067. Evidence verify ratio = 56.1% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
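| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative content (source of truth) | Required reading when a behavior or decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project wisdom worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |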

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-067` blueprint at 2026-04-22T13:00:22.380878+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern
  - Custom Transformer + Accumulator factor with per-entity rolling state
  - Bollinger Band mean-reversion factor with BollTransformer (window=20, window_dev=2)

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-060--AMLSim (1)

### `AP-REGTECH-011` — Mismatched configuration parameters across coupled components <sub>(medium)</sub>

When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence: AML typology patterns placed on wrong accounts, invalidating simulation results.

## finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest (1)

### `AP-REGTECH-002` — Self-loops in transaction graphs violate domain rules <sub>(high)</sub>

When generating directed transaction graphs or AML typologies, allowing source == destination edges creates self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology validation.
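
A minimal sketch of the guard this anti-pattern calls for, assuming edges are plain `(source, destination)` tuples. The function name `drop_self_loops` and the bank labels are hypothetical and are not taken from AMLSim or the fire-sale model.

```python
def drop_self_loops(edges):
    """Return the edges whose source and destination differ, preserving order."""
    return [(src, dst) for src, dst in edges if src != dst]


# Illustrative edge list; ("bank_b", "bank_b") is a self-loop.
edges = [("bank_a", "bank_b"), ("bank_b", "bank_b"), ("bank_b", "bank_c")]
clean_edges = drop_self_loops(edges)
```

Filtering silently is one policy; raising an error on the first self-loop may be preferable when a self-loop signals corrupted input rather than a generator quirk.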

## finance-bp-060--AMLSim, finance-bp-071--opensanctions (1)

### `AP-REGTECH-001` — Missing attribute initialization on data structures <sub>(high)</sub>

When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes (e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.

## finance-bp-062--ifrs9 (3)

### `AP-REGTECH-005` — Incorrect amortization windows violate IFRS 9 compliance <sub>(high)</sub>

Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence: regulatory non-compliance and materially incorrect loan loss provisions.

### `AP-REGTECH-010` — Incorrect cumulative PD ordering corrupts lifetime ECL term structure <sub>(high)</sub>

Using cumprod(1-conPD) without shift(1) and fillna(1) produces corrupted first-period survival probability. This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements. Consequence: systematically incorrect provisions across all remaining tenor periods.
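
The ordering can be sketched in plain Python, standing in for the pandas calls named above. The assumption here is that `conPD` holds one conditional PD per period and that survival to the start of the first period is 1, which is what `shift(1)` followed by `fillna(1)` encodes. The function name `marginal_pd` is hypothetical.

```python
from itertools import accumulate
from operator import mul


def marginal_pd(conditional_pd):
    """Marginal PD per period = survival to the start of the period * conditional PD."""
    # Running product of (1 - conPD): survival to the END of each period.
    survival_end = list(accumulate((1.0 - p for p in conditional_pd), mul))
    # Shift by one period and fill the first slot with 1.0:
    # survival to the START of each period.
    survival_start = [1.0] + survival_end[:-1]
    return [s * p for s, p in zip(survival_start, conditional_pd)]


# Illustrative conditional PDs for two periods.
example = marginal_pd([0.1, 0.2])
```

Without the shift, the first period would be weighted by 0.9 instead of 1.0, and every later period would inherit the error.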

### `AP-REGTECH-015` — Missing EAD component in ECL formula produces incomplete provisions <sub>(high)</sub>

IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning and reporting processes.

## finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest (2)

### `AP-REGTECH-003` — Unvalidated floating-point inputs cause runtime crashes <sub>(high)</sub>

When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN values. Consequence: entire model crashes before simulation or corrupted downstream calculations.

### `AP-REGTECH-004` — Division by zero in financial calculations produces inf/NaN <sub>(high)</sub>

When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators (total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations, corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.

## finance-bp-067--firesale_stresstest (4)

### `AP-REGTECH-006` — Wrong leverage formula in threshold-based decisions <sub>(high)</sub>

Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values. This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue operating with negative equity, or healthy banks unnecessarily deleverage.
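
To make the difference concrete, here is a small sketch comparing the two formulas. It assumes equity is total assets minus total liabilities; the function names and the balance-sheet figures are illustrative only.

```python
def leverage_ratio(total_assets, total_liabilities):
    """Equity-to-assets (E/A), the formula this anti-pattern asks for."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return (total_assets - total_liabilities) / total_assets


def equity_to_liabilities(total_assets, total_liabilities):
    """Equity-to-liabilities (E/L), shown only to illustrate the anti-pattern."""
    if total_liabilities <= 0:
        raise ValueError("total_liabilities must be positive")
    return (total_assets - total_liabilities) / total_liabilities


# Illustrative balance sheet: assets 100, liabilities 97.05.
correct = leverage_ratio(100.0, 97.05)        # about 2.95%
wrong = equity_to_liabilities(100.0, 97.05)   # about 3.04%
```

With these numbers E/A falls below a 3% threshold while E/L stays above it, so a check based on E/L would keep the bank operating.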

### `AP-REGTECH-007` — Confusing deleveraging buffer threshold with insolvency threshold <sub>(high)</sub>

Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence: excessive bank failures amplify systemic contagion.
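
A sketch of keeping the two thresholds separate, using the 3% and 4% values stated above. The constant names, the return labels, and the choice to treat a bank at exactly 3% as not yet insolvent are assumptions made for illustration.

```python
INSOLVENCY_THRESHOLD = 0.03  # below this leverage the bank defaults
BUFFER_THRESHOLD = 0.04      # below this leverage the bank starts deleveraging


def classify_bank(leverage):
    """Map an equity-to-assets leverage ratio to an action label."""
    # Check insolvency first so an insolvent bank never reaches the
    # deleveraging branch.
    if leverage < INSOLVENCY_THRESHOLD:
        return "default"
    if leverage < BUFFER_THRESHOLD:
        return "delever"
    return "healthy"
```

For example, `classify_bank(0.035)` lands in the buffer zone and returns `"delever"`, while `classify_bank(0.025)` returns `"default"`.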

### `AP-REGTECH-013` — Order-dependent execution creates first-mover advantage bias <sub>(medium)</sub>

Without separating step() and act() phases, first-acting banks sell assets before others decide, creating systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable systemic risk estimates that understate contagion for late-acting banks.

### `AP-REGTECH-014` — Immediate asset sales cause double-selling and undefined state <sub>(medium)</sub>

Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect cash transfers in market clearing.
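
The queue-then-clear idea behind `AP-REGTECH-013` and `AP-REGTECH-014` can be sketched as follows. This is not the project's `AssetMarket`; the class name, the linear price impact, and the `impact_per_unit` value are hypothetical and chosen only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMarketSketch:
    """Hypothetical market that queues sell orders and clears them together."""

    price: float = 1.0
    impact_per_unit: float = 0.001  # assumed linear price impact, illustration only
    pending: list = field(default_factory=list)

    def queue_sale(self, seller: str, quantity: float) -> None:
        # Decision phase: record the intention, do not touch the price yet.
        self.pending.append((seller, quantity))

    def clear(self) -> dict:
        # Clearing phase: one price for every seller, based on total queued volume.
        total = sum(quantity for _, quantity in self.pending)
        clearing_price = max(self.price * (1.0 - self.impact_per_unit * total), 0.0)
        proceeds = {}
        for seller, quantity in self.pending:
            proceeds[seller] = proceeds.get(seller, 0.0) + quantity * clearing_price
        self.price = clearing_price
        self.pending.clear()
        return proceeds


market = AssetMarketSketch()
market.queue_sale("bank_a", 100.0)
market.queue_sale("bank_b", 100.0)
proceeds = market.clear()
```

Because both orders are settled at the same clearing price, the order in which the banks queued their sales does not change what each one receives.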

## finance-bp-071--opensanctions (3)

### `AP-REGTECH-008` — Cache keys omit request body for state-changing methods <sub>(high)</sub>

Using only URL for cache fingerprints on POST/PATCH requests means different request bodies return identical cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence: sanctions matches missed or false positives from stale entity data.

### `AP-REGTECH-009` — ID collision in entity construction creates false sanctions matches <sub>(high)</sub>

When constructing entity IDs from source identifiers, insufficient identifying attributes cause different real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned entity's ID matches an innocent entity, causing false positive compliance alerts.

### `AP-REGTECH-012` — Reverse property assignment corrupts entity construction <sub>(medium)</sub>

Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting to add values to stub properties instead of forward properties causes ValueError, aborting entity construction. Consequence: entities lost from output, incomplete compliance datasets.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-067--firesale_stresstest
**Scan date**: 2026-04-22
**Stats**: {'total_files': 5, 'total_classes': 23, 'total_functions': 0, 'total_stages': 5}

## Modules (5)

- [model_initialization](components/model_initialization.md): 4 classes
- [shock_application](components/shock_application.md): 5 classes
- [agent_decision_phase](components/agent_decision_phase.md): 5 classes
- [market_clearing](components/market_clearing.md): 6 classes
- [default_handling](components/default_handling.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 143
  fatal_constraints_count: 42
  non_fatal_constraints_count: 127
  use_cases_count: 0
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
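
As an illustration of `SL-05`, the exactly-one rule can be checked with a small validator. The sketch assumes the signal is available as a plain dict; the function name `validate_sizing` is hypothetical and is not part of ZVT.

```python
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def validate_sizing(signal):
    """Return the single sizing field that is set, or raise ValueError."""
    provided = [name for name in SIZING_FIELDS if signal.get(name) is not None]
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of {SIZING_FIELDS}, got {provided or 'none'}"
        )
    return provided[0]
```

A signal such as `{"position_pct": 0.2}` passes, while one that sets both `position_pct` and `order_money`, or none of the three, is rejected.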

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **0**

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-REGTECH-001` — Input bounds validation before statistical computation
**From**: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts > 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
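
A minimal sketch of validating a probability before an inverse-CDF call. It uses the standard library's `statistics.NormalDist().inv_cdf` as a stand-in for `norm.ppf`, and the wrapper name `safe_inverse_normal` and the `eps` margin are assumptions.

```python
from statistics import NormalDist


def safe_inverse_normal(probability, eps=1e-9):
    """Inverse standard-normal CDF with an explicit open-interval check."""
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not (eps < probability < 1.0 - eps):
        raise ValueError(
            f"probability must lie strictly inside (0, 1), got {probability!r}"
        )
    return NormalDist().inv_cdf(probability)
```

Raising early with a clear message keeps an out-of-range DDP from surfacing later as an infinite or NaN value deep in the pipeline.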

## `CW-REGTECH-002` — Graph/topology invariant verification before construction
**From**: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees) = sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before expensive graph construction operations. Apply to any bipartite or directed graph generation.
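
Two of the invariants listed above can be checked in a few lines. This sketch assumes degree sequences are lists of integers and a transition matrix is a list of rows; the function names and the tolerance are hypothetical.

```python
import math


def check_degree_sequences(in_degrees, out_degrees):
    """Raise if the in-degree and out-degree totals disagree."""
    if sum(in_degrees) != sum(out_degrees):
        raise ValueError(
            f"sum(in-degrees)={sum(in_degrees)} != sum(out-degrees)={sum(out_degrees)}"
        )


def check_row_stochastic(matrix, tol=1e-9):
    """Raise if any row of a transition matrix does not sum to 1.0."""
    for index, row in enumerate(matrix):
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {index} sums to {sum(row)}, expected 1.0")
```

Running both checks before graph construction means a corrupted input fails fast instead of after an expensive build.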

## `CW-REGTECH-003` — Regulatory amortization window discipline
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance

IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations), full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window logic explicitly rather than reusing a single loop variable across stages.

## `CW-REGTECH-004` — Fingerprint composition must include all request dimensions
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance

Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication headers, and request body for state-changing methods. POST requests with different bodies returning identical cache is a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.
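
One way to compose such a fingerprint, sketched with `hashlib` from the standard library. The function name, the parameter names, and the example URL are hypothetical; the length prefix is an added safeguard so that adjacent fields cannot blur into each other.

```python
import hashlib


def cache_fingerprint(method, url, body=b"", auth=""):
    """Hash every request dimension that can change the response."""
    digest = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), auth.encode(), body):
        # Prefix each part with its length so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


first = cache_fingerprint("POST", "https://example.org/match", b'{"q": "alice"}')
second = cache_fingerprint("POST", "https://example.org/match", b'{"q": "bob"}')
```

The two POST requests above share a URL but produce different fingerprints, which is the property a URL-only key lacks.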

## `CW-REGTECH-005` — Floating-point zero-equivalence with explicit epsilon tolerance
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This prevents division-by-zero crashes and incorrect cash transfers.
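
A short sketch of the tolerance check with the `eps=1e-9` value given above. The helper names are hypothetical, and returning 0.0 for a zero denominator is an assumed policy for illustration; the model may require a different fallback.

```python
EPS = 1e-9


def is_zero(value, eps=EPS):
    """Treat values within eps of zero as zero."""
    return abs(value) < eps


def sold_fraction(quantity_sold, total_quantity, eps=EPS):
    """Share of an asset being sold, guarded against a (near-)zero denominator."""
    if is_zero(total_quantity, eps):
        return 0.0  # assumed fallback: nothing outstanding, so no fraction sold
    return quantity_sold / total_quantity
```

An exact comparison such as `0.1 + 0.2 - 0.3 == 0` is false in floating point, while `is_zero(0.1 + 0.2 - 0.3)` is true.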

## `CW-REGTECH-006` — Stage classification threshold ordering enforcement
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance

IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering and document bucket-to-stage mapping explicitly.

## `CW-REGTECH-007` — Initialization-before-use dependency ordering
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration, CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.

## `CW-REGTECH-008` — Sufficient entity ID collision prevention
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance

Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number) to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive sanctions matches. Include the maximum available discriminating attributes in ID construction.

## `CW-REGTECH-009` — Hub selection with candidate removal before addition
**From**: finance-bp-060--AMLSim · **Applicable to**: regtech-compliance

When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment across overlapping patterns. Apply to any allocation algorithm with candidate pooling.

## `CW-REGTECH-010` — Insolvency detection before operational decisions
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance

Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate operational decisions on prior insolvency state.

FILE:references/components/agent_decision_phase.md
# agent_decision_phase (5 classes)

## `Bank.act`
`agent_decision_phase/bank-act.py:0`

## `BankLeverageConstraint.check`
`agent_decision_phase/bankleverageconstraint-check.py:0`

## `do_delever.execute`
`agent_decision_phase/do-delever-execute.py:0`

## `delever_strategy`
`agent_decision_phase/delever-strategy.py:0`

## `leverage_threshold`
`agent_decision_phase/leverage-threshold.py:0`

FILE:references/components/default_handling.md
# default_handling (3 classes)

## `Bank.do_trigger_default`
`default_handling/bank-do-trigger-default.py:0`

## `sell_assets_proportionally`
`default_handling/sell-assets-proportionally.py:0`

## `default_treatment`
`default_handling/default-treatment.py:0`

FILE:references/components/market_clearing.md
# market_clearing (6 classes)

## `AssetMarket.clear_the_market`
`market_clearing/assetmarket-clear-the-market.py:0`

## `Order.settle`
`market_clearing/order-settle.py:0`

## `compute_price_impact`
`market_clearing/compute-price-impact.py:0`

## `clearing_mode`
`market_clearing/clearing-mode.py:0`

## `price_impact_function`
`market_clearing/price-impact-function.py:0`

## `execution_price`
`market_clearing/execution-price.py:0`

FILE:references/components/model_initialization.md
# model_initialization (4 classes)

## `Model.initialize`
`model_initialization/model-initialize.py:0`

## `Bank.__init__`
`model_initialization/bank-init.py:0`

## `AssetMarket.__init__`
`model_initialization/assetmarket-init.py:0`

## `data_source`
`model_initialization/data-source.py:0`

FILE:references/components/shock_application.md
# shock_application (5 classes)

## `Model.apply_initial_shock`
`shock_application/model-apply-initial-shock.py:0`

## `AssetMarket.update_price`
`shock_application/assetmarket-update-price.py:0`

## `Tradable.update_price`
`shock_application/tradable-update-price.py:0`

## `shock_asset`
`shock_application/shock-asset.py:0`

## `shock_fraction`
`shock_application/shock-fraction.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-067-v5.3
  version: v6.1
  blueprint_id: finance-bp-067
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:22.380878+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  upgraded_from: finance-bp-067-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:13.710708+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-067--firesale_stresstest/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-067--firesale_stresstest/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-REGTECH-001
  title: Missing attribute initialization on data structures
  description: 'When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes
    (e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures
    assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.'
  project_source: finance-bp-060--AMLSim, finance-bp-071--opensanctions
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-002
  title: Self-loops in transaction graphs violate domain rules
  description: 'When generating directed transaction graphs or AML typologies, allowing source == destination edges creates
    self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering
    pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology
    validation.'
  project_source: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-003
  title: Unvalidated floating-point inputs cause runtime crashes
  description: 'When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against
    acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN
    values. Consequence: entire model crashes before simulation or corrupted downstream calculations.'
  project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-004
  title: Division by zero in financial calculations produces inf/NaN
  description: 'When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators
    (total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations,
    corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.'
  project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-005
  title: Incorrect amortization windows violate IFRS 9 compliance
  description: 'Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full
    remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence:
    regulatory non-compliance and materially incorrect loan loss provisions.'
  project_source: finance-bp-062--ifrs9
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-006
  title: Wrong leverage formula in threshold-based decisions
  description: 'Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values.
    This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue
    operating with negative equity, or healthy banks unnecessarily deleverage.'
  project_source: finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-007
  title: Confusing deleveraging buffer threshold with insolvency threshold
  description: 'Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using
    the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence:
    excessive bank failures amplify systemic contagion.'
  project_source: finance-bp-067--firesale_stresstest
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-008
  title: Cache keys omit request body for state-changing methods
  description: 'Using only URL for cache fingerprints on POST/PATCH requests means different request bodies return identical
    cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence:
    sanctions matches missed or false positives from stale entity data.'
  project_source: finance-bp-071--opensanctions
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-009
  title: ID collision in entity construction creates false sanctions matches
  description: 'When constructing entity IDs from source identifiers, insufficient identifying attributes cause different
    real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned
    entity''s ID matches an innocent entity, causing false positive compliance alerts.'
  project_source: finance-bp-071--opensanctions
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-010
  title: Incorrect cumulative PD ordering corrupts lifetime ECL term structure
  description: 'Using cumprod(1-conPD) without shift(1) and fillna(1) produces a corrupted first-period survival probability.
    This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements.
    Consequence: systematically incorrect provisions across all remaining tenor periods.'
  project_source: finance-bp-062--ifrs9
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-011
  title: Mismatched configuration parameters across coupled components
  description: 'When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts
    using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence:
    AML typology patterns placed on wrong accounts, invalidating simulation results.'
  project_source: finance-bp-060--AMLSim
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-012
  title: Reverse property assignment corrupts entity construction
  description: 'Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Adding
    values to a stub property instead of its forward property raises that exception and aborts entity construction. Consequence:
    entities lost from output, incomplete compliance datasets.'
  project_source: finance-bp-071--opensanctions
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-013
  title: Order-dependent execution creates first-mover advantage bias
  description: 'Without separating step() and act() phases, first-acting banks sell assets before others decide, creating
    systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable
    systemic risk estimates that understate contagion for late-acting banks.'
  project_source: finance-bp-067--firesale_stresstest
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-014
  title: Immediate asset sales cause double-selling and undefined state
  description: 'Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same
    asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect
    cash transfers in market clearing.'
  project_source: finance-bp-067--firesale_stresstest
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-015
  title: Missing EAD component in ECL formula produces incomplete provisions
  description: 'IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation
    is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning
    and reporting processes.'
  project_source: finance-bp-062--ifrs9
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - regtech-compliance
  _source_file: anti-patterns/regtech.yaml
cross_project_wisdom:
- wisdom_id: CW-REGTECH-001
  source_project: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
  pattern_name: Input bounds validation before statistical computation
  description: Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce
    infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts
    > 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-002
  source_project: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
  pattern_name: Graph/topology invariant verification before construction
  description: 'Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees)
    = sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before
    expensive graph construction operations. Apply to any bipartite or directed graph generation.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-003
  source_project: finance-bp-062--ifrs9
  pattern_name: Regulatory amortization window discipline
  description: 'IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations),
    full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window
    logic explicitly rather than reusing a single loop variable across stages.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-004
  source_project: finance-bp-071--opensanctions
  pattern_name: Fingerprint composition must include all request dimensions
  description: 'Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication
    headers, and request body for state-changing methods. POST requests with different bodies that return identical cached
    content are a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-005
  source_project: finance-bp-067--firesale_stresstest
  pattern_name: Floating-point zero-equivalence with explicit epsilon tolerance
  description: IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use
    eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This
    prevents division-by-zero crashes and incorrect cash transfers.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-006
  source_project: finance-bp-062--ifrs9
  pattern_name: Stage classification threshold ordering enforcement
  description: 'IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying
    thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering
    and document bucket-to-stage mapping explicitly.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-007
  source_project: finance-bp-067--firesale_stresstest
  pattern_name: Initialization-before-use dependency ordering
  description: 'Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration,
    CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError
    that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.'
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-008
  source_project: finance-bp-071--opensanctions
  pattern_name: Sufficient entity ID collision prevention
  description: Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number)
    to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive
    sanctions matches. Include the maximum available discriminating attributes in ID construction.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-009
  source_project: finance-bp-060--AMLSim
  pattern_name: Hub selection with candidate removal before addition
  description: When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for
    each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment
    across overlapping patterns. Apply to any allocation algorithm with candidate pooling.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-010
  source_project: finance-bp-067--firesale_stresstest
  pattern_name: Insolvency detection before operational decisions
  description: Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging
    decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate
    operational decisions on prior insolvency state.
  applicable_to_activity: regtech-compliance
  _source_file: cross-project-wisdom/regtech.yaml
domain_constraints_injected: []
resources_injected: {}
component_capability_map:
  project: finance-bp-067--firesale_stresstest
  scan_date: '2026-04-22'
  stats:
    total_files: 5
    total_classes: 23
    total_functions: 0
    total_stages: 5
  modules:
    model_initialization:
      class_count: 4
      stage_id: initialization
      stage_order: 1
      responsibility: 'Load bank balance sheets from EBA data and initialize market infrastructure. WHY: Provides reproducible
        starting state from real European banking data.'
      classes:
      - name: Model.initialize
        file: model_initialization/model-initialize.py
        line: 0
        kind: required_method
        signature: ''
      - name: Bank.__init__
        file: model_initialization/bank-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: AssetMarket.__init__
        file: model_initialization/assetmarket-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: data_source
        file: model_initialization/data-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    shock_application:
      class_count: 5
      stage_id: shock_application
      stage_order: 2
      responsibility: 'Apply exogenous initial shock to asset prices, triggering potential deleveraging cascade. WHY: Models
        contagion from external market shock (e.g., sovereign debt crisis).'
      classes:
      - name: Model.apply_initial_shock
        file: shock_application/model-apply-initial-shock.py
        line: 0
        kind: required_method
        signature: ''
      - name: AssetMarket.update_price
        file: shock_application/assetmarket-update-price.py
        line: 0
        kind: required_method
        signature: ''
      - name: Tradable.update_price
        file: shock_application/tradable-update-price.py
        line: 0
        kind: required_method
        signature: ''
      - name: shock_asset
        file: shock_application/shock-asset.py
        line: 0
        kind: replaceable_point
      - name: shock_fraction
        file: shock_application/shock-fraction.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    agent_decision_phase:
      class_count: 5
      stage_id: agent_decision
      stage_order: 3
      responsibility: 'Each bank evaluates solvency and chooses deleveraging actions. WHY: Separating decision from execution
        ensures order independence.'
      classes:
      - name: Bank.act
        file: agent_decision_phase/bank-act.py
        line: 0
        kind: required_method
        signature: ''
      - name: BankLeverageConstraint.check
        file: agent_decision_phase/bankleverageconstraint-check.py
        line: 0
        kind: required_method
        signature: ''
      - name: do_delever.execute
        file: agent_decision_phase/do-delever-execute.py
        line: 0
        kind: required_method
        signature: ''
      - name: delever_strategy
        file: agent_decision_phase/delever-strategy.py
        line: 0
        kind: replaceable_point
      - name: leverage_threshold
        file: agent_decision_phase/leverage-threshold.py
        line: 0
        kind: replaceable_point
      design_decision_count: 6
    market_clearing:
      class_count: 6
      stage_id: market_clearing
      stage_order: 4
      responsibility: 'Execute all queued sell orders and compute price impact. WHY: Batch execution isolates market mechanics
        from agent decision-making.'
      classes:
      - name: AssetMarket.clear_the_market
        file: market_clearing/assetmarket-clear-the-market.py
        line: 0
        kind: required_method
        signature: ''
      - name: Order.settle
        file: market_clearing/order-settle.py
        line: 0
        kind: required_method
        signature: ''
      - name: compute_price_impact
        file: market_clearing/compute-price-impact.py
        line: 0
        kind: required_method
        signature: ''
      - name: clearing_mode
        file: market_clearing/clearing-mode.py
        line: 0
        kind: replaceable_point
      - name: price_impact_function
        file: market_clearing/price-impact-function.py
        line: 0
        kind: replaceable_point
      - name: execution_price
        file: market_clearing/execution-price.py
        line: 0
        kind: replaceable_point
      design_decision_count: 6
    default_handling:
      class_count: 3
      stage_id: default_handling
      stage_order: 5
      responsibility: 'Process bank defaults and redistribute assets. WHY: Defaults are terminal events that affect systemic
        risk calculations.'
      classes:
      - name: Bank.do_trigger_default
        file: default_handling/bank-do-trigger-default.py
        line: 0
        kind: required_method
        signature: ''
      - name: sell_assets_proportionally
        file: default_handling/sell-assets-proportionally.py
        line: 0
        kind: required_method
        signature: ''
      - name: default_treatment
        file: default_handling/default-treatment.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.5607476635514018
    evidence_invalid: 47
    evidence_verified: 60
    evidence_auto_fixed: 0
    audit_coverage: 32/32 (100%)
    audit_pass_rate: 1/32 (3%)
    audit_fail_total: 22
    audit_finance_universal:
      pass: 1
      warn: 6
      fail: 13
    audit_subdomain_totals:
      pass: 0
      warn: 3
      fail: 9
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-067. Evidence verify ratio
    = 56.1% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-067-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run the recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt, or set the ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries: []
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
    The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 143
    fatal_constraints_count: 42
    non_fatal_constraints_count: 127
    use_cases_count: 0
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
        otherwise initialization fails with an assertion error (fatal violation finance-C-001).'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9), the standard Appel parameters rather than adaptive ones, because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose a registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering rather than current-only filtering because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose a drawer_rects subclass override for custom annotations instead of hardcoded markers, which lets
        traders define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 22 source groups: agent_decision(6),
        behaviour_strategy(5), behaviours(2), constraint_definition(7), constraints(1), default_handling(11), and 16 more.'
      key_decisions: 143 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-008
      type: B
      summary: Two-phase step (step + act) for order independence
    - id: BD-009
      type: B/BA
      summary: Insolvent banks trigger default (raise exception)
    - id: BD-010
      type: B
      summary: Order-independent buffer putForSale_
    - id: BD-011
      type: B/BA
      summary: 'Deleveraging priority: pay loans first, then sell'
    - id: BD-012
      type: B/BA
      summary: 'Threshold model: act when leverage < buffer (4%)'
    - id: BD-013
      type: B
      summary: Perform proportionally across all actions of the same type
    - id: BD-043
      type: B/DK
      summary: Proportional delevering across all assets/liabilities
    - id: BD-044
      type: B/BA
      summary: Pay liabilities first, then sell assets to raise liquidity
    - id: BD-059
      type: B
      summary: Perform proportional delevering by max-amount weighting
    - id: BD-061
      type: B
      summary: Truncate loan payment to notional to prevent overpayment
    - id: BD-063
      type: B
      summary: Available actions reconstructed from scratch each step
    - id: BD-085
      type: B
      summary: Sell/proportional deleveraging strategy
    - id: BD-086
      type: B
      summary: 'Two-step delever: pay liabilities first, then sell assets'
    - id: BD-025
      type: B/BA
      summary: Minimum leverage (insolvency threshold) = 3%
    - id: BD-026
      type: B/BA
      summary: Leverage buffer threshold = 4% triggers delevering behavior
    - id: BD-027
      type: B/BA
      summary: Target leverage = 5% when delevering
    - id: BD-041
      type: B/BA
      summary: Solvency measured purely by leverage ratio (equity/assets)
    - id: BD-064
      type: B/DK
      summary: Asset valuation = quantity * price for tradables
    - id: BD-065
      type: B/BA
      summary: Loan valuation = principal (face value)
    - id: BD-066
      type: B/DK
      summary: Other assets/liabilities use principal amount as valuation
    - id: BD-071
      type: B/DK
      summary: Use leverage ratio lambda = E/A for insolvency detection
    - id: BD-020
      type: B/BA
      summary: Default deferred to next step() phase
    - id: BD-021
      type: B/BA
      summary: Bank alive flag prevents further actions
    - id: BD-022
      type: B/BA
      summary: Default sells all assets proportionally
    - id: BD-046
      type: B/BA
      summary: Upon default, sell ALL tradable assets immediately
    - id: BD-092
      type: BA/DK
      summary: SIMULTANEOUS_FIRESALE=True batches all sales before price impact
    - id: BD-093
      type: BA/DK
      summary: PRICE_IMPACTS defaults to 0.05 (5% price drop per 5% market sold)
    - id: BD-094
      type: BA
      summary: BANK_LEVERAGE_BUFFER=0.04 is threshold for initiating deleveraging
    - id: BD-095
      type: BA
      summary: BANK_LEVERAGE_MIN=0.03 is insolvency trigger (leverage < 3%)
    - id: BD-101
      type: BA
      summary: ASSET_TO_SHOCK defaults to GOV_BONDS for initial price shock
    - id: BD-105
      type: BA
      summary: Loan/OtherLiability split 50-50 from total liability
    - id: BD-107
      type: BA/DK
      summary: 'Exponential price impact formula: 5% sold -> 5% drop at beta=~10.5'
    - id: BD-108
      type: B/BA
      summary: 'INTERACTION: [BD-005/BD-029] × [BD-014/BD-032] × [BD-015/BD-038] → Amplification of cascade severity through
        simultaneous fire sale compression'
    - id: BD-109
      type: B/BA
      summary: 'INTERACTION: [BD-026/BD-073] → [BD-043] → [BD-014] → [BD-015] → [BD-026] → Risk Cascade feedback loop: deleveraging
        buffer triggers fire sales that erode buffer again'
    - id: BD-110
      type: BA
      summary: 'INTERACTION: [BD-026/BD-073] vs [BD-027/BD-074] → Contradiction: 1% buffer between trigger (4%) and target
        (5%) is insufficient for stabilization'
    - id: BD-111
      type: BA
      summary: 'INTERACTION: [BD-043] × [BD-017/BD-083] × [BD-015] → Hidden dependency: Proportional deleveraging with midpoint
        pricing undervalues assets under stress'
    - id: BD-112
      type: BA/DK
      summary: 'INTERACTION: [BD-014/BD-032] × [BD-048] → Hidden dependency: Simultaneous firesale requires price impact computed
        BEFORE settlement, breaking if order reversed'
    - id: BD-113
      type: BA/M
      summary: 'INTERACTION: [BD-045] × [BD-051] × [BD-052] → Hidden dependency: Random shuffle for fairness requires fixed
        seed AND sufficient Monte Carlo runs for validity'
    - id: BD-114
      type: BA/M
      summary: 'INTERACTION: [BD-018] × [BD-022] → Hidden dependency: Per-asset-type price impact assumes fungibility that
        breaks under default liquidation'
    - id: BD-115
      type: BA
      summary: 'INTERACTION: [BD-002/BD-067] × [BD-071] × [BD-003/BD-033] → Risk Cascade: Balance sheet derivation with 5%
        cash creates liquidity-solvency timing mismatch'
    - id: BD-116
      type: BA
      summary: 'INTERACTION: [BD-007] × [BD-062] × [BD-040] → Hidden dependency: Default price of 1.0 for unknown assets creates
        silent failures in parameter sweep results'
    - id: BD-117
      type: BA/DK
      summary: 'INTERACTION: [BD-041] (leverage-only insolvency) × [BD-065] (loan at face value) × [BD-064] (tradables at
        market) → Contradiction: Mixed valuation creates arbitrary solvency boundaries'
    - id: BD-118
      type: B/BA
      summary: 'INTERACTION: [BD-009] × [BD-020] × [BD-042] → Risk Cascade: Deferred default execution creates accumulation
        of silent distress across timesteps'
    - id: BD-119
      type: BA
      summary: 'INTERACTION: [BD-006] × [BD-049] → Amplification: Per-bank price update combined with asymmetric price dynamics
        creates systematic underpricing'
    - id: BD-106
      type: M
      summary: Contract extends ESLContract from external economicsl library
    - id: BD-001
      type: B/BA
      summary: Banks derived from real EBA 2018 CSV data
    - id: BD-002
      type: B/BA
      summary: 'Balance sheet formula: asset=CET1E/leverage, liability=asset-CET1E'
    - id: BD-003
      type: BA
      summary: Cash fixed at 5% of total assets
    - id: BD-004
      type: BA/M
      summary: Other liability split 50/50 with loan
    - id: BD-023
      type: B/BA
      summary: Use 48 banks from EBA 2018 EU-wide stress test data as model population
    - id: BD-033
      type: B/BA
      summary: Cash allocation = 5% of total assets
    - id: BD-034
      type: B/BA
      summary: Loans and other liabilities split 50/50
    - id: BD-035
      type: B/RC
      summary: Corporate bonds = debt securities minus government bonds
    - id: BD-067
      type: B/BA
      summary: 'Balance sheet derived: asset = CET1E / (leverage/100)'
    - id: BD-068
      type: B/RC
      summary: Other assets = total assets - debt securities - cash
    - id: BD-072
      type: B/BA
      summary: Insolvency threshold at leverage < 3%
    - id: BD-073
      type: B/BA
      summary: Trigger deleveraging when leverage < 4% (buffer zone)
    - id: BD-074
      type: B/BA
      summary: Target leverage of 5% (100/20 leverage ratio)
    - id: BD-079
      type: B/BA
      summary: 'Systemic risk threshold: EOSE < 5% returns 0 (no systemic event)'
    - id: BD-080
      type: B/BA
      summary: EOSE = number_of_defaulted_banks / NBANKS (48 banks)
    - id: BD-081
      type: B/BA
      summary: Run simulation for exactly 6 timesteps
    - id: BD-084
      type: B
      summary: 'Two-phase execution: simultaneous vs random shuffle firesale'
    - id: BD-088
      type: B/BA
      summary: Initial shock applied to government bonds market
    - id: BD-GAP-001
      type: DK
      summary: 'Missing: as-of vs processing time'
    - id: BD-GAP-002
      type: DK
      summary: 'Missing: Trading calendar isolation'
    - id: BD-GAP-003
      type: DK
      summary: 'Missing: Timezone explicit annotation'
    - id: BD-GAP-004
      type: M
      summary: 'Missing: Matrix ill-conditioning'
    - id: BD-GAP-005
      type: DK
      summary: 'Missing: Point-in-Time data availability'
    - id: BD-GAP-006
      type: DK
      summary: 'Missing: Stale data detection'
    - id: BD-GAP-007
      type: B
      summary: 'Missing: PnL conservation'
    - id: BD-GAP-008
      type: DK
      summary: 'Missing: Model and data version snapshot'
    - id: BD-GAP-009
      type: RC
      summary: 'Missing: Price/quantity precision (tick/lot)'
    - id: BD-GAP-010
      type: M
      summary: 'Missing: Transition matrix time homogeneity'
    - id: BD-GAP-011
      type: B
      summary: 'Missing: Overdue definition (DPD 30/60/90)'
    - id: BD-GAP-012
      type: RC
      summary: 'Missing: Collection priority & compliance'
    - id: BD-GAP-013
      type: DK
      summary: 'Missing: Reconciliation timeliness'
    - id: BD-GAP-014
      type: DK
      summary: 'Missing: as-of vs processing time'
    - id: BD-GAP-015
      type: DK
      summary: 'Missing: Trading calendar isolation'
    - id: BD-GAP-016
      type: DK
      summary: 'Missing: Timezone explicit annotation'
    - id: BD-GAP-017
      type: M
      summary: 'Missing: Matrix ill-conditioning'
    - id: BD-GAP-018
      type: DK
      summary: 'Missing: Stale data detection'
    - id: BD-GAP-019
      type: B
      summary: 'Missing: PnL conservation'
    - id: BD-GAP-020
      type: M/DK
      summary: 'Missing: Day count convention'
    - id: BD-GAP-021
      type: RC
      summary: 'Missing: Price/quantity precision (tick/lot)'
    - id: BD-GAP-022
      type: B
      summary: 'Missing: Default definition & IFRS 9 stages'
    - id: BD-GAP-023
      type: B
      summary: 'Missing: PD/LGD/EAD estimation (IRB vs Standard)'
    - id: BD-GAP-024
      type: B
      summary: 'Missing: Vasicek single-factor correlation (rho)'
    - id: BD-096
      type: DK/B
      summary: putForSale_ tracks pending sales to ensure order independence
    - id: BD-097
      type: DK/B
      summary: oldPrices stored before price update for mid-point settlement pricing
    - id: BD-102
      type: DK
      summary: cash = 0.05 * asset (5% cash buffer) during balance sheet init
    - id: BD-014
      type: B/DK
      summary: SIMULTANEOUS_FIRESALE=True batches all sales
    - id: BD-015
      type: B/BA
      summary: Exponential price impact per Cifuentes 2005
    - id: BD-016
      type: BA/DK
      summary: 5% market cap sold = 5% price drop by default
    - id: BD-017
      type: B
      summary: Midpoint price execution
    - id: BD-018
      type: B
      summary: Price impact per asset type, not per asset
    - id: BD-019
      type: BA
      summary: Floating point tolerance eps=1e-9
    - id: BD-031
      type: B/BA
      summary: Default price impact = 5% (5% market sold causes 5% price drop) linear baseline
    - id: BD-032
      type: B
      summary: Simultaneous firesale batch processing enabled
    - id: BD-037
      type: B
      summary: Asset sales settle at midpoint price (current + old price) / 2
    - id: BD-038
      type: B/BA
      summary: Exponential price impact function per Cifuentes 2005
    - id: BD-039
      type: B/DK
      summary: Beta calibrated so 5% market cap sold = 5% price drop
    - id: BD-040
      type: B/BA
      summary: Default asset prices initialized at 1.0
    - id: BD-048
      type: B/DK
      summary: Price impact computed before sales settle in clear_the_market()
    - id: BD-049
      type: B/BA
      summary: Update asset prices only when price loss > 0
    - id: BD-050
      type: B/DK
      summary: Cumulative quantities tracked separately from per-step
    - id: BD-056
      type: B/DK
      summary: putForSale_ accumulator ensures order independence in asset sales
    - id: BD-058
      type: B/BA
      summary: Price impact function uses exponential decay
    - id: BD-062
      type: B/BA
      summary: 'Use defaultdict with lambda: 1.0 for default prices'
    - id: BD-069
      type: B/DK
      summary: Use exponential price impact function for asset pricing
    - id: BD-070
      type: B/BA
      summary: Calibrate price impact so 5% market sell causes 5% price drop
    - id: BD-083
      type: B/BA
      summary: 'Settle sales at midpoint price: (current + old_price) / 2'
    - id: BD-036
      type: B/BA
      summary: Floating point tolerance = 1e-9 EUR for zero checks
    - id: BD-060
      type: B/DK
      summary: Do not execute action if amount is effectively zero
    - id: BD-089
      type: T
      summary: step() MUST be called before act() per simulation tick
    - id: BD-090
      type: RC
      summary: put_for_sale() MUST be called before clear_the_market() in same tick
    - id: BD-091
      type: T
      summary: act() raises DefaultException, trigger_default() executes in NEXT step()
    - id: BD-104
      type: RC
      summary: do_delever pays liabilities BEFORE selling assets (priority order)
    - id: BD-098
      type: B
      summary: 'Contract pattern: get_action() returns action objects, is_eligible() filters'
    - id: BD-099
      type: B/DK
      summary: perform_proportionally() distributes actions by max-amount weighting
    - id: BD-103
      type: B/DK
      summary: random.shuffle(allAgents) ensures order independence across simulation runs
    - id: BD-075
      type: B
      summary: Run 100 Monte Carlo simulations per parameter set
    - id: BD-076
      type: B
      summary: Use sample mean and standard deviation for aggregating MC results
    - id: BD-077
      type: B/DK
      summary: Set price impact parameter sweep from 0% to 10% in 21 points
    - id: BD-082
      type: B
      summary: Use fixed random seed 1337 for reproducibility
    - id: BD-024
      type: B/BA
      summary: Systemic event threshold = 5% average bank defaults (Gai-Kapadia 2010)
    - id: BD-052
      type: B
      summary: Run 100 simulations for random shuffling benchmark
    - id: BD-053
      type: B/DK
      summary: Price impact parameter sweep from 0% to 10%
    - id: BD-054
      type: B/BA
      summary: Initial shock parameter sweep from 0% to 30%
    - id: BD-055
      type: B/BA
      summary: Leverage buffer = 1.0 for leverage targeting comparison baseline
    - id: BD-057
      type: B/BA
      summary: Leverage targeting comparison uses 100% buffer override
    - id: BD-005
      type: BA
      summary: Initial shock defaults to 20% on government bonds
    - id: BD-006
      type: B/DK
      summary: Price shock propagates by updating every bank's asset prices
    - id: BD-007
      type: BA
      summary: Default price is 1.0 for unknown asset types
    - id: BD-028
      type: B/BA
      summary: Government bonds selected as asset to shock
    - id: BD-029
      type: B/BA
      summary: Initial shock magnitude = 20% price drop
    - id: BD-078
      type: B/BA
      summary: Set initial shock sweep from 0% to 30% in 21 points
    - id: BD-087
      type: B/BA
      summary: Use 100% leverage buffer (1.0) for leverage targeting simulation
    - id: BD-030
      type: B/BA
      summary: Simulation runs for 6 timesteps
    - id: BD-042
      type: B/BA
      summary: Default execution deferred to step() phase
    - id: BD-045
      type: B/BA
      summary: Random shuffle of agent order each simulation round
    - id: BD-047
      type: B/BA
      summary: 'Step/act split: step() handles defaults, act() handles delevering'
    - id: BD-051
      type: B/DK
      summary: Fixed random seed = 1337 for reproducibility
    - id: BD-100
      type: M/DK
      summary: random_shuffling.py compares SIMULTANEOUS_FIRESALE vs sequential clearing
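# Worked arithmetic for the thresholds listed above. The input figures are hypothetical, chosen
# only for illustration; they are not taken from EBA_2018.csv.
#   Balance sheet (BD-002, BD-067, BD-003, BD-004), assuming CET1E = 5.0 and reported leverage = 5 (percent):
#     asset     = 5.0 / (5 / 100) = 100.0
#     liability = 100.0 - 5.0     = 95.0
#     cash      = 0.05 * 100.0    = 5.0
#     loan = other_liability = 95.0 / 2 = 47.5
#   Leverage checks (BD-025, BD-026, BD-027, BD-071), lambda = E / A:
#     E = 5.0, A = 100.0 -> lambda = 0.0500 (not below the 4% buffer, no action)
#     E = 3.8, A = 98.8  -> lambda ~ 0.0385 (below 4%, not below 3%, so delever towards 5%)
#     E = 2.5, A = 97.5  -> lambda ~ 0.0256 (below 3%, so default)
#   Systemic event measure (BD-079, BD-080), NBANKS = 48:
#     2 defaults -> EOSE = 2 / 48 ~ 0.0417 (below 5%, reported as 0)
#     3 defaults -> EOSE = 3 / 48 = 0.0625 (at or above 5%)
#   Midpoint settlement (BD-037, BD-083), assuming old price 1.00 and current price 0.95:
#     settlement price = (0.95 + 1.00) / 2 = 0.975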
resources:
  packages:
  - name: numpy
    version_pin: latest
  - name: py-economicsl
    version_pin: latest
  - name: matplotlib
    version_pin: latest
  - name: jupytext
    version_pin: latest
  - name: rise
    version_pin: latest
  - name: py-destilledESL
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install numpy
    - python3 -m pip install py-economicsl
    - python3 -m pip install matplotlib
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When initializing banks from EBA_2018.csv
    action: Create exactly 48 bank agents matching the NBANKS constant
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Model systemic risk calculations will be incorrect if fewer or more than 48 banks are created, as the get_extent_of_systemic_event
      function divides by NBANKS=48 and the balance sheet aggregation will not match EBA 2018 reported values
    stage_ids:
    - initialization
  - id: finance-C-002
    when: When creating the AssetMarket during initialization
    action: Initialize asset prices to exactly 1.0 for all asset types using defaultdict
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Initial price of 1.0 is required for consistent price impact calculations using Cifuentes 2005 formula. If
      prices differ, the initial shock magnitude and contagion dynamics will be miscalibrated
    stage_ids:
    - initialization
  - id: finance-C-003
    when: When calculating bank balance sheet values from EBA data
    action: Verify that all parsed CSV values convert to valid floats without errors
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: CSV parsing with float() on unvalidated strings will raise ValueError at runtime, causing the entire model
      to crash before any simulation can run
    stage_ids:
    - initialization
  - id: finance-C-005
    when: When initializing model state
    action: Verify EBA_2018.csv file exists in the working directory
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: FileNotFoundError will crash the model initialization if EBA_2018.csv is missing, preventing any stress test
      simulation from running
    stage_ids:
    - initialization
  - id: finance-C-010
    when: When initializing the model
    action: Create AssetMarket instance before initializing any bank balance sheets
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Bank balance sheet initialization registers assets with the AssetMarket (institutions.py:48, 55); if AssetMarket
      does not exist yet, AttributeError will crash initialization
    stage_ids:
    - initialization
  - id: finance-C-014
    when: When implementing deleveraging logic for banks
    action: Check leverage ratio against the insolvency threshold (3%) first, before any deleveraging decision
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Banks below 3% leverage are insolvent and must trigger default, not deleverage. Failure to check insolvency
      first causes zombie banks to continue operating with negative equity
    stage_ids:
    - agent_decision
  - id: finance-C-015
    when: When implementing the agent decision phase execution order
    action: Separate step() and act() phases, executing all step() calls before any act() calls within each simulation round
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without two-phase execution, agent decisions become order-dependent. First-acting banks can sell assets before
      others decide, creating first-mover advantage that distorts systemic risk measurement
    stage_ids:
    - agent_decision
  - id: finance-C-016
    when: When implementing asset sale actions
    action: Queue asset sales to the putForSale_ buffer rather than executing sales immediately
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Immediate asset sales cause double-selling when multiple banks hold the same asset, as the market cannot
      account for concurrent sale intentions before execution
    stage_ids:
    - agent_decision
  - id: finance-C-019
    when: When calculating leverage for decision thresholds
    action: Compute leverage ratio as equity-to-assets (λ = E/A), not equity-to-liabilities
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect leverage formula causes wrong threshold comparisons, potentially triggering deleveraging at wrong
      leverage levels or failing to detect insolvency
    stage_ids:
    - agent_decision
  - id: finance-C-020
    when: When implementing the deleveraging trigger condition
    action: Trigger deleveraging when leverage < 4% buffer, not when leverage < 3% minimum
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using wrong threshold causes banks to delay deleveraging until insolvency or react prematurely, breaking
      the threshold model designed to avoid excessive trading at minor losses
    stage_ids:
    - agent_decision
  - id: finance-C-027
    when: When calibrating the leverage thresholds
    action: Use the same threshold value for both insolvency trigger and deleveraging buffer
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Equal thresholds eliminate the buffer zone, causing banks to either do nothing or immediately default with
      no intermediate deleveraging option
    stage_ids:
    - agent_decision
  - id: finance-C-029
    when: When implementing the market clearing stage
    action: Use a floating-point tolerance eps=1e-9 for zero-equivalence checks in financial calculations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without eps=1e-9 tolerance, IEEE 754 floating-point precision errors cause division-by-zero crashes or incorrect
      cash transfers, corrupting the simulation's financial state
    stage_ids:
    - market_clearing
  - id: finance-C-030
    when: When adding orders to the orderbook
    action: Assert quantity > 0 before adding orders to prevent invalid sales
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Zero or negative quantities in the orderbook cause incorrect price impact calculations and cash transfers,
      corrupting the market clearing mechanism
    stage_ids:
    - market_clearing
  - id: finance-C-031
    when: When calculating price impact from asset sales
    action: Implement the Cifuentes 2005 exponential price impact formula with beta calibrated to 5% impact at 5% sold
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using a linear or incorrect price impact formula causes the model to misestimate fire-sale contagion dynamics,
      leading to unreliable systemic risk estimates
    stage_ids:
    - market_clearing
  - id: finance-C-042
    when: When tracking cumulative sales for price impact
    action: Initialize total_quantities with actual market capitalization values from bank balance sheets
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Zero or uninitialized total_quantities causes division-by-zero or undefined price impact calculations, crashing
      the market clearing stage
    stage_ids:
    - market_clearing
  - id: finance-C-043
    when: When implementing bank default handling logic
    action: Defer default execution to the step() phase by using the do_trigger_default flag
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Immediate default execution in act() phase causes order-dependent effects when default triggers bilateral
      funding pulls, leading to non-reproducible simulation results depending on agent processing order
    stage_ids:
    - default_handling
  - id: finance-C-044
    when: When processing a bank marked as dead
    action: Prevent the dead bank from executing any further actions by checking the alive flag
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Dead banks continue to execute actions, corrupting systemic risk calculations by including already defaulted
      institutions in deleveraging decisions
    stage_ids:
    - default_handling
  - id: finance-C-045
    when: When liquidating defaulting bank's assets
    action: Sell assets proportionally using the same sell_assets_proportionally function as normal deleveraging
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-proportional fire sale liquidation creates inconsistent market dynamics compared to normal deleveraging,
      causing skewed price impact calculations and incorrect systemic risk measurements
    stage_ids:
    - default_handling
  - id: finance-C-052
    when: When processing agent actions in the simulation loop
    action: Execute all step() calls for all agents before any act() calls
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Interleaving step() and act() calls causes defaults triggered during act() to execute out of order, breaking
      the deferred execution design and causing inconsistent systemic outcomes
    stage_ids:
    - default_handling
  - id: finance-C-055
    when: When validating that fire sale assets enter the market orderbook
    action: Confirm that all assets of each defaulted bank are added to the market orderbook before price clearing
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Defaulting bank assets not entering the orderbook means they are not included in price impact calculations,
      causing fire sale prices to be artificially inflated and systemic risk to be understated
    stage_ids:
    - default_handling
  - id: finance-C-056
    when: When initializing the model from EBA_2018.csv balance sheet data
    action: Create exactly 48 Bank agents as specified by the NBANKS constant
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Systemic risk calculations (eose = sum(out) / NBANKS) will produce incorrect results if the bank count differs
      from 48, causing misleading stress test conclusions
    stage_ids:
    - initialization
    - shock_application
  - id: finance-C-057
    when: When creating Bank balance sheets during initialization
    action: Populate the balance sheet with exactly 4 asset components (cash, corp_bonds, gov_bonds, other_asset) and 2 liability
      components (loan, other_liability)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Balance sheet integrity violations will cause incorrect leverage calculations, leading to wrong insolvency
      detection and potentially masked systemic failures
    stage_ids:
    - initialization
    - shock_application
  - id: finance-C-058
    when: When AssetMarket is created during initialization
    action: Initialize the prices dict with a default value of 1.0 for all asset types and record oldPrices as an empty dict
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Uninitialized price structures will cause KeyError exceptions during shock application or price impact calculations,
      halting simulation
    stage_ids:
    - initialization
    - shock_application
  - id: finance-C-066
    when: When checking bank solvency via leverage constraint
    action: Raise DefaultException when leverage ratio (equity/assets) falls below BANK_LEVERAGE_MIN threshold (3%)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Insolvent banks will continue operating, accumulating losses and creating misleading systemic risk metrics
      in the stress test
    stage_ids:
    - market_clearing
    - default_handling
  - id: finance-C-074
    when: When implementing leverage ratio calculations for bank solvency checks
    action: Use equity_valuation divided by asset_valuation (lambda = E/A), not debt-to-equity or other ratios
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Bank solvency status will be incorrectly assessed, causing insolvent banks to continue trading or solvent
      banks to default unnecessarily
  - id: finance-C-075
    when: When implementing any financial comparisons involving price, quantity, or monetary values
    action: Use eps = 1e-9 tolerance threshold to avoid floating-point edge cases in IEEE 754 arithmetic
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Financial comparisons may fail silently due to floating-point precision errors, causing assets with near-zero
      prices to be incorrectly sold or loans with negligible amounts to be processed
  - id: finance-C-076
    when: When determining whether a bank has defaulted and must liquidate all tradable assets
    action: Use BankLeverageConstraint.is_insolvent() which checks if lambda < BANK_LEVERAGE_MIN (3%)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Insolvent banks will not be properly defaulted, causing them to continue deleveraging and selling assets
      at depressed prices, accelerating systemic contagion
  - id: finance-C-077
    when: When implementing any Action class for bank deleveraging behavior
    action: Implement perform() method for execution logic and get_max() method returning maximum executable amount
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Bank deleveraging actions will fail to execute or return incorrect amounts, breaking the contagion model
      and producing invalid systemic risk estimates
  - id: finance-C-078
    when: When implementing any Contract class to be considered for agent deleveraging
    action: Implement is_eligible(me) method returning boolean to filter which contracts are acted upon
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Contracts that do not implement is_eligible(me) will always report False for eligibility, preventing banks from
      selling assets or paying loans to delever and breaking the fire-sale cascade mechanism
  - id: finance-C-080
    when: When executing the simulation tick loop for agent-based model
    action: Call step() on all agents before calling act() on all agents, with market clearing between them
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Order-dependent execution will produce different results based on agent ordering, invalidating simulation
      reproducibility and academic results
  - id: finance-C-081
    when: When processing asset sales within a simulation tick
    action: Call put_for_sale() on the market before calling clear_the_market() in the same tick to preserve orderbook integrity
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Assets will be sold at current prices without accumulated orders, eliminating price impact effects and breaking
      the fire-sale contagion model
  - id: finance-C-082
    when: When tracking asset quantities during settlement in the orderbook
    action: Decrement putForSale_ buffer when settling a sale to prevent double-selling the same asset
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Assets will be sold multiple times in the same tick, causing phantom cash generation and incorrect balance
      sheet calculations that corrupt systemic risk metrics
  - id: finance-C-086
    when: When presenting or reporting this system's simulation results to users or stakeholders
    action: Claim that simulation results represent real-time trading system capabilities or live market execution
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users will make investment or policy decisions based on inapplicable simulation assumptions, leading to severe
      misallocation of capital or incorrect regulatory assessments
  - id: finance-C-087
    when: When using this model for credit risk assessment or regulatory capital calculations
    action: Claim regulatory-grade accuracy or compliance with Basel/CRR capital adequacy frameworks
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Regulatory submissions based on simplified fire-sale contagion model will violate compliance requirements,
      exposing institutions to penalties and supervisory action
  - id: finance-C-095
    when: When implementing bank asset initialization
    action: Calculate corporate bonds as total debt securities minus government bonds — government bonds must be allocated
      first, corporate bonds are whatever remains in the debt securities bucket
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrectly treating corporate bonds as direct allocation instead of residual breaks regulatory data structure,
      causing balance sheet mismatch and invalid stress test results
    derived_from_bd_id: BD-035
  - id: finance-C-099
    when: When implementing contract inheritance hierarchy
    action: Verify that the ESLContract base class is available from the economicsl library before using contract inheritance;
      confirm the dependency is installed and the library version is compatible
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Contract inheritance from missing external library causes immediate initialization failure, preventing any
      simulation from running
    derived_from_bd_id: BD-106
  - id: finance-C-101
    when: When implementing market operation sequence within a tick
    action: Call put_for_sale() before clear_the_market() in the same tick — the pending sales must be registered before market
      finalization
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Reversing the operation order causes clear_the_market() to execute with stale market state, leading to incorrect
      price settlement and violating the intended market operation sequence
    derived_from_bd_id: BD-090
  - id: finance-C-116
    when: When implementing asset valuation for tradable securities in constraint_definition
    action: 'Use mark-to-market valuation: value = quantity × current price for each tradable asset; do not substitute
      amortized cost or historical cost'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Using non-market valuations during stress scenarios masks actual leverage ratios; mark-to-market reveals
      true asset values needed for regulatory compliance and contagion detection
    derived_from_bd_id: BD-064
  - id: finance-C-124
    when: When implementing the delevering behavior for financial institutions
    action: Pay all liabilities first, then sell assets to raise liquidity; liability payments must occur before any asset
      sales within each delevering step
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Paying liabilities first reflects legal obligation hierarchy; reversing this order (selling assets before
      liabilities) violates both regulatory requirements and behavioral realism for regulated banks
    derived_from_bd_id: BD-044
  - id: finance-C-139
    when: When configuring or verifying price impact computation models for liquidation scenarios
    action: Verify price impact computation differentiates between liquid and illiquid assets within the same asset type;
      DO NOT apply uniform price impact coefficients across asset-type groupings when liquidity characteristics differ
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Uniform price impact coefficients applied to mixed-liquidity asset groups cause systematic mispricing of
      illiquid asset liquidations, with execution shortfalls accumulating silently in backtests
    derived_from_bd_id: BD-114
  - id: finance-C-160
    when: When implementing insolvency detection in the banking model
    action: Trigger bank default when leverage falls below 3% — banks with leverage >= 3% must remain solvent and tradable;
      banks with leverage < 3% must be marked as insolvent and excluded from trading
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Setting incorrect insolvency threshold causes either premature defaults (threshold too high) or late defaults
      allowing insolvent banks to continue trading (threshold too low), both violating regulatory capital adequacy assumptions
    derived_from_bd_id: BD-072
  - id: finance-C-162
    when: When setting initial capital structure constraints in the banking model
    action: Set target leverage at 5%, one percentage point above the 4% deleveraging trigger and two above the 3% insolvency
      threshold; verify that the ordering target > deleveraging_trigger > insolvency_threshold (5% > 4% > 3%) is maintained
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Setting target leverage <= deleveraging trigger removes the buffer hierarchy, preventing banks from maintaining
      safe capital buffers above the deleveraging activation level
    derived_from_bd_id: BD-074
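    # Illustrative check for finance-C-160 and finance-C-162. This is a sketch; the key
    # names below are hypothetical and are not taken from the model's configuration.
    #   target_leverage: 0.05
    #   deleveraging_trigger: 0.04
    #   insolvency_threshold: 0.03   # BANK_LEVERAGE_MIN
    # Required ordering: 0.05 > 0.04 > 0.03. Under these values a bank at 3.5% leverage
    # sits below the deleveraging trigger but remains solvent (3.5% >= 3%), while a bank
    # at 2.9% is insolvent (2.9% < 3%).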
  regular:
  - id: finance-C-004
    when: When initializing bank balance sheets
    action: Allow banks to be created with non-positive leverage ratios
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Banks with zero or negative leverage ratios will cause division by zero in asset calculation (CET1E/leverage)
      or immediately trigger insolvency, corrupting the model's initial state
    stage_ids:
    - initialization
  - id: finance-C-006
    when: When initializing the model
    action: Use real EBA 2018 data as the sole source for bank balance sheet initialization
    severity: high
    kind: resource_boundary
    modality: must
    consequence: The model claims reproducibility from real European banking data; using synthetic or modified data would
      invalidate the empirical grounding that distinguishes this stress test from arbitrary simulations
    stage_ids:
    - initialization
  - id: finance-C-007
    when: When initializing banks from EBA CSV data
    action: Parse gov_bonds field using eval() to handle compound expressions like '13+27682'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: The EBA_2018.csv uses additive notation for government bond holdings (e.g., '13+27682', which sums to 27695
      units); plain int() parsing would raise ValueError and crash the model
    stage_ids:
    - initialization
  - id: finance-C-008
    when: When initializing banks from EBA CSV data
    action: Split CSV rows using space delimiter as the data format specifies
    severity: high
    kind: operational_lesson
    modality: must
    consequence: The EBA_2018.csv uses space-separated values with no quoted fields; using comma splitting would misalign
      columns, causing bank_name, CET1E, leverage to be incorrectly parsed and balance sheet calculations to fail
    stage_ids:
    - initialization
  - id: finance-C-009
    when: When initializing the model
    action: Set random seeds before any simulation run to ensure reproducibility
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without deterministic random seeds, the random.shuffle() in run_simulation (model.py:89) will produce different
      agent ordering each run, causing non-reproducible systemic risk measurements
    stage_ids:
    - initialization
  - id: finance-C-011
    when: When initializing bank balance sheets
    action: 'Maintain correct order of balance sheet component calculations: asset→cash→liability→loan→other_liability'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Balance sheet components have dependencies (liability=asset-CET1E, loan=liability/2); incorrect calculation
      order will produce invalid negative values or incorrect leverage ratios
    stage_ids:
    - initialization
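    # Worked example for finance-C-011 (illustrative numbers, not EBA data), using only
    # the dependencies stated in this file:
    #   CET1E = 5, leverage = 0.05
    #   asset     = CET1E / leverage = 100
    #   liability = asset - CET1E    = 95
    #   loan      = liability / 2    = 47.5
    # cash and other_liability follow in the stated order; their formulas are not given
    # here, so they are left out of the example.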
  - id: finance-C-012
    when: When initializing the model
    action: Claim that stress test results represent actual live trading outcomes
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: This is a simplified fire sale contagion model using EBA 2018 data snapshots; presenting model outputs as
      predictions of actual bank behavior or market outcomes would mislead stakeholders about real-world risk
    stage_ids:
    - initialization
  - id: finance-C-013
    when: When initializing the model
    action: Present model outputs as regulatory-grade capital adequacy assessments
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The model uses a simplified tier-1 leverage ratio (CET1E/assets) rather than the risk-weighted capital ratios
      of the Basel frameworks; presenting it as a regulatory assessment would misrepresent compliance status
    stage_ids:
    - initialization
  - id: finance-C-017
    when: When implementing deleveraging priority
    action: Execute loan repayment before asset sales when both actions are available
    severity: high
    kind: domain_rule
    modality: must
    consequence: Selling assets before repaying loans triggers asset price impact (per Cifuentes 2005), reducing collateral
      value and potentially accelerating systemic contagion compared to liability reduction first
    stage_ids:
    - agent_decision
  - id: finance-C-018
    when: When implementing proportional allocation across multiple contracts
    action: Distribute deleveraging amounts proportionally based on each contract's maximum action value
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-proportional allocation allows one large position to monopolize deleveraging, causing concentration risk
      and violating Cont-Schaanning 2017 fire sale stress testing principles
    stage_ids:
    - agent_decision
  - id: finance-C-021
    when: When running batch simulations for systemic risk measurement
    action: Randomly shuffle agent processing order each simulation round
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Fixed agent order creates systematic first-mover bias across simulations, causing deterministic results that
      mask true systemic risk variance and correlation effects
    stage_ids:
    - agent_decision
  - id: finance-C-022
    when: When default is triggered during the decision phase
    action: Defer default execution to the step() phase rather than executing immediately
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Immediate default execution during act() creates order dependency, as the defaulting bank's asset liquidation
      affects other banks' decisions in the same round
    stage_ids:
    - agent_decision
  - id: finance-C-023
    when: When calculating monetary amounts in deleveraging operations
    action: Use epsilon tolerance (eps = 1e-9) to avoid floating-point edge cases
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Without epsilon tolerance, near-zero quantities trigger infinite loops or NaN values in exponential price
      impact calculations, corrupting simulation results
    stage_ids:
    - agent_decision
  - id: finance-C-024
    when: When implementing the deleveraging strategy
    action: Use a hardcoded fixed order for contract selection instead of a parameterized strategy
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Hardcoded strategy prevents testing different allocation behaviors (e.g., proportional vs. equal-weighted),
      limiting stress test coverage and validation
    stage_ids:
    - agent_decision
  - id: finance-C-025
    when: When running the model for firesale scenarios
    action: Enable SIMULTANEOUS_FIRESALE mode to clear the market between the step and act phases
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without simultaneous firesale mode, asset prices update sequentially during step(), causing price discovery
      timing to depend on agent order rather than aggregate demand
    stage_ids:
    - agent_decision
  - id: finance-C-026
    when: When presenting simulation results
    action: Claim simulated returns represent expected live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Backtested stress test results do not guarantee live execution outcomes due to market impact assumptions,
      simplified behavioral rules, and absence of counterparty reactions
    stage_ids:
    - agent_decision
  - id: finance-C-028
    when: When determining asset eligibility for deleveraging
    action: Only include assets whose quantity exceeds the amount already marked for sale
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Including already-sold assets in available actions causes attempts to sell non-existent quantities, leading
      to negative balances or invalid state
    stage_ids:
    - agent_decision
  - id: finance-C-032
    when: When settling sell orders in the market clearing stage
    action: Execute sales at the midpoint of pre-sale and post-sale prices to prevent front-running
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using only post-sale prices creates an unfair advantage for late sellers, violating the batch execution guarantee
      and distorting market clearing fairness
    stage_ids:
    - market_clearing
  - id: finance-C-033
    when: When preventing zero-quantity or zero-price sales
    action: Skip sale execution when asset price or computed quantity is effectively zero
    severity: high
    kind: domain_rule
    modality: must
    consequence: Executing sales with zero price or zero quantity causes division errors or incorrect cash accounting, corrupting
      the market clearing settlement
    stage_ids:
    - market_clearing
  - id: finance-C-034
    when: When modeling market clearing behavior
    action: Batch all asset sales before computing price impact when SIMULTANEOUS_FIRESALE=True
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Sequential execution with price signals affecting subsequent sales creates look-ahead bias, preventing realistic
      modeling of illiquid market dynamics
    stage_ids:
    - market_clearing
  - id: finance-C-035
    when: When executing the market clearing sequence
    action: Capture oldPrices BEFORE computing price impact, then compute price impact BEFORE settling orders
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Capturing oldPrices after price impact or settling before price impact breaks the midpoint price guarantee,
      causing incorrect cash transfers to selling banks
    stage_ids:
    - market_clearing
  - id: finance-C-036
    when: When computing price impact across multiple asset types
    action: Aggregate quantities sold per asset type before computing price impact for that type
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Computing price impact per individual asset instead of per asset type violates the assumption that same-type
      assets are perfect substitutes, causing incorrect price trajectories
    stage_ids:
    - market_clearing
  - id: finance-C-037
    when: When configuring the stress testing model
    action: Claim the model produces real-time trading signals or actual market prices
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting model outputs as live trading signals misleads stakeholders about the model's purpose as a stress
      testing and systemic risk estimation tool
    stage_ids:
    - market_clearing
  - id: finance-C-038
    when: When presenting stress testing results
    action: Claim backtest returns equal expected live trading returns without specified caveats
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Backtested systemic risk estimates reflect model assumptions and historical data patterns, not guaranteed
      future market behavior under stress conditions
    stage_ids:
    - market_clearing
  - id: finance-C-039
    when: When configuring the clearing mode
    action: Understand that SIMULTANEOUS_FIRESALE=True batches all sales, preventing price signals from affecting subsequent
      sales in the same round
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Setting SIMULTANEOUS_FIRESALE=False allows cascading price impacts where each sale affects subsequent prices,
      fundamentally changing the market clearing semantics
    stage_ids:
    - market_clearing
  - id: finance-C-040
    when: When selecting the price impact function
    action: Use exponential price impact rather than linear to capture nonlinear liquidity drain where small sales have limited
      impact and large sales trigger steep drops
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Linear price impact underestimates fire-sale contagion for large sales and overestimates impact for small
      sales, producing misleading systemic risk estimates
    stage_ids:
    - market_clearing
  - id: finance-C-041
    when: When configuring price impact parameters
    action: Set the default price impact to 0.05 (5%), meaning that selling 5% of market cap causes a 5% price drop
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Using non-calibrated price impact parameters produces unrealistic liquidity assumptions, either overstating
      or understating systemic risk
    stage_ids:
    - market_clearing
  - id: finance-C-046
    when: When a DefaultException is raised during agent decision phase
    action: increment the bank_defaults_this_round counter for record keeping
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing counter increment breaks systemic risk tracking and makes it impossible to calculate the extent of
      system-wide stress events from simulation output
    stage_ids:
    - default_handling
  - id: finance-C-047
    when: When executing agent actions in the main simulation loop
    action: include the defaulting bank in the current round's agent list to ensure its assets enter the orderbook
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing the defaulting bank before its assets are sold means fire sale liquidation never occurs, breaking
      the contagion mechanism and producing incorrect systemic risk estimates
    stage_ids:
    - default_handling
  - id: finance-C-048
    when: When market clearing occurs during simultaneous fire sale mode
    action: clear the market and update asset prices before the next round of agent decisions
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Surviving banks receive stale prices from before fire sales, causing them to make incorrect deleveraging
      decisions based on inflated asset valuations
    stage_ids:
    - default_handling
  - id: finance-C-049
    when: When implementing default handling for the stress testing model
    action: restrict default handling to SOLVENCY type only (not liquidity or margin call failures)
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Including unsupported default types (liquidity, margin call) in the model produces results that do not match
      the documented model specification, misleading stakeholders about systemic risk
    stage_ids:
    - default_handling
  - id: finance-C-050
    when: When selecting the default treatment algorithm
    action: treat default_treatment as a replaceable module that can be swapped without breaking the core simulation loop
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Hardcoding default treatment logic prevents experimentation with alternative resolution strategies and makes
      the model less adaptable to different stress scenarios
    stage_ids:
    - default_handling
  - id: finance-C-051
    when: When running the simulation to ensure order independence
    action: shuffle the agent list before each round of step() and act() calls
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without shuffling, agent processing order affects default timing and fire sale sequencing, producing order-dependent
      results that reflect the input ordering rather than structural factors
    stage_ids:
    - default_handling
  - id: finance-C-053
    when: When interpreting simulation results for policy decisions
    action: claim that backtest stress test results predict actual live trading outcomes
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting simulation outputs as expected real-world results ignores model simplification, assumptions, and
      the well-documented gap between backtest and live performance
    stage_ids:
    - default_handling
  - id: finance-C-054
    when: When presenting the stress testing framework capabilities
    action: claim real-time trading support for a pure backtesting simulation framework
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The system uses polling-based simulation with no live market connectivity, so claiming real-time capabilities
      would mislead users about actual system functionality
    stage_ids:
    - default_handling
  - id: finance-C-059
    when: When applying the initial shock to asset prices
    action: reduce price by exactly the INITIAL_SHOCK fraction (default 20%) and propagate price changes to all Bank agents
      via update_asset_price()
    severity: high
    kind: domain_rule
    modality: must
    consequence: Agents will retain stale asset valuations causing incorrect leverage calculations, leading to wrong deleveraging
      decisions and distorted systemic risk results
    stage_ids:
    - shock_application
    - agent_decision
  - id: finance-C-060
    when: When propagating shocked prices to agent balance sheets
    action: update the price attribute of each Tradable contract to match the new market price from AssetMarket.prices
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Agents will calculate leverage based on outdated prices, causing incorrect insolvency detection and potential
      domino effects in the contagion chain
    stage_ids:
    - shock_application
    - agent_decision
  - id: finance-C-061
    when: When Banks submit sell orders to the AssetMarket
    action: pass orders only with quantity greater than zero (enforced by assert) and append Order objects to the orderbook
      list
    severity: high
    kind: domain_rule
    modality: must
    consequence: AssertionError will terminate the simulation when zero-quantity orders are submitted, preventing completion
      of stress test runs
    stage_ids:
    - agent_decision
    - market_clearing
  - id: finance-C-062
    when: When SIMULTANEOUS_FIRESALE mode is enabled (default)
    action: accumulate all orders in the orderbook during agent decisions before calling clear_the_market() once per timestep
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: If clear_the_market() is called per-order instead of batched, price impact calculations will be fragmented
      and produce incorrect cascading price effects
    stage_ids:
    - agent_decision
    - market_clearing
  - id: finance-C-063
    when: When clearing the market and computing price impact
    action: save oldPrices BEFORE updating prices so that settlement uses the correct pre-impact price for calculating proceeds
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Settlement will use incorrect price reference, causing wrong cash proceeds calculations and breaking conservation
      of value across the edge
    stage_ids:
    - market_clearing
    - agent_decision
  - id: finance-C-064
    when: When updating agent asset prices after market clearing
    action: propagate price changes to agents only when price decreases (priceLost > 0), not on price increases
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Agents will receive incorrect positive price signals on asset sales, causing artificial equity increases
      and distorted deleveraging behavior
    stage_ids:
    - market_clearing
    - agent_decision
  - id: finance-C-065
    when: When settling asset sale orders
    action: use midpoint pricing formula (current_price + old_price) / 2 for settlement, not the new impacted price
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using the post-impact price would undervalue seller proceeds, causing systematic underestimation of cash
      raised and distorted leverage calculations
    stage_ids:
    - market_clearing
    - agent_decision
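    # Worked example for finance-C-065 (illustrative numbers): with old_price = 1.00 and a
    # post-impact current_price = 0.95, the settlement price is (0.95 + 1.00) / 2 = 0.975.
    # Selling 100 units then raises 97.5 in cash, versus 95.0 at the post-impact price.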
  - id: finance-C-067
    when: When catching DefaultException in agent decision phase
    action: set the bank's alive flag to False and increment bank_defaults_this_round counter for tracking
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Dead banks will remain in active agent lists, causing errors in subsequent rounds and corrupting systemic
      risk calculations
    stage_ids:
    - default_handling
    - agent_decision
  - id: finance-C-068
    when: When filtering agents for the next simulation round
    action: process banks marked as not alive (alive=False) through step() or act() methods
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Processing dead banks will cause KeyError or attribute errors as their balance sheets have been liquidated,
      halting the simulation
    stage_ids:
    - default_handling
    - agent_decision
  - id: finance-C-069
    when: When iterating over agents in each simulation timestep
    action: shuffle the agent list randomly before processing to ensure order independence in the simultaneous firesale scenario
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Systematic processing order will bias results as earlier agents get better prices in sequential clearing,
      creating non-reproducible and unfair outcomes
    stage_ids:
    - default_handling
    - agent_decision
  - id: finance-C-070
    when: When presenting stress test results as system-wide risk metrics
    action: claim that simulated bank defaults directly predict actual bank failures in a real stress scenario
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting simulated defaults as predictions will mislead policymakers as the model uses simplified balance
      sheet structures and threshold-based triggers
    stage_ids:
    - market_clearing
    - default_handling
    - agent_decision
  - id: finance-C-071
    when: When calibrating price impact parameters based on literature
    action: set beta coefficient such that selling 5% of market cap causes exactly 5% price drop (the Cifuentes 2005 convention)
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Incorrect beta calibration will distort systemic risk estimates, leading to either underestimation of fire
      sale contagion or excessive conservatism
    stage_ids:
    - agent_decision
    - market_clearing
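    # Calibration sketch for finance-C-071. It assumes an exponential form
    #   new_price = old_price * exp(-beta * f)
    # where f is the fraction of market cap sold; the exact functional form used by the
    # model is an assumption here. Requiring a 5% drop at f = 0.05 gives
    #   exp(-beta * 0.05) = 0.95  =>  beta = -ln(0.95) / 0.05 ≈ 1.026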
  - id: finance-C-072
    when: When interpreting simulation results for policy decisions
    action: assume that backtested stress test parameters (leverage thresholds, price impacts) will remain valid without recalibration
      for changing market conditions
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Stale calibration parameters will produce misleading risk assessments as market microstructure and bank balance
      sheet compositions evolve over time
    stage_ids:
    - shock_application
    - agent_decision
    - market_clearing
  - id: finance-C-073
    when: When encountering unexpected insolvency behavior or price anomalies
    action: assume the problem is due to data quality without investigating the leverage calculation and price update chain
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Misattributing cascading failures to data issues will mask implementation bugs in the contagion mechanics,
      leading to persistent incorrect results
    stage_ids:
    - shock_application
    - agent_decision
    - market_clearing
  - id: finance-C-079
    when: When representing asset quantities in tradable contracts
    action: Store quantities as floats, not integers, to support fractional sales and proportional allocation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Integer quantities prevent proportional deleveraging when a bank needs to sell 33.5% of its assets, breaking
      the Greenwood 2015 and Cont-Schaanning 2017 behavioral models
  - id: finance-C-083
    when: When calculating settlement prices during asset sales
    action: Use midpoint of current price and oldPrices (pre-update price) for settlement to prevent look-ahead bias
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Settlement at updated prices allows agents to profit from their own price-impacting trades, introducing strategic
      arbitrage and invalidating the economic model
  - id: finance-C-084
    when: When conducting agent-based simulation to ensure reproducible yet stochastic results
    action: Shuffle agent execution order randomly each tick to ensure order-independent execution
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Simulation results become deterministic with agent ordering, producing biased systemic risk estimates that
      depend on input data ordering rather than structural factors
  - id: finance-C-085
    when: When using py-economicsl external library dependency
    action: Pin to a specific git commit or version tag, as this is a research library without a PyPI release
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Dependency breakage or API changes will cause simulation to fail, blocking academic research and reproducibility
      efforts
  - id: finance-C-088
    when: When interpreting the model's price impact function
    action: Claim that the Cifuentes 2005 exponential price impact model equals actual market microstructure or high-frequency
      trading dynamics
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Policy recommendations based on simplified price impact will underestimate or overestimate systemic risk
      in real markets with order book dynamics, dark pools, and HFT
  - id: finance-C-089
    when: When using this model for high-frequency trading strategy development
    action: Claim applicability to tick-by-tick trading or intraday arbitrage strategies
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Trading algorithms based on daily timestep model will fail to capture sub-second market dynamics, latency
      effects, and adverse selection from HFT counterparties
  - id: finance-C-090
    when: When presenting simulation outputs without proper caveats
    action: Omit disclosure that results are based on 2018 EBA data, simplified behavioral assumptions, and exponential price
      impact approximation
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Stakeholders will misinterpret outputs as current market conditions rather than historical stress test scenario,
      leading to inappropriate resource allocation or policy decisions
  - id: finance-C-091
    when: When comparing simulation results across different parameter configurations
    action: Re-initialize model from scratch (model.initialize()) before each simulation run to prevent state leakage
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Residual state from previous runs will corrupt systemic risk measurements, producing false comparisons between
      price impact or shock scenarios
  - id: finance-C-092
    when: When initializing bank balance sheets from external data
    action: Parse government bond holdings correctly using eval() for the 'n+m' notation in EBA_2018.csv
    severity: high
    kind: domain_rule
    modality: must
    consequence: Banks with split government bond holdings will have incorrect balance sheets, distorting asset allocation
      and systemic risk calculations
  - id: finance-C-093
    when: When modeling bank deleveraging behavior in the simulation
    action: Pay off liabilities first, then sell assets proportionally if cash is insufficient to meet deleveraging target
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Banks will sell assets before reducing liabilities, incorrectly amplifying fire-sale pressure and overestimating
      systemic contagion
  - id: finance-C-094
    when: When implementing execution order in agent_decision logic
    action: 'Execute actions in two-phase structure: first decide in act() method, then execute in step() method — never execute
      trades directly in the decision method'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Combining decision and execution in a single phase creates first-mover advantage in simultaneous firesale
      scenarios, causing parallel execution to produce unfair outcomes where the first actor gets better prices
    derived_from_bd_id: BD-008
  - id: finance-C-096
    when: When implementing bank initialization with cash buffer parameters
    action: Verify that cash=0.05*total_assets ratio matches actual bank behavior for the time period being modeled; adjust
      to actual historical cash ratios if analyzing different regulatory environments
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Hardcoded 5% cash buffer does not reflect actual bank behavior across different time periods and regulatory
      environments, causing stress test results to overestimate or underestimate bank liquidity during crises
    derived_from_bd_id: BD-003
  - id: finance-C-097
    when: When implementing market clearing price impact calculations
    action: Verify PRICE_IMPACTS calibration matches actual liquidity conditions — beta=-1/0.05*log(1-price_impact) calibrates
      to 5% impact at 5% sold; adjust parameters if modeling different liquidity regimes
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default price impact calibration may not match actual market liquidity, causing strategies to assume execution
      at unrealistic prices and produce backtest results that cannot be replicated in live trading
    derived_from_bd_id: BD-016
  - id: finance-C-098
    when: When implementing liability allocation logic
    action: Verify liability_split_ratio==0.5 when granular data is unavailable — if actual distribution differs significantly,
      use historical proportions instead of equal split to avoid modeler bias
    severity: high
    kind: domain_rule
    modality: must
    consequence: Equal 50/50 split is an arbitrary assumption that may not match actual liability composition, introducing
      systematic bias into balance sheet calculations that compounds over time
    derived_from_bd_id: BD-004
  - id: finance-C-100
    when: When implementing market clearing for fire sale simulations
    action: Validate that the selected clearing mode (SIMULTANEOUS_FIRESALE or sequential) matches the intended scenario —
      if strategy depends on a specific mode, the alternative mode must produce comparable results for robustness
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect clearing mode produces materially different market outcomes, causing strategy backtests to
      be invalid for scenarios with different execution assumptions
    derived_from_bd_id: BD-100
  - id: finance-C-102
    when: When implementing Monte Carlo simulation with agent order randomization in backtesting
    action: Use random shuffle (BD-045) with fixed seed 1337 (BD-051) AND run across 100 Monte Carlo simulations (BD-052)
      together; all three components are required for statistical validity, and using any subset produces biased or non-reproducible
      results
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using random shuffle without fixed seed produces non-reproducible results; using fixed seed without sufficient
      MC runs produces statistically biased results that misrepresent fair agent ordering
    derived_from_bd_id: BD-113
  - id: finance-C-103
    when: When implementing state transition modeling with transition matrices in backtesting
    action: Assume transition matrices are time-homogeneous by default; the framework does not enforce homogeneity, so matrices
      may vary across time periods unless homogeneity is handled explicitly
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Assuming time-homogeneous transition matrices when the framework does not enforce this causes incorrect state
      evolution modeling, producing biased Monte Carlo simulation results
    derived_from_bd_id: BD-GAP-010
  - id: finance-C-104
    when: When implementing position deleveraging logic during stress scenarios
    action: Execute proportional delevering based on max-amount weighting; each position must be reduced by the same fraction,
      so larger positions shed larger absolute amounts and the portfolio composition structure is maintained
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing deleveraging from proportional to equal-weighting causes portfolio composition to shift toward smaller
      positions, creating artificial diversification that does not reflect actual deleveraging behavior in stress scenarios
    derived_from_bd_id: BD-059
  - id: finance-C-105
    when: When implementing loan payment calculation logic in contract execution
    action: Truncate loan payment amounts at the original notional principal — accumulated interest payments must not exceed
      the loan principal amount
    severity: high
    kind: domain_rule
    modality: must
    consequence: Removing the notional truncation allows accumulated interest to exceed the original loan principal, causing
      banks to overpay lenders in simulation and creating non-realistic financial outcomes
    derived_from_bd_id: BD-061
  - id: finance-C-106
    when: When implementing action generation logic for bank decision-making
    action: Recompute available actions from scratch at each simulation step — do not cache or reuse action lists from previous
      steps
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Caching available actions causes stale action data when balance sheet state changes between steps, potentially
      allowing banks to select actions that are no longer valid and producing inconsistent simulation behavior
    derived_from_bd_id: BD-063
  - id: finance-C-107
    when: When implementing default/fire-sale liquidation logic in bank resolution
    action: Execute fire-sale asset sales proportionally across all asset types; maintain the same relative weighting as
      normal deleveraging to ensure behavioral consistency
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-proportional asset selection during default creates inconsistent behavior compared to normal deleveraging,
      introducing arbitrary asset type preferences that distort resolution outcomes and create unrealistic liquidation patterns
    derived_from_bd_id: BD-022
  - id: finance-C-108
    when: When initializing the model population for EU banking sector simulation
    action: Use the 48-bank population from EBA 2018 EU-wide stress test data — verify that any alternative bank set maintains
      similar systemic representativeness characteristics (total assets, tier 1 capital ratios, geographic distribution)
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using synthetic or alternative bank populations without empirical grounding may produce simulation results
      that do not reflect actual EU banking sector behavior, creating false claims about systemic risk and resolution outcomes
    derived_from_bd_id: BD-023
  - id: finance-C-109
    when: When implementing asset sale order execution in market clearing
    action: Accumulate all putForSale orders before executing any sales; do not execute individual sales immediately upon
      request, so that price discovery is fair across all participants
    severity: high
    kind: domain_rule
    modality: must
    consequence: Executing asset sales immediately upon request creates first-mover advantages where early sellers receive
      better prices before aggregate selling pressure affects market prices, distorting simulation fairness and producing
      unrealistic price distributions
    derived_from_bd_id: BD-056
  - id: finance-C-110
    when: When implementing action execution logic for any transaction type
    action: Skip execution of actions with amounts below epsilon threshold (1e-9) — do not process micro-transactions that
      could cause floating-point numerical artifacts
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Processing zero or near-zero amounts without epsilon filtering introduces floating-point rounding errors
      that accumulate across simulation steps, potentially causing incorrect balance calculations and non-deterministic results
    derived_from_bd_id: BD-060
  - id: finance-C-111
    when: When implementing leverage targeting and deleveraging trigger logic
    action: Verify that the gap between deleveraging trigger threshold and target leverage is sufficient (recommend minimum
      3-5 percentage points) — the 1 percentage point buffer between 4% trigger and 5% target is mathematically insufficient
    severity: high
    kind: operational_lesson
    modality: must
    consequence: The 1% buffer causes oscillation behavior where banks overshoot below 3% leverage before reaching the 5%
      target due to price impact during asset sales reducing the asset base denominator, leading to insolvency outcomes that
      could be avoided with wider buffer
    derived_from_bd_id: BD-110
  - id: finance-C-112
    when: When implementing proportional deleveraging combined with midpoint pricing in stress scenarios
    action: Apply stress-adjusted pricing multiplier for proportional deleveraging under high aggregate selling pressure —
      the midpoint pricing formula (old_price + new_price) / 2 undervalues assets when exponential price impact is active
    severity: high
    kind: domain_rule
    modality: must
    consequence: Proportional deleveraging combined with midpoint pricing systematically undervalues asset sales during stress
      because exponential price impact makes new_price significantly lower under high aggregate selling, causing banks to
      receive below-fair-value prices for proportional amounts
    derived_from_bd_id: BD-111
  - id: finance-C-113
    when: When implementing Monte Carlo result aggregation logic in the random_shuffling module
    action: Use sample mean and standard deviation to aggregate simulation results across runs — do not replace with median/MAD
      or other robust statistics
    severity: high
    kind: domain_rule
    modality: must
    consequence: Replacing mean/std with median/MAD changes the statistical characterization of simulation outcomes; tail
      risk identification and scenario comparison become inconsistent with the documented analytical framework
    derived_from_bd_id: BD-076
  - id: finance-C-114
    when: When configuring the systemic risk threshold for cascade detection in risk_measurement
    action: Change the 5% systemic event threshold, which is based on Gai-Kapadia (2010) cascade model literature, to an
      alternative value without re-validation against that academic grounding
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Changing the 5% threshold alters what constitutes a systemic event; higher thresholds miss moderate cascades
      while lower thresholds over-flag normal bank failures, both causing misaligned risk management responses
    derived_from_bd_id: BD-024
  - id: finance-C-115
    when: When implementing leverage buffer logic in constraint_definition
    action: Set leverage buffer threshold to exactly 4%; banks at or below this level must initiate delevering actions, maintaining
      a 1 percentage point gap above the 3% insolvency threshold
    severity: high
    kind: domain_rule
    modality: must
    consequence: The 1% buffer (4% minus 3% insolvency) provides critical reaction time for banks to reduce risk before failure;
      changing this buffer reduces or eliminates the safety margin
    derived_from_bd_id: BD-026
  - id: finance-C-117
    when: When implementing valuation for non-market assets/liabilities in constraint_definition
    action: Use principal amount as valuation for Other assets and liabilities — accept that principal may differ from economic
      value for items with prepayment options or embedded derivatives
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using principal provides consistent accounting treatment across non-market items; replacing with fair value
      estimation requires additional data and adds complexity that was explicitly rejected
    derived_from_bd_id: BD-066
  - id: finance-C-118
    when: When implementing price impact calculation in markets
    action: Use exponential price impact function for asset pricing — do not replace with linear models which underestimate
      market impact for large trades
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Linear price impact models underestimate cascading losses during stress; exponential functions capture realistic
      fire-sale dynamics where large sell orders cause disproportionately large price drops
    derived_from_bd_id: BD-069
  - id: finance-C-119
    when: When running parameter sweeps over price impact or initial shock magnitudes
    action: Add explicit validation to detect unknown asset types before sweep execution — do not rely on defaultdict(lambda:1.0)
      default price which silently returns 1.0 for misspelled or missing asset types
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Unknown assets silently receiving price 1.0 causes parameter sweep results to include incorrect valuations;
      during shock magnitude sweeps, misspelled asset types always show price 1.0 regardless of the shock parameter, corrupting
      the entire analysis
    derived_from_bd_id: BD-116
  - id: finance-C-120
    when: When implementing or modifying price impact calculations in market clearing
    action: Use the exponential price impact function per Cifuentes 2005 — the exponential form ensures price impacts accelerate
      as volume sold increases, capturing realistic market depth constraints in fire sale scenarios
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using linear price impact underestimates fire sale severity in illiquid conditions, causing strategies to
      appear more resilient than they would be in actual market stress scenarios
    derived_from_bd_id: BD-038
  - id: finance-C-121
    when: When initializing asset prices in the market model
    action: Verify that initial asset prices are normalized to 1.0 (not actual market prices), and understand this assumes
      all assets start at par before percentage shocks are applied
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using actual market prices instead of normalized values introduces heterogeneous starting conditions that
      make leverage calculations and shock comparisons inconsistent across different asset price scales
    derived_from_bd_id: BD-040
  - id: finance-C-122
    when: When implementing solvency determination logic for financial institutions
    action: Determine solvency using ONLY the leverage ratio (equity/assets) — do not incorporate liquidity ratios, credit
      quality, or off-balance-sheet items into insolvency determination
    severity: high
    kind: domain_rule
    modality: must
    consequence: Adding multi-factor solvency criteria changes default timing and cascade dynamics, making backtest results
      inconsistent with the model's designed behavior aligned to Basel III leverage standards
    derived_from_bd_id: BD-041
  - id: finance-C-123
    when: When implementing default resolution logic in the simulation
    action: Defer default execution to the step() phase only — defaults must be resolved at step boundaries; mid-step insolvencies
      must accumulate without triggering default until the next step() call
    severity: high
    kind: domain_rule
    modality: must
    consequence: Immediate default execution creates ambiguous ordering dependencies where banks observe different market
      states, breaking the consistent cascade ordering the model relies on for reproducibility
    derived_from_bd_id: BD-042
  - id: finance-C-125
    when: When implementing agent ordering in the simulation round loop
    action: Randomly shuffle the agent order each simulation round — fixed ordering introduces systematic bias where earlier-acting
      banks consistently gain artificial advantages or disadvantages based purely on initialization order
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Fixed ordering creates reproducible but biased results where bank outcomes depend on initialization order
      rather than actual financial position, making backtest conclusions about bank resilience unreliable
    derived_from_bd_id: BD-045
  - id: finance-C-126
    when: When implementing default trigger handling for financial institutions
    action: Immediately sell all tradable assets upon default to maximize recovery for creditors; non-tradable loans and
      positions should be written off
    severity: high
    kind: domain_rule
    modality: must
    consequence: Partial or gradual liquidation changes creditor recovery rates and fire sale pressure dynamics, creating
      inconsistent cascade severity compared to the model's maximum fire sale scenario design
    derived_from_bd_id: BD-046
  - id: finance-C-127
    when: When implementing simulation control flow
    action: Maintain strict separation between step() and act(); step() must handle all default resolution, act() must handle
      all delevering, and defaults must be finalized before banks can act
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Interleaved or concurrent execution creates circular dependencies where delevering triggers default which
      triggers more delevering, causing unpredictable cascade dynamics and breaking model reproducibility
    derived_from_bd_id: BD-047
  - id: finance-C-128
    when: When implementing asset price update logic in market clearing
    action: Update asset prices only when price loss > 0 — prices can only decrease during stress events; gains during market
      stress must not be applied (asymmetric dynamics)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Symmetric price updates during fire sales allow prices to recover mid-crisis, underestimating the duration
      and severity of fire sale cascades by allowing unrealistically quick market rebounds
    derived_from_bd_id: BD-049
  - id: finance-C-129
    when: When configuring initial shock parameters for stress testing scenarios
    action: Set initial shock parameters to sweep from 0% (no stress) to 30% (severe crisis) — verify shock values are within
      the calibrated range and not using default assumptions
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using default shock values without calibration may miss resilience thresholds; values outside 0-30% range
      may capture unrealistic scenarios not observed in historical financial crises
    derived_from_bd_id: BD-054
  - id: finance-C-132
    when: When implementing or refactoring the balance sheet initialization logic
    action: Calculate total_assets as CET1E/leverage_ratio and liabilities as total_assets - CET1E, maintaining the fundamental
      accounting identity assets = liabilities + equity
    severity: high
    kind: domain_rule
    modality: must
    consequence: Breaking the accounting identity causes the balance sheet to fail balancing, producing incorrect leverage
      ratios and making all regulatory capital calculations meaningless
    derived_from_bd_id: BD-002
  - id: finance-C-133
    when: When configuring the model's capital structure parameters for regulatory stress testing
    action: Verify that CET1E (Common Equity Tier 1 capital) and leverage_ratio values match the intended regulatory framework
      requirements, and document the source of these parameters
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Incorrect capital assumptions produce wrong leverage ratios, causing the stress test to misrepresent the
      bank's actual capital adequacy and regulatory standing
    derived_from_bd_id: BD-002
  - id: finance-C-134
    when: When implementing the fire sale clearing logic during market stress events
    action: Process all fire sale orders simultaneously before computing price impact; aggregate all sell orders first,
      then calculate price impact once on the combined quantity, not sequentially on each individual order
    severity: high
    kind: domain_rule
    modality: must
    consequence: Sequential price impact application underestimates the true market impact of aggregate fire sales, causing
      the stress test to overstate remaining portfolio value during realistic liquidity crises
    derived_from_bd_id: BD-014
  - id: finance-C-135
    when: When initializing the bank's asset portfolio in the stress test model
    action: Calculate other_assets as total_assets - gov_bonds - corp_bonds - cash, ensuring the balance sheet identity holds
      after explicit asset allocation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect residual calculation breaks the balance sheet identity, causing total assets to not equal the sum
      of allocated assets plus other assets, invalidating all subsequent stress calculations
    derived_from_bd_id: BD-068
  - id: finance-C-136
    when: When configuring the initial shock parameters for sovereign debt stress testing
    action: Verify that the 20% initial shock default on government bonds matches the intended stress scenario severity, and
      adjust if modeling a different crisis magnitude (e.g., mild 5%, extreme 40%)
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using the wrong shock magnitude produces non-representative stress test results — too low understates contagion
      risk, too high may trigger unrealistic cascading defaults in the model
    derived_from_bd_id: BD-005
  - id: finance-C-137
    when: When implementing the default handling and fire sale execution logic
    action: Batch all sell orders together before computing price impact; set SIMULTANEOUS_FIRESALE=True to calculate impact
      once on aggregate quantity, preserving the illiquidity assumption that orders don't move prices until clearing
    severity: high
    kind: domain_rule
    modality: must
    consequence: Disabling batch mode causes price signals from early sales to affect subsequent sales, breaking the illiquidity
      assumption and understating the cascade severity in stress scenarios
    derived_from_bd_id: BD-092
  - id: finance-C-138
    when: When implementing liquidation logic using default liquidation that sells all tradable assets proportionally
    action: Verify that per-asset-type price impact assumptions hold for the specific assets being liquidated; assets with
      different liquidity characteristics (e.g., government bonds vs corporate bonds from defaulted entity) should NOT be
      treated as fungible substitutes
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Treating illiquid assets as fungible with liquid counterparts underestimates liquidation costs by 10-30%
      for distressed securities, causing backtest results to overstate actual recovery values
    derived_from_bd_id: BD-114
  - id: finance-C-140
    when: When implementing or extending initialization and scheduling logic in the trading framework
    action: Assume trading calendar operations work correctly without explicit isolation from system calendar operations;
      trading calendar interactions with system calendar can cause incorrect scheduling and missed trading windows
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without explicit trading calendar isolation, batch jobs and scheduled tasks may execute on non-trading days
      or miss trading windows, causing strategies to fail or positions to remain unmanaged during critical periods
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-141
    when: When implementing trading calendar management in the framework
    action: Implement explicit trading calendar isolation by maintaining a separate calendar instance for trading operations,
      ensuring all date/time operations involving trading schedules use this isolated calendar and are validated against
      trading day rules
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without isolated trading calendar, system calendar changes or timezone shifts can corrupt trading schedules,
      causing strategies to attempt trading on non-trading days or skip valid trading opportunities
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-142
    when: When implementing any date/time handling in the trading framework
    action: Assume date/time values are implicitly in the correct timezone; implicit timezone handling leads to execution
      at wrong times, causing strategies to trade before the market opens or after it closes
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Implicit timezone handling causes trades to execute at wrong times (e.g., buying after market close, selling
      before open), resulting in missed opportunities or execution at unfavorable prices
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-143
    when: When implementing any datetime or timestamp fields in the trading framework
    action: Add explicit timezone annotation to all datetime fields and enforce timezone validation at data ingestion; ensure
      all timestamps are converted to a canonical timezone (e.g., UTC) before processing and converted to market timezone
      only at display/execution time
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit timezone annotation, backtests use incorrect timestamps leading to trades at wrong times,
      with live trading failing to match backtest behavior due to timezone mismatches
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-144
    when: When implementing default account fund collection logic
    action: Assume the framework handles collection priority and compliance automatically; failure to implement explicit collection
      priority rules violates regulatory requirements and may result in improper fund handling
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without explicit collection priority and compliance controls, fund collection may violate regulatory sequencing
      requirements, leading to compliance violations, penalties, or customer disputes
    derived_from_bd_id: BD-GAP-012
  - id: finance-C-145
    when: When implementing account fund collection and recovery operations
    action: Implement explicit collection priority rules defining the sequence of fund sources (e.g., cash accounts first,
      then securities, then other assets) and add compliance validation checks to verify collection follows regulatory requirements
      for the applicable jurisdiction
    severity: high
    kind: domain_rule
    modality: must
    consequence: Explicit collection priority ensures regulatory compliance and prevents improper fund seizure; without it,
      collection may violate customer protections or regulatory sequencing rules
    derived_from_bd_id: BD-GAP-012
  - id: finance-C-146
    when: When implementing price execution logic for batch trade matching
    action: Use midpoint pricing (average of pre/post prices) for all executions in batch mode; ensure sellers receive the
      midpoint price to prevent front-running and guarantee fair execution symmetry for all participants
    severity: high
    kind: domain_rule
    modality: must
    consequence: Deviating from midpoint pricing creates front-running opportunities where participants trade at favorable
      prices at the expense of others, breaking batch execution fairness and potentially causing disputes or regulatory scrutiny
    derived_from_bd_id: BD-017
  - id: finance-C-147
    when: When implementing or modifying insolvency detection logic in bank default handling
    action: Preserve the insolvency trigger at leverage < 3% (BANK_LEVERAGE_MIN=0.03) as the hard stop for forced liquidation
      — any modification to this threshold must be explicitly reviewed
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Lowering the insolvency threshold below 3% allows banks with inadequate asset coverage to continue trading,
      amplifying losses that should have triggered forced liquidation and distorting systemic risk measurements
    derived_from_bd_id: BD-095
  - id: finance-C-148
    when: When implementing firesale settlement logic that handles simultaneous bank liquidations
    action: Compute price impact BEFORE settling any firesale orders; the ordering of compute_price_impact then settle is
      critical and must be preserved, because reversing it corrupts the entire cascade result
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Reversing the firesale order causes the first sale to update market prices before subsequent sales execute,
      resulting in later sales receiving worse execution at artificially depressed prices and silently corrupting the entire
      cascade
    derived_from_bd_id: BD-112
  - id: finance-C-149
    when: When implementing solvency determination logic that combines market-valued and face-valued assets
    action: Recognize that solvency boundaries are determined by a mixed valuation approach — tradable assets use market prices
      while loans use face value, creating arbitrary cutoff points where market declines are masked
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Mixed valuation may show a bank as solvent when market-valued assets have declined substantially, because
      unchanged loan face values mask actual credit deterioration and market conditions are not reflected in solvency calculations
    derived_from_bd_id: BD-117
  - id: finance-C-150
    when: When implementing data initialization or lookback logic for historical data queries
    action: Assume the framework provides point-in-time data availability — historical data queries may return current values
      rather than values as of a specific date; the framework does not implement temporal data versioning
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without point-in-time data handling, historical backtests use current values for past dates, introducing
      look-ahead bias that makes backtest results completely non-reproducible in live trading
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-151
    when: When implementing data layer initialization or historical data retrieval
    action: Implement point-in-time data retrieval using temporal query fields (e.g., as_of_date, valid_start, valid_end)
      in the data schema, ensuring historical queries return values as they existed at each specific point in time
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without point-in-time data retrieval, historical backtests use current values for past dates, introducing
      look-ahead bias that causes live trading returns to fall far below backtested results
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-152
    when: When implementing firesale execution logic in market clearing that handles simultaneous bank liquidations
    action: Process all firesale orders in a batch at the same pre-settlement price; execute all orders in the firesale
      simultaneously at the price computed before any settlement occurs; do NOT interleave with other transactions or change
      to sequential processing
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing from simultaneous batch processing to sequential or interleaved execution gives first-mover advantage
      to early sellers and understates the liquidity pressure of simultaneous liquidations during systemic stress, producing
      unrealistic stress test results
    derived_from_bd_id: BD-032
  - id: finance-C-153
    when: When initializing asset prices in market simulation
    action: 'Verify that all assets are explicitly initialized with known prices before use; the defaultdict(lambda: 1.0)
      default may mask missing initialization and silently propagate placeholder values'
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using default price of 1.0 for unseen assets can silently propagate placeholder values through calculations,
      causing incorrect simulation results that appear valid without explicit validation
    derived_from_bd_id: BD-062
  - id: finance-C-154
    when: When implementing loan valuation in banking simulation
    action: Assume loans are valued at face value regardless of credit quality; impaired loans must be marked down to reflect
      actual recoverable value, not held at principal
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Valuing impaired loans at face value overstates asset values, causing incorrect leverage ratios and capital
      adequacy calculations; in live trading, impaired loan portfolios would trigger regulatory violations not apparent in
      backtest
    derived_from_bd_id: BD-065
  - id: finance-C-155
    when: When configuring leverage targeting sensitivity analysis
    action: Document that buffer=1.0 (100%) is an extreme edge case where target equals regulatory minimum; results represent
      worst-case targeting behavior and do not reflect conservative banking practices
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Using buffer=1.0 as a general-purpose baseline produces unrealistic results; actual banks maintain buffers
      of 0.5-3% for safety, so leverage targeting at minimum threshold is an atypical edge case that will not generalize to
      practical scenarios
    derived_from_bd_id: BD-055
  - id: finance-C-156
    when: When configuring leverage targeting sensitivity analysis with 100% buffer
    action: Document that 100% buffer represents aggressive targeting with no safety margin; results are not generalizable
      to more conservative targeting approaches that maintain actual safety buffers
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Leverage targeting with 100% buffer isolates targeting from buffer effects but creates results that apply
      only to edge-case configurations; practical targeting strategies maintain 0.5-3% buffers, so comparative analysis may
      mislead if this boundary is not documented
    derived_from_bd_id: BD-057
  - id: finance-C-157
    when: When implementing price impact calculations for large-volume trades
    action: Use exponential decay price impact function; do not replace with linear models as they underestimate large-volume
      impacts during market stress conditions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Linear price impact models systematically underestimate market impact for large trades, causing backtested
      execution costs to appear lower than actual costs in live trading with realistic market depth constraints
    derived_from_bd_id: BD-058
  - id: finance-C-158
    when: When implementing balance sheet initialization in the banking model
    action: 'Calculate total assets from CET1E and leverage ratio using formula: assets = CET1E / (leverage/100) — do not
      reverse the calculation by starting from assets and deriving capital'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Reversing the calculation order (assets → capital) breaks the iterative loop for capital ratio calculation,
      causing either non-convergence or incorrect leverage ratios that misrepresent regulatory capital adequacy
    derived_from_bd_id: BD-067
  - id: finance-C-159
    when: When implementing market stress price impact in the banking model
    action: Calibrate price impact using fixed 1:1 ratio where 5% market sell causes 5% price drop — do not optimize or adjust
      this ratio unless explicitly testing alternative liquidity scenarios
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using higher ratios overstates fire-sale dynamics causing excessive price drops in stress tests; using lower
      ratios understates liquidity risk leading to inaccurate stress scenario results
    derived_from_bd_id: BD-070
  - id: finance-C-161
    when: When implementing deleveraging logic in the banking model
    action: Activate deleveraging when leverage falls below 4% — this creates a 1% buffer zone above the 3% insolvency threshold
      allowing banks remediation opportunity before default
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Removing or changing the buffer zone eliminates the remediation window, causing banks to default immediately
      when hitting insolvency threshold instead of attempting to reduce leverage proactively
    derived_from_bd_id: BD-073
  - id: finance-C-163
    when: When configuring stress test shock sweep parameters
    action: Verify that the shock sweep range covers 0-30% with at least 21 points (1.5% increments) to capture threshold
      effects where contagion becomes systemic
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Incorrect shock range causes critical crisis scenarios to be missed entirely, making systemic risk assessments
      incomplete and potentially causing underestimation of tail risk exposure
    derived_from_bd_id: BD-078
  - id: finance-C-164
    when: When computing systemic risk classification in stress test simulations
    action: 'Apply 5% EOSE threshold: values below 5% indicate no systemic event, values at or above 5% indicate systemic
      crisis'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using an incorrect EOSE threshold causes misclassification of systemic events; wrong threshold may trigger
      false alerts or miss critical contagion scenarios, leading to incorrect risk management decisions
    derived_from_bd_id: BD-079
  - id: finance-C-165
    when: When configuring stress test simulation duration
    action: Verify that 6 timesteps are sufficient to capture full contagion dynamics including initial shock, first wave,
      stabilization, and final state resolution
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Reducing timesteps below 6 may cause late-stage contagion effects to be missed entirely, resulting in incomplete
      systemic risk assessment and underestimation of cascade failures
    derived_from_bd_id: BD-081
  - id: finance-C-166
    when: When implementing settlement price calculation for asset sales in stress test simulations
    action: 'Calculate settlement price as midpoint: (current_price + old_price) / 2 — this represents fair value OTC execution
      preventing extreme fire-sale valuations'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using only current price undervalues assets during fire sales while using only old price overvalues; either
      deviation causes systematic misvaluation leading to incorrect loss calculations and suboptimal risk management
    derived_from_bd_id: BD-083
  - id: finance-C-167
    when: When configuring leverage targeting simulation parameters
    action: Document and validate that leverage_buffer=1.0 produces specific simulation outcomes — this creates an aggressive
      deleveraging target (2x current leverage), and strategies using different buffer values will yield materially different
      cascade dynamics
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using fixed buffer=1.0 means the simulation only tests aggressive deleveraging scenarios; strategies assuming
      different buffer values may produce different cascade behaviors not captured in this backtest
    derived_from_bd_id: BD-087
  - id: finance-C-168
    when: When running stress test simulations with initial market shocks
    action: Verify that initial shock targeting government bonds only (no equity shock) aligns with the stress scenario hypothesis
      — other asset classes may behave differently under stress
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Applying initial shock only to government bonds means equity-driven stress scenarios produce different cascade
      patterns not captured in the simulation; backtest results are specific to sovereign risk events, not general market
      stress
    derived_from_bd_id: BD-088
  - id: finance-C-169
    when: When interpreting cascade severity results from stress testing
    action: Recognize that 20% initial bond shock combined with SIMULTANEOUS_FIRESALE batch processing produces 2-3x higher
      cascade severity than sequential selling; this amplification effect is specific to simultaneous execution and does
      not represent all fire-sale scenarios
    severity: high
    kind: domain_rule
    modality: must
    consequence: The triple interaction effect (BD-108) causes cascade severity 2-3x higher than any single mechanism; interpreting
      results as representative of sequential fire sales would overestimate systemic risk by 200-300%
    derived_from_bd_id: BD-108
  - id: finance-C-170
    when: When analyzing long-duration stress scenarios or cascade persistence
    action: 'Account for the deleveraging feedback loop: leverage_buffer triggers fire sales, which erode asset values, which
      drop leverage further, re-triggering the buffer threshold — this creates extended cascade duration not seen in single-pass
      simulations'
    severity: high
    kind: domain_rule
    modality: must
    consequence: The BD-109 feedback loop causes cascades to persist until assets are fully sold or banks default; single-timestep
      severity metrics underestimate total systemic impact by not capturing the iterative erosion pattern
    derived_from_bd_id: BD-109
  - id: finance-C-171
    when: When designing or validating stress test timing mechanisms
    action: 'Account for deferred default execution: insolvent banks continue operating during the interval between detection
      and next step(), accumulating positions that may create additional interdependencies before defaults execute'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Deferred execution (BD-118) allows insolvent banks to accumulate positions between detection and execution;
      cascade timing and severity differ from immediate-execution models, potentially masking or amplifying systemic risk
      depending on step granularity
    derived_from_bd_id: BD-118
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'sell' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'buy' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-067 / UC-101
    version: v5.3
    intent_keywords: []
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 0
        ucs: []
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-100
      beginner_prompt: Try capability UC-100
      auto_selected: true
    - uc_id: UC-101
      beginner_prompt: Try capability UC-101
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try capability UC-102
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 0 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
    - Custom Transformer + Accumulator factor with per-entity rolling state
    - Bollinger Band mean-reversion factor with BollTransformer (window=20, window_dev=2)
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds


Finrobot Multi Agent

Skill

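A multi-agent financial analysis platform that supports equity research, market forecasting, earnings report interpretation, and quantitative backtesting strategy construction, covering global market data analysis.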

---
name: finrobot-multi-agent
description: |-
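  A multi-agent financial analysis platform that supports equity research, market forecasting, earnings report interpretation, and quantitative backtesting strategy construction, covering global market data analysis.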
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-074"
  compiled_at: "2026-04-22T13:00:27.479397+00:00"
  capability_markets: "global"
  capability_activities: "macro-data"
  sop_version: "crystal-compilation-v6.1"
---
# FinRobot Multi-Agent (finrobot-multi-agent)

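> A multi-agent financial analysis platform that supports equity research, market forecasting, earnings report interpretation, and quantitative backtesting strategy construction, covering global market data analysis.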

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (14 total)

### FMP API Equity Research Report Generator (`UC-101`)
Investors need comprehensive equity research reports that combine financial statement analysis, peer comparisons, and recent news to make informed inv
**Triggers**: equity research, financial analysis report, FMP API

### Multi-Agent Annual Report Generator (`UC-102`)
Financial analysts require automated generation of customized financial analysis reports that can interact with clients, gather requirements, and prod
**Triggers**: annual report, financial report generation, multi-agent

### OpenBB Financial Data Agent (`UC-104`)
Users need an intelligent agent interface to access OpenBB's comprehensive financial data capabilities including market data, fundamentals, and techni
**Triggers**: openbb, financial data agent, market data

For all **14** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
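
Two of the locks above lend themselves to a mechanical check. The following is a minimal sketch, not the framework's own validation: `ENTITY_ID_PATTERN`, `is_valid_entity_id`, and `SignalSizing` are hypothetical names, and the regular expression is only an assumed reading of the `entity_type_exchange_code` format from `SL-03`. The dataclass illustrates the "exactly one of" rule from `SL-05`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape for SL-03: lowercase entity type, lowercase exchange,
# alphanumeric code, joined by underscores (e.g. stock_sh_600000).
ENTITY_ID_PATTERN = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")


def is_valid_entity_id(entity_id: str) -> bool:
    """Return True if the ID matches the assumed entity_type_exchange_code shape."""
    return ENTITY_ID_PATTERN.fullmatch(entity_id) is not None


@dataclass
class SignalSizing:
    """Hypothetical stand-in for the sizing fields named in SL-05."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        # Count how many of the three sizing fields were actually provided.
        provided = [
            value
            for value in (self.position_pct, self.order_money, self.order_amount)
            if value is not None
        ]
        if len(provided) != 1:
            raise ValueError(
                "exactly one of position_pct, order_money, order_amount is required"
            )
```

For example, `SignalSizing(position_pct=0.2)` constructs normally, while `SignalSizing()` and `SignalSizing(position_pct=0.2, order_money=1000.0)` both raise `ValueError`.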

## Top Anti-Patterns (14 total)

- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
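
`AP-MACRO-DATA-003` (a missing shift) and `SL-02` (next-bar execution) describe the same hazard: acting on a signal in the bar that produced it. The sketch below shows the idea on a plain list, using the `SL-06` convention (True=BUY, False=SELL, None=no action). `shift_signals` is a hypothetical helper written for illustration, not a function of any library mentioned here.

```python
from typing import List, Optional


def shift_signals(
    signals: List[Optional[bool]], bars: int = 1
) -> List[Optional[bool]]:
    """Delay each signal by `bars` bars so it executes on a later bar.

    The first `bars` positions become None (no action) because no earlier
    signal exists for them; the last `bars` signals fall off the end.
    """
    if bars < 0:
        raise ValueError("bars must be non-negative; a negative shift looks ahead")
    if bars == 0:
        return list(signals)
    padding = [None] * min(bars, len(signals))
    return padding + list(signals[:-bars])


# Signal computed on bar 0 (BUY) is executed on bar 1, and so on.
raw = [True, None, False, None]
executed = shift_signals(raw)
```

With a pandas DataFrame the same effect is usually obtained by shifting the signal column by one row per entity before it is used for order generation.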

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-074. Evidence verify ratio = 11.5% and audit fail total = 36. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-074` blueprint at 2026-04-22T13:00:27.479397+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['FinnGPT Market Forecaster', 'Multi-Agent Annual Report Generator', 'FMP API Equity Research Report Generator', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-074--FinRobot (1)

### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>

When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10 requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits and missing User-Agent headers compound this by causing silent request failures.
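
A rate-limiting decorator of the kind this anti-pattern calls for can be sketched as follows. This is a minimal illustration, not FinRobot's implementation: the names `rate_limited`, `fetch_filing`, and `SEC_HEADERS` are hypothetical, the User-Agent value is a placeholder, and the HTTP call is stubbed out.

```python
import functools
import time


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Space calls at least 1/max_per_second seconds apart (hypothetical helper)."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [None]  # start time of the previous call, None before the first

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()
            if last_call[0] is not None:
                wait = min_interval - (now - last_call[0])
                if wait > 0:
                    sleep(wait)
                    now = clock()
            last_call[0] = now
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Placeholder value (assumption): a descriptive User-Agent with contact details.
SEC_HEADERS = {"User-Agent": "ExampleResearch admin@example.com"}


@rate_limited(max_per_second=8)  # stays below the 10 requests/second limit
def fetch_filing(url):
    # Stub: a real version would issue the HTTP request with SEC_HEADERS.
    return url, SEC_HEADERS["User-Agent"]
```

The `clock` and `sleep` parameters exist only so the spacing logic can be checked without real waiting.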

## finance-bp-077--Open_Source_Economic_Model (2)

### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>

When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
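
The convergence rule quoted above is small enough to pin down in a helper. In this sketch the function and parameter names are assumptions, and U is taken to be the last liquid point expressed in years.

```python
def convergence_point(last_liquid_point_years):
    """Return max(U + 40, 60), the convergence point in years."""
    return max(last_liquid_point_years + 40, 60)
```

For U = 20 the floor of 60 years applies; for U = 30 the result is 70 years.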

### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>

Importing CSV portfolio files without 'utf-8-sig' encoding leaves the UTF-8 BOM marker attached to the first column name. Row-field lookups then raise KeyError, so portfolio data fails to load at all. The corruption is silent: the first column name read by pandas simply carries the invisible BOM prefix.
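
The effect can be reproduced with the standard library alone. The sketch below uses `csv.DictReader` rather than pandas, and the column names and sample rows are made up for illustration.

```python
import csv
import io

# UTF-8 bytes that start with a BOM, as some spreadsheet exports produce.
raw = "\ufeffticker,weight\nAAPL,0.6\nMSFT,0.4\n".encode("utf-8")


def read_rows(data, encoding):
    """Decode CSV bytes with the given encoding and return a list of dict rows."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.DictReader(text))


bad = read_rows(raw, "utf-8")       # first key is "\ufeffticker"
good = read_rows(raw, "utf-8-sig")  # BOM stripped, first key is "ticker"
```

With plain `utf-8`, `bad[0]["ticker"]` raises KeyError because the key still carries the BOM prefix.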

## finance-bp-080--FinDKG (3)

### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>

When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes the model to see future events during training. The violation of train_edges occurring before val_edges and test_edges temporally results in inflated metrics that do not reflect real-world performance. This produces overfit models that fail catastrophically when deployed for actual temporal prediction tasks.
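
One way to guarantee the ordering is to split on timestamp cutoffs instead of random indices. The sketch below assumes edges are `(head, relation, tail, timestamp)` tuples; the function name and the cutoff parameters are hypothetical.

```python
def temporal_split(edges, val_start, test_start):
    """Split edges by time so train < val_start <= val < test_start <= test."""
    if val_start >= test_start:
        raise ValueError("val_start must be earlier than test_start")
    train = [e for e in edges if e[3] < val_start]
    val = [e for e in edges if val_start <= e[3] < test_start]
    test = [e for e in edges if e[3] >= test_start]
    return train, val, test
```

Cutoff-based splitting also handles ties cleanly: edges that share a timestamp always land in the same split.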

### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>

When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.

### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>

When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.

## finance-bp-083--Economic-Dashboard (3)

### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>

When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
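
The prior-bar comparison can be sketched in plain Python, which mirrors what `shift(1)` does on a pandas Series. The function names and the short window lengths are illustrative assumptions, not the Economic-Dashboard code.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def golden_cross_flags(closes, fast=2, slow=3):
    """Flag bars where fast SMA is above slow SMA now but was not on the prior bar."""
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    above = [
        None if f is None or s is None else f > s
        for f, s in zip(fast_ma, slow_ma)
    ]
    flags = [False]  # the first bar has no prior bar to compare against
    for prev, curr in zip(above, above[1:]):
        flags.append(prev is False and curr is True)
    return flags
```

A flag at bar t is only known once bar t has closed, so an order based on it belongs on bar t+1.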

### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>

Calculating technical indicators from OHLCV data without first verifying the required columns (open, high, low, close, volume) raises ValueError for any ticker with a missing column and prevents indicator calculation for it. This blocks downstream regime classification and pattern detection for all tickers with incomplete data.

### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>

When storing derived features in DuckDB with a different primary key schema than technical_features table, inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.

## finance-bp-105--open-climate-investing (5)

### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>

When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.

### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>

When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2) by 100 before regression leaves the coefficients mis-scaled by a factor of 100. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
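
The unit conversion, together with the excess-return step from `AP-MACRO-DATA-005`, can be sketched as below. The function names are hypothetical, and the sketch assumes stock returns are already decimals while the risk-free rate arrives in percent.

```python
def to_decimal(percent_values):
    """Convert percentage-formatted values (5.2 means 5.2%) to decimals."""
    return [v / 100.0 for v in percent_values]


def excess_returns(stock_returns, rf_percent):
    """Subtract the risk-free rate (given in percent) from decimal stock returns."""
    rf = to_decimal(rf_percent)
    if len(rf) != len(stock_returns):
        raise ValueError("return and risk-free series must have the same length")
    return [r - f for r, f in zip(stock_returns, rf)]
```

Converting once at import time, rather than inside the regression code, keeps every downstream calculation in the same units.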

### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>

When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.

### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>

When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.

### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>

When parallel regression runs under the multiprocessing fork start method with open psycopg2 database connections, child processes inherit the parent's connection state, which is not safe to share. This causes 'connection already closed' errors or corrupted connection state in the children, resulting in failed database writes and incomplete factor regression calculations.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-074--FinRobot
**Scan date**: 2026-04-22
**Stats**: {'total_files': 10, 'total_classes': 38, 'total_functions': 0, 'total_stages': 10}

## Modules (10)

- [financial_data_collection](components/financial_data_collection.md): 6 classes
- [quantitative_analysis_&_backtesting](components/quantitative_analysis_-_backtesting.md): 4 classes
- [multi-agent_workflow_orchestration](components/multi-agent_workflow_orchestration.md): 6 classes
- [financial_statement_analysis](components/financial_statement_analysis.md): 3 classes
- [valuation_analysis](components/valuation_analysis.md): 2 classes
- [sensitivity_analysis](components/sensitivity_analysis.md): 2 classes
- [market_catalyst_analysis](components/market_catalyst_analysis.md): 2 classes
- [equity_research_report_generation](components/equity_research_report_generation.md): 6 classes
- [web_application_&_user_interface](components/web_application_-_user_interface.md): 5 classes
- [rag-enhanced_document_retrieval](components/rag-enhanced_document_retrieval.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 151
  fatal_constraints_count: 38
  non_fatal_constraints_count: 213
  use_cases_count: 14
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
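
A small parser makes the format check explicit. This is a sketch with hypothetical names (`EntityId`, `parse_entity_id`); it is not part of the ZVT API.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its three parts or raise ValueError."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return EntityId(*parts)
```

For example, `parse_entity_id("stock_sh_600000")` yields entity type `stock`, exchange `sh`, and code `600000`.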

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
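
The exactly-one rule can be enforced at construction time. The dataclass below is a hypothetical stand-in used only to illustrate the check; it is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SizedSignal:  # hypothetical stand-in for illustration only
    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        # Count the sizing fields that were actually set.
        provided = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {SIZING_FIELDS} is required, got {provided}"
            )
```

Checking against `None` rather than truthiness matters here, because `position_pct=0.0` is a legitimate value.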

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
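
The three-valued semantics are easy to break with a truthiness test, since `not None` evaluates the same way as `not False`. The sketch below uses a hypothetical helper name and handles plain Python values only; numpy or pandas booleans would need converting first.

```python
import math


def action_for(filter_result):
    """Map a filter_result cell to BUY, SELL, or NO_ACTION."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"
    if filter_result is True:
        return "BUY"
    if filter_result is False:
        return "SELL"
    raise ValueError(f"unexpected filter_result value: {filter_result!r}")
```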

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **14**

## `KUC-101`
**Source**: `finrobot_equity/core/src/Run.ipynb`

Investors need comprehensive equity research reports that combine financial statement analysis, peer comparisons, and recent news to make informed investment decisions on specific companies.

## `KUC-102`
**Source**: `tutorials_advanced/agent_annual_report.ipynb`

Financial analysts require automated generation of customized financial analysis reports that can interact with clients, gather requirements, and produce professional-grade documents.

## `KUC-103`
**Source**: `tutorials_advanced/agent_fingpt_forecaster.ipynb`

Traders and investors need AI-powered market analysis that combines company profiles, financial data, and news to generate short-term price movement predictions.

## `KUC-104`
**Source**: `tutorials_advanced/agent_openbb.ipynb`

Users need an intelligent agent interface to access OpenBB's comprehensive financial data capabilities including market data, fundamentals, and technical analysis.

## `KUC-105`
**Source**: `tutorials_advanced/agent_trade_strategist.ipynb`

Algorithmic traders need automated assistance to develop, code, and backtest custom trading strategies using the BackTrader framework with logging for further analysis.

## `KUC-106`
**Source**: `tutorials_advanced/lmm_agent_mplfinance.ipynb`

Analysts need vision-enabled AI agents that can analyze financial charts and market visualizations alongside textual market news for comprehensive analysis.

## `KUC-107`
**Source**: `tutorials_advanced/lmm_agent_opt_smacross.ipynb`

Quantitative traders need AI-assisted optimization of Simple Moving Average crossover strategies by visually inspecting charts and iteratively refining parameters.

## `KUC-108`
**Source**: `tutorials_beginner/agent_annual_report.ipynb`

Beginners need a simple way to generate formatted PDF annual reports from SEC 10-K filings with appropriate length and professional presentation.

## `KUC-109`
**Source**: `tutorials_beginner/agent_fingpt_forecaster.ipynb`

Beginners need easy stock price movement predictions based on company news and available financial information for basic investment decision support.

## `KUC-110`
**Source**: `tutorials_beginner/agent_rag_earnings_call_sec_filings.ipynb`

Analysts need to query and analyze large collections of earnings call transcripts and SEC filings using retrieval augmented generation for insights.

## `KUC-111`
**Source**: `tutorials_beginner/agent_rag_qa.ipynb`

Users need to ask questions and get answers from annual report documents using RAG technology to quickly extract specific financial information.

## `KUC-112`
**Source**: `tutorials_beginner/agent_rag_qa_up.ipynb`

Analysts need to query across multiple financial document sources including earnings calls and SEC filings simultaneously to get comprehensive answers.

## `KUC-113`
**Source**: `tutorials_beginner/ollama function call.ipynb`

Users with privacy requirements or local infrastructure need to use local LLMs (Ollama) to call financial data functions for stock information retrieval.

## `KUC-114`
**Source**: `tutorials_beginner/ollama stock chart.ipynb`

Users need to generate stock price charts using local LLM infrastructure with yfinance data, avoiding cloud dependencies for visualization.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.

## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.

## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.

## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.

## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data

When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.

## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.

## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.

## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.

FILE:references/components/equity_research_report_generation.md
# equity_research_report_generation (6 classes)

## `EnhancedTextGenerator.generate`
`equity_research_report_generation/enhancedtextgenerator-generate.py:0`

## `EnhancedChartGenerator.generate_revenue_chart`
`equity_research_report_generation/enhancedchartgenerator-generate-revenue-.py:0`

## `EquityResearchAgentManager.generate_all_sections`
`equity_research_report_generation/equityresearchagentmanager-generate-all-.py:0`

## `HTMLRenderer.render`
`equity_research_report_generation/htmlrenderer-render.py:0`

## `output_format`
`equity_research_report_generation/output-format.py:0`

## `text_generation`
`equity_research_report_generation/text-generation.py:0`

FILE:references/components/financial_data_collection.md
# financial_data_collection (6 classes)

## `YFinanceUtils.get_stock_price`
`financial_data_collection/yfinanceutils-get-stock-price.py:0`

## `FMPUtils.get_financial_metrics`
`financial_data_collection/fmputils-get-financial-metrics.py:0`

## `SECUtils.extract_filing_section`
`financial_data_collection/secutils-extract-filing-section.py:0`

## `SECExtractor.parse_document`
`financial_data_collection/secextractor-parse-document.py:0`

## `data_source`
`financial_data_collection/data-source.py:0`

## `filing_parser`
`financial_data_collection/filing-parser.py:0`

FILE:references/components/financial_statement_analysis.md
# financial_statement_analysis (3 classes)

## `ReportAnalysisUtils.generate_analysis`
`financial_statement_analysis/reportanalysisutils-generate-analysis.py:0`

## `FinancialDataProcessor.extract_metrics`
`financial_statement_analysis/financialdataprocessor-extract-metrics.py:0`

## `metric_extractor`
`financial_statement_analysis/metric-extractor.py:0`

FILE:references/components/market_catalyst_analysis.md
# market_catalyst_analysis (2 classes)

## `CatalystAnalyzer.analyze`
`market_catalyst_analysis/catalystanalyzer-analyze.py:0`

## `classifier`
`market_catalyst_analysis/classifier.py:0`

FILE:references/components/multi-agent_workflow_orchestration.md
# multi-agent_workflow_orchestration (6 classes)

## `FinRobot.chat`
`multi-agent_workflow_orchestration/finrobot-chat.py:0`

## `SingleAssistant.chat`
`multi-agent_workflow_orchestration/singleassistant-chat.py:0`

## `MultiAssistantWithLeader.chat`
`multi-agent_workflow_orchestration/multiassistantwithleader-chat.py:0`

## `MultiAssistant.chat`
`multi-agent_workflow_orchestration/multiassistant-chat.py:0`

## `workflow_pattern`
`multi-agent_workflow_orchestration/workflow-pattern.py:0`

## `speaker_selection`
`multi-agent_workflow_orchestration/speaker-selection.py:0`

FILE:references/components/quantitative_analysis_-_backtesting.md
# quantitative_analysis_&_backtesting (4 classes)

## `BackTraderUtils.run`
`quantitative_analysis_&_backtesting/backtraderutils-run.py:0`

## `DeployedCapitalAnalyzer.analyze`
`quantitative_analysis_&_backtesting/deployedcapitalanalyzer-analyze.py:0`

## `strategy`
`quantitative_analysis_&_backtesting/strategy.py:0`

## `sizer`
`quantitative_analysis_&_backtesting/sizer.py:0`

FILE:references/components/rag-enhanced_document_retrieval.md
# rag-enhanced_document_retrieval (2 classes)

## `RetrieveUserProxyAgent.retrieve`
`rag-enhanced_document_retrieval/retrieveuserproxyagent-retrieve.py:0`

## `vector_store`
`rag-enhanced_document_retrieval/vector-store.py:0`

FILE:references/components/sensitivity_analysis.md
# sensitivity_analysis (2 classes)

## `SensitivityAnalyzer.analyze`
`sensitivity_analysis/sensitivityanalyzer-analyze.py:0`

## `sensitivity_dimension`
`sensitivity_analysis/sensitivity-dimension.py:0`

FILE:references/components/valuation_analysis.md
# valuation_analysis (2 classes)

## `ValuationEngine.calculate`
`valuation_analysis/valuationengine-calculate.py:0`

## `valuation_method`
`valuation_analysis/valuation-method.py:0`

FILE:references/components/web_application_-_user_interface.md
# web_application_&_user_interface (5 classes)

## `FastAPI endpoints`
`web_application_&_user_interface/fastapi-endpoints.py:0`

## `User.create`
`web_application_&_user_interface/user-create.py:0`

## `ReportRequest.track`
`web_application_&_user_interface/reportrequest-track.py:0`

## `auth_provider`
`web_application_&_user_interface/auth-provider.py:0`

## `session_store`
`web_application_&_user_interface/session-store.py:0`


Finrl Rl Trading

Skill

Use ensemble deep reinforcement learning (A2C, DDPG, PPO, TD3, SAC) to execute automated multi-market stock trading with

---
name: finrl-rl-trading
description: |-
  Use ensemble deep reinforcement learning (A2C, DDPG, PPO, TD3, SAC) to execute automated multi-market stock trading with
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-061"
  compiled_at: "2026-04-22T13:00:18.884984+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# FinRL Reinforcement Learning Trading (finrl-rl-trading)

> Use ensemble deep reinforcement learning (A2C, DDPG, PPO, TD3, SAC) to execute automated multi-market stock tr。

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (14 total)

### Ensemble Stock Trading ICAIF 2020 (`UC-101`)
Executing automated stock trading using an ensemble of multiple DRL agents (A2C, DDPG, PPO, TD3, SAC) to reduce individual agent weakness and improve
**Triggers**: ensemble trading, multiple agents, stock trading

### NeurIPS 2018 DRL Training (`UC-107`)
Training deep reinforcement learning agents (A2C, DDPG, PPO, SAC, TD3) for automated stock trading using the StockTradingEnv environment
**Triggers**: DRL training, stock trading, A2C

### NeurIPS 2018 Ensemble Backtesting (`UC-108`)
Backtesting multiple trained DRL agents against baseline strategies (MVO, DJIA) to evaluate and compare ensemble trading performance
**Triggers**: backtesting, ensemble, DRL agents

For all **14** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

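- **`AP-ZVT-183`**: An ex-rights adjustment factor of inf/NaN is used directly in multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API exceeds its rate limit are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong bar (K-line) table, HFQ (backward-adjusted) vs. QFQ (forward-adjusted), causes factor calculations to drift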

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-061. Evidence verify ratio = 18.9% and audit fail total = 32. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons learned | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business use cases | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-061` blueprint at 2026-04-22T13:00:18.884984+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Paper Trading with Alpaca API', 'Graph Portfolio Manager with GNN', 'Ensemble Stock Trading ICAIF 2020', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

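### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>

When multiple models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal standardization parameters (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same set of prediction signals. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiments appear to run, but every conclusion is invalid.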

Source: https://github.com/microsoft/qlib/issues/1930

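### `AP-QLIB-2090` — Dual configuration of fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range used to fit the normalizer) and segments.train (which sets the model training range). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.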

Source: https://github.com/microsoft/qlib/issues/2090
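
A pre-flight check can catch this coupling before any model is trained. The sketch below works on plain dictionaries shaped like the handler kwargs and `segments` entries of a Qlib dataset config, with dates as ISO strings; the function name is hypothetical and nothing here calls Qlib itself.

```python
from datetime import date


def check_fit_window(handler_kwargs, segments):
    """Raise ValueError if the normalizer fit range extends past the train segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after the train segment end {train_end}"
        )
```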

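### `AP-QLIB-2036` — Documentation error in the MACD factor formula: DEA is divided by CLOSE one extra time, causing inconsistent units <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs significantly from the correct formula after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.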

Source: https://github.com/microsoft/qlib/issues/2036

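### `AP-QLIB-2184` — Custom A-share data imported without filling suspension days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>

Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields are all NaN, so the framework can recognize and skip those days during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price-momentum factors show "false signals" during the suspension (price unchanged but factor non-zero). Qlib does not validate this convention, so the error flows silently into the training data.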

Source: https://github.com/microsoft/qlib/issues/2184

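### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to fetch the Shanghai and Shenzhen stock list. The function depends on the Eastmoney API, which often returns only a partial list rather than all 5000+ stocks, and the function raises ValueError directly when the list is incomplete. If users follow the documented steps, the financial dataset covers only some stocks, and backtests based on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).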

Source: https://github.com/microsoft/qlib/issues/1892

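### `AP-QLIB-2097` — Whole-market instrument="all" hits OOM on a 32GB machine, while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage differs by roughly 16x between instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks). A 32GB machine running the whole market hits OOM in the init_instance_by_config stage, and the error message does not point to memory. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.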

Source: https://github.com/microsoft/qlib/issues/2097

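### `AP-QLIB-1984`: LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label targets, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises an unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.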

Source: https://github.com/microsoft/qlib/issues/1984
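
The ndim behavior described above is easy to reproduce with plain pandas. The sketch below is not Qlib's code; the label frames and the `is_multi_label` helper are hypothetical and only illustrate why a check on a Series' ndim cannot detect multiple labels, and what a column-count check would look like instead.

```python
import pandas as pd

# Hypothetical label frames: one label column vs two label columns.
single = pd.DataFrame({"LABEL0": [0.1, -0.2, 0.3]})
multi = pd.DataFrame({"LABEL0": [0.1, -0.2, 0.3], "LABEL1": [0.0, 0.1, -0.1]})

series_ndim = single["LABEL0"].values.ndim  # values of a Series are 1-D
frame_ndim = multi.values.ndim              # values of a DataFrame are 2-D

def is_multi_label(y: pd.DataFrame) -> bool:
    """Detect multi-label targets by column count instead of ndim."""
    return y.shape[1] > 1
```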

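### `AP-QLIB-1915`: After dump_bin on custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (for example 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors trigger Length mismatch during align/reindex, whereas D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.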

Source: https://github.com/microsoft/qlib/issues/1915

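### `AP-QLIB-1949`: Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from the D.features call, and the stack trace does not tell users whether the problem lies in the parallel backend or in the data.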

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

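### `AP-VNPY-3691`: Bar generator's first bar has a misaligned timestamp, so the first period's signal is wrong <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is timestamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not be pushed until 10:04). If a strategy filters directly on datetime.minute % 5 in on_bar, that first bar happens to pass, yet it contains less than one full period of data and produces a wrong entry signal when used for signal computation.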

Source: https://github.com/vnpy/vnpy/issues/3691

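### `AP-VNPY-3669`: Alpha module raises SchemaError on incremental history saves because old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When the vnpy Alpha module saves bar data to Parquet, it calls polars.concat directly on newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns). Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (volume, for example, is an integer from some feeds and a float from others), and there is no schema alignment step before concat. This affects every historical-data build that uses vnpy alpha for backtesting.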

Source: https://github.com/vnpy/vnpy/issues/3669

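### `AP-VNPY-3685`: SpreadTrading run_backtesting() fails silently under Jupyter, making results untrustworthy <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic is skipped without raising an exception, and the call returns backtest statistics that look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.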

Source: https://github.com/vnpy/vnpy/issues/3685

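### `AP-VNPY-3700`: Install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs straight into the system or conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, which breaks other quant tools that depend on numpy 2.x (such as newer scipy and pytorch releases). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where several tools coexist, vnpy's install script is a common source of global environment pollution.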

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

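### `AP-ZIPLINE-138`: Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices on the chart differ from the actual market prices by a factor of about 4 (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: the price jump around an ex-rights date shows up in unadjusted data as a huge "signal", so technical indicators produce false breakout signals on ex-rights days.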

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

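### `AP-ZIPLINE-235`: Default fill at the current bar's close understates live slippage and inflates backtest returns <sub>(high)</sub>

After a signal fires on a bar, Zipline's default slippage model fills the order at that same bar's close (current bar close fill). In live trading, the signal can only be filled near the open of the next bar (T+1 order execution). Taking A-share daily bars as an example, backtesting with close fills instead of next-day open fills overstates daily returns by roughly 0.1-0.3% on average, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a sensible volume_limit.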

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

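### `AP-ZIPLINE-190`: A start_session on a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if start_session falls on a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing the value to the first trading day. In the A-share case with the SSE/SZSE calendars, a start_date that lands on a holiday (for example the day after the last trading day before Spring Festival) triggers the same kind of error and is very costly to debug.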

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
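
One way to avoid this error is to snap the requested start date to the first session on or after it before passing it on. The sketch below uses only the standard library and assumes the trading sessions are already available as a sorted list of dates; `first_session_on_or_after` and the calendar excerpt are hypothetical and do not use Zipline's calendar API.

```python
from bisect import bisect_left
from datetime import date

def first_session_on_or_after(sessions: list[date], day: date) -> date:
    """Return the first trading session on or after `day` (sessions must be sorted)."""
    i = bisect_left(sessions, day)
    if i == len(sessions):
        raise ValueError(f"{day} is after the last known session {sessions[-1]}")
    return sessions[i]

# Hypothetical calendar excerpt: 1998-01-01 is a holiday, 01-03 and 01-04 a weekend.
sessions = [date(1998, 1, 2), date(1998, 1, 5), date(1998, 1, 6)]
start_session = first_session_on_or_after(sessions, date(1998, 1, 1))
```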

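### `AP-ZIPLINE-181`: Stale asset db makes Pipeline report "no assets traded", sending users to check the data range instead <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start and end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a newer date range. In the A-share case, if price data is refreshed after a new download but the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.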

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

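### `AP-ZIPLINE-285`: week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for finding the first and last day of a week. Under non-US calendars (such as ASX or SSE), that logic is incompatible with the offset calculation built for the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case with the SSE calendar, in weeks containing long consecutive holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the logs give users no sign that the schedule did not fire.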

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

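### `AP-ZIPLINE-240`: Backtest dates must be in UTC, and passing a naive datetime raises an AssertionError deep in the stack <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone information, such as pd.Timestamp('2020-01-01')), no error is raised at the entry point; instead, AssertionError: Algorithm should have a utc datetime fires deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data from local CST time fall into this trap very easily and need to call tz_localize('UTC') explicitly when registering the bundle.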

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
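
For intraday timestamps recorded in local China Standard Time, one hedged option is to localize to the source zone first and then convert, so the instant is preserved. The sketch below uses only pandas; `to_utc` and the default zone `Asia/Shanghai` are assumptions made for illustration. For date-only session labels, the direct tz_localize('UTC') call mentioned above is the simpler choice.

```python
import pandas as pd

def to_utc(ts: pd.Timestamp, source_tz: str = "Asia/Shanghai") -> pd.Timestamp:
    """Return a UTC-aware timestamp; naive inputs are read as `source_tz` wall time."""
    if ts.tzinfo is None:
        ts = ts.tz_localize(source_tz)
    return ts.tz_convert("UTC")

naive_close = pd.Timestamp("2020-01-02 15:00")  # 15:00 China Standard Time
utc_close = to_utc(naive_close)
```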

## zvt (6)

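### `AP-ZVT-183`: An inf/NaN ex-rights factor goes straight into multiplication, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (first day of a new listing or missing data), the factor is inf; when kdata.open itself is None (a suspension day that was not filled), the multiplication raises TypeError. As a result, adjustment for the whole entity is aborted and all subsequent bars are lost, but the main flow only logs an ERROR and keeps going, so users often do not know that data for many stocks is already corrupted.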

Source: https://github.com/zvtvz/zvt/issues/183
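
A defensive version of the factor arithmetic can return a missing value instead of inf or a TypeError. This is a minimal standard-library sketch, not ZVT's implementation; `safe_adjust_factor` and `adjust_open` are hypothetical helpers.

```python
import math
from typing import Optional

def safe_adjust_factor(new_price: Optional[float],
                       old_price: Optional[float]) -> Optional[float]:
    """Return new/old as an adjustment factor, or None when it is not usable."""
    if new_price is None or old_price is None or old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None

def adjust_open(open_price: Optional[float],
                factor: Optional[float]) -> Optional[float]:
    """Apply the factor; propagate missing values instead of raising TypeError."""
    if open_price is None or factor is None:
        return None
    return open_price * factor
```

Callers can then count the None results per entity and fail loudly when the count is above a tolerance, rather than relying on a log line.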

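### `AP-ZVT-179`: Exceptions are swallowed after a third-party data API hits its quota, leaving data silently missing <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so that day's data is silently missing for every stock after the limit is reached. A backtest run on that incomplete database yields systematically biased factor values, with no warning.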

Source: https://github.com/zvtvz/zvt/issues/179

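### `AP-ZVT-161`: Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on the number of variables. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.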

Source: https://github.com/zvtvz/zvt/issues/161
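
The batching fix can be sketched with the standard-library sqlite3 module. The table name `kdata`, its columns, and the helper names are hypothetical stand-ins rather than ZVT's schema; the batch size of 500 is an arbitrary value chosen to stay under the 999-variable default.

```python
import sqlite3
from typing import Iterator, Sequence

def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_by_entity_ids(conn: sqlite3.Connection, entity_ids: Sequence[str],
                        batch_size: int = 500) -> list[tuple]:
    """Query in batches so each IN clause stays under SQLite's variable limit."""
    rows: list[tuple] = []
    for batch in chunked(entity_ids, batch_size):
        placeholders = ",".join("?" for _ in batch)
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(batch)).fetchall())
    return rows

# Hypothetical in-memory table standing in for a kdata store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])
rows = fetch_by_entity_ids(conn, ids)
```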

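### `AP-ZVT-129`: Wildcard imports hide API version changes, and enums such as AdjustType vanish unexpectedly <sub>(medium)</sub>

The ZVT documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors an enum (for example, moving AdjustType into a submodule), the wildcard import no longer includes that symbol and an AttributeError results. Users assume an installation problem, when in fact it is a breaking API change between versions that was not noted in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.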

Source: https://github.com/zvtvz/zvt/issues/129

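### `AP-ZVT-187`: Backtest engine does not stop early on an empty data-layer result, causing a cascade of null-reference crashes <sub>(medium)</sub>

When ZVT's Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame to the selector computation, which sets off a chain of NoneType failures downstream. The stack trace is deep and the root cause is hard to find, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.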

Source: https://github.com/zvtvz/zvt/issues/187

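### `AP-ZVT-183B`: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table makes factor values drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users mix two tables when computing price momentum or moving-average factors (for example, unadjusted prices for the moving average and backward-adjusted prices for returns), which makes factor values jump around ex-rights dates. ZVT performs no cross-table consistency check on adjustment type, so the mix-up passes silently.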

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-061--FinRL
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 39, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [market_data_acquisition](components/market_data_acquisition.md): 4 classes
- [data_cleaning_&_alignment](components/data_cleaning_-_alignment.md): 3 classes
- [technical_indicator_computation](components/technical_indicator_computation.md): 4 classes
- [normalization_&_array_conversion](components/normalization_-_array_conversion.md): 3 classes
- [gym_environment_creation](components/gym_environment_creation.md): 6 classes
- [drl_model_training](components/drl_model_training.md): 5 classes
- [ensemble_validation](components/ensemble_validation.md): 4 classes
- [backtesting_&_paper_trading](components/backtesting_-_paper_trading.md): 5 classes
- [performance_metrics_&_visualization](components/performance_metrics_-_visualization.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 134
  fatal_constraints_count: 60
  non_fatal_constraints_count: 195
  use_cases_count: 14
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

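- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms are: (1) computing the signal from the close and filling at the same day's close; (2) stamping an indicator computed after day T's close onto the same bar; (3) assuming fills at the day's high or low. Signal computation and fill time must be aligned: compute the signal after day T's close and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not feed signal computation or position decisions. Require each indicator's warmup_period to equal its longest lookback, and keep positions at zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow chronological order, TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (for example, a momentum strategy that "takes profit at the high") is plain lookahead, because the intraday high and low are only confirmed after the close. Fill prices may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: pandas operations such as rolling and shift easily introduce subtle one-period offset errors. Document each series' "observation time" explicitly in code, and verify key time alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests, even when the true Sharpe ratio is zero. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.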
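- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates the strategy's historical returns. The backtest universe must use point-in-time snapshots.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in sample; out-of-sample data is for final validation only. Do not keep tuning after repeatedly "peeking" at out-of-sample data (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended or missing data: do not simply forward-fill suspension-day prices with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation. Filter out missing trading days explicitly at the factor computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights adjustments applied late, or extreme prices from manual entry mistakes). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.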
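- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and a high-frequency strategy will then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must appear in the backtest performance report and be analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions, and every 10 bps change in cost can flip the profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect the capital constraint: the backtest should simulate the actual number of shares held (rounded to whole lots) under a fixed amount of capital, rather than assuming fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.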
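- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Unified timestamp time zones: mixing UTC and local time is a common source of data corruption when merging multiple data sources. Convert every timestamp to one time zone at the pipeline entry (UTC for storage and the market's local time zone for display are recommended), and never mix time zones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after that date. Re-downloading existing data and appending it creates rows with duplicate timestamps and breaks the backtest's time ordering. Verify before and after each update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the alpha/beta calculation. Use a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that cannot be bought directly (such as the HS300 index). A price index excludes reinvested dividends and therefore understates the return of holding the benchmark.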
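- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net asset value series (portfolio value), not from a cumulative return series. A running sum of log returns misstates the drawdown depth, because log returns and simple returns diverge as losses grow (a 50% loss is a log return of about -0.69).
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems, so confirm the annualization factor before any cross-system comparison; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, whereas return distributions in A-share and crypto markets are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should break returns down into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".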
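- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as evidence of stability. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's best holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and is unsuitable for monthly rebalancing, and vice versa. Using the wrong holding period seriously harms the factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has reported 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (rather than the conventional 1.96); or independent replication across periods or markets; or a clear economic rationale. A factor that meets none of these conditions is very likely a product of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have a negative net IC after transaction costs. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor half-life is a core parameter of signal strength and directly determines the best rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-term factor monthly by mistake lets much of the alpha dissipate during the holding period.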
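- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry rotation returns, and it is hard to tell whether the factor itself or the industry exposure drove the returns. The operation is: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every factor that has not been neutralized. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and the returns come from size exposure rather than from the factor itself. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (winsorize/MAD): raw factor values usually contain extreme values, which distort group analysis (such as Q1/Q10 deciles). Winsorize the raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors amounts to overweighting a single factor and dilutes signal diversity. Before combining, orthogonalize the factors with Gram-Schmidt or PCA to remove multicollinearity between them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaNs in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean introduces lookahead bias (the mean itself contains future information), while dropping them entirely produces survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.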
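- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the main metric, rather than the long-only return. The monotonicity test of a long-Q1, short-Q5 portfolio is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound markets) is important evidence of robustness. A factor whose IC holds only in one particular market regime is not suited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks whose factor rank sits near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate a lot of pointless transaction costs. Set a buffer zone at selection time: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): even a large layered return spread (Q1-Q5 spread) on historical data may be due to chance, so significance must be tested with a bootstrap or t-test (p-value < 0.05). Layered returns from short backtest periods (< 3 years) are especially unreliable.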
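- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability check: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; cross-market generality cannot be assumed. Anomalies specific to A-shares (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually decay as the market learns and arbitrages them away (McLean & Pontiff 2016 show an average decay of 58% after publication). Re-evaluate factor IC regularly (quarterly or yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the interest rate cycle, economic cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor performs best.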
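
The IC and ICIR definitions in `SHARED-FR-IC-001` can be sketched in a few lines of pandas. This is an illustrative sketch under stated assumptions: the long-format panel with `date`, `factor`, and `fwd_ret` columns is a hypothetical layout, `fwd_ret` is assumed to already hold the next-period return, and the Spearman correlation is computed as the Pearson correlation of ranks.

```python
import pandas as pd

def rank_ic(factor: pd.Series, fwd_ret: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return factor.rank().corr(fwd_ret.rank())

def ic_series(panel: pd.DataFrame) -> pd.Series:
    """One rank IC per date from a long table with date/factor/fwd_ret columns."""
    return pd.Series({
        day: rank_ic(g["factor"], g["fwd_ret"]) for day, g in panel.groupby("date")
    })

# Hypothetical toy panel: 3 dates x 4 stocks.
panel = pd.DataFrame({
    "date": ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
    "factor": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "fwd_ret": [0.01, 0.02, 0.03, 0.04,   # ranks match the factor exactly
                0.04, 0.03, 0.02, 0.01,   # ranks are fully reversed
                0.02, 0.01, 0.04, 0.03],  # ranks agree only partly
})
ic = ic_series(panel)
icir = ic.mean() / ic.std()
```

With real data the same `ic` series can be recomputed for 1/5/10/20-day forward returns to study IC decay.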

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
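
A minimal pandas sketch of next-bar execution follows. The bars and the toy signal rule (close above open) are hypothetical; the point is only the one-bar shift between the day a signal is known and the day the position is entered.

```python
import pandas as pd

# Hypothetical daily bars for one instrument.
bars = pd.DataFrame(
    {"open": [10.0, 10.2, 10.1, 10.6], "close": [10.1, 10.0, 10.5, 10.4]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Signal known only after the close of day T (toy rule: close above open).
signal = bars["close"] > bars["open"]

# The position entered at the open of day T+1 is decided by day T's signal.
position = signal.shift(1, fill_value=False)

# Return from each day's open to the next day's open, indexed by the entry day.
open_to_open = bars["open"].shift(-1) / bars["open"] - 1.0
strategy_ret = (position * open_to_open).dropna()
```

Multiplying the unshifted `signal` by a same-day close-to-close return would instead reproduce the look-ahead pattern that this lock forbids.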

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
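
A small validator can enforce this format at the boundary of a pipeline. The regular expression below is an assumption inferred from the IDs that appear in this document (`stock_sh_600000`, `stockus_nasdaq_AAPL`); `EntityId` and `parse_entity_id` are hypothetical names, not ZVT APIs.

```python
import re
from typing import NamedTuple

class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str

# Assumed pattern: lowercase type and exchange, then an alphanumeric code.
_ENTITY_ID_RE = re.compile(r"^([a-z]+)_([a-z]+)_([A-Za-z0-9]+)$")

def parse_entity_id(entity_id: str) -> EntityId:
    """Split an ID such as 'stock_sh_600000' or raise ValueError."""
    match = _ENTITY_ID_RE.match(entity_id)
    if match is None:
        raise ValueError(f"entity id {entity_id!r} is not entity_type_exchange_code")
    return EntityId(*match.groups())

parsed = parse_entity_id("stock_sh_600000")
```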

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
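
The exactly-one rule can be enforced at construction time. `SignalSizing` below is a hypothetical stand-in that carries only the three sizing fields named in the lock; it is not the real TradingSignal class.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SignalSizing:
    """Hypothetical stand-in for the sizing fields of a trading signal."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        # Collect the names of the sizing fields that were actually set.
        provided = [
            name for name in ("position_pct", "order_money", "order_amount")
            if getattr(self, name) is not None
        ]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one sizing field is required, got {provided or 'none'}"
            )

ok = SignalSizing(position_pct=0.2)
```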

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
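
The three-state semantics are easy to get wrong with a plain truthiness test, because `bool(float("nan"))` is True in Python and a missing value would be read as BUY. A minimal sketch of an explicit mapping, where `action_for` and the action labels are hypothetical:

```python
import math
from typing import Optional

def action_for(filter_result: Optional[bool]) -> str:
    """Map a filter_result cell: True=BUY, False=SELL, None/NaN=NO ACTION."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"
    return "BUY" if filter_result else "SELL"

actions = [action_for(v) for v in (True, False, None, float("nan"))]
```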

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
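
With the locked parameters, a standard MACD can be sketched with pandas exponentially weighted means. The column names `dif`, `dea`, and `hist` and the sample series are assumptions for illustration; some charting tools scale the histogram by 2, which this sketch does not do.

```python
import pandas as pd

FAST, SLOW, SIGNAL = 12, 26, 9  # values locked by SL-08

def macd(close: pd.Series) -> pd.DataFrame:
    """Return DIF, DEA and histogram columns from EMAs of the close."""
    ema_fast = close.ewm(span=FAST, adjust=False).mean()
    ema_slow = close.ewm(span=SLOW, adjust=False).mean()
    dif = ema_fast - ema_slow
    dea = dif.ewm(span=SIGNAL, adjust=False).mean()
    return pd.DataFrame({"dif": dif, "dea": dea, "hist": dif - dea})

# Hypothetical steadily rising closes.
close = pd.Series([100.0 + i for i in range(60)])
result = macd(close)
```

The earliest rows are still dominated by the EMA seed value, so they should be treated as warmup and excluded from signal generation.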

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **14**

## `KUC-101`
**Source**: `examples/FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb`

Executing automated stock trading using an ensemble of multiple DRL agents (A2C, DDPG, PPO, TD3, SAC) to reduce individual agent weakness and improve risk-adjusted returns.

## `KUC-102`
**Source**: `examples/FinRL_GPM_Demo.ipynb`

Optimizing stock portfolios using Graph Neural Networks (GPM architecture) that capture temporal and relational dependencies between stocks in the NASDAQ market.

## `KUC-103`
**Source**: `examples/FinRL_PaperTrading_Demo.ipynb`

Executing simulated real-time stock trading with Alpaca paper trading API using a custom PPO neural network architecture to test strategies without financial risk.

## `KUC-104`
**Source**: `examples/FinRL_PaperTrading_Demo_refactored.py`

Production-ready paper trading script using Alpaca API with command-line argument parsing for automated DOW 30 stock trading.

## `KUC-105`
**Source**: `examples/FinRL_PortfolioOptimizationEnv_Demo.ipynb`

Optimizing cryptocurrency or stock portfolios using the EIIE (Ensemble of Identical Independent Evaluators) architecture for Brazilian market stocks.

## `KUC-106`
**Source**: `examples/FinRL_StockTrading_2026_1_data.py`

Fetching and processing stock market data from Yahoo Finance with technical indicators for automated stock trading model development.

## `KUC-107`
**Source**: `examples/FinRL_StockTrading_2026_2_train.py`

Training deep reinforcement learning agents (A2C, DDPG, PPO, SAC, TD3) for automated stock trading using the StockTradingEnv environment.

## `KUC-108`
**Source**: `examples/FinRL_StockTrading_2026_3_Backtest.py`

Backtesting multiple trained DRL agents against baseline strategies (MVO, DJIA) to evaluate and compare ensemble trading performance.

## `KUC-109`
**Source**: `finrl/applications/Stock_NeurIPS2018/Stock_NeurIPS2018_1_Data.ipynb`

Fetching DOW 30 stock data with VIX fear index and turbulence indicators for robust market condition modeling in stock trading.

## `KUC-110`
**Source**: `finrl/applications/Stock_NeurIPS2018/Stock_NeurIPS2018_2_Train.ipynb`

Training A2C reinforcement learning agent for automated stock trading with technical indicators and trading cost considerations.

## `KUC-111`
**Source**: `finrl/applications/Stock_NeurIPS2018/Stock_NeurIPS2018_3_Backtest.ipynb`

Evaluating and comparing multiple DRL trading agents (A2C, DDPG, PPO, SAC, TD3) through backtesting against market baselines.

## `KUC-112`
**Source**: `finrl/applications/imitation_learning/Imitation_Sandbox.ipynb`

Experimental sandbox for testing imitation learning algorithms (TD3+BC) combined with market factor models for stock portfolio management.

## `KUC-113`
**Source**: `finrl/applications/imitation_learning/Stock_Selection.ipynb`

Using imitation learning techniques to learn stock selection strategies from expert behavior combined with technical indicators.

## `KUC-114`
**Source**: `finrl/applications/imitation_learning/Weight_Initialization.ipynb`

Investigating weight initialization strategies for imitation learning models to improve stock portfolio management performance.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

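## `CW-BT-001`: Cerebro as a unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and one cerebro.run() call can run multiple strategies over multiple data sources. zvt's StockTrader currently binds a single set of factors per instance and lacks a unified layer for orchestrating multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.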

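## `CW-BT-002`: Pluggable Analyzers for performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with a call like cerebro.addanalyzer(SharpeRatio).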

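## `CW-BT-003`: Sizer for separating position sizing
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what fraction to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can write their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money management rules evolve independently and be reused in combination.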

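## `CW-BT-004`: Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration, more complete order types would make backtests considerably more realistic.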

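## `CW-BT-005`: Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly while driving the strategy in sync, or replay tick by tick to simulate how an OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording bars at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests over any combination of time granularities without recording the data again.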

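## `CW-VN-003`: Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy equity curve, max drawdown, daily P&L, and trade details, with no need for a Jupyter Notebook. zvt currently visualizes backtest results by calling Plotly through the draw_result method and has no unified backtest report page; borrowing this pattern would make it possible to package a ready-to-use strategy performance dashboard.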

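## `CW-VN-004`: vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.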

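## `CW-QL-001`: Point-in-Time database (preventing future data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication lags and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and has no "publication date" dimension, so stock selection may be biased by financial report data from the future; introducing the PIT pattern would make backtests considerably more credible.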

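## `CW-QL-002`: Recorder + Experiment for experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically logs the hyperparameters, features, metrics, and predictions of every training run and supports cross-experiment comparison and model version management. zvt currently lacks ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment and support quick reproduction and version comparison.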

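## `CW-QL-003`: Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing). The two layers are optimized independently and can run together, enabling intraday best-execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently assume daily-bar fills only and do not model execution algorithms; borrowing the nested framework would let zvt separate the two questions of "which stocks to hold and when" and "how to build the position with minimal market impact".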

FILE:references/components/backtesting_-_paper_trading.md
# backtesting_&_paper_trading (5 classes)

## `DRL_prediction`
`backtesting_&_paper_trading/drl-prediction.py:0`

## `DRL_prediction_load_from_file`
`backtesting_&_paper_trading/drl-prediction-load-from-file.py:0`

## `AlpacaPaperTrading.run`
`backtesting_&_paper_trading/alpacapapertrading-run.py:0`

## `trade_mode`
`backtesting_&_paper_trading/trade-mode.py:0`

## `deterministic`
`backtesting_&_paper_trading/deterministic.py:0`

FILE:references/components/data_cleaning_-_alignment.md
# data_cleaning_&_alignment (3 classes)

## `YahooFinanceProcessor.clean_data`
`data_cleaning_&_alignment/yahoofinanceprocessor-clean-data.py:0`

## `FeatureEngineer.clean_data`
`data_cleaning_&_alignment/featureengineer-clean-data.py:0`

## `fill_method`
`data_cleaning_&_alignment/fill-method.py:0`

FILE:references/components/drl_model_training.md
# drl_model_training (5 classes)

## `DRLAgent.get_model`
`drl_model_training/drlagent-get-model.py:0`

## `DRLAgent.train_model`
`drl_model_training/drlagent-train-model.py:0`

## `DRLAgent (ElegantRL).get_model`
`drl_model_training/drlagent-elegantrl-get-model.py:0`

## `drl_lib`
`drl_model_training/drl-lib.py:0`

## `algorithm`
`drl_model_training/algorithm.py:0`

FILE:references/components/ensemble_validation.md
# ensemble_validation (4 classes)

## `DRLEnsembleAgent.run_ensemble_strategy`
`ensemble_validation/drlensembleagent-run-ensemble-strategy.py:0`

## `get_validation_sharpe`
`ensemble_validation/get-validation-sharpe.py:0`

## `rebalance_window`
`ensemble_validation/rebalance-window.py:0`

## `validation_metric`
`ensemble_validation/validation-metric.py:0`

FILE:references/components/gym_environment_creation.md
# gym_environment_creation (6 classes)

## `StockTradingEnv.__init__`
`gym_environment_creation/stocktradingenv-init.py:0`

## `StockTradingEnv.step`
`gym_environment_creation/stocktradingenv-step.py:0`

## `StockTradingEnv.reset`
`gym_environment_creation/stocktradingenv-reset.py:0`

## `PortfolioOptimizationEnv.step`
`gym_environment_creation/portfoliooptimizationenv-step.py:0`

## `reward_scaling`
`gym_environment_creation/reward-scaling.py:0`

## `action_space_type`
`gym_environment_creation/action-space-type.py:0`

FILE:references/components/market_data_acquisition.md
# market_data_acquisition (4 classes)

## `DataProcessor.download`
`market_data_acquisition/dataprocessor-download.py:0`

## `YahooFinanceProcessor.fetch_data`
`market_data_acquisition/yahoofinanceprocessor-fetch-data.py:0`

## `AlpacaProcessor.fetch_data`
`market_data_acquisition/alpacaprocessor-fetch-data.py:0`

## `data_source`
`market_data_acquisition/data-source.py:0`

FILE:references/components/normalization_-_array_conversion.md
# normalization_&_array_conversion (3 classes)

## `GroupByScaler.fit_transform`
`normalization_&_array_conversion/groupbyscaler-fit-transform.py:0`

## `df_to_array`
`normalization_&_array_conversion/df-to-array.py:0`

## `scaler`
`normalization_&_array_conversion/scaler.py:0`

FILE:references/components/performance_metrics_-_visualization.md
# performance_metrics_&_visualization (5 classes)

## `backtest_stats`
`performance_metrics_&_visualization/backtest-stats.py:0`

## `plot_return`
`performance_metrics_&_visualization/plot-return.py:0`

## `get_baseline`
`performance_metrics_&_visualization/get-baseline.py:0`

## `benchmark`
`performance_metrics_&_visualization/benchmark.py:0`

## `plot_format`
`performance_metrics_&_visualization/plot-format.py:0`

FILE:references/components/technical_indicator_computation.md
# technical_indicator_computation (4 classes)

## `FeatureEngineer.add_technical_indicator`
`technical_indicator_computation/featureengineer-add-technical-indicator.py:0`

## `calculate_turbulence`
`technical_indicator_computation/calculate-turbulence.py:0`

## `indicator_list`
`technical_indicator_computation/indicator-list.py:0`

## `turbulence_enabled`
`technical_indicator_computation/turbulence-enabled.py:0`


Finrl Meta Envs

Skill

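Provides multi-market financial reinforcement learning environments, with support for backtesting DRL algorithms such as PPO and DQN, Markowitz portfolio optimization, and real-time paper trading, and with adapters for broker APIs such as Alpaca.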

---
name: finrl-meta-envs
description: |-
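  Provides multi-market financial reinforcement learning environments, with support for backtesting DRL algorithms such as PPO and DQN, Markowitz portfolio optimization, and real-time paper trading, and with adapters for broker APIs such as Alpaca.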
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-116"
  compiled_at: "2026-04-22T13:00:56.369548+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
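# FinRL Reinforcement Learning Environments (finrl-meta-envs)

> Provides multi-market financial reinforcement learning environments, with support for backtesting DRL algorithms such as PPO and DQN, Markowitz portfolio optimization, and real-time paper trading, and with adapters for broker APIs such as Alpaca.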

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (9 total)

### Automated Paper Trading with PPO Agent (`UC-101`)
Execute simulated paper trades in real-time using a trained PPO reinforcement learning agent connected to Alpaca brokerage API, enabling risk-free str
**Triggers**: paper trading, PPO agent, Alpaca

### Alpaca Paper Trading Demo with PPO (`UC-104`)
Demonstrate live paper trading execution using a PPO neural network agent connected to Alpaca's paper trading API, enabling real-time trade simulation
**Triggers**: paper trading, Alpaca demo, PPO

### Markowitz Mean-Variance Portfolio Optimization (`UC-102`)
Optimize portfolio allocation across multiple assets using Markowitz mean-variance optimization to maximize risk-adjusted returns, balancing expected
**Triggers**: portfolio optimization, Markowitz, mean-variance

For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

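- **`AP-ZVT-183`**: An inf/NaN ex-rights factor goes straight into multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API hits its quota, leaving data silently missing
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table makes factor values drift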

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-116. Evidence verify ratio = 23.2% and audit fail total = 8. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
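| [references/seed.yaml](references/seed.yaml) | Full V6+ authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed across projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |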

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-116` blueprint at 2026-04-22T13:00:56.369548+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Ensemble Stock Trading with DRL Agents
  - Markowitz Mean-Variance Portfolio Optimization
  - Automated Paper Trading with PPO Agent
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

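### `AP-QLIB-1930`: Backtest results independent of the model because a shared dataset object lets the first model's predictions persist <sub>(high)</sub>

When several models in Qlib reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model effectively uses the same prediction signal. The symptom: the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.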

Source: https://github.com/microsoft/qlib/issues/1930

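### `AP-QLIB-2090`: Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.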

Source: https://github.com/microsoft/qlib/issues/2090
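
A guard for this coupling can be sketched in a few lines. The example below assumes the configuration is available as two plain dictionaries with ISO date strings; `check_fit_window`, the dictionary shapes, and the dates are hypothetical and written for illustration only, not taken from Qlib.

```python
from datetime import date


def check_fit_window(handler_kwargs: dict, segments: dict) -> None:
    """Raise if the normalizer's fit window extends past the training segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )


# Illustrative config fragments; the dates are made up for the example.
handler_kwargs = {"fit_start_time": "2010-01-01", "fit_end_time": "2016-12-31"}
segments = {
    "train": ("2010-01-01", "2016-12-31"),
    "valid": ("2017-01-01", "2018-12-31"),
    "test": ("2019-01-01", "2020-12-31"),
}
check_fit_window(handler_kwargs, segments)  # passes: fit window ends with train
```

Running such a check before any dataset is built turns a silent leak into an immediate, readable error.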

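### `AP-QLIB-2036`: MACD factor formula error in the docs, where DEA is divided by CLOSE one extra time and the units become inconsistent <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs significantly from the correct one after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied straight into production factor libraries by many users.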

Source: https://github.com/microsoft/qlib/issues/2036

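### `AP-QLIB-2184`: Custom A-share data imported without NaN-filling suspended days as the convention requires, causing downstream factor noise <sub>(high)</sub>

Qlib's convention is that on suspended days the open/close/high/low/volume/factor fields are all NaN, so the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspended days (common in data exported directly from Eastmoney/Wind), price-momentum factors show "false signals" during the suspension (price unchanged but factor non-zero). Qlib does not validate this convention, so the error flows silently into the training data.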

Source: https://github.com/microsoft/qlib/issues/2184

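### `AP-QLIB-1892`: PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to fetch the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list instead of the full 5000+ stocks, and the function raises ValueError outright when the list is incomplete. Users who follow the documented steps end up with a financial dataset covering only some stocks, and backtests on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).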

Source: https://github.com/microsoft/qlib/issues/1892

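### `AP-QLIB-2097`: Full-market instrument="all" runs out of memory on a 32 GB machine while CSI300 runs fine <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. On a 32 GB machine a full-market run hits OOM during init_instance_by_config, and the error message does not point to memory. Users easily mistake it for a configuration error; the actual fix is to load in batches or use streaming feature computation.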

Source: https://github.com/microsoft/qlib/issues/2097

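### `AP-QLIB-1984`: LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.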

Source: https://github.com/microsoft/qlib/issues/1984

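### `AP-QLIB-1915`: After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If custom A-share CSV data is passed through dump_bin with a field order or symbol format (e.g. 600000.SH vs SH600000) that does not match Qlib's conventions, the DataHandler processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.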

Source: https://github.com/microsoft/qlib/issues/1915

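### `AP-QLIB-1949`: Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is that D.features calls raise multi-level nested exceptions, and users cannot tell from the stack trace whether the problem is the parallel backend or the data.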

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

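### `AP-VNPY-3691`: Bar generator's first bar has a misaligned timestamp, causing a wrong signal in the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters in on_bar directly with datetime.minute % 5, this first bar happens to pass, but it holds less than one full period of data and produces a wrong entry signal when used in signal calculation.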

Source: https://github.com/vnpy/vnpy/issues/3691

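### `AP-VNPY-3669`: Alpha module incremental history save raises SchemaError because new and old DataFrame schemas are incompatible <sub>(medium)</sub>

When saving bar data to Parquet, vnpy's Alpha module concatenates newly downloaded data (which may contain Float64 columns) with the stored file (historical Int64 columns) directly via polars.concat. Polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (e.g. volume is an integer in some feeds and a float in others), and there is no schema-alignment step before concat. This affects every historical-data build flow that uses vnpy alpha for backtesting.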

Source: https://github.com/vnpy/vnpy/issues/3669

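### `AP-VNPY-3685`: Spread trading module's run_backtesting() fails silently under Jupyter, making results untrustworthy <sub>(high)</sub>

In vnpy 4.10, run_backtesting() of the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not execute, yet no exception is raised, and normal-looking backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.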

Source: https://github.com/vnpy/vnpy/issues/3685

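### `AP-VNPY-3700`: Install script does not use a venv, so the global numpy gets downgraded and breaks other dependencies <sub>(medium)</sub>

vnpy's install.bat installs directly into the system/conda base environment and forcibly downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so dependency boundaries are opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.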

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

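### `AP-ZIPLINE-138`: Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted prices (raw price) rather than prices adjusted for splits/dividends. The historical prices in the chart differ from actual market prices by roughly 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates create huge "signals" in unadjusted data and lead technical indicators to produce false breakout signals on the ex-rights date.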

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

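### `AP-ZIPLINE-235`: Fills at the current bar's close by default, understating live slippage and inflating backtest returns <sub>(high)</sub>

Zipline's default slippage model, after a signal fires on a bar, fills at that same bar's close (current bar close fill). In live trading a signal can only be filled near the next bar's open (T+1 order execution). For A-share daily bars, backtesting with the close overstates daily return by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.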

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
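
The difference between the two fill rules can be made concrete with a few made-up bars. This is a plain-Python sketch rather than Zipline's slippage API; `fill_prices`, the bar values, and the signals are all hypothetical.

```python
# Made-up daily bars as (open, close). Values are illustrative only.
bars = [(10.0, 10.2), (10.5, 10.4), (10.3, 10.8), (10.9, 11.0)]
# signals[i] is computed from bar i's close, so it is only known after bar i ends.
signals = [True, False, True, False]


def fill_prices(bars, signals, next_bar: bool):
    """Return (bar_index, price) for each buy signal under the chosen fill rule."""
    fills = []
    for i, buy in enumerate(signals):
        if not buy:
            continue
        if next_bar:
            if i + 1 >= len(bars):
                continue  # no following bar to trade on
            fills.append((i + 1, bars[i + 1][0]))  # next bar's open
        else:
            fills.append((i, bars[i][1]))  # same bar's close (look-ahead)
    return fills


same_bar = fill_prices(bars, signals, next_bar=False)
next_bar = fill_prices(bars, signals, next_bar=True)
```

Comparing `same_bar` with `next_bar` on real data gives a direct estimate of how much a strategy's return depends on the optimistic fill assumption.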

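### `AP-ZIPLINE-190`: Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (e.g. New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last session before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.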

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

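### `AP-ZIPLINE-181`: Stale asset db makes Pipeline report "no assets traded", misleading users into checking the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only the price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted/newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.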

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

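### `AP-ZIPLINE-285`: week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's start-of-week/end-of-week logic, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation of the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long consecutive holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot detect the missed schedule from the logs.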

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

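### `AP-ZIPLINE-240`: Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires all timestamps to be UTC-aware datetimes. When a user passes a naive datetime (no time zone information, e.g. pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it triggers AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data from local CST time hit this trap very easily and need an explicit tz_localize('UTC') at bundle registration.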

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
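
One way to avoid the deep assertion is to normalize timestamps at the boundary, before they reach the backtester. The sketch below uses only the standard library instead of pandas; `to_utc` and the fixed UTC+8 offset standing in for CST are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset used as a stand-in for China Standard Time (an assumption
# for this sketch; a real pipeline would more likely use a named time zone).
CST = timezone(timedelta(hours=8))


def to_utc(ts: datetime, local_tz: timezone = CST) -> datetime:
    """Attach local_tz to naive timestamps, then convert everything to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=local_tz)
    return ts.astimezone(timezone.utc)


naive_close = datetime(2020, 1, 2, 15, 0)  # 15:00 local, no tzinfo
utc_close = to_utc(naive_close)
```

Interpreting the naive value in its local zone first and converting afterwards preserves the actual instant, whereas stamping a local wall-clock time as UTC would shift it by eight hours.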

## zvt (6)

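### `AP-ZVT-183`: Ex-rights factor of inf/NaN used directly in multiplication makes price adjustment fail silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT computes qfq_factor as the new/old price ratio. When old==0 (first day of a new listing, or missing data) the factor is inf; when kdata.open itself is None (suspended day not filled) the multiplication raises TypeError. As a result, the adjustment calculation for the whole entity is interrupted and all subsequent K-lines are lost, but the main flow only logs ERROR without stopping, so users often do not realize that data for many stocks is already corrupted.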

Source: https://github.com/zvtvz/zvt/issues/183
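
A defensive version of the factor computation might look like the following. This is a minimal sketch using scalar values; `safe_adjust_factor` and `adjust_price` are hypothetical helpers written for this example, not ZVT functions.

```python
import math
from typing import Optional


def safe_adjust_factor(new_price: Optional[float], old_price: Optional[float]) -> Optional[float]:
    """Return new/old as an adjustment factor, or None when it is not usable."""
    if new_price is None or old_price is None:
        return None  # e.g. suspended day with unfilled prices
    if old_price == 0:
        return None  # plain Python would raise ZeroDivisionError; numpy would give inf
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def adjust_price(price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Apply the factor only when both inputs are usable; never raise on gaps."""
    if price is None or factor is None:
        return None
    return price * factor
```

Returning `None` makes the gap explicit, so the caller can count and report unusable rows instead of losing every later bar of the entity.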

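### `AP-ZVT-179`: Third-party data API rate-limit exceptions are swallowed, leaving data silently missing <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch K-lines for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and continues with the next entity, so after the limit is reached that day's data for all remaining stocks is silently missing. A backtest on this incomplete database produces systematically biased factor results, with no warning.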

Source: https://github.com/zvtvz/zvt/issues/179

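### `AP-ZVT-161`: Full-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999 (the default in SQLite builds before 3.32.0). Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on the number of variables. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.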

Source: https://github.com/zvtvz/zvt/issues/161
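
Batching the IN clause can be sketched with the standard `sqlite3` module. The `kdata` table, its columns, and the helper names below are hypothetical; the batch size of 500 is an arbitrary value chosen to stay under the 999-variable limit.

```python
import sqlite3
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_by_entity_ids(conn: sqlite3.Connection, entity_ids: Sequence[str],
                        batch_size: int = 500) -> list:
    """Run one SELECT per batch so no statement exceeds the variable limit."""
    rows = []
    for batch in chunked(entity_ids, batch_size):
        placeholders = ",".join("?" for _ in batch)
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(batch)).fetchall())
    return rows


# Demo on an in-memory table; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(1200)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(eid, 10.0) for eid in ids])
rows = query_by_entity_ids(conn, ids)
```

Only the placeholders are interpolated into the SQL string; the IDs themselves are still passed as bound parameters.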

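### `AP-ZVT-129`: Wildcard import hides API version changes, and enums such as AdjustType mysteriously disappear <sub>(medium)</sub>

ZVT's documentation examples import all symbols with `from zvt import *`. After a ZVT upgrade refactors an enum (e.g. moving AdjustType into a submodule), the wildcard import no longer includes that symbol and an AttributeError is raised. Users assume an installation problem, when in fact it is a breaking API change between versions that is not flagged in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.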

Source: https://github.com/zvtvz/zvt/issues/129

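### `AP-ZVT-187`: Backtest engine does not stop early on an empty data-layer result, causing a cascade of null-reference crashes <sub>(medium)</sub>

After load_data completes and the data turns out to be empty, ZVT's Trader does not exit early; it passes the empty DataFrame into the selector computation, triggering a chain of crashes from subsequent NoneType operations. The stack trace is deep and the root cause hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.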

Source: https://github.com/zvtvz/zvt/issues/187

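### `AP-ZVT-183B`: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two of them while computing price-momentum or moving-average factors (e.g. unadjusted for the moving average, backward-adjusted for returns), factor values jump around ex-rights dates. ZVT performs no cross-table consistency check on adjustment type, so the mix passes silently.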

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-116--FinRL-Meta
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 29, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [data_collection](components/data_collection.md): 5 classes
- [feature_engineering](components/feature_engineering.md): 4 classes
- [environment_simulation](components/environment_simulation.md): 5 classes
- [agent_training](components/agent_training.md): 6 classes
- [order_execution_&_execution_optimization](components/order_execution_-_execution_optimization.md): 6 classes
- [paper_trading](components/paper_trading.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 176
  fatal_constraints_count: 40
  non_fatal_constraints_count: 218
  use_cases_count: 9
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

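- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at that same day's close; (2) stamping an indicator computed after day T's close onto the same bar; (3) assuming fills at the day's high/low. Signal computation and fill time must be aligned: compute the signal after day T's close, fill at day T+1's open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. Require the indicator's warmup_period to equal the longest lookback, and keep positions at zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order, TRAIN < VALID < TEST; do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at each day's high or buy at its low (e.g. a momentum strategy "taking profit at the high") is obvious look-ahead, because the intraday high/low is only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Document each series' "observation time" explicitly in code and verify key time alignments with assert.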
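- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected best Sharpe ratio found by chance grows with the number of backtests, so a peak Sharpe must be deflated accordingly. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.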
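- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample; out-of-sample data is for final validation only. Do not keep tuning after repeatedly "looking at" out-of-sample data (that turns it into new in-sample data and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended/missing data: suspended-day prices must not simply be forward-filled with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation. Filter missing trading days explicitly at the factor-computation layer; do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (e.g. ex-rights not adjusted in time, extreme prices from manual entry errors). Fed into factor computation uncleaned, they produce extreme signals and contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.
- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics from cost-free backtests are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: without slippage, a backtest assumes every order fills at the ideal price, and high-frequency strategies suffer severe losses live as fill prices degrade. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to cost assumptions; every 10 bps change in cost can flip the profit/loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate actual share counts (rounded) under a fixed amount of capital, not assume fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight; simulate this rounding effect in the backtest.
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Unified timestamp time zone: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to one time zone at the pipeline entry (UTC for storage and market-local time for display is recommended); do not mix time zones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge on an explicit trading calendar. Do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental-update boundary check: for incremental updates of historical data, query the database for the latest stored date and download only data after it. Re-downloading and appending existing data creates duplicate-timestamp rows and corrupts the backtest's time order. Verify there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Use a passive benchmark the strategy could actually invest in (e.g. an HS300 ETF), not a price index that cannot be invested in directly (e.g. the HS300 index). Price indices exclude reinvested dividends and understate the benchmark's holding return.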
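- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Differences of summed log returns are not percentage drawdowns (a cumulative log return of about -0.69 corresponds to a 50% decline), so using them directly misstates drawdown depth.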
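- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Systems default differently; confirm the annualization factor before comparing across systems, otherwise the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are preferable to Sharpe Ratio as risk-adjusted return metrics: Sharpe assumes normally distributed returns, while A-share/crypto return distributions are markedly left-skewed (fat-tailed), so it understates downside risk. Report Sortino (downside volatility only) and Calmar (annualized return / max drawdown) together; do not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should be decomposed into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".
- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 is preliminary evidence of predictive power; ICIR > 0.5 is considered stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor with high 1-day IC that decays quickly by 20 days is a short-term factor, unsuitable for monthly rebalancing, and vice versa. Using the wrong holding period seriously hurts live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (not the traditional 1.96); or independent replication across periods/markets; or clear economic logic. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, net IC after trading costs may be negative. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is the core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate during the holding period.
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry rotation returns and it becomes hard to tell whether the factor itself or industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor correlates strongly with market cap, stock selection tilts systematically toward small caps and the return comes from size exposure rather than the factor. Neutralize on industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes that distort quantile analysis (e.g. Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking/neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors before combining, using Gram-Schmidt or PCA, to remove multicollinearity.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: NaNs in factor computation (suspension/new listing/data gap) filled with the cross-sectional mean introduce look-ahead bias (the mean itself contains future information); dropping them entirely produces survivorship bias. The correct approach is to fill with the cross-sectional median (the median over all stocks on that day, with no dependence on the future) or to exclude that stock for that day.
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom spread) of Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the main metric, rather than long-only return. The "monotonicity" test of long Q1 / short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> valid on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull/bear/range-bound) is important evidence of robustness. A factor whose IC holds only in one specific market state is not suited for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate a lot of wasted trading cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile spread (Q1-Q5 spread), however large on historical data, may be due to chance and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures/crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor effectiveness over time: once-effective factors gradually fail through market learning and arbitrage (McLean & Pontiff 2016 show an average post-publication decay of 58%). Re-evaluate factor IC regularly (quarterly/annually), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction with the macro environment: rate cycles, business cycles, and market sentiment significantly affect factor effectiveness. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's best conditions.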
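
Two of the performance constraints above (`SHARED-BT-PERF-001` and `SHARED-BT-PERF-002`) reduce to short formulas. The sketch below is a standard-library illustration under stated assumptions: the function names and the net-value series are made up, and the risk-free rate is taken as zero.

```python
import math
import statistics
from typing import Sequence


def max_drawdown(net_values: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a net-value series, as a positive fraction."""
    peak = net_values[0]
    worst = 0.0
    for value in net_values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def annualized_sharpe(period_returns: Sequence[float], periods_per_year: int) -> float:
    """Mean/stdev of per-period returns scaled by sqrt(periods_per_year); risk-free rate assumed 0."""
    mean = statistics.fmean(period_returns)
    stdev = statistics.stdev(period_returns)
    return mean / stdev * math.sqrt(periods_per_year)


net_values = [1.00, 1.10, 0.88, 0.99, 1.21]  # made-up portfolio values
returns = [b / a - 1 for a, b in zip(net_values, net_values[1:])]
mdd = max_drawdown(net_values)
sharpe_equity = annualized_sharpe(returns, 252)  # equity convention
sharpe_crypto = annualized_sharpe(returns, 365)  # crypto convention
```

The same return series yields two different annualized Sharpe values depending only on the factor, which is why the factor has to be stated whenever results from different systems are compared.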

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
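
As one example of turning a lock into a check, `SL-03` can be validated with a small parser. `EntityId` and `parse_entity_id` are hypothetical names for this sketch; treating everything after the second underscore as the code is a design assumption, not something the lock specifies.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its three parts or raise ValueError."""
    # maxsplit=2 keeps any further underscores inside the code part.
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"SL-03: malformed entity id {entity_id!r}")
    return EntityId(*parts)
```

For instance, `parse_entity_id("stock_sh_600000")` yields the three parts `stock`, `sh`, and `600000`, while a vendor-style symbol such as `600000.SH` is rejected.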

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **9**

## `KUC-101`
**Source**: `Paper Trading/Automated_Paper_Trading.ipynb`

Execute simulated paper trades in real-time using a trained PPO reinforcement learning agent connected to Alpaca brokerage API, enabling risk-free strategy validation before live deployment.

## `KUC-102`
**Source**: `examples/Aarons_portfolio_optimization_example.ipynb`

Optimize portfolio allocation across multiple assets using Markowitz mean-variance optimization to maximize risk-adjusted returns, balancing expected return against portfolio volatility.

## `KUC-103`
**Source**: `examples/FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb`

Train and evaluate an ensemble of deep reinforcement learning agents for stock trading, combining multiple model predictions to improve robustness and performance across varying market conditions.

## `KUC-104`
**Source**: `examples/FinRL_PaperTrading_Demo.ipynb`

Demonstrate live paper trading execution using a PPO neural network agent connected to Alpaca's paper trading API, enabling real-time trade simulation with market data feeds.

## `KUC-105`
**Source**: `examples/FinRL_PortfolioOptimizationEnv_Demo.ipynb`

Train deep reinforcement learning agents for portfolio allocation across Brazilian stocks (B3 exchange), using custom portfolio environment with deep neural network policies to optimize multi-asset holdings.

## `KUC-106`
**Source**: `examples/Stock_NeurIPS2018_SB3.ipynb`

Implement stock trading strategies using StableBaselines3 library's DRL implementations (A2C, PPO, SAC) with feature engineering on technical indicators for training and evaluation on historical market data.

## `KUC-107`
**Source**: `examples/run_markowitz_portfolio_optimization.py`

Execute Markowitz mean-variance portfolio optimization algorithm as a standalone Python script, computing optimal asset weights based on historical returns covariance and expected returns to minimize portfolio variance.

## `KUC-108`
**Source**: `examples/run_rl_portfolio_optimization.py`

Train and run reinforcement learning agent (A2C) for portfolio optimization using StockPortfolioEnv, enabling adaptive portfolio allocation that learns from market interactions rather than static optimization.

## `KUC-109`
**Source**: `meta/env_execution_optimizing/order_execution_qlib/workflow_by_code.ipynb`

Optimize order execution using Qlib's LightGBM model to predict stock movements and implement TopkDropoutStrategy, improving trade execution quality by timing orders based on predicted signals.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

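## `CW-BT-001`: Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as the single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies and data sources. zvt's StockTrader currently binds only one set of factors per instance and lacks a unified layer for orchestrating multiple strategies; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for comparison.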

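## `CW-BT-002`: Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changes to strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with cerebro.addanalyzer(SharpeRatio).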

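## `CW-BT-003`: Sizer, separating position management
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position management (how many shares or what proportion to buy per entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.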

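## `CW-BT-004`: Full set of Order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack limit-order and combined-order simulation; for high-frequency or live-trading integration, fuller order types would greatly improve backtest realism.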

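## `CW-BT-005`: Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (e.g. 1 min) into a higher timeframe (e.g. 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how the OHLC forms, enabling fine-grained intraday backtests. zvt currently implements multiple timeframes by pre-recording K-lines at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests on any combination of time granularities without recording the data again.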

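## `CW-VN-003`: Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy's net-value curve, max drawdown, daily P&L, and trade details, with no need for Jupyter Notebook. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.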

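## `CW-VN-004`: vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0's vnpy.alpha.lab provides an integrated workflow of data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso/LightGBM/MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.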

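## `CW-QL-001`: Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so there is a potential bias from selecting stocks with future financial reports; introducing a PIT model would greatly improve backtest credibility.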

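## `CW-QL-002`: Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically record the hyperparameters, features, metrics, and predictions of each training run, and support cross-experiment comparison and model version management. zvt currently lacks ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of each factor experiment, supporting quick reproduction and version comparison.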

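## `CW-QL-003`: Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and run together, enabling intraday optimal-execution algorithms (such as TWAP and VWAP rebalancing). zvt backtests currently have only daily-bar fill assumptions and no execution-algorithm modeling; borrowing the nested framework would let zvt separate the two questions of "which stocks to hold and when" and "how to build the position with minimal market impact".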

FILE:references/components/agent_training.md
# agent_training (6 classes)

## `DRLAgent.get_model`
`agent_training/drlagent-get-model.py:0`

## `DRLAgent.train_model`
`agent_training/drlagent-train-model.py:0`

## `DRLAgent.DRL_prediction`
`agent_training/drlagent-drl-prediction.py:0`

## `DRLEnsembleAgent.run_ensemble_strategy`
`agent_training/drlensembleagent-run-ensemble-strategy.py:0`

## `algorithm`
`agent_training/algorithm.py:0`

## `framework`
`agent_training/framework.py:0`

FILE:references/components/data_collection.md
# data_collection (5 classes)

## `DataProcessor.fetch_data`
`data_collection/dataprocessor-fetch-data.py:0`

## `_Base.download_data`
`data_collection/base-download-data.py:0`

## `_Base.clean_data`
`data_collection/base-clean-data.py:0`

## `_Base.calc_time_zone`
`data_collection/base-calc-time-zone.py:0`

## `data_source`
`data_collection/data-source.py:0`

FILE:references/components/environment_simulation.md
# environment_simulation (5 classes)

## `StockTradingEnv.reset`
`environment_simulation/stocktradingenv-reset.py:0`

## `StockTradingEnv.step`
`environment_simulation/stocktradingenv-step.py:0`

## `StockTradingEnv.get_state`
`environment_simulation/stocktradingenv-get-state.py:0`

## `reward_function`
`environment_simulation/reward-function.py:0`

## `cost_model`
`environment_simulation/cost-model.py:0`

FILE:references/components/feature_engineering.md
# feature_engineering (4 classes)

## `_Base.add_technical_indicator`
`feature_engineering/base-add-technical-indicator.py:0`

## `_Base.add_turbulence`
`feature_engineering/base-add-turbulence.py:0`

## `_Base.df_to_array`
`feature_engineering/base-df-to-array.py:0`

## `indicator_library`
`feature_engineering/indicator-library.py:0`

FILE:references/components/order_execution_-_execution_optimization.md
# order_execution_&_execution_optimization (6 classes)

## `TWAP.execute`
`order_execution_&_execution_optimization/twap-execute.py:0`

## `VWAP.execute`
`order_execution_&_execution_optimization/vwap-execute.py:0`

## `AC.compute_AC_utility`
`order_execution_&_execution_optimization/ac-compute-ac-utility.py:0`

## `MarketEnvironment.start_transactions`
`order_execution_&_execution_optimization/marketenvironment-start-transactions.py:0`

## `execution_policy`
`order_execution_&_execution_optimization/execution-policy.py:0`

## `reward_type`
`order_execution_&_execution_optimization/reward-type.py:0`

FILE:references/components/paper_trading.md
# paper_trading (3 classes)

## `AlpacaPaperTrading.run`
`paper_trading/alpacapapertrading-run.py:0`

## `AlpacaPaperTradingMultiCrypto.test_latency`
`paper_trading/alpacapapertradingmulticrypto-test-laten.py:0`

## `broker`
`paper_trading/broker.py:0`

Financial Ratios Toolkit

Skill

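Provides multi-market financial analysis: historical data retrieval, financial statement parsing, financial ratio calculation, fixed income analysis, portfolio performance evaluation, and fundamental stock screening.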

---
name: financial-ratios-toolkit
description: |-
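  Provides multi-market financial analysis: historical data retrieval, financial statement parsing, financial ratio calculation, fixed income analysis, portfolio performance evaluation, and fundamental stock screening.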
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-118"
  compiled_at: "2026-04-22T13:00:57.393924+00:00"
  capability_markets: "multi-market"
  capability_activities: "portfolio-analytics"
  sop_version: "crystal-compilation-v6.1"
---
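# Financial Ratios Toolkit (financial-ratios-toolkit)

> Provides multi-market financial analysis: historical data retrieval, financial statement parsing, financial ratio calculation, fixed income analysis, portfolio performance evaluation, and fundamental stock screening.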

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (13 total)

### Multi-Module Financial Analysis Overview (`UC-101`)
Demonstrating comprehensive financial analysis capabilities covering multiple domains including historical data, financial statements, ratios, models,
**Triggers**: financial analysis, overview, multi-module

### Fixed Income Analysis and Bond Valuation (`UC-103`)
Analyzing fixed income securities including bond statistics, duration calculations, derivative pricing models, and government/corporate bond yield com
**Triggers**: bond, fixed income, yield

### Financial Ratio Analysis (`UC-106`)
Evaluating company financial health through profitability ratios, solvency ratios, liquidity ratios, valuation ratios, and custom ratio calculations f
**Triggers**: ratio, profitability, solvency

For all **13** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
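
As an illustration of how two of these locks (`SL-03` and `SL-05`) might be checked before any order is generated, here is a minimal sketch. The names `ENTITY_ID_RE`, `is_valid_entity_id`, and `has_exactly_one_sizing_field` are hypothetical helpers, not part of ZVT or any other library, and the regex assumes that the type and exchange segments contain no underscores, as in `stock_sh_600000`.

```python
import re

# Hypothetical pattern for SL-03: entity_type_exchange_code, e.g. stock_sh_600000.
# Assumption: type and exchange are lowercase and contain no underscores.
ENTITY_ID_RE = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9.]+$")

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def is_valid_entity_id(entity_id: str) -> bool:
    """SL-03: the ID must have the three-part entity_type_exchange_code shape."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def has_exactly_one_sizing_field(signal: dict) -> bool:
    """SL-05: exactly one sizing field may be set (i.e. not None)."""
    return sum(signal.get(name) is not None for name in SIZING_FIELDS) == 1
```

A caller could run both checks on every signal and halt on the first failure, which matches the `halt` behavior listed in the table.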

## Top Anti-Patterns (14 total)

- **`AP-PORTFOLIO-ANALYTICS-001`**: Division by zero in price ratio calculations corrupts rebalancing
- **`AP-PORTFOLIO-ANALYTICS-002`**: Look-ahead bias from unshifted signal generation and position calculations
- **`AP-PORTFOLIO-ANALYTICS-003`**: Non-positive-semidefinite covariance matrix breaks CVXPY optimization

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-118. Evidence verify ratio = 33.3% and audit fail total = 62. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
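| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source of truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |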

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-118` blueprint at 2026-04-22T13:00:57.393924+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Fixed Income Analysis and Bond Valuation
  - Basic Historical Data and Financial Statements Retrieval
  - Multi-Module Financial Analysis Overview
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-066--wealthbot (2)

### `AP-PORTFOLIO-ANALYTICS-001` — Division by zero in price ratio calculations corrupts rebalancing <sub>(high)</sub>

When calculating price_diff using current_price divided by old_price without validating old_price is non-zero, the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals zero produces NaN/infinity that propagates to all subsequent trade decisions.
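
A minimal sketch of the guard this anti-pattern calls for, in plain Python. The helper `price_ratio` is a hypothetical name used for illustration; it is not wealthbot's `getPricesDiff()`. It returns `None` when the ratio is undefined, so the caller can drop the security instead of propagating NaN or infinity.

```python
import math
from typing import Optional


def price_ratio(current_price: float, old_price: float) -> Optional[float]:
    """Return current/old, or None when the ratio is undefined."""
    if old_price == 0 or not math.isfinite(old_price) or not math.isfinite(current_price):
        return None
    return current_price / old_price


# Example data (made up): symbol -> (current_price, old_price)
prices = {"AAA": (11.0, 10.0), "BBB": (5.0, 0.0)}
ratios = {sym: price_ratio(cur, old) for sym, (cur, old) in prices.items()}

# Only securities with a defined ratio take part in rebalancing decisions.
tradable = {sym: r for sym, r in ratios.items() if r is not None}
```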

### `AP-PORTFOLIO-ANALYTICS-004` — Incorrect portfolio value tracking destroys time-series integrity <sub>(high)</sub>

Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent ClientPortfolio via proper relationships to avoid orphaned records.

## finance-bp-068--xalpha (1)

### `AP-PORTFOLIO-ANALYTICS-006` — FIFO sell order violation corrupts cost basis and XIRR <sub>(high)</sub>

Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment, leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied, causing direct financial loss.
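
To illustrate chronological lot matching, the following sketch sorts lots by purchase date before consuming them. `fifo_sell` and its tuple layout are assumptions made for this example; this is not xalpha's implementation, and fee tiers are left out.

```python
from collections import deque
from datetime import date


def fifo_sell(lots, shares_to_sell):
    """Consume lots oldest-first.

    `lots` is an iterable of (buy_date, shares, unit_cost) tuples in any order.
    Returns (matched, remaining_lots), where `matched` lists the slices sold.
    """
    queue = deque(sorted(lots, key=lambda lot: lot[0]))  # enforce chronological order
    matched = []
    remaining = shares_to_sell
    while remaining > 0:
        if not queue:
            raise ValueError("cannot sell more shares than held")
        buy_date, shares, unit_cost = queue.popleft()
        take = min(shares, remaining)
        matched.append((buy_date, take, unit_cost))
        if shares > take:
            # Put the unsold part of the lot back at the front of the queue.
            queue.appendleft((buy_date, shares - take, unit_cost))
        remaining -= take
    return matched, list(queue)
```

Because each matched slice keeps its own `buy_date`, a holding-period-based redemption fee can be computed per slice afterwards.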

## finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)

### `AP-PORTFOLIO-ANALYTICS-010` — Missing DataFrame schema validation causes KeyError propagation <sub>(medium)</sub>

Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError, or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue, comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing columns cause backtest calculations to fail with NaN values or KeyError.

## finance-bp-082--stock-screener (1)

### `AP-PORTFOLIO-ANALYTICS-007` — Score validation bypass allows invalid composite calculations <sub>(medium)</sub>

Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable screening results that violate the fundamental score contract. When combined with division-by-zero guards that return 0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations and incorrect Strong Buy/Buy/Watch/Pass ratings.
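
A sketch of how the score contract could be enforced at the point where composites are computed. `composite_score` is a hypothetical helper, not the stock-screener project's code; it rejects out-of-range scores and returns 0.0 only for the empty or zero-weight case described above.

```python
def composite_score(scored):
    """Weighted mean of (score, weight) pairs; every score must lie in [0, 100]."""
    for score, weight in scored:
        if not 0 <= score <= 100:  # also rejects NaN, since comparisons with NaN are False
            raise ValueError(f"score {score!r} is outside [0, 100]")
        if weight < 0:
            raise ValueError(f"weight {weight!r} must be non-negative")
    total_weight = sum(weight for _, weight in scored)
    if total_weight == 0:
        return 0.0  # explicit result for empty or zero-weight input
    return sum(score * weight for score, weight in scored) / total_weight
```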

## finance-bp-093--PyPortfolioOpt (1)

### `AP-PORTFOLIO-ANALYTICS-008` — Convex optimization constraints violate DCP rules <sub>(high)</sub>

Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats (not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum or target_return above maximum achievable return make problems infeasible.

## finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)

### `AP-PORTFOLIO-ANALYTICS-003` — Non-positive-semidefinite covariance matrix breaks CVXPY optimization <sub>(high)</sub>

Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require explicit PSD validation before optimization.

## finance-bp-106--pyfolio-reloaded (2)

### `AP-PORTFOLIO-ANALYTICS-005` — Allocation denominator excludes cash, corrupting portfolio composition <sub>(medium)</sub>

When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to inappropriate risk management decisions.
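
As a small illustration, the sketch below keeps cash in the denominator so that the reported weights sum to 1. `allocation_pct` is a hypothetical helper, not pyfolio-reloaded's API, and it assumes no position is itself named `"cash"`.

```python
def allocation_pct(positions: dict, cash: float) -> dict:
    """Return each holding's share of total portfolio value, cash included."""
    total = sum(positions.values()) + cash
    if total == 0:
        raise ValueError("portfolio value is zero; allocations are undefined")
    alloc = {name: value / total for name, value in positions.items()}
    alloc["cash"] = cash / total  # reported separately, but part of the same total
    return alloc
```

Concentration metrics can then be computed on the non-cash entries while still using the full-portfolio denominator.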

### `AP-PORTFOLIO-ANALYTICS-009` — Transaction data corruption from missing columns and invalid dates <sub>(medium)</sub>

Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol) causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day trades to be incorrectly split across days.

## finance-bp-107--empyrical-reloaded (1)

### `AP-PORTFOLIO-ANALYTICS-011` — Wrong annualization factors distort cross-frequency metric comparison <sub>(high)</sub>

Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies) produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated capital. The Sharpe ratio formula must use the correct annualization factor with sample standard deviation (ddof=1); otherwise it produces misleading risk-adjusted return estimates.

## finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit (1)

### `AP-PORTFOLIO-ANALYTICS-012` — Misaligned time series in alpha/beta calculation produces invalid factor analysis <sub>(high)</sub>

Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels (pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values of approximately 1.0.
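
The alignment step can be sketched in plain Python by intersecting date labels before computing beta. `aligned_beta` is a hypothetical helper and the dict-of-returns layout is an assumption for this example; empyrical-reloaded and FinanceToolkit work on pandas or NumPy objects instead.

```python
import statistics


def aligned_beta(asset: dict, benchmark: dict) -> float:
    """Beta computed only over the date labels present in BOTH series."""
    common = sorted(asset.keys() & benchmark.keys())
    if len(common) < 2:
        raise ValueError("need at least two overlapping observations")
    a = [asset[d] for d in common]
    b = [benchmark[d] for d in common]
    var_b = statistics.variance(b)  # sample variance (ddof=1)
    if var_b == 0:
        raise ValueError("benchmark variance is zero; beta is undefined")
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(common) - 1)
    return cov / var_b
```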

## finance-bp-108--finmarketpy (1)

### `AP-PORTFOLIO-ANALYTICS-013` — Forward-filling spot prices creates look-ahead bias in TRI construction <sub>(high)</sub>

Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns, invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index chain.

## finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded (1)

### `AP-PORTFOLIO-ANALYTICS-002` — Look-ahead bias from unshifted signal generation and position calculations <sub>(high)</sub>

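Generating trading signals from current-period technical indicators (RSI, moving averages) without a one-bar lag between signal and execution creates look-ahead bias, causing live trading returns to fall far below backtested results. In pandas terms, either lag the signal with shift(1) before multiplying by same-bar returns, or pair the current signal with next-bar returns via shift(-1) on the returns; applying shift(-1) to the signal itself pulls future values backward. Similarly, when estimating intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated with end-of-day values, making results unrepresentative of actual trading.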
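
To make the look-ahead issue concrete, here is a minimal sketch in plain Python lists. The helpers `lag` and `strategy_returns` are hypothetical; `lag` behaves like a non-negative pandas `shift`, and the example assumes a signal computed at the close of bar t can only be traded during bar t+1.

```python
def lag(values, periods=1, fill=None):
    """Shift a list forward in time, like pandas Series.shift(periods) for periods >= 0."""
    if periods < 0:
        raise ValueError("a negative lag would pull future values backward")
    if periods == 0:
        return list(values)
    return [fill] * min(periods, len(values)) + list(values[:-periods])


def strategy_returns(signals, bar_returns):
    """The position held during bar t is the signal computed at the close of bar t-1."""
    positions = lag(signals, 1, fill=0)
    return [p * r for p, r in zip(positions, bar_returns)]
```

Multiplying `signals` by `bar_returns` directly, without the lag, would credit the strategy with the return of the very bar that produced the signal.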

## finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt (1)

### `AP-PORTFOLIO-ANALYTICS-014` — Unsupported solver selection breaks advanced risk calculations <sub>(medium)</sub>

Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt both require careful solver selection.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-118--FinanceToolkit
**Scan date**: 2026-04-22
**Stats**: {'total_files': 12, 'total_classes': 49, 'total_functions': 0, 'total_stages': 12}

## Modules (12)

- [data_acquisition](components/data_acquisition.md): 3 classes
- [financial_statement_normalization](components/financial_statement_normalization.md): 3 classes
- [financial_ratio_calculation](components/financial_ratio_calculation.md): 6 classes
- [performance_analysis](components/performance_analysis.md): 4 classes
- [risk_analysis](components/risk_analysis.md): 4 classes
- [technical_analysis](components/technical_analysis.md): 4 classes
- [options_pricing_&_greeks](components/options_pricing_-_greeks.md): 4 classes
- [financial_modeling](components/financial_modeling.md): 6 classes
- [security_discovery](components/security_discovery.md): 3 classes
- [fixed_income_analysis](components/fixed_income_analysis.md): 4 classes
- [economic_data](components/economic_data.md): 4 classes
- [portfolio_management](components/portfolio_management.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 177
  fatal_constraints_count: 56
  non_fatal_constraints_count: 352
  use_cases_count: 13
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
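
A sketch of a MACD calculation that refuses any parameters other than the locked ones. The functions `ema` and `macd` are hypothetical stand-ins written for this example, not ZVT's implementation; the EMA is assumed to use the recursive form with alpha = 2 / (span + 1), seeded with the first value.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (diff, dea, hist) lists; rejects anything but the locked parameters."""
    if (fast, slow, signal) != (12, 26, 9):
        raise ValueError("SL-08: MACD parameters are locked to fast=12, slow=26, signal=9")
    diff = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    dea = ema(diff, signal)  # signal line
    hist = [d - e for d, e in zip(diff, dea)]
    return diff, dea, hist
```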

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **13**

## `KUC-101`
**Source**: `examples/Finance Toolkit - 0. README Examples.ipynb`

Demonstrating comprehensive financial analysis capabilities covering multiple domains including historical data, financial statements, ratios, models, options, performance, risk, technical analysis, and fixed income in a single workflow.

## `KUC-102`
**Source**: `examples/Finance Toolkit - 1. Getting Started.ipynb`

Retrieving foundational financial data including historical price data, balance sheets, income statements, and cash flow statements for fundamental analysis and financial modeling.

## `KUC-103`
**Source**: `examples/Finance Toolkit - 10. Fixed Income Module.ipynb`

Analyzing fixed income securities including bond statistics, duration calculations, derivative pricing models, and government/corporate bond yield comparisons across multiple countries.

## `KUC-104`
**Source**: `examples/Finance Toolkit - 11. Portfolio Module.ipynb`

Measuring portfolio performance metrics, transaction history, risk-adjusted returns, and benchmarking against market indices like S&P 500 to evaluate investment performance.

## `KUC-105`
**Source**: `examples/Finance Toolkit - 5. Discovery Module.ipynb`

Discovering investment opportunities through stock screening based on market cap, price, beta, volume, and dividend criteria, and identifying top gainers, losers, and most active stocks.

## `KUC-106`
**Source**: `examples/Finance Toolkit - 3. Ratios Module.ipynb`

Evaluating company financial health through profitability ratios, solvency ratios, liquidity ratios, valuation ratios, and custom ratio calculations for multiple companies over time.

## `KUC-107`
**Source**: `examples/Finance Toolkit - 4. Models Module.ipynb`

Applying financial models including Extended Dupont analysis, WACC calculation, Altman Z-score for bankruptcy prediction, and Piotroski F-Score for financial health assessment.

## `KUC-108`
**Source**: `examples/Finance Toolkit - 5. Options Module.ipynb`

Computing options pricing using Black-Scholes and binomial models, simulating stock price paths with Monte Carlo methods, and analyzing Greeks (delta, gamma, etc.) for options strategy evaluation.
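
For orientation, the Black-Scholes price of a European option on a non-dividend-paying asset can be sketched with the standard library alone. This is a textbook formula written for illustration, not the toolkit's Options module; the function names and argument order are assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, rate, vol, t, kind="call"):
    """European option price; `rate` and `vol` are annualized, `t` is in years."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)
    if kind == "put":
        return strike * math.exp(-rate * t) * norm_cdf(-d2) - spot * norm_cdf(-d1)
    raise ValueError("kind must be 'call' or 'put'")
```

A quick sanity check is put-call parity: the call price minus the put price should equal `spot - strike * exp(-rate * t)`.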

## `KUC-109`
**Source**: `examples/Finance Toolkit - 6. Technicals Module.ipynb`

Calculating technical indicators including Bollinger Bands, RSI, ADX, and other chart patterns to identify trends, momentum, overbought/oversold conditions, and trading signals.

## `KUC-110`
**Source**: `examples/Finance Toolkit - 7. Risk Module.ipynb`

Quantifying investment risk through Value at Risk (VaR), Conditional VaR (CVaR), maximum drawdown, and return distribution analysis to measure downside risk and tail losses.
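
Of the measures listed, maximum drawdown is the simplest to sketch. The helper below is a hypothetical plain-Python version, not the toolkit's Risk module; it assumes simple periodic returns and compounds them geometrically.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded value path (a number <= 0)."""
    value = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst
```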

## `KUC-111`
**Source**: `examples/Finance Toolkit - 8. Performance Module.ipynb`

Evaluating investment performance using CAPM, Fama-French multi-factor models, Sharpe ratio, and Jensen's alpha to understand risk-adjusted returns and factor-based performance attribution.

## `KUC-112`
**Source**: `examples/Finance Toolkit - 9. Economics Module.ipynb`

Tracking macroeconomic conditions through consumer confidence indices, short-term and long-term interest rates across multiple countries to inform investment decisions.

## `KUC-113`
**Source**: `examples/Finance Toolkit - Using External Datasets.ipynb`

Importing proprietary or third-party financial data from CSV files, normalizing formats, and combining multiple datasets for unified analysis within the toolkit.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-PORTFOLIO-ANALYTICS-001` — Defensive zero-division guards with explicit handling
**From**: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt · **Applicable to**: portfolio-analytics

Always guard division operations with explicit zero-value checks before executing. In price ratio calculations, filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream calculations and crashes pipelines.

## `CW-PORTFOLIO-ANALYTICS-002` — Covariance matrix positive-semidefiniteness verification
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Always verify covariance matrix is positive-semidefinite before passing to CVXPY optimization. Apply eigenvalue clipping if violated, as non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib enforce this constraint to prevent optimizer from finding mathematically invalid solutions or crashing entirely.

## `CW-PORTFOLIO-ANALYTICS-003` — Geometric compounding for cumulative returns
**From**: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics

Compute cumulative returns using geometric compounding via cumprod(1 + returns), never arithmetic cumulation via cumsum. Arithmetic cumulative sum overstates gains and understates losses, causing cumulative returns to diverge significantly from actual portfolio performance over volatile periods. This principle applies to total return index construction and any cumulative performance calculation.
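
A minimal sketch of the difference, using only the standard library. `cumulative_returns` is a hypothetical helper that mirrors what `cumprod(1 + returns) - 1` does on a pandas Series.

```python
from itertools import accumulate
from operator import mul


def cumulative_returns(returns):
    """Geometric cumulative return series: running product of (1 + r), minus 1."""
    growth = accumulate((1.0 + r for r in returns), mul)
    return [g - 1.0 for g in growth]


# A +50% period followed by a -50% period leaves the portfolio down 25%,
# while an arithmetic sum of the two returns would report 0%.
example = cumulative_returns([0.5, -0.5])
```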

## `CW-PORTFOLIO-ANALYTICS-004` — Temporal shift enforcement to prevent look-ahead bias
**From**: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded · **Applicable to**: portfolio-analytics

Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from backtested results.

## `CW-PORTFOLIO-ANALYTICS-005` — DCP-compliant convex optimization construction
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility, return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing optimization entirely.

## `CW-PORTFOLIO-ANALYTICS-006` — Correct Sharpe ratio formula with risk-free rate subtraction
**From**: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit · **Applicable to**: portfolio-analytics

Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.
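
The formula above can be sketched as follows. `sharpe_ratio` is a hypothetical helper; it assumes `risk_free` is expressed per period (the same frequency as `returns`) and defaults to 252 periods per year, a common convention for daily data rather than a value taken from the source.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio using the sample standard deviation (ddof=1)."""
    excess = [r - risk_free for r in returns]
    vol = statistics.stdev(excess)  # sample standard deviation, ddof=1
    if vol == 0:
        return float("nan")  # undefined when returns do not vary
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)
```

Because `risk_free` is a constant here, subtracting it before taking the standard deviation gives the same volatility as `std(returns)`.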

## `CW-PORTFOLIO-ANALYTICS-007` — Immutable FIFO position tracking with chronological ordering
**From**: finance-bp-068--xalpha, finance-bp-066--wealthbot · **Applicable to**: portfolio-analytics

Maintain FIFO position tracking with strictly increasing date order for position entries. Use copy() function to create independent copies before mutating remtable to avoid side effects. Enforce chronological ordering in sell operations to ensure correct cost basis and holding period calculation, particularly important for funds with tiered fees by holding period.

## `CW-PORTFOLIO-ANALYTICS-008` — Validation at system boundaries with descriptive errors
**From**: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Enforce validation at system boundaries with descriptive error messages. Validate that expected returns match the covariance matrix dimensions, that score values are within [0, 100], that confidence values are within [0, 1], and that required DataFrame columns are present. Invalid inputs should raise ValueError with descriptive messages listing valid options to prevent silent failures or corrupted calculations.

## `CW-PORTFOLIO-ANALYTICS-009` — Decimal rounding for monetary calculations
**From**: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics

Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material over many transactions.
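
A sketch of explicit rounding with the standard `decimal` module. `round_money` is a hypothetical helper standing in for the `myround` function mentioned above; the half-up rounding mode and two-decimal default are assumptions for this example.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_money(value, places=2):
    """Round to `places` decimals with half-up rounding, going through Decimal."""
    quantum = Decimal(10) ** -places  # e.g. Decimal('0.01') for places=2
    # str() avoids importing the binary representation error of a float.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


# Shares bought for 1000.00 at a net value of 1.2345, rounded once, explicitly.
shares = round_money(Decimal("1000") / Decimal("1.2345"))
```

By contrast, the built-in `round(2.675, 2)` works on the binary float and yields 2.67 rather than 2.68.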

## `CW-PORTFOLIO-ANALYTICS-010` — Cash flow sign convention enforcement
**From**: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha · **Applicable to**: portfolio-analytics

Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs cause NPV calculation to invert, producing negative returns for profitable trades and vice versa. Verify sum of round trip PnLs equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.
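
The effect of the sign convention is easy to see with a small net present value sketch. `npv` is a hypothetical helper using per-period discounting; it is not xalpha's `cftable` logic.

```python
def npv(rate, cashflows):
    """Net present value of per-period cash flows; outflows negative, inflows positive."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))


# Invest 100 now (outflow), receive 110 one period later (inflow).
profitable = npv(0.0, [-100.0, 110.0])
# The same trade with the signs flipped looks like a loss.
inverted = npv(0.0, [100.0, -110.0])
```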

FILE:references/components/data_acquisition.md
# data_acquisition (3 classes)

## `Toolkit.get_historical_data`
`data_acquisition/toolkit-get-historical-data.py:0`

## `Toolkit.get_financial_statements`
`data_acquisition/toolkit-get-financial-statements.py:0`

## `data_source`
`data_acquisition/data-source.py:0`

FILE:references/components/economic_data.md
# economic_data (4 classes)

## `Economics.get_gross_domestic_product`
`economic_data/economics-get-gross-domestic-product.py:0`

## `Economics.get_real_gross_domestic_product`
`economic_data/economics-get-real-gross-domestic-produc.py:0`

## `Economics.get_inflation`
`economic_data/economics-get-inflation.py:0`

## `data_source`
`economic_data/data-source.py:0`

FILE:references/components/financial_modeling.md
# financial_modeling (6 classes)

## `Models.get_dupont_analysis`
`financial_modeling/models-get-dupont-analysis.py:0`

## `Models.get_weighted_average_cost_of_capital`
`financial_modeling/models-get-weighted-average-cost-of-capi.py:0`

## `method`
`financial_modeling/method.py:0`

## `Models.get_altman_z_score`
`financial_modeling/models-get-altman-z-score.py:0`

## `Models.get_piotroski_f_score`
`financial_modeling/models-get-piotroski-f-score.py:0`

## `discount_rate`
`financial_modeling/discount-rate.py:0`

FILE:references/components/financial_ratio_calculation.md
# financial_ratio_calculation (6 classes)

## `Ratios.collect_profitability_ratios`
`financial_ratio_calculation/ratios-collect-profitability-ratios.py:0`

## `Ratios.collect_liquidity_ratios`
`financial_ratio_calculation/ratios-collect-liquidity-ratios.py:0`

## `Ratios.collect_valuation_ratios`
`financial_ratio_calculation/ratios-collect-valuation-ratios.py:0`

## `Ratios.collect_efficiency_ratios`
`financial_ratio_calculation/ratios-collect-efficiency-ratios.py:0`

## `Ratios.collect_all_ratios`
`financial_ratio_calculation/ratios-collect-all-ratios.py:0`

## `custom_ratios`
`financial_ratio_calculation/custom-ratios.py:0`

FILE:references/components/financial_statement_normalization.md
# financial_statement_normalization (3 classes)

## `normalize_statements`
`financial_statement_normalization/normalize-statements.py:0`

## `convert_financial_statements`
`financial_statement_normalization/convert-financial-statements.py:0`

## `normalization_format`
`financial_statement_normalization/normalization-format.py:0`

FILE:references/components/fixed_income_analysis.md
# fixed_income_analysis (4 classes)

## `FixedIncome.get_bond_price`
`fixed_income_analysis/fixedincome-get-bond-price.py:0`

## `FixedIncome.get_ice_bofa_option_adjusted_spread`
`fixed_income_analysis/fixedincome-get-ice-bofa-option-adjusted.py:0`

## `FixedIncome.get_yield_to_maturity`
`fixed_income_analysis/fixedincome-get-yield-to-maturity.py:0`

## `rate_source`
`fixed_income_analysis/rate-source.py:0`

FILE:references/components/options_pricing_-_greeks.md
# options_pricing_&_greeks (4 classes)

## `Options.get_delta`
`options_pricing_&_greeks/options-get-delta.py:0`

## `Options.get_gamma`
`options_pricing_&_greeks/options-get-gamma.py:0`

## `Options.get_implied_volatility`
`options_pricing_&_greeks/options-get-implied-volatility.py:0`

## `pricing_model`
`options_pricing_&_greeks/pricing-model.py:0`

FILE:references/components/performance_analysis.md
# performance_analysis (4 classes)

## `Performance.get_sharpe_ratio`
`performance_analysis/performance-get-sharpe-ratio.py:0`

## `Performance.get_capital_asset_pricing_model`
`performance_analysis/performance-get-capital-asset-pricing-mo.py:0`

## `Performance.get_fama_french`
`performance_analysis/performance-get-fama-french.py:0`

## `risk_free_rate`
`performance_analysis/risk-free-rate.py:0`

FILE:references/components/portfolio_management.md
# portfolio_management (4 classes)

## `Portfolio.read_portfolio_dataset`
`portfolio_management/portfolio-read-portfolio-dataset.py:0`

## `Portfolio.get_portfolio_performance`
`portfolio_management/portfolio-get-portfolio-performance.py:0`

## `Portfolio.get_positions_overview`
`portfolio_management/portfolio-get-positions-overview.py:0`

## `benchmark`
`portfolio_management/benchmark.py:0`

FILE:references/components/risk_analysis.md
# risk_analysis (4 classes)

## `Risk.get_var_historic`
`risk_analysis/risk-get-var-historic.py:0`

## `Risk.get_var_gaussian`
`risk_analysis/risk-get-var-gaussian.py:0`

## `Risk.get_max_drawdown`
`risk_analysis/risk-get-max-drawdown.py:0`

## `var_method`
`risk_analysis/var-method.py:0`

FILE:references/components/security_discovery.md
# security_discovery (3 classes)

## `Discovery.search_instruments`
`security_discovery/discovery-search-instruments.py:0`

## `Discovery.screen_stocks`
`security_discovery/discovery-screen-stocks.py:0`

## `search_method`
`security_discovery/search-method.py:0`

FILE:references/components/technical_analysis.md
# technical_analysis (4 classes)

## `Technicals.get_moving_average`
`technical_analysis/technicals-get-moving-average.py:0`

## `Technicals.get_money_flow_index`
`technical_analysis/technicals-get-money-flow-index.py:0`

## `Technicals.get_bollinger_bands`
`technical_analysis/technicals-get-bollinger-bands.py:0`

## `period`
`technical_analysis/period.py:0`


Financepy Derivatives

Skill

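Date handling and pricing for financial instruments, built on the FinancePy framework: supports multi-country holiday calendars and day count conventions, generates bond and swap cash flow schedules, and computes yields and prices.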

---
name: financepy-derivatives
description: |-
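  Date handling and pricing for financial instruments, built on the FinancePy framework: supports multi-country holiday calendars and day count conventions, generates bond and swap cash flow schedules, and computes yields and prices.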
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-101"
  compiled_at: "2026-04-22T13:00:46.579380+00:00"
  capability_markets: "global"
  capability_activities: "derivatives-pricing"
  sop_version: "crystal-compilation-v6.1"
---
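# FinancePy Derivatives Pricing (financepy-derivatives)

> Date handling and pricing for financial instruments, built on the FinancePy framework: supports multi-country holiday calendars and day count conventions, generates bond and swap cash flow schedules, and computes yields and prices.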

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (88 total)

### Holiday Calendar Usage (`UC-101`)
Determining business days and holidays for different countries to correctly schedule financial transactions and settlements
**Triggers**: calendar, holiday, business days

### Financial Date Creation and Manipulation (`UC-103`)
Creating and manipulating financial dates including adding days, months, tenors, and handling weekends for trade scheduling
**Triggers**: date creation, add days, add months

### Day Count Conventions Introduction (`UC-104`)
Calculating year fractions and day counts using various conventions (ACT/360, ACT/365, 30/360) for interest accrual calculations
**Triggers**: day count, year fraction, accrued interest

For all **88** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
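
As an illustration of how a lock such as `SL-05` could be checked before an order is built, here is a minimal sketch. The helper name `check_exactly_one_sizing` is hypothetical and not part of any library API; it assumes only the three field names listed in the table.

```python
from typing import Optional


def check_exactly_one_sizing(
    position_pct: Optional[float] = None,
    order_money: Optional[float] = None,
    order_amount: Optional[float] = None,
) -> str:
    """Return the name of the single sizing field that is set, else raise."""
    candidates = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    # Collect the fields the caller actually supplied.
    provided = [name for name, value in candidates.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"SL-05 violated: expected exactly one sizing field, got {provided}"
        )
    return provided[0]
```

Raising instead of silently picking one field mirrors the `halt` behavior in the table: an ambiguous signal stops the run rather than producing an order of unclear size.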

## Top Anti-Patterns (15 total)

- **`AP-DERIVATIVES-PRICING-001`**: Instrument NPV called without attached pricing engine
- **`AP-DERIVATIVES-PRICING-002`**: BSM forward price ignores dividend yield
- **`AP-DERIVATIVES-PRICING-003`**: Negative discount factors passed to log-domain interpolation

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-101. Evidence verify ratio = 3.4% and audit fail total = 34. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
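| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative definition (source of truth) | Must read when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |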

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-101` blueprint at 2026-04-22T13:00:46.579380+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Financial Date Creation and Manipulation
  - Date Internal Testing
  - Holiday Calendar Usage
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## FinancePy (finance-bp-101) (3)

### `AP-DERIVATIVES-PRICING-003` — Negative discount factors passed to log-domain interpolation <sub>(high)</sub>

When Numba-jitted interpolation functions perform log transformation on discount factors, negative or zero values cause domain errors. This occurs because log(-x) and log(0) are mathematically undefined. The consequence is runtime crashes in jitted functions and complete failure of discount curve interpolation, blocking all downstream pricing calculations.
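
A minimal sketch of the guard this anti-pattern calls for, written in plain Python rather than Numba. The function name `log_linear_discount` is hypothetical and this is not FinancePy's interpolation code; it only illustrates validating discount factors before taking logarithms.

```python
import math
from typing import Sequence


def log_linear_discount(t: float, times: Sequence[float], dfs: Sequence[float]) -> float:
    """Interpolate a discount factor at time t, linearly in log(df)."""
    if len(times) != len(dfs) or len(times) < 2:
        raise ValueError("need at least two (time, discount factor) pairs")
    # Reject anything the log cannot handle: zero, negative, NaN, or inf.
    for time_point, df in zip(times, dfs):
        if not (df > 0.0 and math.isfinite(df)):
            raise ValueError(
                f"discount factor {df!r} at t={time_point} is not positive and finite"
            )
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the curve range (no extrapolation in this sketch)")
    # Assumes times are strictly increasing, so each interval has nonzero width.
    for i in range(1, len(times)):
        if t <= times[i]:
            weight = (t - times[i - 1]) / (times[i] - times[i - 1])
            log_df = (1.0 - weight) * math.log(dfs[i - 1]) + weight * math.log(dfs[i])
            return math.exp(log_df)
    raise AssertionError("unreachable for strictly increasing times")
```

Validating once outside the hot loop keeps the error message readable; a jitted kernel would otherwise surface the same problem as an opaque domain error.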

### `AP-DERIVATIVES-PRICING-004` — Non-monotonic time points in discount curve interpolation <sub>(high)</sub>

Interpolation over non-monotonically increasing time points produces undefined behavior at crossing times, causing discount factors to be incorrectly computed where time values overlap. This corrupts the entire term structure because the bootstrap algorithm cannot determine which discount factor corresponds to which maturity. The consequence is incorrect present value calculations across all downstream products priced against the curve.
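
One way to catch this before a curve is built is an explicit ordering check. The helper below is a hypothetical sketch, not a FinancePy function; it treats repeated time points as an error as well, since they leave the interpolation interval with zero width.

```python
from typing import Sequence


def require_strictly_increasing(times: Sequence[float]) -> None:
    """Raise ValueError at the first pair of time points that is out of order."""
    for index, (earlier, later) in enumerate(zip(times, times[1:])):
        # "not later > earlier" also rejects NaN, which compares False to everything.
        if not later > earlier:
            raise ValueError(
                f"times[{index + 1}]={later} does not exceed times[{index}]={earlier}"
            )
```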

### `AP-DERIVATIVES-PRICING-005` — Bootstrap calibration instruments not in maturity order <sub>(high)</sub>

When building yield curves from market instruments (deposits, FRAs, swaps), the instruments must be provided in strictly increasing maturity order. Out-of-order instruments cause the bootstrap algorithm to solve for discount factors at incorrect time points, corrupting the entire term structure. The consequence is wrong forward rates and discount factors that propagate into all priced instruments.

## QuantLib-SWIG (finance-bp-123) (4)

### `AP-DERIVATIVES-PRICING-001` — Instrument NPV called without attached pricing engine <sub>(high)</sub>

Calling NPV() on a derivatives instrument without first calling setPricingEngine() returns uninitialized garbage values or throws null pointer exceptions. This occurs because the Instrument class relies on the attached PricingEngine to perform actual valuation logic. The consequence is silently incorrect pricing results that appear valid, potentially leading to bad trading decisions.

### `AP-DERIVATIVES-PRICING-006` — Option Exercise type mismatches VanillaOption constructor <sub>(high)</sub>

VanillaOption requires both a StrikedTypePayoff and a matching Exercise object. Using the wrong Exercise type (e.g., AmericanExercise for a European option) causes compilation failures in C++ or runtime errors in the SWIG bindings. The consequence is that the pricing system cannot initialize options, blocking all option pricing workflows.

### `AP-DERIVATIVES-PRICING-013` — Evaluation date not set before QuantLib term structure construction <sub>(medium)</sub>

QuantLib requires ql.Settings.instance().evaluationDate to be set before constructing yield term structures and instruments. Without an explicit evaluation date, the curve reference date becomes undefined, causing date calculations to fail or produce incorrect settlement dates. The consequence is wrong discount factors and NPV calculations across the entire portfolio.

### `AP-DERIVATIVES-PRICING-014` — Market quotes passed without QuoteHandle wrapper <sub>(medium)</sub>

QuantLib's observer pattern requires all market quotes to be wrapped in QuoteHandle before passing to rate helpers. Raw quote values bypass the observable notification mechanism, causing dependent instruments to never recalculate when market data updates. The consequence is stale pricing that doesn't reflect current market conditions.

## arch (finance-bp-124) (2)

### `AP-DERIVATIVES-PRICING-007` — NaN/inf values in ARCH model input data <sub>(high)</sub>

ARCH model estimation relies on recursive variance computations and scipy optimize. Non-finite input values (NaN, inf) cause optimizers to produce NaN results and recursive variance calculations to fail. The consequence is complete model estimation failure with meaningless outputs that appear valid, leading to incorrect volatility forecasts and risk misestimation.
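
A pre-estimation screen for this case can be very small. The sketch below uses only the standard library and hypothetical helper names; it is not part of the `arch` package and assumes the returns arrive as a plain sequence of floats.

```python
import math
from typing import List, Sequence


def non_finite_positions(values: Sequence[float]) -> List[int]:
    """Return the indices of NaN or infinite entries."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


def require_finite(values: Sequence[float]) -> None:
    """Raise before model fitting if any input value is NaN or infinite."""
    bad = non_finite_positions(values)
    if bad:
        raise ValueError(f"non-finite input at positions {bad[:10]} ({len(bad)} total)")
```

Reporting positions rather than silently dropping rows leaves the decision (drop, fill, or fix upstream) with the caller.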

### `AP-DERIVATIVES-PRICING-008` — ARCH parameter array concatenation in wrong order <sub>(high)</sub>

ARCHModel composes from three components (mean, volatility, distribution) and requires parameter arrays concatenated in fixed order: [mean_params, volatility_params, distribution_params]. Incorrect ordering causes _parse_parameters to assign wrong values to wrong components, producing mathematically invalid models (e.g., volatility parameters interpreted as distribution parameters). The consequence is invalid conditional variance forecasts.

## py_vollib (finance-bp-127) (6)

### `AP-DERIVATIVES-PRICING-002` — BSM forward price ignores dividend yield <sub>(high)</sub>

When calculating option prices on dividend-paying stocks using BSM, the forward price must be adjusted as F = S * exp((r-q)*t). Omitting the dividend yield adjustment (using F = S * exp(r*t)) causes systematic mispricing for all dividend-paying assets. The consequence is consistently wrong option prices that diverge from market prices, leading to arbitrage opportunities and trading losses.
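
The adjustment above can be written out directly. This is a sketch of the formula F = S * exp((r-q)*t) with continuous compounding; `forward_price` is an illustrative name, not a py_vollib function.

```python
import math


def forward_price(spot: float, r: float, q: float, t: float) -> float:
    """Forward price under continuous rates: F = S * exp((r - q) * t)."""
    return spot * math.exp((r - q) * t)


# Size of the error from dropping q, for one illustrative set of inputs.
with_dividend = forward_price(100.0, 0.05, 0.02, 1.0)
without_dividend = forward_price(100.0, 0.05, 0.0, 1.0)
overstatement = without_dividend - with_dividend  # positive whenever q > 0
```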

### `AP-DERIVATIVES-PRICING-009` — Zero or negative time-to-expiration in option pricing <sub>(high)</sub>

Option pricing formulas (Black-Scholes, Black model) compute sqrt(t) in the denominator. Zero time causes division by zero; negative time produces NaN in d1/d2 calculations. The consequence is invalid option prices (NaN, inf) that break downstream Greeks calculations and hedging workflows.
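
A sketch of where the guard belongs, using the standard Black-Scholes d1/d2 expressions without dividends. The function name is hypothetical and the check on `sigma` is an added assumption, since zero volatility puts the same zero in the denominator.

```python
import math
from typing import Tuple


def bs_d1_d2(S: float, K: float, t: float, r: float, sigma: float) -> Tuple[float, float]:
    """Compute Black-Scholes d1 and d2, refusing inputs that break sqrt(t)."""
    if not t > 0.0:
        raise ValueError(f"time to expiration must be positive, got {t}")
    if not sigma > 0.0:
        raise ValueError(f"volatility must be positive, got {sigma}")
    vol_sqrt_t = sigma * math.sqrt(t)  # the denominator that fails at t <= 0
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return d1, d2
```

At expiration itself the caller can return the intrinsic payoff directly instead of calling the formula.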

### `AP-DERIVATIVES-PRICING-010` — Black model applies spot price instead of forward price <sub>(high)</sub>

The Black model is designed for options on futures/forwards and expects the futures price F as input, not the spot price S. Passing spot directly causes incorrect pricing because the Black formula treats its underlying as driftless under the risk-neutral measure (forward dynamics), whereas the spot price drifts at the risk-free rate less any carry. The consequence is systematically wrong prices for options on forwards and futures.

### `AP-DERIVATIVES-PRICING-011` — Missing discount factor in Black model pricing <sub>(medium)</sub>

Black model pricing must apply time value discounting with deflater = exp(-r*t) to undiscounted option prices. Omitting the discount factor produces forward option prices that exceed their fair value by the risk-free compounding amount. The consequence is violation of time value of money principles and prices that cannot be used for fair valuation or hedging.
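
To make the role of the deflater concrete, here is a textbook Black-76 sketch that takes the forward price `F` as input and applies exp(-r*t) as the last step. It is not py_vollib's implementation; the function names are illustrative and only the standard library is used.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discount_factor(r: float, t: float) -> float:
    """Continuous-compounding deflater exp(-r * t)."""
    return math.exp(-r * t)


def black76_price(flag: str, F: float, K: float, t: float, r: float, sigma: float) -> float:
    """Discounted Black-76 price of a call ('c') or put ('p') on a forward F."""
    if flag not in ("c", "p"):
        raise ValueError(f"flag must be 'c' or 'p', got {flag!r}")
    if not (t > 0.0 and sigma > 0.0 and F > 0.0 and K > 0.0):
        raise ValueError("F, K, t and sigma must all be positive")
    vol_sqrt_t = sigma * math.sqrt(t)
    d1 = (math.log(F / K) + 0.5 * sigma * sigma * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    if flag == "c":
        undiscounted = F * norm_cdf(d1) - K * norm_cdf(d2)
    else:
        undiscounted = K * norm_cdf(-d2) - F * norm_cdf(-d1)
    # Omitting this factor is exactly the anti-pattern described above.
    return discount_factor(r, t) * undiscounted
```

A quick sanity check on any such implementation is put-call parity on the forward: call minus put should equal `discount_factor(r, t) * (F - K)`.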

### `AP-DERIVATIVES-PRICING-012` — Invalid flag parameter ('c'/'p') passed to py_vollib without validation <sub>(medium)</sub>

The py_vollib binary_flag dict contains only the keys 'c' and 'p'. Passing any other flag value raises a KeyError, because the library looks the flag up without validating it first. The consequence is unhandled exceptions in production systems when flag values come from external sources with unexpected formats.
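
A thin normalization layer in front of the library call is one way to handle externally sourced flags. The sketch below is an assumption about what such a wrapper could look like; the alias table and the name `normalize_option_flag` are hypothetical and not part of py_vollib.

```python
# Hypothetical alias table; extend it to match the upstream data source.
_FLAG_ALIASES = {"c": "c", "call": "c", "p": "p", "put": "p"}


def normalize_option_flag(raw: object) -> str:
    """Map an external option-type value to 'c' or 'p', or raise a clear error."""
    if not isinstance(raw, str):
        raise TypeError(f"option flag must be a string, got {type(raw).__name__}")
    key = raw.strip().lower()
    if key not in _FLAG_ALIASES:
        raise ValueError(f"unrecognized option flag {raw!r}; expected 'c' or 'p'")
    return _FLAG_ALIASES[key]
```

Failing with a `ValueError` that names the bad value is easier to trace than a bare `KeyError` raised deep inside a pricing call.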

### `AP-DERIVATIVES-PRICING-015` — Implied volatility computed without proper bounds validation <sub>(medium)</sub>

When computing implied volatility, option prices outside theoretical bounds (below intrinsic value or above maximum) must raise appropriate exceptions. Returning invalid IV values (negative volatility or extreme values) violates mathematical definitions and leads to incorrect pricing, risk calculations, and hedging ratios. The consequence is systemic pricing errors across all vol-dependent derivatives.
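
For a European call on a non-dividend-paying spot, the standard no-arbitrage bounds are max(S - K*exp(-r*t), 0) <= C <= S. The sketch below checks a quoted price against them before any root-finding is attempted; the function name is hypothetical and the no-dividend assumption is mine, not the source's.

```python
import math
from typing import Tuple


def check_call_price_bounds(
    price: float, S: float, K: float, t: float, r: float
) -> Tuple[float, float]:
    """Return (lower, upper) no-arbitrage bounds, raising if price is outside them."""
    lower = max(S - K * math.exp(-r * t), 0.0)  # discounted intrinsic value
    upper = S                                   # a call is never worth more than the stock
    if not lower <= price <= upper:
        raise ValueError(
            f"call price {price} outside no-arbitrage bounds [{lower}, {upper}]"
        )
    return lower, upper
```

A price that passes this check still needs a bracketed solver; the check only rules out inputs for which no non-negative volatility exists.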

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-101--FinancePy
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 45, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [utilities](components/utilities.md): 4 classes
- [market_curves](components/market_curves.md): 6 classes
- [market_volatility](components/market_volatility.md): 5 classes
- [pricing_models](components/pricing_models.md): 6 classes
- [equity_&_fx_options](components/equity_-_fx_options.md): 6 classes
- [interest_rate_products](components/interest_rate_products.md): 6 classes
- [credit_products](components/credit_products.md): 6 classes
- [bond_products](components/bond_products.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 129
  fatal_constraints_count: 89
  non_fatal_constraints_count: 216
  use_cases_count: 88
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **88**

## `KUC-101`
**Source**: `notebooks/finutils/FINCALENDAR_IntroductionToUsingCalendars.ipynb`

Determining business days and holidays for different countries to correctly schedule financial transactions and settlements.

## `KUC-102`
**Source**: `notebooks/finutils/FINDATES_TestingDateInternals.ipynb`

Testing internal date representation and Excel date serial number conversion for financial date calculations.

## `KUC-103`
**Source**: `notebooks/finutils/FINDATE_CreatingAndManipulatingFinDates.ipynb`

Creating and manipulating financial dates including adding days, months, tenors, and handling weekends for trade scheduling.

## `KUC-104`
**Source**: `notebooks/finutils/FINDAYCOUNT_Introduction.ipynb`

Calculating year fractions and day counts using various conventions (ACT/360, ACT/365, 30/360) for interest accrual calculations.
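
For orientation, the three conventions named above can be sketched with the standard library alone. This is not FinancePy's day-count class: `year_fraction` is a hypothetical helper, ACT/365 is taken to mean ACT/365 Fixed, and the 30/360 branch is a simplified bond-basis rule with no February end-of-month adjustment.

```python
from datetime import date


def year_fraction(start: date, end: date, convention: str) -> float:
    """Year fraction between two dates under a named day-count convention."""
    if end < start:
        raise ValueError("end date precedes start date")
    actual_days = (end - start).days
    if convention == "ACT/360":
        return actual_days / 360.0
    if convention == "ACT/365":
        return actual_days / 365.0
    if convention == "30/360":
        # Treat day 31 as day 30; the end date is capped only if the start was.
        d1 = min(start.day, 30)
        d2 = 30 if (end.day == 31 and d1 == 30) else end.day
        days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
        return days / 360.0
    raise ValueError(f"unsupported convention {convention!r}")
```

The same pair of dates gives different accruals under each convention, which is why the convention has to travel with the instrument definition rather than being assumed.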

## `KUC-105`
**Source**: `notebooks/finutils/FINSCHEDULE_ExamplesOfScheduleGeneration.ipynb`

Generating payment schedules for bonds, swaps, and other fixed income instruments with proper date adjustments.

## `KUC-106`
**Source**: `notebooks/finutils/TENSIONSPLINE_Example.ipynb`

Using tension spline interpolation for smooth curve fitting with adjustable tension parameter.

## `KUC-107`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVEFLAT_ExaminationOfDiscountCurveFlat.ipynb`

Analyzing discount factors and zero rates using a flat discount curve with different compounding frequencies.

## `KUC-108`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVENSS_IntroductionToTheNelsonSiegelSvenssonCurve.ipynb`

Fitting yield curves using the Nelson-Siegel-Svensson parametric model for interest rate surface estimation.

## `KUC-109`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVENS_ExaminingTheNelsonSiegelCurve.ipynb`

Analyzing the Nelson-Siegel model factor loadings and curve fitting for yield curve construction.

## `KUC-110`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVEPOLY_SimpleAnalysis.ipynb`

Fitting discount curves using polynomial functions for yield curve construction and forward rate analysis.

## `KUC-111`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVEZERO_ConvertZeroCurveToDiscountCurve.ipynb`

Converting zero rate curves to discount factor curves for bond and derivatives pricing.

## `KUC-112`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVE_AnalysisOfInterpolationSchemes.ipynb`

Comparing different interpolation methods (linear, cubic spline) for discount curve construction.

## `KUC-113`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVE_Introduction.ipynb`

Introduction to discount curve construction and calculating forward rates, swap rates from the curve.

## `KUC-114`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVE_PieceWiseFlatOverNightForwardRateDiscountCurve.ipynb`

Building discount curves using piecewise flat overnight forward rates with bump analysis for risk management.

## `KUC-115`
**Source**: `notebooks/market/volatility/EquityVolSurfaceConstructionSVI.ipynb`

Constructing equity implied volatility surfaces using the SVI (Stochastic Volatility Inspired) parameterization.

## `KUC-116`
**Source**: `notebooks/market/volatility/FXVolSurfaceConstructionPartOne.ipynb`

Building FX implied volatility surfaces from market quotes using various volatility function types.

## `KUC-117`
**Source**: `notebooks/market/volatility/FXVolSurfaceConstructionPartTwo.ipynb`

Extended FX volatility surface construction with 10-delta and 25-delta quotes.

## `KUC-118`
**Source**: `notebooks/market/volatility/FXVolSurfaceConstructionPartThree.ipynb`

Advanced FX volatility surface construction with multiple tenors and full delta coverage.

## `KUC-119`
**Source**: `notebooks/market/volatility/SimpleBuildFXVolatilitySurface25Delta.ipynb`

Building a simple FX volatility surface using 25-delta risk reversals and strangles.

## `KUC-120`
**Source**: `notebooks/models/CHOLESKY CHECK.ipynb`

Validating Cholesky decomposition for generating correlated random variables in Monte Carlo simulations.

## `KUC-121`
**Source**: `notebooks/models/FINGBMPROCESS_generatePaths.ipynb`

Generating Geometric Brownian Motion paths for asset price simulation in Monte Carlo pricing.

## `KUC-122`
**Source**: `notebooks/models/FINITE_DIFFERENCE.ipynb`

Pricing options using finite difference methods (explicit, implicit, Crank-Nicolson) for Black-Scholes PDE.

## `KUC-123`
**Source**: `notebooks/models/FINITE_DIFFERENCE_PSOR.ipynb`

Using Projected Successive Over-Relaxation (PSOR) to solve finite difference equations for option pricing.

## `KUC-124`
**Source**: `notebooks/models/FINMODEL_GAUSSIANCOPULA_PortfolioLossDistributionBuilder.ipynb`

Building portfolio loss distributions using one-factor Gaussian copula model for credit risk analysis.

## `KUC-125`
**Source**: `notebooks/models/FINMODEL_SABRSHIFTED_InterestRates.ipynb`

Pricing interest rate swaptions using the shifted SABR model to capture volatility smile.

## `KUC-126`
**Source**: `notebooks/models/FINMODEL_SABRSHIFTED_VolatilitySmile.ipynb`

Analyzing volatility smiles using the shifted SABR model for low-rate environments.

## `KUC-127`
**Source**: `notebooks/models/FINMODEL_SABR_InterestRates.ipynb`

Implementing and analyzing the SABR stochastic volatility model for interest rate derivatives.

## `KUC-128`
**Source**: `notebooks/models/FINVOLFUNCTIONS_SSVI_MODEL.ipynb`

Analyzing the Surface SVI (SSVI) parameterization for volatility surface construction and arbitrage-free interpolation.

## `KUC-129`
**Source**: `notebooks/models/MERTON_CREDIT_MODEL.ipynb`

Structural credit risk modeling using Merton's firm value model to calculate default probability and credit spreads.

## `KUC-130`
**Source**: `notebooks/products/bonds/FINANNUITY_Valuation.ipynb`

Valuing bond annuity schedules and calculating clean/dirty prices using discount curves.

## `KUC-131`
**Source**: `notebooks/products/bonds/FINBONDCONVERTIBLE_ComparisonWithQLExample.ipynb`

Validating convertible bond pricing against QuantLib reference implementations.

## `KUC-132`
**Source**: `notebooks/products/bonds/FINBONDCONVERTIBLE_ValuationAndConvergenceTest.ipynb`

Testing convergence of convertible bond Monte Carlo valuation with varying step sizes.

## `KUC-133`
**Source**: `notebooks/products/bonds/FINBONDEMBEDDEDOPTION_Valuation.ipynb`

Valuing callable and putable bonds using interest rate tree models (Hull-White, Black-Karasinski).

## `KUC-134`
**Source**: `notebooks/products/bonds/FINBONDFRN_CitigroupExample.ipynb`

Pricing floating rate notes (FRNs) and calculating discount margin, duration, and convexity.

## `KUC-135`
**Source**: `notebooks/products/bonds/FINBONDFUTURES_ExampleContracts.ipynb`

Analyzing bond futures contracts and calculating cheapest-to-deliver and invoice prices.

## `KUC-136`
**Source**: `notebooks/products/bonds/FINBONDMARKET_DatabaseOfConventions.ipynb`

Accessing standard bond market conventions including day count, frequency, settlement days for different countries.

## `KUC-137`
**Source**: `notebooks/products/bonds/FINBONDMORTGAGE_SimpleCalculator.ipynb`

Calculating mortgage repayment schedules including interest-only and repayment modes.

## `KUC-138`
**Source**: `notebooks/products/bonds/FINBONDOPTION_All_Models_Valuation_Analysis.ipynb`

Valuing bond options (European and American) using various short rate models.

## `KUC-139`
**Source**: `notebooks/products/bonds/FINBONDOPTION_BK_ModelValuationAnalysis.ipynb`

Pricing bond options using the Black-Karasinski interest rate model.

## `KUC-140`
**Source**: `notebooks/products/bonds/FINBONDOPTION_HW_EXAMPLE_MATCH_DERIVA_GEN.ipynb`

Valuing bond options using Hull-White model validated against DerivaGem.

## `KUC-141`
**Source**: `notebooks/products/bonds/FINBONDOPTION_HW_Model_Jamshidian.ipynb`

Pricing European bond options using Hull-White model with Jamshidian decomposition.

## `KUC-142`
**Source**: `notebooks/products/bonds/FINBONDOPTION_Tree_Convergence_With_Volatility.ipynb`

Analyzing convergence of lattice tree methods for bond option pricing with varying volatility.

## `KUC-143`
**Source**: `notebooks/products/bonds/FINBONDOPTION_Tree_Convergence_Zero_Vol.ipynb`

Testing tree convergence for bond options in zero volatility (lognormal) limiting case.

## `KUC-144`
**Source**: `notebooks/products/bonds/FINBONDYIELDCURVES_FittingExample.ipynb`

Fitting yield curves to bond prices using polynomial regression.

## `KUC-145`
**Source**: `notebooks/products/bonds/FINBONDYIELDCURVE_FittingToAswAndZSpreads.ipynb`

Fitting bond yield curves to asset swap spreads and Z-spreads.

## `KUC-146`
**Source**: `notebooks/products/bonds/FINBONDYIELDCURVE_FittingToBondMarketPrices.ipynb`

Fitting yield curves directly to observable bond market prices.

## `KUC-147`
**Source**: `notebooks/products/bonds/FINBONDZEROCURVE_BootstrapOutstandingBonds.ipynb`

Bootstrapping zero coupon curves from outstanding bond prices.

## `KUC-148`
**Source**: `notebooks/products/bonds/FINBOND_CalculateOptionAdjustedSpread.ipynb`

Calculating option-adjusted spread (OAS) for callable bonds.

## `KUC-149`
**Source**: `notebooks/products/bonds/FINBOND_CalculatePriceUsingSurvivalCurve.ipynb`

Calculating bond prices using survival (credit) curves accounting for default risk.

## `KUC-150`
**Source**: `notebooks/products/bonds/FINBOND_CalculatingTheAssetSwapSpread.ipynb`

Calculating asset swap spreads for bonds relative to LIBOR.

## `KUC-151`
**Source**: `notebooks/products/bonds/FINBOND_ComparisonWithQLExample.ipynb`

Validating bond pricing implementation against QuantLib reference.

## `KUC-152`
**Source**: `notebooks/products/bonds/FINBOND_DiscountingBondCashflowsFinDiscountCurve.ipynb`

Calculating bond prices by discounting cash flows using a flat discount curve.

## `KUC-153`
**Source**: `notebooks/products/bonds/FINBOND_ExampleAppleCorp.ipynb`

Full analysis of Apple corporate bond including yield, duration, convexity, and accrued interest.

## `KUC-154`
**Source**: `notebooks/products/bonds/FINBOND_ExampleUSTreasury_CUSIP_91282CFX4.ipynb`

Valuing US Treasury bonds with proper conventions and calculating yields.

## `KUC-155`
**Source**: `notebooks/products/bonds/FINBOND_Key_Rate_Durations_Example.ipynb`

Calculating key rate durations for bond portfolio yield curve sensitivity analysis.

## `KUC-156`
**Source**: `notebooks/products/credit/FINCDSBASKET_ValuationModelComparison.ipynb`

Comparing different valuation models for CDS baskets and basket default swaps.

## `KUC-157`
**Source**: `notebooks/products/credit/FINCDSCURVE_BuildingASurvivalCurve.ipynb`

Building credit survival curves from CDS term structures for credit derivative pricing.

## `KUC-158`
**Source**: `notebooks/products/credit/FINCDSINDEXOPTION_CompareValuationApproaches.ipynb`

Comparing different approaches for valuing CDS index options.

## `KUC-159`
**Source**: `notebooks/products/credit/FINCDSINDEX_ValuingCDSIndex.ipynb`

Valuing credit default swap indices (CDX, iTraxx) and calculating par spreads.

## `KUC-160`
**Source**: `notebooks/products/credit/FINCDSOPTION_ValuingCDSOption.ipynb`

Valuing options on credit default swaps including sensitivity analysis.

## `KUC-161`
**Source**: `notebooks/products/credit/FINCDSTRANCHE_CalculatingFairSpread.ipynb`

Calculating fair spreads for CDS index tranches with different attachment/detachment points.

## `KUC-162`
**Source**: `notebooks/products/credit/FINCDS_ComparisonWithMarkitCDSModel.ipynb`

Validating CDS valuation against Markit CDS model reference implementation.

## `KUC-163`
**Source**: `notebooks/products/credit/FINCDS_CreatingAndValuingACDS.ipynb`

Creating and valuing credit default swaps including par spread and PV calculations.

## `KUC-164`
**Source**: `notebooks/products/credit/FINCDS_CreatingAndValuingACDSFlatCurves.ipynb`

CDS valuation using simplified flat discount and survival curves.

## `KUC-165`
**Source**: `notebooks/products/credit/FINCDS_ForwardAndBackward.ipynb`

Understanding CDS cash flow generation using forward vs backward date generation rules.

## `KUC-166`
**Source**: `notebooks/products/equity/EQUITY_AMERICANOPTION_BARONE_ADESI_WHALEY_APPROX.ipynb`

Pricing American options using Barone-Adesi Whaley approximation method.

## `KUC-167`
**Source**: `notebooks/products/equity/EQUITY_AMERICANOPTION_BJERKSUND_STENSLAND_APPROX.ipynb`

Pricing American options using Bjerksund-Stensland approximation for call-put parity.

## `KUC-168`
**Source**: `notebooks/products/equity/EQUITY_AMERICANOPTION_ComparisonWithQLExample.ipynb`

Validating American option pricing against QuantLib reference implementation.

## `KUC-169`
**Source**: `notebooks/products/equity/EQUITY_ASIAN_OPTIONS.ipynb`

Pricing Asian (average rate) options using geometric and arithmetic averaging methods.

## `KUC-170`
**Source**: `notebooks/products/equity/EQUITY_BARRIER_OPTIONS.ipynb`

Pricing barrier options (up-and-out, down-and-in, etc.) with Greeks calculation.

## `KUC-171`
**Source**: `notebooks/products/equity/EQUITY_BASKET_OPTIONS.ipynb`

Pricing basket options on multiple underlying assets using moment matching.

## `KUC-172`
**Source**: `notebooks/products/equity/EQUITY_CHOOSER_OPTION.ipynb`

Pricing chooser options that allow selection of call or put at a future date.

## `KUC-173`
**Source**: `notebooks/products/equity/EQUITY_CLIQUET_OPTION.ipynb`

Pricing cliquet (reset) options with periodic coupon-like payoffs based on performance.

## `KUC-174`
**Source**: `notebooks/products/equity/EQUITY_COMPOUND_OPTION_CompareWithML.ipynb`

Pricing compound options (option on option) and comparing with machine learning approaches.

## `KUC-175`
**Source**: `notebooks/products/equity/EQUITY_DIGITALOPTION_BasicValuation.ipynb`

Pricing digital options with asset-or-nothing payoff and calculating Greeks.

## `KUC-176`
**Source**: `notebooks/products/equity/EQUITY_DIGITAL_CASH_OR_NOTHING_OPTION.ipynb`

Pricing cash-or-nothing digital options with fixed payoff upon condition.

## `KUC-177`
**Source**: `notebooks/products/equity/EQUITY_FIXED_LOOKBACK_OPTION.ipynb`

Pricing fixed strike lookback options using Monte Carlo simulation.

## `KUC-178`
**Source**: `notebooks/products/equity/EQUITY_FLOAT_LOOKBACK_OPTION.ipynb`

Pricing floating strike lookback options where strike is determined by extreme price.

## `KUC-179`
**Source**: `notebooks/products/equity/EQUITY_ONE_TOUCH_OPTION.ipynb`

Pricing one-touch (digital) options that pay upon touching a barrier level.

## `KUC-180`
**Source**: `notebooks/products/equity/EQUITY_RAINBOW_OPTION.ipynb`

Pricing rainbow options on multiple assets with various payoff structures.

## `KUC-181`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_AMERICAN_STYLE_OPTION.ipynb`

Pricing American vanilla options using LSMC and finite difference methods.

## `KUC-182`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_MONTE_CARLO_SOBOL.ipynb`

European option pricing using Sobol quasi-random sequences for Monte Carlo.

## `KUC-183`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_MONTE_CARLO_TIMINGS.ipynb`

Performance benchmarking of Monte Carlo implementations with different libraries.

## `KUC-184`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_OPTION HIGH VOL LIMIT.ipynb`

Analyzing European option behavior in high volatility limiting cases.

## `KUC-185`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_OPTION.ipynb`

European option pricing with full Greeks calculation (delta, gamma, theta, vega, rho).

## `KUC-186`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_OPTION_VECTORISATION.ipynb`

Vectorized European option pricing for multiple strikes, expiries, or option types simultaneously.

## `KUC-187`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_OPTION_IntradayValuationAndGreeks.ipynb`

Intraday option pricing with hourly Greeks updates for trading desks.

## `KUC-188`
**Source**: `notebooks/products/equity/EQUITY_VARIANCESWAP_Basic_Example.ipynb`

Pricing variance swaps and calculating fair strike using realized volatility from option surface.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DERIVATIVES-PRICING-001` — Strict input validation before financial calculations
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing

Both FinancePy and QuantLib-SWIG enforce strict validation of all input parameters before any financial computation. FinancePy validates day count types, date arguments, tolerance parameters, and max iterations. QuantLib-SWIG validates exercise types and swap direction enums. This pattern prevents corrupted calculations and provides clear error messages. Apply this pattern by validating all inputs at function entry points.

## `CW-DERIVATIVES-PRICING-002` — Bootstrap requires ordered instrument calibration
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing

Both FinancePy and QuantLib-SWIG require calibration instruments to be provided in strict maturity order for curve bootstrapping. FinancePy enforces monotonically increasing time points and validates instrument sequencing (deposits before FRAs before swaps). QuantLib-SWIG uses bootstrap helpers (DepositRateHelper, FraRateHelper, SwapRateHelper) that assume ordered inputs. This ensures the bootstrap algorithm solves for discount factors at mathematically correct time points.

## `CW-DERIVATIVES-PRICING-003` — Handle pattern for lazy evaluation chains
**From**: QuantLib-SWIG · **Applicable to**: derivatives-pricing

QuantLib-SWIG requires wrapping market data (quotes, term structures) in Handle objects to enable lazy evaluation and automatic recalculation. QuoteHandle for market quotes and Handle for term structures enable the observer pattern. When market data updates, all dependent instruments automatically recalculate. This pattern is essential for live pricing systems where prices must reflect current market conditions.

## `CW-DERIVATIVES-PRICING-004` — Parameter composition requires fixed ordering and partitioning
**From**: arch · **Applicable to**: derivatives-pricing

arch enforces a strict parameter composition pattern where mean, volatility, and distribution parameters must be concatenated in fixed order with explicit offset partitioning. The offsets array partitions the unified parameter vector into components. This pattern prevents parameter assignment errors that would corrupt model components. Apply this when composing financial models from multiple sub-components.
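
The offset-based partitioning described above can be sketched in a few lines. This is an illustration only, not arch's internal code: the function name `partition_params` and the example parameter counts are assumptions made for the sketch.

```python
from itertools import accumulate


def partition_params(params, counts):
    """Split one flat parameter vector into consecutive components.

    `counts` gives the number of parameters per component in a fixed
    order, e.g. (mean, volatility, distribution). Hypothetical helper.
    """
    if sum(counts) != len(params):
        raise ValueError(
            f"expected {sum(counts)} parameters, got {len(params)}"
        )
    # offsets[i]:offsets[i + 1] is the slice owned by component i
    offsets = [0, *accumulate(counts)]
    return [
        list(params[offsets[i]:offsets[i + 1]])
        for i in range(len(counts))
    ]


# Example: 1 mean parameter, 3 volatility parameters, 1 distribution parameter
mean_p, vol_p, dist_p = partition_params([0.1, 0.01, 0.05, 0.9, 8.0], [1, 3, 1])
```

Checking the total length before slicing is what catches a component that contributed too few or too many parameters.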

## `CW-DERIVATIVES-PRICING-005` — Strict mathematical constraint enforcement
**From**: arch, py_vollib · **Applicable to**: derivatives-pricing

Both arch and py_vollib enforce strict mathematical constraints: arch enforces volatility model stationarity constraints (A.dot(params) - b >= 0) for SLSQP optimization; py_vollib validates implied volatility is positive and option prices within intrinsic/maximum bounds. Violating these constraints produces mathematically invalid results. Always enforce domain constraints on all financial model parameters.

## `CW-DERIVATIVES-PRICING-006` — Forward price adjustment for dividend yield in BSM
**From**: py_vollib · **Applicable to**: derivatives-pricing

py_vollib demonstrates the correct BSM implementation: compute forward price F = S * exp((r-q)*t) to adjust for continuous dividend yield before passing to the pricing engine. This pattern is essential for all options on dividend-paying assets. Forgetting the dividend adjustment causes systematic mispricing for the entire equity derivatives book.
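
A minimal sketch of the forward-price adjustment, using only the standard library. The function names (`forward_price`, `black_price_on_forward`) are illustrative assumptions and are not py_vollib's API; the pricing formula is the standard discounted Black formula on the forward.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def forward_price(spot: float, rate: float, div_yield: float, t: float) -> float:
    """F = S * exp((r - q) * t): forward adjusted for a continuous dividend yield."""
    return spot * math.exp((rate - div_yield) * t)


def black_price_on_forward(flag: str, forward: float, strike: float,
                           rate: float, sigma: float, t: float) -> float:
    """Discounted Black price of a European call ('c') or put ('p') on the forward."""
    if sigma <= 0 or t <= 0:
        raise ValueError("sigma and t must be positive")
    d1 = (math.log(forward / strike) + 0.5 * sigma ** 2 * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    discount = math.exp(-rate * t)
    if flag == "c":
        return discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))
    if flag == "p":
        return discount * (strike * norm_cdf(-d2) - forward * norm_cdf(-d1))
    raise ValueError("flag must be 'c' or 'p'")


# Usage: the dividend yield q enters only through the forward price
F = forward_price(spot=100.0, rate=0.05, div_yield=0.02, t=1.0)
call = black_price_on_forward("c", F, strike=100.0, rate=0.05, sigma=0.2, t=1.0)
```

Setting `div_yield=0.0` gives a higher forward and therefore a higher call price, which is the mispricing the paragraph above warns about.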

## `CW-DERIVATIVES-PRICING-007` — Monotonicity validation for interpolation arrays
**From**: FinancePy · **Applicable to**: derivatives-pricing

FinancePy enforces strictly monotonically increasing time arrays before interpolation operations. This prevents undefined behavior at crossing times and ensures each time point maps to exactly one discount factor. Apply this validation whenever implementing interpolation over financial time series (discount curves, volatility surfaces, forward rates).
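
The check can be sketched as a guard that runs before any interpolation. The helper names and the choice of linear interpolation on discount factors are assumptions for illustration; they are not FinancePy's implementation.

```python
from bisect import bisect_right


def check_strictly_increasing(times):
    """Raise ValueError unless every time is greater than the one before it."""
    for prev, cur in zip(times, times[1:]):
        if cur <= prev:
            raise ValueError(f"times not strictly increasing: {prev} then {cur}")


def interpolate_df(times, dfs, t):
    """Linearly interpolate a discount factor at time t (illustrative scheme)."""
    if len(times) != len(dfs):
        raise ValueError("times and dfs must have the same length")
    check_strictly_increasing(times)
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the curve's time range")
    i = bisect_right(times, t)
    if i == len(times):  # t equals the last time point
        return dfs[-1]
    lo, hi = i - 1, i
    weight = (t - times[lo]) / (times[hi] - times[lo])
    return dfs[lo] + weight * (dfs[hi] - dfs[lo])


df_mid = interpolate_df([0.0, 1.0, 2.0], [1.0, 0.95, 0.90], 1.5)
```

Because the times are strictly increasing, the denominator `times[hi] - times[lo]` can never be zero.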

## `CW-DERIVATIVES-PRICING-008` — Production vs reference implementation selection
**From**: py_vollib · **Applicable to**: derivatives-pricing

py_vollib explicitly distinguishes between ref_python (slow, educational) and production (fast, C-based lets_be_rational) implementations. Using the reference implementation in production causes 10-100x performance degradation. Always select the implementation tier that matches the use case: reference for testing and education, optimized for production trading systems.

FILE:references/components/bond_products.md
# bond_products (6 classes)

## `Bond.dirty_price_from_discount_curve`
`bond_products/bond-dirty-price-from-discount-curve.py:0`

## `Bond.yield_to_maturity`
`bond_products/bond-yield-to-maturity.py:0`

## `BondCallable.value`
`bond_products/bondcallable-value.py:0`

## `BondFRN.discount_margin`
`bond_products/bondfrn-discount-margin.py:0`

## `ytm_convention`
`bond_products/ytm-convention.py:0`

## `yield_basis`
`bond_products/yield-basis.py:0`

FILE:references/components/credit_products.md
# credit_products (6 classes)

## `CDS.value`
`credit_products/cds-value.py:0`

## `CDS.par_spread`
`credit_products/cds-par-spread.py:0`

## `CDSCurve.build`
`credit_products/cdscurve-build.py:0`

## `CDSTranche.value`
`credit_products/cdstranche-value.py:0`

## `pv01_method`
`credit_products/pv01-method.py:0`

## `prot_method`
`credit_products/prot-method.py:0`

FILE:references/components/equity_-_fx_options.md
# equity_&_fx_options (6 classes)

## `EquityVanillaOption.value`
`equity_&_fx_options/equityvanillaoption-value.py:0`

## `EquityVanillaOption.delta`
`equity_&_fx_options/equityvanillaoption-delta.py:0`

## `EquityAmericanOption.value`
`equity_&_fx_options/equityamericanoption-value.py:0`

## `FXVanillaOption.value`
`equity_&_fx_options/fxvanillaoption-value.py:0`

## `pricing_model`
`equity_&_fx_options/pricing-model.py:0`

## `mc_method`
`equity_&_fx_options/mc-method.py:0`

FILE:references/components/interest_rate_products.md
# interest_rate_products (6 classes)

## `IborSwap.value`
`interest_rate_products/iborswap-value.py:0`

## `IborSwap.set_fixed_rate_to_atm`
`interest_rate_products/iborswap-set-fixed-rate-to-atm.py:0`

## `IborSwaption.value`
`interest_rate_products/iborswaption-value.py:0`

## `IborCapFloor.value`
`interest_rate_products/iborcapfloor-value.py:0`

## `swaption_model`
`interest_rate_products/swaption-model.py:0`

## `swap_rate_interpolation`
`interest_rate_products/swap-rate-interpolation.py:0`

FILE:references/components/market_curves.md
# market_curves (6 classes)

## `DiscountCurve.df`
`market_curves/discountcurve-df.py:0`

## `DiscountCurve.fwd`
`market_curves/discountcurve-fwd.py:0`

## `IborSingleCurve.build`
`market_curves/iborsinglecurve-build.py:0`

## `IborDualCurve.build`
`market_curves/ibordualcurve-build.py:0`

## `bootstrap_method`
`market_curves/bootstrap-method.py:0`

## `interpolator_type`
`market_curves/interpolator-type.py:0`

FILE:references/components/market_volatility.md
# market_volatility (5 classes)

## `FXVolSurface.volatility`
`market_volatility/fxvolsurface-volatility.py:0`

## `FXVolSurfacePlus.calibrate`
`market_volatility/fxvolsurfaceplus-calibrate.py:0`

## `SwaptionVolSurface.value`
`market_volatility/swaptionvolsurface-value.py:0`

## `vol_function_type`
`market_volatility/vol-function-type.py:0`

## `atm_method`
`market_volatility/atm-method.py:0`

FILE:references/components/pricing_models.md
# pricing_models (6 classes)

## `Model.price`
`pricing_models/model-price.py:0`

## `BlackScholes.price`
`pricing_models/blackscholes-price.py:0`

## `SABR.black_vol`
`pricing_models/sabr-black-vol.py:0`

## `Heston.value_lewis`
`pricing_models/heston-value-lewis.py:0`

## `model_implementation`
`pricing_models/model-implementation.py:0`

## `process_type`
`pricing_models/process-type.py:0`

FILE:references/components/utilities.md
# utilities (4 classes)

## `Date.add_days`
`utilities/date-add-days.py:0`

## `Schedule.generate`
`utilities/schedule-generate.py:0`

## `interpolation_scheme`
`utilities/interpolation-scheme.py:0`

## `day_count_convention`
`utilities/day-count-convention.py:0`

Finance Kg Embedding

Skill

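Trains dynamic knowledge graph embedding models that learn temporal entity and relation representations, supporting link prediction and time prediction tasks.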

---
name: finance-kg-embedding
description: |-
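  Trains dynamic knowledge graph embedding models that learn temporal entity and relation representations, supporting link prediction and time prediction tasks.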
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-080"
  compiled_at: "2026-04-22T13:00:31.071227+00:00"
  capability_markets: "global"
  capability_activities: "macro-data"
  sop_version: "crystal-compilation-v6.1"
---
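# Finance Knowledge Graph Embedding (finance-kg-embedding)

> Trains dynamic knowledge graph embedding models that learn temporal entity and relation representations, supporting link prediction and time prediction tasks.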

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (5 total)

### KGTransformer Model Training Pipeline (`UC-101`)
Training a knowledge graph-based transformer model for temporal/dynamic knowledge graph embedding tasks to learn entity and relation representations over time.
**Triggers**: training, knowledge graph, KGTransformer

### Dynamic Knowledge Graph Model Training (`UC-102`)
Training dynamic knowledge graph models to learn temporal entity and relation embeddings for link prediction and event time prediction tasks
**Triggers**: knowledge graph, dynamic graph, temporal modeling

### Early Stopping Training Utility (`UC-103`)
Preventing overfitting during model training by automatically stopping training when validation performance stops improving, with checkpoint management.
**Triggers**: early stopping, overfitting prevention, model training

For all **5** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-080. Evidence verify ratio = 19.0% and audit fail total = 15. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Full V6+ authoritative spec (source of truth) | Required reading when behavior or a decision is in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-080` blueprint at 2026-04-22T13:00:31.071227+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Early Stopping Training Utility
  - Dynamic Knowledge Graph Model Training
  - KGTransformer Model Training Pipeline
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-074--FinRobot (1)

### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>

SEC API calls made without a rate-limiting decorator can exceed SEC EDGAR's limit of 10 requests per second. Exceeding it gets the client IP blocked, which cuts off all subsequent access to critical financial filings and halts the data collection pipeline. FinRobot demonstrates that the SEC enforces this limit strictly, and that a missing User-Agent header compounds the problem by causing silent request failures.
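
A rate-limiting decorator of the kind referred to here could look like the sketch below. It is a simplified, single-threaded illustration, not FinRobot's decorator: the name `rate_limited`, the injectable `clock` and `sleep` parameters, and the placeholder `fetch_filing` function are all assumptions.

```python
import functools
import time


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Space out calls so that at most `max_per_second` are made per second.

    `clock` and `sleep` are injectable so the behavior can be tested
    without real waiting. Not thread-safe.
    """
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [None]  # time of the previous call, None before the first

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = min_interval - (clock() - last_call[0])
                if wait > 0:
                    sleep(wait)
            last_call[0] = clock()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited(max_per_second=10)
def fetch_filing(url):
    """Placeholder for a real HTTP request (hypothetical)."""
    return url
```

A real client would also send a descriptive User-Agent header with each request, as the paragraph above notes.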

## finance-bp-077--Open_Source_Economic_Model (2)

### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>

When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
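
The convergence point rule quoted above is small enough to state directly in code. This sketch assumes `U` denotes the last liquid point in years; the function name is illustrative.

```python
def convergence_point(last_liquid_point_years: float) -> float:
    """Convergence point in years: max(U + 40, 60), with U the last liquid point."""
    return max(last_liquid_point_years + 40, 60)


# A 20-year last liquid point is floored at 60; a 30-year one gives 70
cp_20 = convergence_point(20)
cp_30 = convergence_point(30)
```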

### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>

Importing CSV portfolio files that begin with a UTF-8 BOM without using the 'utf-8-sig' encoding leaves the BOM attached to the first column name read by pandas. Lookups on that field then raise KeyError, so the portfolio data fails to load entirely. The corruption is silent: the header looks correct when printed.
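
The effect is easy to reproduce with the standard library alone. The sketch below uses `csv.DictReader` rather than pandas, and the sample bytes and column names are made up for illustration.

```python
import csv
import io

# A small CSV payload that starts with a UTF-8 byte order mark (BOM)
RAW = "\ufeffticker,weight\nAAA,0.6\nBBB,0.4\n".encode("utf-8")


def read_rows(raw: bytes, encoding: str):
    """Decode raw CSV bytes with the given encoding and return dict rows."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.DictReader(text))


plain = read_rows(RAW, "utf-8")      # BOM stays glued to the first header
clean = read_rows(RAW, "utf-8-sig")  # BOM is stripped during decoding

first_header_plain = list(plain[0].keys())[0]
first_header_clean = list(clean[0].keys())[0]
```

With plain `utf-8`, `plain[0]["ticker"]` raises `KeyError` because the actual key is `"\ufeffticker"`.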

## finance-bp-080--FinDKG (3)

### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>

Splitting a temporal knowledge graph with non-temporal train/val/test splits lets the model see future events during training. When train_edges do not all precede val_edges and test_edges in time, the resulting metrics are inflated and do not reflect real-world performance. This produces overfit models that fail catastrophically when deployed for actual temporal prediction tasks.
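
A time-ordered split can be sketched as follows. The edge layout `(head, relation, tail, timestamp)` and the cutoff parameters are assumptions for illustration; this is not FinDKG's loader.

```python
def temporal_split(edges, val_start, test_start):
    """Split edges into train/val/test by timestamp cutoffs.

    Each edge is a tuple (head, relation, tail, timestamp). Every train
    edge is strictly earlier than every val edge, and every val edge is
    strictly earlier than every test edge.
    """
    if not val_start < test_start:
        raise ValueError("val_start must be earlier than test_start")
    ordered = sorted(edges, key=lambda e: e[3])
    train = [e for e in ordered if e[3] < val_start]
    val = [e for e in ordered if val_start <= e[3] < test_start]
    test = [e for e in ordered if e[3] >= test_start]
    return train, val, test


edges = [
    ("a", "owns", "b", 5), ("b", "sues", "c", 1), ("c", "buys", "a", 9),
    ("a", "hires", "d", 3), ("d", "joins", "b", 7),
]
train, val, test = temporal_split(edges, val_start=5, test_start=8)
```

Splitting on fixed cutoffs rather than on random indices is what keeps future events out of the training set.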

### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>

When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.

### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>

When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.

## finance-bp-083--Economic-Dashboard (3)

### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>

Detecting an SMA crossover (golden/death cross) without shift(1), which compares the current bar's state with the prior bar's state, introduces look-ahead bias because detection relies on current-bar data alone. Signals appear to fire on the same bar as the cross, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' lets future information leak into current signals.
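
The prior-bar comparison can be sketched without pandas. In the code below, indexing `above[i - 1]` plays the role of `shift(1)`; the function names, the short window lengths, and the next-bar entry helper are assumptions for illustration.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = [None] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            out[i] = running / window
    return out


def golden_crosses(closes, fast=2, slow=3):
    """Bars where the fast SMA moves from not-above to above the slow SMA."""
    f, s = sma(closes, fast), sma(closes, slow)
    above = [None if a is None or b is None else a > b for a, b in zip(f, s)]
    # Compare each bar's state with the prior bar's state (the shift(1) step)
    return [i for i in range(1, len(above))
            if above[i - 1] is False and above[i] is True]


def next_bar_entries(cross_bars, n_bars):
    """Act on each cross at the following bar, never on the cross bar itself."""
    return [i + 1 for i in cross_bars if i + 1 < n_bars]


closes = [5, 4, 3, 2, 3, 5, 7]
crosses = golden_crosses(closes)
entries = next_bar_entries(crosses, len(closes))
```

Separating detection from execution makes the one-bar delay explicit instead of leaving it implicit in the indexing.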

### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>

Calculating technical indicators from OHLCV data without first verifying the required columns (open, high, low, close, volume) raises ValueError for any ticker with a missing column. This blocks downstream regime classification and pattern detection for every ticker with incomplete data.
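
A column check that runs before any indicator is computed might look like this sketch. The helper names are hypothetical, and the case-insensitive matching is a design choice made here, not something the source specifies.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV column names absent from `columns`."""
    present = {str(c).lower() for c in columns}
    return [c for c in REQUIRED_OHLCV if c not in present]


def require_ohlcv(columns, ticker):
    """Raise a descriptive ValueError naming the ticker and missing columns."""
    missing = missing_ohlcv_columns(columns)
    if missing:
        raise ValueError(f"{ticker}: missing OHLCV columns {missing}")


require_ohlcv(["Open", "High", "Low", "Close", "Volume"], "AAA")  # passes
gaps = missing_ohlcv_columns(["open", "close"])
```

Reporting the ticker and the exact missing columns makes it possible to skip one incomplete ticker without stopping the whole run.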

### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>

Storing derived features in DuckDB under a primary key schema that differs from the technical_features table prevents JOIN operations between the tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.

## finance-bp-105--open-climate-investing (5)

### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>

When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.

### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>

When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2) by 100 before regression produces coefficients that are off by a factor of 100. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
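
As a small sketch, the unit conversion can be applied once at import time. The helper name, the row layout, and the column names in the example are assumptions for illustration.

```python
def percent_to_decimal(rows, fields):
    """Return copies of `rows` with the named percent-valued fields divided by 100."""
    return [
        {key: (value / 100.0 if key in fields else value)
         for key, value in row.items()}
        for row in rows
    ]


raw_rows = [{"date": "2020-01", "Mkt-RF": 5.2, "RF": 0.13}]
decimal_rows = percent_to_decimal(raw_rows, fields={"Mkt-RF", "RF"})
```

Converting the risk-free column in the same pass keeps the factors and the rate on one scale before any excess return is computed.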

### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>

When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.

### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>

When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.

### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>

Running parallel regressions with the fork start method while psycopg2 database connections are open makes child processes inherit the parent's connection state. The result is 'connection already closed' errors or corrupted connections in the children, which leads to failed database writes and incomplete factor regression calculations.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-080--FinDKG
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 43, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [data_loading_&_temporal_graph_construction](components/data_loading_-_temporal_graph_construction.md): 4 classes
- [data_collocation](components/data_collocation.md): 2 classes
- [dynamic_embedding_updater](components/dynamic_embedding_updater.md): 8 classes
- [graph_neural_network_convolution](components/graph_neural_network_convolution.md): 4 classes
- [static-dynamic_embedding_combination](components/static-dynamic_embedding_combination.md): 5 classes
- [temporal_link_prediction](components/temporal_link_prediction.md): 3 classes
- [inter-event_time_prediction_(tpp)](components/inter-event_time_prediction_-tpp.md): 5 classes
- [training_pipeline](components/training_pipeline.md): 6 classes
- [evaluation_&_metrics](components/evaluation_-_metrics.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 143
  fatal_constraints_count: 70
  non_fatal_constraints_count: 164
  use_cases_count: 5
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **5**

## `KUC-101`
**Source**: `train_DKG_run.py`

Training a knowledge graph-based transformer model for temporal/dynamic knowledge graph embedding tasks to learn entity and relation representations over time.

## `KUC-102`
**Source**: `DKG/train.py`

Training dynamic knowledge graph models to learn temporal entity and relation embeddings for link prediction and event time prediction tasks.

## `KUC-103`
**Source**: `DKG/utils/train_utils.py`

Preventing overfitting during model training by automatically stopping training when validation performance stops improving, with checkpoint management.

## `KUC-104`
**Source**: `DKG/eval.py`

Evaluating trained knowledge graph models on link prediction and time prediction tasks to measure model performance using various metrics.

## `KUC-105`
**Source**: `DKG/utils/eval_utils.py`

Computing standard ranking metrics (MRR, recall) and regression metrics (MAE, MSE, RMSE) for evaluating machine learning model performance.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.

## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.

## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.

## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.

## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data

When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.

## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.

## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.

## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.

FILE:references/components/data_collocation.md
# data_collocation (2 classes)

## `collate_fn`
`data_collocation/collate-fn.py:0`

## `batch_time_window`
`data_collocation/batch-time-window.py:0`

FILE:references/components/data_loading_-_temporal_graph_construction.md
# data_loading_&_temporal_graph_construction (4 classes)

## `load_temporal_knowledge_graph`
`data_loading_&_temporal_graph_construction/load-temporal-knowledge-graph.py:0`

## `load_data_table`
`data_loading_&_temporal_graph_construction/load-data-table.py:0`

## `get_edge_mask`
`data_loading_&_temporal_graph_construction/get-edge-mask.py:0`

## `data_format`
`data_loading_&_temporal_graph_construction/data-format.py:0`

FILE:references/components/dynamic_embedding_updater.md
# dynamic_embedding_updater (8 classes)

## `EmbeddingUpdater.forward`
`dynamic_embedding_updater/embeddingupdater-forward.py:0`

## `GraphStructuralRNNConv.forward`
`dynamic_embedding_updater/graphstructuralrnnconv-forward.py:0`

## `GraphTemporalRNNConv.forward`
`dynamic_embedding_updater/graphtemporalrnnconv-forward.py:0`

## `RelationRNN.update`
`dynamic_embedding_updater/relationrnn-update.py:0`

## `EventTimeHelper.compute_inter_event_times`
`dynamic_embedding_updater/eventtimehelper-compute-inter-event-time.py:0`

## `gnn_architecture`
`dynamic_embedding_updater/gnn-architecture.py:0`

## `rnn_cell_type`
`dynamic_embedding_updater/rnn-cell-type.py:0`

## `inter_event_time_mode`
`dynamic_embedding_updater/inter-event-time-mode.py:0`

FILE:references/components/evaluation_-_metrics.md
# evaluation_&_metrics (6 classes)

## `evaluate`
`evaluation_&_metrics/evaluate.py:0`

## `eval_link_prediction`
`evaluation_&_metrics/eval-link-prediction.py:0`

## `EdgeEvaluator.evaluate_edges`
`evaluation_&_metrics/edgeevaluator-evaluate-edges.py:0`

## `RankingMetric.compute`
`evaluation_&_metrics/rankingmetric-compute.py:0`

## `RegressionMetric.compute`
`evaluation_&_metrics/regressionmetric-compute.py:0`

## `evaluation_mode`
`evaluation_&_metrics/evaluation-mode.py:0`

FILE:references/components/graph_neural_network_convolution.md
# graph_neural_network_convolution (4 classes)

## `RGCN.forward`
`graph_neural_network_convolution/rgcn-forward.py:0`

## `KGTransformer.forward`
`graph_neural_network_convolution/kgtransformer-forward.py:0`

## `GraphTransformer.layer`
`graph_neural_network_convolution/graphtransformer-layer.py:0`

## `gnn_layer`
`graph_neural_network_convolution/gnn-layer.py:0`

FILE:references/components/inter-event_time_prediction_-tpp.md
# inter-event_time_prediction_(tpp) (5 classes)

## `InterEventTimeModel.forward`
`inter-event_time_prediction_(tpp)/intereventtimemodel-forward.py:0`

## `LogNormMixTPP.forward`
`inter-event_time_prediction_(tpp)/lognormmixtpp-forward.py:0`

## `LogNormalMixtureDistribution.sample/log_prob`
`inter-event_time_prediction_(tpp)/lognormalmixturedistribution-sample-log-.py:0`

## `tpp_distribution_family`
`inter-event_time_prediction_(tpp)/tpp-distribution-family.py:0`

## `inter_event_time_mode`
`inter-event_time_prediction_(tpp)/inter-event-time-mode.py:0`

FILE:references/components/static-dynamic_embedding_combination.md
# static-dynamic_embedding_combination (5 classes)

## `Combiner.forward`
`static-dynamic_embedding_combination/combiner-forward.py:0`

## `StaticDynamicCombiner.combine`
`static-dynamic_embedding_combination/staticdynamiccombiner-combine.py:0`

## `GraphReadout.readout`
`static-dynamic_embedding_combination/graphreadout-readout.py:0`

## `combination_mode`
`static-dynamic_embedding_combination/combination-mode.py:0`

## `graph_readout_operation`
`static-dynamic_embedding_combination/graph-readout-operation.py:0`

FILE:references/components/temporal_link_prediction.md
# temporal_link_prediction (3 classes)

## `EdgeModel.forward`
`temporal_link_prediction/edgemodel-forward.py:0`

## `StaticEdgeModel.forward`
`temporal_link_prediction/staticedgemodel-forward.py:0`

## `edge_model_type`
`temporal_link_prediction/edge-model-type.py:0`

FILE:references/components/training_pipeline.md
# training_pipeline (6 classes)

## `main`
`training_pipeline/main.py:0`

## `compute_loss`
`training_pipeline/compute-loss.py:0`

## `forward_graphs`
`training_pipeline/forward-graphs.py:0`

## `EarlyStopping.check`
`training_pipeline/earlystopping-check.py:0`

## `optimize_target`
`training_pipeline/optimize-target.py:0`

## `early_stop_criterion`
`training_pipeline/early-stop-criterion.py:0`

Fava Beancount Viewer

Skill

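Provides portfolio management capabilities built on Fava/Beancount, supporting tax-loss harvesting optimization, asset allocation analysis, and identification of equivalent-security groups, to help users plan an optimal selling strategy.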

---
name: fava-beancount-viewer
description: |-
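  Provides portfolio management capabilities built on Fava/Beancount, supporting tax-loss harvesting optimization, asset allocation analysis, and identification of equivalent-security groups, to help users plan an optimal selling strategy.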
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-078"
  compiled_at: "2026-04-22T13:00:29.702985+00:00"
  capability_markets: "global"
  capability_activities: "accounting"
  sop_version: "crystal-compilation-v6.1"
---
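# Fava Ledger Viewer (fava-beancount-viewer)

> Provides portfolio management capabilities built on Fava/Beancount, supporting tax-loss harvesting optimization, asset allocation analysis, and identification of equivalent-security groups, to help users plan an optimal selling strategy.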

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (5 total)

### Portfolio Management CLI Entry Point (`UC-101`)
Provides a unified command-line interface for portfolio management operations including tax loss harvesting, asset allocation analysis, cash drag detection, and tax gain minimization.
**Triggers**: portfolio management, CLI, command line

### Tax-Optimized Selling Strategy (`UC-103`)
Determines optimal sell order for securities to minimize realized capital gains by analyzing cost basis and holding periods across multiple lots
**Triggers**: minimize gains, tax-efficient selling, capital gains optimization

### Tax Loss Harvesting Opportunity Detection (`UC-105`)
Identifies securities with unrealized losses that can be sold to harvest tax losses, typically looking back 30 days to find positions eligible for wash sale rule exceptions.
**Triggers**: tax loss harvesting, loss identification, wash sale

For all **5** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (15 total)

- **`AP-ACCOUNTING-001`**: Using floating-point arithmetic for monetary amounts
- **`AP-ACCOUNTING-002`**: Skipping initialization calls before VM/script execution
- **`AP-ACCOUNTING-003`**: Mixing different asset types in monetary operations

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-078. Evidence verify ratio = 21.6% and audit fail total = 14. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative data (source-of-truth) | Required reading when a behavior or decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project wisdom highlights | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-078` blueprint at 2026-04-22T13:00:29.702985+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Tax-Optimized Selling Strategy
  - Related Ticker Grouping Utility
  - Portfolio Management CLI Entry Point
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-073--ledger (7)

### `AP-ACCOUNTING-002` — Skipping initialization calls before VM/script execution <sub>(high)</sub>

Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking financial operations entirely.

### `AP-ACCOUNTING-003` — Mixing different asset types in monetary operations <sub>(high)</sub>

Performing addition, subtraction, or take operations on amounts with different asset types produces invalid financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot be combined, leading to corrupted account balances and failed reconciliations.
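
One way to make this mistake impossible is to let the amount type reject mismatched assets. The sketch below is in Python for illustration; `Money` is a hypothetical class, not the ledger project's own type, which the source does not show.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Illustrative amount type that carries its asset alongside the number."""
    amount: Decimal
    asset: str

    def __add__(self, other: "Money") -> "Money":
        # Refuse to combine amounts denominated in different assets.
        if self.asset != other.asset:
            raise ValueError(f"cannot add {other.asset} to {self.asset}")
        return Money(self.amount + other.amount, self.asset)


usd_total = Money(Decimal("10.00"), "USD") + Money(Decimal("2.50"), "USD")

try:
    Money(Decimal("10.00"), "USD") + Money(Decimal("5.00"), "EUR")
except ValueError as exc:
    mismatch_error = str(exc)
```

Failing at the operation itself, rather than at reconciliation time, keeps a mixed-asset sum from ever reaching an account balance.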

### `AP-ACCOUNTING-004` — Missing insufficient funds validation <sub>(high)</sub>

Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues in global markets.

### `AP-ACCOUNTING-005` — Non-atomic transaction commit/rollback <sub>(high)</sub>

Processing database operations without atomic commit/rollback leaves partial state when failures occur. This corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable for global regulatory compliance.

### `AP-ACCOUNTING-006` — On-demand posting generation causing double-spending <sub>(high)</sub>

Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics and can result in significant financial losses.

### `AP-ACCOUNTING-007` — Log insertion after transaction commit breaking event sourcing <sub>(high)</sub>

Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for global financial compliance.

### `AP-ACCOUNTING-008` — Incomplete transaction log hash chaining <sub>(high)</sub>

Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
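
The chaining idea can be sketched as follows. SHA-256, the JSON serialization, and the function names are assumptions made for illustration; the source does not specify the ledger's hash function or log format.

```python
import hashlib
import json


def chain_hash(previous_hash: str, payload: dict) -> str:
    """Hash a log entry together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("utf-8") + body).hexdigest()


def build_chain(payloads):
    """Return the list of chained hashes for a sequence of log payloads."""
    hashes, previous = [], ""
    for payload in payloads:
        previous = chain_hash(previous, payload)
        hashes.append(previous)
    return hashes


logs = [
    {"id": 1, "amount": "10.00"},
    {"id": 2, "amount": "5.00"},
    {"id": 3, "amount": "7.50"},
]
original = build_chain(logs)

# Tampering with the first entry changes every later hash, so it is detectable.
tampered = build_chain([{"id": 1, "amount": "99.00"}] + logs[1:])
```

If each hash covered only its own payload, the second and third hashes would be unchanged after the edit and the tampering would go unnoticed past the first entry.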

## finance-bp-073--ledger, finance-bp-129--beancount (1)

### `AP-ACCOUNTING-001` — Using floating-point arithmetic for monetary amounts <sub>(high)</sub>

Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential financial losses. This violates the fundamental requirement that monetary calculations must be exact.
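
A short Python illustration of the precision loss, and of the exact alternative that `decimal.Decimal` offers:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
float_matches = float_sum == 0.3

# Decimal values built from strings keep the digits exactly as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_sum == Decimal("0.3")

print(float_matches, decimal_matches)
```

Note that the `Decimal` values are constructed from strings; building them from floats would carry the float's rounding error into the decimal value.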

## finance-bp-078--fava_investor (4)

### `AP-ACCOUNTING-009` — Incorrect row data access patterns on query results <sub>(high)</sub>

Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError, because namedtuples support attribute and integer-index access but not string keys. This breaks all module queries expecting attribute-style access, causing asset allocation, tax loss harvesting, and other critical financial computations to fail.
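
The failure is easy to reproduce with a plain `collections.namedtuple`. The `Row` shape and its column names below are hypothetical; real query results define their own columns.

```python
from collections import namedtuple

# Hypothetical row shape standing in for a query result row.
Row = namedtuple("Row", ["account", "market_value"])
row = Row(account="Assets:Brokerage", market_value="1200.00")

value = row.market_value      # attribute access works
same_value = row[1]           # integer index access also works

try:
    row["market_value"]       # string-key access is the anti-pattern
except TypeError as exc:
    error_message = str(exc)

as_dict = row._asdict()       # convert explicitly if key access is needed
```

When key-based access is genuinely required, converting with `_asdict()` at the boundary is safer than mixing the two styles through the module.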

### `AP-ACCOUNTING-010` — Missing bidirectional inference for fund relationship declarations <sub>(medium)</sub>

When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax savings for investors.

### `AP-ACCOUNTING-011` — Wash sale comparison within substantially identical groups <sub>(high)</sub>

Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings. This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax losses and offset capital gains.

### `AP-ACCOUNTING-012` — Missing substantially identical tickers in wash sale queries <sub>(high)</sub>

Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales of the original position.
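
A rough sketch of building the comparison set before checking the window. Everything here is illustrative: the function name, the data layout, and the look-back-only window are assumptions, and `BND` is added only as an unrelated ticker. It is not the `fava_investor` query.

```python
from datetime import date, timedelta

WASH_WINDOW = timedelta(days=30)


def conflicting_purchases(ticker, sale_date, purchases, identical_groups):
    """Return purchases in the look-back window for ticker or its equivalents."""
    comparison_set = {ticker}
    for group in identical_groups:
        if ticker in group:
            comparison_set |= set(group)  # include every substantially identical fund
    return [
        (bought, when)
        for bought, when in purchases
        if bought in comparison_set
        and timedelta(0) <= sale_date - when <= WASH_WINDOW
    ]


groups = [{"VTI", "VTSAX", "VTSMX"}]
purchases = [
    ("VTSAX", date(2024, 3, 1)),   # equivalent fund, 14 days before the sale
    ("VTI", date(2024, 1, 2)),     # same ticker, outside the window
    ("BND", date(2024, 3, 10)),    # unrelated fund
]
hits = conflicting_purchases("VTI", date(2024, 3, 15), purchases, groups)
missed = conflicting_purchases("VTI", date(2024, 3, 15), purchases, [])
```

With the group supplied, the VTSAX purchase is flagged; with the group omitted, the same purchase slips through, which is the failure described above.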

## finance-bp-129--beancount (3)

### `AP-ACCOUNTING-013` — Using parsed entries with MISSING sentinel values for calculations <sub>(high)</sub>

Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures, compromising financial reporting accuracy.

### `AP-ACCOUNTING-014` — Underspecified interpolation with multiple missing values per currency <sub>(high)</sub>

Having more than one missing value per currency group creates an underdetermined system with no unique solution during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected accounts.

### `AP-ACCOUNTING-015` — Violating accounting identity in opening balance transactions <sub>(high)</sub>

Creating opening balance transactions where the total balance of summarized entries does not equal exactly zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be fundamentally incorrect with non-zero total assets and liabilities.
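
A small check for this condition might look like the following. It assumes a single currency and hypothetical account names, and it is not Beancount's own summarization code.

```python
from decimal import Decimal


def opening_balance_residual(postings):
    """Sum signed posting amounts; a valid opening entry leaves exactly zero."""
    return sum((amount for _, amount in postings), Decimal("0"))


# Hypothetical postings: assets are positive, equity is the negative offset.
balanced = [
    ("Assets:Brokerage", Decimal("1500.00")),
    ("Assets:Cash", Decimal("250.25")),
    ("Equity:Opening-Balances", Decimal("-1750.25")),
]
unbalanced = balanced[:2] + [("Equity:Opening-Balances", Decimal("-1750.00"))]

balanced_residual = opening_balance_residual(balanced)      # Decimal("0.00")
unbalanced_residual = opening_balance_residual(unbalanced)  # Decimal("0.25")
```

Because the amounts are `Decimal`, the comparison against zero is exact and needs no tolerance.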

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-078--fava_investor
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 35, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [api_abstraction_layer](components/api_abstraction_layer.md): 5 classes
- [ticker_relationship_analyzer](components/ticker_relationship_analyzer.md): 4 classes
- [asset_allocation_by_class](components/asset_allocation_by_class.md): 4 classes
- [asset_allocation_by_account](components/asset_allocation_by_account.md): 5 classes
- [cash_drag_detector](components/cash_drag_detector.md): 3 classes
- [tax_loss_harvester](components/tax_loss_harvester.md): 7 classes
- [gains_minimizer](components/gains_minimizer.md): 3 classes
- [metadata_summarizer](components/metadata_summarizer.md): 4 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 128
  fatal_constraints_count: 54
  non_fatal_constraints_count: 168
  use_cases_count: 5
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
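
Two of these locks lend themselves to simple pre-flight checks. The sketch below covers SL-03 and SL-05 only; the regular expression is one assumed reading of `entity_type_exchange_code`, and both helper names are hypothetical rather than part of ZVT.

```python
import re

# Assumed reading of SL-03: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID_PATTERN = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9]+$")


def check_entity_id(entity_id: str) -> bool:
    """SL-03: entity IDs follow entity_type_exchange_code."""
    return ENTITY_ID_PATTERN.match(entity_id) is not None


def check_signal_sizing(position_pct=None, order_money=None, order_amount=None) -> bool:
    """SL-05: exactly one of the three sizing fields may be set."""
    provided = [
        value
        for value in (position_pct, order_money, order_amount)
        if value is not None
    ]
    return len(provided) == 1


id_ok = check_entity_id("stock_sh_600000")
id_bad = check_entity_id("SZ1000")
sizing_ok = check_signal_sizing(position_pct=0.2)
sizing_bad = check_signal_sizing(position_pct=0.2, order_money=1000)
```

Since a violation of either lock is fatal, running checks like these before generating trading code lets the host halt with a specific message instead of failing mid-backtest.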

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **5**

## `KUC-101`
**Source**: `fava_investor/cli/investor.py`

Provides a unified command-line interface for portfolio management operations including tax loss harvesting, asset allocation analysis, cash drag detection, and tax gain minimization.

## `KUC-102`
**Source**: `fava_investor/util/test_relatetickers.py`

Identifies and groups equivalent or substitutable securities (e.g., VTI, VTSAX, VTSMX) based on metadata annotations to support tax lot management and wash sale detection.

## `KUC-103`
**Source**: `fava_investor/modules/minimizegains/test_minimizegains.py`

Determines optimal sell order for securities to minimize realized capital gains by analyzing cost basis and holding periods across multiple lots.

## `KUC-104`
**Source**: `fava_investor/modules/assetalloc_class/test_asset_allocation.py`

Calculates and reports portfolio allocation breakdown by asset type (stocks, bonds, cash, etc.) with percentage distributions from investment account holdings.

## `KUC-105`
**Source**: `fava_investor/modules/tlh/test_libtlh.py`

Identifies securities with unrealized losses that can be sold to harvest tax losses, typically looking back 30 days to find positions eligible for wash sale rule exceptions.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-ACCOUNTING-001` — Use exact-precision integer types for monetary representation
**From**: finance-bp-073--ledger, finance-bp-129--beancount · **Applicable to**: accounting

Both the Numscript ledger and the Beancount parser mandate exact-precision types, Decimal (beancount) or MonetaryInt based on big.Int (ledger), instead of floating-point. This pattern ensures no rounding errors accumulate in financial calculations, which is critical for audit compliance in global markets.

## `CW-ACCOUNTING-002` — Mandatory initialization sequence before execution
**From**: finance-bp-073--ledger · **Applicable to**: accounting

The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful state setup—always verify prerequisites before running financial logic.

## `CW-ACCOUNTING-003` — Dual idempotency key strategy
**From**: finance-bp-073--ledger · **Applicable to**: accounting

Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly succeed. Single-key approaches leave gaps in financial transaction safety.
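
The interplay of the two values can be sketched with an in-memory store. `IdempotencyStore`, `IdempotencyConflict`, and the SHA-256-over-JSON hash are hypothetical stand-ins written in Python; the ledger project's actual implementation is not shown in the source.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Same key reused with different input parameters."""


class IdempotencyStore:
    def __init__(self):
        self._seen = {}  # idempotency key -> (input hash, stored result)

    @staticmethod
    def _hash(params):
        encoded = json.dumps(params, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def execute(self, key, params, operation):
        digest = self._hash(params)
        if key in self._seen:
            stored_digest, result = self._seen[key]
            if stored_digest != digest:
                raise IdempotencyConflict(f"key {key!r} reused with different input")
            return result  # exact retry: replay the stored result
        result = operation(params)
        self._seen[key] = (digest, result)
        return result


store = IdempotencyStore()
calls = []


def transfer(params):
    calls.append(params)
    return f"moved {params['amount']}"


first = store.execute("req-1", {"amount": "10.00"}, transfer)
retry = store.execute("req-1", {"amount": "10.00"}, transfer)  # replayed, not re-run

try:
    store.execute("req-1", {"amount": "99.00"}, transfer)      # same key, new input
except IdempotencyConflict:
    conflict_detected = True
```

With the key alone, the third call would silently return the result of the first transfer even though the caller asked for a different amount.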

## `CW-ACCOUNTING-004` — Log-before-commit event sourcing pattern
**From**: finance-bp-073--ledger · **Applicable to**: accounting

In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical for regulatory compliance in global accounting.

## `CW-ACCOUNTING-005` — Read Committed isolation with FOR UPDATE locks
**From**: finance-bp-073--ledger · **Applicable to**: accounting

When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks. This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail due to insufficient funds), ensuring data integrity under concurrent load.

## `CW-ACCOUNTING-006` — Transitive closure for equivalence relationships
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting

When building commodity groups or substantially identical fund relationships, apply transitive closure to infer complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection and TLH calculations are complete and accurate across all declared relationships.
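
A union-find structure is one common way to compute this closure. The sketch below is an illustration and not the `RelateTickers` implementation; the VXUS/VTIAX pair is added only as sample data for a second group.

```python
def build_groups(pairs):
    """Group tickers into equivalence classes from declared (a, b) pairs."""
    parent = {}

    def find(ticker):
        parent.setdefault(ticker, ticker)
        while parent[ticker] != ticker:
            parent[ticker] = parent[parent[ticker]]  # path halving
            ticker = parent[ticker]
        return ticker

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union; declaration direction does not matter

    groups = {}
    for ticker in list(parent):
        groups.setdefault(find(ticker), set()).add(ticker)
    return sorted(sorted(group) for group in groups.values())


declared = [("VTI", "VTSAX"), ("VTSMX", "VTSAX"), ("VXUS", "VTIAX")]
groups = build_groups(declared)
# groups == [["VTI", "VTSAX", "VTSMX"], ["VTIAX", "VXUS"]]
```

VTI and VTSMX are never declared as a pair, yet they land in the same group through VTSAX. Because union ignores which side of the pair a ticker appears on, the same structure also covers the bidirectional inference case.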

## `CW-ACCOUNTING-007` — Canonical representative selection for relationship groups
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting

When selecting a representative for a substantially identical fund group, always return the same representative ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where the same ticker gets different partners depending on which group member is queried.
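
A deterministic rule is all that is needed. The sketch below picks the alphabetical minimum, which is an assumption chosen for simplicity; the rule actually used by `RelateTickers.representative` may differ.

```python
def representative(ticker, groups):
    """Return one fixed representative for whichever group contains ticker."""
    for group in groups:
        if ticker in group:
            return min(group)  # assumption: alphabetical minimum is the canonical pick
    return ticker  # a ticker with no declared relatives represents itself


groups = [{"VTI", "VTSAX", "VTSMX"}]
picks = {
    ticker: representative(ticker, groups)
    for ticker in ["VTI", "VTSAX", "VTSMX", "BND"]
}
# picks == {"VTI": "VTI", "VTSAX": "VTI", "VTSMX": "VTI", "BND": "BND"}
```

Any rule works as long as it depends only on the group's membership and not on which member was queried or on set iteration order.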

## `CW-ACCOUNTING-008` — Immutable monetary objects with __slots__
**From**: finance-bp-129--beancount · **Applicable to**: accounting

Constructing Amount or Position objects using immutable Decimal values with __slots__ = () pattern prevents accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent throughout transaction processing and audit trails.
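
The pattern can be sketched with a namedtuple base and an empty `__slots__` on the subclass. The class below is illustrative and simplified; it is not a copy of Beancount's definitions.

```python
from collections import namedtuple
from decimal import Decimal

_Amount = namedtuple("_Amount", ["number", "currency"])


class Amount(_Amount):
    """Illustrative immutable amount; empty __slots__ means no per-instance __dict__."""
    __slots__ = ()

    def __str__(self):
        return f"{self.number} {self.currency}"


price = Amount(Decimal("101.25"), "USD")

try:
    price.number = Decimal("0")   # namedtuple fields are read-only
except AttributeError:
    mutation_blocked = True

try:
    price.note = "adjusted"       # no __dict__, so new attributes are rejected too
except AttributeError:
    new_attribute_blocked = True
```

Without `__slots__ = ()`, the subclass would gain a `__dict__` and the second assignment would succeed, quietly attaching mutable state to a value that is meant to be fixed.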

## `CW-ACCOUNTING-009` — Eliminate all MISSING values before presenting parsed data as complete
**From**: finance-bp-129--beancount · **Applicable to**: accounting

Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance calculations or realized/unrealized gains computation.

## `CW-ACCOUNTING-010` — Strict schema compatibility across class hierarchies
**From**: finance-bp-078--fava_investor, finance-bp-129--beancount · **Applicable to**: accounting

When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared for the base class, breaking wash sale detection and TLH recommendations.

FILE:references/components/api_abstraction_layer.md
# api_abstraction_layer (5 classes)

## `FavaInvestorAPI.query_func`
`api_abstraction_layer/favainvestorapi-query-func.py:0`

## `AccAPI.build_price_map`
`api_abstraction_layer/accapi-build-price-map.py:0`

## `AccAPI.realize`
`api_abstraction_layer/accapi-realize.py:0`

## `AccAPI.get_operating_currencies`
`api_abstraction_layer/accapi-get-operating-currencies.py:0`

## `api_implementation`
`api_abstraction_layer/api-implementation.py:0`

FILE:references/components/asset_allocation_by_account.md
# asset_allocation_by_account (5 classes)

## `portfolio_accounts`
`asset_allocation_by_account/portfolio-accounts.py:0`

## `by_account_name`
`asset_allocation_by_account/by-account-name.py:0`

## `by_account_open_metadata`
`asset_allocation_by_account/by-account-open-metadata.py:0`

## `asset_allocation`
`asset_allocation_by_account/asset-allocation.py:0`

## `selection_strategy`
`asset_allocation_by_account/selection-strategy.py:0`

FILE:references/components/asset_allocation_by_class.md
# asset_allocation_by_class (4 classes)

## `treeify`
`asset_allocation_by_class/treeify.py:0`

## `bucketize`
`asset_allocation_by_class/bucketize.py:0`

## `AssetClassNode.serialise`
`asset_allocation_by_class/assetclassnode-serialise.py:0`

## `bucketize_strategy`
`asset_allocation_by_class/bucketize-strategy.py:0`

FILE:references/components/cash_drag_detector.md
# cash_drag_detector (3 classes)

## `find_loose_cash`
`cash_drag_detector/find-loose-cash.py:0`

## `find_cash_commodities`
`cash_drag_detector/find-cash-commodities.py:0`

## `cash_definition`
`cash_drag_detector/cash-definition.py:0`

FILE:references/components/gains_minimizer.md
# gains_minimizer (3 classes)

## `find_minimized_gains`
`gains_minimizer/find-minimized-gains.py:0`

## `find_tax_burden`
`gains_minimizer/find-tax-burden.py:0`

## `lot_selection_algorithm`
`gains_minimizer/lot-selection-algorithm.py:0`

FILE:references/components/metadata_summarizer.md
# metadata_summarizer (4 classes)

## `build_tables`
`metadata_summarizer/build-tables.py:0`

## `active_accounts_metadata`
`metadata_summarizer/active-accounts-metadata.py:0`

## `commodities_metadata`
`metadata_summarizer/commodities-metadata.py:0`

## `directive_type`
`metadata_summarizer/directive-type.py:0`

FILE:references/components/tax_loss_harvester.md
# tax_loss_harvester (7 classes)

## `find_harvestable_lots`
`tax_loss_harvester/find-harvestable-lots.py:0`

## `gain_term`
`tax_loss_harvester/gain-term.py:0`

## `query_recently_bought`
`tax_loss_harvester/query-recently-bought.py:0`

## `recently_sold_at_loss`
`tax_loss_harvester/recently-sold-at-loss.py:0`

## `harvestable_by_commodity`
`tax_loss_harvester/harvestable-by-commodity.py:0`

## `wash_window`
`tax_loss_harvester/wash-window.py:0`

## `loss_threshold`
`tax_loss_harvester/loss-threshold.py:0`

FILE:references/components/ticker_relationship_analyzer.md
# ticker_relationship_analyzer (4 classes)

## `RelateTickers.substidenticals`
`ticker_relationship_analyzer/relatetickers-substidenticals.py:0`

## `RelateTickers.representative`
`ticker_relationship_analyzer/relatetickers-representative.py:0`

## `RelateTickers.compute_tlh_groups`
`ticker_relationship_analyzer/relatetickers-compute-tlh-groups.py:0`

## `relationship_source`
`ticker_relationship_analyzer/relationship-source.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-078-v5.3
  version: v6.1
  blueprint_id: finance-bp-078
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:29.702985+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - accounting
  upgraded_from: finance-bp-078-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:17.397484+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-078--fava_investor/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-078--fava_investor/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-ACCOUNTING-001
  title: Using floating-point arithmetic for monetary amounts
  description: Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic
    operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential
    financial losses. This violates the fundamental requirement that monetary calculations must be exact.
  project_source: finance-bp-073--ledger, finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-002
  title: Skipping initialization calls before VM/script execution
  description: Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized
    or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking
    financial operations entirely.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-003
  title: Mixing different asset types in monetary operations
  description: Performing addition, subtraction, or take operations on amounts with different asset types produces invalid
    financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot
    be combined, leading to corrupted account balances and failed reconciliations.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-004
  title: Missing insufficient funds validation
  description: Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond
    permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues
    in global markets.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-005
  title: Non-atomic transaction commit/rollback
  description: Processing database operations without atomic commit/rollback leaves partial state when failures occur. This
    corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable
    for global regulatory compliance.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-006
  title: On-demand posting generation causing double-spending
  description: Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent
    funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics
    and can result in significant financial losses.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-007
  title: Log insertion after transaction commit breaking event sourcing
  description: Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to
    accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for
    global financial compliance.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-008
  title: Incomplete transaction log hash chaining
  description: Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows
    undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
  project_source: finance-bp-073--ledger
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-009
  title: Incorrect row data access patterns on query results
  description: Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError because namedtuples
    support attribute and positional access, not string keys. This breaks all module queries expecting attribute-style access,
    causing asset allocation, tax loss harvesting, and other critical financial computations to fail.
  project_source: finance-bp-078--fava_investor
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-010
  title: Missing bidirectional inference for fund relationship declarations
  description: When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads
    to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax
    savings for investors.
  project_source: finance-bp-078--fava_investor
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-011
  title: Wash sale comparison within substantially identical groups
  description: Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings.
    This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax
    losses and offset capital gains.
  project_source: finance-bp-078--fava_investor
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-012
  title: Missing substantially identical tickers in wash sale queries
  description: Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar
    funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales
    of the original position.
  project_source: finance-bp-078--fava_investor
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-013
  title: Using parsed entries with MISSING sentinel values for calculations
  description: Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes
    runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures,
    compromising financial reporting accuracy.
  project_source: finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-014
  title: Underspecified interpolation with multiple missing values per currency
  description: Having more than one missing value per currency group creates an underdetermined system with no unique solution
    during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected
    accounts.
  project_source: finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-015
  title: Violating accounting identity in opening balance transactions
  description: Creating opening balance transactions where the total balance of summarized entries does not equal exactly
    zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be
    fundamentally incorrect with non-zero total assets and liabilities.
  project_source: finance-bp-129--beancount
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - accounting
  _source_file: anti-patterns/accounting.yaml
cross_project_wisdom:
- wisdom_id: CW-ACCOUNTING-001
  source_project: finance-bp-073--ledger, finance-bp-129--beancount
  pattern_name: Use exact-precision integer types for monetary representation
  description: Both the Numscript ledger and the Beancount parser mandate exact-precision types, Decimal (beancount) or
    MonetaryInt based on big.Int (ledger), instead of floating-point. This pattern ensures no rounding errors accumulate
    in financial calculations, which is critical for audit compliance in global markets.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-002
  source_project: finance-bp-073--ledger
  pattern_name: Mandatory initialization sequence before execution
  description: 'The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must
    both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful
    state setup—always verify prerequisites before running financial logic.'
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-003
  source_project: finance-bp-073--ledger
  pattern_name: Dual idempotency key strategy
  description: 'Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey
    prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly
    succeed. Single-key approaches leave gaps in financial transaction safety.'
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-004
  source_project: finance-bp-073--ledger
  pattern_name: Log-before-commit event sourcing pattern
  description: In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain
    event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical
    for regulatory compliance in global accounting.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-005
  source_project: finance-bp-073--ledger
  pattern_name: Read Committed isolation with FOR UPDATE locks
  description: When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks.
    This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail
    due to insufficient funds), ensuring data integrity under concurrent load.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-006
  source_project: finance-bp-078--fava_investor
  pattern_name: Transitive closure for equivalence relationships
  description: When building commodity groups or substantially identical fund relationships, apply transitive closure to infer
    complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection
    and TLH calculations are complete and accurate across all declared relationships.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-007
  source_project: finance-bp-078--fava_investor
  pattern_name: Canonical representative selection for relationship groups
  description: When selecting a representative for a substantially identical fund group, always return the same representative
    ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where
    the same ticker gets different partners depending on which group member is queried.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-008
  source_project: finance-bp-129--beancount
  pattern_name: Immutable monetary objects with __slots__
  description: Constructing Amount or Position objects using immutable Decimal values with __slots__ = () pattern prevents
    accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent
    throughout transaction processing and audit trails.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-009
  source_project: finance-bp-129--beancount
  pattern_name: Eliminate all MISSING values before presenting parsed data as complete
  description: Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All
    MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance
    calculations or realized/unrealized gains computation.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-010
  source_project: finance-bp-078--fava_investor, finance-bp-129--beancount
  pattern_name: Strict schema compatibility across class hierarchies
  description: When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain
    compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared
    for the base class, breaking wash sale detection and TLH recommendations.
  applicable_to_activity: accounting
  _source_file: cross-project-wisdom/accounting.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: fava_investor/cli/investor.py
  business_problem: Provides a unified command-line interface for portfolio management operations including tax loss harvesting,
    asset allocation analysis, cash drag detection, and tax gain minimization.
  intent_keywords:
  - portfolio management
  - CLI
  - command line
  - tax optimization
  - investment analysis
  stage: data_collection
  data_domain: holding_data
  type: live_trading
- kuc_id: KUC-102
  source_file: fava_investor/util/test_relatetickers.py
  business_problem: Identifies and groups equivalent or substitutable securities (e.g., VTI, VTSAX, VTSMX) based on metadata
    annotations to support tax lot management and wash sale detection.
  intent_keywords:
  - equivalent tickers
  - related securities
  - commodity grouping
  - ticker equivalence
  - substitutable assets
  stage: data_collection
  data_domain: holding_data
  type: data_pipeline
- kuc_id: KUC-103
  source_file: fava_investor/modules/minimizegains/test_minimizegains.py
  business_problem: Determines optimal sell order for securities to minimize realized capital gains by analyzing cost basis
    and holding periods across multiple lots.
  intent_keywords:
  - minimize gains
  - tax-efficient selling
  - capital gains optimization
  - lot selection
  - cost basis optimization
  stage: factor_computation
  data_domain: holding_data
  type: live_trading
- kuc_id: KUC-104
  source_file: fava_investor/modules/assetalloc_class/test_asset_allocation.py
  business_problem: Calculates and reports portfolio allocation breakdown by asset type (stocks, bonds, cash, etc.) with percentage
    distributions from investment account holdings.
  intent_keywords:
  - asset allocation
  - portfolio breakdown
  - asset class distribution
  - allocation report
  - portfolio composition
  stage: factor_computation
  data_domain: holding_data
  type: reporting
- kuc_id: KUC-105
  source_file: fava_investor/modules/tlh/test_libtlh.py
  business_problem: Identifies securities with unrealized losses that can be sold to harvest tax losses, typically looking
    back 30 days to find positions eligible for wash sale rule exceptions.
  intent_keywords:
  - tax loss harvesting
  - loss identification
  - wash sale
  - TLH opportunities
  - tax loss selling
  stage: factor_computation
  data_domain: holding_data
  type: live_trading
component_capability_map:
  project: finance-bp-078--fava_investor
  scan_date: '2026-04-22'
  stats:
    total_files: 8
    total_classes: 35
    total_functions: 0
    total_stages: 8
  modules:
    api_abstraction_layer:
      class_count: 5
      stage_id: api_abstraction
      stage_order: 1
      responsibility: 'Provides unified interface for accessing Beancount ledger data from both Fava web UI and standalone
        CLI. WHY: Enables code reuse while accommodating different runtime contexts - web plugin vs command-line tool share
        the same data access logic.'
      classes:
      - name: FavaInvestorAPI.query_func
        file: api_abstraction_layer/favainvestorapi-query-func.py
        line: 0
        kind: required_method
        signature: ''
      - name: AccAPI.build_price_map
        file: api_abstraction_layer/accapi-build-price-map.py
        line: 0
        kind: required_method
        signature: ''
      - name: AccAPI.realize
        file: api_abstraction_layer/accapi-realize.py
        line: 0
        kind: required_method
        signature: ''
      - name: AccAPI.get_operating_currencies
        file: api_abstraction_layer/accapi-get-operating-currencies.py
        line: 0
        kind: required_method
        signature: ''
      - name: api_implementation
        file: api_abstraction_layer/api-implementation.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    ticker_relationship_analyzer:
      class_count: 4
      stage_id: ticker_relationships
      stage_order: 2
      responsibility: 'Infers relationships between investment tickers from incomplete metadata declarations. WHY: Tax loss
        harvesting requires knowing which funds are ''substantially identical'' (trigger wash sales) vs ''substantially different''
        (safe for TLH swaps), and this information should be declarable once and inferr'
      classes:
      - name: RelateTickers.substidenticals
        file: ticker_relationship_analyzer/relatetickers-substidenticals.py
        line: 0
        kind: required_method
        signature: ''
      - name: RelateTickers.representative
        file: ticker_relationship_analyzer/relatetickers-representative.py
        line: 0
        kind: required_method
        signature: ''
      - name: RelateTickers.compute_tlh_groups
        file: ticker_relationship_analyzer/relatetickers-compute-tlh-groups.py
        line: 0
        kind: required_method
        signature: ''
      - name: relationship_source
        file: ticker_relationship_analyzer/relationship-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    asset_allocation_by_class:
      class_count: 4
      stage_id: asset_allocation_by_class
      stage_order: 3
      responsibility: 'Computes portfolio allocation percentages based on commodity metadata classifications (asset_allocation_*).
        WHY: Enables investors to see if their portfolio matches target allocations without manual spreadsheet work, visualizing
        how their actual holdings compare to target allocations.'
      classes:
      - name: treeify
        file: asset_allocation_by_class/treeify.py
        line: 0
        kind: required_method
        signature: ''
      - name: bucketize
        file: asset_allocation_by_class/bucketize.py
        line: 0
        kind: required_method
        signature: ''
      - name: AssetClassNode.serialise
        file: asset_allocation_by_class/assetclassnode-serialise.py
        line: 0
        kind: required_method
        signature: ''
      - name: bucketize_strategy
        file: asset_allocation_by_class/bucketize-strategy.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    asset_allocation_by_account:
      class_count: 5
      stage_id: asset_allocation_by_account
      stage_order: 4
      responsibility: 'Groups account balances into portfolios based on regex patterns or metadata. WHY: Lets investors define
        custom portfolio groupings without changing their account naming conventions, enabling flexible portfolio organization
        independent of chart of accounts structure.'
      classes:
      - name: portfolio_accounts
        file: asset_allocation_by_account/portfolio-accounts.py
        line: 0
        kind: required_method
        signature: ''
      - name: by_account_name
        file: asset_allocation_by_account/by-account-name.py
        line: 0
        kind: required_method
        signature: ''
      - name: by_account_open_metadata
        file: asset_allocation_by_account/by-account-open-metadata.py
        line: 0
        kind: required_method
        signature: ''
      - name: asset_allocation
        file: asset_allocation_by_account/asset-allocation.py
        line: 0
        kind: required_method
        signature: ''
      - name: selection_strategy
        file: asset_allocation_by_account/selection-strategy.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    cash_drag_detector:
      class_count: 3
      stage_id: cash_drag
      stage_order: 5
      responsibility: 'Identifies uninvested cash sitting in brokerage accounts that could be deployed for investment. WHY:
        Idle cash loses purchasing power to inflation over time; detecting it enables investors to take action and rebalance
        their portfolios efficiently.'
      classes:
      - name: find_loose_cash
        file: cash_drag_detector/find-loose-cash.py
        line: 0
        kind: required_method
        signature: ''
      - name: find_cash_commodities
        file: cash_drag_detector/find-cash-commodities.py
        line: 0
        kind: required_method
        signature: ''
      - name: cash_definition
        file: cash_drag_detector/cash-definition.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    tax_loss_harvester:
      class_count: 7
      stage_id: tax_loss_harvesting
      stage_order: 6
      responsibility: 'Finds investment lots with unrealized losses suitable for tax loss harvesting (TLH). WHY: TLH turns
        paper losses into actual tax deductions, reducing current tax burden under the US SpecID method. Allows investors to
        systematically identify harvest opportunities.'
      classes:
      - name: find_harvestable_lots
        file: tax_loss_harvester/find-harvestable-lots.py
        line: 0
        kind: required_method
        signature: ''
      - name: gain_term
        file: tax_loss_harvester/gain-term.py
        line: 0
        kind: required_method
        signature: ''
      - name: query_recently_bought
        file: tax_loss_harvester/query-recently-bought.py
        line: 0
        kind: required_method
        signature: ''
      - name: recently_sold_at_loss
        file: tax_loss_harvester/recently-sold-at-loss.py
        line: 0
        kind: required_method
        signature: ''
      - name: harvestable_by_commodity
        file: tax_loss_harvester/harvestable-by-commodity.py
        line: 0
        kind: required_method
        signature: ''
      - name: wash_window
        file: tax_loss_harvester/wash-window.py
        line: 0
        kind: replaceable_point
      - name: loss_threshold
        file: tax_loss_harvester/loss-threshold.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    gains_minimizer:
      class_count: 3
      stage_id: minimize_gains
      stage_order: 7
      responsibility: 'Determines optimal lot selection to minimize capital gains when selling. WHY: Under the US SpecID method,
        choosing which lots to sell directly impacts tax liability. This module helps investors minimize their tax burden
        by prioritizing highest-loss lots.'
      classes:
      - name: find_minimized_gains
        file: gains_minimizer/find-minimized-gains.py
        line: 0
        kind: required_method
        signature: ''
      - name: find_tax_burden
        file: gains_minimizer/find-tax-burden.py
        line: 0
        kind: required_method
        signature: ''
      - name: lot_selection_algorithm
        file: gains_minimizer/lot-selection-algorithm.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    metadata_summarizer:
      class_count: 4
      stage_id: metadata_summarizer
      stage_order: 8
      responsibility: 'Extracts and displays metadata from account/commodity Open directives as formatted tables. WHY: Allows
        investors to store and view reference info (phone numbers, account numbers, contact details) alongside their ledger,
        making metadata accessible without manual inspection.'
      classes:
      - name: build_tables
        file: metadata_summarizer/build-tables.py
        line: 0
        kind: required_method
        signature: ''
      - name: active_accounts_metadata
        file: metadata_summarizer/active-accounts-metadata.py
        line: 0
        kind: required_method
        signature: ''
      - name: commodities_metadata
        file: metadata_summarizer/commodities-metadata.py
        line: 0
        kind: required_method
        signature: ''
      - name: directive_type
        file: metadata_summarizer/directive-type.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.21551724137931033
    evidence_invalid: 91
    evidence_verified: 25
    evidence_auto_fixed: 0
    audit_coverage: 33/33 (100%)
    audit_pass_rate: 4/33 (12%)
    audit_fail_total: 14
    audit_finance_universal:
      pass: 2
      warn: 11
      fail: 7
    audit_subdomain_totals:
      pass: 2
      warn: 4
      fail: 7
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-078. Evidence verify ratio
    = 21.6% and audit fail total = 14. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-078-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc:
  - UC-103
  - UC-104
  - UC-105
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Portfolio Management CLI Entry Point
    positive_terms:
    - portfolio management
    - CLI
    - command line
    - tax optimization
    - investment analysis
    data_domain: holding_data
    negative_terms:
    - screening
    - trading strategy
    - ML prediction
    - factor computation
    ambiguity_question: Are you looking to use the command-line interface directly, or do you need to integrate one of these
      modules into your own code?
  - uc_id: UC-102
    name: Related Ticker Grouping Utility
    positive_terms:
    - equivalent tickers
    - related securities
    - commodity grouping
    - ticker equivalence
    - substitutable assets
    data_domain: holding_data
    negative_terms:
    - price prediction
    - trading signals
    - factor analysis
    ambiguity_question: Do you need to group equivalent securities for tax purposes, or are you looking for price/volume analysis
      of individual tickers?
  - uc_id: UC-103
    name: Tax-Optimized Selling Strategy
    positive_terms:
    - minimize gains
    - tax-efficient selling
    - capital gains optimization
    - lot selection
    - cost basis optimization
    data_domain: holding_data
    negative_terms:
    - buy signals
    - screening
    - portfolio rebalancing
    - ML prediction
    ambiguity_question: Are you deciding which lots to sell to minimize taxes, or are you looking for new investment opportunities
      to buy?
  - uc_id: UC-104
    name: Asset Allocation Analysis
    positive_terms:
    - asset allocation
    - portfolio breakdown
    - asset class distribution
    - allocation report
    - portfolio composition
    data_domain: holding_data
    negative_terms:
    - tax loss harvesting
    - screening
    - trading signals
    - ML prediction
    ambiguity_question: Do you need a report showing how your portfolio is allocated across asset types, or are you looking
      for specific tax optimization strategies?
  - uc_id: UC-105
    name: Tax Loss Harvesting Opportunity Detection
    positive_terms:
    - tax loss harvesting
    - loss identification
    - wash sale
    - TLH opportunities
    - tax loss selling
    data_domain: holding_data
    negative_terms:
    - buy signals
    - portfolio allocation
    - screening
    - ML prediction
    ambiguity_question: Are you looking for securities to sell at a loss for tax benefits, or do you need to identify securities
      to buy or allocate differently?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
    The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 128
    fatal_constraints_count: 54
    non_fatal_constraints_count: 168
    use_cases_count: 5
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 26 source groups: Common Library(3),
        Scaled NAV(3), api_abstraction(17), asset_allocation_by_account(3), asset_allocation_by_class(10), cachedtickerinfo(1),
        and 20 more.'
      key_decisions: 128 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-056
      type: B
      summary: val() function returns 0 for empty Inventory, None for other cases
    - id: BD-057
      type: B/RC
      summary: Decimal type used for all financial calculations
    - id: BD-058
      type: B
      summary: Table footer sums Inventory columns via reduce with currency conversion
    - id: BD-050
      type: B/DK
      summary: Use last 10 days of price data for MF/ETF ratio calculation
    - id: BD-051
      type: B/BA
      summary: Use median ratio instead of mean to limit the influence of extreme values
    - id: BD-052
      type: B/RC
      summary: Warn but don't fail when ETF prices unavailable for MF estimation
    - id: BD-019
      type: M
      summary: Duck-typed API classes, no abstract base
    - id: BD-020
      type: B
      summary: Fava version compatibility via version.parse
    - id: BD-021
      type: B
      summary: Config extracted from Custom directives
    - id: BD-053
      type: B
      summary: CLI mode returns end_date=None (no date filtering)
    - id: BD-054
      type: B
      summary: Config extracted from fava-extension custom directives in beancount file
    - id: BD-055
      type: B
      summary: Version-specific query_func for Fava 1.22+ vs 1.30+ compatibility
    - id: BD-GAP-001
      type: M
      summary: 'Missing: Explicit convergence criteria'
    - id: BD-GAP-002
      type: M
      summary: 'Missing: Matrix ill-conditioning'
    - id: BD-GAP-003
      type: B
      summary: 'Missing: Return frequency and annualization factor'
    - id: BD-GAP-004
      type: B
      summary: 'Missing: Volatility model family and distribution choice'
    - id: BD-GAP-005
      type: B
      summary: 'Missing: Demeaning and group alignment for factor IC'
    - id: BD-GAP-006
      type: RC
      summary: 'Missing: Implement immutable append-only semantics for all data writes with timestamp + hash chaining'
    - id: BD-GAP-007
      type: RC
      summary: 'Missing: Add timezone-aware datetime handling throughout - prefer UTC normalization for all timestamp
        operations'
    - id: BD-GAP-008
      type: M
      summary: 'Missing: PSD repair strategy for the covariance matrix'
    - id: BD-GAP-009
      type: B
      summary: 'Missing: Covariance estimator selection and shrinkage'
    - id: BD-GAP-010
      type: B
      summary: 'Missing: VaR/CVaR confidence level and window'
    - id: BD-GAP-011
      type: B
      summary: 'Missing: Volatility model family and distribution choice'
    - id: BD-004
      type: B
      summary: Strategy pattern via pattern_type string lookup
    - id: BD-065
      type: B
      summary: Account allocation uses optional include_children flag for balance rollup
    - id: BD-066
      type: B/BA
      summary: Dynamic dispatch to pattern_type function (by_account_name, by_account_open_metadata)
    - id: BD-001
      type: B
      summary: Metadata-driven bucketing via asset_allocation_* prefix
    - id: BD-002
      type: B
      summary: Single base currency stored in root node
    - id: BD-003
      type: BA
      summary: Unallocated amounts fall through to 'unknown' bucket
    - id: BD-031
      type: B/BA
      summary: Tax adjustment enabled by default in asset allocation
    - id: BD-032
      type: B
      summary: Use first operating currency as base currency for all conversions
    - id: BD-033
      type: B
      summary: Bucket unallocated percentages into 'unknown' bucket when metadata < 100%
    - id: BD-034
      type: B/BA
      summary: Skip negative balances (liabilities) in asset allocation
    - id: BD-035
      type: B
      summary: Remove empty accounts and zero-balance ancestor accounts
    - id: BD-036
      type: B/BA
      summary: Use 'asset_allocation_' metadata prefix for bucket definitions
    - id: BD-037
      type: B
      summary: Convert currencies via operating currencies when direct conversion unavailable
    - id: BD-082
      type: B/BA
      summary: 'Expense ratio conversion: ER * 100 for percentage display'
    - id: BD-005
      type: B/DK
      summary: Cash commodities include operating currencies + metadata-tagged ones
    - id: BD-006
      type: BA
      summary: Empty inventory rows filtered after query
    - id: BD-038
      type: B
      summary: Cash commodities detected via asset_allocation_Bond_Cash metadata = 100
    - id: BD-039
      type: B/BA
      summary: Include operating currencies as cash by default
    - id: BD-040
      type: B/BA
      summary: Default accounts pattern '^Assets' for cash drag detection
    - id: BD-097
      type: B/BA
      summary: loss_threshold defaults to 1 in TLH but 0 in example config (tlh.py:96 vs tlh.py:25)
    - id: BD-103
      type: B/BA
      summary: Asset allocation tree root.currency hardcoded to first operating currency with no fallback
    - id: BD-107
      type: B/BA
      summary: 'INTERACTION: BD-092 × BD-103 → Systemic currency failure when operating_currencies is empty or first currency
        is inappropriate'
    - id: BD-108
      type: RC
      summary: 'INTERACTION: BD-007 × BD-022 × BD-027 × BD-073 → Duplicated 30-day wash sale window hardcoding creates maintenance
        hazard and compliance risk'
    - id: BD-109
      type: BA
      summary: 'INTERACTION: BD-102 × BD-023 × BD-008 → minimizegains has hidden dependency on libtlh leap-year handling for
        tax term classification'
    - id: BD-110
      type: RC
      summary: 'INTERACTION: BD-094 × BD-095 → Hardcoded account filter in 3 modules with single source-of-truth creates concentrated
        failure point'
    - id: BD-111
      type: B/BA
      summary: 'INTERACTION: BD-097 (Contradiction) → loss_threshold defaults to 1 in code but 0 in example config creates
        context-dependent harvesting behavior'
    - id: BD-112
      type: BA
      summary: 'INTERACTION: BD-029 × BD-030 × BD-017 → Cross-account wash sale detection depends on substantially identical
        fund grouping accuracy'
    - id: BD-113
      type: BA
      summary: 'INTERACTION: BD-101 × BD-093 → AccAPI imports Fava internals despite being designed as standalone Beancount
        API'
    - id: BD-114
      type: T
      summary: 'INTERACTION: BD-057 × BD-092 → Decimal precision guarantee depends on single-base-currency assumption holding'
    - id: BD-115
      type: BA/DK
      summary: 'INTERACTION: BD-020 × BD-055 → Fava version compatibility logic duplicated across FavaInvestorAPI with different
        version thresholds'
    - id: BD-116
      type: RC
      summary: 'RISK CASCADE: BD-092 → BD-002 → BD-103 → BD-037 → BD-058 → BD-069 → BD-090 → BD-091 → BD-076 → BD-077'
    - id: BD-117
      type: RC
      summary: 'RISK CASCADE: BD-095 → BD-102 → BD-023 → BD-008 → BD-096 → BD-024'
    - id: BD-098
      type: BA
      summary: ScaledNAV extends RelateTickers inheriting build_commodity_groups() for identicals - both use same metadata
    - id: BD-106
      type: BA
      summary: Each Investor method creates NEW FavaInvestorAPI instance instead of reusing
    - id: BD-092
      type: RC
      summary: All modules use operating_currencies[0] as the single base currency for ALL financial calculations
    - id: BD-094
      type: B/RC
      summary: Account filter 'account_sortkey(account) ~ "^[01]"' hardcoded across TLH, minimizegains, and summarizer
    - id: BD-099
      type: RC
      summary: RelateTickers.substidenticals() combines both 'a__equivalents' AND 'a__substidenticals' by default
    - id: BD-101
      type: DK/B
      summary: AccAPI.root_tree() imports fava/core.Tree despite being in beancountinvestorapi.py
    - id: BD-104
      type: T
      summary: All modules use 'a__' prefix for auto-generated metadata vs 'asset_allocation_' for user config
    - id: BD-079
      type: B
      summary: 'Portfolio allocation percentage: (balance / total) * 100 rounded to 1 decimal'
    - id: BD-067
      type: B
      summary: 'Asset allocation percentage calculation: (balance / total) * 100'
    - id: BD-068
      type: B/RC
      summary: Recursive subtree balance computation for hierarchical percentages
    - id: BD-069
      type: B/BA
      summary: 'Tax-adjusted position scaling: position * (tax_adj / 100)'
    - id: BD-088
      type: B/RC
      summary: 'Bucket allocation distribution: amount * (meta_value / 100)'
    - id: BD-080
      type: B/BA
      summary: 'Cash drag threshold filter: position >= min_threshold'
    - id: BD-081
      type: B/RC
      summary: Inventory sum via sequential accumulation into single Inventory
    - id: BD-075
      type: B/BA
      summary: Tax burden interpolation between proceeds bracket boundaries
    - id: BD-076
      type: B/RC
      summary: 'Average tax rate calculation: (cumulative_taxes / cumulative_proceeds) * 100'
    - id: BD-077
      type: B/RC
      summary: 'Marginal tax rate: (Δ_taxes / Δ_proceeds) * 100'
    - id: BD-078
      type: B/RC
      summary: Lot selection ordering by estimated tax percentage (ascending)
    - id: BD-089
      type: B/RC
      summary: Decimal rounding for proceeds (0 decimals), tax rates (1-2 decimals)
    - id: BD-091
      type: B/RC
      summary: 'Estimated tax calculation: gain * tax_rate (per term)'
    - id: BD-086
      type: B
      summary: Table sorting with configurable column and direction
    - id: BD-072
      type: B/BA
      summary: Gain term classification using relativedelta for precise date arithmetic
    - id: BD-073
      type: B/RC
      summary: 30-day wash sale lookback period for recent purchases
    - id: BD-074
      type: B/BA
      summary: 'Loss threshold filtering: losses < -loss_threshold'
    - id: BD-090
      type: B/DK
      summary: Market value to basis difference for loss calculation
    - id: BD-014
      type: B
      summary: Metadata prefix filtering for flexible column selection
    - id: BD-015
      type: BA
      summary: Commodity leaf accounts excluded if parent is open
    - id: BD-059
      type: B
      summary: Commodity_leaf accounts only included if parent account has no Open directive
    - id: BD-060
      type: B/RC
      summary: Option to filter summarizer to active commodities only (has positions)
    - id: BD-061
      type: B
      summary: Empty string used for missing column values in summarizer
    - id: BD-011
      type: B/RC
      summary: Lots sorted by estimated tax percentage (ascending)
    - id: BD-012
      type: BA
      summary: Cumulative columns added after sorting
    - id: BD-013
      type: B/BA
      summary: Short-term and long-term tax rates from config
    - id: BD-041
      type: B/BA
      summary: Default short-term and long-term tax rates of 1%
    - id: BD-042
      type: B/RC
      summary: Sort lots by estimated tax percentage ascending
    - id: BD-043
      type: B/BA
      summary: 'Add cumulative columns: cu_proceeds, cu_taxes, tax_avg, tax_marg'
    - id: BD-044
      type: B/RC
      summary: Estimate tax = gain × tax_rate for each lot
    - id: BD-095
      type: BA
      summary: libtlh.get_account_field(options) is the ONLY source of truth for account field extraction, shared by TLH and
        minimizegains
    - id: BD-096
      type: RC
      summary: 'libtlh.get_tables() pipeline order: find_harvestable_lots → harvestable_by_commodity → summarize_tlh → build_recents'
    - id: BD-102
      type: RC
      summary: minimizegains relies on libtlh.gain_term() for short/long term classification
    - id: BD-093
      type: BA
      summary: 'Dual-API pattern: FavaInvestorAPI (Fava context) vs AccAPI (CLI context) implement identical interfaces'
    - id: BD-100
      type: B/BA
      summary: Node tree pattern with underscore-separated naming (e.g., 'equity_domestic') for asset allocation hierarchy
    - id: BD-105
      type: B/DK
      summary: Wash sale detection uses 30-day lookback hardcoded in SQL DATE_ADD(TODAY(), -30)
    - id: BD-083
      type: B/RC
      summary: Union-Find algorithm for building commodity equivalence groups
    - id: BD-084
      type: B
      summary: 'TLH partner inference using symmetric rule: if A→(B,C) then B→(A,C), C→(A,B)'
    - id: BD-085
      type: B/RC
      summary: Representative ticker selection for identical group
    - id: BD-070
      type: B
      summary: MF NAV estimation using median ratio from historical MF/ETF pairs
    - id: BD-071
      type: B/DK
      summary: NAV scaling ratio based on only most recent 10 historical ratios
    - id: BD-087
      type: B
      summary: 'Price ratio calculation: MF_price / ETF_price across matching dates'
    - id: BD-007
      type: B/DK
      summary: 30-day wash sale window hardcoded in query
    - id: BD-008
      type: B/BA
      summary: relativedelta for gain term to handle leap years
    - id: BD-009
      type: B/BA
      summary: Substantially identical tickers read from commodity metadata
    - id: BD-010
      type: B/RC
      summary: Summary aggregates currency values via Decimal sum
    - id: BD-022
      type: B/DK
      summary: 30-day wash sale window for both recent purchases and recent sales
    - id: BD-023
      type: B/BA
      summary: 'Long-term gain threshold: >1 year using relativedelta accounting for leap years'
    - id: BD-024
      type: B/BA
      summary: Default loss_threshold of 1 dollar
    - id: BD-025
      type: B/RC
      summary: Uses a__substidenticals metadata to identify substantially identical securities
    - id: BD-026
      type: B/BA
      summary: Filter accounts via account_sortkey matching ^[01] pattern
    - id: BD-027
      type: B/DK
      summary: Earliest safe sale date = acquisition_date + 31 days
    - id: BD-028
      type: B/RC
      summary: Sort harvestable table by highest to lowest losses
    - id: BD-029
      type: B/RC
      summary: Separate wash_pattern to distinguish taxable accounts from wash-sale accounts
    - id: BD-030
      type: B/DK
      summary: Deduplicate recent purchases by ticker across substantially identical funds
    - id: BD-016
      type: B
      summary: Graph-based inference of TLH partners
    - id: BD-017
      type: B/DK
      summary: Equivalents vs Substidenticals distinction
    - id: BD-018
      type: BA
      summary: Archived tickers filtered from TLH groups
    - id: BD-045
      type: B/RC
      summary: Separate 'a__equivalents' and 'a__substidenticals' metadata fields
    - id: BD-046
      type: B/RC
      summary: Tickers with 'archive' metadata are excluded from TLH calculations
    - id: BD-047
      type: B/RC
      summary: 'TLH partners are made symmetric: if A→B, then B→A inferred'
    - id: BD-048
      type: B/RC
      summary: Option to filter TLH partners by fund type (ETF vs MUTUALFUND)
    - id: BD-049
      type: B/RC
      summary: Representative ticker chosen from preferred set (idents_preferred)
    - id: BD-062
      type: B
      summary: Yahoo info cache stored as pickle file in BEAN_ROOT directory
    - id: BD-063
      type: B/BA
      summary: Expense ratio converted from decimal to percentage (×100) on cache write
    - id: BD-064
      type: B/DK
      summary: Remove '-' ISIN values from cached ticker info
resources:
  packages:
  - name: beancount >= 2.3.2
    version_pin: latest
  - name: fava >= 1.26
    version_pin: latest
  - name: beanquery
    version_pin: latest
  - name: Click >= 7.0
    version_pin: latest
  - name: click_aliases >= 1.0.1
    version_pin: latest
  - name: tabulate >= 0.8.9
    version_pin: latest
  - name: packaging >= 20.3
    version_pin: latest
  - name: python_dateutil >= 2.8.1
    version_pin: latest
  - name: yfinance >= 0.1.70
    version_pin: latest
  - name: importlib_metadata >= 1.5.0
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install "beancount>=2.3.2"
    - python3 -m pip install "fava>=1.26"
    - python3 -m pip install beanquery
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing AccAPI or FavaInvestorAPI query_func
    action: Convert query results to namedtuple format with field names from query types
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Module code expecting attribute access via row.column_name will fail with AttributeError, breaking asset
      allocation, tax loss harvesting, and other financial computations that rely on namedtuple row iteration
    stage_ids:
    - api_abstraction
  - id: finance-C-002
    when: When implementing price loading in AccAPI
    action: Build beancount price map using prices.build_price_map(entries) from all ledger entries
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Multi-currency portfolios will fail to convert positions to base currency, causing asset allocation calculations
      to crash or produce incorrect results when commodities have different denominations
    stage_ids:
    - api_abstraction
  - id: finance-C-003
    when: When parsing Custom directive configurations
    action: Filter entries for Custom type with fava-extension and evaluate value strings using ast.literal_eval
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Configuration parsing will silently return empty dicts for all modules, causing all investor reports to use
      default parameters instead of user-specified configurations, leading to incorrect financial analysis
    stage_ids:
    - api_abstraction
  - id: finance-C-004
    when: When running FavaInvestorAPI with different Fava versions
    action: Use version.parse from packaging library to compare fava_version against 1.22 and 1.30 thresholds
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Query execution will fail with wrong number of arguments error on incompatible Fava versions, breaking the
      Fava web interface entirely
    stage_ids:
    - api_abstraction
  - id: finance-C-006
    when: When running CLI with AccAPI
    action: Pass a valid beancount file path to AccAPI constructor for ledger loading
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Ledger loading will fail with FileNotFoundError, preventing any CLI commands from executing; all modules
      (tlh, assetalloc, cashdrag, summarizer) depend on this
    stage_ids:
    - api_abstraction
  - id: finance-C-009
    when: When creating new AccAPI or FavaInvestorAPI implementations
    action: 'Implement all required methods: query_func, build_price_map, get_custom_config, get_commodity_directives, get_operating_currencies,
      realize, root_tree'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Modules calling missing methods will raise AttributeError at runtime, breaking financial calculations that
      depend on those APIs
    stage_ids:
    - api_abstraction
  - id: finance-C-017
    when: When accessing row data from query_func results
    action: Access row fields using attribute notation (row.column_name) not dictionary notation (row['column_name'])
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: namedtuple rows do not support item access; code using row['column'] will raise TypeError, breaking all module
      queries that expect attribute-style access
    stage_ids:
    - api_abstraction
  - id: finance-C-019
    when: When building commodity groups from incomplete declarations
    action: apply transitive closure to infer complete equivalence/substantially identical groups
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incomplete relationship inference will cause some substantially identical funds to not be grouped together,
      leading to incorrect wash sale detection and missing TLH partner recommendations
    stage_ids:
    - ticker_relationships
  - id: finance-C-020
    when: When implementing the representative() method
    action: consistently return the same representative ticker for any member of a substantially identical group
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inconsistent representative selection will cause non-deterministic TLH group calculations, where the same
      ticker gets different partners depending on which group member is used as the key
    stage_ids:
    - ticker_relationships
  - id: finance-C-021
    when: When computing TLH groups from unidirectional declarations
    action: infer bidirectional relationships so if A→B is declared, B→A is also included
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing bidirectional inference will cause incomplete TLH partner lists, where only some funds show all their
      valid swap options, leading to suboptimal TLH decisions
    stage_ids:
    - ticker_relationships
  - id: finance-C-025
    when: When relationship_source is extended to read from sources other than commodity directives
    action: maintain compatibility with the existing a__equivalents/a__substidenticals/a__tlh_partners metadata schema
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Schema changes will break compatibility with ScaledNAV and other classes extending RelateTickers, causing
      inconsistent ticker relationship data across the system
    stage_ids:
    - ticker_relationships
  - id: finance-C-026
    when: When extending RelateTickers with additional functionality
    action: share the same metadata schema for equivalents and substidenticals declarations
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Schema divergence will cause extended classes to miss ticker relationships declared for the base class, resulting
      in incomplete wash sale detection and TLH recommendations
    stage_ids:
    - ticker_relationships
  - id: finance-C-027
    when: When selecting tickers for wash sale comparison
    action: use the representatives of substantially identical groups to prevent within-group comparisons
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Comparing a ticker to itself in its own substantially identical group will falsely trigger wash sale warnings,
      blocking valid TLH transactions
    stage_ids:
    - ticker_relationships
  - id: finance-C-030
    when: When building the wash sale prevention query
    action: use both the current ticker and all of its substantially identical partners in the comparison set
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing substantially identical tickers from the wash sale query will allow purchases of similar funds within
      30 days, triggering unintended wash sales that disallow the tax loss
    stage_ids:
    - ticker_relationships
  - id: finance-C-031
    when: When implementing substidenticals() for a ticker with no substantially identical partners
    action: return an empty list, not None or a list containing the input ticker
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Returning the input ticker will cause false wash sale warnings when comparing a ticker to itself, incorrectly
      blocking valid TLH transactions
    stage_ids:
    - ticker_relationships
  - id: finance-C-035
    when: When implementing classes that extend RelateTickers
    action: call the parent __init__ or replicate its file loading and database initialization
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing database initialization will cause AttributeError when accessing self.db or self.idents, preventing
      the class from functioning
    stage_ids:
    - ticker_relationships
  - id: finance-C-037
    when: When computing asset class percentages for multiple currencies
    action: Convert all positions to a single base currency before aggregating into asset buckets
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Asset allocation percentages will be incorrect if positions in different currencies are summed directly,
      leading to meaningless results like '500 + 750 = 1250 USD' without currency conversion
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-039
    when: When scaling positions for tax adjustment
    action: Verify that positions have a cost spec present after realization before scaling
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Tax-adjusted allocation will be incorrect if positions lose cost information during realization, causing
      wrong basis for tax-adjusted percentage calculations
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-040
    when: When handling multi-currency portfolios with cost currencies different from operating currencies
    action: Include the cost currency in operating_currencies list to enable transitive conversion
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: 'Currency conversion will fail with ''Error: unable to convert X to base currency Y (Missing price directive?)''
      if cost currency is not available as operating currency or via price chain'
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-052
    when: When writing the asset_allocation function in libaaacc.py
    action: guard against division by zero when portfolio_total is zero
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Division by portfolio_total at line 79 causes ZeroDivisionError when all account balances in selected accounts
      are zero, resulting in complete failure to render any asset allocation report
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-055
    when: When using accapi.cost_or_value in asset allocation calculations
    action: call accapi.cost_or_value with a CLI-based AccAPI instance without first implementing the method
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: AccAPI.cost_or_value is commented out and returns None/not implemented, causing libaaacc.py:71 to fail with
      AttributeError when running via CLI, preventing any asset allocation by account computation
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-060
    when: When implementing asset allocation by account in Fava extension
    action: obtain portfolio data via FavaInvestorAPI which provides cost_or_value functionality
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: FavaInvestorAPI provides the cost_or_value method required for balance calculations (libaaacc.py:71), while
      AccAPI for CLI lacks this; using wrong API class causes AttributeError on cost_or_value call
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-069
    when: When determining which commodities are considered cash
    action: always include operating currencies in the cash commodities list
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Operating currencies like USD will not appear in cash drag output, missing potential cash drag opportunities
      that should be flagged
    stage_ids:
    - cash_drag
  - id: finance-C-079
    when: When implementing loss calculations for tax loss harvesting
    action: Use beancount.core.number.Decimal for all monetary calculations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Floating-point arithmetic may cause incorrect loss calculations, leading to harvesting the wrong lot quantities
      and potential tax compliance issues
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-080
    when: When determining long-term vs short-term gain classification
    action: Use dateutil.relativedelta to calculate gain term duration
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect leap year handling can misclassify gains as long-term when they should be short-term (or vice versa),
      resulting in wrong tax rates applied
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-081
    when: When implementing wash sale detection logic
    action: Use a 30-day lookback window and 31-day earliest sale date for wash sale compliance
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect wash sale window violates IRS wash sale rule, disallowing the loss deduction and potentially triggering
      IRS penalties
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-083
    when: When implementing tax loss harvesting features
    action: Read substantially identical tickers from commodity metadata using a__substidenticals label
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Selling a security and buying a substantially identical one within wash sale window converts harvest into
      a disallowed wash sale
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-094
    when: When implementing tax calculations in gains minimization
    action: Use Decimal type from beancount.core.number for all monetary calculations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Floating-point arithmetic introduces rounding errors in tax calculations, potentially causing incorrect
      lot selection and misreported tax liabilities
    stage_ids:
    - minimize_gains
  - id: finance-C-095
    when: When implementing lot sorting in gains minimization
    action: Sort lots by est_tax_percent in ascending order to prioritize highest-loss lots first
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect sorting order causes suboptimal lot selection, resulting in higher capital gains tax liability
      than necessary
    stage_ids:
    - minimize_gains
  - id: finance-C-096
    when: When classifying tax term (short-term vs long-term) in gains minimization
    action: Use libtlh.gain_term() function with relativedelta for IRS-compliant year boundary calculation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using simple year difference instead of relativedelta causes incorrect long/short term classification near
      year boundaries due to leap year handling; IRS defines 'more than 1 year' as requiring 1 year plus at least 1 day
    stage_ids:
    - minimize_gains
  - id: finance-C-101
    when: When applying short-term and long-term tax rates in gains minimization
    action: Separate lots by holding period and apply corresponding st_tax_rate or lt_tax_rate from config
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Wrong tax rate application causes incorrect estimated tax calculations, leading to suboptimal lot selection
      decisions
    stage_ids:
    - minimize_gains
  - id: finance-C-104
    when: When depending on gain_term classification in gains minimization
    action: Replace or modify libtlh.gain_term() with a custom implementation that changes term classification logic
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: Custom term classification may violate IRS rules for long/short term gains, causing incorrect tax estimates
      and potential IRS compliance issues
    stage_ids:
    - minimize_gains
  - id: finance-C-128
    when: When implementing any financial calculation across the entire system
    action: Use Decimal from beancount.core.number for all monetary values to preserve exact decimal representation without
      floating-point errors
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Floating-point calculations introduce rounding errors that corrupt PnL, tax estimates, and portfolio value
      calculations, leading to incorrect financial decisions
  - id: finance-C-129
    when: When building any table output consumed by Fava templates or CLI presenters
    action: Return a standardized 4-tuple format (rtypes, rrows, extra, footer) where each row in rrows is a namedtuple
      with column access via row.column_name
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Fava templates and CLI presenters expect namedtuple row objects; breaking this contract causes AttributeError
      in all consumers
  - id: finance-C-130
    when: When implementing the AccAPI interface for data access abstraction
    action: 'Implement all required methods: query_func, realize, root_tree, build_price_map, get_commodity_directives, get_operating_currencies'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing interface methods cause AttributeError when modules attempt data access, breaking both Fava extension
      and CLI
  - id: finance-C-131
    when: When performing any currency conversion or financial aggregation
    action: Use operating_currencies[0] as the single base currency for all financial calculations to collapse multi-currency
      portfolios
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using currencies inconsistently produces incorrect portfolio totals and asset allocation percentages across
      different currencies
  - id: finance-C-148
    when: When executing the TLH analysis pipeline or refactoring its implementation
    action: 'Maintain strict sequential execution order: find_harvestable_lots → harvestable_by_commodity → summarize_tlh
      → build_recents; do not reorder, parallelize, or cache intermediate outputs between stages as subsequent stages expect
      specifically formatted inputs from predecessors'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Reordering pipeline stages or using cached outputs from non-sequential execution causes subsequent stages
      to receive incorrectly formatted inputs, producing invalid tax loss harvesting recommendations
    derived_from_bd_id: BD-096
  - id: finance-C-151
    when: When calculating gain term for tax classification using relativedelta
    action: Use relativedelta(years=1) for >1 year threshold calculation to properly handle leap years and varying month lengths;
      must NOT substitute with timedelta(days=365) or simple year subtraction
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using 365-day timedelta misclassifies assets held exactly 365 days as long-term when they should be short-term,
      and vice versa near year boundaries with leap years, causing incorrect tax rate application
    derived_from_bd_id: BD-023
  - id: finance-C-152
    when: When determining wash sale applicability for security transactions
    action: Distinguish between equivalents (freely interchangeable, no wash sale) and substidenticals (trigger wash sale
      if bought/sold within 61-day window); do NOT treat all ticker relationships as substantially identical
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Treating equivalents as substantially identical over-flags wash sales, incorrectly preventing legitimate
      fund switches and disallowing valid loss deductions; treating substidenticals as equivalents misses wash sales, causing
      IRS non-compliance
    derived_from_bd_id: BD-017
  - id: finance-C-153
    when: When implementing wash sale detection logic in tax loss harvesting
    action: Enforce the 30-day wash sale window symmetrically on BOTH sides of any transaction — check for overlapping purchases
      within 30 days after sales AND overlapping sales within 30 days before purchases
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Asymmetric wash sale enforcement misses overlapping scenarios; selling at a loss and repurchasing within
      30 days would not trigger wash sale, allowing disallowed loss deductions to pass through undetected
    derived_from_bd_id: BD-022
  - id: finance-C-154
    when: When identifying substantially identical securities for wash sale tracking
    action: Read and respect the a__substidenticals metadata field on securities to identify user-defined substantially identical
      relationships — must NOT rely solely on generic fund matching algorithms
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Skipping the a__substidenticals metadata check misses user-defined substantially identical relationships,
      causing false negative wash sale detection where disallowed losses are not properly flagged for IRS compliance
    derived_from_bd_id: BD-025
  - id: finance-C-156
    when: When classifying gains and losses as short-term or long-term across tax optimization modules
    action: Use libtlh.gain_term() consistently as the authoritative source for short/long term classification in all modules
      including minimizegains; must NOT implement independent classification logic in any module
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inconsistent classification between TLH and minimizegains creates contradictory tax optimization decisions;
      some modules may harvest losses while others classify the same gains differently, potentially creating IRS compliance
      issues
    derived_from_bd_id: BD-102
  - id: finance-C-157
    when: When implementing wash sale detection logic that references the 30-day window
    action: Centralize the 30-day wash sale constant in a single constants module and import it across all modules (BD-007,
      BD-022, BD-027, BD-073); must NOT hardcode the 30-day value in multiple locations
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoding 30-day constant in multiple locations creates maintenance hazard; if IRS rules change or non-US
      jurisdictions need adaptation, missing one location causes inconsistent wash sale detection leading to false positives
      or missed violations
    derived_from_bd_id: BD-108
  - id: finance-C-161
    when: When implementing wash sale detection or determining earliest safe sale dates for tax loss harvesting
    action: Calculate earliest safe sale date as acquisition_date plus exactly 31 days — this boundary is one day beyond the
      30-day IRS wash sale window, ensuring sales occur outside the prohibited period
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using 30 days instead of 31 days places sales inside the wash sale window, causing the IRS to disallow loss
      deductions and recharacterize gains — this creates tax liability that the backtest does not account for
    derived_from_bd_id: BD-027
  - id: finance-C-163
    when: When configuring wash sale detection for accounts that could trigger wash sale rules
    action: Use separate wash_pattern configurations for taxable accounts versus accounts that could trigger wash sales —
      the wash_pattern for taxable accounts must differ from the wash_pattern for accounts where repurchase would be disallowed
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using the same wash_pattern for all accounts causes harvesting in one account to incorrectly trigger wash
      sale restrictions in another, disallowing legitimate tax losses across related accounts
    derived_from_bd_id: BD-029
  - id: finance-C-166
    when: When implementing or modifying account filtering logic across TLH, minimizegains, and summarizer modules
    action: Centralize account filter pattern '^[01]' and account field extraction logic in a single shared function; all
      three modules (libtlh.py, libminimizegains.py, libsummarizer.py) must import this function rather than duplicating the
      pattern
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded account filter patterns in three separate modules create concentrated failure points — if the pattern
      needs updating, a missed update in any module causes inconsistent account filtering and contradictory tax optimization
      recommendations
    derived_from_bd_id: BD-110
  - id: finance-C-175
    when: When implementing position scaling with tax adjustment percentage
    action: Validate that tax_adj parameter is non-negative (>= 0) before computing position * (tax_adj / 100); reject or
      clamp negative values to 0
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: A negative tax_adj value produces invalid negative positions, causing strategies to take short positions
      in assets they should only hold long, resulting in completely inverted position logic and potentially unlimited loss
      exposure in margin accounts
    derived_from_bd_id: BD-069
  - id: finance-C-177
    when: When implementing tax-loss harvesting logic that excludes recent purchases
    action: Exclude positions purchased within the last 30 days (including day-of-purchase) from TLH harvesting pool; positions
      with purchase_date >= current_date - 30 days are ineligible
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Failing to exclude recently purchased positions from TLH harvesting causes the system to harvest losses that
      the IRS will disallow under wash sale rules, resulting in unexpected tax liabilities when the disallowed loss is added
      back to cost basis in future years
    derived_from_bd_id: BD-073
  - id: finance-C-192
    when: When implementing any financial calculation (NAV, portfolio valuations, tax lot computations) in any module
    action: Convert all monetary values to operating_currencies[0] before performing calculations; the system enforces using
      the first operating currency as the single canonical base unit, and there is no multi-currency fallback
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Mixing currencies without conversion produces mathematically invalid results; adding USD and CNY values without
      conversion creates meaningless numbers that violate accounting consistency and cause incorrect tax calculations
    derived_from_bd_id: BD-092
  - id: finance-C-199
    when: When implementing or modifying libtlh.gain_term() or libtlh.get_account_field() functions
    action: Verify backward compatibility in function signatures and behavior - changes will cascade through both TLH harvestable
      lot identification and minimizegains gain minimization recommendations simultaneously
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: A bug in gain_term leap-year handling or account field parsing silently corrupts both the TLH module's harvestable
      lot identification AND the minimizegains module's gain minimization recommendations, causing suboptimal tax strategy
      recommendations without raising errors
    derived_from_bd_id: BD-095
  - id: finance-C-200
    when: When processing portfolio valuation for accounts with multiple operating_currencies or when operating_currencies
      may be empty
    action: Validate that operating_currencies contains at least one currency before any downstream calculations - if operating_currencies
      is empty, the entire calculation pipeline (BD-103 asset allocation, BD-037 currency conversion, BD-069 position scaling,
      BD-090/091 tax lot calculations, BD-076/077 tax rates) will produce values in invalid currency without raising errors
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Empty operating_currencies causes the entire financial calculation pipeline to produce values in an invalid
      currency, corrupting tax optimization recommendations and asset allocation analysis for multi-currency portfolios -
      the system will show numbers but they represent no valid currency
    derived_from_bd_id: BD-116
  - id: finance-C-201
    when: When implementing minimizegains module or modifying loss_threshold default value
    action: Verify loss_threshold default of $1 is intentional for the tax optimization strategy - changes to this default
      affect both the harvestable lot identification stage and the gain minimization stage of the tax loss harvesting pipeline
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Changing loss_threshold default affects both pipeline stages, potentially causing lots with small losses
      to be excluded from harvesting or different optimization strategies to be recommended, reducing tax savings effectiveness
    derived_from_bd_id: BD-117
  - id: finance-C-202
    when: When implementing minimizegains module or modifying libtlh gain_term() tax term classification
    action: Verify leap year handling in relativedelta calculations for gain_term() - BD-023 and BD-008 verify correct handling;
      changes would cascade through tax lot classification affecting both harvestable lot identification and gain minimization
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect leap year handling in gain_term() causes misclassification of short-term vs long-term gains, leading
      to incorrect tax optimization recommendations and potential non-compliance with tax holding period requirements
    derived_from_bd_id: BD-117
  - id: finance-C-207
    when: When implementing wash sale detection logic
    action: Modify the 30-day wash sale lookback period — DATE_ADD(TODAY(), -30) implements IRS IRC Section 1091 regulatory
      requirement; changing this value violates tax compliance
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Modifying the hardcoded 30-day wash sale lookback causes the backtesting system to violate IRS wash sale
      rule IRC Section 1091, potentially allowing loss claims that would be disallowed in actual tax filings and triggering
      IRS penalties
    derived_from_bd_id: BD-105
  regular:
  - id: finance-C-005
    when: When implementing AccAPI root_tree
    action: Import and use fava.core.Tree class with entries to build tree structure
    severity: high
    kind: resource_boundary
    modality: must
    consequence: CLI commands that display account tree structures will fail to import fava.core.Tree, preventing tree-based
      visualizations from working in standalone mode
    stage_ids:
    - api_abstraction
  - id: finance-C-007
    when: When deploying FavaInvestorAPI in web context
    action: Verify Fava version is at least 1.22 to support required query API compatibility
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Fava web interface will fail to execute queries, displaying error messages to users and making all investor
      reports inaccessible via the web UI
    stage_ids:
    - api_abstraction
  - id: finance-C-008
    when: When parsing config with ast.literal_eval
    action: Use Python dict literal syntax in beancount Custom directive values (not JSON)
    severity: high
    kind: resource_boundary
    modality: must
    consequence: ast.literal_eval will raise SyntaxError when parsing JSON format, causing all module configurations to be
      ignored and defaults to be used instead
    stage_ids:
    - api_abstraction
  - id: finance-C-010
    when: When accessing operating currencies in module code
    action: Call accapi.get_operating_currencies(), which returns a list, and access the first element for single-currency operations
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Currency conversion operations will fail with index error or wrong currency when expecting single base currency
      but receiving list of operating currencies
    stage_ids:
    - api_abstraction
  - id: finance-C-011
    when: When instantiating AccAPI in Fava web context
    action: Instantiate AccAPI directly in Fava extension code that runs in web context
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: AccAPI requires a beancount file path which is not available in Fava web context; using AccAPI instead of
      FavaInvestorAPI will cause loader.load_file to fail with FileNotFoundError
    stage_ids:
    - api_abstraction
  - id: finance-C-012
    when: When configuring fava-extension Custom directive
    action: Use 'fava_investor' as part of the config key name for fava_investor module configurations
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: get_custom_config will fail to find module configurations, returning empty dicts and causing all modules
      to use default parameters instead of user-specified settings
    stage_ids:
    - api_abstraction
  - id: finance-C-013
    when: When claiming API functionality
    action: Claim real-time price updates or live trading execution capability
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: API only reads historical beancount ledger entries; no mechanism exists for real-time data or trade execution,
      misleading users about system capabilities
    stage_ids:
    - api_abstraction
  - id: finance-C-014
    when: When presenting query results or calculations
    action: Present backtested calculations as guaranteed future investment outcomes
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Historical ledger analysis does not predict future returns; presenting tax loss harvesting opportunities
      or asset allocation suggestions as guaranteed profits violates financial advisory regulations
    stage_ids:
    - api_abstraction
  - id: finance-C-015
    when: When considering duck-typing as reason to skip interface validation
    action: Skip verifying that new API implementations have all required method signatures
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Missing methods will cause AttributeError at runtime when modules call query_func, build_price_map, or other
      APIs; duck-typing only works when interface contract is fulfilled
    stage_ids:
    - api_abstraction
  - id: finance-C-016
    when: When simplifying API by removing version compatibility branches
    action: Remove Fava version 1.22 compatibility branch assuming no users have older versions
    severity: medium
    kind: rationalization_guard
    modality: must_not
    consequence: Users running Fava 1.22-1.29 will get TypeError when executing queries, breaking the web interface for a
      significant portion of the user base
    stage_ids:
    - api_abstraction
  - id: finance-C-018
    when: When declaring ticker relationships in Beancount commodity directives
    action: use 'a__equivalents' for same-share-class relationships that don't trigger wash sales
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using 'a__substidenticals' for same-share-class funds instead of 'a__equivalents' will incorrectly flag them
      as wash sale risks, leading to missed TLH opportunities and potential false wash sale warnings
    stage_ids:
    - ticker_relationships
  - id: finance-C-022
    when: When declaring archived tickers in commodity directives
    action: include 'archive' in the commodity metadata to mark it as no longer held
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without proper archive marking, archived tickers will continue appearing in active TLH recommendations, suggesting
      swaps to funds that are no longer part of the portfolio
    stage_ids:
    - ticker_relationships
  - id: finance-C-023
    when: When declaring TLH partners in commodity directives
    action: use 'a__tlh_partners' metadata field (not 'tlh_partners') with comma-separated ticker values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using the old 'tlh_partners' field or incorrect format will result in zero TLH partners being read, eliminating
      all inferred TLH swap recommendations
    stage_ids:
    - ticker_relationships
  - id: finance-C-024
    when: When using RelateTickers with a commodities file path
    action: provide an existing file path or None; exit gracefully if file doesn't exist
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Missing file handling will cause abrupt program termination without clear error message, making debugging
      difficult for users
    stage_ids:
    - ticker_relationships
  - id: finance-C-028
    when: When computing TLH groups in Step 3
    action: apply the bidirectional inference rule exactly once without iteration or convergence
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Iterating to convergence will cause unintended transitive effects where A→B→C creates A→C directly, which
      may not be appropriate for all fund relationships
    stage_ids:
    - ticker_relationships
  - id: finance-C-029
    when: When filtering archived tickers from TLH groups
    action: remove archived tickers from both keys and values in Step 5
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Archived tickers appearing as keys will cause downstream errors; archived tickers in values will recommend
      funds that are no longer held in the portfolio
    stage_ids:
    - ticker_relationships
  - id: finance-C-032
    when: When comparing funds in same_type_funds_only mode
    action: compare quote types using the 'a__quoteType' metadata field with values like 'ETF' or 'MUTUALFUND'
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without type filtering, mixing ETFs and mutual funds in TLH recommendations may create unsuitable swap suggestions
      that are not truly equivalent investment vehicles
    stage_ids:
    - ticker_relationships
  - id: finance-C-033
    when: When TLH analysis results are presented to users
    action: present the results as tax or financial advice
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting computational results as advice may lead users to make financial decisions without consulting
      qualified tax professionals, potentially resulting in IRS penalties or missed tax optimization opportunities
    stage_ids:
    - ticker_relationships
  - id: finance-C-034
    when: When declaring a__tlh_partners metadata
    action: declare only funds that are NOT substantially identical to the current fund
    severity: high
    kind: domain_rule
    modality: must
    consequence: Including substantially identical funds in TLH partners will produce misleading recommendations since swapping
      between substantially identical funds triggers wash sales rather than achieving tax loss harvesting
    stage_ids:
    - ticker_relationships
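  # Illustrative sketch for finance-C-018, finance-C-023 and finance-C-034: a Beancount commodity
  # directive that uses the metadata keys named in the rules above. The date and the ticker
  # symbols are placeholder assumptions chosen for this example, not values from any real ledger.
  #
  #   2010-01-01 commodity VTI
  #     a__equivalents: "VTSAX,VTSMX"
  #     a__tlh_partners: "ITOT,SCHB"
  #
  # a__equivalents lists same-share-class tickers; a__tlh_partners lists comma-separated swap
  # candidates that are not substantially identical to the declared fund.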
  - id: finance-C-036
    when: When using the pretty_sort() function for display
    action: document that the sort produces valid but potentially different results across different runs
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Users expecting deterministic ordering may be confused when the same TLH groups appear in different order
      on different runs, potentially leading to misread reports
    stage_ids:
    - ticker_relationships
  - id: finance-C-038
    when: When bucketing commodities by asset_allocation_* metadata
    action: Verify bucket names use underscores consistently for hierarchical tree construction
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Tree hierarchy will be incorrectly constructed if bucket names contain hyphens or other separators, causing
      'equity-domestic' to be treated as one node instead of nested 'equity' and 'domestic' nodes
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-041
    when: When calculating asset allocation percentages
    action: Use Decimal type for percentage calculations to avoid floating-point rounding errors
    severity: high
    kind: domain_rule
    modality: must
    consequence: Percentage calculations will have rounding errors when using float, potentially causing percentages to not
      sum to 100.00% exactly
    stage_ids:
    - asset_allocation_by_class
    - asset_allocation_by_account
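  # Illustrative sketch for finance-C-041 (not taken from the project source): percentage
  # arithmetic with Python's decimal module. The function name percent_of_total and the
  # two-decimal rounding are assumptions made for this example.
  #
  #   from decimal import Decimal, ROUND_HALF_UP
  #
  #   def percent_of_total(value, total):
  #       # Exact decimal arithmetic; quantize rounds the result to two decimal places.
  #       pct = Decimal(value) / Decimal(total) * 100
  #       return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
  #
  #   shares = [percent_of_total(v, "400") for v in ("100", "100", "200")]
  #   assert sum(shares) == Decimal("100.00")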
  - id: finance-C-042
    when: When validating asset_allocation_* metadata percentages
    action: Pad remaining percentage to 'unknown' bucket if metadata does not sum to 100%
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Portfolio allocation will not sum to 100% if metadata percentages don't add up, silently misrepresenting
      the actual portfolio allocation
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-043
    when: When building asset allocation tree from buckets
    action: Include positions with negative balances in asset allocation calculations
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Liabilities will be incorrectly included as positive asset allocation, distorting portfolio composition and
      percentages
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-044
    when: When specifying commodities for asset allocation classification
    action: Prefix every asset class metadata key with 'asset_allocation_'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Commodities without the asset_allocation_ prefix will not be included in any bucket, causing incomplete portfolio
      representation
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-045
    when: When calculating asset allocation in multi-currency portfolios
    action: Add price entries for every commodity to enable accurate market value conversion
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Portfolio allocation will be incorrect if price entries are missing or outdated, causing inaccurate market
      valuations of held commodities
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-047
    when: When excluding ancestor accounts from asset allocation
    action: Set excluded ancestor account balances to empty Inventory instead of removing them
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Ancestor accounts with explicit transactions will inflate asset allocation if balances are not zeroed, causing
      double-counting of holdings
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-048
    when: When reporting asset allocation results
    action: Claim percentages are accurate for live trading without considering price timing
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Asset allocation percentages depend on price entries at calculation time; presenting these as precise allocations
      ignores that prices change throughout the trading day
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-049
    when: When filtering accounts for asset allocation
    action: Include accounts with zero balance in the allocation calculation
    severity: low
    kind: operational_lesson
    modality: must_not
    consequence: Empty accounts will create unnecessary tree nodes and potentially confuse the hierarchical structure without
      providing meaningful allocation data
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-050
    when: When implementing tax-adjusted positions with multiple currency conversions
    action: Convert all positions to base currency before applying tax adjustment scaling
    severity: high
    kind: domain_rule
    modality: must
    consequence: Tax-adjusted position values will be incorrect if cost currencies are not converted to base currency before
      scaling, causing wrong percentage allocations
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-051
    when: When configuring the asset allocation module
    action: Set skip_tax_adjustment to True only when not using tax-adjusted accounts
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Tax-deferred accounts like retirement accounts will show incorrect allocations if tax adjustment is skipped
      when it should be applied
    stage_ids:
    - asset_allocation_by_class
  - id: finance-C-054
    when: When calculating asset allocation percentages
    action: verify percentages sum to exactly 100% within each portfolio group
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without rounding or normalization logic, individual allocation percentages may not sum to 100%, violating
      the fundamental invariant that total allocation must equal 100% and breaking acceptance criteria
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-056
    when: When extending asset allocation selection strategies
    action: implement new selection strategies as by_* functions in libaaacc.py module scope
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The pattern_type lookup uses globals()['by_' + pattern_type] to dynamically locate selection strategy functions;
      functions not defined at module level or with incorrect naming will raise KeyError, breaking portfolio grouping
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-057
    when: When configuring asset allocation by account patterns
    action: verify regex patterns match against existing account names in the root tree
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Non-matching regex patterns result in empty selected_accounts list, causing portfolio_total to be zero and
      triggering division by zero error, producing no meaningful allocation output
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-058
    when: When running asset allocation by account via CLI
    action: attempt to execute the assetalloc_account CLI, which exits with an error message
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: 'assetalloc_account.py:59 contains sys.exit(''Error: CLI not yet implemented''), causing immediate termination
      of CLI execution with no asset allocation output, blocking automated workflows'
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-059
    when: When converting account balances to operating currency
    action: verify that at least one account balance exists in the operating currency before computing percentages
    severity: high
    kind: domain_rule
    modality: must
    consequence: When no accounts have balances in the configured operating currency, rrows stays empty, leading to portfolio_total=0
      and division by zero, producing no allocation output despite valid configuration
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-061
    when: When returning table data for Fava template rendering
    action: format output as (title, table_data) tuple with correct rtypes structure
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Fava template expects (title, table_data) tuple structure from table_list_renderer macro; incorrect tuple
      format causes Jinja2 template rendering failure and empty output
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-062
    when: When using include_children configuration option
    action: call cost_or_value with include_children parameter correctly forwarded to accapi
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: include_children defaults to False in config.get() but is passed to cost_or_value; incorrect handling causes
      child account balances to be excluded when user expects them included, producing incorrect portfolio totals
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-063
    when: When adding new selection strategy types via pattern_type
    action: use unvalidated user input as pattern_type without verifying the corresponding function exists
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Dynamic function lookup via globals()['by_' + pattern_type] with unvalidated config input allows an attacker
      or a misconfigured ledger to trigger KeyError exceptions, potentially exposing internal module structure or causing denial
      of service
    stage_ids:
    - asset_allocation_by_account
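  # Illustrative sketch for finance-C-056 and finance-C-063: validate pattern_type before the
  # dynamic lookup. by_account_name and resolve_strategy are hypothetical names used only for
  # this example; only the 'by_' + pattern_type convention comes from the rules above.
  #
  #   def by_account_name(accounts, pattern):
  #       # Hypothetical strategy that keeps accounts containing the given substring.
  #       return [a for a in accounts if pattern in a]
  #
  #   def resolve_strategy(pattern_type):
  #       # .get() avoids a bare KeyError; callable() rejects non-function globals.
  #       func = globals().get('by_' + str(pattern_type))
  #       if not callable(func):
  #           raise ValueError(f"unknown pattern_type {pattern_type!r}")
  #       return func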
  - id: finance-C-064
    when: When presenting asset allocation results
    action: claim the percentages represent actual market timing or real-time trading accuracy
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Asset allocation percentages are computed from historical cost basis and may not reflect current market values,
      especially for assets with significant unrealized gains; presenting these as real-time allocation overstates precision
      of actual portfolio composition
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-065
    when: When running asset allocation by account via Beancount CLI
    action: expect the same functionality as the Fava web UI implementation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: CLI implementation explicitly exits with error and cost_or_value is not implemented for AccAPI; users attempting
      CLI usage will receive immediate termination, not a degraded but functional experience
    stage_ids:
    - asset_allocation_by_account
  - id: finance-C-066
    when: When the pattern matches accounts but they have zero balance
    action: handle the empty result set gracefully with informative output
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: When pattern matches accounts but all have zero balance in operating currency, rrows becomes empty, portfolio_total=0,
      and percentage calculation fails silently, producing no output to indicate why allocation is empty
    stage_ids:
    - asset_allocation_by_account
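  # Illustrative sketch for finance-C-057, finance-C-059 and finance-C-066: guard the percentage
  # step against a zero portfolio total. allocation_percentages is a hypothetical helper; it
  # assumes balances is a dict of account name to a numeric value in the operating currency.
  #
  #   def allocation_percentages(balances):
  #       total = sum(balances.values())
  #       if total == 0:
  #           # Nothing to allocate; the caller should report why the table is empty.
  #           return {}
  #       return {name: value / total * 100 for name, value in balances.items()}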
  - id: finance-C-067
    when: When using regex patterns with account name matching
    action: compile regex patterns with error handling for invalid regex syntax
    severity: high
    kind: operational_lesson
    modality: must
    consequence: re.compile(pattern) at libaaacc.py:29 and :48 raises re.error for invalid regex patterns, crashing the entire
      asset allocation report and preventing viewing of any portfolio data
    stage_ids:
    - asset_allocation_by_account
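  # Illustrative sketch for finance-C-067: wrap re.compile so an invalid pattern produces a
  # readable configuration error. compile_account_pattern is a hypothetical helper name.
  #
  #   import re
  #
  #   def compile_account_pattern(pattern):
  #       try:
  #           return re.compile(pattern)
  #       except re.error as exc:
  #           raise ValueError(f"invalid account pattern {pattern!r} ({exc})") from exc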
  - id: finance-C-068
    when: When building the cash drag analysis table
    action: filter out rows where position equals empty Inventory before displaying
    severity: high
    kind: domain_rule
    modality: must
    consequence: Empty zero-value rows will appear in the table output, cluttering the display with meaningless rows showing
      no cash balances
    stage_ids:
    - cash_drag
  - id: finance-C-070
    when: When filtering cash positions by minimum threshold
    action: apply min_threshold only after filtering empty positions and using converted position value
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Threshold filtering may fail with AttributeError when position inventory contains no positions, or may compare
      wrong currency values
    stage_ids:
    - cash_drag
  - id: finance-C-071
    when: When configuring the cash drag module
    action: set accounts_exclude_pattern to exclude wallet cash and zero-sum accounts from cash drag analysis
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Physical wallet cash (like Cash-In-Wallet) and zero-sum reconciliation accounts will incorrectly appear as
      uninvested cash drag
    stage_ids:
    - cash_drag
  - id: finance-C-072
    when: When running cash drag in command-line mode vs Fava
    action: use AccAPI for CLI and FavaInvestorAPI for Fava extension, both providing consistent query_func interface
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Cash drag analysis will fail entirely when running from command line, only working within Fava web interface
    stage_ids:
    - cash_drag
  - id: finance-C-073
    when: When using get_only_position() on Inventory objects
    action: verify Inventory contains exactly one position before calling get_only_position()
    severity: high
    kind: domain_rule
    modality: must
    consequence: ValueError exception will be raised when get_only_position() is called on multi-position Inventory, crashing
      the cash drag analysis
    stage_ids:
    - cash_drag
  - id: finance-C-074
    when: When presenting cash drag analysis results
    action: display balances converted to the primary operating currency for consistent comparison
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Multi-currency cash holdings cannot be compared or summed meaningfully, making cash drag analysis unreliable
      across different currencies
    stage_ids:
    - cash_drag
  - id: finance-C-075
    when: When identifying cash commodities via metadata
    action: check for metadata_label_cash set to value 100 on commodity declarations
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Money market funds and short-term bonds tagged with asset_allocation_Bond_Cash metadata will not be recognized
      as cash equivalents
    stage_ids:
    - cash_drag
  - id: finance-C-076
    when: When claiming cash drag detection capabilities
    action: claim real-time brokerage balance synchronization or live trading integration
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users will expect live cash position updates and automated investment capabilities that the system cannot
      provide, leading to unmet expectations
    stage_ids:
    - cash_drag
  - id: finance-C-077
    when: When configuring accounts_pattern for cash drag analysis
    action: use regex pattern anchored to asset account hierarchy (e.g., '^Assets:.*') to avoid false matches
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Income, expense, or liability accounts matching the pattern will incorrectly contribute to cash drag analysis
      with spurious amounts
    stage_ids:
    - cash_drag
  - id: finance-C-078
    when: When accepting user configurations for cash drag module
    action: accept an empty string as accounts_exclude_pattern without treating it as the no-exclusion case
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Empty exclude pattern will cause SQL query to fail or produce incorrect results due to malformed regex in
      WHERE clause
    stage_ids:
    - cash_drag
  - id: finance-C-082
    when: When implementing tax loss harvesting for a client's accounts
    action: Verify that Beancount booking method is set to STRICT (Specific Identification of Shares)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using average cost method or FIFO/LIFO while claiming SpecID-based TLH violates tax regulations, causing
      disallowed losses and potential audits
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-084
    when: When implementing TLH for users with multiple substantially identical securities
    action: Read substantially identical relationships from both a__substidenticals and a__equivalents commodity metadata
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing equivalent fund relationships (like VOO/VFINX/VFIAX) causes wash sales when users switch between
      share classes of the same fund
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-085
    when: When deploying tax loss harvesting for a jurisdiction
    action: Claim the tool works for non-US tax jurisdictions without explicit adaptation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: US-specific rules (SpecID, wash sale 30-day window, >365-day long-term) do not apply to other countries,
      leading to incorrect tax advice
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-086
    when: When presenting tax loss harvesting results to users
    action: Present harvest recommendations without explicit financial/tax advice disclaimer
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without proper disclaimer, users may treat automated suggestions as professional tax advice, violating regulatory
      requirements and causing financial harm
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-087
    when: When integrating tax loss harvesting into Fava
    action: Apply Fava GUI time filters to the TLH module
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Fava time filters cause unpredictable results in TLH since the module uses TODAY() for wash sale calculation
      and needs current market prices
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-088
    when: When handling partial wash sales in tax loss harvesting
    action: Display complex wash sale scenarios that require sophisticated IRS matching rules
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Displaying partial wash sales with ambiguous purchase/sale matching confuses users and may lead to incorrect
      tax filings
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-089
    when: When implementing TLH wash sale detection across accounts
    action: Configure dividend reinvestment to be OFF for all tickers across all accounts
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Dividend reinvestment creates new purchases within the wash sale window, silently invalidating harvest recommendations
      and causing wash sales
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-090
    when: When calculating harvestable losses in TLH summary
    action: Include wash-sale-affected losses in the total harvestable loss summary
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Summary includes losses that will be disallowed due to wash sales, overstating actual harvestable tax benefit
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-092
    when: When implementing TLH for non-standard Beancount ledgers
    action: Verify account numbering follows Beancount convention with account_sortkey starting with 0 or 1
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded account_sortkey pattern '^[01]' causes non-standard ledgers to fail silently without finding any
      harvestable lots
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-093
    when: When implementing tax loss harvesting queries
    action: Use TODAY() function in SQL queries instead of hardcoded dates
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded dates cause look-ahead bias where future information influences current recommendations, and queries
      become stale over time
    stage_ids:
    - tax_loss_harvesting
  - id: finance-C-097
    when: When querying lot data for gains minimization
    action: Verify market_value inventory contains exactly one position using get_only_position()
    severity: high
    kind: domain_rule
    modality: must
    consequence: Inventory with multiple positions causes ambiguous gain calculation and incorrect lot pricing, leading to
      wrong tax estimates
    stage_ids:
    - minimize_gains
  - id: finance-C-098
    when: When calculating cumulative tax columns in gains minimization
    action: Calculate cumulative proceeds, taxes, and gains by iterating through sorted lots in order
    severity: high
    kind: domain_rule
    modality: must
    consequence: Cumulative columns show incorrect incremental tax burden if calculated out of order, breaking the progressive
      selling guidance feature
    stage_ids:
    - minimize_gains
  - id: finance-C-099
    when: When implementing lot selection algorithm in gains minimization
    action: Design lot_selection_algorithm as a replaceable/pluggable component for customization
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Hardcoded lot selection logic prevents users from implementing jurisdiction-specific or personalized selling
      strategies
    stage_ids:
    - minimize_gains
  - id: finance-C-100
    when: When using the gains minimization results for tax planning
    action: Claim that results account for asset allocation constraints or tax-advantaged account positioning
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Results misrepresent tax optimization by ignoring portfolio rebalancing impacts and tax-advantaged account
      considerations, potentially leading to unintended portfolio drift
    stage_ids:
    - minimize_gains
  - id: finance-C-102
    when: When replacing the lot selection algorithm in gains minimization
    action: Maintain the est_tax_percent output column and ascending sort for compatibility with downstream cumulative calculations
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Breaking the est_tax_percent interface breaks cumulative column calculations and interpolation functions
      that rely on sorted lot order
    stage_ids:
    - minimize_gains
  - id: finance-C-103
    when: When accessing configuration in gains minimization
    action: Retrieve minimizegains-specific config via accapi.get_custom_config('minimizegains')
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Accessing config directly bypasses the abstraction layer, causing CLI and Fava plugin interfaces to fail
    stage_ids:
    - minimize_gains
  - id: finance-C-105
    when: When presenting tax burden estimates from gains minimization
    action: Claim results as guaranteed tax savings or accurate tax liability predictions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting estimates as guarantees misleads users about actual tax outcomes, which depend on individual circumstances,
      tax year changes, and jurisdiction-specific rules
    stage_ids:
    - minimize_gains
  - id: finance-C-106
    when: When using gains minimization for tax planning decisions
    action: Claim that the tool provides wash sale avoidance analysis or considers 30-day rebalancing windows
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Unlike tax_loss_harvesting module, minimizegains does not check for wash sale implications, leading users
      to believe they have comprehensive tax planning when critical rules are missing
    stage_ids:
    - minimize_gains
  - id: finance-C-107
    when: When building lot tables with single positions in gains minimization
    action: Group by cost_date, currency, cost_currency, cost_number so that each lot is analyzed separately
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect grouping causes mixed lots to be analyzed together, breaking gain calculation per-lot and preventing
      granular tax optimization
    stage_ids:
    - minimize_gains
  - id: finance-C-108
    when: When interpolating tax burden for a specific liquidation amount
    action: Use linear interpolation between cumulative rows when amount falls between two cu_proceeds values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-linear interpolation causes incorrect tax burden estimates for specific liquidation amounts, leading
      to suboptimal selling decisions
    stage_ids:
    - minimize_gains
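  # Illustrative sketch for finance-C-108: linear interpolation between two adjacent cumulative
  # rows. For an amount x with p0 <= x <= p1 (adjacent cu_proceeds values) and cumulative taxes
  # t0 and t1, the estimate is t0 + (t1 - t0) * (x - p0) / (p1 - p0). The function and
  # parameter names are assumptions made for this example, and p0 != p1 is assumed.
  #
  #   def interpolate_tax(x, p0, t0, p1, t1):
  #       return t0 + (t1 - t0) * (x - p0) / (p1 - p0)
  #
  #   assert interpolate_tax(1500, 1000, 100, 2000, 300) == 200.0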
  - id: finance-C-109
    when: When implementing a metadata summarizer config for Beancount
    action: specify directive_type values other than 'accounts' or 'commodities'
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: The build_table function only handles 'accounts' and 'commodities' directive types, and any other value causes
      silent failure with empty table output
    stage_ids:
    - metadata_summarizer
  - id: finance-C-110
    when: When processing commodity leaf accounts in the summarizer
    action: exclude commodity leaf accounts that have an open parent account from output
    severity: high
    kind: domain_rule
    modality: must
    consequence: Commodity accounts (uppercase names) duplicate their parent's metadata in summaries, causing confusing redundant
      rows in the output table
    stage_ids:
    - metadata_summarizer
  - id: finance-C-111
    when: When building table rows from Open directive metadata
    action: fill missing column values with empty strings rather than omitting the row or column
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Tables become misaligned with missing cells or columns omitted entirely, making output unreadable and breaking
      sort operations on the table
    stage_ids:
    - metadata_summarizer
  - id: finance-C-112
    when: When using meta_prefix filtering with specified columns
    action: construct column names by concatenating meta_prefix with each specified column name
    severity: high
    kind: domain_rule
    modality: must
    consequence: Column matching fails and tables display empty values even when matching metadata exists, because the prefix
      is not properly prepended
    stage_ids:
    - metadata_summarizer
  - id: finance-C-113
    when: When using col_labels to rename metadata columns
    action: preserve the order of columns as specified in the columns config array
    severity: high
    kind: domain_rule
    modality: must
    consequence: Column order becomes inconsistent between the header and row data, causing ValueError exceptions when creating
      namedtuples or misaligned table output
    stage_ids:
    - metadata_summarizer
  - id: finance-C-114
    when: When defining column labels in the summarizer config
    action: include spaces in col_labels values
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Namedtuple field names cannot contain spaces, causing ValueError exceptions when building table output and
      crashing the summarizer
    stage_ids:
    - metadata_summarizer
  - id: finance-C-115
    when: When defining namedtuple field names from config column labels
    action: use column names that start with digits or contain invalid Python identifier characters
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Namedtuple requires valid Python identifiers as field names, causing ValueError exceptions when column labels
      start with numbers or contain special characters
    stage_ids:
    - metadata_summarizer
  - id: finance-C-116
    when: When executing the active commodities SQL query
    action: use the Beancount convention 'account_sortkey(account) ~ "^[01]"' for filtering investment accounts
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Incorrect account numbering filter causes commodity holdings to be missed from 'active_only' summaries, making
      market_value calculations incomplete or zero
    stage_ids:
    - metadata_summarizer
  - id: finance-C-117
    when: When retrieving the operating currency for balance calculations
    action: confirm that at least one operating currency is defined before accessing index [0] of the operating currencies list
    severity: high
    kind: resource_boundary
    modality: must
    consequence: IndexError exception crashes the summarizer when the Beancount file does not define any operating_currency
      option
    stage_ids:
    - metadata_summarizer
  - id: finance-C-118
    when: When compiling the acc_pattern regex in account filtering
    action: provide a valid Python regex pattern string for account matching
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Invalid regex syntax causes re.compile to raise re.error, crashing the summarizer before any table
      generation occurs
    stage_ids:
    - metadata_summarizer
  - id: finance-C-119
    when: When running the summarizer without a fava-extension config
    action: return an empty dictionary from get_custom_config when no config is found
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without proper empty dict handling, build_tables iterates over an empty list and produces no output tables,
      silently failing
    stage_ids:
    - metadata_summarizer
  - id: finance-C-120
    when: When implementing metadata prefix filtering with special metadata columns
    action: use either meta_prefix with or without specified_cols, not both modes simultaneously
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Conflicting configuration causes incorrect column selection logic, either returning too many or too few columns
      in the output
    stage_ids:
    - metadata_summarizer
  - id: finance-C-121
    when: When processing closed accounts in metadata summarization
    action: exclude accounts that have a Close directive from the output
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Closed accounts continue appearing in metadata tables, showing stale contact information for accounts no
      longer in use
    stage_ids:
    - metadata_summarizer
  - id: finance-C-122
    when: When adding special 'account' and 'balance' columns to account metadata
    action: conditionally add these columns based on whether they appear in the columns config
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing special column handling causes KeyError when accessing row keys, or duplicate columns when 'account'/'balance'
      exist in both metadata and as special additions
    stage_ids:
    - metadata_summarizer
  - id: finance-C-123
    when: When implementing the summarizer module
    action: access data exclusively through the AccAPI/FavaInvestorAPI abstraction, not directly from Beancount internals
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Direct Beancount API access bypasses the abstraction layer, making the code incompatible with Fava CLI mode
      and reducing portability
    stage_ids:
    - metadata_summarizer
  - id: finance-C-124
    when: When requesting the commodity market_value column
    action: enable active_only mode to populate market_value data from current holdings
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without active_only, market_value column shows empty values for all commodities, making the table misleading
      as it omits actual holding values
    stage_ids:
    - metadata_summarizer
  - id: finance-C-125
    when: When summarizing metadata from Beancount directives
    action: claim support for metadata on transaction postings or other directive types beyond Open directives
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Documentation explicitly states only Open directive metadata is supported; claiming broader support violates
      documented functionality and misleads users
    stage_ids:
    - metadata_summarizer
  - id: finance-C-126
    when: When comparing metadata summarizer output between backtest and live mode
    action: expect identical metadata values if the Beancount ledger entries differ between periods
    severity: low
    kind: claim_boundary
    modality: must_not
    consequence: Metadata comes from Open directive entries; if accounts are opened/closed or metadata values change, the
      summarizer output will differ, making it unsuitable for direct performance comparisons
    stage_ids:
    - metadata_summarizer
  - id: finance-C-132
    when: When generating metadata for auto-computed ticker attributes
    action: Prefix auto-generated metadata labels with 'a__' to distinguish from user-configured 'asset_allocation_' namespace
    severity: high
    kind: architecture_guardrail
    modality: must
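    consequence: Namespace collision causes user-configured asset allocation metadata to be overwritten by system-generated attributes, or vice versa, producing incorrect asset classification reports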
  - id: finance-C-133
    when: When presenting or reporting the system's tax optimization capabilities to users
    action: Claim tax advice, financial advice, or investment recommendations — the system is informational only
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users may make investment decisions based on unverified tax calculations, leading to unexpected tax liability
      and potential IRS penalties
  - id: finance-C-134
    when: When presenting wash sale analysis to users outside the United States
    action: Claim wash sale detection accuracy — the 30-day window is hardcoded for US IRS wash sale rules only
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: International users receive incorrect wash sale warnings based on US-specific rules, causing them to either
      miss legitimate harvesting opportunities or avoid legitimate rebalancing trades
  - id: finance-C-135
    when: When promoting the system to non-Beancount users
    action: Claim compatibility with non-Beancount accounting systems — the entire architecture depends on Beancount's data
      model
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users attempt to use the system with incompatible ledgers, resulting in complete failure to generate any
      reports
  - id: finance-C-136
    when: When promoting or describing the system's capabilities
    action: Claim real-time trading execution capability — this is a read-only analysis and reporting system with no execution
      interface
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users expect automated trade execution that does not exist, leading to missed investment opportunities and
      broken workflows
  - id: finance-C-137
    when: When describing tax optimization capabilities to users
    action: Claim tax optimization accuracy for users without US SpecID/STRICT lot booking — the system assumes specific identification
      of shares
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users on average cost basis accounting receive incorrect harvestable loss calculations, leading to tax filing
      errors or missed deductions
  - id: finance-C-138
    when: When using Fava date filters with the Tax Loss Harvester module
    action: Expect accurate TLH results — Fava date filter selection leads to unpredictable results with the TLH module
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Selecting time filters in Fava causes TLH to show incorrect lots, wash sale detection, or summary values
  - id: finance-C-139
    when: When displaying wash sale analysis to users
    action: Claim comprehensive wash sale coverage — partial wash sales and complex matching scenarios are not displayed
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users rely on incomplete wash sale information and trigger wash sales inadvertently, resulting in disallowed
      loss deductions and IRS adjustments
  - id: finance-C-140
    when: When running gains minimization analysis
    action: Claim asset allocation preservation — the minimizegains algorithm does not account for asset allocation shifts
      caused by selling
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users following the minimization order inadvertently shift their portfolio allocation, leading to unintended
      risk profile changes
  - id: finance-C-141
    when: When configuring tax rates for gains minimization
    action: Set st_tax_rate and lt_tax_rate to actual applicable tax rates — default value of 1.0 (100%) produces wildly incorrect
      tax estimates
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Default tax rates cause est_tax_percent to be grossly incorrect, leading users to sell wrong lots and overpay
      taxes by up to 100%
  - id: finance-C-142
    when: When implementing asset aggregation or reporting logic that combines values across multiple positions
    action: Verify all values are normalized to the single base currency stored in the root node before aggregation; if base
      currency is missing or values have mixed currencies, fail with an explicit error rather than attempting silent aggregation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Silent mixed-currency aggregation produces nonsensical aggregate percentages and misleading portfolio reports,
      causing investors to make decisions based on fundamentally invalid data
    derived_from_bd_id: BD-002
  - id: finance-C-143
    when: When adding or modifying account selection strategies in the asset allocation by account stage
    action: Implement new strategies as by_* functions that match the pattern_type string from configuration; any new strategy
      requires a corresponding by_* function with the exact string match expected by the config parser
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Strategies without matching by_* functions are silently skipped, causing only the default strategies to run
      and producing incomplete or incorrect account selection results
    derived_from_bd_id: BD-004
  - id: finance-C-144
    when: When implementing wash sale detection logic for tax loss harvesting
    action: Read substantially identical ticker groups from beancount commodity metadata (Custom directives); wash sale applies
      only within user-defined groups marked as substantially identical; cross-group transactions must be treated as distinct
      securities
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrectly treating cross-group transactions as substantially identical causes false wash sale disallowances,
      unnecessarily reducing realized losses and increasing tax liability
    derived_from_bd_id: BD-009
  - id: finance-C-145
    when: When using the framework's default tax rate configuration for US 2024 (ST=37%, LT=20%)
    action: Verify that short_term_rate and long_term_rate from configuration match the actual jurisdiction tax brackets;
      if operating outside US 2024, explicitly configure rates via beancount Custom directives before running tax analysis
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default US tax rates applied to non-US portfolios systematically produce incorrect tax calculations, causing
      either overpayment (if rates too high) or underpayment (if rates too low) with audit exposure
    derived_from_bd_id: BD-013
  - id: finance-C-146
    when: When implementing or modifying wash sale detection queries
    action: Verify the 30-day wash sale window uses 61 total days (30 before + 1 sale day + 30 after); if modifying the window,
      change all related date calculations consistently to maintain the full 61-day look-back/look-forward span
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using inconsistent wash sale windows (e.g., only 30 days total) fails to capture the full IRS 30-day rule,
      potentially reporting wash sale adjustments that are incomplete or missing, risking non-compliance
    derived_from_bd_id: BD-007
  - id: finance-C-147
    when: When implementing tax lot liquidation order in minimize_gains analysis
    action: 'Sort lots by estimated tax percentage in ascending order: negative gains (losses) sorted with most negative first
      for harvesting, positive gains sorted by lowest tax rate first; use this ordering to determine which lots are selected
      for liquidation to minimize immediate tax burden'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using FIFO or LIFO instead of tax-optimized sorting produces sub-optimal tax outcomes, potentially causing
      investors to pay more taxes than necessary on realized gains
    derived_from_bd_id: BD-011
  - id: finance-C-149
    when: When implementing security substitution logic for NAV calculations or tax lot tracking
    action: Verify both 'a__equivalents' and 'a__substidenticals' metadata fields are combined by default in RelateTickers.substidenticals();
      if using equivalents_only=True, explicitly document that only equivalents are being used and acknowledge reduced substitution
      coverage
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing either field from the combination causes misclassified securities in NAV calculations and tax lot
      tracking, leading to incorrect portfolio valuations and potential tax reporting errors
    derived_from_bd_id: BD-099
  - id: finance-C-150
    when: When implementing metadata column filtering logic in the summarizer
    action: Preserve prefix-based wildcard matching (e.g., 'contact_' matches 'contact_phone', 'contact_email') for flexible
      metadata column selection
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing prefix matching to exact matching breaks user-defined flexible schemas; applications relying on
      wildcard metadata lookups will fail to find expected columns
    derived_from_bd_id: BD-014
  - id: finance-C-155
    when: When calculating cumulative tax impact for progressive sell simulations
    action: Compute cumulative columns (running totals of selling amounts and associated taxes) AFTER sorting the lot list,
      not before — ensures cumulative columns reflect the correct progressive tax situation as lots are added in sorted order
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Computing cumulative columns before sorting produces incorrect running totals; users simulating different
      sell quantities see wrong tax impact at each level and may make suboptimal tax decisions
    derived_from_bd_id: BD-012
  - id: finance-C-158
    when: When implementing or refactoring ticker relationship logic for tax loss harvesting
    action: Apply graph-based transitive closure when inferring TLH partner relationships — if ticker A has a TLH partner
      B, and ticker B has a TLH partner C, then tickers A, B, and C must be treated as a single TLH group
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without transitive closure, TLH recommendations may violate wash sale rules when the same underlying position
      is held across multiple tickers in the same group
    derived_from_bd_id: BD-016
  - id: finance-C-159
    when: When configuring or modifying the loss_threshold parameter for tax loss harvesting
    action: Verify that the default loss_threshold of $1 matches the actual transaction cost structure for the account; if per-trade
      costs exceed $1, adjust the threshold upward to avoid harvesting trivial losses that generate a net negative tax benefit
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default loss_threshold of $1 may trigger harvests for positions with unrealized losses below transaction
      costs, creating a net loss after accounting for brokerage fees and spread
    derived_from_bd_id: BD-024
  - id: finance-C-160
    when: When configuring account filters for tax loss harvesting or gain minimization modules
    action: Verify that account_sortkey values for all accounts intended for tax optimization follow the ^[01] regex pattern;
      accounts not matching this pattern are silently excluded from TLH and minimizegains processing
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Misconfigured account sortkey values silently exclude accounts from tax optimization, causing missed harvesting
      opportunities or suboptimal tax recommendations
    derived_from_bd_id: BD-026
  - id: finance-C-162
    when: When processing recent purchases for wash sale detection across substantially identical funds
    action: Deduplicate recent purchases by ticker symbol before wash sale window check — when multiple substantially identical
      funds hold the same underlying position, only one ticker entry per underlying asset should appear in wash sale analysis
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without ticker-level deduplication, wash sale detection produces false positives triggering multiple overlapping
      wash sale windows for the same underlying position across different fund wrappers
    derived_from_bd_id: BD-030
  - id: finance-C-164
    when: When implementing lot selection logic for gain minimization or tax-aware rebalancing
    action: Sort candidate lots by estimated tax percentage in ascending order — lots with lowest tax impact must be prioritized
      for sale before lots with higher tax impact to minimize tax liability
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect lot sort order causes suboptimal tax outcomes where higher-tax-impact lots are sold before lower-tax-impact
      lots, increasing actual tax liability beyond the minimum achievable
    derived_from_bd_id: BD-042
  - id: finance-C-165
    when: When filtering ticker lists for active TLH group analysis and recommendations
    action: Exclude archived tickers from TLH group analysis — archived tickers represent positions no longer held and generating
      sales recommendations for them produces recommendations users cannot act upon
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Including archived tickers in TLH recommendations causes the system to suggest selling positions that no
      longer exist in the portfolio, creating confusion and wasted analysis effort
    derived_from_bd_id: BD-018
  - id: finance-C-167
    when: When persisting data to cache files (e.g., pickle files) in production
    action: Assume immutable append-only semantics with timestamp and hash chaining exist in the framework — the framework
      lacks these capabilities and pickle cache files can be arbitrarily modified
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without immutable append-only semantics, cache files can be arbitrarily modified which breaks the audit trail
      and compromises data integrity verification in production environments
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-168
    when: When implementing data persistence operations that require audit trail
    action: Implement immutable append-only semantics for each data write operation, including timestamp and hash chaining
      to create verifiable audit trail — use append-only logging with SHA-256 hash of previous entry
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without timestamp and hash chaining, cache modifications cannot be detected or audited, making it impossible
      to verify data integrity or trace unauthorized changes in production
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-169
    when: When processing datetime values throughout the system
    action: Assume consistent timezone handling across all operations; the framework has mixed timezone handling, with some
      timestamps having tzinfo and others without
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Mixed timezone handling causes subtle datetime comparison bugs where timestamps with and without timezone
      info are compared incorrectly, leading to incorrect scheduling, reporting, or calculation errors
    derived_from_bd_id: BD-GAP-007
  - id: finance-C-170
    when: When handling timestamps throughout the system
    action: Normalize all datetime operations to UTC and use timezone-aware datetime objects consistently; apply UTC normalization
      at data ingestion and verify that all stored timestamps have explicit timezone information
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without UTC normalization, datetime comparisons across different system components produce incorrect results,
      causing wrong order execution times, misplaced transaction records, and incorrect NAV calculations
    derived_from_bd_id: BD-GAP-007
  - id: finance-C-171
    when: When implementing asset allocation calculation logic
    action: Bucket unallocated percentages (when metadata sums to less than 100%) into an 'unknown' category rather than silently
      scaling or rejecting the configuration
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without unknown bucket handling, unallocated percentages are silently misallocated or cause configuration
      rejection, leading to incorrect portfolio allocation calculations and misleading performance attribution
    derived_from_bd_id: BD-033
  - id: finance-C-172
    when: When calculating NAV scaling ratios
    action: Limit the NAV scaling calculation to the 10 most recent historical ratio observations; use a sliding window
      that shrinks to the available count when fewer than 10 ratios exist, and apply the fallback behavior when zero ratios are
      available
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using more than 10 historical ratios introduces stale data drift into NAV calculations, causing scaled NAV
      values to diverge from current market conditions and leading to incorrect performance measurement
    derived_from_bd_id: BD-071
  - id: finance-C-173
    when: When modifying ScaledNAV or RelateTickers class hierarchy
    action: Verify that any changes to build_commodity_groups() method or metadata format (a__equivalents, a__substidenticals)
      account for impact on both ScaledNAV and RelateTickers classes — use composition over inheritance or explicit interface
      contracts to decouple if changes are frequent
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Tight coupling through inheritance means changes to build_commodity_groups() or metadata format silently
      affect both classes, potentially causing unexpected behavior in scaled NAV calculations or ticker relationship analysis
    derived_from_bd_id: BD-098
  - id: finance-C-174
    when: When using the framework's expense ratio conversion logic for backtesting or display calculations
    action: Verify that expense ratio values are provided in decimal format before the system multiplies by 100; if source
      data is already in percentage format, divide by 100 before caching to avoid inflated values
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: If expense ratios are sourced in percentage format (e.g., 0.5 for 0.5%) but the cache conversion multiplies
      by 100, the stored value becomes 50%, causing strategy cost calculations to overestimate impact by 100x and making
      expense-adjusted returns appear artificially depressed
    derived_from_bd_id: BD-063
  - id: finance-C-176
    when: When implementing hierarchical balance aggregation with recursive subtree computation
    action: Validate account hierarchy for circular parent references before running balance aggregation; implement cycle
      detection or use iterative depth-first traversal with visited tracking
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Circular parent references in the account hierarchy cause infinite recursion during subtree balance computation,
      leading to stack overflow crashes and complete backtesting failure with no partial results returned
    derived_from_bd_id: BD-068
  - id: finance-C-178
    when: When implementing or refactoring gain term classification logic for short-term vs long-term capital gains
    action: Use python-dateutil relativedelta for holding period date arithmetic to ensure an accurate IRS 1-year-and-a-day boundary
      calculation; do NOT use simple day-counting (365 days), which fails when the holding period spans a leap day (Feb 29)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using simple 365-day arithmetic incorrectly classifies gains on lots whose holding period spans a leap day, causing wrong
      short-term vs long-term gain determination that leads to tax miscalculation and potential IRS compliance issues
    derived_from_bd_id: BD-072
  - id: finance-C-179
    when: When configuring or validating the TLH loss threshold parameter
    action: Set loss_threshold to a negative value — the threshold must be non-negative; negative values would incorrectly
      include profits or zero-change positions as harvestable losses
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Negative loss_threshold allows harvesting of positions without actual losses, generating wash sale complications
      and transaction costs without tax benefit
    derived_from_bd_id: BD-074
  - id: finance-C-180
    when: When implementing or refactoring TLH lot filtering logic
    action: Filter candidate lots using 'losses < -loss_threshold' comparison — not 'losses <= -loss_threshold' or 'losses
      < loss_threshold' which alter the qualifying boundary
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using <= instead of < includes zero-change lots; using positive comparison flips the logic entirely, harvesting
      profits instead of losses — both cause financial harm
    derived_from_bd_id: BD-074
  - id: finance-C-181
    when: When implementing or refactoring tax burden interpolation between bracket boundaries
    action: 'Handle edge cases for proceeds outside bracket range: use floor ratio for proceeds below lowest bracket, use
      ceiling ratio for proceeds above highest bracket, and return zero tax for zero proceeds'
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Omitting boundary handling causes undefined interpolation results for out-of-range proceeds values, producing
      incorrect tax estimates that could lead to underpayment or overpayment
    derived_from_bd_id: BD-075
  - id: finance-C-182
    when: When configuring or validating the cash drag minimum threshold parameter
    action: Set min_threshold to a negative value — the threshold must be non-negative; negative values would incorrectly
      flag profits or unchanged positions as cash drag
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Negative threshold causes incorrect flagging of profitable or neutral positions for liquidation, generating
      unnecessary transaction costs and potentially realizing gains that create tax liability
    derived_from_bd_id: BD-080
  - id: finance-C-183
    when: When implementing or refactoring cash drag position filtering logic
    action: Filter positions using 'position >= min_threshold' comparison — positions exactly equal to min_threshold are included,
      not excluded
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Using > instead of >= excludes borderline positions at exactly the threshold, reducing the pool of harvestable
      lots and potentially missing tax optimization opportunities
    derived_from_bd_id: BD-080
  - id: finance-C-184
    when: When implementing or refactoring average tax rate calculation
    action: Guard against division by zero when cumulative_proceeds equals zero — return zero average rate or raise explicit
      exception; this edge case cannot occur in valid tax scenarios but must be defensively handled
    severity: high
    kind: domain_rule
    modality: must
    consequence: Division by zero crashes the tax calculation module, causing backtest pipeline failure and preventing portfolio
      tax efficiency analysis
    derived_from_bd_id: BD-076
  - id: finance-C-185
    when: When implementing or refactoring marginal tax rate calculation between tax brackets
    action: Detect and handle zero Δ_proceeds between brackets, which yields an undefined (infinite) marginal rate; return a sentinel
      value or skip to the next valid bracket range
    severity: high
    kind: domain_rule
    modality: must
    consequence: Division by zero from zero Δ_proceeds crashes the tax calculation module, causing backtest pipeline failure
      and preventing lot-selection optimization
    derived_from_bd_id: BD-077
  - id: finance-C-186
    when: When implementing or refactoring TLH lot selection ordering logic
    action: Sort TLH candidate lots in ascending order of estimated tax percentage to prioritize harvesting highest-tax-impact
      losses first; use a stable sort so that ties preserve the original list order
    severity: high
    kind: domain_rule
    modality: must
    consequence: Sorting in descending order harvests lowest-tax-impact losses first, wasting limited harvestable slots on
      small losses while missing opportunities to harvest larger losses that provide greater tax savings
    derived_from_bd_id: BD-078
  - id: finance-C-187
    when: When implementing risk calculation or option pricing modules in backtesting
    action: Assume the framework handles volatility model family selection and distribution assumption — the framework does
      not implement volatility model families (e.g., GARCH, EWMA, historical volatility) or distribution selection (normal,
      t-distribution, skew-normal)
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without explicit volatility model and distribution selection, the framework may use inappropriate assumptions
      causing systematic mispricing of risk measures by 10-30% for strategies with option positions or volatility-dependent
      signals
    derived_from_bd_id: BD-GAP-004
  - id: finance-C-188
    when: When calculating factor IC (Information Coefficient) in cross-sectional backtesting
    action: Assume the framework handles factor IC demeaning and group alignment — the framework does not implement cross-sectional
      IC demeaning (subtracting cross-sectional mean) or proper group alignment before IC computation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without IC demeaning, cross-sectional mean returns contaminate factor IC calculations, causing 5-15% systematic
      bias in IC estimates and leading to incorrect factor selection decisions
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-189
    when: When implementing a new subclass of FavaInvestorAPI or extending the API layer
    action: 'Implement every required method: get_commodity_value, get_cost_basis, get_open_amounts; duck typing
      provides no compile-time enforcement, so missing methods are not detected until runtime'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing or misspelled method signatures cause runtime AttributeError when the framework invokes expected
      interface methods, breaking financial calculations and potentially corrupting portfolio valuations
    derived_from_bd_id: BD-019
  - id: finance-C-190
    when: When implementing iterative financial calculations or optimization algorithms
    action: Assume the framework handles convergence automatically without explicit criteria — missing convergence criteria
      means no defined stopping point for iterative calculations
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without explicit convergence criteria, iterative calculations may not terminate reliably, causing either
      infinite loops consuming CPU or premature termination with inaccurate results that silently propagate through financial
      computations
    derived_from_bd_id: BD-GAP-001
  - id: finance-C-191
    when: When implementing any iterative calculation or convergence-dependent algorithm in financial modules
    action: 'Define explicit convergence parameters: max_iterations (required), tolerance (required), and a convergence_check
      callable — document these as class attributes or constructor parameters'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Unbounded iterative calculations may run indefinitely in edge cases, causing system hangs or producing inaccurate
      financial results that accumulate without obvious warning
    derived_from_bd_id: BD-GAP-001
  - id: finance-C-193
    when: When classifying assets into portfolio buckets for asset allocation analysis
    action: Only recognize commodities with metadata prefix 'asset_allocation_*' for bucketing — do not hardcode commodity
      names, use implicit patterns, or implement alternative classification logic
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect asset classification changes portfolio allocation percentages, causing misaligned rebalancing decisions
      that over-weight or under-weight asset classes relative to investment policy targets
    derived_from_bd_id: BD-001
  - id: finance-C-194
    when: When calculating gain holding period for tax-loss harvesting or capital gains classification
    action: Use dateutil.relativedelta for date arithmetic in gain_term calculation — relativedelta correctly handles leap
      year edge cases (Feb 29 + 1 year = Feb 28), matching IRS interpretation of 'greater than 1 year'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using simple 365-day arithmetic misclassifies gains near leap years (e.g., Feb 29 purchase date), applying
      incorrect tax rates and causing unexpected tax liability when short-term gains are incorrectly taxed at long-term rates
      or vice versa
    derived_from_bd_id: BD-008
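  # Illustrative sketch for finance-C-194. This is an example only, not the framework's code, and it
  # assumes python-dateutil is installed. A lot bought on 2019-03-01 shows how the two approaches can disagree:
  #   from datetime import date, timedelta
  #   from dateutil.relativedelta import relativedelta
  #   bought = date(2019, 3, 1)
  #   bought + timedelta(days=365)                # date(2020, 2, 29), one day short of the calendar anniversary
  #   bought + relativedelta(years=1)             # date(2020, 3, 1)
  #   date(2020, 2, 29) + relativedelta(years=1)  # date(2021, 2, 28), the clamping described in the action above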
  - id: finance-C-195
    when: When implementing or maintaining version-specific Fava compatibility logic
    action: Duplicate version detection logic across different modules (FavaInvestorAPI and other workers) — use a unified
      version detection utility with a single source of truth for version thresholds
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Duplicated version thresholds create silent inconsistencies where query results differ based on which module's
      version logic is used, causing backtest results to diverge from live trading behavior when Fava releases new versions
    derived_from_bd_id: BD-115
  - id: finance-C-196
    when: When using AccAPI.root_tree() or designing new AccAPI functionality
    action: Import fava.core.Tree in beancountinvestorapi.py — the standalone Beancount API should not depend on Fava internals
      to maintain portability across different Beancount deployments
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: Fava dependency leak means AccAPI fails when Fava is not installed, preventing use of the 'pure' Beancount
      API in environments without Fava and indicating incomplete separation of concerns that could cause API failures
    derived_from_bd_id: BD-101
  - id: finance-C-197
    when: When implementing or modifying API methods (build_price_map, realize, query_func, get_commodity_directives) in FavaInvestorAPI
      or AccAPI
    action: Verify method signatures and return values remain identical between FavaInvestorAPI and AccAPI implementations
      - any divergence breaks cross-context compatibility and causes consumers using the wrong context to receive incorrect
      data or silent failures
    severity: high
    kind: domain_rule
    modality: must
    consequence: If FavaInvestorAPI and AccAPI implementations diverge on any method (e.g., different parameter handling,
      missing fields, or changed return types), code switching between Fava and CLI contexts will receive inconsistent results
      without explicit errors
    derived_from_bd_id: BD-093
  - id: finance-C-198
    when: When implementing or modifying account field extraction logic in TLH or minimizegains modules
    action: Use libtlh.get_account_field(options) as the sole extraction function - do not duplicate logic or extract similar
      functionality inline; changes to this function affect both TLH and minimizegains simultaneously
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Duplicating account field extraction logic in either TLH or minimizegains creates divergence where one module
      may use stale parsing rules while the other uses updated logic, corrupting tax optimization recommendations
    derived_from_bd_id: BD-095
  - id: finance-C-203
    when: When implementing asset allocation calculations in the asset_allocation_by_class stage
    action: Remove empty accounts (accounts with no holdings) and zero-balance ancestor accounts from asset allocation calculations
      — only include accounts with actual positive balances
    severity: high
    kind: domain_rule
    modality: must
    consequence: Including empty accounts in asset allocation percentages distorts the allocation results, causing portfolio
      displays to show incorrect positions and potentially leading to poor investment decisions based on misleading allocation
      data
    derived_from_bd_id: BD-035
  - id: finance-C-204
    when: When implementing currency conversion logic for multi-currency portfolios in asset_allocation_by_class
    action: Convert currencies via operating currencies as an intermediate step when direct conversion is unavailable; implement
      fallback routing through common currencies (USD, EUR) so that conversions complete even when direct currency pairs
      lack pricing data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without currency conversion fallback, multi-currency portfolios with unavailable direct conversion pairs
      will fail to calculate, causing portfolio valuation errors and preventing asset allocation from completing for valid
      international portfolios
    derived_from_bd_id: BD-037
  - id: finance-C-205
    when: When using the framework's default cash detection for cash drag calculations in the cash_drag stage
    action: Verify that operating currencies are included as cash holdings by default — check that base currencies held in
      accounts are captured in cash calculations unless explicitly excluded in configuration
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default inclusion of operating currencies as cash affects cash drag detection accuracy; strategies relying
      on accurate cash position data may miscalculate idle cash and miss optimization opportunities
    derived_from_bd_id: BD-039
  - id: finance-C-206
    when: When configuring cash drag detection in the cash_drag stage
    action: Verify that the default regex pattern '^Assets' matches the portfolio's actual account naming convention — customize
      the accounts_pattern parameter if the portfolio uses non-standard account naming structures
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using the default '^Assets' pattern without verification causes cash drag detection to scan incorrect accounts,
      potentially missing cash positions in non-standard account structures and producing incomplete cash drag analysis
    derived_from_bd_id: BD-040
  - id: finance-C-208
    when: When using minimizegains strategy for tax optimization
    action: Preserve the leap-year-aware date arithmetic in libtlh.gain_term() using relativedelta — verify that any refactoring
      maintains the same relativedelta-based date calculations for tax term classification (short-term vs long-term)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Incorrect leap-year handling in gain_term() corrupts short/long-term tax classification, causing minimizegains
      to make suboptimal liquidation decisions based on wrong holding period calculations and users to pay higher taxes than
      backtested
    derived_from_bd_id: BD-109
  - id: finance-C-209
    when: When configuring cross-account wash sale detection
    action: Verify the accuracy of substantially identical fund classification (BD-017) before relying on cross-account wash sale
      detection; confirm each fund pair is correctly marked as equivalent or substantially identical before using BD-030 deduplication
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Incorrectly marking substantially identical funds as equivalent causes cross-account wash sale detection
      to either miss real violations (false negatives) or trigger false blocks (false positives), leading to either IRS penalties
      or unnecessarily restricted trading
    derived_from_bd_id: BD-112
  - id: finance-C-210
    when: When implementing cash commodity detection logic in asset allocation
    action: Check asset_allocation_Bond_Cash metadata equals exactly 100 to classify an instrument as cash commodity
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using a different threshold or metadata field for cash classification causes incorrect asset allocation,
      potentially leading to inappropriate portfolio rebalancing decisions or misrepresentation of actual cash positions
    derived_from_bd_id: BD-038
  - id: finance-C-211
    when: When implementing API calls for CLI mode batch operations
    action: Default end_date to the current date in CLI mode as GUI mode does; set end_date to None in CLI mode so that date filtering is disabled
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Defaulting end_date to current date in CLI mode breaks batch operations that need to process all historical
      data, causing incomplete analysis when users run scripts against full historical datasets
    derived_from_bd_id: BD-053
  - id: finance-C-212
    when: When implementing configuration extraction for the investor API
    action: Extract configuration from fava-extension custom directives in the beancount file, not from separate config files
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using separate config files instead of beancount directives breaks configuration version control and portability,
      causing configuration drift between environments and loss of audit trail
    derived_from_bd_id: BD-054
  - id: finance-C-213
    when: When using the framework's default tax rate parameters for capital gains calculations
    action: Verify that default short_term_tax_rate=1% and long_term_tax_rate=1% match the user's actual tax bracket, adjusting
      if the user falls in higher brackets where actual rates may be 15%, 20%, or 37%
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Default 1% tax rates significantly underestimate tax liability for most investors, causing backtested portfolio
      values to appear materially higher than actual results after tax settlement
    derived_from_bd_id: BD-041
  - id: finance-C-214
    when: When implementing inventory aggregation logic for tax lot tracking
    action: Use sequential accumulation into a single Inventory object, not vectorized operations, to preserve lot-level detail
      for tax calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using vectorized or parallel accumulation loses lot-level granularity, making tax lot tracking incomplete
      and causing incorrect tax reporting for accounts with multiple position lots
    derived_from_bd_id: BD-081
  - id: finance-C-215
    when: When implementing inventory aggregation for large portfolios
    action: Handle numeric type overflow for cumulative inventory sums exceeding standard integer/float bounds - use Decimal
      type or explicit overflow detection
    severity: high
    kind: domain_rule
    modality: must
    consequence: Unchecked overflow in inventory accumulation silently wraps or truncates values, causing portfolio totals
      to become incorrect and triggering wrong tax lot calculations for large positions
    derived_from_bd_id: BD-081
  - id: finance-C-216
    when: When implementing security clustering for commodity-equivalence groups
    action: Use Union-Find data structure with path compression and union by rank for near-constant-time group operations;
      do not replace with dict-based or list-based grouping
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Naive dict-based or list-based grouping can cost O(n) per merge, causing quadratic slowdown when processing portfolios with thousands
      of securities; Union-Find provides near-constant-time operations essential for tax-lot matching
    derived_from_bd_id: BD-083
  - id: finance-C-217
    when: When implementing Union-Find for commodity grouping
    action: 'Handle edge cases: self-unions (union(A,A)) must be idempotent, isolated securities form singleton groups, circular
      dependencies must resolve correctly via algorithm'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing edge case handling causes incorrect group membership, leading to wrong commodity-equivalence classification
      and broken tax-lot matching across related securities
    derived_from_bd_id: BD-083
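  # Illustrative sketch for finance-C-216 and finance-C-217. The class name and layout are hypothetical,
  # not the framework's implementation. It uses path halving (a form of path compression) and union by rank:
  #   class UnionFind:
  #       def __init__(self):
  #           self.parent, self.rank = {}, {}
  #       def find(self, x):
  #           self.parent.setdefault(x, x)          # an unseen security starts as a singleton group
  #           self.rank.setdefault(x, 0)
  #           while self.parent[x] != x:
  #               self.parent[x] = self.parent[self.parent[x]]   # path halving
  #               x = self.parent[x]
  #           return x
  #       def union(self, a, b):
  #           ra, rb = self.find(a), self.find(b)
  #           if ra == rb:
  #               return                            # covers union(A, A) and already-linked (circular) pairs
  #           if self.rank[ra] < self.rank[rb]:
  #               ra, rb = rb, ra
  #           self.parent[rb] = ra                  # attach the shallower tree under the deeper one
  #           if self.rank[ra] == self.rank[rb]:
  #               self.rank[ra] += 1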
  - id: finance-C-218
    when: When selecting representative tickers from commodity-equivalence groups
    action: Implement deterministic tie-breaking logic for groups with multiple valid candidates; empty groups must produce
      no representative, single-element groups must trivially select that element
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-deterministic or missing tie-breaking causes inconsistent reporting across runs, making audit trails
      unreliable and potentially causing incorrect tax reporting when different representatives are selected on each execution
    derived_from_bd_id: BD-085
  - id: finance-C-219
    when: When distributing position amounts into allocation buckets
    action: Validate that the sum of bucket meta_value weights equals 100 for complete coverage; a meta_value of 0 distributes nothing,
      and a meta_value > 100 would overallocate and should be rejected
    severity: high
    kind: domain_rule
    modality: must
    consequence: Unvalidated bucket weights causing overallocation or underallocation distorts asset categorization, leading
      to incorrect risk reporting and potentially wrong portfolio allocation decisions
    derived_from_bd_id: BD-088
  - id: finance-C-220
    when: When implementing bucket allocation distribution formula
    action: 'Use the formula: amount * (meta_value / 100); do not reorder or use alternative distribution methods that could
      cause floating-point precision errors with small meta_value percentages'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Alternative formulas or incorrect operator precedence cause miscalculated bucket distributions that silently
      miscategorize assets, corrupting portfolio analysis reports
    derived_from_bd_id: BD-088
  - id: finance-C-221
    when: When calculating asset allocation percentages
    action: Pre-validate that portfolio total is non-zero before invoking (balance / total) * 100; callers must check total
      > 0 or handle division by zero explicitly
    severity: high
    kind: domain_rule
    modality: must
    consequence: Division by zero on empty portfolios produces invalid results that propagate to reporting dashboards, causing
      misleading percentage displays and potentially triggering automated rebalancing on zero balances
    derived_from_bd_id: BD-067
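  # Illustrative sketch for finance-C-221. The helper name allocation_pct is hypothetical, not a framework function:
  #   from decimal import Decimal
  #   def allocation_pct(balance: Decimal, total: Decimal) -> Decimal:
  #       if total <= 0:
  #           raise ValueError("portfolio total must be positive before computing percentages")
  #       return balance / total * 100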
  - id: finance-C-222
    when: When implementing TLH partner inference logic in the tax loss harvesting system
    action: 'Apply symmetric closure iteratively until fixed point: if A→(B,C) then B→(A,C) and C→(A,B) must be true; empty
      partner sets must remain empty; large groups may create many inferred edges and require iteration limits'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect symmetric closure causes wash sale violations to go undetected in complex multi-fund scenarios
      where funds share common holdings; the IRS wash sale rule disallows loss deductions when substantially identical securities
      are purchased within 30 days before or after the sale
    derived_from_bd_id: BD-084
  - id: finance-C-223
    when: When implementing price ratio calculation (MF_price / ETF_price) in scaled NAVs processing
    action: Validate ETF_price is non-zero before division; handle missing ETF price on a date by excluding that observation;
      flag or investigate very high ratios as potential data issues
    severity: high
    kind: domain_rule
    modality: must
    consequence: Division by zero crashes the calculation pipeline; unhandled missing ETF prices create data gaps in NAV estimation;
      very high ratios indicate stale prices or data corruption causing incorrect valuation
    derived_from_bd_id: BD-087
  - id: finance-C-224
    when: When constructing asset allocation tree hierarchies in libassetalloc.py
    action: Verify that the first operating currency used as root node denomination is valid and available; implement fallback
      logic to select an alternative currency (e.g., portfolio base currency, USD) when the first operating currency is empty,
      null, or invalid
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Hardcoded first operating currency with no fallback causes silent failures or crashes when the currency list
      is empty or the first entry is invalid, leading to missing allocation reports and inability to track portfolio performance
    derived_from_bd_id: BD-103
  - id: finance-C-225
    when: When setting up or configuring operating_currencies for portfolio analysis
    action: Validate that operating_currencies is non-empty and that its first currency is valid before any portfolio analysis; implement
      explicit error handling or default fallback (e.g., USD) if empty
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Empty operating_currencies list causes silent cascading failures across all downstream calculations (TLH,
      cash drag, allocation, minimizegains) that depend on the base currency invariant, with no graceful degradation and no
      user-facing error message
    derived_from_bd_id: BD-107
  - id: finance-C-226
    when: When initializing the TLH module with default loss_threshold parameter
    action: Verify loss_threshold default value (1 in code vs 0 in example config) matches expected behavior; set
      the value explicitly so that harvesting behavior is consistent across initialization paths
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Inconsistent loss_threshold defaults cause context-dependent TLH behavior — code default=1 filters out penny-level
      losses while config default=0 harvests all losses, leading to different recommendations depending on initialization
      path
    derived_from_bd_id: BD-111
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-078 / Portfolio Management CLI Entry Point
    version: v5.3
    intent_keywords:
    - portfolio management
    - CLI
    - command line
    - tax optimization
    - investment analysis
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (3 distinct values, balanced distribution)
      groups:
      - group_id: live_trading
        name: Live Trading
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-101
          name: Portfolio Management CLI Entry Point
          short_description: Provides a unified command-line interface for portfolio management operations including tax loss
            harvesting, asset allocation analysis, cash drag dete
          sample_triggers:
          - portfolio management
          - CLI
          - command line
        - uc_id: UC-103
          name: Tax-Optimized Selling Strategy
          short_description: Determines optimal sell order for securities to minimize realized capital gains by analyzing
            cost basis and holding periods across multiple lots
          sample_triggers:
          - minimize gains
          - tax-efficient selling
          - capital gains optimization
        - uc_id: UC-105
          name: Tax Loss Harvesting Opportunity Detection
          short_description: Identifies securities with unrealized losses that can be sold to harvest tax losses, typically
            looking back 30 days to find positions eligible for was
          sample_triggers:
          - tax loss harvesting
          - loss identification
          - wash sale
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 1
        ucs:
        - uc_id: UC-102
          name: Related Ticker Grouping Utility
          short_description: Identifies and groups equivalent or substitutable securities (e.g., VTI, VTSAX, VTSMX) based
            on metadata annotations to support tax lot management and
          sample_triggers:
          - equivalent tickers
          - related securities
          - commodity grouping
      - group_id: reporting
        name: Reporting
        description: ''
        emoji: 📋
        uc_count: 1
        ucs:
        - uc_id: UC-104
          name: Asset Allocation Analysis
          short_description: 'Calculates and reports portfolio allocation breakdown by asset type (stocks, bonds, cash, etc.)
            with percentage distributions from investment account '
          sample_triggers:
          - asset allocation
          - portfolio breakdown
          - asset class distribution
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try portfolio management cli entry point
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try related ticker grouping utility
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try tax-optimized selling strategy
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 5 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Tax-Optimized Selling Strategy
    - Related Ticker Grouping Utility
    - Portfolio Management CLI Entry Point
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

Empyrical Risk Metrics

Skill

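Computes portfolio risk metrics, including annualized return, Sharpe ratio, Sortino ratio, maximum drawdown, and Calmar ratio; supports rolling-window statistics and NaN handling, and works with multi-market data.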

---
name: empyrical-risk-metrics
description: |-
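  Computes portfolio risk metrics, including annualized return, Sharpe ratio, Sortino ratio, maximum drawdown, and Calmar ratio; supports rolling-window statistics and NaN handling, and works with multi-market data.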
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-107"
  compiled_at: "2026-04-22T13:00:51.147425+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
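# Investment Risk Metrics (empyrical-risk-metrics)

> Computes portfolio risk metrics, including annualized return, Sharpe ratio, Sortino ratio, maximum drawdown, and Calmar ratio; supports rolling-window statistics and NaN handling, and works with multi-market data.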

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (3 total)

### Sphinx Documentation Build Configuration (`UC-101`)
Configuring Sphinx to automatically generate API documentation from docstrings and source code comments for the empyrical library
**Triggers**: sphinx configuration, documentation build, autodoc setup

### Documentation Deployment Automation (`UC-102`)
Automating the process of cleaning, building, and deploying Sphinx documentation to a hosting platform for the empyrical project
**Triggers**: documentation deployment, automated deployment, CI/CD documentation

### Advanced Sphinx Documentation Source Setup (`UC-103`)
Configuring advanced Sphinx extensions including autodoc filtering, numpydoc integration, and markdown support for comprehensive documentation generat
**Triggers**: sphinx extensions, numpydoc, autodoc filtering

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

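- **`AP-ZVT-183`**: An inf/NaN ex-rights factor goes straight into the multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor values to drift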

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-107. Evidence verify ratio = 45.3% and audit fail total = 21. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-107` blueprint at 2026-04-22T13:00:51.147425+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Advanced Sphinx Documentation Source Setup
  - Documentation Deployment Automation
  - Sphinx Documentation Build Configuration
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

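### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>

When several models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without reinitializing the dataset means every model effectively uses the same set of prediction signals. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.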

Source: https://github.com/microsoft/qlib/issues/1930

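### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

A Qlib DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the model training range). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.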

Source: https://github.com/microsoft/qlib/issues/2090
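
The coupling described above can be checked before any training starts. The following is a minimal sketch, assuming a flattened, hypothetical config dictionary; only the key names `fit_start_time`, `fit_end_time`, and `segments.train` come from the text, and the helper `check_fit_window` is not part of Qlib.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple[str, str]) -> None:
    """Raise if the normalizer would be fitted on data after the train segment."""
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include valid/test data"
        )


# Hypothetical, flattened layout used only for illustration.
config = {
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "segments": {
        "train": ("2008-01-01", "2014-12-31"),
        "valid": ("2015-01-01", "2016-12-31"),
        "test": ("2017-01-01", "2020-08-01"),
    },
}

# Passes silently because fit_end_time equals the end of the train segment.
check_fit_window(config["fit_end_time"], config["segments"]["train"])
```

Running the check next to the dataset configuration keeps the two independently configured ranges from drifting apart unnoticed.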

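### `AP-QLIB-2036` — MACD factor formula is wrong in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in the official Qlib documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs markedly from the correct formula after cross-sectional standardization, and its IC drops. Formula errors at the documentation level like this get copied verbatim into production factor libraries by many users.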

Source: https://github.com/microsoft/qlib/issues/2036

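### `AP-QLIB-2184` — Custom A-share data imported without filling suspension days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>

By Qlib convention, the open/close/high/low/volume/factor fields should all be NaN on suspension days so the framework can identify and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price-momentum factors show "false signals" during the suspension (the price does not change but the factor is nonzero). Qlib does not validate this convention, so the error flows silently into the training data.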

Source: https://github.com/microsoft/qlib/issues/2184
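
One way to apply the convention before import is sketched below with plain pandas. It assumes that a suspension day can be recognized by zero or missing volume, which is an assumption about the exported data, not a rule stated by Qlib; the helper name `mask_suspended_days` is hypothetical.

```python
import numpy as np
import pandas as pd

# Fields named in the convention above.
PRICE_FIELDS = ["open", "close", "high", "low", "volume", "factor"]


def mask_suspended_days(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where suspension-day rows hold NaN in every price field.

    Assumption: a row with zero or missing volume is a suspension day.
    """
    out = df.copy()
    suspended = out["volume"].isna() | (out["volume"] == 0)
    cols = [c for c in PRICE_FIELDS if c in out.columns]
    out[cols] = out[cols].astype("float64")  # integer columns cannot hold NaN
    out.loc[suspended, cols] = np.nan
    return out


bars = pd.DataFrame(
    {
        "open": [10.0, 10.2, 10.2],
        "close": [10.2, 10.2, 10.5],
        "high": [10.3, 10.2, 10.6],
        "low": [9.9, 10.2, 10.1],
        "volume": [1000, 0, 1200],  # middle row carries yesterday's price
        "factor": [1.0, 1.0, 1.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
cleaned = mask_suspended_days(bars)
```

Rows that traded are left untouched, so the mask can be run repeatedly on the same file without changing the result.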

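### `AP-QLIB-1892` — PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. This function depends on the Eastmoney API, which often returns only part of the list rather than all 5000+ stocks, and the function raises ValueError directly when the list is incomplete. If users follow the documented steps, the financial dataset covers only some stocks, and backtests based on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).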

Source: https://github.com/microsoft/qlib/issues/1892

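### `AP-QLIB-2097` — Full-market instrument="all" runs out of memory on a 32GB machine, while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. The memory footprint of instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by about 16x. On a 32GB machine, a full-market run hits OOM at the init_instance_by_config stage, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.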

Source: https://github.com/microsoft/qlib/issues/2097

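### `AP-QLIB-1984` — LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to decide whether the task is multi-label, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper down. Users attempting custom multi-label tasks cannot trace the error message back to this root cause.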

Source: https://github.com/microsoft/qlib/issues/1984
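
The dimensionality point can be reproduced with pandas alone. The sketch below is not Qlib's code: the column names `LABEL0`/`LABEL1` and the helper `label_array` are hypothetical, and the helper only illustrates a check that fails loudly instead of silently.

```python
import pandas as pd

labels = pd.DataFrame({"LABEL0": [0.1, -0.2, 0.3], "LABEL1": [0.0, 0.1, -0.1]})

single = labels["LABEL0"]             # a Series: its values are always 1-D
multi = labels[["LABEL0", "LABEL1"]]  # a DataFrame: its values are 2-D

single_ndim = single.values.ndim
multi_ndim = multi.values.ndim


def label_array(y):
    """Hypothetical helper: return a 1-D label array or raise a clear error."""
    arr = y.to_numpy()
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.squeeze(axis=1)
    raise ValueError(f"expected a single label column, got shape {arr.shape}")
```

Checking the shape of the object that still carries both columns, rather than a Series already pulled out of it, is what makes the multi-label case detectable.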

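### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary directly) and DataHandler/DataHandlerLP (with a processor pipeline). When custom A-share CSV data is dumped with dump_bin, if the field order or symbol format (e.g. 600000.SH vs SH600000) does not match Qlib conventions, DataHandler's processors trigger a Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.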

Source: https://github.com/microsoft/qlib/issues/1915

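### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler uses joblib to load features in parallel, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is that D.features calls raise multiply nested exceptions, and users cannot tell from the stack trace whether the problem lies in the parallel backend or in the data.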

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

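### `AP-VNPY-3691` — Bar generator's first bar has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the timestamp of the first bar it pushes is "the minute of the current tick" rather than "the end time of a complete N-minute period". Concretely, a 09:59 tick triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters directly on datetime.minute % 5 in on_bar, the first bar happens to pass but contains less than one full period of data, and using it for signal computation produces a wrong entry signal.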

Source: https://github.com/vnpy/vnpy/issues/3691

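### `AP-VNPY-3669` — Alpha module incremental history save hits SchemaError because old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When saving bar data to Parquet files, the vnpy Alpha module concatenates newly downloaded data (which may contain Float64 columns) with the stored file (historical Int64 columns) directly via polars.concat. Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (e.g. volume is an integer in some feeds and a float in others), and there is no schema alignment step before the concat. This affects every historical data build workflow that uses vnpy alpha for backtesting.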

Source: https://github.com/vnpy/vnpy/issues/3669

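### `AP-VNPY-3685` — Spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running), so part of the backtest engine's logic does not run but no exception is raised, and it returns backtest statistics that look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.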

Source: https://github.com/vnpy/vnpy/issues/3685

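### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy version is downgraded and breaks other dependencies <sub>(medium)</sub>

vnpy's install.bat installs directly into the system/conda base environment and forcibly downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer versions of scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.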

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

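### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and tutorial charts mislead users about strategy returns <sub>(high)</sub>

The Zipline tutorial uses an AAPL price chart for its demo, but the bundle stores unadjusted prices (raw price) rather than prices adjusted for splits and dividends. The historical prices shown in the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price rose 4x" for strategy returns. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data, drawing technical indicators into false breakout signals on ex-rights days.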

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

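### `AP-ZIPLINE-235` — Default fills at the current bar's close underestimate live slippage and inflate backtest returns <sub>(high)</sub>

After a signal fires on the current bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading, a signal can only be filled near the next bar's open (T+1 order execution). Taking A-share daily bars as an example, backtesting at the close overestimates daily returns by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. You need to explicitly set the slippage model to VolumeShareSlippage or FixedSlippage with a reasonable volume_limit.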

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
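
The difference between the two fill assumptions can be seen in a few lines of pandas. This is an illustrative sketch with made-up prices, not Zipline's slippage model; it only shows how shifting the signal by one bar moves the fill from the signal bar's close to the next bar's open.

```python
import pandas as pd

bars = pd.DataFrame(
    {"open": [10.0, 10.4, 10.1, 10.8], "close": [10.3, 10.2, 10.7, 10.9]},
    index=pd.date_range("2024-01-02", periods=4, freq="B"),
)

# A buy signal that is only known after each bar has closed.
signal = pd.Series([True, False, True, False], index=bars.index)

# Optimistic assumption: fill at the close of the bar that produced the signal.
same_bar_fill = bars["close"].where(signal)

# Next-bar execution: delay the signal by one bar and fill at that bar's open.
next_bar_fill = bars["open"].where(signal.shift(1, fill_value=False))
```

Here the first signal fills at 10.3 under the same-bar assumption but at 10.4 with next-bar execution, which is the kind of gap the paragraph above describes.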

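### `AP-ZIPLINE-190` — Calendar start_session set to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (e.g. New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last trading day before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.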

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

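### `AP-ZIPLINE-181` — After the asset db goes stale, Pipeline reports "no assets traded", misleading users into checking the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted/newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.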

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

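### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's start-of-week/end-of-week logic, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation of the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long consecutive holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot discover the missed schedule from the logs.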

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

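### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it triggers AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the deep stack makes it hard to locate. A-share developers importing data in local CST time hit this trap very easily and need to call tz_localize('UTC') explicitly when registering the bundle.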

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
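
A short pandas sketch of making timestamps UTC-aware at the entry point follows. It assumes "local CST" means the `Asia/Shanghai` zone; whether a naive value should be labeled as UTC directly or localized to the market zone and then converted depends on what the stored wall-clock times mean. The guard `require_utc` is hypothetical and not part of Zipline.

```python
import pandas as pd

# A session date with no time of day: label it as UTC directly.
naive = pd.Timestamp("2020-01-01")
start_utc = naive.tz_localize("UTC")

# Intraday wall-clock times recorded in the market's local zone:
# attach the local zone first, then convert to UTC.
local_index = pd.DatetimeIndex(["2020-01-02 15:00", "2020-01-03 15:00"])
utc_index = local_index.tz_localize("Asia/Shanghai").tz_convert("UTC")


def require_utc(ts: pd.Timestamp) -> pd.Timestamp:
    """Hypothetical entry-point guard: reject naive timestamps early."""
    if ts.tzinfo is None:
        raise ValueError(f"naive timestamp {ts!r}: localize it before use")
    return ts.tz_convert("UTC")
```

Failing at the boundary with a clear message is cheaper than tracing the assertion from deep inside the algorithm run.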

## zvt (6)

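### `AP-ZVT-183` — An inf/NaN ex-rights factor goes straight into the multiplication, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (first day of a new listing, or missing data) the factor is inf; when kdata.open itself is None (an unfilled suspension day) the multiplication raises TypeError. The result is that the adjustment computation for the whole entity is interrupted and all subsequent K-lines are lost, but the main flow only logs an ERROR and does not stop, so users often do not know that data for many stocks is already corrupted.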

Source: https://github.com/zvtvz/zvt/issues/183
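
A guard for the two failure modes above (zero denominator and `None` prices) might look like the following. It is a standalone pandas sketch, not ZVT's implementation; the function name `safe_adjust_factor` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def safe_adjust_factor(new_price: pd.Series, old_price: pd.Series) -> pd.Series:
    """Compute new/old, returning NaN wherever the ratio is undefined."""
    new = pd.to_numeric(new_price, errors="coerce")  # None becomes NaN
    old = pd.to_numeric(old_price, errors="coerce")
    factor = new / old.where(old != 0)               # zero denominators become NaN
    return factor.replace([np.inf, -np.inf], np.nan)


new = pd.Series([10.0, 5.0, None, 8.0], dtype="object")
old = pd.Series([20.0, 0.0, 4.0, 8.0], dtype="object")

factor = safe_adjust_factor(new, old)

# Rows that need attention are reported explicitly instead of being lost.
bad_rows = factor[factor.isna()].index.tolist()
```

Collecting the undefined rows lets the caller decide whether to skip a bar or stop the run, rather than discovering missing K-lines later.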

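### `AP-ZVT-179` — Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch K-lines for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query count limit (error: "已超过每日最大查询数量", i.e. the daily maximum query count has been exceeded). ZVT catches the exception and moves on to the next entity, so after the limit is hit, that day's data for all remaining stocks goes silently missing. If a backtest uses this incomplete database, factor results carry a systematic bias with no warning.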

Source: https://github.com/zvtvz/zvt/issues/179

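### `AP-ZVT-161` — Full-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable-count ceiling. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.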

Source: https://github.com/zvtvz/zvt/issues/161
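
Batched querying can be sketched with the standard-library `sqlite3` module. The table name `kdata`, its columns, and the batch size of 500 are assumptions made for this example; this is not ZVT's query code.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(
    conn: sqlite3.Connection, entity_ids: Sequence[str], batch_size: int = 500
) -> List[Tuple[str, float]]:
    """Run one IN (...) query per batch so no statement exceeds the variable limit."""
    rows: List[Tuple[str, float]] = []
    for batch in chunked(entity_ids, batch_size):
        placeholders = ",".join("?" for _ in batch)
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(batch)).fetchall())
    return rows


# In-memory demo table with more IDs than the 999-variable default limit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(1200)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])

result = query_in_batches(conn, ids)
```

Keeping the batch size well under 999 leaves room for any additional bound parameters in the same statement.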

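### `AP-ZVT-129` — Wildcard imports hide API version changes, and enums such as AdjustType vanish unexpectedly <sub>(medium)</sub>

ZVT documentation examples use `from zvt import *` to import every symbol. After a ZVT version upgrade refactors enums (e.g. moving AdjustType into a submodule), the wildcard import no longer includes that symbol and an AttributeError results. Users assume it is an installation problem, when in fact it is an API breaking change between versions that was not flagged in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.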

Source: https://github.com/zvtvz/zvt/issues/129

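### `AP-ZVT-187` — Backtest engine does not stop early on an empty data-layer result, causing a cascading null-pointer crash <sub>(medium)</sub>

After load_data completes and the data turns out to be empty, ZVT Trader does not exit early; it passes the empty DataFrame to the selector computation, triggering a chain of NoneType operation crashes downstream. The stack trace is deep and the root cause hard to locate, so users assume it is a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.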

Source: https://github.com/zvtvz/zvt/issues/187
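
An early exit of the kind the paragraph above calls for can be sketched as a small guard placed right after the data load. The exception class `EmptyDataError` and the helper `require_rows` are hypothetical names for this example and do not exist in ZVT.

```python
import pandas as pd


class EmptyDataError(ValueError):
    """Raised when the data layer returns nothing for the requested window."""


def require_rows(df, start: str, end: str) -> pd.DataFrame:
    """Stop immediately if no data was loaded, naming the likely cause."""
    if df is None or df.empty:
        raise EmptyDataError(
            f"no rows loaded for {start}..{end}; check that the window lies "
            "inside the date range the database actually covers"
        )
    return df


# An empty frame, as a load outside the covered date range would produce.
loaded = pd.DataFrame(columns=["entity_id", "timestamp", "close"])
try:
    require_rows(loaded, "2030-01-01", "2030-12-31")
    stopped_early = False
except EmptyDataError:
    stopped_early = True
```

The error message points at the time window, which is the root cause named above, instead of surfacing as a NoneType failure several calls later.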

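### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor values to drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users computing price-momentum or moving-average factors mix two tables (e.g. unadjusted for the moving average and backward-adjusted for returns), which makes factor values jump around ex-rights dates. ZVT does no cross-table consistency check on adjustment type, so mixing passes silently.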

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-107--empyrical-reloaded
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 35, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [data_ingestion_&_utilities](components/data_ingestion_-_utilities.md): 8 classes
- [return_computation](components/return_computation.md): 5 classes
- [risk_metrics](components/risk_metrics.md): 6 classes
- [performance_metrics](components/performance_metrics.md): 6 classes
- [factor_analysis](components/factor_analysis.md): 7 classes
- [performance_attribution](components/performance_attribution.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 114
  fatal_constraints_count: 30
  non_fatal_constraints_count: 163
  use_cases_count: 3
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

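- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at the close on the same day; (2) stamping an indicator computed after the day-T close onto the same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after the day-T close and fill at the day T+1 open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions should be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow chronological order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (e.g. a momentum strategy that "takes profit at the high") is plain look-ahead, because the intraday high/low is only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Data alignment offsets (off-by-one): pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in code, and verify key time-alignment relationships with assert.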
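- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio found across trials rises with the number of backtests, so the best observed Sharpe overstates true performance. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.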
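- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates the strategy's historical returns. The backtest universe must use point-in-time snapshots (a point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in sample; out-of-sample data is used only for final validation. Do not keep tuning after repeatedly "peeking" at out-of-sample data (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspensions/missing data: suspension-day prices must not simply be forward-filled with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation. Filter out missing trading days explicitly at the factor computation layer and do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it into factor computation without cleaning produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.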
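- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; zero-cost defaults are not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often eat 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to the cost assumption: every 10 bps change in cost can flip the strategy's profit/loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must include a capital constraint: the backtest should simulate actual share counts (rounded to whole lots) under a fixed capital amount rather than assume fractional shares can be held. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.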
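- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone unification: when merging multiple data sources, mixing UTC with local time is a common source of data corruption. Convert all timestamps to a single timezone at the pipeline entry (UTC for storage and the market's local timezone for display are recommended), and do not mix timezones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, or spurious rows are created on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after that date. Re-downloading existing data and appending it creates rows with duplicate timestamps and breaks time ordering in the backtest. Verify before and after the update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Choose a passive benchmark the strategy can actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark's holding return.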
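- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Summing log returns understates drawdown depth (because log returns come out smaller than simple returns during declines).
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Different systems use different defaults; confirm the annualization factor before comparing across systems, or the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar ratio / Sortino ratio are better risk-adjusted return metrics than the Sharpe ratio: Sharpe assumes normally distributed returns, while return distributions in A-share/crypto markets are markedly left-skewed (fat-tailed), so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should be decomposed into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".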
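- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = IC / std(IC)). |IC| > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and is unsuitable for a monthly rebalancing strategy, and vice versa. Using the wrong holding period severely hurts live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (rather than the traditional 1.96); or independent replication across periods/markets; or a clear economic rationale. Factors meeting none of these conditions are very likely products of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, net IC after transaction costs may be negative. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor's decay half-life is the core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate during the holding period.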
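- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns get mixed with industry rotation returns, making it hard to tell whether the factor itself or industry exposure drove the returns. The industry neutralization operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection systematically tilts toward small caps and returns come from cap exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values, which distort quantile analysis (e.g. Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or clip with MAD (median absolute deviation) before ranking/neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors with Gram-Schmidt or PCA before combining to remove multicollinearity among them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaN in factor computation (suspensions/new listings/data gaps) with the cross-sectional mean introduces look-ahead bias (the mean itself contains future information); dropping the rows entirely produces survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude that stock for that day.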
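- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom spread) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric rather than the long-only return. The "monotonicity" test of long Q1 / short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> valid on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull/bear/range-bound markets) is important evidence of robustness. A factor whose IC holds only in one particular market regime is unsuitable for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks whose factor rank sits near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate large, pointless transaction costs. Set a rebalancing buffer zone at selection time: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a quantile return spread (Q1-Q5 spread) may be due to chance even if it is large in historical data, and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest samples (< 3 years) are especially unreliable.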
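- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability check: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures/crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Temporal stability of factor validity: factors that once worked gradually fail because of market learning and arbitrage (McLean & Pontiff 2016 show an average 58% decay after publication). Re-evaluate factor IC periodically (quarterly/yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the rate cycle, business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's best operating environment.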
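
The performance constraints `SHARED-BT-PERF-001` to `SHARED-BT-PERF-003` can be illustrated with a small pandas sketch. It is written from the definitions in the list above, not taken from the empyrical library; the function names, the zero risk-free rate, the zero downside target, and the choice to treat NaN returns as "no observation" are all assumptions of this example.

```python
import numpy as np
import pandas as pd


def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough loss of the compounded net-value series."""
    net_value = (1.0 + returns.fillna(0.0)).cumprod()  # NaN day: value unchanged
    running_peak = net_value.cummax()
    return float((net_value / running_peak - 1.0).min())


def annual_return(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Compound growth rate scaled to one year of observed periods."""
    r = returns.dropna()
    total_growth = float((1.0 + r).prod())
    return total_growth ** (periods_per_year / len(r)) - 1.0


def sharpe_ratio(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Per-period Sharpe times sqrt(periods_per_year); risk-free rate assumed 0."""
    r = returns.dropna()
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year))


def sortino_ratio(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Like Sharpe, but the denominator uses only returns below zero."""
    r = returns.dropna()
    downside = np.sqrt(np.mean(np.minimum(r.to_numpy(), 0.0) ** 2))
    return float(r.mean() / downside * np.sqrt(periods_per_year))


def calmar_ratio(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized return divided by the absolute max drawdown."""
    return annual_return(returns, periods_per_year) / abs(max_drawdown(returns))


daily = pd.Series([0.01, -0.02, np.nan, 0.015, -0.005, 0.02])

equity_sharpe = sharpe_ratio(daily, periods_per_year=252)
crypto_sharpe = sharpe_ratio(daily, periods_per_year=365)
```

The same return series yields two different Sharpe values depending only on `periods_per_year`, which is why the annualization factor has to be confirmed before comparing numbers across systems.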

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
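
Three of the locks above (`SL-03`, `SL-04`, `SL-05`) lend themselves to mechanical checks. The sketch below is an illustration only: the regular expression is one assumed reading of `entity_type_exchange_code`, and `SignalSketch` is a hypothetical stand-in, not ZVT's `TradingSignal` class.

```python
import re
from dataclasses import dataclass
from typing import Optional

import pandas as pd

# Assumed reading of SL-03: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID_RE = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9]+$")


def check_entity_id(entity_id: str) -> None:
    """SL-03: reject IDs that are not entity_type_exchange_code."""
    if not ENTITY_ID_RE.match(entity_id):
        raise ValueError(f"bad entity id: {entity_id!r}")


def check_index(df: pd.DataFrame) -> None:
    """SL-04: require a MultiIndex named (entity_id, timestamp)."""
    if not isinstance(df.index, pd.MultiIndex) or list(df.index.names) != [
        "entity_id",
        "timestamp",
    ]:
        raise ValueError("index must be MultiIndex (entity_id, timestamp)")


@dataclass
class SignalSketch:
    """Hypothetical signal record used to illustrate SL-05."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        given = [
            v
            for v in (self.position_pct, self.order_money, self.order_amount)
            if v is not None
        ]
        if len(given) != 1:
            raise ValueError("set exactly one of position_pct, order_money, order_amount")


frame = pd.DataFrame(
    {"close": [10.2]},
    index=pd.MultiIndex.from_tuples(
        [("stock_sh_600000", pd.Timestamp("2024-01-02"))],
        names=["entity_id", "timestamp"],
    ),
)
check_entity_id("stock_sh_600000")
check_index(frame)
signal = SignalSketch(position_pct=0.2)
```

Running such checks before any backtest or trading code is generated matches the "halt on violation" behavior the locks specify.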

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **3**

## `KUC-101`
**Source**: `docs/conf.py`

Configuring Sphinx to automatically generate API documentation from docstrings and source code comments for the empyrical library

## `KUC-102`
**Source**: `docs/deploy.py`

Automating the process of cleaning, building, and deploying Sphinx documentation to a hosting platform for the empyrical project

## `KUC-103`
**Source**: `docs/source/conf.py`

Configuring advanced Sphinx extensions including autodoc filtering, numpydoc integration, and markdown support for comprehensive documentation generation

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

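## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies and data sources in one cerebro.run(). zvt's StockTrader currently binds only one set of factors per instantiation and lacks a unified multi-strategy orchestration layer; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.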

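## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with cerebro.addanalyzer(SharpeRatio).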

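## `CW-BT-003` — Sizer: separated position sizing
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic; it ships FixedSize, PercentSizer, and others, and users can define their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.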

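## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market fills and lack limit orders and combined-order simulation; for high-frequency or live-trading integration, fuller order types would greatly improve backtest realism.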

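## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (e.g. 1 min) into a higher timeframe (e.g. 1 day) in real time and drive strategies in sync, or replay tick by tick to simulate how the OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests across any combination of time granularities without recording the data again.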

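## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy's net-value curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.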

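## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow of data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso/LightGBM/MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.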

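## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), which eliminates look-ahead bias in backtests. zvt currently uses the reporting period as the timestamp for financial data and lacks a "publication date" dimension, so stock selection can be biased by financial-report data from the future; introducing the PIT pattern would greatly improve backtest credibility.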
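
A minimal sketch of the point-in-time idea, assuming each financial record carries a publication date alongside its reporting period. The `FinancialRecord` class, its `publish_date` field, and the `as_of` helper are hypothetical names used for illustration; they are not part of qlib or zvt.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FinancialRecord:
    entity_id: str
    report_period: date  # fiscal period the figures describe
    publish_date: date   # day the figures became public (hypothetical field)
    roe: float


def as_of(records: List[FinancialRecord], entity_id: str, t: date) -> Optional[FinancialRecord]:
    """Return the latest report for entity_id that was already public on day t."""
    visible = [r for r in records if r.entity_id == entity_id and r.publish_date <= t]
    if not visible:
        return None
    # Prefer the most recent fiscal period; a later publication of the same
    # period (a revision) wins over the earlier one.
    return max(visible, key=lambda r: (r.report_period, r.publish_date))


records = [
    FinancialRecord("stock_sh_600000", date(2023, 12, 31), date(2024, 4, 25), 0.09),
    FinancialRecord("stock_sh_600000", date(2024, 3, 31), date(2024, 4, 28), 0.10),
]
```

A selection step that runs on 2024-04-26 would call `as_of(records, "stock_sh_600000", date(2024, 4, 26))` and see only the 2023 annual report, because the first-quarter figures were not yet public on that day.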

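## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically record the hyperparameters, features, metrics, and predictions of every model training run and support cross-experiment comparison and model version management. zvt currently lacks an ML experiment tracking mechanism, and each rerun overwrites the previous results; adopting the Recorder pattern would persist the parameters and results of every factor experiment, supporting quick reproduction and version comparison.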

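## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run in combination, enabling optimal intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently make only daily-bar fill assumptions and do not model execution algorithms; adopting the nested framework would let zvt separate two questions: "which stocks to hold and when" and "how to build the position with minimal market impact".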

FILE:references/components/data_ingestion_-_utilities.md
# data_ingestion_&_utilities (8 classes)

## `roll`
`data_ingestion_&_utilities/roll.py:0`

## `_create_unary_vectorized_roll_function`
`data_ingestion_&_utilities/create-unary-vectorized-roll-function.py:0`

## `_create_binary_vectorized_roll_function`
`data_ingestion_&_utilities/create-binary-vectorized-roll-function.py:0`

## `_aligned_series`
`data_ingestion_&_utilities/aligned-series.py:0`

## `get_fama_french`
`data_ingestion_&_utilities/get-fama-french.py:0`

## `up`
`data_ingestion_&_utilities/up.py:0`

## `down`
`data_ingestion_&_utilities/down.py:0`

## `nan_aggregation`
`data_ingestion_&_utilities/nan-aggregation.py:0`

FILE:references/components/factor_analysis.md
# factor_analysis (7 classes)

## `alpha_beta`
`factor_analysis/alpha-beta.py:0`

## `alpha_aligned`
`factor_analysis/alpha-aligned.py:0`

## `beta_aligned`
`factor_analysis/beta-aligned.py:0`

## `capture`
`factor_analysis/capture.py:0`

## `up_capture`
`factor_analysis/up-capture.py:0`

## `down_capture`
`factor_analysis/down-capture.py:0`

## `beta_fragility_heuristic`
`factor_analysis/beta-fragility-heuristic.py:0`

FILE:references/components/performance_attribution.md
# performance_attribution (3 classes)

## `perf_attrib`
`performance_attribution/perf-attrib.py:0`

## `compute_exposures`
`performance_attribution/compute-exposures.py:0`

## `attribution_model`
`performance_attribution/attribution-model.py:0`

FILE:references/components/performance_metrics.md
# performance_metrics (6 classes)

## `sharpe_ratio`
`performance_metrics/sharpe-ratio.py:0`

## `sortino_ratio`
`performance_metrics/sortino-ratio.py:0`

## `omega_ratio`
`performance_metrics/omega-ratio.py:0`

## `calmar_ratio`
`performance_metrics/calmar-ratio.py:0`

## `annual_return`
`performance_metrics/annual-return.py:0`

## `risk_free_rate`
`performance_metrics/risk-free-rate.py:0`

FILE:references/components/return_computation.md
# return_computation (5 classes)

## `simple_returns`
`return_computation/simple-returns.py:0`

## `cum_returns`
`return_computation/cum-returns.py:0`

## `cum_returns_final`
`return_computation/cum-returns-final.py:0`

## `aggregate_returns`
`return_computation/aggregate-returns.py:0`

## `annualization_factor`
`return_computation/annualization-factor.py:0`

FILE:references/components/risk_metrics.md
# risk_metrics (6 classes)

## `max_drawdown`
`risk_metrics/max-drawdown.py:0`

## `drawdown_series`
`risk_metrics/drawdown-series.py:0`

## `annual_volatility`
`risk_metrics/annual-volatility.py:0`

## `downside_risk`
`risk_metrics/downside-risk.py:0`

## `tail_ratio`
`risk_metrics/tail-ratio.py:0`

## `tail_distribution_model`
`risk_metrics/tail-distribution-model.py:0`

Edgar Crawler

Skill

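Bulk-download annual reports (10-K) and quarterly reports (10-Q) of listed companies from SEC EDGAR, with quarterly incremental updates and local caching; suited to data acquisition for US-equity fundamental analysis and quantitative research.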

---
name: edgar-crawler
description: |-
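  Bulk-download annual reports (10-K) and quarterly reports (10-Q) of listed companies from SEC EDGAR, with quarterly incremental updates and local caching; suited to data acquisition for US-equity fundamental analysis and quantitative research.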
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-114"
  compiled_at: "2026-04-22T13:00:54.950360+00:00"
  capability_markets: "multi-market"
  capability_activities: "data-sourcing"
  sop_version: "crystal-compilation-v6.1"
---
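# EDGAR Filing Crawler (edgar-crawler)

> Bulk-download annual reports (10-K) and quarterly reports (10-Q) of listed companies from SEC EDGAR, with quarterly incremental updates and local caching; suited to data acquisition for US-equity fundamental analysis and quantitative research.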

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (1 total)

### SEC EDGAR Filing Extraction (`UC-101`)
Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from compressed ZIP archives for downstream financial analysis and document processing workflows.
**Triggers**: EDGAR, SEC filings, 10-K extraction

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-114. Evidence verify ratio = 32.9% and audit fail total = 29. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
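| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project wisdom worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |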

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-114` blueprint at 2026-04-22T13:00:54.950360+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['SEC EDGAR Filing Extraction', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-070--edgartools (2)

### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>

Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.

### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>

SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.

## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>

Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.

## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>

SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.
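
As a hedged illustration, the sketch below attaches a declared identity to a request before anything is sent. The helper name `build_sec_request`, the email check, and the placeholder identity string are assumptions made for this example; the exact format SEC expects should be confirmed against its current guidance.

```python
import re
import urllib.request

# Loose pattern: something@something.tld (illustrative, not a full validator).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def build_sec_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries a declared identity."""
    if not EMAIL_RE.search(user_agent):
        raise ValueError("User-Agent should include a contact email address")
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_sec_request(
    "https://www.sec.gov/Archives/edgar/full-index/2024/QTR1/master.idx",
    "Example Research admin@example.com",  # placeholder identity, replace with your own
)
```

Rejecting a missing contact address locally surfaces the problem before the first request, rather than as a 403 from the server.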

## finance-bp-079--akshare (4)

### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>

HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.

### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>

Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
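
One way to make both failure modes explicit is sketched below. `parse_payload` and `EmptyPayloadError` are hypothetical names, and treating `null`, `{}`, and `[]` as errors is an assumption that fits endpoints which should always return records.

```python
import json
from typing import Any


class EmptyPayloadError(ValueError):
    """Raised when a response parses correctly but carries no data."""


def parse_payload(text: str) -> Any:
    """Parse a JSON body, rejecting malformed and empty payloads explicitly."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Re-raise with context so the caller can log where parsing failed.
        raise ValueError(f"malformed JSON at position {exc.pos}") from exc
    if data is None or data == {} or data == []:
        raise EmptyPayloadError("response contained no records")
    return data
```

Callers can then catch `EmptyPayloadError` separately from other `ValueError` cases and decide whether an empty result is expected for that endpoint.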

### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>

Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.

### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>

Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
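
A small guard along these lines can catch the mismatch before a DataFrame is built. The pairing of raw codes to readable names in `COLUMN_NAMES` is hypothetical and only illustrates the shape of such a mapping; `label_row` is likewise an illustrative helper.

```python
from typing import Dict, List

# Hypothetical pairing of raw field codes to readable names, for illustration only.
COLUMN_NAMES: Dict[str, str] = {"f12": "code", "f2": "latest_price"}


def label_row(raw_fields: List[str], row: list, names: Dict[str, str] = COLUMN_NAMES) -> dict:
    """Attach readable names to one response row, failing loudly on any mismatch."""
    unmapped = [f for f in raw_fields if f not in names]
    if unmapped:
        raise ValueError(f"no readable name for raw fields: {unmapped}")
    if len(row) != len(raw_fields):
        raise ValueError(f"expected {len(raw_fields)} values, got {len(row)}")
    return {names[f]: value for f, value in zip(raw_fields, row)}
```

Checking both the field list and the row length means a new upstream column produces a clear error message instead of a misaligned table.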

## finance-bp-103--ArcticDB (3)

### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>

ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.

### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>

Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.

### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>

Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.

## finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>

8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
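
The date-based switch can be sketched as follows. The two regular expressions are illustrative assumptions about how item headings appear in filing text, and treating a filing dated exactly 2004-08-23 as using the new numbering is also an assumption; edgar-crawler's own patterns may differ.

```python
import re
from datetime import date

NEW_SCHEME_START = date(2004, 8, 23)  # cutoff date taken from the text above

# Obsolete scheme: "Item 1" .. "Item 12", not followed by a decimal part.
OBSOLETE_ITEM = re.compile(r"^item\s+(1[0-2]|[1-9])\b(?!\.\d)", re.IGNORECASE)
# New scheme: "Item 1.01" .. "Item 9.01".
NEW_ITEM = re.compile(r"^item\s+([1-9]\.\d{2})\b", re.IGNORECASE)


def item_pattern(filing_date: date) -> "re.Pattern[str]":
    """Pick the heading pattern that matches the filing's numbering scheme."""
    # Assumption: a filing dated exactly on the cutoff uses the new numbering.
    return NEW_ITEM if filing_date >= NEW_SCHEME_START else OBSOLETE_ITEM
```

Selecting the pattern from the filing date keeps the extractor from silently returning empty item sections for older filings.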

## finance-bp-128--yfinance (2)

### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>

Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.

### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>

Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-114--edgar-crawler
**Scan date**: 2026-04-22
**Stats**: {'total_files': 4, 'total_classes': 16, 'total_functions': 0, 'total_stages': 4}

## Modules (4)

- [index_download_stage](components/index_download_stage.md): 4 classes
- [crawl_and_download_stage](components/crawl_and_download_stage.md): 3 classes
- [document_parsing_stage](components/document_parsing_stage.md): 8 classes
- [logging_infrastructure](components/logging_infrastructure.md): 1 class

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 92
  fatal_constraints_count: 31
  non_fatal_constraints_count: 139
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (16)

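- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limit + exponential backoff retry: every external data API call must apply rate-limit control and exponential backoff retry (Exponential Backoff with Jitter). Retrying immediately after a 429/503 response is an anti-pattern that adds load on the server and triggers IP bans. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must cap concurrency (max_workers) and must not run with unlimited parallelism. Free APIs (akshare / tushare free tier) are usually limited to 1-3 concurrent requests; paid APIs also have concurrency caps (tushare is points-based, and different point levels allow different concurrency). Exceeding the concurrency limit triggers 429 responses or IP bans. Control it explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token / credential security: data-source API keys (a tushare token; akshare needs no token, but other commercial data sources do) must not be hard-coded and must be read from environment variables or a configuration file. A hard-coded token committed to Git leads to token leakage and financial loss.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should insert a minimum interval between requests (some akshare endpoints require ≥ 0.5s; the tushare free tier allows 200 calls per minute). A plain sleep in code is less precise than the token bucket algorithm; mature libraries such as ratelimit or slowapi are recommended.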
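- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspension days: a suspended stock has no trade data during the suspension, so date gaps appear in the database. Do not forward-fill the missing dates (that creates fake trading volume); instead mark the rows with is_suspended=True, set volume and turnover to 0, and keep the previous day's closing price. Factor computation must filter out rows where is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: Historical data boundary for newly listed stocks: a new stock appears in the database from its first trading day and has no history before listing. If a factor's lookback window exceeds the number of days since listing, every factor value is NaN. Collection should record each stock's listing date (list_date) and start from that date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data integrity for delisted stocks: mainstream data sources (akshare/tushare) still return history for delisted stocks (the history before delisting), but there is no data after the delisting date. The historical stock universe must include delisted stocks (otherwise survivorship bias results), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data (such as the closing price) fetched from different sources (akshare/tushare/baostock) can differ slightly (different price-adjustment methods, holiday handling, or ex-dividend adjustment timing). The pipeline should run cross-source reconciliation checks and, when a difference exceeds a threshold (e.g. 0.1%), log an alert for manual confirmation.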
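- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp rather than varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; enforce the conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Distinguishing trading time from calendar time: the "date" of daily-bar data usually refers to a trading day (day T), whereas the "time" of news/announcement data is calendar time. When merging the two, calendar time must be mapped to the next available trading day; otherwise a lookahead problem arises in which an announcement made on day T is treated as already available during day T's session.
- **`SHARED-DS-TIME-003`** <sub>(medium)</sub>: Daylight saving time (DST) handling: when collecting US/European equity data, on DST switch days (March/November) the same HH:MM maps to a different UTC time; if this is not handled, that day's time series drifts by 1 hour. Always store in UTC and convert to the market's local time zone for display.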
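- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data update scripts must be idempotent (repeated runs give the same result). If a script fails midway because of a network interruption, rerunning it must not create duplicate data or data gaps. Implementation: write to a temporary table first, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums / row-count checks): after every data update, check the integrity of key fields: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A data pipeline without automatic validation is the root cause of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: the output of the data pipeline should be version-managed. When a data source updates historical data (such as revised financial data), the old version should remain traceable and must not be silently overwritten, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that the data coverage of every stock/asset is complete and consistent with the trading calendar. Every stock should have one row per trading day (a suspension marker, not a missing row). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching strategy: frequently read static or rarely updated data (such as stock information, industry classification, and index constituents) should be cached locally to avoid repeated API calls on every run. The cache must have an expiry time (TTL) to prevent the use of outdated industry classifications or stale constituent lists.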
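
The idempotency requirement in `SHARED-DS-INCR-001` can be sketched with the standard-library `sqlite3` module. This is a simplified illustration: the `daily_bar` table and `upsert_bars` helper are hypothetical, and validation happens in Python before the write instead of in a separate staging table.

```python
import sqlite3
from typing import Iterable, Tuple

Bar = Tuple[str, str, float]  # (entity_id, trade_date, close)


def upsert_bars(conn: sqlite3.Connection, bars: Iterable[Bar]) -> None:
    """Write bars so that rerunning the same batch leaves the table unchanged."""
    rows = list(bars)
    # Validate the whole batch before touching the main table.
    if any(close <= 0 for _, _, close in rows):
        raise ValueError("close price must be positive")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_bar (
               entity_id  TEXT NOT NULL,
               trade_date TEXT NOT NULL,
               close      REAL NOT NULL,
               PRIMARY KEY (entity_id, trade_date)
           )"""
    )
    with conn:  # one transaction: either every row lands or none does
        conn.executemany(
            """INSERT INTO daily_bar (entity_id, trade_date, close)
               VALUES (?, ?, ?)
               ON CONFLICT (entity_id, trade_date)
               DO UPDATE SET close = excluded.close""",
            rows,
        )
```

Because the primary key is `(entity_id, trade_date)`, a rerun after a mid-batch failure overwrites rows that already landed and fills in the rest, so it produces neither duplicates nor gaps.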

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
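
A minimal check for this format might look like the sketch below. The regular expression is an assumption inferred from the examples elsewhere in this document (`stock_sh_600000`, `stockus_nasdaq_AAPL`); it is not taken from ZVT's source, and real entity codes may allow characters this pattern rejects.

```python
import re
from typing import Tuple

# Assumed shape: lowercase entity type, lowercase exchange, alphanumeric code.
ENTITY_ID_RE = re.compile(
    r"^(?P<entity_type>[a-z]+)_(?P<exchange>[a-z]+)_(?P<code>[A-Za-z0-9]+)$"
)


def parse_entity_id(entity_id: str) -> Tuple[str, str, str]:
    """Split an entity ID into (entity_type, exchange, code) or raise ValueError."""
    match = ENTITY_ID_RE.match(entity_id)
    if match is None:
        raise ValueError(f"entity ID does not follow entity_type_exchange_code: {entity_id!r}")
    return match.group("entity_type"), match.group("exchange"), match.group("code")
```

Running this check at the pipeline entry point turns a mistyped ID such as `600000.SH` into an immediate error instead of an empty query result later.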

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
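
The "exactly one" rule can be enforced at construction time, as in this sketch. `SizedSignal` is a hypothetical stand-in written for illustration and is not ZVT's `TradingSignal` class; only the three field names come from the lock above.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SizedSignal:  # hypothetical stand-in, not ZVT's TradingSignal
    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        # Count the sizing fields that were actually provided.
        given = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(given) != 1:
            raise ValueError(
                f"exactly one of {SIZING_FIELDS} is required, got {given or 'none'}"
            )
```

Comparing against `None` rather than truthiness keeps an explicit `position_pct=0.0` valid while still rejecting a signal that sets two sizing fields at once.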

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **1**

## `KUC-101`
**Source**: `tests/test_extract_items.py`

Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from compressed ZIP archives for downstream financial analysis and document processing workflows.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors prolongs the block. Separating retry logic for transient network errors (TimeoutError, ConnectionError) from permanent errors (ValueError, KeyError) prevents resource waste and avoids masking underlying bugs.
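
A sketch of this separation, using full-jitter exponential backoff. `RateLimitError` and `call_with_backoff` are hypothetical names; the default retry count, base, and cap are chosen to sit inside the ranges given in `SHARED-DS-RL-001`, and the injectable `sleep` and `rng` parameters exist only to make the behavior testable.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a caller raises when it sees HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 4,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry transient failures with jittered exponential delays."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except (RateLimitError, TimeoutError, ConnectionError):
            if attempt == max_retries:
                raise  # retries exhausted: surface the last transient error
            # Full jitter: wait a random time up to the capped exponential bound.
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Permanent errors such as `ValueError` or `KeyError` are not listed in the `except` clause, so they propagate on the first attempt without any delay.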

## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
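
A minimal sketch of strict validation plus normalization, assuming only the two formats named above are accepted. `normalize_date` is a hypothetical helper; it relies on `datetime.strptime` to reject impossible calendar dates such as 29 February in a non-leap year.

```python
import re
from datetime import datetime

# Accept exactly YYYY-MM-DD or YYYYMMDD; reject looser shapes such as 2024-4-3.
_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}|\d{8}")


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if it is not valid."""
    if not _DATE_SHAPE.fullmatch(text):
        raise ValueError(f"unsupported date format: {text!r}")
    fmt = "%Y-%m-%d" if "-" in text else "%Y%m%d"
    # strptime performs the calendar check (month length, leap years).
    return datetime.strptime(text, fmt).date().isoformat()
```

For example, `normalize_date("20240229")` yields `"2024-02-29"`, while `"2023-02-29"` raises because 2023 is not a leap year.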

## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.

## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing

Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.
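
The threshold switch might be sketched as below, using the standard library's incremental `HTMLParser` as a stand-in for whichever streaming parser a project actually uses. `TableCounter`, `count_tables`, and the table-counting task itself are illustrative assumptions; only the 10MB default comes from the text above.

```python
import io
from html.parser import HTMLParser
from typing import TextIO

STREAM_THRESHOLD_BYTES = 10 * 1024 * 1024  # 10MB default named in the text above


class TableCounter(HTMLParser):
    """Counts <table> start tags; stands in for real extraction logic."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(
    stream: TextIO,
    size_bytes: int,
    threshold: int = STREAM_THRESHOLD_BYTES,
    chunk_chars: int = 64 * 1024,
) -> int:
    parser = TableCounter()
    if size_bytes <= threshold:
        parser.feed(stream.read())  # small document: one in-memory pass
    else:
        # Large document: feed bounded chunks; the parser buffers any tag
        # that is split across a chunk boundary until the rest arrives.
        while chunk := stream.read(chunk_chars):
            parser.feed(chunk)
    parser.close()
    return parser.tables
```

Passing the size and threshold explicitly lets the caller decide from file metadata, before any content is loaded, which path to take.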

## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.

## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing

Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.

## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
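
A minimal sketch of checking the status before touching the body. `RateLimitError`, `HttpStatusError`, and `decode_response` are hypothetical names; the function takes a plain status code and body so it stays independent of any particular HTTP client.

```python
import json
from typing import Any


class RateLimitError(Exception):
    """Raised for HTTP 429 so callers can back off instead of retrying at once."""


class HttpStatusError(Exception):
    """Raised for any other non-2xx status."""


def decode_response(status: int, body: str) -> Any:
    """Validate the status code first, then parse the body as JSON."""
    if status == 429:
        raise RateLimitError("HTTP 429: back off before retrying")
    if not 200 <= status < 300:
        # Error pages are often HTML; never hand them to the JSON parser.
        raise HttpStatusError(f"unexpected HTTP status {status}")
    return json.loads(body)
```

Keeping the 429 case as its own exception type lets a retry wrapper treat rate limiting differently from a 404 or 500.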

## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.

FILE:references/components/crawl_and_download_stage.md
# crawl_and_download_stage (3 classes)

## `crawl`
`crawl_and_download_stage/crawl.py:0`

## `download`
`crawl_and_download_stage/download.py:0`

## `iXBRL URL handling`
`crawl_and_download_stage/ixbrl-url-handling.py:0`

FILE:references/components/document_parsing_stage.md
# document_parsing_stage (8 classes)

## `ExtractItems.extract`
`document_parsing_stage/extractitems-extract.py:0`

## `HtmlStripper.feed`
`document_parsing_stage/htmlstripper-feed.py:0`

## `determine_items_to_extract`
`document_parsing_stage/determine-items-to-extract.py:0`

## `parse_item`
`document_parsing_stage/parse-item.py:0`

## `get_10q_parts`
`document_parsing_stage/get-10q-parts.py:0`

## `remove_tables`
`document_parsing_stage/remove-tables.py:0`

## `items_to_extract`
`document_parsing_stage/items-to-extract.py:0`

## `skip_extracted_filings`
`document_parsing_stage/skip-extracted-filings.py:0`

FILE:references/components/index_download_stage.md
# index_download_stage (4 classes)

## `download_indices`
`index_download_stage/download-indices.py:0`

## `get_specific_indices`
`index_download_stage/get-specific-indices.py:0`

## `requests_retry_session`
`index_download_stage/requests-retry-session.py:0`

## `user_agent`
`index_download_stage/user-agent.py:0`

FILE:references/components/logging_infrastructure.md
# logging_infrastructure (1 class)

## `Logger.__init__`
`logging_infrastructure/logger-init.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-114-v5.3
  version: v6.1
  blueprint_id: finance-bp-114
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:54.950360+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  upgraded_from: finance-bp-114-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:30.751233+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-114--edgar-crawler/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-114--edgar-crawler/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-DATA-SOURCING-001
  title: Missing or invalid User-Agent headers for SEC API requests
  description: SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are
    rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this
    constraint as fundamental to any data retrieval operation.
  project_source: finance-bp-070--edgartools, finance-bp-114--edgar-crawler
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-002
  title: Ignoring external API rate limits causing IP blocking
  description: Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec,
    120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability.
    Immediate retry attempts during blocks extend the block duration significantly.
  project_source: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-003
  title: No HTTP timeout configuration causing indefinite hangs
  description: HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely
    on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating
    cascading failures across the system.
  project_source: finance-bp-079--akshare
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-004
  title: Invalid XBRL period types for balance sheet analysis
  description: Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration
    periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting
    financial calculations that depend on accurate period associations.
  project_source: finance-bp-070--edgartools
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-005
  title: Malformed or empty JSON responses causing silent failures
  description: Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream
    processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures
    producing empty DataFrames or misleading results in financial analysis.
  project_source: finance-bp-079--akshare
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-006
  title: Source-specific symbol mapping errors causing data corruption
  description: Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect
    symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records
    or entirely incorrect tickers being stored.
  project_source: finance-bp-079--akshare
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-007
  title: Using unsupported DataFrame types with time-series storage
  description: ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting
    to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data
    loss if not properly handled before storage operations.
  project_source: finance-bp-103--ArcticDB
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-008
  title: Non-atomic storage writes causing concurrent access corruption
  description: Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer
    access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data,
    breaking version chain integrity.
  project_source: finance-bp-103--ArcticDB
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-009
  title: Missing timezone-aware DatetimeIndex causing DST offset errors
  description: Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation
    when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions,
    corrupting historical price calculations.
  project_source: finance-bp-128--yfinance
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-010
  title: 8-K filing item numbering scheme mismatch for historical filings
  description: 8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using
    the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction
    failure for pre-2004 data.
  project_source: finance-bp-114--edgar-crawler
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-011
  title: Yahoo Finance missing crumb authentication causing 401/403 errors
  description: Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management,
    API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial
    data processing.
  project_source: finance-bp-128--yfinance
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-012
  title: Large document parsing without streaming causing OOM errors
  description: SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that
    crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme
    memory usage.
  project_source: finance-bp-070--edgartools
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-013
  title: Column mapping length mismatch causing DataFrame errors
  description: Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions
    during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact
    column count alignment.
  project_source: finance-bp-079--akshare
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-014
  title: Pruning snapshot-protected versions breaking point-in-time recovery
  description: Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots
    provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt
    to access data from specific snapshots.
  project_source: finance-bp-103--ArcticDB
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - data-sourcing
  _source_file: anti-patterns/data-sourcing.yaml
cross_project_wisdom:
- wisdom_id: CW-DATA-SOURCING-001
  source_project: finance-bp-079--akshare, finance-bp-114--edgar-crawler
  pattern_name: Exponential backoff retry with rate limit detection
  description: Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately
    on rate limit errors worsens the block. Separating retry logic for transient network errors (TimeoutError, ConnectionError)
    from permanent errors (ValueError, KeyError) prevents wasted resources and avoids masking underlying bugs.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-002
  source_project: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney
  pattern_name: Strict date format validation and standardization
  description: Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL
    or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt
    downstream financial calculations.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-003
  source_project: finance-bp-070--edgartools, finance-bp-114--edgar-crawler
  pattern_name: XBRL fact attribute completeness enforcement
  description: Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing
    attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration)
    must be correctly distinguished for accurate balance sheet rendering.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-004
  source_project: finance-bp-070--edgartools, finance-bp-128--yfinance
  pattern_name: Streaming parser threshold for large documents
  description: Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents
    OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data
    to prevent DST offset corruption.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-005
  source_project: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB
  pattern_name: Data accuracy disclaimer requirements
  description: Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays.
    Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can
    lead to user financial losses from reliance on delayed or incorrect data.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-006
  source_project: finance-bp-103--ArcticDB
  pattern_name: Atomic write ordering for versioned storage
  description: Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF).
    Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing
    incomplete data in multi-writer scenarios.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-007
  source_project: finance-bp-079--akshare, finance-bp-097--OpenBB
  pattern_name: HTTP status code validation before data processing
  description: Always validate HTTP response status codes before processing response data. Error responses (404, 500) may
    contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError
    for proper handling by callers.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-008
  source_project: finance-bp-084--eastmoney
  pattern_name: Quality gates for financial recommendations
  description: Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial
    recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses.
    Separate on-demand computation from scheduled pre-computation to handle API rate limits.
  applicable_to_activity: data-sourcing
  _source_file: cross-project-wisdom/data-sourcing.yaml
domain_constraints_injected:
- id: SHARED-DS-RL-001
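  statement: 'Rate limiting + exponential backoff retry: every external data API call must apply rate limiting and
    exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern that adds load
    on the server and can trigger an IP ban. Use at most 3-5 retries, a base delay of 1-2 seconds, and a maximum
    delay of 60 seconds.

    '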
  severity: fatal
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: all external API calls must implement exponential backoff retry with jitter
  evidence_refs:
  - type: community_validated
    ref: AWS guide to retry behavior best practices; akshare documentation on rate limits; tushare documentation on request frequency limits
    url: https://docs.aws.amazon.com/general/latest/gr/api-retries.html
  reference_code:
    bad_example: "# BAD: retries almost immediately with no backoff, which makes the 429s worse\nfor attempt in range(5):\n\
      \    try:\n        data = api.get(symbol)\n        break\n    except RateLimitError:\n\
      \        time.sleep(0.1)  # a 100 ms pause is effectively an immediate retry\n"
    good_example: "# GOOD: exponential backoff with jitter\n# RateLimitError stands for whatever exception the API client raises on HTTP 429\n\
      import random\nimport time\n\n\
      def fetch_with_retry(func, *args, max_retries=5, base_delay=1.0):\n\
      \    for attempt in range(max_retries):\n        try:\n            return func(*args)\n\
      \        except (RateLimitError, TimeoutError):\n            if attempt == max_retries - 1:\n\
      \                raise\n            delay = min(base_delay * (2 ** attempt), 60)\n\
      \            delay += random.uniform(0, delay * 0.1)  # up to +10% jitter\n            time.sleep(delay)\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-002
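  statement: 'Batch API calls must cap concurrency (max_workers); never parallelize without a limit. Free APIs (akshare,
    tushare free tier) typically allow 1-3 concurrent requests, and paid APIs also have concurrency caps (tushare uses
    a points system in which different point levels unlock different limits). Exceeding the limit triggers 429 responses
    or an IP ban. Control concurrency explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.

    '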
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: concurrent API calls must be bounded by explicit max_workers/semaphore
  evidence_refs:
  - type: community_validated
    ref: tushare documentation on points and frequency limits; akshare API documentation; MiniMax concurrency pitfall notes (Doramagic internal memory)
  reference_code:
    bad_example: "# BAD: no concurrency limit, triggers 429\nwith ThreadPoolExecutor() as executor:\n\
      \    results = list(executor.map(fetch_stock, stock_list))\n\
      \    # the default max_workers can spawn dozens of threads and trigger 429 right away\n"
    good_example: "# GOOD: cap concurrency explicitly (max_workers=2 suggested for the akshare free tier)\n\
      from concurrent.futures import ThreadPoolExecutor\n\nMAX_WORKERS = 2  # adjust to the API documentation\n\n\
      with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:\n\
      \    results = list(executor.map(fetch_stock, stock_list))\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-003
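  statement: 'API token / credential security: data source API keys (a tushare token, for example; akshare needs no token,
    but other commercial sources do) must not be hardcoded. Read them from environment variables or a configuration file.
    A hardcoded token committed to Git leaks the token and can incur charges.

    '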
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: API tokens must be loaded from environment variables, not hardcoded
  evidence_refs:
  - type: community_validated
    ref: tushare documentation on token management; GitHub Secret Scanning best practices
    url: https://tushare.pro/document/2
  reference_code:
    bad_example: '# BAD: hardcoded token, leaked once committed to Git

      ts.set_token(''abc123def456your_token_here'')

      pro = ts.pro_api()

      '
    good_example: "# GOOD: read the token from an environment variable\nimport os\n\nimport tushare as ts\n\n\
      token = os.environ.get('TUSHARE_TOKEN')\nif not token:\n\
      \    raise ValueError(\"TUSHARE_TOKEN environment variable not set\")\nts.set_token(token)\npro = ts.pro_api()\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-004
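  statement: 'Request throttling: batch requests to the same API should leave a minimum interval between calls (some akshare
    endpoints require at least 0.5 s; the tushare free tier allows 200 calls per minute). A plain sleep in code is less
    precise than a token bucket algorithm; prefer a mature library such as ratelimit or slowapi.

    '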
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: per-request minimum interval must be enforced between API calls
  evidence_refs:
  - type: community_validated
    ref: akshare official API documentation; Zhihu article on handling rate limits gracefully in quant data collection
    url: https://akshare.akfamily.xyz/
  reference_code:
    bad_example: "# BAD: a fixed sleep is imprecise and breaks down under high concurrency\nfor code in stock_list:\n\
      \    data = ak.stock_zh_a_hist(symbol=code)\n    time.sleep(0.1)  # may be too short, or too conservative\n"
    good_example: "# GOOD: precise control with the ratelimit decorators\nfrom ratelimit import limits, sleep_and_retry\n\n@sleep_and_retry\n\
      @limits(calls=200, period=60)  # tushare free tier: 200 calls per minute\n\
      def fetch_daily(code, start, end):\n    return ts.pro_bar(ts_code=code, start_date=start, end_date=end)\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-001
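  statement: 'Missing data on suspension days: a suspended stock has no trades during the suspension, so the database shows
    date gaps. Do not forward-fill the missing dates wholesale (that fabricates volume). Instead, mark the rows with
    is_suspended=True, fill volume and turnover with 0, and carry the previous close forward as the price. Factor
    computation must filter out rows where is_suspended=True.

    '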
  severity: high
  capability_tags:
    activities:
    - data-sourcing
    - backtesting
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
    - data_filtering
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: suspended trading days must be explicitly marked with is_suspended=True, not silently forward-filled
  evidence_refs:
  - type: community_validated
    ref: tushare documentation on the suspension flag of the daily endpoint; qlib documentation on suspended stock handling
    url: https://tushare.pro/document/2?doc_id=28
  reference_code:
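    bad_example: '# BAD: forward-fills suspension days, so volume keeps the previous nonzero value

      df = df.reindex(all_trading_days).fillna(method=''ffill'')

      # volume is filled with a nonzero value, so a suspension looks like normal trading

      '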
    good_example: "# GOOD: mark suspension days explicitly (assumes df is indexed by stock, date)\n\
      full_index = pd.MultiIndex.from_product(\n    [all_stocks, all_trading_days], names=['stock', 'date'])\n\
      df_full = df.reindex(full_index)\ndf_full['is_suspended'] = df_full['volume'].isna()\n\
      df_full['volume'] = df_full['volume'].fillna(0)\ndf_full['amount'] = df_full['amount'].fillna(0)\n# carry the last close forward within each stock, never across stocks\n\
      df_full['close'] = df_full.groupby(level='stock')['close'].ffill()\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-002
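  statement: 'Historical data boundary for newly listed stocks: a new stock appears in the database from its first trading
    day and has no history before listing. If a factor lookback window exceeds the number of days since listing, every
    factor value is NaN. Record the listing date (list_date) of each stock during collection, and start collection from
    the listing date rather than from a fixed start date.

    '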
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: data collection start date must be bounded by stock listing date, not a fixed start date
  evidence_refs:
  - type: community_validated
    ref: tushare stock_basic endpoint, list_date field; akshare stock_info_a_code_name endpoint
    url: https://tushare.pro/document/2?doc_id=25
  reference_code:
    bad_example: "# BAD: every stock starts at 2010-01-01, so new listings carry many NaN rows\nfor code in stock_list:\n    df = fetch(code, start='2010-01-01', end=today)\n"
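    good_example: "# GOOD: start collection at each listing date (stock_list holds ts_code values)\n\
      stock_info = pro.stock_basic(fields='ts_code,list_date').set_index('ts_code')  # includes list_date\n\
      for code in stock_list:\n    list_date = stock_info.loc[code, 'list_date']\n\
      \    df = fetch(code, start=list_date, end=today)\n"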
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-003
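  statement: 'Data completeness for delisted stocks: mainstream sources (akshare, tushare) still serve the pre-delisting
    history of delisted stocks, but nothing after the delisting date. A historical universe must include delisted stocks
    (otherwise it suffers from survivorship bias), and collection must explicitly handle the delisting date as the end boundary.

    '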
  severity: high
  capability_tags:
    activities:
    - data-sourcing
    - backtesting
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: delisted stocks must be included in historical universe; delist_date must be recorded
  evidence_refs:
  - type: community_validated
    ref: tushare stock_basic endpoint, delist_date field; qlib documentation on Delisted Stock Handling
    url: https://tushare.pro/document/2?doc_id=25
  reference_code:
    bad_example: '# BAD: collects only currently listed stocks and misses delisted ones

      stock_list = ts.get_stock_basics()  # currently listed stocks only

      '
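    good_example: "# GOOD: collect the full universe, delisted stocks included\n\
      fields = 'ts_code,name,list_date,delist_date'\n\
      listed = pro.stock_basic(exchange='', list_status='L', fields=fields)  # listed\n\
      delisted = pro.stock_basic(exchange='', list_status='D', fields=fields)  # delisted\n\
      full_universe = pd.concat([listed, delisted], ignore_index=True)\n"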
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-004
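  statement: 'Cross-source reconciliation: the same data (such as the closing price) fetched from different sources (akshare,
    tushare, baostock) may differ slightly because of different price adjustment methods, holiday handling, or ex-dividend
    adjustment timing. Add a cross-source reconciliation check to the pipeline; when a difference exceeds a threshold
    (for example 0.1%), log an alert and confirm manually.

    '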
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: when using multiple data sources, cross-source price reconciliation must be performed
  evidence_refs:
  - type: community_validated
    ref: Xueqiu quant community post on cross-source reconciliation for data quality; Zhihu article on quant data quality assurance
  reference_code:
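    bad_example: '# BAD: switches sources without reconciliation and silently absorbs the differences

      df_primary = akshare_fetch(code)

      df_backup = baostock_fetch(code)

      # if the primary source fails, the backup is used directly with no consistency check

      '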
    good_example: "# GOOD: reconcile two sources and warn when prices differ by more than 0.1%\ntolerance = 0.001\n\
      merged = df_primary.join(df_backup, lsuffix='_ak', rsuffix='_bs')\n\
      diff = (merged['close_ak'] - merged['close_bs']).abs() / merged['close_ak']\nanomalies = diff[diff > tolerance]\n\
      if len(anomalies) > 0:\n    logger.warning(f\"Price discrepancy > {tolerance:.1%}: {len(anomalies)} rows\")\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-001
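  statement: 'Timestamp precision and type consistency: timestamps in the database should use a single data type (timestamp,
    not varchar or int). Mixing string dates (''2024-01-15'') with Timestamp objects is a common source of subtle bugs
    in comparisons, indexing, and merges; enforce the conversion at the pipeline entry point.

    '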
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: all date/time fields must be normalized to pd.Timestamp at data ingestion boundary
  evidence_refs:
  - type: community_validated
    ref: pandas documentation on to_datetime best practices; SQLAlchemy TIMESTAMP type documentation
    url: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
  reference_code:
    bad_example: '# BAD: stored as strings, so comparisons go wrong

      df[''date''] = ''2024-01-15''  # string

      latest = df[df[''date''] == ''2024-01-15'']  # string comparison, inefficient

      '
    good_example: '# GOOD: convert everything to Timestamp

      df[''date''] = pd.to_datetime(df[''date''])

      latest = df[df[''date''] == pd.Timestamp(''2024-01-15'')]

      '
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-002
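  statement: 'Trading time versus calendar time: the date of a daily bar normally refers to a trading day (day T), while
    the time of a news item or announcement is calendar time. When merging the two, map calendar time to the next available
    trading day; otherwise an announcement made on day T is treated as already available during the day T session, which
    is a lookahead problem.

    '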
  severity: high
  capability_tags:
    activities:
    - data-sourcing
    - backtesting
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
    - data_filtering
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: announcement timestamps must be mapped to next trading day open, not announcement date
  evidence_refs:
  - type: community_validated
    ref: Zhihu article on converting between trading days and calendar days in quant timestamp handling; qlib documentation on point-in-time data
    url: https://qlib.readthedocs.io/
  reference_code:
    bad_example: '# BAD: the announcement is usable for signals on its own date (it may be an after-hours announcement)

      signals = df.merge(announcements, on=''date'')  # announcement date = trading day

      '
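    good_example: "# GOOD: map after-hours announcements to the next trading day\n# assumes tz-naive timestamps in exchange local time\n\
      import exchange_calendars as xcals\nimport pandas as pd\n\ncal = xcals.get_calendar('XSHG')\n\n\
      def announcement_to_trade_date(ann_dt, market_close_hour=15):\n    ts = pd.Timestamp(ann_dt)\n    day = ts.normalize()\n\
      \    if ts.hour >= market_close_hour:\n\
      \        # after-hours announcement: effective from the following day\n\
      \        day += pd.Timedelta(days=1)\n\
      \    # roll forward to the next session when day is not a trading day\n\
      \    return cal.date_to_session(day, direction='next')\n\n\
      announcements['trade_date'] = announcements['ann_datetime'].apply(\n    announcement_to_trade_date)\n"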
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-003
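  statement: 'Daylight saving time (DST) handling: when collecting US or European market data, the DST switch days (March
    and November in the US, March and October in Europe) make the same HH:MM wall-clock time map to different UTC times;
    if unhandled, the time series drifts by one hour on that day. Always store in UTC and convert to the market local
    time zone for display.

    '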
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags:
    markets:
    - cn-astock
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: DST transitions must be handled when collecting US/EU market data; store as UTC
  evidence_refs:
  - type: community_validated
    ref: pytz documentation on DST handling; exchange_calendars documentation
    url: https://pytz.sourceforge.net/
  reference_code:
    bad_example: '# BAD: naive datetimes drift on DST switch days

      df[''datetime''] = pd.to_datetime(df[''time_str''])  # no timezone

      '
    good_example: "# GOOD: store in UTC, convert to the local time zone for display\nimport pytz\n\
      eastern = pytz.timezone('America/New_York')\ndf['datetime_utc'] = pd.to_datetime(df['time_str']\n\
      \    ).dt.tz_localize(eastern, ambiguous='NaT'\n    ).dt.tz_convert('UTC')\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-001
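  statement: 'Idempotent incremental updates: data update scripts must be idempotent (running them repeatedly yields the
    same result). If a script fails midway because of a network interruption, rerunning it must not create duplicate
    rows or gaps. Implementation: write to a staging table first, validate, then UPSERT into the main table; do not
    INSERT/APPEND directly.

    '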
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: 'data update scripts must be idempotent: use UPSERT, not INSERT/APPEND'
  evidence_refs:
  - type: community_validated
    ref: SQLite UPSERT documentation (INSERT OR REPLACE); Zhihu article on idempotent updates in quant database design
    url: https://www.sqlite.org/lang_upsert.html
  reference_code:
    bad_example: '# BAD: plain APPEND, so a rerun produces duplicate rows

      df_new.to_sql(''daily_prices'', con=engine, if_exists=''append'', index=False)

      '
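    good_example: "# GOOD: UPSERT (update on primary key conflict); conn is assumed to be a sqlite3 connection\n\
      cols = ['stock_code', 'date', 'open', 'high', 'low', 'close', 'volume']\nconn.executemany(\"\"\"\n\
      \    INSERT OR REPLACE INTO daily_prices\n    (stock_code, date, open, high, low, close, volume)\n\
      \    VALUES (?, ?, ?, ?, ?, ?, ?)\n\"\"\", df_new[cols].itertuples(index=False, name=None))\nconn.commit()\n# With SQLAlchemy, use on_conflict_do_update instead\n"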
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-002
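  statement: 'Data integrity checks (checksums and row counts): after every data update, validate the key fields: is the
    row count within the expected range, are prices positive, are dates continuous (no missing trading days)? A pipeline
    without automated checks is the root cause of silent data rot.

    '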
  severity: high
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: 'post-update data quality checks must run automatically: row count, price positivity, date continuity'
  evidence_refs:
  - type: community_validated
    ref: Great Expectations documentation; Zhihu article on detecting data rot in quant data quality governance
    url: https://docs.greatexpectations.io/
  reference_code:
    bad_example: '# BAD: no validation at all after the update

      update_daily_prices(date=today)

      print("Update done")  # no way to tell whether it succeeded or whether anything is missing

      '
    good_example: '# GOOD: validate automatically after each update

      update_daily_prices(date=today)


      # Check 1: plausible row count (about 5000 A-share stocks)

      row_count = db.count("SELECT COUNT(*) FROM daily_prices WHERE date = ?", today)

      assert 4000 <= row_count <= 6000, f"Unexpected row count: {row_count}"


      # Check 2: no zero or negative prices

      invalid = db.count("SELECT COUNT(*) FROM daily_prices WHERE close <= 0")

      assert invalid == 0, f"Found {invalid} invalid prices"


      # Check 3: no date gaps (continuity of the last 5 trading days)

      check_no_date_gaps(db, last_n_trading_days=5)

      '
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-003
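  statement: 'Data versioning: the output of the data pipeline should be versioned. When a source revises historical data
    (for example restated financials), keep the old version traceable rather than silently overwriting it, so that
    versions can be diffed and historical backtests reproduced.

    '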
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: historical data revisions must be versioned; silent overwrites are prohibited
  evidence_refs:
  - type: community_validated
    ref: ArcticDB documentation on data versioning; DVC (Data Version Control) documentation
    url: https://arcticdb.io/
  reference_code:
    bad_example: '# BAD: overwrites in place, so earlier versions are lost

      df_revised.to_csv(''financial_data.csv'', index=False)  # overwrites the old version

      '
    good_example: '# GOOD: timestamped, versioned storage (ArcticDB or plain versioned files)

      from datetime import datetime

      version = datetime.now().strftime(''%Y%m%d_%H%M%S'')

      df_revised.to_parquet(f''data/financial_data_v{version}.parquet'')

      # symlink pointing at the latest version

      # ln -sf financial_data_v{version}.parquet financial_data_latest.parquet


      # or use ArcticDB (built-in versioning):

      import arcticdb as adb

      lib = adb.Arctic(''lmdb:///data/arctic_store'').get_library(''finance'', create_if_missing=True)

      lib.write(''financial_data'', df_revised)  # each write creates a new version

      '
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-004
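  statement: 'Alignment to trading calendar boundaries: after collection, verify that data coverage for every stock or asset
    is complete and consistent with the trading calendar. Each stock should have one row for every trading day (flagged
    as suspended, not missing). Checking the NaN ratio through pivot_table is an effective quick diagnostic.

    '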
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
    - data_filtering
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: data completeness vs trading calendar must be verified after each ingestion
  evidence_refs:
  - type: community_validated
    ref: qlib documentation on data quality inspection; tushare documentation on completeness of the daily endpoint
    url: https://qlib.readthedocs.io/
  reference_code:
    bad_example: '# BAD: no completeness check, missing data is silently ignored

      df = load_all_stocks(start_date, end_date)

      run_backtest(df)

      '
    good_example: "# GOOD: check coverage with a pivot matrix\nprice_matrix = df.pivot_table(\n    index='date', columns='stock_code', values='close')\n\
      coverage = 1 - price_matrix.isna().mean().mean()\nprint(f\"Data coverage: {coverage:.1%}\")\nif coverage < 0.95:\n\
      \    logger.warning(f\"Low coverage: {coverage:.1%}, check for missing stocks\")\n# find the stocks with the most missing days\n\
      missing_stocks = price_matrix.isna().mean()\n\
      bad_stocks = missing_stocks[missing_stocks > 0.05].index.tolist()\nif bad_stocks:\n\
      \    logger.warning(f\"Stocks with >5% missing days: {bad_stocks}\")\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-005
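  statement: 'Caching: static or slowly changing data that is read often (stock metadata, industry classifications, index
    constituents) should be cached locally to avoid repeating API calls on every run. The cache must have an expiry
    (TTL) so that stale industry classifications or outdated constituents are not used.

    '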
  severity: medium
  capability_tags:
    activities:
    - data-sourcing
  applicable_conditions:
    blueprint_has_stage:
    - data_collection
  incompatible_with_tags: {}
  stage_id_remap_hints:
  - from_stage: data_collection
    constraint_context: static/low-frequency data must be cached locally with TTL to avoid unnecessary API calls
  evidence_refs:
  - type: community_validated
    ref: akshare documentation recommending local caching; functools.lru_cache documentation; joblib.Memory documentation
    url: https://akshare.akfamily.xyz/
  reference_code:
    bad_example: "# BAD: refetches the industry classification on every run (slow, and burns quota)\ndef get_industry(stock):\n\
      \    return ak.stock_board_industry_name_em()  # hits the API on every call\n"
    good_example: "# GOOD: cache the industry classification and refresh it once a day\nfrom datetime import date\n\n\
      import akshare as ak\nfrom joblib import Memory\n\ncache_dir = './data_cache'\nmemory = Memory(cache_dir, verbose=0)\n\n@memory.cache\n\
      def get_industry_cached(cache_date: str):  # cache_date serves as the cache key\n\
      \    return ak.stock_board_industry_name_em()\n\n# daily refresh: today's date is the key, so entries from earlier days are no longer hit\n\
      industry_df = get_industry_cached(str(date.today()))\n"
  provenance:
    source: community_validated
  _source_file: data-sourcing/constraints.yaml
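# Sketch (illustrative, not part of the compiled constraint set): one way to meet the TTL
# requirement of SHARED-DS-INCR-005 using only the Python standard library. The names
# load_with_ttl, fetch, and cache_path are hypothetical.
#
#   import json
#   import time
#   from pathlib import Path
#
#   def load_with_ttl(cache_path, fetch, ttl_seconds=86400):
#       """Return cached JSON if it is younger than ttl_seconds, else refetch and rewrite."""
#       path = Path(cache_path)
#       if path.exists() and time.time() - path.stat().st_mtime < ttl_seconds:
#           return json.loads(path.read_text(encoding="utf-8"))
#       data = fetch()  # assumed to return JSON-serializable data
#       path.parent.mkdir(parents=True, exist_ok=True)
#       path.write_text(json.dumps(data), encoding="utf-8")
#       return data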
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: tests/test_extract_items.py
  business_problem: Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from compressed
    ZIP archives for downstream financial analysis and document processing workflows.
  intent_keywords:
  - EDGAR
  - SEC filings
  - 10-K extraction
  - annual report parsing
  - document extraction
  stage: data_collection
  data_domain: financial_data
  type: data_pipeline
component_capability_map:
  project: finance-bp-114--edgar-crawler
  scan_date: '2026-04-22'
  stats:
    total_files: 4
    total_classes: 16
    total_functions: 0
    total_stages: 4
  modules:
    index_download_stage:
      class_count: 4
      stage_id: index_download
      stage_order: 1
      responsibility: Downloads SEC EDGAR index files (TSV) for specified years/quarters. Provides the master list of available
        filings for downstream filtering. This stage exists because SEC EDGAR provides quarterly indices that must be fetched
        incrementally to avoid redundant network calls and enable efficient updates.
      classes:
      - name: download_indices
        file: index_download_stage/download-indices.py
        line: 0
        kind: required_method
        signature: ''
      - name: get_specific_indices
        file: index_download_stage/get-specific-indices.py
        line: 0
        kind: required_method
        signature: ''
      - name: requests_retry_session
        file: index_download_stage/requests-retry-session.py
        line: 0
        kind: required_method
        signature: ''
      - name: user_agent
        file: index_download_stage/user-agent.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
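    # Sketch (illustrative): how requests_retry_session could be configured from the values
    # recorded in BD-043 (5 retries, backoff factor 0.2) and BD-062 (status codes). The
    # function body is an assumption, not the project's verified source.
    #
    #   import requests
    #   from requests.adapters import HTTPAdapter
    #   from urllib3.util.retry import Retry
    #
    #   def requests_retry_session(retries=5, backoff_factor=0.2,
    #                              status_forcelist=(400, 401, 403, 500, 502, 503, 504, 505)):
    #       session = requests.Session()
    #       retry = Retry(total=retries, read=retries, connect=retries,
    #                     backoff_factor=backoff_factor, status_forcelist=status_forcelist)
    #       adapter = HTTPAdapter(max_retries=retry)
    #       session.mount("http://", adapter)
    #       session.mount("https://", adapter)
    #       return session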
    crawl_and_download_stage:
      class_count: 3
      stage_id: crawl_and_download
      stage_order: 2
      responsibility: Parses HTML index pages from SEC EDGAR, extracts filing metadata (SIC, state, fiscal year), and downloads
        actual filing documents to local storage. This stage bridges index information to raw document files for downstream
        parsing.
      classes:
      - name: crawl
        file: crawl_and_download_stage/crawl.py
        line: 0
        kind: required_method
        signature: ''
      - name: download
        file: crawl_and_download_stage/download.py
        line: 0
        kind: required_method
        signature: ''
      - name: iXBRL URL handling
        file: crawl_and_download_stage/ixbrl-url-handling.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    document_parsing_stage:
      class_count: 8
      stage_id: document_parsing
      stage_order: 3
      responsibility: Extracts structured items from raw HTML/text filings using regex pattern matching. Handles tables, spans,
        and filing-specific item structures for 10-K, 10-Q, and 8-K filings. This is the core NLP extraction engine that transforms
        unstructured documents into machine-readable JSON.
      classes:
      - name: ExtractItems.extract
        file: document_parsing_stage/extractitems-extract.py
        line: 0
        kind: required_method
        signature: ''
      - name: HtmlStripper.feed
        file: document_parsing_stage/htmlstripper-feed.py
        line: 0
        kind: required_method
        signature: ''
      - name: determine_items_to_extract
        file: document_parsing_stage/determine-items-to-extract.py
        line: 0
        kind: required_method
        signature: ''
      - name: parse_item
        file: document_parsing_stage/parse-item.py
        line: 0
        kind: required_method
        signature: ''
      - name: get_10q_parts
        file: document_parsing_stage/get-10q-parts.py
        line: 0
        kind: required_method
        signature: ''
      - name: remove_tables
        file: document_parsing_stage/remove-tables.py
        line: 0
        kind: replaceable_point
      - name: items_to_extract
        file: document_parsing_stage/items-to-extract.py
        line: 0
        kind: replaceable_point
      - name: skip_extracted_filings
        file: document_parsing_stage/skip-extracted-filings.py
        line: 0
        kind: replaceable_point
      design_decision_count: 8
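    # Sketch (illustrative): choosing the 8-K item numbering scheme from the filing date, per
    # BD-023 and BD-013. Treating 2004-08-23 itself as the first day of the decimal scheme is
    # an assumption; confirm it against the source and its tests (BD-072).
    #
    #   from datetime import date
    #
    #   EIGHT_K_CUTOFF = date(2004, 8, 23)  # hypothetical constant name
    #
    #   def uses_decimal_items(filing_date: date) -> bool:
    #       # True means items look like 1.01, 2.01, 5.01; False means 1, 2, 3
    #       return filing_date >= EIGHT_K_CUTOFF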
    logging_infrastructure:
      class_count: 1
      stage_id: logging
      stage_order: 4
      responsibility: Centralized logging infrastructure providing timestamped log files and console output filtering. Enables
        debugging of specific execution windows and post-run forensics.
      classes:
      - name: Logger.__init__
        file: logging_infrastructure/logger-init.py
        line: 0
        kind: required_method
        signature: ''
      design_decision_count: 2
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.32926829268292684
    evidence_invalid: 55
    evidence_verified: 27
    evidence_auto_fixed: 0
    audit_coverage: 45/45 (100%)
    audit_pass_rate: 1/45 (2%)
    audit_fail_total: 29
    audit_finance_universal:
      pass: 0
      warn: 0
      fail: 0
    audit_subdomain_totals:
      pass: 1
      warn: 15
      fail: 29
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-114. Evidence verify ratio
    = 32.9% and audit fail total = 29. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-114-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: SEC EDGAR Filing Extraction
    positive_terms:
    - EDGAR
    - SEC filings
    - 10-K extraction
    - annual report parsing
    - document extraction
    data_domain: financial_data
    negative_terms:
    - trading strategy
    - backtesting
    - stock screening
    - live trading
    - factor computation
    - machine learning prediction
    ambiguity_question: Are you looking to extract raw SEC EDGAR filings (10-K, 10-Q, 8-K) from compressed archives for document
      processing? Or do you need a different financial data pipeline task?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
    The SL-01 sell-before-buy ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
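  # Sketch (illustrative): plain-Python checks mirroring SL-02 and SL-05. The helper names are
  # hypothetical, and level_seconds stands in for level.to_second().
  #
  #   from datetime import datetime, timedelta
  #
  #   def due_timestamp(happen_timestamp: datetime, level_seconds: int) -> datetime:
  #       # SL-02: a signal raised on one bar is executed on the next bar
  #       return happen_timestamp + timedelta(seconds=level_seconds)
  #
  #   def has_exactly_one_sizing(position_pct=None, order_money=None, order_amount=None) -> bool:
  #       # SL-05: exactly one of the three sizing fields may be set
  #       return sum(v is not None for v in (position_pct, order_money, order_amount)) == 1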
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 92
    fatal_constraints_count: 31
    non_fatal_constraints_count: 139
    use_cases_count: 1
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 36 source groups: 10-Q Bug
        Detection(1), 10-Q Processing(1), 8-K Processing(1), Caching Strategy(1), Directory Setup(3), Error Handling(1), and
        30 more.'
      key_decisions: 92 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-057
      type: B/BA
      summary: 10-Q part separation bug detected when PART I is only mentioned in ToC and PART II is much longer
    - id: BD-038
      type: B/RC
      summary: '10-Q documents parsed in two parts: Part I (Items 1-4) and Part II (Items 1-6)'
    - id: BD-039
      type: B/RC
      summary: 8-K item format uses decimal notation (1.01, 2.01, 5.01) not simple numbers
    - id: BD-045
      type: B/RC
      summary: Company info cached in JSON file (companies_info.json) to avoid redundant API calls
    - id: BD-017
      type: B/BA
      summary: Dataset directory (DATASET_DIR) is created alongside __init__.py in a 'datasets' subfolder rather than allowing
        user specification
    - id: BD-018
      type: B
      summary: Logging directory (LOGGING_DIR) is created alongside __init__.py in a 'logs' subfolder rather than allowing
        user specification
    - id: BD-019
      type: B
      summary: Directories are created at import time in __init__.py rather than lazily or on-demand
    - id: BD-051
      type: B/DK
      summary: If all items are null after extraction, log a warning and return None to skip the filing
    - id: BD-046
      type: B/DK
      summary: 'Downloaded filename format: {CIK}_{FILING_TYPE}_{YEAR}_{ACCESSION_NUM}.{EXT}'
    - id: BD-056
      type: B/RC
      summary: File reading uses errors='backslashreplace' to handle encoding issues gracefully
    - id: BD-048
      type: B
      summary: CSV metadata written to temporary file first, then moved to final location to prevent data loss
    - id: BD-023
      type: B/RC
      summary: 8-K item naming change from simple numbers (1, 2, 3) to decimal format (1.01, 2.01, 5.01) occurred on August
        23, 2004
    - id: BD-026
      type: B
      summary: HTML closing tags (div, tr, p, li) replaced with two newline characters during stripping
    - id: BD-027
      type: B
      summary: <br> tags replaced with two newline characters during HTML stripping
    - id: BD-028
      type: B
      summary: TH/TD closing tags replaced with spaces rather than newlines during HTML stripping
    - id: BD-034
      type: B/RC
      summary: Item patterns adjusted to insert optional whitespace before trailing letters (A, B, C) for flexible matching
    - id: BD-035
      type: B/BA
      summary: 'SIGNATURE section allows variations: SIGNATURE, SIGNATURES, or Signature(s)'
    - id: BD-064
      type: B/BA
      summary: Item index pattern includes word boundary characters ([.*~-:\s\(]) after item number
    - id: BD-052
      type: B/BA
      summary: If no items_to_extract is specified, all items for the filing type are extracted
    - id: BD-043
      type: B/DK
      summary: Retry mechanism uses 5 retries with exponential backoff factor of 0.2 for network requests
    - id: BD-062
      type: B
      summary: Exponential backoff status codes include 400, 401, 403, 500, 502, 503, 504, 505
    - id: BD-050
      type: B/BA
      summary: Process pool uses 1 worker process for parallel extraction
    - id: BD-065
      type: B/BA
      summary: Whitespace (but not newlines) matched as [^\S\r\n] in patterns to preserve line breaks
    - id: BD-044
      type: B/RC
      summary: SEC rate limit response detected by checking for 'will be managed until action is taken' text
    - id: BD-036
      type: B/BA
      summary: Item section extraction selects longest matching section between item markers
    - id: BD-037
      type: B/RC
      summary: SIGNATURE extraction uses last occurrence in document rather than first
    - id: BD-063
      type: B/RC
      summary: Case-sensitive search attempted first before falling back to case-insensitive for item matching
    - id: BD-053
      type: B/BA
      summary: SIGNATURE section excluded by default; enabled via include_signature config flag
    - id: BD-033
      type: B
      summary: Horizontal span margins replaced with single space, vertical margins with single newline
    - id: BD-054
      type: B/BA
      summary: Tables removed by default during extraction; disabled via remove_tables config flag
    - id: BD-031
      type: B/RC
      summary: 'Non-blank background colors (not white, transparent, none, or #fff) trigger table removal'
    - id: BD-032
      type: B
      summary: Tables containing item index headers (Item 1, Item 1A, etc.) are preserved even if they have background colors
    - id: BD-029
      type: B/RC
      summary: Multiple consecutive newlines and spaces normalized to single newline, then multiple spaces to single space
    - id: BD-030
      type: B/RC
      summary: Special Unicode characters (smart quotes, em-dashes, various Unicode dashes) normalized to ASCII equivalents
    - id: BD-060
      type: B/RC
      summary: Page numbers and headers removed during text cleanup using regex patterns
    - id: BD-061
      type: B/RC
      summary: Table of Contents, Index to Financial Statements, Back to Contents, Quicklinks headers removed
    - id: BD-066
      type: B
      summary: Whitespace normalization function preserves structure while removing excessive spacing
    - id: BD-022
      type: B/BA
      summary: Regex flags set to IGNORECASE | DOTALL | MULTILINE for all item pattern matching
    - id: BD-041
      type: B/BA
      summary: Index URLs created by prepending 'https://www.sec.gov/Archives/' to relative paths
    - id: BD-004
      type: B/RC
      summary: Parse Document Format Files table for .htm/.html links; fall back to complete submission text file
    - id: BD-005
      type: M/BA
      summary: Store company metadata in companies_info.json to reduce per-filing lookups
    - id: BD-006
      type: BA/DK
      summary: 'Filename convention: {CIK}_{Type}_{Year}_{accession}.{ext}'
    - id: BD-047
      type: B/BA
      summary: 'Incremental download: existing files are skipped but new filings are downloaded'
    - id: BD-074
      type: BA
      summary: HtmlStripper sets convert_charrefs=True and strict=False - affects HTML parsing
    - id: BD-007
      type: B/RC
      summary: Detect HTML vs plain text by checking for <td> and <tr> elements
    - id: BD-008
      type: M/BA
      summary: Remove numerical tables but preserve text-containing tables via background-color detection
    - id: BD-009
      type: BA
      summary: Handle 10-Q two-part structure by splitting text before item extraction
    - id: BD-010
      type: B/BA
      summary: Adjust regex patterns for Roman numerals to capture both I,II and 1,2 formats
    - id: BD-011
      type: M/BA
      summary: Select longest matching section when multiple candidates exist (handles TOC interference)
    - id: BD-012
      type: M/BA
      summary: Process filings in parallel via ProcessPool
    - id: BD-013
      type: B/RC
      summary: '8-K items renamed after August 23, 2004 (old: 1-12, new: 1.01-9.01)'
    - id: BD-014
      type: M/BA
      summary: Set recursion limit to 30000 to handle deeply nested HTML
    - id: BD-020
      type: B/BA
      summary: Python recursion limit increased from default 1000 to 30000 to handle deeply nested HTML structures
    - id: BD-024
      type: B/RC
      summary: Roman numeral mapping (1-20) used for converting numeric parts to Roman numerals for 10-Q parsing
    - id: BD-025
      type: B/RC
      summary: HTML document detected by presence of both <td> AND <tr> elements (not just one)
    - id: BD-055
      type: B/RC
      summary: Embedded PDF sections (<PDF>...</PDF>) stripped from HTML documents
    - id: BD-058
      type: B
      summary: HTMLParser used for HTML stripping with custom data handler that accumulates text
    - id: BD-067
      type: B/BA
      summary: Date threshold for 8-K form version detection
    - id: BD-068
      type: B/BA
      summary: Background color filtering for table removal decision
    - id: BD-069
      type: B/RC
      summary: Special character Unicode normalization
    - id: BD-070
      type: B/BA
      summary: Ignore-matches counter for ToC filtering
    - id: BD-083
      type: BA
      summary: 'INTERACTION: BD-076 (global recursion limit 30000) × BD-074 (HtmlStripper HTMLParser settings) × BD-014 (recursion
        limit declaration) → StackOverflow risk cascade in deeply nested documents'
    - id: BD-084
      type: BA
      summary: 'INTERACTION: BD-001 (incremental download) × BD-047 (skip existing files) × BD-077 (CSV format contract) →
        Amplified efficiency gains with silent failure risk'
    - id: BD-085
      type: B/BA
      summary: 'INTERACTION: BD-072 (8-K cutoff invariant) × BD-067 (date threshold) × BD-013 (8-K item naming) → Critical
        invariant with contradictory implementation risk'
    - id: BD-086
      type: B/RC
      summary: 'INTERACTION: BD-003 (exponential backoff) × BD-044 (rate limit text detection) × BD-062 (status_forcelist)
        → Redundant error handling with partial coverage'
    - id: BD-087
      type: B
      summary: 'INTERACTION: BD-009 (10-Q two-part structure) × BD-038 (10-Q item naming) × BD-075 (part-item delimiter) ×
        BD-079 (Roman numeral map) → Cascading dependency on parsing sequence'
    - id: BD-088
      type: B
      summary: 'INTERACTION: BD-017 (DATASET_DIR fixed) × BD-018 (LOGGING_DIR fixed) × BD-019 (eager directory creation) →
        Deployment rigidity causing permission errors in restricted environments'
    - id: BD-089
      type: BA
      summary: 'INTERACTION: BD-045 (company info cache) × BD-002 (CIK lookup cache) → Duplicate caching mechanisms with stale
        data amplification risk'
    - id: BD-090
      type: B
      summary: 'INTERACTION: BD-007 (HTML detection) × BD-025 (td+tr detection) × BD-058 (HtmlStripper) → Detection failure
        cascades to extraction failure on edge-case documents'
    - id: BD-001
      type: BA
      summary: Download indices per quarter to enable incremental updates without re-fetching the full history
    - id: BD-002
      type: M/BA
      summary: Use separate company_info.json cache to avoid redundant CIK lookups
    - id: BD-003
      type: BA
      summary: Exponential backoff with 5 retries on all HTTP requests
    - id: BD-040
      type: B/RC
      summary: 'Quarterly indices stored as TSV files with pipe delimiter, columns: CIK, Company, Type, Date, links, etc.'
    - id: BD-042
      type: B/DK
      summary: EDGAR indices downloaded by year and quarter (e.g., 2023_QTR1.tsv, 2023_QTR2.tsv)
    - id: BD-059
      type: B/DK
      summary: Skip future quarters when downloading indices (based on current date)
    - id: BD-GAP-001
      type: DK
      summary: 'Missing: Stale data detection and expiry policy'
    - id: BD-GAP-002
      type: DK
      summary: 'Missing: Random seed full coverage'
    - id: BD-080
      type: DK
      summary: HtmlStripper inherits from HTMLParser - users may not realize this dependency
    - id: BD-072
      type: RC
      summary: 8-K obsolete cutoff date 2004-08-23 must match between code and tests
    - id: BD-075
      type: RC
      summary: 10-Q item naming convention uses '__' delimiter to encode part-item relationship
    - id: BD-077
      type: RC
      summary: FILINGS_METADATA.csv format is implicit contract between download and extract
    - id: BD-079
      type: RC
      summary: roman_numeral_map keys (1-20) must match part numbers in item_list_10q
    - id: BD-015
      type: M/DK
      summary: Timestamp log filenames for run-level isolation
    - id: BD-016
      type: M
      summary: Console shows INFO+, file captures DEBUG+
    - id: BD-021
      type: B/RC
      summary: cssutils logging is suppressed at CRITICAL level to avoid noise from the library
    - id: BD-049
      type: B/BA
      summary: Console logging set to INFO level (not DEBUG) to reduce noise during execution
    - id: BD-071
      type: B
      summary: process_filing MUST call determine_items_to_extract BEFORE extract_items
    - id: BD-073
      type: B/RC
      summary: 10-Q extraction requires parts parsed before items - get_10q_parts before item loop
    - id: BD-076
      type: B/BA
      summary: Global recursion limit 30000 set at module load - affects all imports
    - id: BD-081
      type: BA
      summary: Logger instantiated at module level before config.json loaded
    - id: BD-078
      type: BA/DK
      summary: 10-Q bug recovery modifies self.items_list state then restores it
    - id: BD-082
      type: BA
      summary: 10-Q length_difference threshold 5000 chars drives retry loop
resources:
  packages:
  - name: beautifulsoup4==4.8.2
    version_pin: latest
  - name: lxml==4.9.1
    version_pin: latest
  - name: requests==2.31.0
    version_pin: latest
  - name: pandas==1.5.3
    version_pin: latest
  - name: click==7.0
    version_pin: latest
  - name: tqdm==4.42.1
    version_pin: latest
  - name: numpy==1.24.4
    version_pin: latest
  - name: cssutils==1.0.2
    version_pin: latest
  - name: pathos==0.2.9
    version_pin: latest
  - name: urllib3==1.26.7
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install beautifulsoup4==4.8.2
    - python3 -m pip install lxml==4.9.1
    - python3 -m pip install requests==2.31.0
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-006
    when: When requesting data from SEC EDGAR
    action: include a valid User-Agent header identifying the requester with contact information
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: SEC EDGAR will reject requests without valid User-Agent identification with 403 Forbidden errors, preventing
      any data downloads
    stage_ids:
    - index_download
  - id: finance-C-017
    when: When constructing SEC EDGAR index URLs
    action: 'use the official SEC EDGAR full-index URL pattern: https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/master.zip'
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Using incorrect URL patterns will result in 404 errors and complete download failure
    stage_ids:
    - index_download
  - id: finance-C-021
    when: When downloading SEC EDGAR filings via HTTP requests
    action: declare a valid User-Agent header containing contact information (name and email)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: SEC EDGAR will block or throttle requests without a valid User-Agent header, causing downloads to fail with
      HTTP 403 errors
    stage_ids:
    - crawl_and_download
  - id: finance-C-028
    when: When downloading filings from SEC EDGAR API endpoints
    action: implement retry logic with exponential backoff to handle rate limiting responses (HTTP 429) and transient errors
    severity: fatal
    kind: operational_lesson
    modality: must
    consequence: SEC EDGAR enforces rate limits; without retry-backoff, repeated requests will trigger temporary IP blocks,
      halting all subsequent downloads
    stage_ids:
    - crawl_and_download
  - id: finance-C-030
    when: When generating filenames for downloaded filing documents
    action: use the convention {CIK}_{FilingTypeName}_{Year}_{accession}.{ext} to ensure uniqueness per filing
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Non-unique filenames cause subsequent downloads to overwrite existing files, resulting in data loss and incorrect
      filing-to-metadata associations
    stage_ids:
    - crawl_and_download
  - id: finance-C-032
    when: When requesting SEC EDGAR index files
    action: respect SEC EDGAR's rate limit of 10 requests per second to avoid triggering automated IP blocks
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Exceeding rate limits causes SEC EDGAR to temporarily block the IP address, preventing all subsequent downloads
      until the block expires (typically 15-60 minutes)
    stage_ids:
    - crawl_and_download
  - id: finance-C-037
    when: When creating the FILINGS_METADATA.csv output file
    action: 'include all required columns: cik, company, filing_type, filing_date, period_of_report, sic, state_of_inc, htm_filing_link,
      filename'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing columns in the metadata CSV breaks downstream parsing stages that expect specific field names, causing
      KeyError exceptions in extract_items.py
    stage_ids:
    - crawl_and_download
  - id: finance-C-041
    when: When extracting items from SEC filings
    action: Detect HTML vs plain text by checking for <td> and <tr> table elements presence
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect format detection causes HTML tags to appear in extracted text or structured data to be lost, corrupting
      the extracted JSON output with malformed content
    stage_ids:
    - document_parsing
  - id: finance-C-042
    when: When removing HTML tables from filings
    action: Preserve unstyled tables that may contain item listings while removing styled financial tables
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Removing all tables indiscriminately causes item section headers and listing tables to be deleted, resulting
      in incomplete extraction of filing content
    stage_ids:
    - document_parsing
  - id: finance-C-043
    when: When processing 10-Q filings
    action: Separate document text into Part I and Part II before extracting items to prevent cross-contamination
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without part separation, identical item names in different parts (e.g., Item 1 in Part I vs Item 1 in Part
      II) cause content to be mixed or incorrectly attributed, corrupting the extracted data
    stage_ids:
    - document_parsing
  - id: finance-C-046
    when: When processing 8-K filings
    action: Use obsolete item numbering (1-12) for filings before 2004-08-23 and new numbering (1.01-9.01) for filings on or after that date
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using wrong item numbering scheme causes no matches to be found for historical filings, resulting in empty
      item sections and complete extraction failure
    stage_ids:
    - document_parsing
  - id: finance-C-047
    when: When parsing deeply nested HTML documents
    action: Set Python recursion limit to 30000 to handle SEC filings with deeply nested tables and div elements
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Default recursion limit of 1000 causes RecursionError on malformed or deeply nested HTML documents,
      preventing extraction from completing
    stage_ids:
    - document_parsing
  - id: finance-C-051
    when: When generating JSON output from filings
    action: Name 10-K/8-K items as item_1, item_1A, item_2 and 10-Q items as part_1_item_1, part_2_item_1A per filing type
      convention
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Inconsistent item naming prevents downstream NLP applications from reliably locating specific sections, causing
      feature extraction failures
    stage_ids:
    - document_parsing
  - id: finance-C-052
    when: When writing JSON output files
    action: Create filing type subdirectories (10-K, 10-Q, 8-K) before writing extracted JSON files
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing directories cause FileNotFoundError during JSON write operations, preventing extracted data from
      being persisted to disk
    stage_ids:
    - document_parsing
  - id: finance-C-062
    when: When configuring the logging infrastructure
    action: Set the file logging level above DEBUG
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Setting the file level above DEBUG will exclude DEBUG messages, including the request details needed for post-run
      forensics, violating the acceptance criterion that log files must contain DEBUG-level messages
    stage_ids:
    - logging
  - id: finance-C-063
    when: When configuring console logging output
    action: Set console handler level to INFO or higher
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Setting console level below INFO will cause DEBUG-level spam in stdout, violating the acceptance criterion
      that console output shows INFO-level messages only
    stage_ids:
    - logging
  - id: finance-C-064
    when: When generating log filenames
    action: Include timestamp in format YYYY_MM_DD_HH_MM_SS for run-level isolation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without timestamp in the filename, multiple runs will overwrite each other's log files, preventing debugging
      of specific execution windows
    stage_ids:
    - logging
  - id: finance-C-076
    when: When configuring the SEC EDGAR API connection
    action: Set user_agent to a valid contact string containing name and email (e.g., 'John Doe john.doe@example.com')
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: SEC EDGAR will block requests without a proper User-Agent header, causing all index downloads and crawls
      to fail with traffic management messages
  - id: finance-C-078
    when: When processing TSV index files into DataFrames
    action: Treat all CSV/TSV fields as strings using dtype=str to prevent numeric coercion of CIK and other numeric identifiers
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: CIK values like '0000320193' get coerced to integers 320193, causing file path mismatches and missing filing
      metadata lookups downstream
  - id: finance-C-079
    when: When transferring DataFrame between index_download and crawl_and_download stages
    action: 'Include all required columns: CIK, Company, Type, Date, complete_text_file_link, html_index, Filing Date, Period
      of Report, SIC, htm_file_link, State of Inc, State location, Fiscal Year End, filename'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: document_parsing stage expects specific columns to build JSON structure; missing columns cause KeyError exceptions
      during extraction
  - id: finance-C-083
    when: When SEC EDGAR returns a 200 response with traffic management HTML
    action: Treat such responses as successful downloads; the content must be validated for expected HTML structure
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: Traffic management pages get saved as raw filings, corrupting the dataset with invalid HTML that causes extraction
      failures in document_parsing stage
  - id: finance-C-087
    when: When reading raw filing documents from disk
    action: Construct file path as {raw_filings_folder}/{Type}/{filename} matching the directory structure created during
      download
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect file path causes FileNotFoundError, preventing document_parsing stage from processing downloaded
      filings
  - id: finance-C-095
    when: When reading or writing FILINGS_METADATA.csv between download and extract stages
    action: 'Verify CSV column names match exactly: CIK, Company, Type, Date, complete_text_file_link, html_index, Filing
      Date, Period of Report, SIC, htm_file_link, State of Inc, State location, Fiscal Year End, filename'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Extract stage fails with KeyError when accessing metadata columns that have mismatched names, causing the
      entire extraction pipeline to crash
  - id: finance-C-111
    when: When implementing 8-K filing extraction logic in extract_items.py
    action: Verify the 8-K obsolete cutoff date 2004-08-23 is consistent across both production code and test assertions to
      correctly identify which 8-K filings are obsolete under SEC filing rules
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inconsistent cutoff date between code and tests creates false confidence scenarios where test validation
      passes but production fails, violating SEC regulatory requirements for 8-K filing extraction
    derived_from_bd_id: BD-072
  - id: finance-C-114
    when: When implementing or modifying document type detection logic for SEC filings
    action: Detect HTML vs plain text by checking for <td> and <tr> elements — this specific check distinguishes structured
      HTML from plain text with embedded tags
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using extension-based detection or other heuristics causes incorrect parsing of .txt files containing HTML
      content, corrupting extracted SEC filing data
    derived_from_bd_id: BD-007
  - id: finance-C-118
    when: When implementing 10-Q item extraction parsing logic
    action: Use '__' as the delimiter when encoding part-item relationships in SEC filing section names — the parsing logic
      at line 927 depends on split('__') to correctly separate section numbers from item numbers
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using a different delimiter breaks the hierarchical mapping between SEC filing sections and extracted items,
      causing structural corruption of the parsed 10-Q document tree
    derived_from_bd_id: BD-075
  - id: finance-C-119
    when: When modifying the FILINGS_METADATA.csv production or consumption logic in extract_items.py
    action: Maintain exact column names, ordering, and data types as specified in the implicit contract between download module
      (lines 424-439) and extract module (line 1199) — any change requires coordinated updates to both modules
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: If download changes CSV column names, ordering, or data types without coordinating with extract, downstream
      extraction will fail silently or produce incorrect filing metadata, corrupting all subsequent SEC filing analysis
    derived_from_bd_id: BD-077
  - id: finance-C-120
    when: When implementing part extraction logic for SEC 10-Q filings in extract_items.py
    action: Verify roman_numeral_map keys (I-XX, defined at lines 32-53) exactly match the PART regex pattern (line 540) —
      any mismatch causes part extraction to silently skip or misidentify SEC section boundaries
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: If roman_numeral_map keys do not align with PART regex pattern, part extraction will silently skip or misidentify
      SEC section boundaries, causing item content to be attributed to wrong sections in 10-Q filings
    derived_from_bd_id: BD-079
  - id: finance-C-121
    when: When implementing 8-K item extraction logic in extract_items.py
    action: 'Apply date-based pattern switching for 8-K item numbering: use decimal notation (1.01-9.01) for filings on or
      after August 23, 2004, and sequential integers (1-12) for filings before that date'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using only one pattern causes complete parsing failure for pre-2004 8-K filings — SEC mandated item numbering
      format change on August 23, 2004, and historical filings must use the old format
    derived_from_bd_id: BD-013
  - id: finance-C-141
    when: When implementing or testing 8-K filing format detection logic
    action: Centralize the 8-K cutoff date 2004-08-23 as a single shared constant with import-time validation — both BD-067
      and BD-013 implementations must reference the same constant to prevent timezone-related or rounding discrepancies
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Without centralized date handling, tests and production code may use slightly different date representations
      for the 2004-08-23 8-K cutoff, causing silent incorrect extraction of pre-2004 8-K filings without any error indication
    derived_from_bd_id: BD-085
  - id: finance-C-168
    when: When implementing SEC EDGAR data retrieval with rate limit handling
    action: 'Verify rate limit detection operates as OR logic across all three mechanisms: (1) BD-044 HTML text detection
      (''will be managed until action is taken''), (2) BD-062 HTTP status codes (403/429), and (3) BD-003 exponential backoff
      retries; all three paths must independently trigger the rate-limit response'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incomplete rate limit detection causes SEC EDGAR requests to fail silently or return partial data. This violates
      regulatory data access reliability requirements and may result in gaps in mandatory financial disclosures used for trading
      decisions
    derived_from_bd_id: BD-086
  regular:
  - id: finance-C-001
    when: When parsing SEC EDGAR master.idx file content
    action: decode content using latin-1 encoding to preserve original byte values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect encoding (e.g., utf-8) will corrupt company names and paths containing non-ASCII characters,
      resulting in missing or malformed filing records in the index
    stage_ids:
    - index_download
  - id: finance-C-002
    when: When processing SEC EDGAR master.idx file header
    action: skip the first 10 lines containing header/metadata before parsing data rows
    severity: high
    kind: domain_rule
    modality: must
    consequence: Including header lines in the parsed data will cause downstream processing to fail when attempting to parse
      header text as filing records
    stage_ids:
    - index_download
  - id: finance-C-003
    when: When validating quarter parameters for SEC EDGAR index download
    action: pass quarter values other than 1, 2, 3, or 4 to the download function
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Invalid quarter values will cause the download to fail with an exception, preventing any index files from
      being retrieved
    stage_ids:
    - index_download
  - id: finance-C-004
    when: When downloading indices for the current calendar year
    action: skip quarters that have not yet occurred based on current month calculation
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Attempting to download future quarters will result in 404 errors and failed index downloads, wasting network
      bandwidth and causing incorrect failure tracking
    stage_ids:
    - index_download
  - id: finance-C-005
    when: When naming the downloaded SEC EDGAR index TSV files
    action: use the naming convention {year}_QTR{quarter}.tsv as required by downstream processing
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect file naming will cause downstream stages to fail when searching for index files, breaking the entire
      crawling pipeline
    stage_ids:
    - index_download
  - id: finance-C-007
    when: When making HTTP requests to SEC EDGAR
    action: implement retry logic with exponential backoff for handling rate limits and transient failures
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without retry logic, rate-limited requests (403 errors) will cause immediate download failures, preventing
      successful index retrieval
    stage_ids:
    - index_download
  - id: finance-C-008
    when: When retrying failed SEC EDGAR requests
    action: include HTTP 403 in the list of status codes that trigger automatic retry
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Excluding 403 from retry status codes will cause rate-limit errors to fail immediately instead of being retried,
      breaking downloads
    stage_ids:
    - index_download
  - id: finance-C-009
    when: When processing already-downloaded SEC EDGAR indices
    action: enable skip_present_indices option to avoid redundant network calls and API rate limit consumption
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Re-downloading existing indices wastes bandwidth, consumes SEC EDGAR API rate limits, and extends execution
      time unnecessarily
    stage_ids:
    - index_download
  - id: finance-C-010
    when: When downloading SEC EDGAR master.zip archives
    action: extract and process the master.idx file from within the downloaded zip archive
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Failing to extract from the zip archive will cause the download to fail when trying to read the raw zip bytes
      as text
    stage_ids:
    - index_download
  - id: finance-C-011
    when: When processing SEC EDGAR index file paths
    action: convert .txt file references to -index.html references for proper HTML index access
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using .txt references instead of -index.html will cause downstream document downloads to fail, as SEC EDGAR
      HTML indices are the standard access method
    stage_ids:
    - index_download
  - id: finance-C-012
    when: When saving processed index files to disk
    action: use pipe-delimiter format preserving CIK|Company|Form|Date|Path|HTML_Index structure
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect delimiter or missing fields will cause downstream parsing to fail when expecting the standard SEC
      EDGAR index format
    stage_ids:
    - index_download
  - id: finance-C-013
    when: When making claims about SEC EDGAR data coverage
    action: claim that downloaded indices represent complete real-time data without regulatory delays
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: SEC EDGAR data has inherent delays and filing deadlines; presenting the data as real-time would mislead users
      about data freshness
    stage_ids:
    - index_download
  - id: finance-C-014
    when: When handling failed SEC EDGAR index downloads
    action: track failed indices separately and prompt user for retry decision instead of silently continuing
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Silently continuing after download failures will result in incomplete index coverage, causing downstream
      processing to miss filings from failed periods
    stage_ids:
    - index_download
  - id: finance-C-015
    when: When setting SEC EDGAR API request backoff parameters
    action: use backoff_factor of 0.2 or higher to avoid overwhelming SEC EDGAR rate limits
    severity: high
    kind: resource_boundary
    modality: must
    consequence: A backoff factor that is too small (or no backoff at all) will cause repeated 403 rate-limit errors, potentially
      resulting in temporary or permanent IP blocking by SEC EDGAR
    stage_ids:
    - index_download
  - id: finance-C-016
    when: When verifying downloaded index file existence
    action: check file existence using os.path.exists before deciding to skip or download
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Skipping the existence check will cause incorrect behavior when skip_present_indices is True but files are
      missing
    stage_ids:
    - index_download
  - id: finance-C-018
    when: When configuring the index download stage
    action: set start_year greater than end_year as this creates an empty download range
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Invalid year range will cause the download loop to execute zero iterations, producing no index files and
      silent failure
    stage_ids:
    - index_download
  - id: finance-C-019
    when: When using this tool for financial analysis or regulatory compliance
    action: claim this tool provides official SEC filings or guaranteed regulatory compliance verification
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting scraped EDGAR data as official or compliant could lead to legal liability and incorrect financial
      decisions based on potentially outdated or incomplete data
    stage_ids:
    - index_download
  - id: finance-C-020
    when: When considering skipping the retry mechanism
    action: skip the exponential backoff retry logic even when encountering transient network errors
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Skipping retries will cause single transient failures to become complete download failures, wasting previous
      successful requests in the batch
    stage_ids:
    - index_download
  - id: finance-C-022
    when: When crawling HTML index pages from SEC EDGAR
    action: extract the Period of Report field from the filing page; return None if it cannot be found
    severity: high
    kind: domain_rule
    modality: must
    consequence: Filings without a Period of Report cannot be properly categorized by year, causing incorrect temporal ordering
      and potential duplication of financial data
    stage_ids:
    - crawl_and_download
  - id: finance-C-023
    when: When processing EDGAR master.idx index files
    action: decode index file content using latin-1 encoding before processing
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect encoding (e.g., UTF-8) will cause character decoding errors for non-ASCII company names,
      resulting in corrupted or truncated metadata entries
    stage_ids:
    - crawl_and_download
  - id: finance-C-024
    when: When downloading filing documents from SEC EDGAR
    action: prefer HTML (.htm/.html) document links over complete submission text files as primary download target
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Falling back directly to complete submission text files without attempting HTML parsing produces unstructured
      data that downstream parsers cannot process correctly
    stage_ids:
    - crawl_and_download
  - id: finance-C-025
    when: When encountering iXBRL document links (ix?doc=/ prefix) during filing download
    action: strip the ix?doc=/ prefix from URLs before downloading to obtain valid document URLs
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Downloading with ix?doc=/ prefixed URLs will result in 404 errors or invalid content, causing the filing
      document to be missing from the dataset
    stage_ids:
    - crawl_and_download
  - id: finance-C-026
    when: When downloading indices for the current year
    action: skip quarters that have not yet elapsed to avoid requesting non-existent data
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Requesting future quarters will return empty or 404 responses, wasting network bandwidth and potentially
      corrupting index state
    stage_ids:
    - crawl_and_download
  - id: finance-C-027
    when: When storing filing metadata and downloaded files
    action: write CSV metadata to a temporary file first, then atomically move to final location
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Writing directly to the metadata CSV risks data loss if the process is interrupted (e.g., Ctrl+C), leaving
      an incomplete or corrupted metadata file
    stage_ids:
    - crawl_and_download
  - id: finance-C-029
    when: When organizing downloaded raw filing documents
    action: store files in subdirectories named after the filing type (e.g., RAW_FILINGS/10-K/)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Storing all filing types in a single directory causes file name collisions and makes downstream parsing select
      the wrong document for each filing type
    stage_ids:
    - crawl_and_download
  - id: finance-C-031
    when: When fetching company metadata (SIC, state, fiscal year) for multiple filings
    action: cache company metadata in companies_info.json to avoid redundant HTTP requests per filing
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Fetching company metadata for each filing causes N redundant HTTP requests per company, multiplying API load
      and slowing down bulk downloads significantly
    stage_ids:
    - crawl_and_download
  - id: finance-C-033
    when: When processing Document Format Files table in EDGAR HTML indexes
    action: validate that tr.contents[7] exists and matches target filing types before extracting document links
    severity: high
    kind: domain_rule
    modality: must
    consequence: Accessing index 7 without bounds checking causes IndexError exceptions that crash the crawl process, leaving
      subsequent filings unprocessed
    stage_ids:
    - crawl_and_download
  - id: finance-C-034
    when: When validating quarter values in configuration
    action: reject quarter values outside the range [1, 2, 3, 4] with a descriptive error
    severity: high
    kind: domain_rule
    modality: must
    consequence: Invalid quarter values cause unpredictable behavior in index filtering, potentially downloading wrong quarter
      data or returning empty result sets
    stage_ids:
    - crawl_and_download
  - id: finance-C-035
    when: When extracting company metadata from SEC EDGAR company pages
    action: handle missing HTML elements gracefully using try-except blocks and fall back to cached values
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Parsing failures for SIC/state/fiscal year without fallback cause NaN values in metadata CSV, breaking downstream
      financial analysis that requires SIC codes for industry filtering
    stage_ids:
    - crawl_and_download
  - id: finance-C-036
    when: When downloading documents via HTTP requests
    action: check for SEC EDGAR rate-limit error messages in response text before proceeding
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Ignoring rate-limit responses allows the script to continue requesting blocked endpoints, extending the IP
      block duration significantly
    stage_ids:
    - crawl_and_download
  - id: finance-C-038
    when: When downloading filings for multiple years and quarters
    action: skip validation that filings already exist locally before initiating new downloads
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Redownloading existing filings wastes bandwidth and API quota, and risks overwriting files that may have
      been manually curated or have different content
    stage_ids:
    - crawl_and_download
  - id: finance-C-039
    when: When providing filing types to the download module
    action: specify at least one valid filing type; reject empty filing type lists
    severity: high
    kind: domain_rule
    modality: must
    consequence: An empty filing type list causes the script to exit silently without downloading anything, wasting time on
      index downloads that serve no purpose
    stage_ids:
    - crawl_and_download
  - id: finance-C-040
    when: When using SEC EDGAR as a data source for financial analysis
    action: claim that downloaded filings represent real-time or current data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: SEC filings are delayed disclosures (for example, an 8-K is due within 4 business days of the triggering event);
      presenting data as current misleads financial analysts about data freshness
    stage_ids:
    - crawl_and_download
  - id: finance-C-044
    when: When matching item patterns in filing text
    action: Match both Roman numerals (I, II, III) and Arabic numerals (1, 2, 3) for item numbering
    severity: high
    kind: domain_rule
    modality: must
    consequence: Single-format matching causes extraction failures for filings using alternative numbering conventions, resulting
      in missing or empty item sections in the output JSON
    stage_ids:
    - document_parsing
  - id: finance-C-045
    when: When selecting section boundaries between items
    action: Select the longest matching section when multiple candidates exist to prefer actual content over TOC entries
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Selecting shorter TOC entries over actual section content causes only table of contents text to be extracted,
      leaving item sections empty or incomplete
    stage_ids:
    - document_parsing
  - id: finance-C-048
    when: When processing CPU-bound text parsing operations
    action: Use ProcessPool (process-based parallelism) instead of thread-based parallelism to bypass the Global Interpreter
      Lock
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Thread-based parallelism suffers from GIL contention on CPU-bound parsing, causing severe performance degradation
      and extended processing times
    stage_ids:
    - document_parsing
  - id: finance-C-049
    when: When extracting from 10-Q reports
    action: Apply heuristics to detect and correct part separation errors in malformed filings
    severity: high
    kind: operational_lesson
    modality: must
    consequence: 10-Q filings with formatting bugs (missing PART I markers, PART I containing only ToC) cause incorrect part
      attribution, mixing financial data with narrative content
    stage_ids:
    - document_parsing
  - id: finance-C-050
    when: When handling embedded content in old filings
    action: Remove embedded PDF sections and handle legacy .txt format without <DOCUMENT> tags
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Unprocessed PDF tags and missing <DOCUMENT> wrappers cause corrupted output or complete extraction failure
      for historical filings predating standardized EDGAR formatting
    stage_ids:
    - document_parsing
  - id: finance-C-053
    when: When handling edge cases in item extraction
    action: Return empty string for missing items rather than omitting keys from JSON output
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing keys in JSON output cause KeyError exceptions in downstream consumers expecting consistent schema
      across all filings
    stage_ids:
    - document_parsing
  - id: finance-C-054
    when: When logging extraction status
    action: Log warnings when 10-Q part separation encounters known formatting issues
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Silent failures in part extraction produce corrupted data without user awareness, causing downstream analysis
      to use incomplete or misattributed content
    stage_ids:
    - document_parsing
  - id: finance-C-055
    when: When verifying extracted filing data
    action: Validate that at least one item section was successfully extracted before returning JSON
    severity: high
    kind: domain_rule
    modality: must
    consequence: Returning JSON with all empty item sections provides no usable data while appearing successful, causing silent
      failures in data pipelines
    stage_ids:
    - document_parsing
  - id: finance-C-056
    when: When configuring extraction for production use
    action: Enable skip_extracted_filings option to support incremental and resumable extraction
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Re-extracting already processed filings wastes CPU cycles on redundant parsing operations, increasing processing
      time proportionally to already-completed work
    stage_ids:
    - document_parsing
  - id: finance-C-057
    when: When processing filings for financial NLP research
    action: Remove financial/numerical tables from extracted text to facilitate text-only analysis workflows
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Including numerical tables in text extraction corrupts NLP training data with tabular noise, degrading model
      performance on narrative financial text analysis
    stage_ids:
    - document_parsing
  - id: finance-C-058
    when: When validating input filing metadata
    action: Reject unsupported filing types with an exception listing available types
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Processing unsupported filing types consumes resources without producing useful output, and the resulting
      failures are cryptic to users who cannot tell why extraction is not working
    stage_ids:
    - document_parsing
  - id: finance-C-059
    when: When cleaning extracted text
    action: Normalize Unicode special characters and fix broken section headers caused by OCR or transmission errors
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Non-normalized special characters (smart quotes, em-dashes, non-breaking spaces) cause encoding issues and
      text matching failures in downstream NLP processing
    stage_ids:
    - document_parsing
  - id: finance-C-060
    when: When using extracted filing data for analysis
    action: Claim that extracted content is complete or free of parsing errors
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: SEC filings frequently contain formatting bugs, inconsistent numbering, and encoding issues that cause extraction
      to fail for some items; presenting results as complete misleads users about data quality
    stage_ids:
    - document_parsing
  - id: finance-C-061
    when: When instantiating the Logger class
    action: Pass a name parameter to identify the logging context
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without a name parameter, the logger lacks proper identification in log entries, making it difficult to trace
      which component generated log messages during debugging
    stage_ids:
    - logging
  - id: finance-C-065
    when: When creating log directories
    action: Verify the logs/ directory exists before writing log files
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without ensuring the logs directory exists, log file writes will fail causing the logging system to malfunction
      and lose critical debugging information
    stage_ids:
    - logging
  - id: finance-C-066
    when: When suppressing third-party library logs
    action: Set urllib3 and cssutils log levels to CRITICAL to reduce noise
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without suppressing third-party library noise, log files become polluted with irrelevant HTTP and CSS parsing
      messages, obscuring important application-level logging
    stage_ids:
    - logging
  - id: finance-C-067
    when: When selecting timestamp timezone
    action: Use gmtime() for UTC-based timestamps to ensure cross-timezone consistency
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Using localtime instead of gmtime will cause timestamp confusion when debugging logs across different timezones,
      making it difficult to correlate events from distributed runs
    stage_ids:
    - logging
  - id: finance-C-068
    when: When defining the LOGGING_DIR constant
    action: Place the logs directory relative to the package root
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using an absolute path or wrong directory location will cause log file writes to fail or place logs in unexpected
      locations
    stage_ids:
    - logging
  - id: finance-C-069
    when: When configuring the file handler format
    action: Include asctime, name, levelname, and message in the log format for debugging
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without comprehensive log format fields, post-run forensics becomes difficult as log entries lack context
      about timing, source component, and severity
    stage_ids:
    - logging
  - id: finance-C-070
    when: When configuring the console handler format
    action: Use simplified message-only format for console output
    severity: low
    kind: resource_boundary
    modality: must
    consequence: Including verbose format fields in console output clutters the terminal with redundant information during
      real-time monitoring
    stage_ids:
    - logging
  - id: finance-C-071
    when: When storing log files in version control
    action: Commit log files to version control
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Committing log files to version control causes repository bloat and exposes potentially sensitive information
      about system internals
    stage_ids:
    - logging
  - id: finance-C-072
    when: When instantiating Logger for a new module
    action: Pass a descriptive name based on the module or operation context
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without a descriptive name parameter, log entries become ambiguous about which module or operation generated
      them, hampering debugging
    stage_ids:
    - logging
  - id: finance-C-073
    when: When using the filemode parameter
    action: Use append mode ('a') to preserve log history across runs
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Using write mode ('w') would overwrite existing logs, losing valuable historical debugging information from
      previous runs
    stage_ids:
    - logging
  - id: finance-C-074
    when: When adding a console handler to the root logger
    action: Add the console handler to the root logger to capture logs from all modules
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Adding console handler to a non-root logger will cause duplicate output or miss logs from other modules that
      don't explicitly use the same logger name
    stage_ids:
    - logging
  - id: finance-C-075
    when: When documenting the logging infrastructure
    action: Claim the logging system provides real-time streaming or live monitoring
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Console output goes through a standard StreamHandler that writes messages as they are emitted; it is not a
      streaming or live-monitoring service, so such claims would be misleading
    stage_ids:
    - logging
  - id: finance-C-077
    when: When downloading EDGAR indices for future quarters
    action: Request indices for quarters beyond the current calendar quarter
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: SEC EDGAR returns 404 errors for future quarter indices, causing download_indices() to fail repeatedly and
      waste API quota
  - id: finance-C-080
    when: When SEC EDGAR blocks requests due to rate limiting
    action: Wait and retry with exponential backoff (up to 5 retries with 0.2 backoff factor) before failing
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without retry logic, rate-limited requests fail immediately, causing incomplete index downloads and missing
      filing data
  - id: finance-C-081
    when: When appending new filings to FILINGS_METADATA.csv
    action: Write to a temporary file first (.tmp), then atomically move it to the final location using shutil.move
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Direct writes can corrupt the CSV if interrupted (e.g., Ctrl+C), leaving metadata in an inconsistent state
      and causing duplicate downloads on retry
  - id: finance-C-082
    when: When missing company metadata is encountered during crawl
    action: Fill missing values (SIC, State of Inc, State location, Fiscal Year End) from companies_info.json cache keyed
      by CIK
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Incomplete metadata causes downstream document_parsing to produce JSON with empty or null fields, reducing
      data utility for NLP research
  - id: finance-C-084
    when: When processing 8-K filings dated before August 23, 2004
    action: Use obsolete 8-K item naming convention (items 1-12) instead of modern dot-notation (items 1.01-9.01)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using wrong item pattern causes zero items to be extracted from pre-2004 8-K filings, resulting in incomplete
      NLP datasets
  - id: finance-C-085
    when: When extracting filing content from raw HTML/text documents
    action: Detect HTML structure via <td> and <tr> tags to determine whether to use BeautifulSoup parsing or plain text regex
      extraction
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect parsing mode causes garbled text extraction, breaking NLP tokenization and analysis downstream
  - id: finance-C-086
    when: When replacing NaN values in DataFrames read from CSV
    action: Convert np.nan to Python None for consistent null value handling across all downstream JSON serialization
    severity: high
    kind: domain_rule
    modality: must
    consequence: np.nan values serialize as the bare token NaN, which is not valid JSON, breaking schema validation and causing
      downstream parsing errors
  - id: finance-C-088
    when: When SEC EDGAR's bulk index files use .zip format
    action: Extract master.zip archive and parse master.idx file starting from line 11 (skipping EDGAR header lines)
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Parsing from line 1 includes EDGAR header metadata, corrupting the index DataFrame with invalid filing records
  - id: finance-C-089
    when: When handling iXBRL documents in SEC filings
    action: Strip ix?doc=/ prefix from URLs before downloading to get valid .htm file links
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Invalid iXBRL URLs cause HTTP 404 errors, leaving raw filings missing and metadata pointing to non-existent
      files
  - id: finance-C-090
    when: When writing extracted filing JSON output
    action: Store JSON with UTF-8 encoding (ensure_ascii=False) to preserve special characters in financial text
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: ASCII-only output rewrites non-ASCII characters (e.g., trademark symbols, em-dashes, currency symbols) as \uXXXX
      escape sequences, degrading the stored financial text for NLP training
  - id: finance-C-091
    when: When presenting EDGAR-CRAWLER as a data source for financial analysis
    action: Claim the extracted JSON structure is semantically equivalent to the original SEC filing documents
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: HTML parsing can miss or incorrectly extract content; tables are optionally removed; the tool is designed
      for NLP research, not regulatory compliance
  - id: finance-C-092
    when: When using crawled SEC filing data for trading or investment decisions
    action: Treat EDGAR-CRAWLER output as real-time or authoritative financial data suitable for live trading
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: EDGAR has inherent reporting delays (8-K within 4 business days); crawled data reflects historical filings,
      not current market conditions
  - id: finance-C-093
    when: When encountering extraction failures for individual items
    action: Skip investigation and assume the source filing lacks that section content
    severity: medium
    kind: rationalization_guard
    modality: must_not
    consequence: Many 10-Q filings have formatting bugs (missing PART headers, ToC interference); skipping investigation leads
      to systematically incomplete NLP datasets
  - id: finance-C-094
    when: When implementing file paths across download and extract stages
    action: Use {DATASET_DIR} as the root directory for all file paths, as defined in __init__.py:2
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: File operations write to unintended directories, causing data loss or retrieval failures because files are
      not in the expected canonical location
  - id: finance-C-096
    when: When reading filings metadata CSV from any stage
    action: Use dtype=str in pd.read_csv to prevent pandas type coercion on numeric fields like CIK, and replace np.nan with
      None
    severity: high
    kind: domain_rule
    modality: must
    consequence: CIK values lose leading zeros (e.g., 0000320193 becomes 320193), causing mismatches between downloaded file
      names and metadata references
  - id: finance-C-097
    when: When processing 8-K filings with dates around the historical transition point
    action: Use cutoff date '2004-08-23' consistently between extract_items.py and test_extract_items.py to determine whether
      to use item_list_8k or item_list_8k_obsolete
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Pre-2004-08-23 8-K filings use wrong item pattern matching (modern item names instead of obsolete), causing
      all items to extract as empty strings
  - id: finance-C-098
    when: When extracting items from 10-Q filings
    action: Use roman_numeral_map keys (1-20) that match part numbers in item_list_10q to enable dual-format matching (Roman
      and Arabic numerals) for PART detection
    severity: high
    kind: domain_rule
    modality: must
    consequence: PART I and PART 1 sections fail to match correctly, causing entire 10-Q parts to be missed during extraction
  - id: finance-C-099
    when: When presenting or reporting this system's extracted financial data to users
    action: Claim that extracted filing data equals real-time trading signals, calculated financial metrics, or live market
      data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users build automated trading systems based on stale EDGAR filings (8-K/10-Q/10-K are delayed disclosures),
      leading to trades on outdated information and potential regulatory violations
  - id: finance-C-100
    when: When building financial analysis systems using this toolkit's output
    action: Claim that parsed 10-K/10-Q/8-K item text provides calculated financial metrics such as P/E ratios, EPS, or ROI
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users make investment decisions based on uncalculated text strings, leading to incorrect financial analysis
      and potential financial losses
  - id: finance-C-101
    when: When deploying this toolkit in enterprise document processing pipelines
    action: Claim that extracted JSON output includes schema validation, data quality guarantees, or completeness verification
      for production-grade compliance systems
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Compliance systems accept unvalidated JSON with empty item fields as complete, leading to regulatory reporting
      gaps and audit failures
  - id: finance-C-102
    when: When processing non-SEC financial documents
    action: Claim support for extracting structured data from non-SEC financial data sources such as company press releases,
      earnings call transcripts, or international regulatory filings
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users attempt to parse non-SEC documents with SEC-specific item pattern matching, producing malformed JSON
      with missing or incorrect field mappings
  - id: finance-C-103
    when: When downloading filings from SEC EDGAR
    action: Declare a valid User-Agent string in HTTP requests to SEC EDGAR to comply with their access policy and avoid IP
      blocking
    severity: high
    kind: resource_boundary
    modality: must
    consequence: SEC EDGAR blocks requests without proper User-Agent identification, causing downloads to fail with traffic
      management messages
    stage_ids:
    - index_download
  - id: finance-C-104
    when: When naming extracted JSON keys for 10-Q items
    action: 'Use ''__'' delimiter to encode part-item relationship in JSON keys: {part}_item_{number} format (e.g., part_1_item_1,
      part_2_item_1A)'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Downstream NLP systems expecting standard part_item_N format receive mismatched key names, causing schema
      validation failures and data ingestion errors
  - id: finance-C-105
    when: When naming extracted JSON keys for 10-K and 8-K items
    action: 'Use the ''item_'' prefix for all item keys: item_{number} format (e.g., item_1, item_1A, item_2.01, item_9A)'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Downstream systems expecting standard item_N key format receive malformed key names, breaking data pipeline
      integration
  - id: finance-C-106
    when: When implementing or modifying SEC filing item extraction regex patterns in extract_items.py
    action: Maintain regex patterns that capture both Roman numeral (I, II, III) and Arabic numeral (1, 2, 3) formats for
      item numbering to ensure comprehensive extraction from historical SEC filings with heterogeneous numbering conventions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Single-format matching causes extraction failures for historical filings with non-standard numbering conventions,
      leading to incomplete data extraction and missing critical disclosure items from the SEC corpus
    derived_from_bd_id: BD-010
  - id: finance-C-107
    when: When implementing SEC filing document parsing logic in download_filings.py
    action: Parse HTML document format tables for .htm/.html links first, then fall back to complete submission TXT files
      for older filings to ensure full coverage from 1994 to present
    severity: high
    kind: domain_rule
    modality: must
    consequence: HTML-only parsing misses older TXT-based SEC submissions, creating gaps in filing history and incomplete
      coverage of historical regulatory filings prior to SEC standardization
    derived_from_bd_id: BD-004
  - id: finance-C-108
    when: When configuring logging levels in logger.py
    action: Set console output to INFO+ level for clean operational indicators without DEBUG noise, and file logging to DEBUG+
      level to capture complete diagnostic information for post-run forensics
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Single log level either clutters console output with DEBUG noise during batch operations or loses critical
      diagnostic information needed for failure investigation and performance debugging
    derived_from_bd_id: BD-016
  - id: finance-C-109
    when: When implementing CIK lookup and company metadata retrieval logic
    action: Maintain a persistent companies_info.json cache file for company metadata to avoid redundant SEC EDGAR API calls
      during bulk operations, reducing API overhead by approximately 80%
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Per-filing CIK lookups trigger excessive API requests, increasing rate-limit risk and causing significant
      throughput degradation in bulk download scenarios with repeated CIK access patterns
    derived_from_bd_id: BD-002
  - id: finance-C-110
    when: When configuring log file naming in the logging setup
    action: Use timestamped log filenames to ensure unique log files per execution run, preventing overwrites and enabling
      post-hoc debugging of specific execution windows
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Static log filenames cause overwrites between execution runs, making it impossible to diagnose issues in
      long-running bulk operations and losing critical forensic evidence
    derived_from_bd_id: BD-015
  - id: finance-C-112
    when: When implementing or refactoring directory initialization logic
    action: Change the eager directory creation pattern in __init__.py to lazy/on-demand creation — directories must be created
      at import time, not on first file write
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Lazy directory creation introduces FileNotFoundError during file operations when imports occur without triggering
      creation, breaking SEC filing downloads in production environments
    derived_from_bd_id: BD-019
  - id: finance-C-113
    when: When configuring dataset directory paths for SEC filing extraction
    action: Verify that DATASET_DIR path matches deployment requirements — if a custom location is needed, modify the hardcoded
      'datasets' subfolder path in __init__.py before running extraction workflows
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Hardcoded DATASET_DIR causes extraction failures when the 'datasets' subfolder location doesn't match user
      expectations or deployment environment paths
    derived_from_bd_id: BD-017
  - id: finance-C-115
    when: When implementing HTTP request retry logic for SEC EDGAR downloads
    action: Use exponential backoff with 5 retries for HTTP requests — SEC EDGAR enforces strict rate limits and returns 403
      errors when exceeded
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without sufficient retry logic, bulk downloads fail prematurely on rate-limited requests, requiring manual
      restart and failing to complete large SEC filing batches
    derived_from_bd_id: BD-003
  - id: finance-C-116
    when: When modifying 10-Q extraction logic that uses state modification and restoration
    action: Preserve the state modification/restoration pattern for bug recovery; if refactoring, use a context manager or
      equivalent atomic pattern to ensure self.items_list is always restored after temporary assignment
    severity: high
    kind: domain_rule
    modality: must
    consequence: Removing the state restoration pattern causes self.items_list to retain incorrect intermediate state after
      extraction failures, corrupting subsequent filing data in the batch
    derived_from_bd_id: BD-078
  - id: finance-C-117
    when: When implementing company metadata caching for SEC EDGAR downloads
    action: Cache company metadata (SIC codes, state of incorporation, fiscal year) in companies_info.json — caching eliminates
      redundant HTTP requests and prevents rate-limit pressure during bulk operations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without metadata caching, each filing triggers redundant HTTP requests for constant company information,
      causing approximately 50x increase in API calls and potential rate-limit failures
    derived_from_bd_id: BD-005
  - id: finance-C-122
    when: When extracting items from SEC 10-Q filings in extract_items.py
    action: Split document text into Part I (financial information) and Part II (other information) before item extraction
      to prevent same-numbered items in the two parts from contaminating each other
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without part-level separation, same-numbered items collide (e.g., Item 1 Financial Statements in Part I and
      Item 1 Legal Proceedings in Part II), corrupting downstream analysis by mixing financial content with unrelated disclosures
    derived_from_bd_id: BD-009
  - id: finance-C-123
    when: When implementing table filtering logic in extract_items.py
    action: Remove every table on the assumption that all tables are data tables; remove only tables with background-color
      or background-image attributes and preserve all other tables regardless of their visual appearance
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Removing all tables destroys item listings and narrative content that appear in unstyled HTML tables, losing
      critical information from SEC filing management discussions and risk disclosures
    derived_from_bd_id: BD-008
  - id: finance-C-124
    when: When implementing section boundary detection in extract_items.py
    action: Select the longest matching section when multiple candidates share identical headers — this disambiguates Table
      of Contents entries (shorter) from actual item content (longer)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without longest-match selection, Table of Contents entries match first and cause premature section termination,
      truncating actual item content and losing 2-5% of critical SEC filing disclosure text per document
    derived_from_bd_id: BD-011
  - id: finance-C-125
    when: When implementing any randomized behavior in the SEC EDGAR download pipeline
    action: Assume the framework handles random seed configuration for reproducible downloads — the framework does not implement
      random seed management, leading to non-deterministic download sequences across runs
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without random seed management, download sequences vary between runs causing inconsistent file ordering,
      potential duplicate downloads, and non-reproducible audit trails that fail regulatory compliance requirements
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-126
    when: When implementing reproducibility requirements in the SEC EDGAR download pipeline
    action: Implement random seed configuration by setting numpy.random.seed() and random.seed() before any randomized operations
      in index_download, and document the seed value used for each download session in logs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit random seed handling, retry logic and shuffling operations produce different results each
      run, preventing audit reproducibility and making it impossible to reproduce exact download sequences for regulatory
      verification
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-127
    when: When using HtmlStripper for HTML parsing in SEC document extraction
    action: Verify that convert_charrefs=True and strict=False are documented in system configuration; if implementing custom
      HTML parsing, verify equivalent entity conversion and malformed HTML tolerance behavior to maintain consistency with
      extraction pipeline
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: HtmlStripper with convert_charrefs=True automatically converts HTML character references like &amp; to Unicode
      characters, potentially creating inconsistencies if downstream processing expects raw entities; strict=False silently
      tolerates malformed HTML which could mask parser errors
    derived_from_bd_id: BD-074
  - id: finance-C-128
    when: When implementing or configuring parallel filing processing in SEC document extraction
    action: Use ProcessPool for parallel filing processing due to Python GIL limitations on CPU-bound parallelism; ThreadPool
      is insufficient for text parsing workloads; verify processes >= 2 for actual parallelism benefit
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using ThreadPool for CPU-bound HTML parsing and regex matching provides no parallelism benefit due to Python
      GIL; single-process execution becomes a bottleneck when processing large batches of SEC filings, causing linear scaling
      degradation
    derived_from_bd_id: BD-012
  - id: finance-C-129
    when: When implementing Roman numeral conversion for SEC 10-Q document parsing
    action: Verify roman_numeral_map covers values 1-20 for bidirectional conversion between numeric and Roman numeral Part/Item
      identifiers in 10-Q filings; values exceeding 20 will return '?' placeholder and cause section identification to fail
    severity: high
    kind: domain_rule
    modality: must
    consequence: SEC 10-Q filings use Roman numerals for Parts and Items (I, II, III, IV, V, VI, VII, VIII, IX, X, etc.).
      When a 10-Q contains a Part or Item numbered above XX (20), roman_numeral_map returns the '?' placeholder, causing
      downstream section matching to fail silently and miss critical financial disclosures
    derived_from_bd_id: BD-024
  - id: finance-C-130
    when: When extracting SEC filing content from HTML documents
    action: Require presence of both <td> AND <tr> HTML elements to classify a document as HTML; documents containing only
      one element type should not be classified as full HTML documents
    severity: high
    kind: domain_rule
    modality: must
    consequence: Some SEC documents contain embedded HTML snippets that don't represent full document structure. Misclassifying
      a document as HTML when it only has partial table elements causes incorrect parsing logic to be applied, resulting in
      garbled or incomplete extraction of filing content
    derived_from_bd_id: BD-025
  - id: finance-C-131
    when: When implementing table extraction from SEC filing documents
    action: Preserve tables containing item index patterns (Item 1, Item 1A, Item 2, etc.) regardless of background color
      styling; do not filter or remove tables based solely on visual CSS attributes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Item index tables are critical for document structure and navigation. Removing tables with colored backgrounds
      during document cleaning causes critical SEC filing section headers to be lost, breaking downstream content extraction
      and document structure analysis
    derived_from_bd_id: BD-032
  - id: finance-C-132
    when: When processing span elements in extracted SEC document content
    action: Replace horizontal span margins (CSS margin-left/margin-right) with single space character, and vertical span
      margins (CSS margin-top/margin-bottom) with single newline character; this rule applies to margin CSS properties only,
      not padding or other spacing
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Span margin replacement preserves intended word separation and line breaks in SEC documents. Without proper
      spacing rules, merged words lose boundaries horizontally and paragraph structure is lost vertically, causing content
      to become unreadable or misinterpreted
    derived_from_bd_id: BD-033
  - id: finance-C-133
    when: When constructing absolute URLs from SEC EDGAR index relative paths for filing downloads
    action: Prepend 'https://www.sec.gov/Archives/' to relative file paths to construct valid absolute URLs; validate or handle
      broken paths before URL construction to prevent 404 errors on downloads
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: SEC EDGAR indices contain relative file paths that require base URL prepending. Broken or malformed relative
      paths result in 404 errors causing complete download failures with no indication of which filings were missed in batch
      processing
    derived_from_bd_id: BD-041
  - id: finance-C-134
    when: When implementing or refactoring logging initialization in SEC document extraction modules
    action: Instantiate logger after config.json is loaded and its logging configuration is available; do not create module-level
      LOGGER objects before configuration is loaded as this prevents custom logging settings from being applied
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Logger instantiated at module level before config.json loads creates temporal ordering dependency. The logger
      operates with default configuration throughout the module load phase, logging at incorrect levels or to wrong handlers
      until configuration is eventually applied, causing debugging visibility gaps
    derived_from_bd_id: BD-081
  - id: finance-C-135
    when: When configuring or adjusting 10-Q extraction retry loop parameters
    action: Verify length_difference threshold of 5000 chars matches actual document size expectations before using; setting
      threshold too high risks accepting incomplete extractions, while too low may cause unnecessary retries or valid partial
      extractions to be rejected
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: The 5000-character length_difference threshold determines when the 10-Q extraction retry mechanism continues
      or terminates. Wrong threshold causes either incomplete content acceptance (high threshold) or valid extraction rejections
      (low threshold), both leading to unreliable backtest data quality
    derived_from_bd_id: BD-082
  - id: finance-C-136
    when: When processing deeply nested SEC EDGAR HTML documents with HtmlStripper and BeautifulSoup
    action: Investigate how recursion limit (30000), HTMLParser settings (convert_charrefs=True, strict=False), and BeautifulSoup
      tree traversal interact; implement graceful fallback mechanism for documents that may trigger RecursionError before
      hitting configured recursion limit
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Deeply nested malformed HTML combined with lenient HTMLParser settings and recursive BeautifulSoup traversal
      creates a risk cascade where RecursionError occurs before hitting the configured 30000 limit, causing complete extraction
      failure instead of graceful degradation on pathological documents
    derived_from_bd_id: BD-083
  - id: finance-C-137
    when: When processing SEC EDGAR filings with deeply nested HTML tables and divs
    action: Set sys.setrecursionlimit to 30000 at module initialization before BeautifulSoup tree traversal; the recursion
      limit provides headroom for pathological nesting depth while bounding maximum stack depth to prevent resource exhaustion
      on extremely malformed documents
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: SEC EDGAR filings can contain deeply nested tables, divs, and spans that exceed Python's default recursion
      limit of 1000. Without an elevated recursion limit, BeautifulSoup tree traversal raises RecursionError on such documents,
      causing complete extraction failure
    derived_from_bd_id: BD-014
  - id: finance-C-138
    when: When implementing table extraction logic for SEC financial documents
    action: Apply background color filtering threshold before table removal decisions — do not remove tables that have colored
      backgrounds (RGB-based threshold) as these typically represent financial data tables with visual hierarchy
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without color-based filtering, financial tables rendered with colored backgrounds (for visual hierarchy in
      SEC filings) will be incorrectly discarded, causing loss of critical numerical data like balance sheets and income statements
    derived_from_bd_id: BD-068
  - id: finance-C-139
    when: When implementing Table of Contents matching logic in SEC document extraction
    action: Maintain the ignore-matches counter threshold for ToC filtering to prevent infinite loops on malformed documents
      — the counter MUST stop ToC-based matching and fall back to content extraction after reaching the threshold
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without the ignore counter, SEC documents with malformed ToC entries cause unbounded iteration,
      leading to extraction process hangs or denial-of-service on crafted inputs
    derived_from_bd_id: BD-070
  - id: finance-C-140
    when: When importing the extract_items module in SEC filing extraction
    action: Set sys.setrecursionlimit(30000) as a global process-wide change at module load time — instead, use a context
      manager, specific function scope, or subprocess isolation to localize recursion limit changes
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Global recursion limit changes at module import permanently alter process behavior, masking stack overflow
      bugs in unrelated code running in the same interpreter and causing unexpected truncation of legitimate deep recursion
    derived_from_bd_id: BD-076
  - id: finance-C-142
    when: When implementing page number and header removal in SEC document text cleanup
    action: Use comprehensive regex patterns covering all page number format variations (standalone numbers, with 'Page',
      with dashes, Roman numerals) and validate removal effectiveness with post-processing checks
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Incomplete regex patterns for page number removal will leave page artifacts in extracted text, degrading
      downstream analysis quality and potentially confusing content identification algorithms
    derived_from_bd_id: BD-060
  - id: finance-C-143
    when: When implementing section header matching in SEC filing extraction
    action: Give case-sensitive matches priority over case-insensitive ones; when a case-sensitive match exists anywhere
      in the document, use it regardless of position, not just as a tiebreaker
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without correct case-sensitive-first priority, SEC filings with non-canonical section header casing will
      match case-insensitive variants first, potentially extracting wrong sections and corrupting document structure analysis
    derived_from_bd_id: BD-063
  - id: finance-C-144
    when: When processing SEC filings that contain embedded PDF content within HTML wrappers
    action: Strip embedded PDF sections (<PDF>...</PDF>) from HTML documents during extraction — actual PDF content is lost;
      do not treat these as extractable text items
    severity: high
    kind: domain_rule
    modality: must
    consequence: Embedded PDF content within HTML wrappers is not parseable as text; without stripping, raw PDF bytes contaminate
      text extraction and corrupt downstream analysis with unreadable content
    derived_from_bd_id: BD-055
  - id: finance-C-145
    when: When reading SEC filing files with inconsistent encoding from various sources
    action: Use errors='backslashreplace' for file reading to handle encoding issues gracefully — do not use UTF-8 strict
      mode which will crash on malformed encodings
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without backslashreplace encoding handling, SEC filings with invalid UTF-8 sequences will cause file read
      exceptions, preventing extraction from completing on documents that could yield valid content
    derived_from_bd_id: BD-056
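    # Illustrative sketch for finance-C-145 (not taken from the source; read_filing_text is a hypothetical helper name).
    # It assumes the file is read as UTF-8 text; undecodable bytes become escapes such as \xff instead of raising
    # UnicodeDecodeError.
    #   def read_filing_text(path):
    #       with open(path, encoding="utf-8", errors="backslashreplace") as handle:
    #           return handle.read()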
  - id: finance-C-146
    when: When implementing text cleanup that removes navigation elements from SEC filings
    action: Remove 'Table of Contents', 'Index to Financial Statements', 'Back to Contents', and 'Quicklinks' navigation headers
      — but validate these as navigation elements using positional context (appear at document start/mid-section) before removal,
      not just phrase matching alone
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Phrase-only matching removes section headers that legitimately contain these phrases in substantive content,
      causing silent loss of actual document sections disguised as navigation elements
    derived_from_bd_id: BD-061
  - id: finance-C-147
    when: When implementing Unicode normalization for special character handling in SEC filings
    action: Normalize Unicode representations (em-dashes, smart quotes, accented characters) to standard ASCII equivalents
      for consistent text matching — but document this normalization for downstream consumers and verify semantic preservation
      when normalization is applied
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Without documented normalization, downstream systems may not expect ASCII-converted characters, causing subtle
      semantic changes in financial terminology and company names that affect matching accuracy
    derived_from_bd_id: BD-069
  - id: finance-C-148
    when: When implementing file download and persistence logic for SEC filings
    action: Write CSV metadata to a temporary file first, then move to the final location using atomic rename — do not write
      directly to the target path
    severity: high
    kind: domain_rule
    modality: must
    consequence: Direct writes risk leaving partial data if the process is interrupted, corrupting the index file and causing
      downstream data retrieval failures
    derived_from_bd_id: BD-048
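    # Illustrative sketch for finance-C-148 (not taken from the source; write_csv_atomically is a hypothetical helper).
    # It assumes the temporary file is created in the target directory so that os.replace stays on one filesystem,
    # which is what makes the rename atomic.
    #   import csv, os, tempfile
    #   def write_csv_atomically(rows, target_path):
    #       directory = os.path.dirname(os.path.abspath(target_path))
    #       fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    #       try:
    #           with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
    #               csv.writer(handle).writerows(rows)
    #           os.replace(tmp_path, target_path)  # readers see either the old file or the complete new one
    #       except BaseException:
    #           if os.path.exists(tmp_path):
    #               os.remove(tmp_path)  # clean up the partial temp file, then re-raise
    #           raise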
  - id: finance-C-149
    when: When processing SEC filing downloads with incremental update logic
    action: Verify that existing file detection relies on exact naming format matching as implemented in the codebase — do
      not assume alternative detection methods (checksums, manifest files) are used unless explicitly configured
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: If the naming format convention changes or files are renamed externally, the detection logic may incorrectly
      skip existing files or re-download unnecessarily, causing data duplication or gaps
    derived_from_bd_id: BD-047
  - id: finance-C-150
    when: When implementing or modifying the SEC filing download module (BD-077 CSV format contract)
    action: Implement explicit schema validation at the CSV format contract boundary between download and extract modules
      to detect any format changes before they cause silent failures or corrupted metadata
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without schema validation, a modified CSV format causes the extract module to silently fail or produce corrupted
      company metadata, leading to incorrect financial data in backtesting results
    derived_from_bd_id: BD-084
  - id: finance-C-151
    when: When implementing CIK lookup or company info caching with incremental download (BD-002) and skip-existing (BD-047)
    action: Implement unified caching with TTL-based invalidation to ensure company data reflects recent changes (e.g., new
      SIC codes, post-merger name changes), and validate that cached CIK lookups are consistent with current company_info
      records
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Duplicate cache mechanisms (companies_info.json vs company_info.json) with stale data cause CIK lookups to
      reference outdated company information, resulting in extraction of wrong company filings or missing updated company
      data in backtest
    derived_from_bd_id: BD-089
  - id: finance-C-152
    when: When implementing table extraction logic from SEC documents
    action: 'Check for non-blank background colors (any color that is not white, transparent, none, or #fff) and remove tables
      with such backgrounds — colored backgrounds often indicate navigation elements, disclaimers, or other non-content tables'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Preserving tables with colored backgrounds causes extraction of non-content elements like navigation menus
      and disclaimers, contaminating the extracted data and reducing analysis quality
    derived_from_bd_id: BD-031
  - id: finance-C-153
    when: When implementing SEC item section pattern matching logic
    action: Insert optional whitespace (zero or more spaces) before trailing letters (A, B, C) in item patterns to match variations
      like Item 1A and Item 1 B in SEC documents
    severity: high
    kind: domain_rule
    modality: must
    consequence: Strict whitespace requirements in item patterns cause missed matches for sub-sections with extra whitespace,
      resulting in incomplete document extraction and missing risk factors
    derived_from_bd_id: BD-034
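    # Illustrative sketch for finance-C-153 (a simplified pattern for demonstration, not the module's actual regex).
    # The \s* before the trailing letter lets both spellings match.
    #   import re
    #   item_1a = re.compile(r"Item\s+1\s*A\b", re.IGNORECASE)
    #   bool(item_1a.search("Item 1A. Risk Factors"))   # True
    #   bool(item_1a.search("Item 1 A. Risk Factors"))  # True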
  - id: finance-C-154
    when: When implementing SIGNATURE section extraction from SEC filings
    action: Extract the SIGNATURE block from the last occurrence in the document, not the first — table of contents entries
      may appear before the actual signature block
    severity: high
    kind: domain_rule
    modality: must
    consequence: Extracting the first SIGNATURE occurrence captures TOC entries instead of the genuine signature block, resulting
      in incomplete or incorrect signer information extraction
    derived_from_bd_id: BD-037
  - id: finance-C-155
    when: When implementing 10-Q document parsing and item extraction
    action: 'Parse 10-Q documents in two parts: Part I (Items 1-4, financial information) and Part II (Items 1-6, non-financial
      information) — Items 5-6 only appear in Part II'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Single-section 10-Q extraction misses Items 5-6 that appear only in Part II, resulting in incomplete extraction
      of regulatory filings and potential compliance failures
    derived_from_bd_id: BD-038
  - id: finance-C-156
    when: When implementing text extraction cleanup from SEC HTML documents
    action: Normalize whitespace by removing excessive spaces while preserving paragraph and list structure — excessive HTML
      whitespace creates noise in extracted text
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without whitespace normalization, excessive spacing in HTML causes corrupted extracted text with irregular
      formatting, making downstream analysis unreliable
    derived_from_bd_id: BD-066
  - id: finance-C-157
    when: When implementing the process_filing function or refactoring filing processing logic
    action: Call determine_items_to_extract BEFORE calling extract_items so that item selection logic executes before extraction
      begins
    severity: high
    kind: domain_rule
    modality: must
    consequence: Violating the function call order causes KeyError exceptions when extract_items attempts to access items
      that have not been pre-identified by determine_items_to_extract, resulting in runtime failures
    derived_from_bd_id: BD-071
  - id: finance-C-158
    when: When implementing 10-Q parsing logic or refactoring filing extraction components
    action: 'Preserve the cascading parsing sequence: (1) BD-009 split 10-Q into Part I and Part II first, (2) BD-038 applies
      item mapping within correct part context (Part I=Items 1-4, Part II=Items 1-6), (3) BD-075 uses ''__'' as part-item
      delimiter for encoding, (4) BD-079 uses Roman numeral map for part numbering — do not modify any single point without
      validating the full cascade'
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'Breaking the cascade at any point causes cascading failures: changing BD-075 delimiter breaks split logic,
      incomplete BD-079 map fails part identification, BD-009 separation failure causes BD-038 to extract items in wrong context,
      all resulting in incorrect filing output'
    derived_from_bd_id: BD-087
  - id: finance-C-159
    when: When using the framework's default item extraction behavior without specifying items_to_extract
    action: Verify that extracting all available items aligns with your use case; if targeting specific items for analysis,
      explicitly specify the items_to_extract parameter to avoid processing large filings with unnecessary items and potential
      performance degradation
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default behavior extracts all available items, which may cause significant processing time on large filings
      and introduce noise in analysis when only specific items are needed for targeted research
    derived_from_bd_id: BD-052
  - id: finance-C-160
    when: When extracting SEC filing content using default configuration without explicit include_signature setting
    action: Verify that SIGNATURE sections containing personal signer information are not needed for your analysis; for compliance
      or audit use cases, set include_signature=true to capture signer details
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default exclusion of SIGNATURE sections silently removes relevant signer information that may be required
      for compliance verification, audit trails, or forensic analysis use cases
    derived_from_bd_id: BD-053
  - id: finance-C-161
    when: When extracting SEC filing content using default configuration without explicit remove_tables setting
    action: Verify that tabular data including numerical content, financial tables, and structured information is not needed
      for your analysis; for quantitative research, set remove_tables=false to preserve table content
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default table removal silently discards legitimate tabular content including financial data, numerical schedules,
      and structured information that may be critical for quantitative analysis and backtesting strategies
    derived_from_bd_id: BD-054
  - id: finance-C-162
    when: When implementing or refactoring directory initialization and file path handling logic in deployment scenarios
    action: Ensure directory paths are configurable and resolve to writable locations in restricted environments (shared
      servers, containers, cloud functions); do not assume the package directory is always writable
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded package-relative directory paths cause immediate import failures in restricted deployment environments
      where the package directory lacks write permissions, preventing any trading functionality from loading
    derived_from_bd_id: BD-088
  - id: finance-C-163
    when: When implementing or refactoring HTML detection and parsing logic for document extraction
    action: Preserve the multi-criteria HTML detection logic requiring BOTH <td> AND <tr> elements before selecting HtmlStripper
      parsing strategy; must not simplify detection to require only <td> or only <tr>
    severity: high
    kind: domain_rule
    modality: must
    consequence: Simplifying HTML detection to require only partial table elements causes wrong parsing strategy selection,
      leading to extraction failure or corrupted output on edge-case documents that contain partial table structures
    derived_from_bd_id: BD-090
  - id: finance-C-164
    when: When parsing 10-Q SEC documents using section separation logic
    action: 'Implement length-based validation to detect parsing anomalies: flag documents where PART I appears only in ToC
      without substantive section body, or where PART II is disproportionately longer indicating section boundary detection
      failure'
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: The section separation heuristic silently fails on 10-Q filings where PART I is listed in ToC but lacks a
      separate section, causing parsing to skip or misalign critical financial disclosure content
    derived_from_bd_id: BD-057
  - id: finance-C-165
    when: When implementing or refactoring item number matching patterns in SEC document extraction
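    action: Preserve the explicit boundary character set [.*~\-:\s\(] after item numbers in regex patterns (the hyphen must
      stay escaped, because an unescaped hyphen between ~ and the colon forms an invalid range in Python's re); must not remove
      these separator characters or replace them with simpler word boundary assertions only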
    severity: high
    kind: domain_rule
    modality: must
    consequence: Simplifying item number patterns to use only word boundaries causes items followed by unexpected separator
      characters to fail matching, silently skipping important SEC disclosure items in extracted content
    derived_from_bd_id: BD-064
  - id: finance-C-166
    when: When implementing or refactoring whitespace handling in SEC document text processing patterns
    action: Preserve the explicit whitespace definition [^\S\r\n] (matching whitespace but explicitly excluding newlines and
      carriage returns); must not replace with standard \s or broader character classes that include line breaks
    severity: high
    kind: domain_rule
    modality: must
    consequence: Replacing the custom whitespace pattern with standard \s causes newlines to be treated as ordinary whitespace,
      destroying line-oriented document structure and breaking pattern matching that depends on line boundaries for SEC document
      parsing
    derived_from_bd_id: BD-065
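    # Illustrative sketch for finance-C-166 (a toy input, not the module's actual pattern).
    # [^\S\r\n] matches spaces and tabs but leaves line breaks alone, whereas \s also consumes them.
    #   import re
    #   text = "Item 1.\n   Business"
    #   re.sub(r"[^\S\r\n]+", " ", text)  # 'Item 1.\n Business' (newline kept)
    #   re.sub(r"\s+", " ", text)          # 'Item 1. Business'   (line boundary lost)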
  - id: finance-C-167
    when: When implementing 10-Q extraction workflow
    action: Call get_10q_parts to populate the parts dictionary with section boundaries before entering the item extraction
      loop — verify parts['metadata'], parts['financial_statements'], etc. are available for regex pattern matching
    severity: high
    kind: domain_rule
    modality: must
    consequence: Skipping get_10q_parts causes item regex patterns to operate on unparsed raw content, producing malformed
      or missing item data that corrupts downstream financial analysis and reporting
    derived_from_bd_id: BD-073
  - id: finance-C-169
    when: When processing HTTP responses from SEC EDGAR during data retrieval
    action: Assume rate limit detection is complete based on any single mechanism — BD-044 text detection alone is insufficient
      (only catches 'will be managed until action is taken'), BD-062 status codes alone miss 200 responses with embedded rate-limit
      content, BD-003 retry logic alone lacks explicit rate limit awareness
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Relying on incomplete rate limit detection causes the framework to miss rate limit errors and retry non-rate-limit
      failures, or fail to retry when rate-limited — resulting in corrupted or missing market data that propagates into incorrect
      trading signals
    derived_from_bd_id: BD-086
  - id: finance-C-170
    when: When handling HTTP 200 responses with embedded content during SEC EDGAR retrieval
    action: Implement explicit content scanning for rate-limit indicators within 200 OK responses — BD-003 retry mechanism
      and BD-062 status code detection do not trigger for 200 status, so BD-044 HTML text detection is the only safeguard
      against rate-limited pages returned as successful responses
    severity: high
    kind: domain_rule
    modality: must
    consequence: Rate-limited pages returned as 200 OK bypass all error handling, causing the framework to treat rate-limited
      content as valid data. Trading strategies then execute on empty or placeholder content, leading to incorrect position
      sizing and significant financial losses
    derived_from_bd_id: BD-086
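    # Illustrative sketch for finance-C-169 / finance-C-170 (not taken from the source; looks_rate_limited and
    # RATE_LIMIT_MARKERS are hypothetical names, and the status codes 403 and 429 are assumptions because the
    # source does not list them). It combines a status check with a body scan so that a 200 OK page carrying
    # rate-limit text is still caught.
    #   RATE_LIMIT_MARKERS = ("will be managed until action is taken",)  # phrase quoted in finance-C-169
    #   def looks_rate_limited(status_code, body_text):
    #       if status_code in (403, 429):
    #           return True
    #       lowered = body_text.lower()
    #       return any(marker in lowered for marker in RATE_LIMIT_MARKERS)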
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-114 / SEC EDGAR Filing Extraction
    version: v5.3
    intent_keywords:
    - EDGAR
    - SEC filings
    - 10-K extraction
    - annual report parsing
    - document extraction
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-101
          name: SEC EDGAR Filing Extraction
          short_description: 'Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from
            compressed ZIP archives for downstream financial analysis '
          sample_triggers:
          - EDGAR
          - SEC filings
          - 10-K extraction
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try sec edgar filing extraction
      auto_selected: true
    - uc_id: UC-100
      beginner_prompt: Try capability UC-100
      auto_selected: true
    - uc_id: UC-101
      beginner_prompt: Try capability UC-101
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 1 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - SEC EDGAR Filing Extraction
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
    - Custom Transformer + Accumulator factor with per-entity rolling state
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

Economic Dashboard

Skill

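Provides a dashboard view of global macroeconomic data, with local storage of multi-source data, hot/cold tiered storage, and automated refresh scheduling.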

---
name: economic-dashboard
description: |-
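  Provides a dashboard view of global macroeconomic data, with local storage of multi-source data, hot/cold tiered storage, and automated refresh scheduling.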
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-083"
  compiled_at: "2026-04-22T13:00:33.402010+00:00"
  capability_markets: "global"
  capability_activities: "macro-data"
  sop_version: "crystal-compilation-v6.1"
---
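# Macroeconomic Dashboard (economic-dashboard)

> Provides a dashboard view of global macroeconomic data, with local storage of multi-source data, hot/cold tiered storage, and automated refresh scheduling.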

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (13 total)

### Database Snapshot Optimization (`UC-101`)
Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into appropriate storage formats with ZSTD compression and
**Triggers**: backup, snapshot, parquet

### Database Compaction and Optimization (`UC-102`)
Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records within retention windows while measuring compression s
**Triggers**: vacuum, optimize, database cleanup

### Daily Economic Data Refresh (`UC-104`)
Fetches economic data from the FRED and Yahoo Finance APIs daily and stores the results in a cache for dashboard consumption
**Triggers**: refresh data, daily update, FRED data

For all **13** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-083. Evidence verify ratio = 28.0% and audit fail total = 33. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-083` blueprint at 2026-04-22T13:00:33.402010+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - API Key Management Verification
  - Database Compaction and Optimization
  - Database Snapshot Optimization
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-074--FinRobot (1)

### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>

SEC API calls made without a rate-limiting decorator can exceed the regulatory limit of 10 requests per second. Exceeding it gets the client's IP blocked by SEC EDGAR, which cuts off all subsequent access to critical financial filings and halts the data collection pipeline. FinRobot demonstrates that the SEC enforces this limit strictly, and that a missing User-Agent header compounds the problem by causing silent request failures.
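
The decorator approach mentioned above can be sketched as follows. This is a minimal illustration, not FinRobot's implementation: `rate_limited` and `fetch_filing` are hypothetical names, the limiter is single-threaded, and the actual HTTP request is left out.

```python
import time
from functools import wraps


def rate_limited(max_per_second: float):
    """Decorator that spaces calls at least 1/max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [float("-inf")]  # mutable cell holding the previous call time

        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)  # block until the next slot opens
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical fetcher: a real one would issue the HTTP request here
# and send a descriptive User-Agent header with it.
@rate_limited(max_per_second=10)
def fetch_filing(accession_number: str) -> str:
    return f"fetched {accession_number}"
```

Spacing calls evenly is the simplest policy; a token bucket would allow short bursts while keeping the same average rate.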

## finance-bp-077--Open_Source_Economic_Model (2)

### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>

When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
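
The convergence point rule quoted above is small enough to state as code. The sketch below assumes `U` denotes the last liquid point in years, which the text does not define; `convergence_point` is a hypothetical helper name.

```python
def convergence_point(last_liquid_point: float) -> float:
    """Return max(U + 40, 60) in years, with U taken as the last liquid point."""
    return max(last_liquid_point + 40, 60)


short_curve = convergence_point(15)  # U + 40 = 55, so the floor of 60 applies
long_curve = convergence_point(50)   # U + 40 = 90 exceeds the floor
```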

### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>

Importing CSV portfolio files that begin with a UTF-8 BOM without the 'utf-8-sig' encoding leaves the BOM attached to the first column name. Lookups on that field then raise KeyError, so the portfolio data fails to load at all. The corruption is silent: the first column name read by pandas simply carries an invisible prefix.
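
The effect is easy to reproduce. The sketch below uses the standard-library `csv` module rather than pandas to stay dependency-free; the column names and values are made up for illustration.

```python
import csv
import io

# Bytes of a small CSV file that starts with a UTF-8 BOM (EF BB BF).
raw = "ticker,weight\nAAA,0.6\nBBB,0.4\n".encode("utf-8-sig")


def read_rows(data: bytes, encoding: str) -> list[dict]:
    """Decode CSV bytes with the given encoding and return the rows as dicts."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.DictReader(text))


bad = read_rows(raw, "utf-8")       # first header is '\ufeffticker'
good = read_rows(raw, "utf-8-sig")  # BOM stripped, first header is 'ticker'
```

With plain `utf-8`, `bad[0]["ticker"]` raises KeyError because the key actually stored is `"\ufeffticker"`.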

## finance-bp-080--FinDKG (3)

### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>

When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits lets the model see future events during training. If train_edges do not all occur before val_edges and test_edges, the resulting metrics are inflated and do not reflect real-world performance. The outcome is an overfit model that fails catastrophically when deployed for actual temporal prediction tasks.
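
A chronological split can be sketched as below. This is an illustration under simple assumptions, not FinDKG's code: edges are `(timestamp, payload)` tuples, the split fractions are arbitrary, and `temporal_split` is a hypothetical name. If several edges share a timestamp at a cut point, a stricter version would move the cut to the next distinct timestamp.

```python
def temporal_split(edges, train_frac=0.7, val_frac=0.15):
    """Split (timestamp, payload) records chronologically into train/val/test."""
    ordered = sorted(edges, key=lambda e: e[0])  # oldest first, never shuffled
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


edges = [(t, f"event-{t}") for t in (5, 1, 9, 3, 7, 2, 8, 4, 6, 10)]
train_edges, val_edges, test_edges = temporal_split(edges)
```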

### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>

When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.

### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>

When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.

## finance-bp-083--Economic-Dashboard (3)

### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>

When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
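
One way to read this rule is sketched below with plain lists instead of pandas. The cross is detected by comparing each bar's state with the prior bar's state (the list equivalent of `shift(1)`), and the resulting signal is then delayed by one bar so it is acted on at the bar after the cross. The window lengths and function names are hypothetical.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def golden_cross_signals(close, fast=2, slow=3):
    """Return per-bar flags that are True on the bar after a golden cross."""
    f, s = sma(close, fast), sma(close, slow)
    above = [None if a is None or b is None else a > b for a, b in zip(f, s)]
    # A cross completes on bar i when the state flips from False (bar i-1) to True (bar i).
    cross = [i > 0 and above[i] is True and above[i - 1] is False
             for i in range(len(above))]
    # Delay by one bar so the signal only uses fully closed bars.
    return ([False] + cross)[: len(cross)]


signals = golden_cross_signals([10, 9, 8, 7, 8, 10, 12, 11])
```

For this made-up series the fast average moves above the slow one on bar 5, so the only True flag is on bar 6.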

### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>

Calculating technical indicators from OHLCV data without first verifying that the required columns (open, high, low, close, volume) are present raises ValueError for any ticker with incomplete data. This blocks downstream regime classification and pattern detection for every affected ticker.
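
A pre-check along these lines could run before any indicator is computed. It is a sketch with hypothetical helper names; matching column names case-insensitively is an assumption, not something the source specifies.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV column names absent from `columns`."""
    present = {str(c).lower() for c in columns}
    return [c for c in REQUIRED_OHLCV if c not in present]


def require_ohlcv(columns, ticker):
    """Raise ValueError naming the ticker and its missing columns."""
    missing = missing_ohlcv_columns(columns)
    if missing:
        raise ValueError(f"{ticker}: missing OHLCV columns {missing}")
```

Calling `require_ohlcv(frame.columns, ticker)` per ticker lets the pipeline skip or report incomplete tickers instead of failing midway through a calculation.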

### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>

When storing derived features in DuckDB with a different primary key schema than technical_features table, inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.
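
The shared key can be illustrated with two small tables. The sketch uses the standard-library `sqlite3` module as a stand-in for DuckDB; the `derived_features` table name and the `sma_20` and `regime` columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE technical_features (
    ticker TEXT NOT NULL,
    date   TEXT NOT NULL,
    sma_20 REAL,
    PRIMARY KEY (ticker, date)
);
CREATE TABLE derived_features (
    ticker TEXT NOT NULL,
    date   TEXT NOT NULL,
    regime TEXT,
    PRIMARY KEY (ticker, date)
);
INSERT INTO technical_features VALUES ('AAA', '2024-01-02', 101.5);
INSERT INTO derived_features  VALUES ('AAA', '2024-01-02', 'expansion');
""")

# Because both tables share the (ticker, date) key, the join is one-to-one.
rows = conn.execute(
    "SELECT ticker, date, sma_20, regime "
    "FROM technical_features JOIN derived_features USING (ticker, date)"
).fetchall()
```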

## finance-bp-105--open-climate-investing (5)

### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>

When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.
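
In symbols, the regression is $R_i - R_f = \alpha + \beta (R_m - R_f) + \varepsilon$. A dependency-free sketch of the beta estimate follows; the function names are hypothetical and the inputs are assumed to be aligned lists of decimal returns.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def capm_beta(stock_ret, market_ret, rf):
    """Beta from regressing stock excess returns on market excess returns."""
    stock_excess = [r - f for r, f in zip(stock_ret, rf)]
    market_excess = [m - f for m, f in zip(market_ret, rf)]
    return ols_slope(market_excess, stock_excess)
```

When the risk-free rate varies over the sample, regressing raw returns instead generally yields a different slope, which is the error this anti-pattern describes.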

### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>

When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2 meaning 5.2%) by 100 before regression yields coefficients that are off by a factor of 100. The resulting inference is statistically invalid and the factor loadings are meaningless. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
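
The conversion itself is a single division, applied once at import time. The values below are made up for illustration and `percent_to_decimal` is a hypothetical helper.

```python
def percent_to_decimal(values):
    """Convert percentage-formatted numbers (5.2 means 5.2%) to decimals."""
    return [v / 100.0 for v in values]


mkt_rf_pct = [5.2, -1.3, 0.8]  # illustrative values as they might appear in a CSV
mkt_rf = percent_to_decimal(mkt_rf_pct)
```

Converting at the ingestion boundary keeps every downstream series in the same unit, so the factor columns and the risk-free rate cannot drift apart.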

### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>

When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.
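
A guard of this kind can sit between the filtering step and the regression. It is a sketch: the helper names are hypothetical, series are modeled as `{date: value}` dicts, and only the threshold of 20 comes from the text above.

```python
MIN_OBSERVATIONS = 20  # threshold stated in the text above


def aligned_observations(stock_by_date, factor_by_date):
    """Inner-join two {date: value} series, dropping dates with missing values."""
    common = sorted(set(stock_by_date) & set(factor_by_date))
    return [
        (d, stock_by_date[d], factor_by_date[d])
        for d in common
        if stock_by_date[d] is not None and factor_by_date[d] is not None
    ]


def check_sample_size(observations, minimum=MIN_OBSERVATIONS):
    """Raise ValueError when too few observations remain for a regression."""
    if len(observations) < minimum:
        raise ValueError(
            f"only {len(observations)} observations after filtering; need at least {minimum}"
        )
    return observations
```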

### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>

When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.
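
The enforcement can be expressed as a CHECK constraint. The sketch below runs on the standard-library `sqlite3` module as a stand-in for PostgreSQL, where the same `CHECK (... IN (...))` clause is valid; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE factor_returns (
    date      TEXT NOT NULL,
    frequency TEXT NOT NULL CHECK (frequency IN ('MONTHLY', 'DAILY')),
    value     REAL NOT NULL,
    PRIMARY KEY (date, frequency)
)
""")
conn.execute("INSERT INTO factor_returns VALUES ('2024-01-31', 'MONTHLY', 0.012)")

# Any other frequency label is rejected by the database itself.
try:
    conn.execute("INSERT INTO factor_returns VALUES ('2024-01-31', 'WEEKLY', 0.003)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```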

### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>

When parallel regression runs use multiprocessing with the fork start method, child processes inherit the parent's psycopg2 database connections in a corrupted state. This surfaces as 'connection already closed' errors in the children, resulting in failed database writes and incomplete factor regression calculations.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-083--Economic-Dashboard
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 36, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [data_collection](components/data_collection.md): 6 classes
- [feature_engineering](components/feature_engineering.md): 6 classes
- [financial_analysis](components/financial_analysis.md): 6 classes
- [ml_training_&_prediction](components/ml_training_-_prediction.md): 6 classes
- [recession_probability](components/recession_probability.md): 3 classes
- [orchestration_&_automation](components/orchestration_-_automation.md): 3 classes
- [visualization_&_ui](components/visualization_-_ui.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 111
  fatal_constraints_count: 37
  non_fatal_constraints_count: 147
  use_cases_count: 13
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
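
Two of the locks above, SL-03 and SL-05, are simple enough to check mechanically. The sketch below is illustrative only: `SignalSizing` and `is_valid_entity_id` are hypothetical stand-ins, not ZVT classes or functions, and the entity ID check only verifies that three non-empty underscore-separated parts exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical stand-in for the sizing fields of a trading signal."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        fields = (self.position_pct, self.order_money, self.order_amount)
        provided = [v for v in fields if v is not None]
        if len(provided) != 1:
            raise ValueError(
                "exactly one of position_pct, order_money, order_amount must be set (SL-05)"
            )


def is_valid_entity_id(entity_id: str) -> bool:
    """Check the entity_type_exchange_code shape required by SL-03."""
    parts = entity_id.split("_", 2)
    return len(parts) == 3 and all(parts)
```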

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **13**

## `KUC-101`
**Source**: `scripts/create_database_snapshot_optimized.py`

Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into appropriate storage formats with ZSTD compression and incremental exports.
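
The hot/cold boundary can be sketched as a date cutoff. This is not the logic of `scripts/create_database_snapshot_optimized.py`, which is not shown here; `partition_hot_cold` is a hypothetical helper, and placing rows that are exactly 90 days old in the cold set is an assumption because the description leaves that case open.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90  # boundary stated in the use case


def partition_hot_cold(rows, today):
    """Split (date, payload) rows into hot (under 90 days old) and cold (the rest)."""
    cutoff = today - timedelta(days=HOT_WINDOW_DAYS)
    hot = [r for r in rows if r[0] > cutoff]
    cold = [r for r in rows if r[0] <= cutoff]
    return hot, cold


rows = [
    (date(2024, 6, 1), "recent"),
    (date(2024, 4, 1), "boundary"),
    (date(2023, 12, 1), "old"),
]
hot, cold = partition_hot_cold(rows, today=date(2024, 6, 30))
```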

## `KUC-102`
**Source**: `scripts/compact_database.py`

Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records within retention windows while measuring compression savings.

## `KUC-103`
**Source**: `scripts/verify_api_keys.py`

Verifies the API key management feature implementation is working correctly by testing module imports, credential initialization, and key storage/retrieval.

## `KUC-104`
**Source**: `scripts/refresh_data.py`

Fetches economic data from the FRED and Yahoo Finance APIs daily and stores the results in a cache for dashboard consumption.

## `KUC-105`
**Source**: `scripts/cleanup_old_data.py`

Archives data older than retention periods to Parquet files and deletes old records from main tables to reduce database size while maintaining historical access.

## `KUC-106`
**Source**: `scripts/quickstart_api_keys.py`

Provides a quick start guide for initializing and testing API key management, storing and verifying FRED API keys securely.

## `KUC-107`
**Source**: `scripts/setup_credentials.py`

Initializes and stores API credentials (FRED API key) securely in encrypted form for authenticated data access.

## `KUC-108`
**Source**: `scripts/move_fred_data.py`

Organizes FRED-related data files and scripts by moving them into a dedicated directory structure.

## `KUC-109`
**Source**: `scripts/generate_sample_data.py`

Generates sample datasets for offline mode testing, including FRED, Yahoo Finance, and World Bank sample data.

## `KUC-110`
**Source**: `scripts/init_database.py`

Initializes the DuckDB database by creating all required tables and indexes for the Economic Dashboard.

## `KUC-111`
**Source**: `scripts/fetch_sentiment_data.py`

Fetches news articles and sentiment data for specified stock symbols, including Google Trends data for sentiment analysis.

## `KUC-112`
**Source**: `scripts/migrate_pickle_to_duckdb.py`

Migrates existing pickle cache files containing FRED and Yahoo Finance data to the new DuckDB database format.

## `KUC-113`
**Source**: `scripts/refresh_data_smart.py`

Intelligently refreshes economic data based on natural update frequencies and SLAs, respecting rate limits and only fetching data when needed.
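
The "only when needed" decision can be sketched as a staleness check per series. This is not the logic of `scripts/refresh_data_smart.py`, which is not shown here; the interval table and `needs_refresh` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical refresh intervals keyed by a series' natural update frequency.
REFRESH_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
}


def needs_refresh(last_fetched, frequency, now):
    """Return True when a series has never been fetched or its interval has elapsed."""
    if last_fetched is None:
        return True
    return now - last_fetched >= REFRESH_INTERVALS[frequency]
```

Skipping series that are still fresh also reduces the number of API calls, which helps stay within provider rate limits.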

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.

## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.

## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.
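
As an illustration of failing loudly at ingestion, the sketch below validates an OHLCV CSV header before any row is read. `read_ohlcv_csv` and `validate_ohlcv_header` are hypothetical names; decoding with `utf-8-sig` also covers the BOM problem described in `AP-MACRO-DATA-009`.

```python
import csv
import io

REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def validate_ohlcv_header(fieldnames):
    """Raise ValueError if any required OHLCV column is absent (exact-name match)."""
    present = set(fieldnames or [])
    missing = [column for column in REQUIRED_OHLCV if column not in present]
    if missing:
        raise ValueError(f"missing OHLCV columns: {missing}")


def read_ohlcv_csv(raw_bytes):
    """Decode CSV bytes (tolerating a UTF-8 BOM) and validate the header first."""
    text = raw_bytes.decode("utf-8-sig")  # strips a leading BOM if present
    reader = csv.DictReader(io.StringIO(text))
    validate_ohlcv_header(reader.fieldnames)
    return list(reader)
```

Validation happens once, at the boundary, so downstream code can access `row["close"]` without defensive checks.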

## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.
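
A small sketch of the same key on two feature tables. It uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra packages; the `derived_features` table and the `sma_20` and `momentum` columns are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both tables share the same composite primary key (ticker, date).
conn.execute(
    "CREATE TABLE technical_features ("
    "ticker TEXT NOT NULL, date TEXT NOT NULL, sma_20 REAL, "
    "PRIMARY KEY (ticker, date))"
)
conn.execute(
    "CREATE TABLE derived_features ("
    "ticker TEXT NOT NULL, date TEXT NOT NULL, momentum REAL, "
    "PRIMARY KEY (ticker, date))"
)
conn.execute("INSERT INTO technical_features VALUES ('AAPL', '2024-01-02', 190.5)")
conn.execute("INSERT INTO derived_features VALUES ('AAPL', '2024-01-02', 0.03)")

# A second row for the same (ticker, date) violates the key and is rejected.
try:
    conn.execute("INSERT INTO technical_features VALUES ('AAPL', '2024-01-02', 191.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Because the keys match, feature merging is a plain equi-join.
joined = conn.execute(
    "SELECT t.ticker, t.date, t.sma_20, d.momentum "
    "FROM technical_features AS t "
    "JOIN derived_features AS d ON t.ticker = d.ticker AND t.date = d.date"
).fetchall()
```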

## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data

When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.
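
One way to sketch such a decorator is shown below. The name `rate_limited`, the injectable `clock` and `sleep` parameters, and the placeholder `User-Agent` value are assumptions for illustration. `build_request` only returns a request description and performs no network I/O.

```python
import functools
import time


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Decorator that spaces calls at least 1/max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [None]  # mutable cell shared by all calls to wrapper

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = min_interval - (clock() - last_call[0])
                if wait > 0:
                    sleep(wait)
            last_call[0] = clock()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Placeholder identity; replace with a real contact before calling any API.
HEADERS = {"User-Agent": "Example Research admin@example.com"}


@rate_limited(max_per_second=10)
def build_request(url):
    """Describe the request that would be sent, with the required header attached."""
    return {"url": url, "headers": HEADERS}
```

This limiter is per process and not thread-safe; a shared limiter would be needed if several workers hit the same API.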

## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.
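
The propagation step can be sketched without a graph library. `SimpleGraph` below is a plain-Python stand-in, not the DGL API, and `subgraph_with_metadata` is a hypothetical helper; only the three attribute names come from the text above.

```python
METADATA_ATTRS = ("num_nodes", "num_relations", "time_interval")


class SimpleGraph:
    """Stand-in graph object: an edge list plus ad hoc metadata attributes."""

    def __init__(self, edges):
        self.edges = list(edges)


def subgraph_with_metadata(graph, edge_ids):
    """Build an edge subgraph and copy every metadata attribute onto it."""
    sub = SimpleGraph(graph.edges[i] for i in edge_ids)
    for name in METADATA_ATTRS:
        # getattr raises AttributeError if the parent itself lacks the attribute,
        # so a broken parent graph is caught here rather than inside the model.
        setattr(sub, name, getattr(graph, name))
    return sub


# Illustrative (source, destination, relation) triples.
full = SimpleGraph([(0, 1, 0), (1, 2, 1), (2, 0, 0)])
full.num_nodes = 3
full.num_relations = 2
full.time_interval = 1.0

batch = subgraph_with_metadata(full, [0, 2])
```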

## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.
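
A minimal sketch of the pre-regression check, assuming returns and factors are dictionaries keyed by date. The helper names are hypothetical, and the threshold of 20 is the typical minimum quoted above.

```python
MIN_OBSERVATIONS = 20


def inner_join_by_date(returns, factors):
    """Keep only the dates present in both inputs, in chronological order."""
    common = sorted(set(returns) & set(factors))
    return [(day, returns[day], factors[day]) for day in common]


def require_min_observations(rows, minimum=MIN_OBSERVATIONS):
    """Raise ValueError when too few rows survive filtering to run a regression."""
    if len(rows) < minimum:
        raise ValueError(
            f"only {len(rows)} observations after filtering; need at least {minimum}"
        )
    return rows
```

Calling `require_min_observations(inner_join_by_date(returns, factors))` immediately before the regression makes the shrinkage caused by the join visible instead of silent.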

## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.
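
The idea of checking dtypes up front can be sketched with the standard-library `array` module, whose typecode `'f'` is a 32-bit float. This is a stand-in: a real pipeline would inspect tensor dtypes in its ML framework instead, and `check_graph_inputs` is a hypothetical name.

```python
from array import array

SIGNED_INT_TYPECODES = "bhilq"  # signed integer typecodes of the array module


def check_graph_inputs(time_values, node_types, mask):
    """Raise TypeError unless inputs are float32, integer, and boolean respectively."""
    if not (isinstance(time_values, array) and time_values.typecode == "f"):
        raise TypeError("time values must be float32 (array typecode 'f')")
    if not (
        isinstance(node_types, array)
        and node_types.typecode in SIGNED_INT_TYPECODES
    ):
        raise TypeError("node types must be stored in a signed integer array")
    if not all(isinstance(flag, bool) for flag in mask):
        raise TypeError("mask entries must be bool, not 0/1 integers")
    return True
```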

FILE:references/components/data_collection.md
# data_collection (6 classes)

## `CredentialsManager.set_api_key`
`data_collection/credentialsmanager-set-api-key.py:0`

## `CredentialsManager.get_api_key`
`data_collection/credentialsmanager-get-api-key.py:0`

## `load_fred_data`
`data_collection/load-fred-data.py:0`

## `load_yfinance_data`
`data_collection/load-yfinance-data.py:0`

## `data_source_adapter`
`data_collection/data-source-adapter.py:0`

## `cache_backend`
`data_collection/cache-backend.py:0`

FILE:references/components/feature_engineering.md
# feature_engineering (6 classes)

## `TechnicalIndicatorCalculator.calculate_all`
`feature_engineering/technicalindicatorcalculator-calculate-a.py:0`

## `OptionsMetricsCalculator.calculate`
`feature_engineering/optionsmetricscalculator-calculate.py:0`

## `DerivedFeaturesCalculator.compute`
`feature_engineering/derivedfeaturescalculator-compute.py:0`

## `FeaturePipeline.run_full_pipeline`
`feature_engineering/featurepipeline-run-full-pipeline.py:0`

## `indicator_library`
`feature_engineering/indicator-library.py:0`

## `feature_interactions`
`feature_engineering/feature-interactions.py:0`

FILE:references/components/financial_analysis.md
# financial_analysis (6 classes)

## `MarginCallRiskCalculator.calculate`
`financial_analysis/margincallriskcalculator-calculate.py:0`

## `LeverageMetricsCalculator.compute`
`financial_analysis/leveragemetricscalculator-compute.py:0`

## `InsiderTradingTracker.analyze`
`financial_analysis/insidertradingtracker-analyze.py:0`

## `FinancialHealthScorer.score`
`financial_analysis/financialhealthscorer-score.py:0`

## `risk_weights`
`financial_analysis/risk-weights.py:0`

## `insider_sentiment_formula`
`financial_analysis/insider-sentiment-formula.py:0`

FILE:references/components/ml_training_-_prediction.md
# ml_training_&_prediction (6 classes)

## `ModelTrainer.train`
`ml_training_&_prediction/modeltrainer-train.py:0`

## `EnsembleModel.fit`
`ml_training_&_prediction/ensemblemodel-fit.py:0`

## `PredictionEngine.predict`
`ml_training_&_prediction/predictionengine-predict.py:0`

## `BaseModel.save`
`ml_training_&_prediction/basemodel-save.py:0`

## `base_models`
`ml_training_&_prediction/base-models.py:0`

## `prediction_horizon`
`ml_training_&_prediction/prediction-horizon.py:0`

FILE:references/components/orchestration_-_automation.md
# orchestration_&_automation (3 classes)

## `market_data_refresh_dag`
`orchestration_&_automation/market-data-refresh-dag.py:0`

## `economic_data_refresh_dag`
`orchestration_&_automation/economic-data-refresh-dag.py:0`

## `alert_channel`
`orchestration_&_automation/alert-channel.py:0`

FILE:references/components/recession_probability.md
# recession_probability (3 classes)

## `RecessionProbabilityModel.calculate`
`recession_probability/recessionprobabilitymodel-calculate.py:0`

## `RecessionProbabilityModel.get_probability`
`recession_probability/recessionprobabilitymodel-get-probabilit.py:0`

## `indicator_weights`
`recession_probability/indicator-weights.py:0`

FILE:references/components/visualization_-_ui.md
# visualization_&_ui (6 classes)

## `app.py (landing page)`
`visualization_&_ui/app-py-landing-page.py:0`

## `10_Margin_Call_Risk_Monitor.render`
`visualization_&_ui/10-margin-call-risk-monitor-render.py:0`

## `11_Recession_Probability.render`
`visualization_&_ui/11-recession-probability-render.py:0`

## `13_Insider_Trading_Tracker.render`
`visualization_&_ui/13-insider-trading-tracker-render.py:0`

## `theme/styling`
`visualization_&_ui/theme-styling.py:0`

## `chart_library`
`visualization_&_ui/chart-library.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-083-v5.3
  version: v6.1
  blueprint_id: finance-bp-083
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:33.402010+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - macro-data
  upgraded_from: finance-bp-083-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:19.259947+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-083--Economic-Dashboard/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-083--Economic-Dashboard/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-MACRO-DATA-001
  title: SEC EDGAR Rate Limit Violation
  description: When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10
    requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial
    filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits
    and missing User-Agent headers compound this by causing silent request failures.
  project_source: finance-bp-074--FinRobot
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-002
  title: Temporal Knowledge Graph Look-Ahead Bias
  description: When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes
    the model to see future events during training. The violation of train_edges occurring before val_edges and test_edges
    temporally results in inflated metrics that do not reflect real-world performance. This produces overfit models that fail
    catastrophically when deployed for actual temporal prediction tasks.
  project_source: finance-bp-080--FinDKG
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-003
  title: Technical Indicator Look-Ahead Bias via Missing Shift
  description: When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar
    state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire
    at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this
    with 'we need the current bar signal immediately' leads to future information leaking into current signals.
  project_source: finance-bp-083--Economic-Dashboard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-004
  title: EIOPA Non-Compliant Curve Extrapolation
  description: When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant
    formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use
    max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability
    calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
  project_source: finance-bp-077--Open_Source_Economic_Model
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-005
  title: Factor Regression Using Raw Returns Instead of Excess Returns
  description: When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting
    the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns
    (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure.
    This leads to fundamentally flawed risk attribution and portfolio construction decisions.
  project_source: finance-bp-105--open-climate-investing
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-006
  title: Percentage vs Decimal Unit Mismatch in Factor Data
  description: When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2)
    by 100 before regression causes coefficients scaled by 100x. This produces statistically invalid inference and meaningless
    factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
  project_source: finance-bp-105--open-climate-investing
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-007
  title: Insufficient Regression Observations for Statistical Validity
  description: When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join,
    winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations
    produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise.
    This commonly occurs when combining multiple data sources with missing values.
  project_source: finance-bp-105--open-climate-investing
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-008
  title: DGL Graph Attribute Propagation Failure in Temporal Batching
  description: When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations,
    num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing
    attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs,
    causing training to fail with AttributeError.
  project_source: finance-bp-080--FinDKG
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-009
  title: CSV BOM Encoding Corruption in Data Import
  description: When CSV portfolio files that begin with a UTF-8 BOM marker are imported without the 'utf-8-sig' encoding,
    the BOM marker silently corrupts the first column name read by pandas. Accessing row fields by the expected name then
    raises KeyError, which prevents portfolio data from loading at all.
  project_source: finance-bp-077--Open_Source_Economic_Model
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-010
  title: OHLCV Data Quality Validation Failure
  description: When technical indicators are calculated from OHLCV data without first verifying the required columns (open,
    high, low, close, volume), any missing column causes a ValueError and prevents indicator calculation for the affected
    tickers. This blocks downstream regime classification and pattern detection for all tickers with incomplete data.
  project_source: finance-bp-083--Economic-Dashboard
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-011
  title: Inconsistent Primary Key Schema Causing JOIN Failures
  description: When storing derived features in DuckDB with a different primary key schema than technical_features table,
    inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection
    pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying
    and data integrity.
  project_source: finance-bp-083--Economic-Dashboard
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-012
  title: Frequency Column Enforcement Missing in Time Series Schema
  description: When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY'
    or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies
    produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data
    corruption.
  project_source: finance-bp-105--open-climate-investing
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-013
  title: PostgreSQL Fork in Multiprocessing Context
  description: When parallel regression execution uses multiprocessing with the fork start method while psycopg2 database
    connections are open, child processes inherit the parent's connection state. This causes 'connection already closed'
    errors or corrupted connection state in the child processes, resulting in failed database writes and incomplete factor
    regression calculations.
  project_source: finance-bp-105--open-climate-investing
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-014
  title: Temporal DataLoader Shuffling Breaking Graph Ordering
  description: When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering
    required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking
    the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
  project_source: finance-bp-080--FinDKG
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
cross_project_wisdom:
- wisdom_id: CW-MACRO-DATA-001
  source_project: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
  pattern_name: Temporal Ordering Enforcement
  description: Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test
    splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test
    edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline
    to prevent look-ahead bias that inflates evaluation metrics.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-002
  source_project: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing
  pattern_name: Regulatory Formula Compliance
  description: When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French),
    use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph
    120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will
    be used for regulatory reporting or investment decision-making.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-003
  source_project: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model
  pattern_name: Strict Data Schema Enforcement
  description: Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns,
    CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed
    schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch
    errors early before downstream calculations use bad data.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-004
  source_project: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
  pattern_name: Composite Primary Key Uniqueness
  description: Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable
    efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply
    this pattern when designing any financial database schema involving time-series measurements with multiple entities.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-005
  source_project: finance-bp-074--FinRobot
  pattern_name: External API Rate Limiting
  description: When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented
    before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper
    User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption
    that blocks critical data access.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-006
  source_project: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing
  pattern_name: Graph Attribute Propagation in Batching
  description: When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes,
    num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these
    attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks
    to prevent training failures.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-007
  source_project: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard
  pattern_name: Statistical Validity Thresholds
  description: Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable
    inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient
    data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful
    rather than spurious.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-008
  source_project: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model
  pattern_name: Data Type Strictness for ML Operations
  description: Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for
    node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time
    interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline
    to catch dtype issues early.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: scripts/create_database_snapshot_optimized.py
  business_problem: Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into appropriate
    storage formats with ZSTD compression and incremental exports.
  intent_keywords:
  - backup
  - snapshot
  - parquet
  - database backup
  - compress data
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-102
  source_file: scripts/compact_database.py
  business_problem: Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records within
    retention windows while measuring compression savings.
  intent_keywords:
  - vacuum
  - optimize
  - database cleanup
  - reclaim space
  - index rebuild
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-103
  source_file: scripts/verify_api_keys.py
  business_problem: Verifies the API key management feature implementation is working correctly by testing module imports,
    credential initialization, and key storage/retrieval.
  intent_keywords:
  - verify API keys
  - test credentials
  - API setup verification
  - credentials validation
  stage: data_collection
  data_domain: mixed
  type: monitoring
- kuc_id: KUC-104
  source_file: scripts/refresh_data.py
  business_problem: Fetches all economic data from FRED and Yahoo Finance APIs daily and stores results in cache for dashboard
    consumption.
  intent_keywords:
  - refresh data
  - daily update
  - FRED data
  - yfinance
  - economic data fetch
  stage: data_collection
  data_domain: financial_data
  type: data_pipeline
- kuc_id: KUC-105
  source_file: scripts/cleanup_old_data.py
  business_problem: Archives data older than retention periods to Parquet files and deletes old records from main tables to
    reduce database size while maintaining historical access.
  intent_keywords:
  - data retention
  - cleanup old data
  - archive historical
  - delete old records
  - retention policy
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-106
  source_file: scripts/quickstart_api_keys.py
  business_problem: Provides a quick start guide for initializing and testing API key management, storing and verifying FRED
    API keys securely.
  intent_keywords:
  - setup API keys
  - quick start
  - initialize credentials
  - API key setup
  stage: data_collection
  data_domain: mixed
  type: monitoring
- kuc_id: KUC-107
  source_file: scripts/setup_credentials.py
  business_problem: Initializes and stores API credentials (FRED API key) securely in encrypted form for authenticated data
    access.
  intent_keywords:
  - setup credentials
  - API key initialization
  - secure storage
  - FRED API
  stage: data_collection
  data_domain: mixed
  type: monitoring
- kuc_id: KUC-108
  source_file: scripts/move_fred_data.py
  business_problem: Organizes FRED-related data files and scripts by moving them into a dedicated directory structure.
  intent_keywords:
  - organize files
  - move FRED data
  - file management
  - directory structure
  stage: data_collection
  data_domain: financial_data
  type: data_pipeline
- kuc_id: KUC-109
  source_file: scripts/generate_sample_data.py
  business_problem: Generates sample datasets for offline mode testing, including FRED, Yahoo Finance, and World Bank sample
    data.
  intent_keywords:
  - generate sample data
  - offline mode
  - test data
  - sample datasets
  - offline testing
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-110
  source_file: scripts/init_database.py
  business_problem: Initializes the DuckDB database by creating all required tables and indexes for the Economic Dashboard.
  intent_keywords:
  - init database
  - create tables
  - database setup
  - DuckDB initialization
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-111
  source_file: scripts/fetch_sentiment_data.py
  business_problem: Fetches news articles and sentiment data for specified stock symbols, including Google Trends data for
    sentiment analysis.
  intent_keywords:
  - news sentiment
  - fetch news
  - sentiment analysis
  - stock news
  - Google Trends
  stage: data_collection
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-112
  source_file: scripts/migrate_pickle_to_duckdb.py
  business_problem: Migrates existing pickle cache files containing FRED and Yahoo Finance data to the new DuckDB database
    format.
  intent_keywords:
  - migrate pickle
  - convert cache
  - DuckDB migration
  - pickle to database
  - data migration
  stage: data_collection
  data_domain: financial_data
  type: data_pipeline
- kuc_id: KUC-113
  source_file: scripts/refresh_data_smart.py
  business_problem: Intelligently refreshes economic data based on natural update frequencies and SLAs, respecting rate limits
    and only fetching data when needed.
  intent_keywords:
  - smart refresh
  - SLA aware
  - rate limit
  - incremental refresh
  - update frequency
  stage: data_collection
  data_domain: financial_data
  type: data_pipeline
component_capability_map:
  project: finance-bp-083--Economic-Dashboard
  scan_date: '2026-04-22'
  stats:
    total_files: 7
    total_classes: 36
    total_functions: 0
    total_stages: 7
  modules:
    data_collection:
      class_count: 6
      stage_id: data_collection
      stage_order: 1
      responsibility: Fetch economic data from FRED, Yahoo Finance, SEC, and CBOE APIs with offline fallback. Manages caching
        and rate limiting to ensure reliable data access. This stage exists because financial analysis requires consistent,
        fresh data from multiple authoritative sources, and the system must remain funct
      classes:
      - name: CredentialsManager.set_api_key
        file: data_collection/credentialsmanager-set-api-key.py
        line: 0
        kind: required_method
        signature: ''
      - name: CredentialsManager.get_api_key
        file: data_collection/credentialsmanager-get-api-key.py
        line: 0
        kind: required_method
        signature: ''
      - name: load_fred_data
        file: data_collection/load-fred-data.py
        line: 0
        kind: required_method
        signature: ''
      - name: load_yfinance_data
        file: data_collection/load-yfinance-data.py
        line: 0
        kind: required_method
        signature: ''
      - name: data_source_adapter
        file: data_collection/data-source-adapter.py
        line: 0
        kind: replaceable_point
      - name: cache_backend
        file: data_collection/cache-backend.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    feature_engineering:
      class_count: 6
      stage_id: feature_engineering
      stage_order: 2
      responsibility: Calculate technical indicators, options metrics, and derived features from raw price/volume data. Transforms
        market data into ML-ready feature vectors. This stage exists because raw market data must be transformed into meaningful
        signals before any predictive modeling or analysis can occur.
      classes:
      - name: TechnicalIndicatorCalculator.calculate_all
        file: feature_engineering/technicalindicatorcalculator-calculate-a.py
        line: 0
        kind: required_method
        signature: ''
      - name: OptionsMetricsCalculator.calculate
        file: feature_engineering/optionsmetricscalculator-calculate.py
        line: 0
        kind: required_method
        signature: ''
      - name: DerivedFeaturesCalculator.compute
        file: feature_engineering/derivedfeaturescalculator-compute.py
        line: 0
        kind: required_method
        signature: ''
      - name: FeaturePipeline.run_full_pipeline
        file: feature_engineering/featurepipeline-run-full-pipeline.py
        line: 0
        kind: required_method
        signature: ''
      - name: indicator_library
        file: feature_engineering/indicator-library.py
        line: 0
        kind: replaceable_point
      - name: feature_interactions
        file: feature_engineering/feature-interactions.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    financial_analysis:
      class_count: 6
      stage_id: financial_analysis
      stage_order: 3
      responsibility: Calculate margin risk, leverage exposure, insider trading signals, and financial health scores. Provides
        risk metrics and market stress indicators for portfolio risk management. This stage exists because raw market data
        needs risk-contextualization to support trading and investment decisions.
      classes:
      - name: MarginCallRiskCalculator.calculate
        file: financial_analysis/margincallriskcalculator-calculate.py
        line: 0
        kind: required_method
        signature: ''
      - name: LeverageMetricsCalculator.compute
        file: financial_analysis/leveragemetricscalculator-compute.py
        line: 0
        kind: required_method
        signature: ''
      - name: InsiderTradingTracker.analyze
        file: financial_analysis/insidertradingtracker-analyze.py
        line: 0
        kind: required_method
        signature: ''
      - name: FinancialHealthScorer.score
        file: financial_analysis/financialhealthscorer-score.py
        line: 0
        kind: required_method
        signature: ''
      - name: risk_weights
        file: financial_analysis/risk-weights.py
        line: 0
        kind: replaceable_point
      - name: insider_sentiment_formula
        file: financial_analysis/insider-sentiment-formula.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    ml_training_&_prediction:
      class_count: 6
      stage_id: ml_training_prediction
      stage_order: 4
      responsibility: Train XGBoost/LightGBM ensemble models for stock direction prediction. Uses walk-forward validation
        to prevent lookahead bias in time series. This stage exists because statistical and ML models can identify patterns
        in feature data that support forward-looking market predictions.
      classes:
      - name: ModelTrainer.train
        file: ml_training_&_prediction/modeltrainer-train.py
        line: 0
        kind: required_method
        signature: ''
      - name: EnsembleModel.fit
        file: ml_training_&_prediction/ensemblemodel-fit.py
        line: 0
        kind: required_method
        signature: ''
      - name: PredictionEngine.predict
        file: ml_training_&_prediction/predictionengine-predict.py
        line: 0
        kind: required_method
        signature: ''
      - name: BaseModel.save
        file: ml_training_&_prediction/basemodel-save.py
        line: 0
        kind: required_method
        signature: ''
      - name: base_models
        file: ml_training_&_prediction/base-models.py
        line: 0
        kind: replaceable_point
      - name: prediction_horizon
        file: ml_training_&_prediction/prediction-horizon.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    recession_probability:
      class_count: 3
      stage_id: recession_indicator
      stage_order: 5
      responsibility: Calculate recession probability using 7 weighted economic indicators (yield curve, labor, financial
        stress, etc.). Provides forward-looking economic risk assessment for macro risk management. This stage exists because
        single indicators are unreliable predictors; combining multiple signals reduces fa
      classes:
      - name: RecessionProbabilityModel.calculate
        file: recession_probability/recessionprobabilitymodel-calculate.py
        line: 0
        kind: required_method
        signature: ''
      - name: RecessionProbabilityModel.get_probability
        file: recession_probability/recessionprobabilitymodel-get-probabilit.py
        line: 0
        kind: required_method
        signature: ''
      - name: indicator_weights
        file: recession_probability/indicator-weights.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    orchestration_&_automation:
      class_count: 3
      stage_id: orchestration_automation
      stage_order: 6
      responsibility: Schedule data refresh via Airflow DAGs. Coordinates ETL tasks, validates data quality, and sends alerts
        on failures. This stage exists because financial analysis requires fresh data on predictable schedules without manual
        intervention.
      classes:
      - name: market_data_refresh_dag
        file: orchestration_&_automation/market-data-refresh-dag.py
        line: 0
        kind: required_method
        signature: ''
      - name: economic_data_refresh_dag
        file: orchestration_&_automation/economic-data-refresh-dag.py
        line: 0
        kind: required_method
        signature: ''
      - name: alert_channel
        file: orchestration_&_automation/alert-channel.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    visualization_&_ui:
      class_count: 6
      stage_id: visualization
      stage_order: 7
      responsibility: Streamlit-based dashboard pages for economic indicators, technical analysis, margin risk, insider trading,
        and ML predictions. This stage exists because analytical outputs need intuitive presentation to support decision-making
        by financial professionals.
      classes:
      - name: app.py (landing page)
        file: visualization_&_ui/app-py-landing-page.py
        line: 0
        kind: required_method
        signature: ''
      - name: 10_Margin_Call_Risk_Monitor.render
        file: visualization_&_ui/10-margin-call-risk-monitor-render.py
        line: 0
        kind: required_method
        signature: ''
      - name: 11_Recession_Probability.render
        file: visualization_&_ui/11-recession-probability-render.py
        line: 0
        kind: required_method
        signature: ''
      - name: 13_Insider_Trading_Tracker.render
        file: visualization_&_ui/13-insider-trading-tracker-render.py
        line: 0
        kind: required_method
        signature: ''
      - name: theme/styling
        file: visualization_&_ui/theme-styling.py
        line: 0
        kind: replaceable_point
      - name: chart_library
        file: visualization_&_ui/chart-library.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.28
    evidence_invalid: 54
    evidence_verified: 21
    evidence_auto_fixed: 0
    audit_coverage: 60/60 (100%)
    audit_pass_rate: 3/60 (5%)
    audit_fail_total: 33
    audit_finance_universal:
      pass: 2
      warn: 9
      fail: 9
    audit_subdomain_totals:
      pass: 1
      warn: 15
      fail: 24
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-083. Evidence verify ratio
    = 28.0% and audit fail total = 33. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-083-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Database Snapshot Optimization
    positive_terms:
    - backup
    - snapshot
    - parquet
    - database backup
    - compress data
    data_domain: mixed
    negative_terms:
    - restore
    - live trading
    - backtest
    - ML prediction
    ambiguity_question: Do you need a one-time backup or scheduled recurring snapshots?
  - uc_id: UC-102
    name: Database Compaction and Optimization
    positive_terms:
    - vacuum
    - optimize
    - database cleanup
    - reclaim space
    - index rebuild
    data_domain: mixed
    negative_terms:
    - restore
    - backup
    - trading strategy
    - screening
    ambiguity_question: Are you experiencing slow query performance or trying to reduce storage size?
  - uc_id: UC-103
    name: API Key Management Verification
    positive_terms:
    - verify API keys
    - test credentials
    - API setup verification
    - credentials validation
    data_domain: mixed
    negative_terms:
    - data refresh
    - backtest
    - screening
    - live trading
    ambiguity_question: Are you setting up credentials for the first time or troubleshooting existing credentials?
  - uc_id: UC-104
    name: Daily Economic Data Refresh
    positive_terms:
    - refresh data
    - daily update
    - FRED data
    - yfinance
    - economic data fetch
    data_domain: financial_data
    negative_terms:
    - backtest
    - screening
    - ML prediction
    - parquet export
    ambiguity_question: Do you need to refresh all data sources or just specific ones?
  - uc_id: UC-105
    name: Data Retention Policy Cleanup
    positive_terms:
    - data retention
    - cleanup old data
    - archive historical
    - delete old records
    - retention policy
    data_domain: mixed
    negative_terms:
    - live trading
    - backtest
    - ML prediction
    - verify API keys
    ambiguity_question: Do you want to archive data to Parquet before deletion, or just delete old records?
  - uc_id: UC-106
    name: API Key Management Quickstart
    positive_terms:
    - setup API keys
    - quick start
    - initialize credentials
    - API key setup
    data_domain: mixed
    negative_terms:
    - data refresh
    - backtest
    - screening
    - database snapshot
    ambiguity_question: Are you setting up API keys for the first time or updating existing keys?
  - uc_id: UC-107
    name: Credentials Initialization
    positive_terms:
    - setup credentials
    - API key initialization
    - secure storage
    - FRED API
    data_domain: mixed
    negative_terms:
    - data refresh
    - backtest
    - screening
    - database compaction
    ambiguity_question: Do you need to add new API keys or update existing ones?
  - uc_id: UC-108
    name: FRED Data File Organization
    positive_terms:
    - organize files
    - move FRED data
    - file management
    - directory structure
    data_domain: financial_data
    negative_terms:
    - refresh data
    - backtest
    - ML prediction
    - API keys
    ambiguity_question: Is this a one-time file organization task or part of a larger migration?
  - uc_id: UC-109
    name: Offline Sample Data Generation
    positive_terms:
    - generate sample data
    - offline mode
    - test data
    - sample datasets
    - offline testing
    data_domain: mixed
    negative_terms:
    - live trading
    - backtest
    - API keys
    - production data
    ambiguity_question: Do you need sample data for a specific source or for all sources?
  - uc_id: UC-110
    name: DuckDB Database Initialization
    positive_terms:
    - init database
    - create tables
    - database setup
    - DuckDB initialization
    data_domain: mixed
    negative_terms:
    - data refresh
    - backtest
    - screening
    - API keys
    ambiguity_question: Are you setting up a new database or resetting an existing one?
  - uc_id: UC-111
    name: News and Sentiment Data Fetching
    positive_terms:
    - news sentiment
    - fetch news
    - sentiment analysis
    - stock news
    - Google Trends
    data_domain: financial_data
    negative_terms:
    - live trading
    - backtest
    - database snapshot
    - API key verification
    ambiguity_question: Do you need sentiment data for specific symbols or a broad market scan?
  - uc_id: UC-112
    name: Pickle Cache to DuckDB Migration
    positive_terms:
    - migrate pickle
    - convert cache
    - DuckDB migration
    - pickle to database
    - data migration
    data_domain: financial_data
    negative_terms:
    - live trading
    - backtest
    - ML prediction
    - API keys
    ambiguity_question: Are you migrating historical data only or also maintaining pickle as cache?
  - uc_id: UC-113
    name: Smart Data Refresh with SLA Awareness
    positive_terms:
    - smart refresh
    - SLA aware
    - rate limit
    - incremental refresh
    - update frequency
    data_domain: financial_data
    negative_terms:
    - full refresh
    - backup
    - API key setup
    - ML prediction
    ambiguity_question: Do you need a forced full refresh or smart incremental updates based on SLAs?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to
    the user. The SL-01 sell-before-buy ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 111
    fatal_constraints_count: 37
    non_fatal_constraints_count: 147
    use_cases_count: 13
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes; otherwise
        initialization fails with an assertion error (fatal violation finance-C-001).'
    business_decisions:
    - id: BD-001
      type: B
      summary: DuckDB singleton connection
    - id: BD-002
      type: BA
      summary: Offline mode with sample data fallback
    - id: BD-003
      type: BA
      summary: SLA-based refresh scheduling (Daily=6h, Weekly=1d, Monthly=7d, Quarterly=30d)
    - id: BD-004
      type: B/RC
      summary: Encrypted credentials storage with Fernet symmetric encryption
    - id: BD-GAP-001
      type: DK
      summary: 'Missing: Trading calendar vs natural calendar'
    - id: BD-GAP-002
      type: DK
      summary: 'Missing: Timezone explicit annotation'
    - id: BD-GAP-003
      type: RC
      summary: 'Missing: float vs Decimal for currency'
    - id: BD-GAP-004
      type: M
      summary: 'Missing: Matrix ill-conditioning and stability'
    - id: BD-GAP-005
      type: B
      summary: 'Missing: PnL conservation'
    - id: BD-GAP-006
      type: DK
      summary: 'Missing: Model and data version snapshot binding'
    - id: BD-GAP-007
      type: RC
      summary: 'Missing: Settlement and delivery time convention'
    - id: BD-GAP-008
      type: RC
      summary: 'Missing: Price and quantity precision (tick/lot size)'
    - id: BD-GAP-009
      type: B
      summary: 'Missing: Cost Model Completeness'
    - id: BD-GAP-010
      type: B
      summary: 'Missing: Carry/Funding Cost Modeling'
    - id: BD-GAP-011
      type: B
      summary: 'Missing: Arbitrage-Free Constraints'
    - id: BD-GAP-012
      type: B
      summary: 'Missing: Optimization Constraint Completeness'
    - id: BD-GAP-013
      type: DK
      summary: 'Missing: Rebalancing Trigger Mechanism'
    - id: BD-GAP-014
      type: RC
      summary: 'Missing: Implement timezone-aware datetime handling across all data operations. Use UTC as the canonical
        timezone and localize on display. Replace naive datetime.now() with timezone-aware alternatives.'
    - id: BD-GAP-015
      type: DK
      summary: 'Missing: Trading calendar vs natural calendar'
    - id: BD-GAP-016
      type: RC
      summary: 'Missing: float vs Decimal for currency'
    - id: BD-GAP-017
      type: M
      summary: 'Missing: Matrix ill-conditioning and stability'
    - id: BD-GAP-018
      type: B
      summary: 'Missing: PnL conservation'
    - id: BD-GAP-019
      type: B
      summary: 'Missing: Greeks Calculation'
    - id: BD-GAP-020
      type: B
      summary: 'Missing: Finite Difference Grid Stability'
    - id: BD-GAP-021
      type: B
      summary: 'Missing: Covariance Estimator Selection'
    - id: BD-GAP-022
      type: B
      summary: 'Missing: VaR/CVaR Confidence and Window'
    - id: BD-GAP-023
      type: B
      summary: 'Missing: Default Definition and IFRS 9 Staging'
    - id: BD-GAP-024
      type: B
      summary: 'Missing: PD/LGD/EAD Estimation Methods'
    - id: BD-GAP-025
      type: B
      summary: 'Missing: FTP (Funds Transfer Pricing) Method'
    - id: BD-GAP-026
      type: B
      summary: 'Missing: Cash Pool Legal Structure'
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions:
    - id: BD-022
      type: BA
      summary: Offline-first with live fallback (green=live, orange=offline badge)
    - id: BD-023
      type: BA
      summary: Environment-based cache expiry (Dev=1h, Prod=24h)
    - id: BD-024
      type: BA
      summary: 5-year chart lookback default for long-term trends
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 23 source groups: default_value(8),
        feature_engineering(3), financial_analysis(15), global(10), inheritance(1), invariant(2), and 17 more.'
      key_decisions: 78 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-061
      type: B/BA
      summary: DatabaseConnection singleton hardcodes memory limits affecting all queries
    - id: BD-063
      type: B/BA
      summary: MarginCallRiskCalculator hardcodes component weights encoding business assumptions
    - id: BD-064
      type: BA/DK
      summary: RecessionProbabilityModel encodes empirical indicator weights from financial research
    - id: BD-067
      type: B/BA
      summary: InsiderTradingTracker classifies transactions using hardcoded bullish/bearish codes
    - id: BD-068
      type: BA
      summary: Data refresh SLA contract defines cache expiry based on publication frequency
    - id: BD-071
      type: BA/DK
      summary: 'VIX regime classification thresholds hardcoded: <15 Low, <20 Normal, <30 Elevated, else Crisis'
    - id: BD-072
      type: B/BA
      summary: 'Default environment config cache_expiry differs: development=1h vs production=24h'
    - id: BD-075
      type: B/BA
      summary: CredentialsManager stores encryption key with os.chmod 0o600 permissions
    - id: BD-005
      type: M
      summary: Shift-based crossover detection for SMA50/SMA200
    - id: BD-006
      type: BA
      summary: Regime classification thresholds (Bullish=RSI>60 AND MACD>0 AND Price>SMA50)
    - id: BD-007
      type: BA
      summary: Feature pipeline validation (>10% null RSI or duplicate dates = failure)
    - id: BD-008
      type: BA
      summary: Composite risk scoring with fixed weights (30/25/25/20)
    - id: BD-009
      type: BA
      summary: Short interest thresholds (>30%=100 score, >20%=75 score)
    - id: BD-010
      type: BA
      summary: Bullish/bearish transaction codes (P=Purchase, M=Exercise=bullish; S=Sale=bearish)
    - id: BD-011
      type: BA
      summary: Sector classification (Offensive=XLY/XLK, Defensive=XLU/XLP, Cyclical=remainder)
    - id: BD-026
      type: B/DK
      summary: Use SMA (Simple Moving Average) for trend identification
    - id: BD-027
      type: B
      summary: Use EMA (Exponential Moving Average) for momentum indicators
    - id: BD-028
      type: B/BA
      summary: Use RSI(14) as primary momentum oscillator
    - id: BD-029
      type: B/BA
      summary: Use MACD with standard parameters for trend/momentum
    - id: BD-030
      type: B/BA
      summary: Use Bollinger Bands with 2 standard deviations
    - id: BD-031
      type: B/DK
      summary: Use ATR(14) to measure realized volatility
    - id: BD-032
      type: B
      summary: Use Stochastic Oscillator %K=14, %D=3
    - id: BD-033
      type: B/DK
      summary: Use Fibonacci ratios for price projection
    - id: BD-034
      type: B/DK
      summary: Use Elliott Wave theory for pattern recognition
    - id: BD-035
      type: B/BA
      summary: Classify trend strength using |avg_return|/volatility ratio
    - id: BD-055
      type: B/BA
      summary: Calculate volume trend using linear regression slope
    - id: BD-076
      type: BA/M
      summary: 'INTERACTION: BD-036 (Z-scores for feature normalization) × BD-052 (StandardScaler in ML) → Double-scaling
        risk corrupts ensemble probability calibration'
    - id: BD-077
      type: BA
      summary: 'INTERACTION: BD-021 (Weekday-only DAG scheduling) × BD-003 (SLA-based refresh intervals) → Weekend skips extend
        effective SLA windows violating data freshness guarantees'
    - id: BD-078
      type: B/BA
      summary: 'INTERACTION: BD-007 (Pipeline validation thresholds) × BD-062 (5-step execution order) → Validation contract
        silently broken if pipeline order changes'
    - id: BD-079
      type: BA
      summary: 'INTERACTION: BD-025 (Volatility threshold +1.0) × BD-037 (Percentile rank volatility) × BD-046 (Composite
        margin risk) → Triple-counting volatility amplifies defensive positioning triggering'
    - id: BD-080
      type: BA/M
      summary: 'INTERACTION: BD-061 (DB singleton memory limits) × BD-065 (EnsembleModel stacking) × BD-046 (Composite margin
        risk) → Memory ceiling creates risk cascade for multi-component risk calculations'
    - id: BD-081
      type: B
      summary: 'INTERACTION: BD-022 (Offline-first with badge) × BD-002 (Sample data fallback) × BD-023 (Env-based cache)
        → Conflicting data freshness signals confuse users about what''s live vs stale'
    - id: BD-082
      type: BA/M
      summary: 'INTERACTION: BD-012 (Walk-forward validation) × BD-070 (Binary target definition) → Validation quality degraded
        by target definition that discards magnitude'
    - id: BD-083
      type: BA/M
      summary: 'INTERACTION: BD-016 (Yield curve dominant 25%) × BD-017 (12-18m lookback) × BD-044 (Recession model weights)
        → Yield curve dominates recession probability creating single-point-of-failure in recession'
    - id: BD-084
      type: B/RC
      summary: 'INTERACTION: BD-004 (Fernet encryption) × BD-075 (0o600 key permissions) → Encryption provides a false sense
        of security against privilege escalation'
    - id: BD-085
      type: BA/DK
      summary: 'INTERACTION: BD-024 (5-year default lookback) × BD-006 (Regime classification thresholds) × BD-038 (Momentum
        regime classification) → Historical thresholds calibrated on different market structure may'
    - id: BD-069
      type: B/BA
      summary: BaseModel applies StandardScaler to all features during fit and serializes the scaler when the model is saved
    - id: BD-070
      type: BA
      summary: 'ML training uses binary target: future_close > close determines UP(1)/DOWN(0)'
    - id: BD-074
      type: B/BA
      summary: Database schema enforces composite primary keys (ticker, date) across feature tables
    - id: BD-012
      type: BA/M
      summary: Walk-forward validation using TimeSeriesSplit
    - id: BD-013
      type: BA/DK
      summary: 5-day prediction horizon aligned with weekly rebalancing
    - id: BD-014
      type: M/DK
      summary: Ensemble with LogisticRegression meta-learner stacking
    - id: BD-015
      type: BA/M
      summary: StandardScaler on features before meta-learner
    - id: BD-036
      type: B/BA
      summary: Calculate Z-scores for feature normalization
    - id: BD-037
      type: B/DK
      summary: Classify volatility regime using percentile rank (75th/25th)
    - id: BD-038
      type: B/BA
      summary: Classify momentum regime using RSI/MACD/price position
    - id: BD-056
      type: B
      summary: Classify insider sentiment using weighted buy/sell value
    - id: BD-057
      type: B/BA
      summary: Use title-based insider weighting (CEO=3.0, CFO=2.5)
    - id: BD-058
      type: B/BA
      summary: Use 2x spike threshold for unusual activity detection
    - id: BD-045
      type: B/BA
      summary: Calculate VIX stress score from VIX and VVIX
    - id: BD-046
      type: B/DK
      summary: Use composite margin risk from 4 components (leverage/volatility/options/liquidity)
    - id: BD-039
      type: B/BA
      summary: Calculate IV Rank as position in historical IV range
    - id: BD-040
      type: B
      summary: Calculate IV Percentile as percentage of days below current IV
    - id: BD-041
      type: B/BA
      summary: Use Herfindahl Index to measure sector concentration
    - id: BD-042
      type: B/RC
      summary: Calculate relative strength as excess return vs SPY
    - id: BD-059
      type: B/RC
      summary: Calculate sector correlation matrix using Pearson correlation
    - id: BD-060
      type: B/RC
      summary: Use dual momentum (10-day/50-day) for sector rotation
    - id: BD-054
      type: B/DK
      summary: Calculate historical volatility as annualized std of returns
    - id: BD-053
      type: B
      summary: Calculate Sharpe ratio for strategy evaluation
    - id: BD-047
      type: B/DK
      summary: Use XGBoost for stock direction prediction
    - id: BD-048
      type: B
      summary: Use LightGBM for faster gradient boosting
    - id: BD-049
      type: B
      summary: Use ensemble with LogisticRegression meta-learner
    - id: BD-052
      type: B/BA
      summary: Use StandardScaler for feature normalization in ML
    - id: BD-043
      type: B
      summary: Use Sahm Rule (0.5% unemployment rise from 12-month low)
    - id: BD-044
      type: B
      summary: Use weighted recession probability with yield curve dominant
    - id: BD-050
      type: B/DK
      summary: Use walk-forward validation for time series
    - id: BD-051
      type: B/DK
      summary: Use 5-day prediction horizon (1 trading week)
    - id: BD-019
      type: BA
      summary: Parallel ETL tasks (ICI and VIX fetches run concurrently)
    - id: BD-020
      type: BA
      summary: 3 retries with 5-minute delay for API failures
    - id: BD-021
      type: BA/DK
      summary: Weekday-only DAG scheduling (0 7 * * 1-5 = 7 AM UTC, Mon-Fri)
    - id: BD-062
      type: B/RC
      summary: 'FeaturePipeline mandates 5-step execution order: tech→options→derived→margin_risk→quality'
    - id: BD-066
      type: B/DK
      summary: Airflow DAG enforces init→[refreshs]→validate→notify dependency chain
    - id: BD-025
      type: B/BA
      summary: Volatility regime threshold +1.0 for high volatility
    - id: BD-065
      type: BA/M
      summary: 'EnsembleModel uses 2-level stacking: XGBoost+LightGBM base → LogisticRegression meta-learner'
    - id: BD-073
      type: B/BA
      summary: DerivedFeaturesCalculator uses shift(1) to detect golden/death cross transitions
    - id: BD-016
      type: BA/M
      summary: Yield curve has highest weight (0.25) in recession model
    - id: BD-017
      type: BA/M
      summary: 12-18 month inversion lookback (365*1.5 days)
    - id: BD-018
      type: BA/DK
      summary: 7-indicator weighted scoring for recession probability
resources:
  packages:
  - name: streamlit
    version_pin: latest
  - name: pandas
    version_pin: latest
  - name: plotly
    version_pin: latest
  - name: yfinance
    version_pin: latest
  - name: pandas-datareader
    version_pin: latest
  - name: numpy
    version_pin: latest
  - name: duckdb
    version_pin: latest
  - name: xgboost
    version_pin: latest
  - name: lightgbm
    version_pin: latest
  - name: scikit-learn
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install streamlit
    - python3 -m pip install pandas
    - python3 -m pip install plotly
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When storing API credentials on filesystem
    action: set file permissions to 0o600 to restrict access to owner only
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: World-readable credentials files expose API keys to unauthorized users, enabling data theft or quota abuse
      on external services
    stage_ids:
    - data_collection
  - id: finance-C-017
    when: When calculating technical indicators from OHLCV data
    action: Verify OHLCV DataFrame contains required columns (open, high, low, close, volume)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing required OHLCV columns causes ValueError and prevents technical indicator calculation for affected
      tickers
    stage_ids:
    - feature_engineering
  - id: finance-C-018
    when: When implementing SMA crossover detection (golden/death cross)
    action: Use shift(1) to compare current bar state with prior bar state for transition detection
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without shift(1), crossover detection uses current bar data causing look-ahead bias where signals appear
      to fire at the same bar as the cross occurs
    stage_ids:
    - feature_engineering
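  # Illustrative sketch for finance-C-018 (not from the source). It assumes `sma50` and `sma200`
  # are pandas Series aligned on the same date index; the names are hypothetical.
  #   golden_cross = (sma50 > sma200) & (sma50.shift(1) <= sma200.shift(1))
  #   death_cross  = (sma50 < sma200) & (sma50.shift(1) >= sma200.shift(1))
  # Each flag is True only on the bar where the SMA relationship flips relative to the prior bar.
  # Comparisons against the NaN values produced by shift(1) or by the SMA warm-up evaluate to False.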
  - id: finance-C-020
    when: When validating feature data quality after calculation
    action: Fail pipeline validation if more than 10% of RSI values are null
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: High null rate in RSI indicates insufficient historical data or data quality issues, leading to unreliable
      regime classifications and incorrect trading signals
    stage_ids:
    - feature_engineering
  - id: finance-C-021
    when: When validating feature data quality after calculation
    action: Fail pipeline validation if duplicate dates exist in technical features table
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Duplicate dates violate PRIMARY KEY constraint and cause incorrect feature associations, leading to wrong
      price patterns and trading signals
    stage_ids:
    - feature_engineering
  - id: finance-C-028
    when: When storing technical features in DuckDB
    action: Use composite primary key (ticker, date) to enforce uniqueness and enable efficient querying
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Duplicate (ticker, date) pairs cause incorrect feature retrieval and violate data integrity constraints in
      downstream ML training
    stage_ids:
    - feature_engineering
  - id: finance-C-029
    when: When storing derived features in DuckDB
    action: Use composite primary key (ticker, date) consistent with technical_features table schema
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Inconsistent primary keys prevent JOIN operations between technical and derived features, breaking regime
      classification and pattern detection
    stage_ids:
    - feature_engineering
  - id: finance-C-036
    when: When implementing crossover detection patterns
    action: Skip shift(1) with reasoning that 'the data looks clean' or 'we need the current bar signal immediately'
    severity: fatal
    kind: rationalization_guard
    modality: must_not
    consequence: Rationalizing look-ahead bias introduction causes future information to leak into current signals, producing
      unrealistic backtest results that fail in live trading
    stage_ids:
    - feature_engineering
  - id: finance-C-037
    when: When encountering high null rates in technical indicators
    action: Skip validation with assumption that 'null values are acceptable for less common indicators'
    severity: fatal
    kind: rationalization_guard
    modality: must_not
    consequence: Ignoring elevated null rates leads to incomplete feature sets that cause ML model training failures or silent
      prediction errors
    stage_ids:
    - feature_engineering
  - id: finance-C-038
    when: When implementing composite margin risk scoring
    action: use exactly 30/25/25/20 weights for leverage/volatility/options/liquidity components that sum to 100%
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect weight allocation will distort risk assessment, causing underestimation of leverage exposure and
      leading to inappropriate trading decisions
    stage_ids:
    - financial_analysis
  - id: finance-C-039
    when: When classifying short interest for squeeze risk
    action: 'apply threshold breakpoints: >30% SI gets score 100, >20% gets 75, >10% gets 50, >5% gets 25, else 0'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect short interest scoring thresholds will fail to identify high-risk short squeeze candidates, missing
      critical leverage risk signals
    stage_ids:
    - financial_analysis
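  # Worked example for finance-C-039 (input values are illustrative, not from the source):
  #   SI 34% scores 100, SI 22% scores 75, SI 12% scores 50, SI 7% scores 25, SI 3% scores 0.
  # The breakpoints are strict inequalities, so an SI of exactly 30% falls in the >20% band and scores 75.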
  - id: finance-C-040
    when: When classifying VIX regime for volatility assessment
    action: 'map VIX levels to regimes: <15=Low, <20=Normal, <30=Elevated, >=30=Crisis'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect VIX regime mapping will misclassify market stress levels, causing improper margin risk assessment
      during crisis periods
    stage_ids:
    - financial_analysis
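  # Worked example for finance-C-040 (input values are illustrative, not from the source):
  #   VIX 14.9 maps to Low, 15.0 to Normal, 19.99 to Normal, 20.0 to Elevated, 29.99 to Elevated, 30.0 to Crisis.
  # Each upper bound is exclusive, so a reading exactly on a threshold belongs to the higher regime.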
  - id: finance-C-041
    when: When parsing SEC Form 4 insider transactions
    action: classify transaction codes P (Purchase) and M (Exercise) as bullish; S (Sale) as bearish; A/D/F/G/E as neutral
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Misclassifying insider transaction codes will invert sentiment signals, causing contrarian trading decisions
      opposite to actual insider activity
    stage_ids:
    - financial_analysis
  - id: finance-C-043
    when: When calculating Altman Z-Score for bankruptcy prediction
    action: 'apply Z-Score interpretation: >2.99=Safe Zone, 1.81-2.99=Grey Zone, <1.81=Distress Zone'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect Z-Score threshold boundaries will misclassify bankruptcy risk, leading to investment in financially
      distressed companies
    stage_ids:
    - financial_analysis
  - id: finance-C-044
    when: When calculating sector relative strength
    action: compute relative strength as sector return minus SPY return (benchmark)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect benchmark calculation will misrepresent sector outperformance, causing wrong risk-on/off rotation
      signals
    stage_ids:
    - financial_analysis
  - id: finance-C-046
    when: When classifying sectors for rotation analysis
    action: 'use predefined classification: OFFENSIVE=Technology/Consumer Discretionary/Communication/Financials, DEFENSIVE=Utilities/Consumer
      Staples/Healthcare, CYCLICAL=Energy/Materials/Industrials/Real Estate'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect sector classification will invert risk-on/off signals, causing portfolio to take opposite positions
      during regime changes
    stage_ids:
    - financial_analysis
  - id: finance-C-050
    when: When computing composite margin call risk score
    action: 'apply formula: composite = leverage_score*0.30 + volatility_score*0.25 + options_score*0.25 + liquidity_score*0.20'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect composite formula will misrepresent total margin risk, leading to inadequate position sizing during
      high-stress periods
    stage_ids:
    - financial_analysis
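  # Worked example for finance-C-050. The component scores below are illustrative and assume
  # a 0-100 scale for each component (an assumption, not stated in this constraint):
  #   leverage_score=80, volatility_score=60, options_score=40, liquidity_score=20
  #   composite = 80*0.30 + 60*0.25 + 40*0.25 + 20*0.20 = 24 + 15 + 10 + 4 = 53
  # The weights sum to 1.00, so the composite stays on the same scale as its inputs.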
  - id: finance-C-058
    when: When implementing walk-forward validation for stock price prediction
    action: Use sklearn.model_selection.TimeSeriesSplit instead of random cross-validation
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Random cross-validation on time series causes look-ahead bias where future data leaks into training, making
      backtest results unrealistically optimistic and live trading performance far worse
    stage_ids:
    - ml_training_prediction
  - id: finance-C-059
    when: When computing binary target variable for directional prediction
    action: Calculate target as 1 if future_close > close, else 0, deriving future_close with the LEAD window function
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect target calculation causes models to learn wrong patterns, producing systematic prediction errors
      that cannot be corrected by hyperparameter tuning
    stage_ids:
    - ml_training_prediction
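  # Illustrative SQL sketch for finance-C-059 (not from the source). The 5-row offset follows the
  # 5-day horizon in BD-013; the table and column names are assumptions:
  #   SELECT ticker, date, close,
  #          LEAD(close, 5) OVER (PARTITION BY ticker ORDER BY date) AS future_close
  #   FROM technical_features
  # The target is then CASE WHEN future_close > close THEN 1 ELSE 0 END. Rows at the end of each
  # ticker's history have a NULL future_close and would likely need to be dropped before training.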
  - id: finance-C-067
    when: When executing walk-forward validation
    action: 'Maintain strict temporal ordering: all training indices precede validation indices in each fold'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Any shuffle or random sampling within TimeSeriesSplit introduces look-ahead bias, inflating validation metrics
      and producing misleading live trading expectations
    stage_ids:
    - ml_training_prediction
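  # Illustration for finance-C-058 and finance-C-067 (the sample count is chosen for illustration):
  # sklearn.model_selection.TimeSeriesSplit(n_splits=3) on 8 time-ordered samples, with default
  # settings, yields these folds:
  #   fold 1 trains on [0, 1]             and validates on [2, 3]
  #   fold 2 trains on [0, 1, 2, 3]       and validates on [4, 5]
  #   fold 3 trains on [0, 1, 2, 3, 4, 5] and validates on [6, 7]
  # Every training index precedes every validation index, which is the ordering this constraint requires.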
  - id: finance-C-070
    when: When presenting ML model predictions to users
    action: Claim that backtest returns equal expected live trading returns
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Backtest results exclude transaction costs, slippage, and market impact that materially reduce live returns;
      presenting backtest as live-equivalent constitutes misleading financial disclosure
    stage_ids:
    - ml_training_prediction
  - id: finance-C-074
    when: When implementing the recession probability calculation
    action: Verify indicator weights sum to exactly 1.0 for proper probability normalization
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: If weights do not sum to 1.0, the recession probability will be incorrectly scaled, leading to systematic
      over/underestimation of recession risk and poor investment allocation decisions
    stage_ids:
    - recession_indicator
  - id: finance-C-079
    when: When configuring indicator weights
    action: 'Define weights for all 7 signal categories: yield_curve, labor_market, financial_stress, economic_activity,
      consumer, housing, market'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing weights for any signal category will cause KeyError during probability calculation, breaking the
      entire recession probability model
    stage_ids:
    - recession_indicator
  - id: finance-C-081
    when: When displaying recession probability to users
    action: Claim predictive accuracy or guarantee future recession timing
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Overstating prediction accuracy will mislead investors into making risky allocation decisions based on false
      confidence in recession forecasting, potentially causing significant financial losses
    stage_ids:
    - recession_indicator
  - id: finance-C-085
    when: When calculating the historical probability time series
    action: Apply proper rolling window calculation using only data available up to each historical date point
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using future data in historical calculations introduces look-ahead bias, causing historical probabilities
      to appear more accurate than they actually were and invalidating backtested performance metrics
    stage_ids:
    - recession_indicator
  - id: finance-C-088
    when: When implementing the weighted probability calculation
    action: Calculate weighted probability as the sum of (signal * weight) across all 7 signals
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect weighted sum calculation will produce wrong recession probabilities, either double-counting certain
      indicators or ignoring others entirely, corrupting the model's core output
    stage_ids:
    - recession_indicator
  - id: finance-C-089
    when: When implementing data refresh tasks in Airflow DAGs
    action: Raise exception when no data records are fetched during refresh
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Dashboard displays empty or outdated data without any error indication, leading to incorrect economic analysis
      and trading decisions based on missing data
    stage_ids:
    - orchestration_automation
  - id: finance-C-090
    when: When validating refreshed market data quality
    action: Raise exception and flag data as stale when latest date exceeds threshold (ICI >14 days, VIX >7 days)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Dashboard presents stale economic indicators as current data, potentially causing analysts to make decisions
      based on outdated market conditions
    stage_ids:
    - orchestration_automation
  - id: finance-C-099
    when: When defining Airflow DAG task dependencies
    action: Ensure the init_schema task completes before refresh tasks, and validation completes before notification
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Refresh tasks run before schema initialization, causing database table not found errors and incomplete data
      loads into non-existent tables
    stage_ids:
    - orchestration_automation
  - id: finance-C-128
    when: When preparing training data for ML models from joined feature tables
    action: Use LEAD() window function with prediction_horizon offset for future_close to prevent look-ahead bias
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: ML model trained with look-ahead bias produces inflated backtest performance that does not generalize to
      live trading
  - id: finance-C-129
    when: When generating predictions using features with shifted close prices
    action: Apply shift(1) to close price before calculating returns to prevent using current candle data for current prediction
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Technical analysis uses intraday data that is not yet finalized, causing prediction errors and potential
      legal liability for front-running
  - id: finance-C-138
    when: When accessing API credentials from Airflow DAGs
    action: Log or display actual credential values in Airflow task logs or UI output
    severity: fatal
    kind: architecture_guardrail
    modality: must_not
    consequence: API keys exposed in logs violate security best practices and may allow unauthorized access to external data
      services
  - id: finance-C-145
    when: When storing encrypted API credentials
    action: Store encryption key files and credentials files with 0o600 permissions (owner read/write only)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: World-readable credential files expose API keys, allowing unauthorized access to FRED, Yahoo Finance, and
      other data sources
  - id: finance-C-154
    when: When a user considers using this system for live trading
    action: Claim or imply this system supports real-time trading execution — it is an analytical dashboard only
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users who believe this dashboard executes trades will experience significant financial losses when they attempt
      to live trade
  - id: finance-C-159
    when: When running ML training on time-series stock data
    action: Use TimeSeriesSplit cross-validation and never shuffle time-series data, so that no look-ahead bias is introduced
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Random shuffle splits cause future data to leak into training, producing unrealistic backtest performance
      that will not generalize to live trading
  - id: finance-C-194
    when: When implementing credential storage for API keys and third-party tokens
    action: Use Fernet symmetric encryption to encrypt credentials at rest; store the encryption key separately with 0o600
      file permissions so that credentials remain unreadable if the encrypted file is compromised
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Storing credentials in plaintext or with improperly protected keys exposes API tokens to theft, enabling
      unauthorized access to trading platforms and potential financial loss
    derived_from_bd_id: BD-004
  - id: finance-C-197
    when: When configuring cross-validation strategy for financial time series model training
    action: Use TimeSeriesSplit for walk-forward validation to prevent look-ahead bias; do not use a random split, standard
      k-fold, or any shuffle-based splitting method for financial time series
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Random train/test split leaks future information into training data, inflating model performance metrics
      by 15-30% and causing live trading returns to fall far below backtested results
    derived_from_bd_id: BD-012
  regular:
  - id: finance-C-002
    when: When implementing cached data retrieval
    action: include timestamp metadata with cached data to enable expiry verification
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without timestamp tracking, cache files are treated as valid regardless of age, causing stale data to be
      served as current data
    stage_ids:
    - data_collection
  - id: finance-C-003
    when: When fetching Yahoo Finance data in batch operations
    action: enforce rate limiting delay between individual ticker requests
    severity: high
    kind: domain_rule
    modality: must
    consequence: Unthrottled Yahoo Finance API calls trigger HTTP 429 Too Many Requests errors, causing data collection failures
      and blacklisting
    stage_ids:
    - data_collection
  - id: finance-C-004
    when: When loading FRED economic series data
    action: check offline mode status before attempting any API call
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: API calls made when offline mode is enabled waste resources and produce connection errors that block dashboard
      functionality
    stage_ids:
    - data_collection
  - id: finance-C-005
    when: When initializing database connections in Streamlit multi-page apps
    action: use the singleton pattern to ensure a single shared connection instance
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multiple DuckDB connections cause resource exhaustion and 'Connection object already initialized' errors in
      multi-page Streamlit deployments
    stage_ids:
    - data_collection
  - id: finance-C-006
    when: When determining cache refresh schedules
    action: 'align cache expiry SLAs with FRED publication frequency: daily=6h, weekly=1d, monthly=7d, quarterly=30d'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Cache expiry mismatched to publication frequency causes either stale data delivery or unnecessary API calls
      that waste rate limits
    stage_ids:
    - data_collection
  - id: finance-C-007
    when: When processing Yahoo Finance batch requests
    action: process tickers in batches of at most 5 to respect API rate limits
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Fetching unlimited tickers simultaneously triggers Yahoo Finance rate limiting, resulting in HTTP 429 errors
      and data collection failures
    stage_ids:
    - data_collection
  - id: finance-C-008
    when: When handling Yahoo Finance 429 rate limit errors
    action: fall back to expired cache (up to 168 hours) as last resort instead of failing silently
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Silent failure on rate limit returns empty data, breaking dashboard displays with no indication of staleness
    stage_ids:
    - data_collection
  - id: finance-C-009
    when: When fetching SEC EDGAR data
    action: enforce 0.1 second delay between requests and implement exponential backoff on 429 responses
    severity: high
    kind: resource_boundary
    modality: must
    consequence: SEC EDGAR enforces ~10 req/sec limit; violations cause temporary IP bans and failed data collection
    stage_ids:
    - data_collection
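    # Illustrative sketch (Python) of the throttling in finance-C-009; not part of the source blueprint.
    # Assumptions: fetch_with_backoff, MIN_DELAY and MAX_RETRIES are hypothetical names, the retry cap of 5
    # is not stated in the rule, and `get` is any caller-supplied function returning an object that has a
    # status_code attribute.
    #
    #   import time
    #
    #   MIN_DELAY = 0.1     # seconds between requests (~10 req/sec)
    #   MAX_RETRIES = 5     # assumed cap
    #
    #   def fetch_with_backoff(get, url, sleep=time.sleep):
    #       for attempt in range(MAX_RETRIES):
    #           sleep(MIN_DELAY)                        # fixed spacing before every request
    #           response = get(url)
    #           if response.status_code != 429:
    #               return response
    #           sleep(MIN_DELAY * 2 ** (attempt + 1))   # backoff doubles after each 429
    #       raise RuntimeError("still rate limited after retries")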
  - id: finance-C-010
    when: When loading data with DuckDB available
    action: query DuckDB first before falling back to pickle cache or API
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Skipping DuckDB and querying API directly bypasses cached data, increasing latency and exhausting API rate
      limits unnecessarily
    stage_ids:
    - data_collection
  - id: finance-C-011
    when: When API keys are not configured
    action: load sample offline data from data/sample_FRED_data.csv and data/sample_*_data.csv files
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Dashboard fails to load without fallback data, preventing demonstrations and offline development
    stage_ids:
    - data_collection
  - id: finance-C-012
    when: When claiming data freshness capabilities
    action: label Yahoo Finance data as real-time when it has inherent market delay
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Misrepresenting delayed market data as real-time creates false expectations for trading decisions based on
      stale prices
    stage_ids:
    - data_collection
  - id: finance-C-013
    when: When serving sample offline data
    action: present static historical sample data as current market conditions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Displaying outdated sample data as current market data misleads users about economic conditions and asset
      valuations
    stage_ids:
    - data_collection
  - id: finance-C-014
    when: When accessing FRED without API key
    action: use unauthenticated pandas_datareader access which has lower rate limits
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: FRED enforces strict unauthenticated rate limits; without API key, data fetching fails frequently during
      batch operations
    stage_ids:
    - data_collection
  - id: finance-C-015
    when: When caching Yahoo Finance data
    action: use 24-hour minimum cache expiry to reduce API calls and avoid rate limits
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Short cache expiry causes repeated API calls that exhaust rate limits, especially for batch ticker fetching
    stage_ids:
    - data_collection
  - id: finance-C-016
    when: When deciding cache priority order
    action: check centralized cache before individual cache files
    severity: low
    kind: architecture_guardrail
    modality: must
    consequence: Individual cache lookup misses centralized data, causing redundant API fetches that waste rate limits and
      increase latency
    stage_ids:
    - data_collection
  - id: finance-C-019
    when: When calculating rolling Z-scores for feature normalization
    action: Handle division by zero in rolling_std by replacing zero values with NaN
    severity: high
    kind: domain_rule
    modality: must
    consequence: Division by zero produces NaN or infinite Z-scores, corrupting ML training data and causing model training
      failures or invalid predictions
    stage_ids:
    - feature_engineering
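    # Illustrative sketch (Python, pandas) of the zero-std guard in finance-C-019, using the 20-day default
    # window from finance-C-026. rolling_zscore is a hypothetical helper name, not taken from the source.
    #
    #   import numpy as np
    #   import pandas as pd
    #
    #   def rolling_zscore(series: pd.Series, window: int = 20) -> pd.Series:
    #       mean = series.rolling(window).mean()
    #       std = series.rolling(window).std().replace(0, np.nan)   # zero std becomes NaN, never inf
    #       return (series - mean) / std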
  - id: finance-C-022
    when: When classifying momentum regime using RSI, MACD, and SMA50
    action: 'Use regime thresholds: Bullish=RSI>60 AND MACD histogram>0 AND Price/SMA50>1.0, Bearish=RSI<40 AND MACD histogram<0
      AND Price/SMA50<1.0'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect threshold values cause misclassification of market regime, leading to wrong trading strategy selection
      and financial losses
    stage_ids:
    - feature_engineering
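    # Illustrative sketch (Python) of the thresholds in finance-C-022. The function name and the "Neutral"
    # label for all remaining cases are assumptions; the rule itself only defines Bullish and Bearish.
    #
    #   def momentum_regime(rsi: float, macd_hist: float, price: float, sma50: float) -> str:
    #       ratio = price / sma50
    #       if rsi > 60 and macd_hist > 0 and ratio > 1.0:
    #           return "Bullish"
    #       if rsi < 40 and macd_hist < 0 and ratio < 1.0:
    #           return "Bearish"
    #       return "Neutral"   # assumed fallback label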
  - id: finance-C-023
    when: When calculating technical indicators that depend on historical data
    action: Verify input OHLCV data has at least 200 rows to compute SMA200 without null values
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Insufficient historical data causes SMA200 to be null, corrupting price_to_sma200 ratio and golden/death
      cross detection logic
    stage_ids:
    - feature_engineering
  - id: finance-C-024
    when: When calculating historical volatility from returns
    action: Require at least 20 trading days of data for 20-day rolling standard deviation
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Shorter lookback windows produce unreliable volatility estimates, causing incorrect risk assessments and
      position sizing errors
    stage_ids:
    - feature_engineering
  - id: finance-C-025
    when: When calculating IV Rank for options analysis
    action: Require at least 10 historical IV records in database for meaningful IV Rank calculation
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Insufficient IV history produces meaningless IV Rank values near 50%, causing incorrect options strategy
      selection
    stage_ids:
    - feature_engineering
  - id: finance-C-026
    when: When calculating Z-scores for feature normalization
    action: Use default rolling window of 20 days for mean and standard deviation calculation
    severity: low
    kind: resource_boundary
    modality: must
    consequence: Non-standard window sizes produce inconsistent Z-score distributions across features, reducing ML model interpretability
    stage_ids:
    - feature_engineering
  - id: finance-C-027
    when: When running the feature pipeline for a ticker
    action: 'Execute pipeline stages in order: technical indicators → options metrics → derived features → data quality validation'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Out-of-order execution causes derived features to lack required technical indicators, producing null outputs
      and failing ML training
    stage_ids:
    - feature_engineering
  - id: finance-C-030
    when: When calculating feature interactions with options data
    action: Check options_data availability before attempting merge to prevent full-NaN interaction features
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Missing options data without proper guard causes NaN-filled interaction features, corrupting ML training
      inputs
    stage_ids:
    - feature_engineering
  - id: finance-C-031
    when: When handling options data fetch failures in pipeline
    action: Log warning and continue pipeline rather than failing entire batch when options data is unavailable
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Options data unavailability for one ticker causes entire batch pipeline failure, blocking feature generation
      for all other tickers
    stage_ids:
    - feature_engineering
  - id: finance-C-032
    when: When validating date range coverage for feature data
    action: Allow for trading calendar gaps by accepting at least 50% of calendar days as valid date coverage
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Strict calendar-day matching fails for valid trading data that excludes weekends and market holidays, causing
      false validation failures
    stage_ids:
    - feature_engineering
  - id: finance-C-033
    when: When calculating technical indicators for backtesting
    action: Claim real-time trading capability - this stage only produces historical indicators from past data
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting historical technical indicators as real-time trading signals misleads users about system capabilities,
      causing inappropriate trading decisions
    stage_ids:
    - feature_engineering
  - id: finance-C-034
    when: When presenting golden/death cross detection results
    action: Claim that historical cross detection accurately predicts future market direction
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Past golden/death cross occurrences do not guarantee future signal accuracy, leading to overconfident trading
      strategies and potential financial losses
    stage_ids:
    - feature_engineering
  - id: finance-C-035
    when: When calculating regime classifications from technical indicators
    action: Claim that momentum_regime values directly translate to profitable trading signals
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Regime classifications describe historical market states, not future conditions. Using them as trading signals
      without proper risk management causes financial losses
    stage_ids:
    - feature_engineering
  - id: finance-C-042
    when: When calculating insider sentiment scores
    action: include neutral transaction codes (A/D/F/G/E) in buy/sell value calculations
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Including grants, gifts, and tax withholding in sentiment calculation will contaminate signals with non-investment
      transactions
    stage_ids:
    - financial_analysis
  - id: finance-C-045
    when: When estimating VVIX when actual data is unavailable
    action: use formula vvix = vix * 1.2 + 20 as fallback estimation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect VVIX estimation formula will produce unreliable volatility stress scores, compromising margin
      call risk accuracy
    stage_ids:
    - financial_analysis
  - id: finance-C-047
    when: When fetching Yahoo Finance data for financial analysis
    action: claim real-time data availability, since yfinance data carries an inherent market delay of 15+ minutes
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Presenting delayed data as real-time will mislead users about current market conditions, causing execution
      at stale prices
    stage_ids:
    - financial_analysis
  - id: finance-C-048
    when: When parsing SEC Form 4 XML filings
    action: implement fallback parsing logic since SEC EDGAR XML structure varies across filings and time periods
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without fallback parsing, transaction extraction will fail for filings with non-standard XML structures,
      causing incomplete insider data
    stage_ids:
    - financial_analysis
  - id: finance-C-049
    when: When calculating Piotroski F-Score
    action: require at least 2 years of financial data for year-over-year comparison
    severity: high
    kind: domain_rule
    modality: must
    consequence: Calculating F-Score with insufficient historical data will produce meaningless comparisons, masking fundamental
      deterioration signals
    stage_ids:
    - financial_analysis
  - id: finance-C-051
    when: When detecting market rotation patterns
    action: classify risk-on when offensive RS > 1 and defensive RS < -1; risk-off when defensive RS > 1 and offensive RS
      < -1
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect rotation classification thresholds will trigger wrong portfolio rebalancing, missing regime change
      opportunities
    stage_ids:
    - financial_analysis
  - id: finance-C-052
    when: When storing margin risk scores in database
    action: use INSERT OR REPLACE so that the latest scores overwrite stale data for the same ticker/date
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Duplicate risk scores will cause confusion during analysis, potentially using outdated margin requirements
    stage_ids:
    - financial_analysis
  - id: finance-C-053
    when: When retrieving technical features for risk calculation
    action: handle None values gracefully since volume_ratio and bid_ask_spread may not exist for every ticker
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing optional fields without defaults will cause KeyError exceptions, breaking risk calculations for tickers
      without complete data
    stage_ids:
    - financial_analysis
  - id: finance-C-054
    when: When presenting backtest results for insider trading signals
    action: claim backtest returns represent expected live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting backtest results as predictive will mislead investors, as historical performance does not account
      for slippage, liquidity constraints, or market impact
    stage_ids:
    - financial_analysis
  - id: finance-C-055
    when: When displaying financial health scores derived from SEC XBRL
    action: present scores as real-time financial condition since SEC filings have inherent reporting lag (quarterly/annual)
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Presenting stale financial data as current will mislead investors about company health, especially during
      earnings blackout periods
    stage_ids:
    - financial_analysis
  - id: finance-C-056
    when: When parsing Form 4 XML data
    action: skip validation assuming well-formed XML since SEC EDGAR contains malformed filings and encoding variations
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Parsing failures will silently drop transactions, creating survivorship bias where only clean filings contribute
      to sentiment
    stage_ids:
    - financial_analysis
  - id: finance-C-057
    when: When fetching VIX data from multiple sources
    action: skip error handling assuming first data source succeeds since network failures and API changes are common
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Single-source dependency will cause complete VIX unavailability during source outages, breaking margin risk
      calculations
    stage_ids:
    - financial_analysis
  - id: finance-C-060
    when: When training ensemble model with LogisticRegression meta-learner
    action: Apply StandardScaler to features before training the LogisticRegression meta-learner
    severity: high
    kind: domain_rule
    modality: must
    consequence: Unscaled meta-features cause LogisticRegression to converge poorly or produce unstable coefficients, degrading
      ensemble prediction quality and confidence calibration
    stage_ids:
    - ml_training_prediction
  - id: finance-C-061
    when: When storing ML predictions in DuckDB database
    action: Include confidence_score as the maximum probability from predict_proba
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing confidence scores prevent downstream risk management from assessing prediction reliability, leading
      to uniform position sizing that ignores model uncertainty
    stage_ids:
    - ml_training_prediction
  - id: finance-C-062
    when: When evaluating trained models
    action: Track accuracy, precision, recall, and AUC-ROC in model_performance table
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without comprehensive metric tracking, degraded model quality goes undetected, leading to poor trading decisions
      based on deteriorating predictions
    stage_ids:
    - ml_training_prediction
  - id: finance-C-063
    when: When configuring prediction horizon for stock directional prediction
    action: Set prediction_horizon to align with rebalancing frequency (default 5 trading days)
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Mismatched prediction horizon causes signals to arrive at wrong times relative to rebalancing, forcing either
      early exits or holding positions past intended horizons
    stage_ids:
    - ml_training_prediction
  - id: finance-C-064
    when: When selecting base models for ensemble
    action: Use XGBoost and LightGBM as base models (both scale-invariant tree methods)
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Including scale-variant models in ensemble requires careful feature scaling management, increasing implementation
      complexity and potential for scaling bugs
    stage_ids:
    - ml_training_prediction
  - id: finance-C-065
    when: When preparing training data for time series models
    action: Verify minimum sample count for TimeSeriesSplit (n_splits * min_samples_per_split)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Insufficient samples cause TimeSeriesSplit to produce empty folds, resulting in training failures or metrics
      computed on unreliably small validation sets
    stage_ids:
    - ml_training_prediction
  - id: finance-C-066
    when: When training gradient boosting models with validation data
    action: Provide eval_set parameter to enable early stopping
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Without early stopping, models train for fixed n_estimators regardless of convergence, causing overfitting
      to training data and poor generalization
    stage_ids:
    - ml_training_prediction
  - id: finance-C-068
    when: When combining base model predictions in ensemble
    action: Generate meta-features from base model predict_proba outputs, not raw predictions
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using discrete predictions (0/1) as meta-features loses probability calibration information, reducing meta-learner
      effectiveness and ensemble accuracy
    stage_ids:
    - ml_training_prediction
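    # Illustrative sketch (Python, scikit-learn) combining finance-C-068 and finance-C-060. Assumptions:
    # binary classification, base models already fitted and exposing predict_proba, and a hypothetical
    # helper name fit_meta_learner.
    #
    #   import numpy as np
    #   from sklearn.linear_model import LogisticRegression
    #   from sklearn.pipeline import make_pipeline
    #   from sklearn.preprocessing import StandardScaler
    #
    #   def fit_meta_learner(base_models, X_val, y_val):
    #       # one meta-feature column per base model, holding the positive-class probability
    #       meta_X = np.column_stack([m.predict_proba(X_val)[:, 1] for m in base_models])
    #       meta = make_pipeline(StandardScaler(), LogisticRegression())
    #       return meta.fit(meta_X, y_val)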
  - id: finance-C-069
    when: When storing predictions in DuckDB
    action: Define PRIMARY KEY on (ticker, date) to prevent duplicate predictions
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without primary key constraint, duplicate predictions for same ticker/date cause fan-trap queries and incorrect
      performance metrics in downstream analysis
    stage_ids:
    - ml_training_prediction
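    # Illustrative sketch (DuckDB SQL) of the key constraint in finance-C-069. The table name, the column
    # types and the sample row are assumptions; confidence_score follows finance-C-061.
    #
    #   CREATE TABLE IF NOT EXISTS predictions (
    #       ticker           VARCHAR NOT NULL,
    #       date             DATE    NOT NULL,
    #       prediction       INTEGER,
    #       confidence_score DOUBLE,
    #       PRIMARY KEY (ticker, date)
    #   );
    #   -- re-running the same ticker/date replaces the row instead of duplicating it
    #   INSERT OR REPLACE INTO predictions VALUES ('AAPL', DATE '2024-01-02', 1, 0.71);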
  - id: finance-C-071
    when: When using ML predictions for investment decisions
    action: Use model as sole basis for investment decisions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Relying exclusively on binary directional predictions without fundamental analysis, risk management, or portfolio-level
      position sizing exposes portfolios to concentrated losses
    stage_ids:
    - ml_training_prediction
  - id: finance-C-072
    when: When implementing ML training pipeline
    action: Skip TimeSeriesSplit even when data appears stationary or simple
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Stationary-appearing data may contain regime changes or structural breaks; skipping temporal validation produces
      overfitted models that fail catastrophically during market regime shifts
    stage_ids:
    - ml_training_prediction
  - id: finance-C-073
    when: When data appears clean and validation metrics look good
    action: Skip model retraining without first checking for concept drift
    severity: high
    kind: rationalization_guard
    modality: must_not
    consequence: Market relationships evolve; models trained on historical data degrade over time. Stale models produce systematically
      biased predictions that accumulate losses
    stage_ids:
    - ml_training_prediction
  - id: finance-C-075
    when: When implementing individual signal calculations
    action: Normalize each signal to 0-1 range using min(signal, 1.0)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without proper signal normalization, individual signals exceeding 1.0 will distort the weighted probability
      calculation, potentially showing recession probabilities exceeding 100% or returning invalid NaN values
    stage_ids:
    - recession_indicator
  - id: finance-C-076
    when: When calculating the recession probability output
    action: Verify final probability is constrained between 0 and 1
    severity: high
    kind: domain_rule
    modality: must
    consequence: Probability values outside 0-1 range will cause incorrect risk level assignment (LOW/MODERATE/ELEVATED/HIGH),
      potentially misleading investment decisions and causing financial losses
    stage_ids:
    - recession_indicator
  - id: finance-C-077
    when: When loading data for recession probability calculation
    action: Load FRED indicator data via load_indicators_from_data before calling calculate_recession_probability
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Calling calculate_recession_probability without loading data first will raise ValueError and crash the application,
      preventing users from viewing recession risk assessment
    stage_ids:
    - recession_indicator
  - id: finance-C-078
    when: When calculating historical recession probabilities
    action: Require at least 12 periods of historical data as minimum lookback window
    severity: high
    kind: domain_rule
    modality: must
    consequence: Insufficient historical data will cause index errors or use incomplete unemployment/GDP statistics, producing
      unreliable recession probabilities that do not reflect actual economic conditions
    stage_ids:
    - recession_indicator
  - id: finance-C-080
    when: When implementing yield curve inversion detection
    action: Use 18-month (365*1.5 days) lookback window for detecting recent inversions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect lookback period will fail to capture inversions that historically predict recessions within 12-18
      months, reducing predictive accuracy of the most important indicator
    stage_ids:
    - recession_indicator
  - id: finance-C-082
    when: When using the recession probability model for investment decisions
    action: Use the model as the sole basis for investment allocation decisions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Relying exclusively on this single-indicator model will ignore other critical factors such as company fundamentals,
      geopolitical risks, and market sentiment, leading to suboptimal or loss-making portfolio decisions
    stage_ids:
    - recession_indicator
  - id: finance-C-083
    when: When fetching FRED economic data for the model
    action: 'Obtain all 7 required FRED series: yield spreads, unemployment, claims, industrial production, GDP growth, consumer
      sentiment, corporate spreads, Fed funds rate'
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Missing required FRED series will cause signal calculations to return 0.0 for affected indicators, systematically
      underestimating recession probability and providing false reassurance
    stage_ids:
    - recession_indicator
  - id: finance-C-084
    when: When assigning recession risk levels
    action: 'Use correct probability thresholds: LOW (<20%), MODERATE (20-40%), ELEVATED (40-70%), HIGH (>=70%)'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect risk level thresholds will misclassify recession risk, potentially causing investors to either
      panic unnecessarily or remain inadequately hedged during genuine economic downturns
    stage_ids:
    - recession_indicator
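    # Illustrative sketch (Python) of the thresholds in finance-C-084 together with the range check from
    # finance-C-076. The function name is hypothetical, and treating each lower bound as inclusive (0.40
    # maps to ELEVATED) is an assumption, since the rule leaves the shared boundaries open.
    #
    #   def risk_level(probability: float) -> str:
    #       if not 0.0 <= probability <= 1.0:
    #           raise ValueError("probability must be within [0, 1]")
    #       if probability < 0.20:
    #           return "LOW"
    #       if probability < 0.40:
    #           return "MODERATE"
    #       if probability < 0.70:
    #           return "ELEVATED"
    #       return "HIGH"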
  - id: finance-C-086
    when: When generating indicator explanations for users
    action: Provide an explanation for each of the 7 indicator categories, describing its contribution to the recession probability
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Missing indicator explanations will leave users without context for why the model assigns specific recession
      probabilities, reducing transparency and trust in the model's output
    stage_ids:
    - recession_indicator
  - id: finance-C-087
    when: When loading FRED data for recession indicators
    action: Use cached data older than the cache expiry threshold without validation
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Using stale FRED data will produce outdated recession probabilities that do not reflect current economic
      conditions, potentially leading to incorrect investment decisions based on obsolete information
    stage_ids:
    - recession_indicator
  - id: finance-C-091
    when: When implementing economic data quality validation
    action: Raise exception when FRED data has fewer than 100 rows or 30 columns, or when Yahoo Finance has fewer than 3 tickers
    severity: high
    kind: domain_rule
    modality: must
    consequence: Dashboard renders with incomplete economic indicators, missing key series needed for recession analysis and
      financial forecasting
    stage_ids:
    - orchestration_automation
  - id: finance-C-092
    when: When configuring FRED API data refresh
    action: Insert 1-second sleep every 20 FRED requests to stay under 120 calls/minute rate limit
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Automated refresh triggers FRED rate limiting, causing API failures and incomplete data fetching for all
      subsequent dashboard users
    stage_ids:
    - orchestration_automation
  - id: finance-C-093
    when: When configuring Yahoo Finance data refresh
    action: Implement exponential backoff (10s, 15s) for failed Yahoo Finance requests, up to 3 retries
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Persistent Yahoo Finance rate limit errors cause complete market data refresh failure, leaving dashboard
      without current market indicators
    stage_ids:
    - orchestration_automation
  - id: finance-C-094
    when: When configuring Airflow DAG task execution
    action: Set execution_timeout to 30 minutes maximum per task to prevent indefinite hanging
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Stuck DAG tasks block subsequent runs, causing cascading failures and missing data updates for multiple days
    stage_ids:
    - orchestration_automation
  - id: finance-C-095
    when: When configuring Airflow DAG retry policy
    action: Set retries=3 with retry_delay=5 minutes and email_on_retry=False to handle transient failures gracefully
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without proper retry configuration, transient API failures immediately cause DAG failures, generating excessive
      alert emails and wasting investigation time
    stage_ids:
    - orchestration_automation
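    # Illustrative sketch (Python) of Airflow default_args matching finance-C-094 and finance-C-095.
    # Setting email_on_failure to True is an assumption drawn from the alerting rules in this list; the
    # DAG and its tasks are omitted.
    #
    #   from datetime import timedelta
    #
    #   default_args = {
    #       "retries": 3,
    #       "retry_delay": timedelta(minutes=5),
    #       "email_on_retry": False,                      # no mail while retries are still pending
    #       "email_on_failure": True,                     # assumed, alert once retries are exhausted
    #       "execution_timeout": timedelta(minutes=30),   # per-task limit from finance-C-094
    #   }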
  - id: finance-C-096
    when: When scheduling ICI ETF data refresh DAG
    action: Schedule ICI weekly ETF flows refresh for Wednesday (day 3) to capture the weekly publication
    severity: high
    kind: operational_lesson
    modality: must
    consequence: An ICI data refresh scheduled on the wrong day misses the weekly publication window, so the dashboard displays
      stale ETF flow data for the entire week
    stage_ids:
    - orchestration_automation
  - id: finance-C-097
    when: When scheduling market data refresh DAG
    action: Schedule VIX and market data refresh for weekdays only (1-5) to align with trading calendar
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Weekend refresh attempts fetch stale or no market data, wasting compute resources and creating confusing
      empty data states in dashboard
    stage_ids:
    - orchestration_automation
  - id: finance-C-098
    when: When configuring AIRFLOW_ALERT_EMAIL for DAG failure notifications
    action: Set AIRFLOW_ALERT_EMAIL environment variable to enable failure email alerts when DAG tasks exhaust all retries
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Silent DAG failures go unnoticed for extended periods, and the dashboard serves stale data without operators
      realizing refresh has stopped
    stage_ids:
    - orchestration_automation
  - id: finance-C-100
    when: When implementing parallel ETL tasks in DAG
    action: Execute ICI and VIX fetches concurrently using Airflow task list syntax to reduce total refresh time
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Sequential execution doubles refresh duration, risking timeout and delaying dashboard availability for morning
      market analysis
    stage_ids:
    - orchestration_automation
  - id: finance-C-101
    when: When configuring cache backup retention policy
    action: Delete CSV backups older than 30 days to prevent disk space exhaustion
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Unbounded backup growth fills disk, causing DAG failures and potential data loss when system cannot write
      new backups or cache files
    stage_ids:
    - orchestration_automation
  - id: finance-C-102
    when: When implementing data refresh automation
    action: Claim guaranteed real-time data availability when refresh relies on polling-based external APIs
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Marketing claims of real-time data mislead users; actual data has inherent delays from FRED/Yahoo Finance
      polling, causing incorrect assumptions about data freshness
    stage_ids:
    - orchestration_automation
  - id: finance-C-103
    when: When presenting automated refresh results
    action: Present automated refresh metrics as evidence of system reliability without acknowledging external API dependency
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Dashboard claims strong reliability metrics while overlooking that failures stem from external API unavailability,
      misrepresenting operational success
    stage_ids:
    - orchestration_automation
  - id: finance-C-104
    when: When setting cache expiry thresholds
    action: Use 24-hour cache expiry to balance API load reduction with data freshness requirements
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Overly aggressive caching serves stale data; overly aggressive refresh exhausts API rate limits, both degrading
      dashboard utility
    stage_ids:
    - orchestration_automation
  - id: finance-C-105
    when: When configuring email alerts for DAG notifications
    action: Send alert emails on retry attempts (send only on final failure, after all retries are exhausted)
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Excessive alert emails during retry phases cause alert fatigue, leading operators to ignore or disable critical
      failure notifications
    stage_ids:
    - orchestration_automation
  - id: finance-C-125
    when: When fetching Yahoo Finance OHLCV data for technical indicators
    action: Cache data with 24-hour expiry to respect rate limits (YFINANCE_CACHE_HOURS=24, YFINANCE_RATE_LIMIT_DELAY=0.5)
    severity: high
    kind: resource_boundary
    modality: must
    consequence: API rate limiting causes data fetch failures, resulting in missing technical indicators and NaN values propagating
      to downstream ML models
  - id: finance-C-126
    when: When loading OHLCV data from DuckDB for technical indicator calculation
    action: Verify date column is parsed as DatetimeIndex and OHLCV columns (open, high, low, close, volume) exist with numeric
      dtypes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Technical indicators produce NaN or incorrect values because column names or types don't match the ta library
      expectations
  - id: finance-C-127
    when: When storing technical features in DuckDB from feature engineering
    action: Use PRIMARY KEY (ticker, date) to prevent duplicate records and verify schema matches technical_features table
      definition
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Duplicate feature records cause JOIN failures in ML training, producing inconsistent model inputs and invalid
      predictions
  - id: finance-C-130
    when: When loading FRED economic series for recession indicator calculation
    action: 'Verify series columns match the expected names: yield_spread_10y2y, unemployment_rate, consumer_sentiment, corporate_spread'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Recession probability model receives wrong indicator values or NaN, producing incorrect recession probability
      scores
  - id: finance-C-131
    when: When passing margin risk scores to visualization dashboard
    action: Pass composite_risk_score as float in range 0-100 with risk_level classification string (Critical/High/Moderate/Low/Minimal)
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Dashboard fails to render risk gauges correctly, showing NaN or incorrect colors for risk indicators
  - id: finance-C-132
    when: When storing trained model files as .pkl for persistence
    action: Include ticker, model_type, and timestamp in filename for version tracking and retrieval (e.g., {ticker}_{model_type}_{timestamp}.pkl)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Prediction engine cannot locate the correct model version, causing prediction failures or using outdated
      models
  - id: finance-C-133
    when: When displaying ML predictions in the dashboard
    action: Display confidence_score as probability percentage alongside the binary prediction (UP/DOWN) to avoid misrepresenting
      prediction certainty
    severity: medium
    kind: claim_boundary
    modality: must
    consequence: Users misinterpret high confidence scores as guaranteed outcomes, leading to overconfident trading decisions
      and potential financial losses
  - id: finance-C-134
    when: When presenting recession probability scores to users
    action: Display recession probability with explicit confidence intervals and historical accuracy statistics to prevent
      over-interpretation
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Users treat recession probability as precise prediction rather than probabilistic estimate, causing inappropriate
      risk hedging or asset allocation decisions
  - id: finance-C-135
    when: When refreshing market data via Airflow DAGs
    action: 'Validate data freshness: ICI ETF weekly data must be within 14 days, VIX data within 7 days, otherwise raise
      exception'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Stale market data propagates to all downstream analyses, causing incorrect sector rotation signals and margin
      risk scores
  - id: finance-C-136
    when: When fetching options data from Yahoo Finance for IV metrics
    action: Handle division by zero for put_call_ratio when call_volume is 0, returning None instead of inf or NaN
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: IV metrics with inf/NaN values cause feature engineering pipeline to fail or produce invalid margin risk
      scores
  - id: finance-C-139
    when: When calculating FRED data pivot from long to wide format
    action: Convert date column to DatetimeIndex and rename columns from series_id back to descriptive names after pivot operation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Downstream modules expecting descriptive column names receive FRED series IDs, causing feature name mismatches
      and KeyErrors
  - id: finance-C-140
    when: When establishing database connections in the application
    action: Use the singleton DatabaseConnection via get_db_connection() — only one connection instance per process is allowed
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multiple DuckDB connection instances can cause file locking conflicts and data inconsistency in concurrent
      access scenarios
  - id: finance-C-141
    when: When creating time-series tables in the DuckDB database
    action: Use composite primary key (entity_id, date) for all time-series tables to ensure uniqueness and proper indexing
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Duplicate primary keys or missing date-based partitioning causes data insertion failures and incorrect time-series
      queries
  - id: finance-C-142
    when: When training ML models on the feature data
    action: Preserve feature column names from the database exactly as-is through the ML training pipeline
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Feature name changes break model prediction consistency, causing incorrect feature mapping between training
      and inference
  - id: finance-C-143
    when: When defining the ML binary classification target
    action: Create binary target as 1 if future_close > close, else 0 (5-day prediction horizon default)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect target definition invalidates all ML model training results and prediction accuracy metrics
  - id: finance-C-144
    when: When configuring cache expiry times
    action: Set cache_expiry_hours to 1 hour in development and 24 hours in production environments
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Production cache too short causes excessive API calls and slow dashboard loads; dev cache too long delays
      visibility of data changes
  - id: finance-C-146
    when: When implementing regime crossover detection signals
    action: Apply shift(1) to compare bar t with bar t-1 for golden cross and death cross detection to avoid look-ahead bias
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without shift(1), the signal uses today's moving average values to generate today's signal, causing look-ahead
      bias in backtests
  - id: finance-C-147
    when: When classifying VIX regime levels
    action: 'Apply thresholds: <15 Low, <20 Normal, <30 Elevated, >=30 Crisis'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Incorrect threshold boundaries cause wrong risk regime classification, leading to inappropriate margin risk
      calculations and trading recommendations
  - id: finance-C-148
    when: When implementing SLA-based cache refresh policies
    action: 'Enforce cache expiry based on data frequency: Daily=6h, Weekly=1d, Monthly=7d, Quarterly=30d'
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Stale data displayed to users when cache not refreshed; excessive API calls when over-refreshing data outside
      SLA windows
  - id: finance-C-149
    when: When fetching data from external sources (FRED, Yahoo Finance)
    action: Use UTC timestamps consistently across all scheduled jobs and data refresh workflows
    severity: high
    kind: domain_rule
    modality: must
    consequence: Mixed timezone handling causes data timestamp misalignment, leading to incorrect time-series joins and stale
      data served as fresh
  - id: finance-C-150
    when: When presenting or reporting this system's ML prediction results to users
    action: Claim that ML prediction accuracy or backtested returns equal expected live trading returns
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users make live capital allocation decisions based on inflated backtest returns, leading to severe underperformance
      in live trading and potential financial loss
  - id: finance-C-151
    when: When displaying recession probability model results
    action: Display the disclaimer that past indicator performance does not guarantee future predictive accuracy
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Without proper disclaimer, users may rely on recession forecasts as definitive predictions, leading to poor
      investment timing decisions
  - id: finance-C-152
    when: When presenting technical analysis results
    action: Include the disclaimer that analysis is for educational purposes only and should not be considered financial advice
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Without proper disclaimer, users may treat educational technical analysis as actionable trading signals,
      leading to financial losses
  - id: finance-C-153
    when: When presenting news sentiment analysis results
    action: Include the disclaimer that analysis is for informational purposes only and should not be considered financial
      advice
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Without proper disclaimer, users may act on sentiment signals as reliable trading indicators, leading to
      poor investment decisions
  - id: finance-C-155
    when: When a user without Python environment setup considers using this system
    action: Claim or imply the system works out-of-the-box without Python 3.10+ and pip dependencies
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users without Python environment setup will be unable to run the dashboard or scripts, leading to frustration
      and wasted setup time
  - id: finance-C-156
    when: When presenting news sentiment analysis capabilities
    action: Claim NLP sentiment analysis provides accurate or reliable sentiment predictions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: NLP sentiment analysis has inherent limitations in understanding context, sarcasm, and financial jargon,
      leading to misleading sentiment signals
  - id: finance-C-157
    when: When accessing FRED and Yahoo Finance data
    action: Accept that external API data has inherent delays — FRED daily data published ~4PM ET, Yahoo Finance has ~15-minute
      delay
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Users expecting real-time economic indicators will see stale data, potentially causing decisions based on
      outdated information
  - id: finance-C-158
    when: When operating without API keys
    action: Accept that the system operates in sample data mode with demonstration-quality results only
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Users presenting sample/demo data as representative live data will draw incorrect conclusions about market
      conditions
  - id: finance-C-160
    when: When implementing momentum indicators in technical analysis module
    action: Use EMA (Exponential Moving Average) with 12-period fast and 26-period slow boundaries aligned with standard MACD
      parameters; do not replace with DEMA or SMA without re-evaluating signal interpretation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Switching to SMA would slow signal response, causing delayed entry/exit points; using DEMA introduces complexity
      without standardized parameters, breaking consistency with the trading system's momentum signal generation
    derived_from_bd_id: BD-027
  - id: finance-C-161
    when: When implementing RSI-based momentum oscillator
    action: Use RSI(14) with 30/70 oversold/overbought boundaries validated against the 5-day prediction horizon; verify that
      RSI period matches strategy timeframe by testing signal frequency vs noise ratio
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Using RSI(7) produces excessive signals with higher noise, while RSI(21) may miss short-term momentum reversals;
      mismatched RSI period causes either over-trading or delayed signals relative to the 5-day prediction window
    derived_from_bd_id: BD-028
  - id: finance-C-162
    when: When implementing volatility measurement for position sizing
    action: Use ATR(14) as the volatility metric for risk parity calculations; ATR period must be 14 to capture meaningful
      average true range across different price levels and market conditions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using shorter ATR periods increases sensitivity to noise, while longer periods lag market volatility; mismatched
      ATR parameters cause incorrect position sizing, leading to either excessive risk or underallocation
    derived_from_bd_id: BD-031
  - id: finance-C-163
    when: When implementing sector rotation detector for relative strength calculations
    action: Calculate relative strength as excess return versus SPY benchmark over rolling 20/60/120-day periods; use positive/negative
      boundary to determine sector rotation direction
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using sector SPDR ETFs as benchmark instead of SPY narrows market context, causing sector rotation signals
      to miss broad market regime changes and potentially allocating to sectors underperforming the broader market
    derived_from_bd_id: BD-042
  - id: finance-C-164
    when: When configuring data refresh intervals in data_series_config
    action: 'Align refresh schedules with FRED publication cycles: Daily series = 6h, Weekly series = 1d, Monthly series =
      7d (to capture NFP/CPI), Quarterly series = 30d (to capture GDP revisions)'
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Using fixed 6-hour refresh wastes API quota on rarely-changing monthly data; conversely, refreshing monthly
      data only weekly may miss timely economic releases, causing stale indicators in the ML model
    derived_from_bd_id: BD-003
  - id: finance-C-165
    when: When implementing recession probability model
    action: Use 7-indicator weighted scoring approach including yield curve, labor (unemployment claims), financial (credit
      spreads), activity (PMI), consumer (consumer confidence), housing (starts/permits), and market (equity drawdown); do
      not rely on single-indicator models
    severity: high
    kind: domain_rule
    modality: must
    consequence: Single-indicator recession models (e.g., yield curve only) have high false positive rates during normal volatility
      cycles, causing premature portfolio de-risking and missed opportunities during extended bull markets
    derived_from_bd_id: BD-018
  - id: finance-C-166
    when: When preparing features for meta-learner training and prediction
    action: Apply StandardScaler to features before feeding into LogisticRegression meta-learner; XGBoost/LightGBM inputs
      should NOT be scaled since they are scale-invariant
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without scaling, probability outputs from different base models receive unequal treatment due to feature
      scale differences, causing the meta-learner to overweight models with larger numerical ranges and underweight those
      with smaller ranges
    derived_from_bd_id: BD-015
  - id: finance-C-167
    when: When implementing or refactoring Stochastic Oscillator calculations in technical analysis
    action: Use %K period of 14 and %D smoothing period of 3 for the fast stochastic configuration; keep these exact
      parameters to ensure consistent momentum confirmation signals
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing %K or %D parameters alters signal timing and reduces momentum confirmation reliability; strategies
      calibrated with fast stochastic (%K=14, %D=3) may produce false signals with different parameter values
    derived_from_bd_id: BD-032
  - id: finance-C-168
    when: When implementing MACD calculations in technical analysis modules
    action: 'Use standard MACD parameters: 12-period fast EMA, 26-period slow EMA, and 9-period signal line; in out-of-sample
      testing, alternative values performed worse during 2020 volatility'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Modified MACD parameters (e.g., 6,13,4) demonstrated inferior performance during market volatility events;
      using non-standard parameters may cause backtest results that cannot be replicated in live trading
    derived_from_bd_id: BD-029
  - id: finance-C-169
    when: When implementing sector correlation matrix calculations for portfolio optimization
    action: Use Pearson correlation to measure linear relationship strength between sector returns — this quantifies portfolio
      diversification and informs sector rotation timing aligned with Markowitz optimization framework
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using non-Pearson correlation methods (e.g., Spearman rank or DCC-GARCH) changes risk quantification; Markowitz
      optimization requires Pearson correlation inputs, and alternative methods produce inconsistent diversification metrics
    derived_from_bd_id: BD-059
  - id: finance-C-170
    when: When implementing regime classification logic using technical indicators
    action: 'Apply exact thresholds: Bullish requires RSI>60 AND MACD>0 AND Price>SMA50; Bearish requires RSI<40 OR (MACD<0
      AND Price<SMA50) — these three-factor agreement thresholds filter noise while catching genuine regime shifts'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using different threshold values (e.g., RSI>50 alone) produces significantly different regime signals during
      consolidation periods; single-factor triggers generate excessive false signals causing wrong strategy execution
    derived_from_bd_id: BD-006
  - id: finance-C-171
    when: When implementing recession prediction model
    action: Assign yield curve indicator a weight of 0.25 in the recession model — this weight reflects the 12-18 month lead
      time empirically validated in academic literature (Estrella & Mishkin 1998, Rudebusch & Williams 2009)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Equal weights or reduced yield curve weight deviates from empirically validated parameters; academic literature
      consistently shows yield curve inversion precedes recession by 3-18 months, and reducing its weight diminishes predictive
      accuracy
    derived_from_bd_id: BD-016
  - id: finance-C-172
    when: When implementing or refactoring options metrics calculations in modules.features.options_metrics
    action: Calculate IV Percentile as the percentage of historical trading days with IV lower than the current value — this
      provides time-based regime classification distinct from IV Rank
    severity: high
    kind: domain_rule
    modality: must
    consequence: Replacing IV Percentile with simpler metrics loses time-based regime classification; strategies relying on
      IV Percentile for options pricing decisions will use wrong volatility regime, causing systematic mispricing
    derived_from_bd_id: BD-040
  - id: finance-C-173
    when: When implementing or refactoring recession detection logic in modules.ml.recession_model
    action: Apply the Sahm Rule with the 0.5 percentage point threshold, measured as the rise in the 3-month average unemployment
      rate above its 12-month minimum; do not modify this boundary to other values without comprehensive backtesting
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing the 0.5 percentage point recession trigger threshold alters recession signal timing; incorrect threshold
      causes either missed recession warnings or premature defensive positioning, both causing significant portfolio losses
    derived_from_bd_id: BD-043
  - id: finance-C-174
    when: When configuring Bollinger Bands parameters in financial_analysis technical analysis module
    action: Verify that Bollinger Bands standard deviation parameter is set to 2.0 — this captures approximately 95% of price
      action under normal distribution assumptions
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Using non-2.0 standard deviation parameters alters breakout signal sensitivity; narrower bands increase false
      signals while wider bands miss genuine volatility expansions, causing poor mean-reversion entry timing
    derived_from_bd_id: BD-030
  - id: finance-C-175
    when: When implementing or modifying sector rotation signal generation in modules.features.sector_rotation
    action: Use dual momentum with 10-day and 50-day moving averages where 10-day > 50-day confirms upward momentum — these
      specific boundary periods capture short-term timing with medium-term noise filtering
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing momentum periods alters sector rotation signal timing and turnover; different period combinations
      were backtested and showed inferior risk-adjusted returns, causing suboptimal sector allocation decisions
    derived_from_bd_id: BD-060
  - id: finance-C-176
    when: When executing the feature engineering pipeline in modules.features.feature_pipeline
    action: Fail the pipeline when RSI null values exceed 10% or duplicate dates are detected — do not convert this to a warning;
      hard failure ensures visibility into data quality issues
    severity: high
    kind: domain_rule
    modality: must
    consequence: Converting this to a warning allows ML models to train silently on corrupted data, producing unreliable predictions
      that lead to poor trading decisions and financial losses
    derived_from_bd_id: BD-007
  - id: finance-C-177
    when: When applying RecessionProbabilityModel to non-US markets or post-2020 periods with structural economic breaks
    action: Recalibrate indicator weights (yield_curve 25%, labor 20%, financial 15%) based on current market empirical data
      — these weights are calibrated for US markets pre-2020 and may not reflect post-pandemic indicator relationships
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Using pre-2020 calibrated weights on post-pandemic or emerging market data produces unreliable recession
      probabilities; indicator relationships changed significantly during COVID, leading to systematic misprediction
    derived_from_bd_id: BD-064
  - id: finance-C-178
    when: When configuring yield curve inversion lookback period in recession_indicator stage
    action: Use a 12-18 month lookback period (365*1.5 ≈ 547 days) to capture historical inversions that predict recessions
      within 12-18 months; a past inversion adds signal even if the spread has since normalized
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using shorter lookback (e.g., 6 months) misses recession signals from inversions that normalized but still
      predict near-term recession, causing delayed or missed defensive portfolio positioning
    derived_from_bd_id: BD-017
  - id: finance-C-180
    when: When implementing or modifying the dashboard's data freshness indicators
    action: Provide a unified data freshness signal that reconciles connection status, last update timestamp, and data source
      badge into a single coherent indicator; do not show green connection status while displaying stale sample data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Conflicting data freshness signals cause users to trust green connection status while acting on stale sample
      data, leading to incorrect trading decisions during critical market moments when data accuracy is essential
    derived_from_bd_id: BD-081
  - id: finance-C-181
    when: When implementing or refactoring insider trading classification logic
    action: Verify SEC Form 4 code classification against current filing conventions; add context-aware disambiguation for
      code P (distinguish private placements from standard purchases), and implement validation against company event calendars
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Misclassifying private placement transactions as bullish (code P) causes false positive bullish signals,
      potentially leading to strategy entries based on insider purchases that were actually compensatory awards rather than
      directional bets
    derived_from_bd_id: BD-067
  - id: finance-C-182
    when: When training machine learning models using BaseModel with StandardScaler
    action: Verify feature distributions are approximately Gaussian within training windows; apply Shapiro-Wilk test (p>0.05)
      or examine kurtosis; if heavy tails detected, switch to RobustScaler or rank-based transformation
    severity: medium
    kind: domain_rule
    modality: should
    consequence: StandardScaler assumes Gaussian-distributed features; heavy-tailed features get scaled values misrepresenting
      true relative importance, causing models to overweight outlier-prone indicators and underweight stable ones during inference
    derived_from_bd_id: BD-069
  - id: finance-C-183
    when: When implementing or refactoring feature calculation logic for golden/death cross detection
    action: 'Implement frequency-aware shift calculation: for minute-level data use shift(1) referencing prior minute, for
      hourly use shift(1) referencing prior hour; validate that shift window matches input data granularity before computing
      SMA crossovers'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded shift(1) assumes daily closing prices; when applied to intraday minute-level data, shift(1) references
      the prior minute instead of prior day, causing completely wrong crossover detection that silently produces false signals
    derived_from_bd_id: BD-073
  - id: finance-C-184
    when: When designing or modifying database schemas for feature tables
    action: Normalize timestamps to UTC before storing; include exchange_timezone field in feature metadata; when querying
      across multiple exchanges, filter by exchange-specific trading day boundaries rather than assuming UTC date equivalence
    severity: medium
    kind: architecture_guardrail
    modality: should
    consequence: Composite primary key (ticker, date) assumes single timezone per date; when the same ticker trades on exchanges
      across different time zones, identical ticker/date combinations create conflicting entries with different trading day
      boundaries
    derived_from_bd_id: BD-074
  - id: finance-C-185
    when: When implementing or modifying feature pipeline execution order
    action: 'Maintain the validated 5-step execution order: tech→options→derived→margin_risk→quality; implement dependency
      graph validation that checks prerequisite stages complete before dependent stages run, ensuring null RSI failures are
      attributed to ordering violations not data quality issues'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Pipeline validation thresholds (>10% null RSI) assume the fixed 5-step order; if derived features run before
      tech indicators, null RSI from missing upstream dependencies gets misattributed to data quality, causing the quality
      gate to silently accept corrupted inputs that corrupt downstream margin risk calculations
    derived_from_bd_id: BD-078
  - id: finance-C-186
    when: When configuring DAG scheduling with weekday-only runs and SLA-based refresh intervals
    action: Convert SLA definitions from natural days to business days (using pandas.bdate_range or similar); for critical
      indicators with <24h SLAs, add weekend catch-up runs at Saturday 7 AM UTC and Sunday 7 AM UTC; document actual data
      staleness bounds in dashboard metadata
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Weekday-only scheduling (Mon-Fri 7 AM UTC) combined with natural-day SLA definitions causes Friday afternoon
      releases to experience 63 hours of staleness for 6-hour SLA indicators; monthly indicators miss entire weekends, making
      dashboard freshness claims systematically optimistic by 2-3x for end-of-week releases
    derived_from_bd_id: BD-077
  - id: finance-C-187
    when: When implementing or modifying volatility regime detection logic in position sizing or margin calculations
    action: 'Implement de-duplication logic: when multiple volatility triggers fire simultaneously (z-score threshold + percentile
      rank + margin composite), use only one trigger and log the others as suppressed; or unify into a single volatility regime
      score with documented weight contribution'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Three independent volatility triggers (z-score +1.0, percentile >75th, margin composite 25% weight) fire
      simultaneously during high-VIX periods, creating multiplicative defensive positioning that causes 40-60% larger position
      reductions than any single trigger would justify, leading to under-exposure followed by failure to re-enter quickly
    derived_from_bd_id: BD-079
  - id: finance-C-188
    when: When implementing crossover detection for SMA-based technical signals
    action: Use pandas shift(1) to compare current vs prior bar states for SMA50/SMA200 crossover detection — detect golden
      cross (bullish) and death cross (bearish) by comparing previous and current bar states
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without shift(1), current bar would incorrectly signal crossovers that haven't occurred yet, causing repainting
      issues where signals appear and disappear as price moves within the same bar
    derived_from_bd_id: BD-005
  - id: finance-C-189
    when: When implementing ensemble prediction for multi-model strategies
    action: Use LogisticRegression meta-learner stacking on base model probabilities; pass base model predictions as features
      to a second-level model that learns optimal weighting; do NOT use simple averaging, which treats all models equally
    severity: high
    kind: domain_rule
    modality: must
    consequence: Simple averaging gives equal weight to all models regardless of their current predictive power, causing suboptimal
      predictions during market regime changes when some models outperform others
    derived_from_bd_id: BD-014
  - id: finance-C-190
    when: When processing monetary values in backtesting calculations
    action: Use Python float type for currency calculations — float introduces rounding errors due to binary floating-point
      representation
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Float rounding errors accumulate over many transactions, causing P&L discrepancies that may appear as profits
      or losses not present in actual trading
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-191
    when: When implementing monetary calculations in the backtesting engine
    action: Use Python Decimal type for all currency calculations; import it with `from decimal import Decimal`; initialize
      from a string, e.g. Decimal('X.XX'), to avoid float conversion; apply Decimal throughout trade cost, P&L, and portfolio
      value calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without Decimal, float-based currency calculations cause silent rounding errors that accumulate in high-frequency
      or long-running backtests, leading to incorrect strategy performance metrics
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-192
    when: When implementing database connection management in Streamlit multi-page applications
    action: Use singleton pattern for DuckDB connection — implement __new__ method to return single shared instance; do NOT
      create new connection instances per page or per query
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multiple DuckDB connection instances cause 'database is locked' errors when Streamlit pages access data simultaneously,
      breaking multi-page app functionality
    derived_from_bd_id: BD-001
  - id: finance-C-193
    when: When implementing trend identification logic in the financial analysis module
    action: Use SMA (Simple Moving Average) with 20/50/200-period boundaries for trend identification; do not substitute with
      WMA or EMA without re-evaluating signal stability
    severity: high
    kind: domain_rule
    modality: must
    consequence: Switching from SMA to WMA introduces higher sensitivity to outliers and unstable historical baselines, causing
      trend signals to flip incorrectly during volatile periods and generating false trading signals
    derived_from_bd_id: BD-026
  - id: finance-C-195
    when: When using the framework in environments without FRED API credentials
    action: Verify offline mode status before performing analysis; verify users can distinguish between live FRED data and
      fallback sample_FRED_data.csv to avoid treating sample data as current market information
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Running analysis on sample data without awareness produces misleading results; trading decisions based on
      outdated sample data will not reflect current market conditions
    derived_from_bd_id: BD-002
  - id: finance-C-196
    when: When configuring prediction horizon and rebalancing frequency for ML-based trading strategies
    action: Verify prediction horizon (5 trading days) aligns with rebalancing frequency (weekly); verify these parameters
      match the intended swing trading strategy and not a different timeframe
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Mismatched prediction horizon and rebalancing frequency causes the model to optimize for a different trading
      cycle, leading to signals that are irrelevant to actual weekly rebalancing decisions
    derived_from_bd_id: BD-013
  - id: finance-C-198
    when: When implementing credential storage and access control in production trading systems
    action: Do not rely on file encryption (Fernet) and file permissions (0o600) as sole defense against privilege escalation
      attacks — encryption keys must never be accessible to processes running under the same user account that owns encrypted
      credentials; use Hardware Security Module (HSM) or cloud KMS integration where encryption key never touches application
      memory
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: If an attacker gains owner account access via phishing, password reuse, or insider threat, encryption keys
      become immediately accessible and all credentials can be decrypted, compromising production trading credentials and
      enabling unauthorized market access
    derived_from_bd_id: BD-084
  - id: finance-C-199
    when: When classifying market volatility regimes using VIX thresholds for strategy risk adjustment
    action: Verify VIX regime thresholds (Low<15, Normal<20, Elevated<30, Crisis>=30) against current market structure; these
      thresholds were established during pre-2008 markets and post-financial crisis 'Normal' range has shifted upward with
      structurally higher VIX levels
    severity: medium
    kind: domain_rule
    modality: should
    consequence: Pre-2008 VIX thresholds cause over-sensitive Crisis signals in post-2008 markets with structurally elevated
      volatility, leading strategies to rotate away from risk assets prematurely and systematically underperform during extended
      periods of elevated but non-crisis volatility
    derived_from_bd_id: BD-071
  - id: finance-C-200
    when: When deploying the EnsembleModel for prediction tasks
    action: Monitor base learner prediction correlation; if XGBoost and LightGBM base predictions converge to correlation
      >0.95, the stacking architecture provides minimal benefit over a single model and base learners must be diversified
      with additional model types
    severity: high
    kind: domain_rule
    modality: must
    consequence: When base learners produce highly correlated predictions (>0.95), the 2-level stacking architecture provides
      no ensemble benefit, effectively acting as a single model with unnecessary computational overhead, causing degraded
      prediction accuracy compared to a properly diversified ensemble
    derived_from_bd_id: BD-065
  - id: finance-C-201
    when: When calculating recession probability for macroeconomic regime-aware strategy selection
    action: Verify yield curve (10Y-2Y spread) receives dominant weight of at least 40% in recession probability calculation;
      do not refactor to equal weights or alternative dominant indicators without re-validation against historical recession
      data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Changing yield curve dominance in recession probability calculation alters recession signal timing and accuracy,
      causing recession-aware strategies to incorrectly rotate between risk assets and defensive positions, leading to significant
      performance degradation during economic transitions
    derived_from_bd_id: BD-044
  - id: finance-C-202
    when: When implementing cross-validation logic for time series forecasting
    action: Use expanding or rolling window splits that preserve temporal ordering; verify training data chronologically precedes
      validation data in every fold
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using random train/test splits or k-fold cross-validation on time series data introduces look-ahead bias,
      causing backtest results to appear significantly better than live performance
    derived_from_bd_id: BD-050
  - id: finance-C-203
    when: When configuring the prediction horizon parameter for the ML model
    action: Set prediction_horizon to 5 (days) for consistency with backtesting; verify the horizon matches the strategy's
      signal-to-noise optimization for medium-term directional prediction
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using a different prediction horizon than validated in backtesting causes backtest-live inconsistency; strategies
      optimized for 1-day horizon may have excessive transaction costs when applied with 5-day horizon
    derived_from_bd_id: BD-051
  - id: finance-C-204
    when: When configuring sector-based macro regime detection
    action: Verify the hardcoded sector classification (XLY/XLK=offensive, XLU/XLP=defensive, remainder=cyclical) matches
      the actual sector composition of the tradable universe; update OFFENSIVE_SECTORS and DEFENSIVE_SECTORS lists when universe
      changes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect sector classification causes wrong regime signals; misclassifying utilities as cyclical instead
      of defensive leads to incorrect risk-on/off detection and poor timing decisions
    derived_from_bd_id: BD-011
  - id: finance-C-205
    when: When validating ML model performance for live deployment
    action: Use magnitude-weighted targets, asymmetric loss functions, or supplementary PnL-based validation metrics (Sharpe
      ratio, actual returns) alongside AUC/accuracy; do not rely solely on binary classification metrics
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Walk-forward validation with binary targets produces misleading metrics; models that correctly predict tiny
      0.1% moves score equally with those predicting 10% moves, causing deployment of models that maximize accuracy but minimize
      profitability
    derived_from_bd_id: BD-082
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-083 / Database Snapshot Optimization
    version: v5.3
    intent_keywords:
    - backup
    - snapshot
    - parquet
    - database backup
    - compress data
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (3 distinct values, balanced distribution)
      groups:
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 9
        ucs:
        - uc_id: UC-101
          name: Database Snapshot Optimization
          short_description: 'Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into
            appropriate storage formats with ZSTD compression and '
          sample_triggers:
          - backup
          - snapshot
          - parquet
        - uc_id: UC-102
          name: Database Compaction and Optimization
          short_description: Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records
            within retention windows while measuring compression s
          sample_triggers:
          - vacuum
          - optimize
          - database cleanup
        - uc_id: UC-104
          name: Daily Economic Data Refresh
          short_description: Fetches all economic data from FRED and Yahoo Finance APIs daily and stores results in cache
            for dashboard consumption
          sample_triggers:
          - refresh data
          - daily update
          - FRED data
        - uc_id: UC-105
          name: Data Retention Policy Cleanup
          short_description: Archives data older than retention periods to Parquet files and deletes old records from main
            tables to reduce database size while maintaining histori
          sample_triggers:
          - data retention
          - cleanup old data
          - archive historical
        - uc_id: UC-108
          name: FRED Data File Organization
          short_description: Organizes FRED-related data files and scripts by moving them into a dedicated directory structure
          sample_triggers:
          - organize files
          - move FRED data
          - file management
        - uc_id: UC-109
          name: Offline Sample Data Generation
          short_description: Generates sample datasets for offline mode testing, including FRED, Yahoo Finance, and World
            Bank sample data
          sample_triggers:
          - generate sample data
          - offline mode
          - test data
        - uc_id: UC-110
          name: DuckDB Database Initialization
          short_description: Initializes the DuckDB database by creating all required tables and indexes for the Economic
            Dashboard
          sample_triggers:
          - init database
          - create tables
          - database setup
        - uc_id: UC-112
          name: Pickle Cache to DuckDB Migration
          short_description: Migrates existing pickle cache files containing FRED and Yahoo Finance data to the new DuckDB
            database format
          sample_triggers:
          - migrate pickle
          - convert cache
          - DuckDB migration
        - uc_id: UC-113
          name: Smart Data Refresh with SLA Awareness
          short_description: Intelligently refreshes economic data based on natural update frequencies and SLAs, respecting
            rate limits and only fetching data when needed
          sample_triggers:
          - smart refresh
          - SLA aware
          - rate limit
      - group_id: monitoring
        name: Monitoring
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-103
          name: API Key Management Verification
          short_description: Verifies the API key management feature implementation is working correctly by testing module
            imports, credential initialization, and key storage/retr
          sample_triggers:
          - verify API keys
          - test credentials
          - API setup verification
        - uc_id: UC-106
          name: API Key Management Quickstart
          short_description: Provides a quick start guide for initializing and testing API key management, storing and verifying
            FRED API keys securely
          sample_triggers:
          - setup API keys
          - quick start
          - initialize credentials
        - uc_id: UC-107
          name: Credentials Initialization
          short_description: Initializes and stores API credentials (FRED API key) securely in encrypted form for authenticated
            data access
          sample_triggers:
          - setup credentials
          - API key initialization
          - secure storage
      - group_id: research_analysis
        name: Research Analysis
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-111
          name: News and Sentiment Data Fetching
          short_description: Fetches news articles and sentiment data for specified stock symbols, including Google Trends
            data for sentiment analysis
          sample_triggers:
          - news sentiment
          - fetch news
          - sentiment analysis
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try database snapshot optimization
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try database compaction and optimization
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try API key management verification
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 13 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - API Key Management Verification
    - Database Compaction and Optimization
    - Database Snapshot Optimization
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds
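
Constraints finance-C-190 and finance-C-191 above ask for `Decimal` rather than `float` in monetary calculations. The following is a minimal sketch of what that could look like. The `trade_cost` helper, the string inputs, and rounding fees to the cent with `ROUND_HALF_UP` are assumptions made for illustration and do not come from the source; the 0.001 fee rate mirrors the `buy_cost` default listed above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumed rounding granularity for fees


def trade_cost(price: str, quantity: int, fee_rate: str) -> Decimal:
    """Return notional plus fee, built from strings so no float is involved."""
    notional = Decimal(price) * quantity
    fee = (notional * Decimal(fee_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    return notional + fee


# Binary floats drift even on a three-term sum; Decimal does not.
float_sum = 0.1 + 0.1 + 0.1
decimal_sum = sum([Decimal("0.1")] * 3, Decimal("0"))

total = trade_cost("10.10", 300, "0.001")
```

Passing prices as strings keeps the exact decimal digits; `Decimal(10.10)` would inherit the float's binary representation error.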

Easytrader Cn Broker

Skill

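Provides automated trading through A-share broker clients, wrapping login and trading operations for multiple brokers such as XueQiu (雪球) and 芸享, and covering core functions such as account balance queries, position management, order placement, and portfolio following.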

---
name: easytrader-cn-broker
description: |-
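  Provides automated trading through A-share broker clients, wrapping login and trading operations for multiple brokers such as XueQiu (雪球) and 芸享, and covering core functions such as account balance queries, position management, order placement, and portfolio following.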
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-094"
  compiled_at: "2026-04-22T13:00:40.820921+00:00"
  capability_markets: "cn-astock"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
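# A-Share Broker Automated Trading (easytrader-cn-broker)

> Provides automated trading through A-share broker clients, wrapping login and trading operations for multiple brokers such as XueQiu (雪球) and 芸享, and covering core functions such as account balance queries, position management, order placement, and portfolio following.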

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (4 total)

### Broker API Server for Trading Operations (`UC-101`)
Provides HTTP REST API endpoints for broker authentication and retrieving account balance information programmatically, enabling integration with exte
**Triggers**: server, api, http

### XueQiu Trader Account Preparation Validation Test (`UC-102`)
Unit test that validates XueQiuTrader correctly handles account preparation with required parameters (cookies) and properly stores portfolio configura
**Triggers**: xueqiu, trader, account preparation

### YunHui Client Trader Integration Tests (`UC-103`)
Integration tests for YunHui (yh_client) broker trading operations including balance queries, today's trades/entrusts, and entrust cancellation functi
**Triggers**: yh_client, balance, entrust

For all **4** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
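
Several of the locks in the table can be checked mechanically before any order is sent. The sketch below covers SL-01, SL-03, SL-05, and SL-06 under stated assumptions: signals and orders are plain dicts, the entity ID pattern is inferred from the `stock_sh_600000` example, and all helper names are hypothetical rather than part of ZVT or easytrader.

```python
import math
import re

# Assumed shape for SL-03: entity_type_exchange_code, e.g. stock_sh_600000.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id: str) -> bool:
    """SL-03: the ID has three underscore-separated parts."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def check_signal_sizing(signal: dict) -> bool:
    """SL-05: exactly one sizing field is set (not None)."""
    return sum(signal.get(name) is not None for name in SIZING_FIELDS) == 1


def action_for(filter_result) -> str:
    """SL-06: True=BUY, False=SELL, None/NaN=NO ACTION."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"
    return "BUY" if filter_result else "SELL"


def sells_first(orders: list) -> list:
    """SL-01: stable sort that moves sell orders ahead of buy orders."""
    return sorted(orders, key=lambda order: order["action"] != "sell")
```

Because `sorted` is stable, orders keep their relative order within the sell group and within the buy group.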

## Top Anti-Patterns (25 total)

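- **`AP-ZVT-183`**: An ex-rights adjustment factor of inf/NaN is used directly in multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API exceeds its rate limit, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) versus QFQ (forward-adjusted), causes factor calculations to drift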

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
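
As an illustration of AP-ZVT-183, a guard can reject a non-finite adjustment factor before it reaches the multiplication, so the failure is loud instead of silent. This is a minimal sketch; the function names are hypothetical and not taken from ZVT.

```python
import math


def is_valid_factor(factor: float) -> bool:
    """A usable adjustment factor must be a finite number (not inf, not NaN)."""
    return math.isfinite(factor)


def adjust_price(close: float, factor: float) -> float:
    """Apply the adjustment factor, raising instead of propagating inf/NaN."""
    if not is_valid_factor(factor):
        raise ValueError(f"invalid adjustment factor: {factor!r}")
    return close * factor
```

Raising here surfaces the bad row at load time; whether to drop, forward-fill, or refetch it is a separate decision left to the caller.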

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-094. Evidence verify ratio = 62.7% and audit fail total = 8. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-094` blueprint at 2026-04-22T13:00:40.820921+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - YunHui Client Trader Integration Tests
  - XueQiu Trader Account Preparation Validation Test
  - Broker API Server for Trading Operations
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

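### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object lets the first model's predictions override the rest <sub>(high)</sub>

When multiple models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same set of prediction signals. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous anti-pattern of the kind where the experiment appears to run but every conclusion is invalid.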

Source: https://github.com/microsoft/qlib/issues/1930

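### `AP-QLIB-2090`: Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is to let fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two settings are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.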

Source: https://github.com/microsoft/qlib/issues/2090
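
The coupling between the two ranges can be checked before any dataset is built. The following is a minimal sketch, assuming the configuration is available as plain dictionaries with ISO date strings; the helper name `fit_window_leaks` and the example dates are hypothetical and are not part of Qlib.

```python
from datetime import date


def fit_window_leaks(handler_kwargs: dict, segments: dict) -> bool:
    """Return True when the normalizer's fit window ends after the train segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    return fit_end > train_end


# Hypothetical configuration values, for illustration only.
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}
safe = {"fit_start_time": "2008-01-01", "fit_end_time": "2014-12-31"}
leaky = {"fit_start_time": "2008-01-01", "fit_end_time": "2016-12-31"}

print(fit_window_leaks(safe, segments))   # False
print(fit_window_leaks(leaky, segments))  # True
```

Running a check like this at configuration-load time turns a silent look-ahead problem into an explicit failure.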

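### `AP-QLIB-2036`: MACD factor formula error in the docs, where DEA is divided by CLOSE one extra time and ends up with inconsistent units <sub>(high)</sub>

The Alpha formula example in the official Qlib documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE. DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. After cross-sectional standardization, a MACD factor built from this formula differs markedly from the correct one and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.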

Source: https://github.com/microsoft/qlib/issues/2036

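### `AP-QLIB-2184`: Custom A-share data imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>

By Qlib convention, the open/close/high/low/volume/factor fields should all be NaN on suspension days so that the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce false signals during the suspension (the price does not change but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.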

Source: https://github.com/microsoft/qlib/issues/2184

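### `AP-QLIB-1892`: The PIT (Point-In-Time) financial data collector depends on an external stock-list API and fails to retrieve the full A-share universe <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to obtain the Shanghai/Shenzhen stock list. The function depends on the Eastmoney API, which often returns only part of the list rather than all 5000+ stocks, and the function raises ValueError outright when the list is incomplete. A user who follows the documented steps ends up with a financial dataset that covers only some stocks, and backtests based on PIT financial factors carry severe survivorship bias (stocks that were never collected are implicitly excluded).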

Source: https://github.com/microsoft/qlib/issues/1892

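### `AP-QLIB-2097`: Whole-market instrument="all" runs out of memory on a 32 GB machine while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib reads the whole feature matrix for the chosen universe into memory at once. Memory use with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. A 32 GB machine running the whole market hits OOM in the init_instance_by_config stage, and the error message does not point to memory. Users easily mistake it for a configuration error, when what is needed is batched loading or streaming feature computation.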

Source: https://github.com/microsoft/qlib/issues/2097

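### `AP-QLIB-1984`: LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label data, but the ndim of a Series taken from the DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.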

Source: https://github.com/microsoft/qlib/issues/1984

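### `AP-QLIB-1915`: After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors trigger a Length mismatch during align/reindex, whereas D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.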

Source: https://github.com/microsoft/qlib/issues/1915

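### `AP-QLIB-1949`: Multiprocessing backend on Colab/Linux conflicts with Qlib's ParallelExt, leaving DataHandler unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel via joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from a D.features call, and the stack trace does not let users tell whether the problem lies in the parallel backend or in the data.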

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

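### `AP-VNPY-3691`: First bar from the bar generator has a misaligned timestamp, causing a wrong signal in the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with the minute of the current tick rather than the end of a complete N-minute window. Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If a strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass, yet it holds less than one full window of data and yields a wrong entry signal when used for signal computation.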

Source: https://github.com/vnpy/vnpy/issues/3691
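
One way to guard against a partial first window is to count the one-minute bars that fall into each N-minute window and only trust windows that are full. The sketch below uses only the standard library and does not call vnpy; `window_start` and `complete_windows` are hypothetical helper names, and the flooring assumes N divides 60.

```python
from datetime import datetime, timedelta


def window_start(ts: datetime, n: int) -> datetime:
    """Floor a timestamp to the start of its n-minute window (n must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % n, second=0, microsecond=0)


def complete_windows(minute_bars: list, n: int) -> list:
    """Return the start times of windows that received all n one-minute bars."""
    counts = {}
    for ts in minute_bars:
        start = window_start(ts, n)
        counts[start] = counts.get(start, 0) + 1
    return [start for start, seen in sorted(counts.items()) if seen == n]


# The feed starts mid-window at 09:59, so the 09:55 window holds a single bar.
first = datetime(2024, 1, 2, 9, 59)
bars = [first + timedelta(minutes=i) for i in range(6)]  # 09:59 .. 10:04

print(complete_windows(bars, 5))  # only the window starting at 10:00 is complete
```

A strategy could apply the same count inside its bar callback and skip signal computation until the first full window arrives.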

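### `AP-VNPY-3669`: Alpha module raises SchemaError on incremental history saves because the old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When the vnpy Alpha module saves bar data to a Parquet file, it calls polars.concat directly on the newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns). polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that field types differ across data sources and versions (for example, volume is an integer in some feeds and a float in others) and there is no schema-alignment step before concat. This affects every historical-data build flow that uses vnpy alpha for backtesting.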

Source: https://github.com/vnpy/vnpy/issues/3669

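### `AP-VNPY-3685`: Spread trading run_backtesting() fails silently in Jupyter, making results untrustworthy <sub>(high)</sub>

In vnpy 4.10, run_backtesting() of the SpreadTrading module hits an event-loop conflict in Jupyter (asyncio already running). Part of the backtest engine logic does not execute, no exception is raised, and the returned backtest statistics look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop, making this a hidden trap where the backtest result looks correct but is actually incomplete.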

Source: https://github.com/vnpy/vnpy/issues/3685

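### `AP-VNPY-3700`: Install script does not use a venv, so the global numpy gets downgraded and breaks other dependencies <sub>(medium)</sub>

vnpy's install.bat installs directly into the system or conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch releases). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where several tools coexist, the vnpy install script is a common source of global environment pollution.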

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

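### `AP-ZIPLINE-138`: Backtest prices are unadjusted, and tutorial charts mislead users about strategy returns <sub>(high)</sub>

The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices on the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: the price jump around an ex-rights date forms a huge "signal" in unadjusted data and leads technical indicators to produce false breakout signals on that date.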

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

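### `AP-ZIPLINE-235`: Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>

Zipline's default slippage model fills at the close of the same bar that triggered the signal (current bar close fill). In live trading, a signal can only fill near the open of the next bar (T+1 order execution). For A-share daily bars, backtesting with close-price fills overstates daily returns by about 0.1-0.3% on average compared with next-day-open fills, and the annualized gap can exceed 30%. Configure the slippage model explicitly as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.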

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
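
The difference between same-bar and next-bar execution comes down to a one-bar shift between the signal and the fill. Here is a minimal, framework-independent sketch; `next_bar_fills` is a hypothetical helper and the prices are made up for illustration.

```python
def next_bar_fills(signals: list, opens: list) -> list:
    """Pair a signal on bar t with the open of bar t+1.

    A signal on the final bar has no following bar, so it produces no fill.
    """
    fills = []
    for t, fired in enumerate(signals[:-1]):
        if fired:
            fills.append((t + 1, opens[t + 1]))
    return fills


signals = [False, True, False, True]   # computed after each bar's close
opens = [10.0, 10.2, 10.5, 10.4]       # open price of each bar

print(next_bar_fills(signals, opens))  # [(2, 10.5)]
```

The signal on bar 1 fills at the open of bar 2, and the signal on the last bar is left pending rather than filled at a price that was not yet tradable.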

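### `AP-ZIPLINE-190`: Setting start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if start_session falls on a non-trading day (e.g. New Year's Day, 1998-01-01), calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing the value to the first trading day. In the A-share case, with an SSE/SZSE calendar, a start_date that falls on a holiday (for example the day after the last session before Spring Festival) triggers the same kind of error and is very costly to debug.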

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

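### `AP-ZIPLINE-181`: Stale asset db makes Pipeline report "no assets traded", sending users to check the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records the start/end trading dates of each stock. With an old Quandl or custom bundle that has not been re-ingested, a backtest over a newer date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if only the price data is updated after a re-download and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes them, producing survivorship bias.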

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

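### `AP-ZIPLINE-285`: week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last session of a week. Under non-US calendars (such as ASX or SSE), that logic is incompatible with the offset calculation used for the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the logs give users no sign that the schedule did not fire.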

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

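### `AP-ZIPLINE-240`: Backtest dates must be in UTC, and a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When the user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), it does not fail at the entry point but triggers AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data in local CST time hit this trap easily and need an explicit tz_localize('UTC') when registering the bundle.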

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
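
Normalizing timestamps at the entry point avoids the deep failure. The sketch below uses only the standard library rather than pandas, and it assumes that naive intraday timestamps were recorded in China Standard Time (a fixed UTC+8 offset); `to_utc` is a hypothetical helper. Whether a naive value should be converted from local time or simply labeled as UTC depends on how the data was produced.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time, fixed UTC+8 (assumption)


def to_utc(ts: datetime, assume_tz: timezone = CST) -> datetime:
    """Return ts as a UTC-aware datetime; naive inputs are read in assume_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


close = datetime(2020, 1, 2, 15, 0)  # naive 15:00, the Shanghai close
print(to_utc(close))                 # 2020-01-02 07:00:00+00:00
```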

## zvt (6)

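### `AP-ZVT-183`: An inf/NaN ex-rights factor enters multiplication directly, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (first day of a new listing, or missing data) the factor is inf; when kdata.open itself is None (an unfilled suspension day) the multiplication raises TypeError. As a result, the adjustment calculation for the whole entity is interrupted and all of its subsequent bars are lost, but the main flow only logs an ERROR and carries on, so users often do not know that data for many stocks is corrupted.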

Source: https://github.com/zvtvz/zvt/issues/183
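
A defensive version of the ratio and the multiplication can make the failure explicit per row. This is a minimal sketch, not ZVT's implementation; `safe_adjust_factor` and `adjust` are hypothetical helpers that return None instead of propagating inf, NaN, or a TypeError.

```python
import math
from typing import Optional


def safe_adjust_factor(new_price: Optional[float],
                       old_price: Optional[float]) -> Optional[float]:
    """Return new/old, or None when the ratio is undefined or not finite."""
    if new_price is None or old_price is None or old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def adjust(price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Apply the factor only when both operands are usable."""
    if price is None or factor is None:
        return None
    return price * factor


print(safe_adjust_factor(9.0, 10.0))  # 0.9
print(safe_adjust_factor(10.0, 0.0))  # None
print(adjust(None, 0.9))              # None
```

Rows that come back as None can then be counted and reported, rather than aborting the whole entity.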

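### `AP-ZVT-179`: Exceptions swallowed after a third-party data API quota is exceeded, leaving data silently missing <sub>(high)</sub>

When ZVT bulk-fetches whole-market bars (4000+ stocks) through JoinQuant's jqdatasdk, it hits JoinQuant's daily maximum query quota (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the quota is hit the day's data is silently missing for every remaining stock. A backtest that uses this incomplete database gets systematically biased factor values, with no warning.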

Source: https://github.com/zvtvz/zvt/issues/179

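### `AP-ZVT-161`: Whole-market batch factor computation on SQLite triggers too many SQL variables <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once exceeds SQLite's limit SQLITE_MAX_VARIABLE_NUMBER=999 (the default before SQLite 3.32.0). Raising max_allowed_packet (a MySQL parameter) has no effect, because the root cause is SQLite's variable limit. The correct fix is to query in batches, which early ZVT versions did not handle.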

Source: https://github.com/zvtvz/zvt/issues/161
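
Batching the IN clause keeps each statement under the bound-variable limit. The following sketch uses the standard-library sqlite3 module with an in-memory database; the table name `kdata`, its columns, and the helper names are hypothetical and do not reflect ZVT's schema.

```python
import sqlite3
from typing import Iterator, Sequence


def chunks(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_by_entity(conn: sqlite3.Connection, entity_ids: Sequence[str],
                    batch_size: int = 500) -> list:
    """Query in batches so no statement binds more than batch_size variables."""
    rows = []
    for batch in chunks(entity_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(batch)).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sz_{i:06d}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 1.0) for e in ids])

print(len(fetch_by_entity(conn, ids)))  # 5000
```

A batch size of 500 stays below the older 999 limit with room for any extra bound parameters in the same statement.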

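### `AP-ZVT-129`: Wildcard import hides API version changes, and enums such as AdjustType disappear unexpectedly <sub>(medium)</sub>

The ZVT documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors an enum (for example, moving AdjustType into a submodule), the wildcard import no longer includes that symbol and an AttributeError results. Users assume an installation problem, when in fact it is a breaking API change between versions that was not flagged in the CHANGELOG, and the wildcard import obscures where the symbol came from. Import enum classes explicitly.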

Source: https://github.com/zvtvz/zvt/issues/129

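### `AP-ZVT-187`: Backtest engine does not stop early on an empty data-layer result, causing a cascade of None-related crashes <sub>(medium)</sub>

After load_data completes, ZVT's Trader finds the data empty but does not exit early; it passes the empty DataFrame to the selector computation, which sets off a chain of NoneType failures. The stack trace is deep and the root cause hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.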

Source: https://github.com/zvtvz/zvt/issues/187

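### `AP-ZVT-183B`: Mixing up the HFQ (backward-adjusted) and QFQ (forward-adjusted) bar tables makes factor values drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two tables while computing price-momentum or moving-average factors (e.g. unadjusted prices for the moving average and backward-adjusted prices for returns), factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.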

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-094--easytrader
**Scan date**: 2026-04-22
**Stats**: {'total_files': 5, 'total_classes': 28, 'total_functions': 0, 'total_stages': 5}

## Modules (5)

- [authentication_&_connection](components/authentication_-_connection.md): 5 classes
- [account_query](components/account_query.md): 6 classes
- [order_execution](components/order_execution.md): 6 classes
- [trade_following](components/trade_following.md): 6 classes
- [remote_service_layer](components/remote_service_layer.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 146
  fatal_constraints_count: 46
  non_fatal_constraints_count: 140
  use_cases_count: 4
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (71)

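- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share stocks follow a T+1 rule: shares bought on day T can be sold no earlier than T+1, while cash from a sale on day T can be reused for purchases the same day. A backtest framework that does not lock T+1 positions overstates turnover and win rate, and intraday reversal strategies suffer most.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: Main-board stocks in Shanghai and Shenzhen have a daily price limit of ±10% (±5% for ST/SST stocks). At a sealed limit-up there are no sellers, so buy orders cannot fill; at a sealed limit-down there are no buyers, so sell orders cannot fill. A backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: The STAR Market and ChiNext (after its August 2020 reform) have a ±20% daily limit on normal trading days; the Beijing Stock Exchange has ±30%; newly listed stocks have no price limit for their first 5 trading days. Applying a uniform ±10% filter to all stocks wrongly excludes or wrongly includes trades on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily limit and very poor liquidity, so fill assumptions for normal stocks must not be applied to them. Leaving historical ST stocks (which were eventually delisted) out of the backtest creates survivorship bias; including them without the ST price limit mis-simulates fills.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the price is set by the maximum-volume principle rather than continuous matching. Assuming an immediate full fill at the open or close understates real slippage risk, especially for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Trading suspensions: during a long A-share suspension (which could last months before 2018), the capital in the position is locked and cannot be rebalanced, an opportunity cost that backtests generally ignore. Filter suspension days before factor computation (volume == 0 or is_suspended == True) and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: Newly listed stocks have no price limit for their first 5 trading days (the first-day gain can exceed 300%) and lack a full history (moving-average, volatility, and turnover factors cannot be computed). Filter out stocks listed for fewer than N trading days (typically 60-252) before factor computation.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share program-trading rules (effective 7 July 2025): an account that submits or cancels >= 300 orders per second, or >= 20000 orders in a day, is classed as high-frequency trading and must report to the exchange. An AI-generated quant strategy whose frequency exceeds these thresholds cannot run compliantly, so flag this at strategy design time.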
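- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a bookkeeping adjustment, not a real loss. Choice of adjustment: unadjusted prices overstate strategy losses; forward adjustment embeds future dividend information in historical prices (look-ahead bias); backward adjustment accumulates from the first listing day and is the best choice for quantitative backtests.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A-share listed companies disclose financial reports with a statutory delay: annual reports by 30 April of the following year, semi-annual reports by 31 August, and quarterly reports by 30 April (Q1) and 31 October (Q3). When a backtest uses financial data, the actual disclosure date (announcement_date), not the end of the accounting period, must serve as the time the data becomes available; otherwise point-in-time look-ahead bias is introduced.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Stock dividends, capital-reserve conversions, and rights issues increase share capital after the ex-date. If the recorded holding stays unchanged while the price shrinks proportionally, a backtest system that does not adjust the position's share count in step shows a false loss or gain on the ex-date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Gap between block trades and auction trades: a block trade may execute at up to a 10% discount to the market price (main board), but that price does not affect the next day's opening auction. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal calculation of close price and volume.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly, the pool of lendable securities is limited (mostly large-cap blue chips, with small and mid caps extremely scarce), and securities-lending rates are far higher than margin-financing rates. A backtest that assumes any stock can be shorted produces an unexecutable strategy whose live results diverge sharply from the backtest.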
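- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30% with a warning line at 28%. When foreign ownership reaches 28%, SEHK suspends new buy orders for the stock until the ratio falls back to 26%. A strategy that is heavily weighted in stocks favored by foreign investors (consumer and pharmaceutical leaders) needs to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: The 5% disclosure rule: an investor whose holding exceeds 5% of a listed company's issued shares must report to the CSRC and the exchange and make an announcement within 3 days, and must not trade the stock during that period or within 2 days after the announcement. A quantitative stock-selection system that ignores this rule is forced to stop buying once a heavily weighted holding crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: The "double ten" rule for public mutual funds: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same fund manager together may hold no more than 10% of that company's issued shares. A quantitative portfolio deployed in a public fund must enforce these compliance caps as optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: all input data to an AI-assisted quant system must come from publicly disclosed information. Automated trading triggered through non-public channels (private data services, inside tips, advance knowledge of a restructuring) constitutes insider trading under Articles 80-83 of the Securities Law (《证券法》) and the Guidelines for the Determination of Insider Trading (《内幕交易行为认定指引》).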
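- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using the current A-share constituents (such as today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated over 2020-2024 (a record 41 per year), so this bias keeps growing. Point-in-time snapshots must be used.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index reconstitution effect: CSI 300, CSI 500, and similar indices are rebalanced every six months (June and December). Stocks being added usually rise markedly between the announcement date and the effective date (passive funds are forced to buy), and stocks being removed do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing windows.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, their holdings overlap heavily and a market shock triggers collective selling and a stampede. The A-share quant crisis of February 2024 is a typical case (the small-cap index fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share hedged quant strategies commonly use IF/IC/IM stock index futures, long or short, to hedge systematic risk. A-share index futures, however, have long traded at a discount (futures price < spot), and the annualized IC discount can reach 10-20%. A backtest that counts only price returns and ignores the futures discount or premium seriously overstates the net return of a hedged strategy.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor in A-shares points the opposite way from US stocks: the best performers of the past month are likely to reverse in the following month (a reversal effect rather than momentum). Sell-side research (Huatai Securities, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.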
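- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors: they tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange confirms that more than 90% of individual accounts show the effect. An AI-assisted tool should not indulge the "hold the loser until it breaks even" instinct; it should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: Retail investors dominate A-shares (individual accounts generate more than 80% of trading volume) and herding is pronounced: retail investors tend to follow the crowd, causing irrational price swings (such as the leveraged boom and bust of 2015). Quant strategies should avoid using indicators that may amplify herding signals, such as volume rankings or popularity rankings, as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: their average annual turnover exceeds 500%, and institutions significantly outperform them over the long run. High-turnover strategies often have lower net returns after trading costs. The AI should not encourage frequent trading and should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and 1-3 days after the holiday) and the turn-of-month effect (the first 1-5 trading days of a month outperform mid-month and month-end) have academic empirical support (Nanjing University of Finance and Economics, among others). A strategy should lower signal confidence in these special calendar windows or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small-cap and micro-cap stocks have average daily turnover of only a few million yuan, so large buy or sell orders cause severe price impact and a strategy's real capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to hundreds of millions in capital; add a cap on participation as a fraction of volume to the backtest.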
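- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share trading cost structure (after the August 2023 adjustment): stamp duty of 0.05% on sells only; commission of about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai market) of 0.001%; slippage/impact cost of 0.1%-0.5% per trade for small caps. The annualized return of a backtest that ignores costs is deceptive, especially for high-frequency and high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact cost is usually missing from backtests altogether, yet in live trading it can be the largest cost. Buying 1 million yuan of an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size rather than a linear relation, so estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders, directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): a shareholder holding 5% or more who sells through centralized auction must disclose the reduction plan 15 trading days in advance and may sell no more than 1% of total shares within 3 months. Selling pressure from lock-up expiries is a systematic risk factor specific to A-shares, and a backtest that ignores the lock-up expiry calendar understates the risk of holding the affected stocks.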
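- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the ordinary calendar: statutory holiday shifts create make-up workdays (working Saturdays), and there are ad hoc market closures (an emergency closure from 8 to 10 July 2015 during the market crash). Deriving A-share trading days from a generic weekday calendar produces errors; use an A-share trading calendar (such as exchange_calendars or the tushare trading-day API).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After a delisting, an A-share stock code may be reused by a new listing (very rare, but it happens). Using the bare code (e.g. '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or the listing date range, can confuse historical data with the current stock; take particular care in long-horizon backtests.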
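- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, information that only becomes known after t must not be used. The most common forms are: (1) computing the signal from the close and filling at the close on the same day; (2) stamping an indicator computed after the close of day T on that same bar; (3) using the day's high or low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Warm-up period handling: a rolling-window indicator is NaN for its first N bars, and those bars should not take part in signal computation or position decisions. Require the indicator's warmup_period to equal the longest lookback, and keep positions at zero during warm-up.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order, TRAIN < VALID < TEST; random k-fold splits must not be used (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (e.g. a momentum strategy that takes profit at the high) is plain look-ahead, because the intraday high and low are only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas operations such as rolling and shift easily introduce subtle one-period offsets. Document the observation time of each series explicitly in code and verify key alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials rises with the number of backtests even when no configuration has true skill. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.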
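- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use point-in-time snapshots.
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample, and out-of-sample data is for final validation only. Do not keep tuning after repeatedly "peeking" at the out-of-sample data (that turns it into new in-sample data and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspensions and missing data: suspension-day prices must not simply be forward-filled with the previous close, because zero-volume days then take part in factor computation and signal generation. Filter missing trading days explicitly in the factor computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Fed into factor computation uncleaned, they produce extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.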
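- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not acceptable. Performance metrics of a backtest that ignores costs are deceptive, most of all for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and a high-frequency strategy then loses heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; for large orders use a volume-participation model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to the cost assumption, and each 10 bps change in cost can flip the strategy's profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect the capital constraint: the backtest should simulate the actual (integer) share count held under a fixed amount of capital rather than assume fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, so simulate the rounding effect.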
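- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone consistency: when merging multiple data sources, mixing UTC with local time is a common cause of data corruption. Convert all timestamps to one timezone at the pipeline entry (UTC for storage and the market's local timezone for display are recommended), and do not mix timezones partway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge against an explicit trading calendar; do not outer-join and then fillna, which creates fake rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: when updating historical data incrementally, query the database for the latest stored date and download only data after it. Re-downloading and appending existing data creates rows with duplicate timestamps and corrupts the backtest time series. Verify before and after the update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Use a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that is not directly investable (such as the HS300 index). A price index excludes reinvested dividends and so understates the benchmark return.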
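- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative return series. A drawdown read off cumulative log returns misstates its depth, because a log return is more negative than the corresponding simple return in a decline.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Systems use different defaults, so confirm the annualization factor before comparing across systems; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, whereas return distributions in A-shares and crypto markets are markedly left-skewed with fat tails, so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot distinguish a good strategy from one that happened to have the right beta in a favorable market.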
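- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 counts as preliminary evidence of predictive power, and ICIR > 0.5 counts as stable. Reporting backtest performance without computing IC is a typical case of missing evidence for factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor with high 1-day IC that decays quickly by 20 days is a short-term factor and does not suit a monthly rebalancing strategy, and vice versa. Using the wrong holding period severely damages live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has reported 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires a t-stat > 3.0 (rather than the traditional 1.96), or independent replication in a different period or market, or a clear economic rationale. A factor that meets none of these is very likely a data-mining artifact.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover can have negative net IC after trading costs. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover <= 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: A factor's half-life is a core parameter of its signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.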
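- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry-rotation returns, and it becomes hard to tell whether the factor itself or industry exposure drove the return. The operation is factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection leans systematically toward small caps, and the return comes from size exposure rather than the factor. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values that distort quantile analysis (e.g. Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Apply Gram-Schmidt orthogonalization or PCA before combining to remove multicollinearity among the factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaNs in factor computation (suspensions, new listings, data gaps) with a mean computed over the full sample introduces look-ahead bias (that mean contains future information); dropping the rows entirely creates survivorship bias. The correct approach is to fill with the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.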
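- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: evaluate a factor primarily by the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups, not by the long-only return. The monotonicity test of long Q1, short Q5 is the core evidence of factor validity: monotonically increasing or decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one particular market state is not suitable for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate large, pointless trading costs. Set a buffer zone when selecting stocks, so that a position changes only when the rank moves by more than a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of quantile returns (bootstrap test): a factor's quantile return spread (Q1-Q5), even when large on historical data, may be due to chance, so test its significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.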
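- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability check: a factor that works in one market does not necessarily work in another. Applying a US equity factor directly to A-shares, or an equity factor to futures or crypto, requires independent IC validation; cross-market generality cannot be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US stocks.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually stop working as the market learns and arbitrages them away (McLean & Pontiff 2016 find that factor returns decay by 58% on average after publication). Re-evaluate factor IC regularly (quarterly or yearly), and replace or down-weight factors that have stopped working.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the rate cycle, the business cycle, and market sentiment significantly affect factor validity. The value factor (low P/B) works better when rates are rising; the momentum factor works better in trending markets and fails in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor performs best.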
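
As a worked example of the cost structure in `SHARED-CN-ASTOCK-COST-001`, the sketch below computes the explicit round-trip cost of a single position using the rates quoted in that constraint. It is an illustration under stated assumptions: the transfer fee is applied to both sides, slippage and market impact are left out, and the function names are hypothetical.

```python
STAMP_DUTY = 0.0005      # 0.05%, sell side only
COMMISSION = 0.0001      # about 0.01%, each side
MIN_COMMISSION = 5.0     # yuan per order
TRANSFER_FEE = 0.00001   # 0.001%, assumed to apply on each side


def order_cost(notional: float, is_sell: bool) -> float:
    """Explicit cost in yuan of one order; slippage and impact are excluded."""
    commission = max(notional * COMMISSION, MIN_COMMISSION)
    cost = commission + notional * TRANSFER_FEE
    if is_sell:
        cost += notional * STAMP_DUTY
    return cost


def round_trip_cost(notional: float) -> float:
    """Cost of buying and later selling the same notional amount."""
    return order_cost(notional, is_sell=False) + order_cost(notional, is_sell=True)


for notional in (100_000, 10_000):
    cost = round_trip_cost(notional)
    print(f"{notional}: {cost:.2f} yuan, {cost / notional * 1e4:.1f} bps")
```

For 100,000 yuan the round trip costs 72 yuan (7.2 bps); for 10,000 yuan the 5-yuan commission floor dominates and the cost rises to 15.2 yuan (15.2 bps), which is why small orders are proportionally more expensive.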

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
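
Several of these locks can be checked mechanically before a strategy runs. The following is a minimal sketch for `SL-03`, `SL-05`, and `SL-06`; the regular expression is an assumed reading of the `entity_type_exchange_code` format based on the IDs shown in this document, and the function names are hypothetical rather than part of ZVT.

```python
import re
from typing import Optional

# Assumed pattern: lowercase entity type, lowercase exchange, alphanumeric code.
ENTITY_ID = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")


def valid_entity_id(entity_id: str) -> bool:
    """SL-03: the ID must look like entity_type_exchange_code."""
    return ENTITY_ID.fullmatch(entity_id) is not None


def exactly_one_sizing(position_pct=None, order_money=None, order_amount=None) -> bool:
    """SL-05: exactly one of the three sizing fields may be set."""
    given = [v for v in (position_pct, order_money, order_amount) if v is not None]
    return len(given) == 1


def action(filter_result) -> Optional[str]:
    """SL-06: True means BUY, False means SELL, None/NaN means no action."""
    if filter_result is None or filter_result != filter_result:  # NaN != NaN
        return None
    return "BUY" if filter_result else "SELL"


print(valid_entity_id("stock_sh_600000"))                    # True
print(valid_entity_id("600000.SH"))                          # False
print(exactly_one_sizing(position_pct=0.2, order_money=1e4)) # False
print(action(float("nan")))                                  # None
```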

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **4**

## `KUC-101`
**Source**: `easytrader/server.py`

Provides HTTP REST API endpoints for broker authentication and retrieving account balance information programmatically, enabling integration with external trading systems.

## `KUC-102`
**Source**: `tests/test_xqtrader.py`

Unit test that validates XueQiuTrader correctly handles account preparation with required parameters (cookies) and properly stores portfolio configuration.

## `KUC-103`
**Source**: `tests/test_easytrader.py`

Integration tests for YunHui (yh_client) broker trading operations including balance queries, today's trades/entrusts, and entrust cancellation functionality.

## `KUC-104`
**Source**: `tests/test_xq_follower.py`

Unit tests for XueQiuFollower that verify transaction projection and sell amount adjustment logic for portfolio mirroring operations.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

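## `CW-BT-001`: Cerebro as a unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies on multiple data sources. zvt's StockTrader currently binds one set of factors per instance and lacks a unified layer for orchestrating multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.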

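## `CW-BT-002`: Pluggable Analyzers for performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would give users a risk-adjusted return report from a call like cerebro.addanalyzer(SharpeRatio).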

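## `CW-BT-003`: Sizers separate position sizing from signals
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what fraction to buy on each entry) into a separate Sizer that is fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.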

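## `CW-BT-004`: Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market fills and lack limit orders and combined-order simulation; for high-frequency or live-trading integration, a fuller set of order types would greatly improve backtest realism.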

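## `CW-BT-005`: Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (e.g. 1 min) into a higher timeframe (e.g. 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording bars at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests on any combination of time granularities without recording the data again.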

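## `CW-VN-003`: Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy's equity curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, and there is no unified backtest report page; borrowing this pattern would allow packaging a ready-to-use strategy performance dashboard.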

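## `CW-VN-004`: vnpy.alpha ML factor research Lab
**From**: vnpy · **Applicable to**: factor-research

vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with a standardized training interface and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and no standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.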

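## `CW-QL-001`: Point-in-Time database (prevents future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and has no "publication date" dimension, so stock selection may be biased by future financial data; introducing the PIT pattern would greatly improve backtest credibility.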

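## `CW-QL-002`: Recorder + Experiment for experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of each training run and supports cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment and support quick reproduction and version comparison.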

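## `CW-QL-003`: Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling intraday optimal-execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and no execution-algorithm modeling; borrowing the nested framework would let zvt separate the question of which stocks to hold and when from the question of how to build the position at minimal impact cost.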

FILE:references/components/account_query.md
# account_query (6 classes)

## `ClientTrader.balance`
`account_query/clienttrader-balance.py:0`

## `ClientTrader.position`
`account_query/clienttrader-position.py:0`

## `ClientTrader.entrust`
`account_query/clienttrader-entrust.py:0`

## `XueQiuTrader.balance`
`account_query/xueqiutrader-balance.py:0`

## `refresh_strategy`
`account_query/refresh-strategy.py:0`

## `grid_strategy`
`account_query/grid-strategy.py:0`

FILE:references/components/authentication_-_connection.md
# authentication_&_connection (5 classes)

## `BaseLoginClientTrader.prepare`
`authentication_&_connection/baseloginclienttrader-prepare.py:0`

## `BaseLoginClientTrader.connect`
`authentication_&_connection/baseloginclienttrader-connect.py:0`

## `WebTrader.check_login`
`authentication_&_connection/webtrader-check-login.py:0`

## `MiniqmtTrader.connect`
`authentication_&_connection/miniqmttrader-connect.py:0`

## `login_implementation`
`authentication_&_connection/login-implementation.py:0`

FILE:references/components/order_execution.md
# order_execution (6 classes)

## `ClientTrader.buy`
`order_execution/clienttrader-buy.py:0`

## `ClientTrader.sell`
`order_execution/clienttrader-sell.py:0`

## `TradePopDialogHandler.handle`
`order_execution/tradepopdialoghandler-handle.py:0`

## `XueQiuTrader.rebalance`
`order_execution/xueqiutrader-rebalance.py:0`

## `entrust_prop`
`order_execution/entrust-prop.py:0`

## `adjust_sell`
`order_execution/adjust-sell.py:0`

FILE:references/components/remote_service_layer.md
# remote_service_layer (5 classes)

## `server.run`
`remote_service_layer/server-run.py:0`

## `RemoteClient.buy`
`remote_service_layer/remoteclient-buy.py:0`

## `RemoteClient.sell`
`remote_service_layer/remoteclient-sell.py:0`

## `RemoteClient.balance`
`remote_service_layer/remoteclient-balance.py:0`

## `ssl`
`remote_service_layer/ssl.py:0`

FILE:references/components/trade_following.md
# trade_following (6 classes)

## `BaseFollower.follow`
`trade_following/basefollower-follow.py:0`

## `JoinQuantFollower.login`
`trade_following/joinquantfollower-login.py:0`

## `RiceQuantFollower.login`
`trade_following/ricequantfollower-login.py:0`

## `XueQiuFollower.follow`
`trade_following/xueqiufollower-follow.py:0`

## `cmd_cache`
`trade_following/cmd-cache.py:0`

## `platform`
`trade_following/platform.py:0`

Eastmoney Api

Skill

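Gives VAlpha quant terminal users A-share market data retrieval with automatic multi-source switching and circuit-breaker protection, supports Tushare/Akshare chain fallback, and configures request rate limits automatically according to the account's points quota.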

---
name: eastmoney-api
description: |-
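  Gives VAlpha quant terminal users A-share market data retrieval with automatic multi-source switching and circuit-breaker protection, supports Tushare/Akshare chain fallback, and configures request rate limits automatically according to the account's points quota.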
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-084"
  compiled_at: "2026-04-22T13:00:34.071788+00:00"
  capability_markets: "cn-astock"
  capability_activities: "data-sourcing"
  sop_version: "crystal-compilation-v6.1"
---
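# Eastmoney API (eastmoney-api)

> Gives VAlpha quant terminal users A-share market data retrieval with automatic multi-source switching and circuit-breaker protection, supports Tushare/Akshare chain fallback, and configures request rate limits automatically according to the account's points quota.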
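
The fallback-with-circuit-breaker idea in the description can be sketched in a few lines. This is an illustration only and not the VAlpha implementation: `CircuitBreaker`, `fetch_with_fallback`, the source names, and the thresholds are all hypothetical, and the two fetch functions are stand-ins for real data-source calls.

```python
import time


class CircuitBreaker:
    """Open after max_failures consecutive failures; allow a retry after cooldown."""

    def __init__(self, max_failures: int = 3, cooldown: float = 60.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown:
            self.opened_at = None  # cooldown elapsed, allow a trial call
            self.failures = 0
            return True
        return False

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


def fetch_with_fallback(sources, breakers, symbol, clock=time.monotonic):
    """Try each (name, fetch) pair in order, skipping sources whose breaker is open."""
    for name, fetch in sources:
        breaker, now = breakers[name], clock()
        if not breaker.available(now):
            continue
        try:
            result = fetch(symbol)
        except Exception:
            breaker.record(False, now)
            continue
        breaker.record(True, now)
        return name, result
    raise RuntimeError("all data sources unavailable")


calls = {"primary": 0}

def primary(symbol):  # stand-in for a source that keeps failing
    calls["primary"] += 1
    raise ConnectionError("rate limited")

def backup(symbol):   # stand-in for a healthy fallback source
    return {"symbol": symbol, "close": 10.0}

sources = [("primary", primary), ("backup", backup)]
breakers = {name: CircuitBreaker(max_failures=2) for name, _ in sources}
for _ in range(4):
    name, row = fetch_with_fallback(sources, breakers, "600000")

print(name, calls["primary"])  # backup 2
```

After two consecutive failures the primary source's breaker opens, so the remaining requests go straight to the backup without touching the failing source until the cooldown expires.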

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (26 total)

### VAlpha Terminal Entry Point (`UC-101`)
Provides unified entry point for starting FastAPI server or running pre/post-market analysis
**Triggers**: start, server, run

### FastAPI Application Factory (`UC-102`)
Creates and configures FastAPI application instance with CORS, routers, and lifespan management
**Triggers**: application, fastapi, server

### Static File Serving and SPA Routing (`UC-103`)
Serves frontend static files and implements SPA catch-all routing for client-side navigation
**Triggers**: static, frontend, spa

For all **26** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
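
As an illustration of how two of these locks could be checked before any order is created, here is a minimal sketch. `check_entity_id` and `check_signal` are hypothetical helper names, and the regular expression is an assumption inferred from the example ID `stock_sh_600000`; none of this is taken from ZVT or the project source.

```python
import re

# Assumed pattern for SL-03 (entity_type_exchange_code), inferred from
# the example ID "stock_sh_600000"; the real rule may be stricter.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id: str) -> bool:
    """Return True if the ID looks like entity_type_exchange_code."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def check_signal(signal: dict) -> str:
    """Enforce SL-05: exactly one sizing field is set; return its name."""
    present = [f for f in SIZING_FIELDS if signal.get(f) is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {SIZING_FIELDS}, got {present}")
    return present[0]
```

Running both checks at the boundary where signals are built lets a violation halt the run early, which matches the "halt" behavior listed in the table.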

## Top Anti-Patterns (14 total)

- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-084. Evidence verify ratio = 36.8% and audit fail total = 26. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
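| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative definition (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |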

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-084` blueprint at 2026-04-22T13:00:34.071788+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Static File Serving and SPA Routing', 'FastAPI Application Factory', 'VAlpha Terminal Entry Point', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-070--edgartools (2)

### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>

Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.

### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>

SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.

## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>

Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.
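
One way to stay under a limit such as 10 requests per second is a token bucket. The sketch below is a generic illustration, not any listed project's limiter; `TokenBucket` and its injectable `clock` parameter are hypothetical names chosen so the logic can be exercised without real waiting.

```python
import time


class TokenBucket:
    """Allow about `rate` calls per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until a token is available, as of the last refill."""
        return max(0.0, (1.0 - self.tokens) / self.rate)
```

A caller would check `try_acquire()` before each request and sleep for `wait_time()` when it returns `False`, rather than retrying immediately.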

## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>

SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.

## finance-bp-079--akshare (4)

### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>

HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.

### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>

Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
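
A minimal sketch of guarding against both failure modes, using only the standard library. `parse_payload` and `EmptyPayloadError` are hypothetical names introduced for illustration.

```python
import json


class EmptyPayloadError(ValueError):
    """Raised when a response parses but carries no data."""


def parse_payload(text: str):
    """Parse a JSON response body, rejecting malformed and empty payloads."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc.msg}") from exc
    # null, {} and [] are valid JSON but not usable data
    if data is None or data == {} or data == []:
        raise EmptyPayloadError("empty JSON payload")
    return data
```

Raising a distinct error for the empty case lets callers decide whether to fall back to another source instead of silently building an empty DataFrame.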

### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>

Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.
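
The idea can be sketched as a single conversion function that starts from one canonical ID. The style names `prefixed`, `numeric`, and `suffixed` are hypothetical labels, and the exact format each real data source expects is an assumption that should be checked against that source's documentation.

```python
def split_entity_id(entity_id: str) -> tuple[str, str, str]:
    """Split 'stock_sh_600000' into ('stock', 'sh', '600000')."""
    parts = entity_id.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"bad entity id: {entity_id!r}")
    return parts[0], parts[1], parts[2]


def to_source_symbol(entity_id: str, style: str) -> str:
    """Format one canonical ID in a source-specific style (assumed formats)."""
    _, exchange, code = split_entity_id(entity_id)
    if exchange not in ("sh", "sz"):
        raise ValueError(f"unsupported exchange: {exchange!r}")
    if style == "prefixed":       # sh/sz prefix, e.g. sh600000
        return f"{exchange}{code}"
    if style == "numeric":        # bare numeric code, e.g. 600000
        return code
    if style == "suffixed":       # exchange suffix, e.g. 600000.SH
        return f"{code}.{exchange.upper()}"
    raise ValueError(f"unknown style: {style!r}")
```

Failing loudly on an unknown exchange or style is deliberate: a wrong symbol that returns empty or mismatched data is harder to detect than an exception.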

### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>

Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
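
A small check run before DataFrame construction can surface the mismatch with a clearer message. This is a sketch under assumptions: `rename_columns` is a hypothetical helper, and the field-to-name mapping shown in the usage line is illustrative rather than an authoritative list of API fields.

```python
def rename_columns(columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Map raw field names to readable ones, failing on any mismatch."""
    missing = [c for c in columns if c not in mapping]
    unused = [k for k in mapping if k not in columns]
    if missing or unused:
        raise ValueError(f"column mismatch: unmapped={missing}, unused={unused}")
    return [mapping[c] for c in columns]
```

For example, `rename_columns(["f2", "f3"], {"f2": "最新价", "f3": "涨跌幅"})` returns the two readable names, while a response that gains or loses a field raises before any data is stored.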

## finance-bp-103--ArcticDB (3)

### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>

ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.

### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>

Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.

### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>

Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.

## finance-bp-114--edgar-crawler (1)

### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>

8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
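
Choosing the scheme from the filing date can be sketched as below. `item_scheme` is a hypothetical helper, and treating a filing dated exactly 2004-08-23 as using the new scheme is an assumption, since the text does not say which side the boundary falls on.

```python
from datetime import date

# Cutover date stated in the text above.
NEW_SCHEME_START = date(2004, 8, 23)


def item_scheme(filing_date: date) -> str:
    """Return which 8-K item numbering applies to a filing date."""
    # Assumption: filings dated exactly on the cutover use the new scheme.
    return "new" if filing_date >= NEW_SCHEME_START else "legacy"
```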

## finance-bp-128--yfinance (2)

### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>

Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.

### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>

Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-084--eastmoney
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 37, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [data_collection](components/data_collection.md): 5 classes
- [factor_computation](components/factor_computation.md): 5 classes
- [recommendation_engine](components/recommendation_engine.md): 7 classes
- [analysis_&_reporting](components/analysis_-_reporting.md): 5 classes
- [portfolio_management](components/portfolio_management.md): 5 classes
- [scheduled_tasks](components/scheduled_tasks.md): 4 classes
- [llm_services](components/llm_services.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 229
  fatal_constraints_count: 29
  non_fatal_constraints_count: 254
  use_cases_count: 26
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (47)

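- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share stocks settle T+1: shares bought on day T can be sold no earlier than T+1, while proceeds from a sale on day T can be reused for purchases the same day. A backtest framework that does not enforce the T+1 position lock overstates turnover and win rate, and is especially damaging to the realism of intraday reversal strategies.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: Main-board stocks in Shanghai and Shenzhen have a daily price limit of ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up, buy orders generally cannot be filled; when it is locked at limit-down, sell orders generally cannot be filled. A backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: The STAR Market and ChiNext (after the August 2020 reform) have a ±20% daily price limit on normal trading days; the Beijing Stock Exchange has ±30%; newly listed stocks have no price limit for their first 5 trading days. A backtest that applies a uniform ±10% filter to all stocks wrongly excludes or wrongly includes trades on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily price limit and very poor liquidity; their fill assumptions must not be mixed with those for normal stocks. Leaving historical ST stocks (including those eventually delisted) out of the backtest introduces survivorship bias; including them without applying the ST price limit simulates fills incorrectly.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the trade price is set by the maximum-volume principle rather than by continuous matching. Assuming an immediate full fill at the open or close price underestimates real slippage risk, most visibly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Trading suspensions: during long A-share suspensions (which could last months before 2018), position capital is locked and cannot be rebalanced; this opportunity cost is commonly ignored in backtests. Filter out suspension days (volume == 0 or is_suspended == True) before factor computation, and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: Newly listed stocks have no price limit for their first 5 trading days (first-day gains can exceed 300%) and lack complete history (moving-average, volatility, and turnover factors cannot be computed). Filter out stocks listed for fewer than N trading days (typically 60-252) before factor computation.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share programmatic trading rules (effective July 7, 2025): a single account that submits or cancels ≥ 300 orders per second, or ≥ 20,000 orders per day, is classified as high-frequency trading and must report to the exchange. AI-generated quant strategies that exceed these frequencies cannot run compliantly; flag this at strategy design time.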
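- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a bookkeeping adjustment, not a real loss. Adjustment choices: unadjusted prices inflate strategy losses; forward-adjusted (qfq) prices embed future dividend information in historical prices (lookahead bias); backward-adjusted (hfq) prices accumulate from the listing date and are the best choice for quantitative backtests.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: Financial reports of A-share listed companies have statutory disclosure lags: annual reports by April 30 of the following year, semi-annual reports by August 31, and quarterly reports by April 30 (Q1) and October 31 (Q3). When financial data is used in a backtest, the actual disclosure date (announcement_date), not the accounting period end date, must be the point at which the data becomes available; otherwise point-in-time lookahead bias is introduced.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Bonus shares, capitalization issues, and rights issues increase share capital after the ex-rights date: the historical share count stays unchanged while the price shrinks proportionally. If the backtest system does not adjust position share counts in step, spurious losses or gains appear on the ex-rights date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Block trades versus auction trades: block trades can execute at up to a 10% discount to the market price (main board), but this price does not affect the next day's auction open. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal calculation of close price and volume.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly; the pool of lendable securities is limited (mainly large-cap blue chips, with small and mid caps extremely scarce); and securities-lending rates are far higher than margin-financing rates. A backtest that assumes any stock can be shorted produces unexecutable strategies whose live results diverge sharply from the backtest.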
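- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30%, with a warning line at 28%. When foreign ownership reaches 28%, the Stock Exchange of Hong Kong suspends new buy orders in that stock until the ratio falls back to 26%. Strategies heavily weighted in stocks favored by foreign investors (consumer and pharmaceutical leaders) need to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% disclosure rule: a single investor holding more than 5% of a listed company's issued shares must report to the CSRC and the exchange and make a public announcement within 3 days, and may not trade the stock during that period or within 2 days after the announcement. A quant stock-selection system that ignores this rule will be forced to stop buying once a heavily weighted position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: Mutual fund "double ten" rule: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same fund manager together may hold no more than 10% of a company's issued shares. A quant portfolio deployed in a mutual fund must enforce these compliance caps as optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: all input data for an AI-assisted quant system must come from publicly disclosed information. Automated trades triggered through non-public channels (private data services, inside information, advance knowledge of a restructuring) constitute insider trading, subject to Articles 80-83 of the Securities Law and the Guidelines for the Determination of Insider Trading (《内幕交易行为认定指引》).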
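- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using current A-share index constituents (such as the current CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated in 2020-2024 (a record 41 per year), so this bias keeps getting worse. Historical point-in-time snapshots must be used.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index rebalancing effect: the CSI 300, CSI 500, and similar indices rebalance every six months (June/December). Stocks being added usually rise noticeably between the announcement date and the effective date (forced buying by passive funds), and removed stocks do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing windows.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, holdings overlap heavily, and a market shock triggers collective selling and a stampede. The February 2024 A-share quant crisis is the typical case (the small-cap index fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share hedged quant strategies commonly take long/short positions in IF/IC/IM stock index futures to hedge systematic risk. However, A-share index futures trade at a persistent discount (futures price < spot), and the annualized IC discount can reach 10-20%. A backtest that considers only price returns and ignores the futures discount/premium seriously overstates the net return of hedged strategies.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor in A-shares points the opposite way from US equities: the best performers over the past month are likely to reverse in the following month (a reversal effect rather than momentum). Institutional research (Huatai Securities, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.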
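- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors: they tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange confirms that more than 90% of individual accounts show this effect. AI-assisted tools should not cater to the "hold losers until break-even" intuition; they should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are retail-dominated (individual accounts make up more than 80% of trading volume) and herding is pronounced: retail investors tend to follow the crowd, causing irrational price swings (such as the leveraged bull and bear market of 2015). Quant strategies should avoid using volume rankings, popularity rankings, or similar indicators that may reinforce herd signals as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: average annual retail turnover exceeds 500%, and institutions significantly outperform retail investors over the long run. High-turnover strategies often have lower net returns after trading costs. AI should not encourage frequent trading; it should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (a tendency to rise in the 5 days before and 1-3 days after the holiday) and the turn-of-month effect (trading days 1-5 of the month outperform mid-month and month-end) have academic empirical support (e.g. Nanjing University of Finance and Economics). Strategies should lower signal confidence in these special calendar windows, or separately evaluate the contribution of calendar-driven returns.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small-cap and micro-cap stocks have average daily turnover of only a few million yuan; large buys or sells cause severe price impact, so real strategy capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to hundreds of millions in capital; add a cap on participation as a share of traded volume to the backtest.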
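- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share trading cost structure (after the August 2023 adjustment): stamp duty of 0.05%, charged on sells only; commission of about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai market) of 0.001%; slippage/impact cost of 0.1%-0.5% per trade for small caps. Annualized returns from a backtest that ignores costs are deceptive, especially for high-frequency or high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact cost is usually missing entirely from backtests but may be the largest cost in live trading. A 1 million yuan buy in an A-share small cap may push the price up by more than 1%. Impact cost follows a power law in trade size rather than a linear relationship; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders and by directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): shareholders holding 5% or more who sell through centralized auction trading must disclose the reduction plan 15 trading days in advance and may sell no more than 1% of total shares within 3 months. Selling pressure from unlocked restricted shares is a systematic risk factor specific to A-shares; ignoring the lock-up expiry calendar in a backtest understates the holding risk of the affected stocks.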
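- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the natural calendar: there are make-up working days caused by statutory holiday adjustments (Saturdays worked), as well as ad hoc market halts (an emergency halt from July 8 to July 10, 2015 during the market crash). Inferring A-share trading days from a generic weekday calendar produces errors; use an A-share-specific trading calendar (such as exchange_calendars or tushare's trading-calendar interface).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After a delisting, an A-share stock code may be reused by a new stock (very rare, but it happens). Using the bare code (e.g. '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or the listing date range, can confuse historical data with the current stock; take particular care in long-horizon backtests.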
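- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting plus exponential-backoff retry: all external data API calls must implement rate-limit control and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern that increases server load and triggers IP bans. Use a maximum of 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must bound concurrency (max_workers); unbounded parallelism is not allowed. Free APIs (akshare, tushare free tier) are typically limited to 1-3 concurrent requests; paid APIs also have concurrency caps (tushare is points-based, with different point levels mapping to different concurrency). Exceeding the limit triggers 429 responses or IP bans. Control concurrency explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token / credential security: data-source API keys (the tushare token; akshare needs no token, but other commercial sources do) must not be hard-coded; read them from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should insert a minimum interval between requests (some akshare endpoints require ≥ 0.5 s; the tushare free tier allows 200 calls per minute). A plain sleep in code is less precise than a token-bucket algorithm; mature libraries such as ratelimit or slowapi are recommended.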
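- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Handling missing data on suspension days: a suspended stock has no trade data during the suspension, so date gaps appear in the database. Missing dates must not be forward-filled (this creates fake volume); instead, mark the rows with is_suspended=True, set volume and turnover to 0, and carry the previous day's close as the price. Factor computation must filter out rows with is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: History boundary for newly listed stocks: a new stock appears in the database from its first trading day and has no earlier history. If a factor's lookback period exceeds the number of days since listing, every factor value will be NaN. Record each stock's listing date (list_date) at collection time and start collection from that date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: pre-delisting history for delisted stocks can still be queried from mainstream sources (akshare/tushare), but there is no data after the delisting date. The historical universe must include delisted stocks (otherwise survivorship bias results), and collection must explicitly handle the cutoff at the delisting date.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data (such as the close price) obtained from different sources (akshare/tushare/baostock) may differ slightly (different adjustment methods, holiday handling, or ex-dividend adjustment timing). Implement multi-source reconciliation checks in the pipeline; when a difference exceeds a threshold (e.g. 0.1%), log an alert and confirm manually.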
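- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use a single data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; enforce conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Distinguishing trading time from natural time: the "date" of daily bar data usually corresponds to a trading day (day T), whereas the "time" of news and announcement data is natural time. When merging the two, map natural time to the next available trading day; otherwise a lookahead problem arises in which an announcement made on day T is treated as already available during day T's session.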
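- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data update scripts must be idempotent (running them multiple times yields the same result). If a script fails midway because of a network interruption, re-running it must not produce duplicate data or gaps. Implementation: write to a staging table first, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums / row-count checks): after each update, verify key fields: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A data pipeline without automated validation is the root cause of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: the output of a data pipeline should be version-managed. When a source revises historical data (such as restated financials), the old version should be kept traceable rather than silently overwritten, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that data coverage for every stock/asset is complete and consistent with the trading calendar. Each stock should have one row for every trading day (flagged as suspended, not missing). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: static or infrequently updated data that is read often (such as stock information, industry classification, and index constituents) should be cached locally to avoid repeated API calls on every run. Caches must have an expiry time (TTL) to avoid using stale industry classifications or outdated constituent lists.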
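
To make the fee arithmetic in `SHARED-CN-ASTOCK-COST-001` concrete, here is a minimal sketch that applies the rates quoted in that constraint. `trade_cost` is a hypothetical helper; it covers explicit fees only and leaves slippage and market impact out, so it is a lower bound rather than a full cost model.

```python
def trade_cost(amount: float, side: str, shanghai: bool = True) -> float:
    """Estimate explicit fees in yuan for one trade worth `amount` yuan."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    commission = max(amount * 0.0001, 5.0)                    # ~0.01%, 5 yuan minimum
    stamp_duty = amount * 0.0005 if side == "sell" else 0.0   # 0.05%, sells only
    transfer_fee = amount * 0.00001 if shanghai else 0.0      # 0.001%, Shanghai market
    return commission + stamp_duty + transfer_fee
```

Under these rates a 100,000 yuan round trip on a Shanghai stock costs about 11 yuan to buy and about 61 yuan to sell, and the 5 yuan commission floor dominates for small orders.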

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
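
For reference, the locked MACD parameters in `SL-08` can be sketched in plain Python as follows. This is not ZVT's implementation: seeding each EMA with the first observation is an assumption, and charting tools differ on how they scale the histogram.

```python
def ema(values: list[float], span: int) -> list[float]:
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out: list[float] = []
    for v in values:
        # Assumption: seed the average with the first observation.
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close: list[float], fast: int = 12, slow: int = 26, signal: int = 9):
    """Return (diff, signal_line, histogram) for a close-price series."""
    if (fast, slow, signal) != (12, 26, 9):
        raise ValueError("SL-08: MACD parameters are locked to 12/26/9")
    diff = [f - s for f, s in zip(ema(close, fast), ema(close, slow))]
    signal_line = ema(diff, signal)
    hist = [d - s for d, s in zip(diff, signal_line)]
    return diff, signal_line, hist
```

Rejecting any other parameter triple inside the function is one way to turn the lock into a hard failure instead of a convention.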

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **26**

## `KUC-101`
**Source**: `main.py`

Provides unified entry point for starting FastAPI server or running pre/post-market analysis.

## `KUC-102`
**Source**: `app/main.py`

Creates and configures FastAPI application instance with CORS, routers, and lifespan management.

## `KUC-103`
**Source**: `app/static.py`

Serves frontend static files and implements SPA catch-all routing for client-side navigation.

## `KUC-104`
**Source**: `app/routers/alerts.py`

Manages user notifications across portfolios including unread counts, marking as read, and dismissing alerts.

## `KUC-105`
**Source**: `app/routers/auth.py`

Handles user registration, login, and JWT token generation for secure API access.

## `KUC-106`
**Source**: `app/routers/stocks.py`

Manages user's stock watchlist with CRUD operations, real-time quotes, and financial data retrieval.

## `KUC-107`
**Source**: `app/routers/sentiment.py`

Analyzes market sentiment from news and generates AI-powered sentiment reports.

## `KUC-108`
**Source**: `app/routers/generate.py`

Generates pre-market and post-market investment reports for funds or each user's portfolios.

## `KUC-109`
**Source**: `app/routers/market.py`

Provides fund search functionality and real-time market data using Akshare and TuShare APIs.

## `KUC-110`
**Source**: `app/routers/health.py`

Provides basic health check endpoint for system monitoring and load balancer checks.

## `KUC-111`
**Source**: `app/routers/assistant.py`

Provides conversational AI assistant with RAG-enhanced responses for investment queries.

## `KUC-112`
**Source**: `app/routers/recommendations.py`

Generates AI investment recommendations using quantitative factor-based engine for stocks and funds.

## `KUC-113`
**Source**: `app/routers/preferences.py`

Manages user's investment preferences including risk level presets and portfolio settings.

## `KUC-114`
**Source**: `app/routers/widgets.py`

Provides pre-aggregated market data for dashboard widgets including northbound flow and sector performance.

## `KUC-115`
**Source**: `app/routers/dashboard.py`

Provides dashboard overview, system statistics, and customizable layout management.

## `KUC-116`
**Source**: `app/routers/details.py`

Retrieves detailed stock information including spot data, historical prices, and financial indicators.

## `KUC-117`
**Source**: `app/routers/admin.py`

Provides admin endpoints for system testing, LLM connection verification, and web search testing.

## `KUC-118`
**Source**: `app/routers/funds.py`

Manages investment funds with diagnosis, risk metrics, drawdown analysis, and comparison features.

## `KUC-119`
**Source**: `app/routers/commodities.py`

Analyzes gold and silver commodities with price trends and investment insights.

## `KUC-120`
**Source**: `app/routers/reports.py`

Manages generated reports including listing, viewing, and organizing pre/post-market analysis files.

## `KUC-121`
**Source**: `app/routers/settings.py`

Manages application settings including LLM provider configuration and API key management.

## `KUC-122`
**Source**: `app/routers/portfolios.py`

Manages portfolios with unified positions, transactions, DIP plans, AI rebalancing, stress testing, and correlation analysis.

## `KUC-123`
**Source**: `app/routers/compare.py`

Compares multiple stocks side-by-side with metrics including price, PE, PB, market cap, and turnover.

## `KUC-124`
**Source**: `app/routers/news.py`

Aggregates and personalizes financial news feed with categories and bookmarking functionality.

## `KUC-125`
**Source**: `test_scan.py`

Tests raw TuShare API data access for money flow and HSGT northbound data scanning.

## `KUC-126`
**Source**: `test_hsgt_min.py`

Tests high-frequency northbound capital flow minute-level data retrieval.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block. Separating retry logic for transient network errors (TimeoutError, ConnectionError) from permanent errors (ValueError, KeyError) prevents resource waste and avoids masking underlying bugs.
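
A minimal sketch of this pattern, assuming full jitter and a small retry budget. `call_with_retry` and `backoff_delays` are hypothetical names; a real client would also map HTTP 429 responses to an exception type listed in `TRANSIENT`.

```python
import random
import time

# Only these are retried; anything else is treated as permanent.
TRANSIENT = (TimeoutError, ConnectionError)


def backoff_delays(retries: int = 4, base: float = 1.0, cap: float = 60.0,
                   rng=random.random):
    """Yield capped exponential delays with full jitter."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(fn, retries: int = 4, base: float = 1.0, cap: float = 60.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(); retry transient errors only, re-raise everything else."""
    delays = backoff_delays(retries, base, cap, rng)
    while True:
        try:
            return fn()
        except TRANSIENT:
            delay = next(delays, None)
            if delay is None:       # retry budget exhausted
                raise
            sleep(delay)
```

Because `ValueError` and `KeyError` are not in `TRANSIENT`, they propagate on the first attempt, so a parsing bug fails fast instead of being hidden behind retries.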

## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
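
A sketch of strict validation and conversion using only the standard library. The helper names are hypothetical. A regular expression is used for the shape check because `datetime.strptime` with `%Y-%m-%d` also accepts non-padded input such as `2024-1-5`.

```python
import re
from datetime import date

ISO_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
COMPACT_RE = re.compile(r"(\d{4})(\d{2})(\d{2})")


def validate_iso(text: str) -> date:
    """Accept only zero-padded YYYY-MM-DD that is a real calendar date."""
    m = ISO_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not YYYY-MM-DD: {text!r}")
    # date() raises ValueError for impossible dates such as 2023-02-29
    return date(*map(int, m.groups()))


def compact_to_iso(text: str) -> str:
    """Convert YYYYMMDD to YYYY-MM-DD, validating the result."""
    m = COMPACT_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not YYYYMMDD: {text!r}")
    return validate_iso("-".join(m.groups())).isoformat()
```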

## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing

Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.

## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing

Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.

## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.

## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing

Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.

## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing

Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.

## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing

Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.
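
The gating step can be sketched as a plain filter applied before any scoring. The `Fundamentals` dataclass, the helper names, and all three threshold values are hypothetical and purely illustrative; the source does not state the project's actual cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Fundamentals:
    code: str
    roe: float              # return on equity, as a fraction
    ocf_to_profit: float    # operating cash flow / net profit
    debt_ratio: float       # total liabilities / total assets


# Illustrative thresholds only; not taken from the project.
MIN_ROE = 0.08
MIN_OCF_TO_PROFIT = 0.8
MAX_DEBT_RATIO = 0.7


def passes_quality_gate(f: Fundamentals) -> bool:
    """True only if every fundamental filter is satisfied."""
    return (f.roe >= MIN_ROE
            and f.ocf_to_profit >= MIN_OCF_TO_PROFIT
            and f.debt_ratio <= MAX_DEBT_RATIO)


def filter_candidates(candidates: list[Fundamentals]) -> list[Fundamentals]:
    """Drop candidates that fail the gate, preserving input order."""
    return [f for f in candidates if passes_quality_gate(f)]
```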

FILE:references/components/analysis_-_reporting.md
# analysis_&_reporting (5 classes)

## `PreMarketAnalyst.analyze`
`analysis_&_reporting/premarketanalyst-analyze.py:0`

## `PostMarketAnalyst.analyze`
`analysis_&_reporting/postmarketanalyst-analyze.py:0`

## `GoldSilverAnalyst.analyze`
`analysis_&_reporting/goldsilveranalyst-analyze.py:0`

## `BaseAnalyst._build_prompt`
`analysis_&_reporting/baseanalyst-build-prompt.py:0`

## `analysis_mode`
`analysis_&_reporting/analysis-mode.py:0`

FILE:references/components/data_collection.md
# data_collection (5 classes)

## `DataSourceManager.fetch_data`
`data_collection/datasourcemanager-fetch-data.py:0`

## `TuShareClient.get_daily_bars`
`data_collection/tushareclient-get-daily-bars.py:0`

## `RateLimiter.acquire`
`data_collection/ratelimiter-acquire.py:0`

## `CircuitBreaker.call`
`data_collection/circuitbreaker-call.py:0`

## `data_source_provider`
`data_collection/data-source-provider.py:0`
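
The component map lists only names, so the following is a generic sketch of how a circuit breaker and a source fallback chain could fit together, not the project's `CircuitBreaker` or `DataSourceManager`. `Breaker`, `fetch_with_fallback`, and the threshold and cooldown defaults are all hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the breaker is open."""


class Breaker:
    """Open after `threshold` consecutive failures; retry after `cooldown` s."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("source temporarily disabled")
            self.opened_at = None       # cooldown over: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0               # any success closes the breaker
        return result


def fetch_with_fallback(sources, symbol):
    """Try (breaker, fetch) pairs in order; return the first success."""
    errors = []
    for breaker, fetch in sources:
        try:
            return breaker.call(fetch, symbol)
        except Exception as exc:        # includes CircuitOpenError
            errors.append(exc)
    raise RuntimeError(f"all sources failed: {errors}")
```

While a breaker is open its source is skipped without a network call, which keeps a failing provider from slowing every request and from attracting further rate-limit penalties.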

FILE:references/components/factor_computation.md
# factor_computation (5 classes)

## `DailyFactorComputer.compute_all`
`factor_computation/dailyfactorcomputer-compute-all.py:0`

## `TechnicalFactors.compute`
`factor_computation/technicalfactors-compute.py:0`

## `RiskFactors.compute`
`factor_computation/riskfactors-compute.py:0`

## `FactorCache.get`
`factor_computation/factorcache-get.py:0`

## `factor_computation_schedule`
`factor_computation/factor-computation-schedule.py:0`

FILE:references/components/llm_services.md
# llm_services (6 classes)

## `BaseLLMClient.chat`
`llm_services/basellmclient-chat.py:0`

## `GoogleGeminiClient.chat`
`llm_services/googlegeminiclient-chat.py:0`

## `OpenAIClient.chat`
`llm_services/openaiclient-chat.py:0`

## `ToolExecutor.execute`
`llm_services/toolexecutor-execute.py:0`

## `AssistantService.chat`
`llm_services/assistantservice-chat.py:0`

## `llm_provider`
`llm_services/llm-provider.py:0`

FILE:references/components/portfolio_management.md
# portfolio_management (5 classes)

## `SignalGenerator.gen_signal`
`portfolio_management/signalgenerator-gen-signal.py:0`

## `RiskMetricsCalculator.calculate`
`portfolio_management/riskmetricscalculator-calculate.py:0`

## `CorrelationAnalyzer.analyze`
`portfolio_management/correlationanalyzer-analyze.py:0`

## `StressTestEngine.run`
`portfolio_management/stresstestengine-run.py:0`

## `signal_thresholds`
`portfolio_management/signal-thresholds.py:0`

FILE:references/components/recommendation_engine.md
# recommendation_engine (7 classes)

## `RecommendationEngine.get_recommendation`
`recommendation_engine/recommendationengine-get-recommendation.py:0`

## `StockRecommendationEngine.generate`
`recommendation_engine/stockrecommendationengine-generate.py:0`

## `FundRecommendationEngine.generate`
`recommendation_engine/fundrecommendationengine-generate.py:0`

## `ShortTermStrategy.compute_score`
`recommendation_engine/shorttermstrategy-compute-score.py:0`

## `AlphaStrategy.compute_score`
`recommendation_engine/alphastrategy-compute-score.py:0`

## `strategy_weights`
`recommendation_engine/strategy-weights.py:0`

## `min_score_threshold`
`recommendation_engine/min-score-threshold.py:0`

FILE:references/components/scheduled_tasks.md
# scheduled_tasks (4 classes)

## `SchedulerManager.add_factor_computation_job`
`scheduled_tasks/schedulermanager-add-factor-computation-.py:0`

## `TradingCalendar.is_trading_day`
`scheduled_tasks/tradingcalendar-is-trading-day.py:0`

## `SchedulerManager.snapshot_portfolios`
`scheduled_tasks/schedulermanager-snapshot-portfolios.py:0`

## `job_schedule`
`scheduled_tasks/job-schedule.py:0`


Darts Forecasting

Skill

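Darts is a lightweight time series forecasting library that supports deterministic and probabilistic forecasting of multi-market financial data, with covariate integration and hierarchical aggregation.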

---
name: darts-forecasting
description: |-
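  Darts is a lightweight time series forecasting library that supports deterministic and probabilistic forecasting of multi-market financial data, with covariate integration and hierarchical aggregation.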
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-102"
  compiled_at: "2026-04-22T13:00:47.497902+00:00"
  capability_markets: "multi-market"
  capability_activities: "time-series-ml"
  sop_version: "crystal-compilation-v6.1"
---
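# Darts Time Series Forecasting (darts-forecasting)

> Darts is a lightweight time series forecasting library that supports deterministic and probabilistic forecasting of multi-market financial data, with covariate integration and hierarchical aggregation.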

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (31 total)

### Sphinx Package Title Fixer (`UC-101`)
Automates extraction of descriptive titles and docstrings from Python packages to improve Sphinx API documentation readability
**Triggers**: sphinx documentation, package titles, docstring extraction

### Sphinx Documentation Configuration (`UC-102`)
Configures Sphinx documentation builder with extensions for auto-summary, autodoc, and graphviz visualization
**Triggers**: sphinx config, documentation, autodoc

### Example Utilities Module (`UC-131`)
Provides utility functions for managing Python paths when running Darts examples locally
**Triggers**: utilities, path management, example helpers

For all **31** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (15 total)

- **`AP-TIME-SERIES-ML-001`**: TimeSeries values array dimensionality mismatch
- **`AP-TIME-SERIES-ML-002`**: Non-floating-point dtype in TimeSeries values
- **`AP-TIME-SERIES-ML-003`**: Irregular or non-monotonic time index

All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-102. Evidence verify ratio = 43.8% and audit fail total = 26. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative record (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-102` blueprint at 2026-04-22T13:00:47.497902+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Darts Quickstart Tutorial
  - Sphinx Documentation Configuration
  - Sphinx Package Title Fixer
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **15**

## finance-bp-102--Darts (7)

### `AP-TIME-SERIES-ML-001` — TimeSeries values array dimensionality mismatch <sub>(high)</sub>

When constructing a TimeSeries with a values array that is not expanded to exactly 3 dimensions (time×component×sample), downstream model operations expecting the standard 3D shape will fail with dimension mismatches. This causes all downstream models to receive incorrectly formatted data tensors, leading to complete pipeline failure or silent data corruption.
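
The shape rule above can be illustrated with plain NumPy. This is a minimal sketch, not the library's own constructor: `to_3d` is a hypothetical helper that expands 1D or 2D input to the (time, component, sample) layout and rejects anything else.

```python
import numpy as np


def to_3d(values) -> np.ndarray:
    """Expand a 1D or 2D array to (time, component, sample); pass 3D through."""
    values = np.asarray(values)
    if values.ndim == 1:          # (time,) -> (time, 1, 1)
        return values[:, np.newaxis, np.newaxis]
    if values.ndim == 2:          # (time, component) -> (time, component, 1)
        return values[:, :, np.newaxis]
    if values.ndim == 3:          # already (time, component, sample)
        return values
    raise ValueError(f"expected 1 to 3 dimensions, got {values.ndim}")


prices = np.array([10.0, 10.5, 11.0])
print(to_3d(prices).shape)        # (3, 1, 1)
```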

### `AP-TIME-SERIES-ML-002` — Non-floating-point dtype in TimeSeries values <sub>(high)</sub>

When setting TimeSeries values dtype to integer or non-floating-point types, numerical operations produce incorrect results during financial calculations. Financial forecasts require float64 or float32 precision to handle decimal computations accurately; integer dtypes truncate precision and cause accumulation of rounding errors that compound across time steps.
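
A small guard can catch integer input before it reaches a model. The sketch below uses only NumPy; `ensure_float` is a hypothetical helper name, and the default of `float64` is an assumption for illustration.

```python
import numpy as np


def ensure_float(values, dtype=np.float64) -> np.ndarray:
    """Return values as a floating-point array, converting only when needed."""
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.floating):
        values = values.astype(dtype)     # e.g. integer closes -> float64
    return values


closes = np.array([100, 101, 103])        # integer dtype by default
print(ensure_float(closes).dtype)         # float64
```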

### `AP-TIME-SERIES-ML-003` — Irregular or non-monotonic time index <sub>(high)</sub>

When TimeSeries time index is not strictly monotonically increasing with a well-defined frequency and no gaps, downstream models produce incorrect forecasts due to temporal misalignment. Gap detection methods fail, and any temporal aggregation or differencing operations will produce meaningless results.
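
The index conditions named above (strictly increasing, constant step, no gaps) can be checked before building a series. This is a standard-library sketch under the assumption that timestamps arrive as a list of `datetime` objects; `check_time_index` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def check_time_index(timestamps: list[datetime]) -> timedelta:
    """Return the constant step of the index, or raise ValueError."""
    if len(timestamps) < 2:
        raise ValueError("need at least 2 timestamps to check spacing")
    steps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if any(step <= timedelta(0) for step in steps):
        raise ValueError("time index is not strictly increasing")
    if len(set(steps)) != 1:
        raise ValueError("time index has gaps or an irregular step")
    return steps[0]


start = datetime(2024, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]
print(check_time_index(daily))            # 1 day, 0:00:00
```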

### `AP-TIME-SERIES-ML-004` — Time index and values length mismatch at construction <sub>(high)</sub>

When the time index length does not equal the values array first dimension length, TimeSeries construction fails with ValueError at construction time, preventing any data from being loaded into the system. This typically occurs when importing data from CSV or DataFrame sources where column alignment assumptions are incorrect.

### `AP-TIME-SERIES-ML-005` — Missing abstract method implementations in ForecastingModel subclasses <sub>(high)</sub>

When implementing ForecastingModel subclasses without implementing all required abstract methods (fit, predict, min_train_samples, _target_window_lengths, extreme_lags, supports_multivariate, supports_transferable_series_prediction), Python's ABC abstractmethod enforcement causes TypeError at instantiation time, preventing any model from being created.

### `AP-TIME-SERIES-ML-006` — fit() method not returning self for chaining <sub>(medium)</sub>

When fit() method does not return self for method chaining, the fluent interface pattern expected by users breaks at lines 209, 2932, and 3069 where chaining is attempted. Users encounter AttributeError when trying to chain operations like model.fit(series).predict(n_periods).

### `AP-TIME-SERIES-ML-007` — Frequency inference failure with insufficient timesteps <sub>(medium)</sub>

When using fill_missing_dates with fewer than 3 time steps, frequency inference fails with ValueError because at least 3 consecutive timestamps are required to determine a unique constant frequency. Irregular time series cannot be gap-filled without this minimum data.
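
The three-timestamp minimum can be seen directly in pandas, used here as a stand-in: `pandas.infer_freq` raises `ValueError` for fewer than three dates. Whether `fill_missing_dates` relies on this exact call is an assumption of this sketch.

```python
import pandas as pd

two = pd.date_range("2024-01-01", periods=2, freq="D")
three = pd.date_range("2024-01-01", periods=3, freq="D")

try:
    pd.infer_freq(two)                    # too few points to infer a frequency
except ValueError as exc:
    print(f"cannot infer: {exc}")

print(pd.infer_freq(three))               # D
```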

## finance-bp-121--machine-learning-for-trading (8)

### `AP-TIME-SERIES-ML-008` — Look-ahead bias from random train/test splits <sub>(high)</sub>

When implementing cross-validation for financial time series using random K-fold or standard train_test_split without temporal ordering, future information leaks into training data. This look-ahead bias artificially inflates backtest performance metrics and leads to significant live trading losses when the model encounters truly unseen data.

### `AP-TIME-SERIES-ML-009` — Missing purge gap contaminating validation results <sub>(high)</sub>

When using a walk-forward split without an embargo gap between train and test periods, overlapping outcomes between training and test periods contaminate validation results. Without a purge gap, seemingly good backtest results do not generalize to live performance due to information leakage across the split boundary.
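
A time-ordered split with a gap before each test window addresses both this anti-pattern and the random-split one above it. The generator below is a minimal sketch, not the blueprint's implementation; the function name and its parameters are hypothetical, and it yields the most recent fold first.

```python
from typing import Iterator


def purged_walk_forward(
    n_samples: int, n_splits: int, test_size: int, purge: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges in time order with a purge gap between them."""
    for k in range(n_splits):
        test_end = n_samples - k * test_size
        test_start = test_end - test_size
        train_end = test_start - purge    # drop `purge` rows just before the test window
        if train_end <= 0:
            return
        yield range(0, train_end), range(test_start, test_end)


for train, test in purged_walk_forward(n_samples=100, n_splits=3, test_size=10, purge=5):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Every training index precedes every test index, and the `purge` rows in between belong to neither set.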

### `AP-TIME-SERIES-ML-010` — Hardcoded credentials in source code <sub>(high)</sub>

When scraping content from websites requiring authentication by hardcoding credentials in source code files, exposed credentials lead to unauthorized access, potential account termination, and security breaches. Credentials should be loaded from environment variables or secure configuration files, never committed to version control.
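
Reading credentials from the environment can be sketched with the standard library. The variable names (`SCRAPER_USERNAME`, `SCRAPER_PASSWORD`) and the helper are hypothetical; use whatever naming your deployment already has.

```python
import os


def load_credentials(prefix: str = "SCRAPER") -> tuple[str, str]:
    """Read username and password from environment variables; fail loudly if absent."""
    try:
        return os.environ[f"{prefix}_USERNAME"], os.environ[f"{prefix}_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

Set the two variables in the shell or a secrets manager before running the scraper, so nothing sensitive lands in version control.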

### `AP-TIME-SERIES-ML-011` — TA-Lib infinite values causing ML model failures <sub>(high)</sub>

When computing technical indicators using TA-Lib (RSI, MACD, ATR) without handling edge cases, division-by-zero produces infinite values that corrupt the feature DataFrame. Gradient-based ML models (neural networks) cannot process infinite values, causing training to fail or produce NaN gradients.

### `AP-TIME-SERIES-ML-012` — MultiIndex structure lost during feature engineering <sub>(high)</sub>

When flattening or renaming the (ticker, date) MultiIndex during feature engineering for multi-ticker trading, downstream stages (prediction_modeling, backtesting) fail because they expect MultiIndex for proper temporal train/test splits. Data corruption occurs silently when multi-ticker data is treated as single-ticker.

### `AP-TIME-SERIES-ML-013` — Missing TA-Lib C library dependency <sub>(high)</sub>

When installing TA-Lib via pip install ta-lib alone without compiling the underlying C library, import fails because the Python package is merely a wrapper around compiled native code. This causes immediate runtime failure for any code attempting to import talib for technical indicator computation.

### `AP-TIME-SERIES-ML-014` — Trading calendar minutes_per_day mismatch <sub>(high)</sub>

When configuring an extended-hours trading calendar with an incorrect minutes_per_day (e.g., using 390, the regular-session value, instead of the 960 minutes in a 16-hour extended session), minute bar alignment with the calendar fails. Backtest prices do not correspond to actual trading times, producing meaningless results that don't reflect real market microstructure.

### `AP-TIME-SERIES-ML-015` — Zipline bundle ingest function signature mismatch <sub>(high)</sub>

When implementing Zipline bundle ingest function with incorrect parameter count or order, Zipline fails with TypeError during bundle ingest because the ingestion pipeline expects exactly 9 parameters in a specific order. Backtesting cannot run at all when bundle ingestion fails, blocking all downstream work.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-102--Darts
**Scan date**: 2026-04-22
**Stats**: {'total_files': 15, 'total_classes': 77, 'total_functions': 0, 'total_stages': 15}

## Modules (15)

- [timeseries_data_representation](components/timeseries_data_representation.md): 5 classes
- [forecasting_model_base](components/forecasting_model_base.md): 5 classes
- [pytorch_deep_learning_forecasting](components/pytorch_deep_learning_forecasting.md): 7 classes
- [statistical_&_classical_forecasting](components/statistical_-_classical_forecasting.md): 5 classes
- [scikit-learn_regression_forecasting](components/scikit-learn_regression_forecasting.md): 5 classes
- [ensemble_forecasting](components/ensemble_forecasting.md): 4 classes
- [conformal_prediction](components/conformal_prediction.md): 4 classes
- [data_transformation_pipeline](components/data_transformation_pipeline.md): 6 classes
- [covariate_encoding](components/covariate_encoding.md): 5 classes
- [hierarchical_reconciliation](components/hierarchical_reconciliation.md): 5 classes
- [anomaly_detection](components/anomaly_detection.md): 7 classes
- [time_series_filtering](components/time_series_filtering.md): 4 classes
- [metrics_evaluation](components/metrics_evaluation.md): 6 classes
- [model_explainability](components/model_explainability.md): 4 classes
- [probabilistic_likelihoods](components/probabilistic_likelihoods.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 192
  fatal_constraints_count: 102
  non_fatal_constraints_count: 282
  use_cases_count: 31
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
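
One way to honor this ordering is a stable sort on the order side. The sketch assumes orders are `(side, entity_id)` tuples, which is a made-up representation for illustration; the entity IDs are example values only.

```python
orders = [
    ("buy", "stock_sh_600000"),
    ("sell", "stock_sz_000001"),
    ("buy", "stock_sz_000002"),
    ("sell", "stock_sh_600519"),
]


def sells_first(cycle_orders):
    # sorted() is stable, so orders keep their relative order within each side.
    return sorted(cycle_orders, key=lambda order: order[0] != "sell")


for side, entity_id in sells_first(orders):
    print(side, entity_id)
```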

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
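
Next-bar execution means a signal computed from bar t is acted on at bar t + 1. The sketch below assumes fills happen at the next bar's open, which is an illustrative choice and not stated by the lock; `next_bar_fills` is a hypothetical helper.

```python
def next_bar_fills(signals: list[bool], opens: list[float]) -> list[tuple[int, float]]:
    """A signal computed on bar t is filled at the open of bar t + 1."""
    fills = []
    for t, fired in enumerate(signals[:-1]):   # the last bar has no next bar to fill on
        if fired:
            fills.append((t + 1, opens[t + 1]))
    return fills


signals = [False, True, False, True]
opens = [10.0, 10.2, 10.1, 10.4]
print(next_bar_fills(signals, opens))          # [(2, 10.1)]
```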

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
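
A regular expression is one way to check the three-part format. The allowed character classes below are assumptions inferred from the example IDs in this document (`stock_sh_600000`, `stockus_nasdaq_AAPL`), not a published grammar.

```python
import re

ENTITY_ID = re.compile(
    r"^(?P<entity_type>[a-z]+)_(?P<exchange>[a-z]+)_(?P<code>[A-Za-z0-9.]+)$"
)


def parse_entity_id(entity_id: str) -> dict[str, str]:
    """Split an ID into entity_type, exchange, and code, or raise ValueError."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not in entity_type_exchange_code format: {entity_id!r}")
    return match.groupdict()


print(parse_entity_id("stock_sh_600000"))
# {'entity_type': 'stock', 'exchange': 'sh', 'code': '600000'}
```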

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
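
The exactly-one rule can be enforced at construction time. `SignalSizing` below is a hypothetical stand-in that models only the three sizing fields; it is not the real TradingSignal class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical stand-in for the sizing fields of a TradingSignal."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        given = [
            value
            for value in (self.position_pct, self.order_money, self.order_amount)
            if value is not None
        ]
        if len(given) != 1:
            raise ValueError(f"expected exactly one sizing field, got {len(given)}")


SignalSizing(position_pct=0.2)            # accepted: exactly one field is set
```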

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
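
The three-way semantics are easy to break with a plain truthiness test, because `False` and `None` are both falsy. A minimal sketch of an explicit mapping follows; `action_for` is a hypothetical helper.

```python
import math
from typing import Optional


def action_for(filter_result: Optional[bool]) -> str:
    """Map a filter_result cell to an action, keeping False distinct from missing."""
    if filter_result is None or (
        isinstance(filter_result, float) and math.isnan(filter_result)
    ):
        return "no_action"
    return "buy" if filter_result else "sell"


print([action_for(v) for v in (True, False, None, float("nan"))])
# ['buy', 'sell', 'no_action', 'no_action']
```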

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
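
To get a feel for what these defaults cost, the sketch below applies them to a round trip. How the three rates combine (multiplicatively, slippage against the trader on both legs) is an assumption made for illustration, not the framework's documented formula.

```python
BUY_COST = 0.001
SELL_COST = 0.001
SLIPPAGE = 0.001


def round_trip_return(entry_price: float, exit_price: float) -> float:
    """Net return of buying then selling one unit under the default costs."""
    paid = entry_price * (1 + SLIPPAGE) * (1 + BUY_COST)
    received = exit_price * (1 - SLIPPAGE) * (1 - SELL_COST)
    return received / paid - 1


# A flat price still loses roughly 0.4% per round trip under these assumptions.
print(f"{round_trip_return(100.0, 100.0):.4%}")
```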

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
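
A T+1 check reduces to a date comparison once both dates are known to be trading days. This sketch assumes `trade_date` has already been validated against a trading calendar; `may_sell` is a hypothetical helper.

```python
from datetime import date


def may_sell(buy_date: date, trade_date: date) -> bool:
    """T+1: shares bought on buy_date become sellable on a later trading date."""
    return trade_date > buy_date


print(may_sell(date(2024, 3, 1), date(2024, 3, 1)))   # False
print(may_sell(date(2024, 3, 1), date(2024, 3, 4)))   # True
```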

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
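
One way to surface a missing attribute early is a check in `__init_subclass__`. `RecorderBase` and `DailyBar` below are hypothetical illustrations of the pattern and say nothing about how the real Recorder class validates itself.

```python
class RecorderBase:
    """Hypothetical base class; not the ZVT Recorder itself."""

    provider = None
    data_schema = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        missing = [
            name for name in ("provider", "data_schema")
            if getattr(cls, name, None) is None
        ]
        if missing:
            raise TypeError(f"{cls.__name__} must define: {', '.join(missing)}")


class DailyBar:
    """Placeholder schema type for the example."""


class DailyBarRecorder(RecorderBase):
    provider = "em"
    data_schema = DailyBar
```

A subclass that omits either attribute fails at class-definition time, before any recording job runs.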

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **31**

## `KUC-101`
**Source**: `docs/fix_package_titles.py`

Automates extraction of descriptive titles and docstrings from Python packages to improve Sphinx API documentation readability.

## `KUC-102`
**Source**: `docs/source/conf.py`

Configures Sphinx documentation builder with extensions for auto-summary, autodoc, and graphviz visualization.

## `KUC-103`
**Source**: `examples/00-quickstart.ipynb`

Introduces new users to the Darts time series library with basic operations like series creation, loading datasets, and simple transformations.

## `KUC-104`
**Source**: `examples/01-multi-time-series-and-covariates.ipynb`

Demonstrates forecasting multiple related time series simultaneously using covariates and multivariate models like VARIMA and NBEATS.

## `KUC-105`
**Source**: `examples/02-data-processing.ipynb`

Shows how to build reusable data processing pipelines with transformers for scaling, filling missing values, and other transformations.

## `KUC-106`
**Source**: `examples/03-FFT-examples.ipynb`

Uses Fast Fourier Transform for frequency-based time series forecasting, ideal for seasonal patterns.

## `KUC-107`
**Source**: `examples/04-RNN-examples.ipynb`

Demonstrates recurrent neural network models (RNN, LSTM, GRU) for time series forecasting with seasonality detection.

## `KUC-108`
**Source**: `examples/05-TCN-examples.ipynb`

Uses Temporal Convolutional Networks for high-performance time series forecasting with dilated convolutions.

## `KUC-109`
**Source**: `examples/06-Transformer-examples.ipynb`

Applies Transformer architecture with self-attention mechanisms for capturing long-range dependencies in time series.

## `KUC-110`
**Source**: `examples/07-NBEATS-examples.ipynb`

Uses NBEATS (Neural Basis Expansion Analysis) for interpretable deep learning time series forecasting.

## `KUC-111`
**Source**: `examples/08-DeepAR-examples.ipynb`

Implements DeepAR for probabilistic forecasting with uncertainty quantification using Gaussian likelihood.

## `KUC-112`
**Source**: `examples/09-DeepTCN-examples.ipynb`

Combines Deep TCN architecture with probabilistic prediction using quantile regression and Gaussian likelihood.

## `KUC-113`
**Source**: `examples/10-Kalman-filter-examples.ipynb`

Applies Kalman filtering for state estimation and noise reduction in time series with known state-space models.

## `KUC-114`
**Source**: `examples/11-GP-filter-examples.ipynb`

Uses Gaussian Process regression for flexible non-parametric filtering and noise reduction in time series.

## `KUC-115`
**Source**: `examples/12-Dynamic-Time-Warping-example.ipynb`

Computes similarity between time series using Dynamic Time Warping algorithm for pattern matching and comparison.

## `KUC-116`
**Source**: `examples/13-TFT-examples.ipynb`

Uses TFT for interpretable multi-horizon forecasting with attention visualization and quantile predictions.

## `KUC-117`
**Source**: `examples/14-transfer-learning.ipynb`

Demonstrates transferring knowledge from pre-trained models across different time series datasets (M3, M4 competitions).

## `KUC-118`
**Source**: `examples/15-static-covariates.ipynb`

Shows how to incorporate static (time-invariant) covariates into time series models for multivariate forecasting.

## `KUC-119`
**Source**: `examples/16-hierarchical-reconciliation.ipynb`

Demonstrates hierarchical forecasting with MinT reconciliation to ensure consistency across aggregation levels.

## `KUC-120`
**Source**: `examples/17-hyperparameter-optimization.ipynb`

Uses Optuna for automated hyperparameter tuning of forecasting models with early stopping and visualization.

## `KUC-121`
**Source**: `examples/18-TiDE-examples.ipynb`

Implements TiDE (Time-series Dense Encoder) for efficient long-sequence time series forecasting.

## `KUC-122`
**Source**: `examples/19-EnsembleModel-examples.ipynb`

Combines multiple forecasting models using ensemble techniques like naive ensembling and regression ensembling.

## `KUC-123`
**Source**: `examples/20-SKLearnModel-examples.ipynb`

Uses scikit-learn-compatible models (Linear Regression, Random Forest, XGBoost, LightGBM) with SHAP explainability.

## `KUC-124`
**Source**: `examples/21-TSMixer-examples.ipynb`

Uses TSMixer for multivariate time series forecasting with feature mixing and quantile regression.

## `KUC-125`
**Source**: `examples/22-anomaly-detection-examples.ipynb`

Detects anomalies in time series using scoring methods like KMeans, Wasserstein distance, and forecasting-based models.

## `KUC-126`
**Source**: `examples/23-Conformal-Prediction-examples.ipynb`

Provides distribution-free uncertainty quantification using conformal prediction with calibration sets.

## `KUC-127`
**Source**: `examples/24-SKLearnClassifierModel-examples.ipynb`

Classifies time series segments into categories using gradient-based features and CatBoost classifier.

## `KUC-128`
**Source**: `examples/25-Chronos-2-examples.ipynb`

Uses Chronos-2, a pre-trained time series foundation model, for zero-shot and fine-tuned forecasting.

## `KUC-129`
**Source**: `examples/26-NeuralForecast-examples.ipynb`

Integrates NeuralForecast models (from Nixtla) with Darts for advanced neural network time series forecasting.

## `KUC-130`
**Source**: `examples/27-Torch-and-Foundation-Model-Fine-Tuning-examples.ipynb`

Fine-tunes pre-trained foundation models like Chronos-2 and TiDE on custom time series data.

## `KUC-131`
**Source**: `examples/utils/utils.py`

Provides utility functions for managing Python paths when running Darts examples locally.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-TIME-SERIES-ML-001` — 3D TimeSeries dimensionality invariant
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml

Always expand TimeSeries values to exactly 3 dimensions (n_timesteps, n_components, n_samples) regardless of input format. This invariant enables uniform downstream processing regardless of whether the data is univariate (1 component), single-sample, or multivariate probabilistic series with multiple samples.

## `CW-TIME-SERIES-ML-002` — Strict time index validation
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml

Validate time index at construction: must be strictly monotonically increasing, have a well-defined frequency, no holes by default, and length must match values first dimension. This prevents silent data corruption in all downstream temporal operations.

## `CW-TIME-SERIES-ML-003` — MultiIndex preservation in multi-ticker pipelines
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml

Maintain (ticker, date) MultiIndex structure throughout the entire feature engineering and prediction pipeline for multi-ticker trading systems. Downstream stages depend on this structure for proper temporal train/test splits that respect per-ticker time boundaries.
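
Grouping by the ticker level keeps the index intact while computing per-ticker features. The sketch uses pandas with made-up tickers (`AAA`, `BBB`) and prices.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.date_range("2024-01-01", periods=3, freq="D")],
    names=["ticker", "date"],
)
prices = pd.DataFrame({"close": [10.0, 11.0, 12.1, 50.0, 49.0, 49.0]}, index=index)

# Per-ticker return: the result keeps the (ticker, date) index, and no return
# is computed across the AAA -> BBB boundary.
prices["ret_1d"] = prices.groupby(level="ticker")["close"].pct_change()

print(list(prices.index.names))           # ['ticker', 'date']
```

The first row of each ticker has no prior bar, so its return is NaN instead of a spurious cross-ticker value.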

## `CW-TIME-SERIES-ML-004` — Purged walk-forward cross-validation
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml

Use a purged walk-forward split with an embargo gap for financial time series validation. Random splits cause look-ahead bias, while splits without purge gaps contaminate results with overlapping outcomes. The purge gap prevents information leakage across train/test boundaries.

## `CW-TIME-SERIES-ML-005` — TA-Lib edge case sanitization
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml

Always replace infinite values with NaN and call dropna before ML model training when using TA-Lib technical indicators. RSI, MACD, ATR and other indicators produce inf values during division-by-zero edge cases, which corrupt gradient-based model training.
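
The replace-then-drop step looks like this in pandas. The feature values are fabricated for illustration and do not come from TA-Lib.

```python
import numpy as np
import pandas as pd

features = pd.DataFrame(
    {
        "rsi": [55.0, np.inf, 48.0, 61.0],
        "atr": [1.2, 1.1, np.nan, 1.3],
    }
)

# Turn +/-inf into NaN, then drop every row that has any NaN.
clean = features.replace([np.inf, -np.inf], np.nan).dropna()
print(len(features), "->", len(clean))    # 4 -> 2
```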

## `CW-TIME-SERIES-ML-006` — Fluent forecasting model interface
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml

Implement fit() returning self and predict() on ForecastingModel subclasses to support method chaining. This fluent interface pattern is expected by users for idiomatic usage like model.fit(series).predict(n_periods).
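
A toy model shows the chaining pattern. `MeanForecaster` is a hypothetical class written for this example and does not subclass the real ForecastingModel.

```python
class MeanForecaster:
    """Toy model: predicts the training mean for every future step."""

    def fit(self, series: list[float]) -> "MeanForecaster":
        if not series:
            raise ValueError("series must not be empty")
        self._mean = sum(series) / len(series)
        return self                       # enables model.fit(...).predict(...)

    def predict(self, n_periods: int) -> list[float]:
        return [self._mean] * n_periods


print(MeanForecaster().fit([1.0, 2.0, 3.0]).predict(2))   # [2.0, 2.0]
```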

## `CW-TIME-SERIES-ML-007` — Zipline bundle signature contract
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml

When implementing Zipline bundle ingest functions, the function must accept exactly 9 parameters in the specified order: environ, asset_db_writer, minute_bar_writer, daily_bar_writer, adjustment_writer, calendar, start_session, end_session, cache. This contract is enforced by Zipline's ingestion pipeline.
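
A signature check can catch a mismatched ingest function before a bundle run. The sketch compares against the nine names listed above and uses only `inspect`; `my_ingest` is a hypothetical stub. Installed Zipline versions may pass further arguments, so treat the expected tuple as an assumption to verify against your version's bundle documentation.

```python
import inspect

EXPECTED = (
    "environ", "asset_db_writer", "minute_bar_writer", "daily_bar_writer",
    "adjustment_writer", "calendar", "start_session", "end_session", "cache",
)


def my_ingest(environ, asset_db_writer, minute_bar_writer, daily_bar_writer,
              adjustment_writer, calendar, start_session, end_session, cache):
    """Hypothetical ingest stub; a real one would write bars and adjustments."""


def has_expected_signature(func) -> bool:
    """True when the parameter names and their order match EXPECTED."""
    return tuple(inspect.signature(func).parameters) == EXPECTED


print(has_expected_signature(my_ingest))  # True
```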

## `CW-TIME-SERIES-ML-008` — Calendar minutes_per_day alignment
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml

When configuring trading calendars for backtesting, set minutes_per_day to match the total trading minutes including extended hours (390 for the regular NYSE session, 960 for a 16-hour extended session starting at 4:00 AM). This ensures minute bar alignment with actual trading times in the backtest.

## `CW-TIME-SERIES-ML-009` — Deterministic series detection
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml

A TimeSeries is deterministic when n_samples equals 1, otherwise probabilistic. This distinction matters for methods like to_json and gap detection, which behave differently depending on whether the series contains probabilistic predictions or point estimates.

## `CW-TIME-SERIES-ML-010` — Minimum training sample enforcement
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml

Enforce min_train_series_length at fit time to prevent underfitting with insufficient historical data. Models should raise ValueError with clear messaging when training series length is below the model's minimum requirement, preventing silent poor forecasts.
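
A length check at the top of `fit` is enough to turn a silent poor forecast into a clear error. The helper below is a hypothetical sketch; the parameter name mirrors the text above.

```python
def check_train_length(series: list[float], min_train_series_length: int) -> None:
    """Raise ValueError when the training series is shorter than the model needs."""
    if len(series) < min_train_series_length:
        raise ValueError(
            f"training series has {len(series)} points, "
            f"model needs at least {min_train_series_length}"
        )


check_train_length([1.0, 2.0, 3.0], min_train_series_length=3)   # passes silently
```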

FILE:references/components/anomaly_detection.md
# anomaly_detection (7 classes)

## `AnomalyModel.fit`
`anomaly_detection/anomalymodel-fit.py:0`

## `AnomalyModel.score`
`anomaly_detection/anomalymodel-score.py:0`

## `AnomalyModel.detect`
`anomaly_detection/anomalymodel-detect.py:0`

## `Detector.fit_detect`
`anomaly_detection/detector-fit-detect.py:0`

## `Scorer`
`anomaly_detection/scorer.py:0`

## `Detector threshold`
`anomaly_detection/detector-threshold.py:0`

## `Aggregator`
`anomaly_detection/aggregator.py:0`

FILE:references/components/conformal_prediction.md
# conformal_prediction (4 classes)

## `ConformalModel.fit`
`conformal_prediction/conformalmodel-fit.py:0`

## `ConformalModel.predict`
`conformal_prediction/conformalmodel-predict.py:0`

## `ConformalQRModel.fit`
`conformal_prediction/conformalqrmodel-fit.py:0`

## `Conformal method`
`conformal_prediction/conformal-method.py:0`

FILE:references/components/covariate_encoding.md
# covariate_encoding (5 classes)

## `Encoder.encode_train`
`covariate_encoding/encoder-encode-train.py:0`

## `Encoder.encode_inference`
`covariate_encoding/encoder-encode-inference.py:0`

## `SequentialEncoder.fit`
`covariate_encoding/sequentialencoder-fit.py:0`

## `Encoding type`
`covariate_encoding/encoding-type.py:0`

## `Cyclic normalization`
`covariate_encoding/cyclic-normalization.py:0`

FILE:references/components/data_transformation_pipeline.md
# data_transformation_pipeline (6 classes)

## `BaseDataTransformer.transform`
`data_transformation_pipeline/basedatatransformer-transform.py:0`

## `FittableDataTransformer.fit`
`data_transformation_pipeline/fittabledatatransformer-fit.py:0`

## `InvertibleDataTransformer.inverse_transform`
`data_transformation_pipeline/invertibledatatransformer-inverse-transf.py:0`

## `Pipeline.fit_transform`
`data_transformation_pipeline/pipeline-fit-transform.py:0`

## `Scaler backend`
`data_transformation_pipeline/scaler-backend.py:0`

## `Parallelization`
`data_transformation_pipeline/parallelization.py:0`

FILE:references/components/ensemble_forecasting.md
# ensemble_forecasting (4 classes)

## `EnsembleModel.fit`
`ensemble_forecasting/ensemblemodel-fit.py:0`

## `EnsembleModel.predict`
`ensemble_forecasting/ensemblemodel-predict.py:0`

## `RegressionEnsembleModel.fit`
`ensemble_forecasting/regressionensemblemodel-fit.py:0`

## `Ensemble method`
`ensemble_forecasting/ensemble-method.py:0`

FILE:references/components/forecasting_model_base.md
# forecasting_model_base (5 classes)

## `ForecastingModel.fit`
`forecasting_model_base/forecastingmodel-fit.py:0`

## `ForecastingModel.predict`
`forecasting_model_base/forecastingmodel-predict.py:0`

## `ForecastingModel.historical_forecasts`
`forecasting_model_base/forecastingmodel-historical-forecasts.py:0`

## `Encoder system`
`forecasting_model_base/encoder-system.py:0`

## `Likelihood`
`forecasting_model_base/likelihood.py:0`

FILE:references/components/hierarchical_reconciliation.md
# hierarchical_reconciliation (5 classes)

## `BottomUpReconciliator.fit`
`hierarchical_reconciliation/bottomupreconciliator-fit.py:0`

## `TopDownReconciliator.fit`
`hierarchical_reconciliation/topdownreconciliator-fit.py:0`

## `MinTReconciliator.fit`
`hierarchical_reconciliation/mintreconciliator-fit.py:0`

## `Reconciliator.transform`
`hierarchical_reconciliation/reconciliator-transform.py:0`

## `Reconciliation method`
`hierarchical_reconciliation/reconciliation-method.py:0`

FILE:references/components/metrics_evaluation.md
# metrics_evaluation (6 classes)

## `err`
`metrics_evaluation/err.py:0`

## `mae`
`metrics_evaluation/mae.py:0`

## `mape`
`metrics_evaluation/mape.py:0`

## `ql`
`metrics_evaluation/ql.py:0`

## `ic`
`metrics_evaluation/ic.py:0`

## `Reduction`
`metrics_evaluation/reduction.py:0`

FILE:references/components/model_explainability.md
# model_explainability (4 classes)

## `ShapExplainer.explain`
`model_explainability/shapexplainer-explain.py:0`

## `TFTExplainer.explain`
`model_explainability/tftexplainer-explain.py:0`

## `ShapExplainer.plot_explanation`
`model_explainability/shapexplainer-plot-explanation.py:0`

## `Explainer type`
`model_explainability/explainer-type.py:0`

FILE:references/components/probabilistic_likelihoods.md
# probabilistic_likelihoods (5 classes)

## `TorchLikelihood.compute_loss`
`probabilistic_likelihoods/torchlikelihood-compute-loss.py:0`

## `TorchLikelihood.sample`
`probabilistic_likelihoods/torchlikelihood-sample.py:0`

## `GaussianLikelihood.parameters`
`probabilistic_likelihoods/gaussianlikelihood-parameters.py:0`

## `QuantileRegression.sample`
`probabilistic_likelihoods/quantileregression-sample.py:0`

## `Distribution`
`probabilistic_likelihoods/distribution.py:0`

FILE:references/components/pytorch_deep_learning_forecasting.md
# pytorch_deep_learning_forecasting (7 classes)

## `TorchForecastingModel.fit`
`pytorch_deep_learning_forecasting/torchforecastingmodel-fit.py:0`

## `TorchForecastingModel.predict`
`pytorch_deep_learning_forecasting/torchforecastingmodel-predict.py:0`

## `TorchForecastingModel.save`
`pytorch_deep_learning_forecasting/torchforecastingmodel-save.py:0`

## `TorchForecastingModel.load`
`pytorch_deep_learning_forecasting/torchforecastingmodel-load.py:0`

## `Loss function`
`pytorch_deep_learning_forecasting/loss-function.py:0`

## `Optimizer`
`pytorch_deep_learning_forecasting/optimizer.py:0`

## `Training dataset`
`pytorch_deep_learning_forecasting/training-dataset.py:0`

FILE:references/components/scikit-learn_regression_forecasting.md
# scikit-learn_regression_forecasting (5 classes)

## `RegressionModel.fit`
`scikit-learn_regression_forecasting/regressionmodel-fit.py:0`

## `RegressionModel.predict`
`scikit-learn_regression_forecasting/regressionmodel-predict.py:0`

## `SKLearnClassifierModel.fit`
`scikit-learn_regression_forecasting/sklearnclassifiermodel-fit.py:0`

## `Regressor`
`scikit-learn_regression_forecasting/regressor.py:0`

## `Multi-output strategy`
`scikit-learn_regression_forecasting/multi-output-strategy.py:0`

FILE:references/components/statistical_-_classical_forecasting.md
# statistical_&_classical_forecasting (5 classes)

## `ARIMA.fit`
`statistical_&_classical_forecasting/arima-fit.py:0`

## `ARIMA.predict`
`statistical_&_classical_forecasting/arima-predict.py:0`

## `ExponentialSmoothing.fit`
`statistical_&_classical_forecasting/exponentialsmoothing-fit.py:0`

## `NaiveSeasonal.predict`
`statistical_&_classical_forecasting/naiveseasonal-predict.py:0`

## `Underlying statsmodel`
`statistical_&_classical_forecasting/underlying-statsmodel.py:0`

FILE:references/components/time_series_filtering.md
# time_series_filtering (4 classes)

## `FilteringModel.filter`
`time_series_filtering/filteringmodel-filter.py:0`

## `KalmanFilter.fit`
`time_series_filtering/kalmanfilter-fit.py:0`

## `GaussianProcessFilter.fit`
`time_series_filtering/gaussianprocessfilter-fit.py:0`

## `Filter type`
`time_series_filtering/filter-type.py:0`

FILE:references/components/timeseries_data_representation.md
# timeseries_data_representation (5 classes)

## `TimeSeries.from_csv`
`timeseries_data_representation/timeseries-from-csv.py:0`

## `TimeSeries.from_dataframe`
`timeseries_data_representation/timeseries-from-dataframe.py:0`

## `TimeSeries.slice`
`timeseries_data_representation/timeseries-slice.py:0`

## `TimeSeries.concatenate`
`timeseries_data_representation/timeseries-concatenate.py:0`

## `Backend implementation`
`timeseries_data_representation/backend-implementation.py:0`

Czsc Chan Theory

Skill

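CZSC Chan theory technical-analysis toolkit: K-line generation, stroke (bi) and segment identification, fractal signal extraction, and A-share backtest visualization.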

---
name: czsc-chan-theory
description: |-
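  CZSC Chan theory technical-analysis toolkit: K-line generation, stroke (bi) and segment identification, fractal signal extraction, and A-share backtest visualization.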
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-091"
  compiled_at: "2026-04-22T13:00:38.716020+00:00"
  capability_markets: "cn-astock"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
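# Chan Theory Technical Analysis (czsc-chan-theory)

> CZSC Chan theory technical-analysis toolkit: K-line generation, stroke (bi) and segment identification, fractal signal extraction, and A-share backtest visualization.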

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (10 total)

### Sphinx Documentation Configuration (`UC-101`)
Configuring Sphinx documentation builder for the czsc project, ensuring proper Python path setup and Rust version priority
**Triggers**: documentation, sphinx, configuration

### CZSC Performance Benchmarking (`UC-102`)
Benchmarking CZSC analysis performance with varying K-line counts to measure initialization speed and memory usage
**Triggers**: benchmark, performance, speed

### Volatility Classification Signal (`UC-104`)
Classifying market volatility into three tiers (low/middle/high) based on recent K-line price ranges for signal generation
**Triggers**: volatility, classification, signal

For all **10** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

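- **`AP-ZVT-183`**: An inf/NaN ex-rights factor is multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions from a rate-limited third-party data API are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift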

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-091. Evidence verify ratio = 60.4% and audit fail total = 13. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
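| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |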

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-091` blueprint at 2026-04-22T13:00:38.716020+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Trading View K-Line Visualization', 'CZSC Performance Benchmarking', 'Sphinx Documentation Configuration', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

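### `AP-QLIB-1930`: Backtest results do not depend on the model because a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>

When several Qlib models reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model effectively uses the same prediction signal. The symptom is an identical backtest net-value curve whether the model is LightGBM, XGBoost, or a DNN. This is the most dangerous anti-pattern of the "the experiment appears to run, but every conclusion is invalid" kind.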

Source: https://github.com/microsoft/qlib/issues/1930

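### `AP-QLIB-2090`: Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (the range the normalizer is fitted on) and segments.train (the range the model is trained on). A common mistake is to let fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state that fit_end_time must be <= train_end.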

Source: https://github.com/microsoft/qlib/issues/2090
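
The fit_end_time <= train_end rule above can be checked before a dataset is built. The following is a minimal sketch, assuming a plain dictionary shaped loosely like a DatasetH config; `check_fit_window` and the `config` layout are hypothetical helpers for illustration, not part of Qlib.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_end: str) -> None:
    """Raise if the normalizer's fit window runs past the training segment."""
    if date.fromisoformat(fit_end_time) > date.fromisoformat(train_end):
        raise ValueError(
            f"fit_end_time {fit_end_time} is after train_end {train_end}: "
            "normalization statistics would include validation/test data"
        )


# Hypothetical config fragment; the dates are illustrative only.
config = {
    "handler": {"fit_start_time": "2008-01-01", "fit_end_time": "2014-12-31"},
    "segments": {
        "train": ("2008-01-01", "2014-12-31"),
        "valid": ("2015-01-01", "2016-12-31"),
        "test": ("2017-01-01", "2020-08-01"),
    },
}

# Passes silently because the fit window ends with the training segment.
check_fit_window(config["handler"]["fit_end_time"], config["segments"]["train"][1])
```

Running the check at config-load time turns a silent look-ahead leak into an immediate error.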

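### `AP-QLIB-2036`: The MACD factor formula in the docs is wrong: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. After cross-sectional standardization, a MACD factor built from the documented formula differs markedly from the correct one, and its IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.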

Source: https://github.com/microsoft/qlib/issues/2036

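### `AP-QLIB-2184`: Custom A-share data is imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>

By Qlib convention, the open/close/high/low/volume/factor fields on suspension days should all be NaN so the framework can recognize and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce "false signals" during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.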

Source: https://github.com/microsoft/qlib/issues/2184

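### `AP-QLIB-1892`: The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so the full A-share universe is fetched incompletely <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai and Shenzhen stock list. That function depends on the Eastmoney API, which often returns only part of the list rather than all 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset covering only some stocks, so backtests on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).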

Source: https://github.com/microsoft/qlib/issues/1892

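### `AP-QLIB-2097`: Full-market instrument="all" runs out of memory on a 32 GB machine while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix of the chosen universe into memory at once. Memory use differs by roughly 16x between instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks). A 32 GB machine running the full market goes OOM in the init_instance_by_config stage, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error; the actual remedy is batched loading or streaming feature computation.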

Source: https://github.com/microsoft/qlib/issues/2097

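### `AP-QLIB-1984`: The LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises an obscure error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.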

Source: https://github.com/microsoft/qlib/issues/1984

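### `AP-QLIB-1915`: After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (for example 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processor raises Length mismatch during align/reindex, whereas D.features succeeds because it bypasses the processor. This inconsistency between the two paths leads users to believe the data was imported correctly.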

Source: https://github.com/microsoft/qlib/issues/1915

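### `AP-QLIB-1949`: The Colab/Linux multiprocessing backend conflicts with Qlib's ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel via joblib, ParallelExt fails at initialization on access to the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from the D.features call, and the stack trace does not tell users whether the problem is the parallel backend or the data.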

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

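### `AP-VNPY-3691`: The bar generator's first bar has a misaligned timestamp, producing a wrong signal in the first period <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with the minute of the current tick rather than the end time of a complete N-minute period. Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not be pushed until 10:04). If a strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass the filter even though it holds less than a full period of data, and using it for signal computation yields a wrong entry signal.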

Source: https://github.com/vnpy/vnpy/issues/3691
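
One way to guard against the leading partial bar described above is to drop the first window whenever the feed starts mid-window. This is a minimal standalone sketch on `(timestamp, close)` one-minute bars; it does not use vnpy's BarGenerator, `window_start` and `complete_windows` are hypothetical names, and it assumes `n` divides 60 and that bars arrive in time order. A trailing window that is still filling is not handled here.

```python
from datetime import datetime


def window_start(ts: datetime, n: int) -> datetime:
    """Floor a timestamp to the start of its n-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % n, second=0, microsecond=0)


def complete_windows(minute_bars, n=5):
    """Group one-minute bars into n-minute windows, dropping a leading partial window.

    Returns a list of (window_start, last_close) tuples.
    """
    windows = {}
    for ts, close in minute_bars:
        windows.setdefault(window_start(ts, n), []).append((ts, close))

    result = []
    for i, (start, bars) in enumerate(sorted(windows.items())):
        if i == 0 and bars[0][0] != start:
            continue  # the feed began mid-window, so this window is incomplete
        result.append((start, bars[-1][1]))
    return result


# A feed that starts at 09:59, then delivers the full 10:00-10:04 window.
feed = [(datetime(2024, 1, 2, 9, 59), 10.0)]
feed += [(datetime(2024, 1, 2, 10, m), 10.0 + m) for m in range(5)]
bars_5m = complete_windows(feed, n=5)  # only the 10:00 window survives
```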

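### `AP-VNPY-3669`: Incremental saves of Alpha-module history raise SchemaError because old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When the vnpy Alpha module saves bar data to Parquet, it passes newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns) straight to polars.concat. Polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (for example, volume is an integer from some feeds and a float from others), and there is no schema-alignment step before the concat. This affects every historical-data build that uses vnpy alpha for backtesting.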

Source: https://github.com/vnpy/vnpy/issues/3669

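### `AP-VNPY-3685`: The spread-trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not execute, yet no exception is raised and the returned statistics look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.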

Source: https://github.com/vnpy/vnpy/issues/3685

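### `AP-VNPY-3700`: The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs directly into the system or conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, which breaks other quant tools that depend on numpy 2.x (such as newer scipy and pytorch releases). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.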

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

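### `AP-ZIPLINE-138`: Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices on the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price rose 4x" for strategy return. The A-share case is worse: the price jump around an ex-rights date forms a huge "signal" in unadjusted data, leading technical indicators to produce false breakout signals on that date.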

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

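### `AP-ZIPLINE-235`: Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>

After a signal fires on a bar, Zipline's default slippage model fills at the close of that same bar (current bar close fill). In live trading the signal can only be filled near the open of the next bar (T+1 order execution). For A-share daily bars, backtesting at the close overstates the daily return by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.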

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
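
The next-bar fill rule described above can be expressed without any backtest framework. The sketch below is not Zipline's slippage API; `next_bar_fills` is a hypothetical helper that maps a signal on bar `i` to a fill at the open of bar `i + 1`, and the prices are made up for illustration.

```python
def next_bar_fills(signals, opens):
    """Pair each signal on bar i with the open price of bar i + 1.

    A signal on the final bar has no next bar and therefore produces no fill.
    Returns a list of (fill_bar_index, side, fill_price) tuples.
    """
    fills = []
    for i, side in enumerate(signals[:-1]):
        if side is not None:
            fills.append((i + 1, side, opens[i + 1]))
    return fills


# Illustrative data: one entry per daily bar.
signals = ["buy", None, "sell", "buy"]
opens = [10.0, 10.2, 10.1, 10.4]

fills = next_bar_fills(signals, opens)
# The buy on bar 0 fills at bar 1's open, the sell on bar 2 at bar 3's open,
# and the buy on the last bar stays unfilled.
```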

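### `AP-ZIPLINE-190`: A calendar start_session on a non-trading day raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if start_session falls on a non-trading day (for example New Year's Day, 1998-01-01), Zipline's Calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The message only shows the calendar's first session and does not suggest changing to the first trading day. A-share case: with the SSE/SZSE calendar, a start_date that falls on the holiday right after the last trading day before Spring Festival triggers the same error, and it is very costly to debug.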

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

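### `AP-ZIPLINE-181`: With a stale asset db, Pipeline reports "no assets traded" and misleads users into checking the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start and end trading dates. With an old Quandl or self-built bundle that has not been re-ingested, backtesting a new date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". A-share case: if only the price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.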

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

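### `AP-ZIPLINE-285`: week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last day of a week, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation used for the NYSE calendar, so the schedule either never fires or fires on the wrong date. A-share case: with the SSE calendar, in weeks containing a long holiday such as Spring Festival, week_start may skip the whole holiday week without rebalancing, and the logs give users no way to spot the missed schedule.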

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

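### `AP-ZIPLINE-240`: Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone information, such as pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead, deep in algorithm execution, it raises AssertionError: Algorithm should have a utc datetime, with a stack too deep to locate easily. A-share developers importing data in local CST time hit this trap very easily and need an explicit tz_localize('UTC') at bundle registration.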

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
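
The same idea can be enforced at the pipeline entry with the standard library alone. This is a minimal sketch, not Zipline code: `to_utc` and `require_utc` are hypothetical helpers, and China Standard Time is modeled as a fixed UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time: fixed offset, no DST


def to_utc(ts: datetime, local_tz: timezone = CST) -> datetime:
    """Treat a naive timestamp as local_tz, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=local_tz)
    return ts.astimezone(timezone.utc)


def require_utc(ts: datetime) -> datetime:
    """Fail fast at the entry point instead of deep inside the backtest."""
    if ts.tzinfo is None or ts.utcoffset() != timedelta(0):
        raise ValueError(f"expected a UTC-aware datetime, got {ts!r}")
    return ts


# 09:30 Shanghai time on 2020-01-02 is 01:30 UTC on the same day.
session_open = require_utc(to_utc(datetime(2020, 1, 2, 9, 30)))
```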

## zvt (6)

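### `AP-ZVT-183`: An inf/NaN ex-rights factor is multiplied in directly, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old==0 (a new stock's first day or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. As a result the adjustment for the whole entity is aborted and all its subsequent K-lines are lost, but the main flow only logs ERROR and keeps running, so users often do not know that data for many stocks has been corrupted.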

Source: https://github.com/zvtvz/zvt/issues/183
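
A defensive version of the ratio and the multiplication might look like the sketch below. It is not ZVT's implementation: `safe_qfq_factor` and `adjust` are hypothetical helpers that return `None` for undefined inputs so the caller can skip one bar instead of aborting the whole entity.

```python
import math


def safe_qfq_factor(new_price, old_price):
    """Return new/old, or None when the ratio is undefined or non-finite."""
    if new_price is None or old_price is None:
        return None
    if old_price == 0:
        return None  # would otherwise be a division by zero / inf factor
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def adjust(price, factor):
    """Apply the factor to one price; propagate None instead of raising TypeError."""
    if price is None or factor is None:
        return None
    return price * factor


# A suspended day (open is None) no longer breaks the bars that follow it.
opens = [10.0, None, 10.5]
factor = safe_qfq_factor(11.0, 10.0)
adjusted = [adjust(p, factor) for p in opens]
```

Returning `None` rather than raising is a design choice for this sketch; a real pipeline would also want to count and report the skipped bars so the gap does not stay silent.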

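### `AP-ZVT-179`: Exceptions from a rate-limited third-party data API are swallowed, so data goes missing silently <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch K-lines for the whole market (4000+ stocks), it hits JoinQuant's daily query-count limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the limit is hit, that day's data for every remaining stock is silently missing. A backtest on this incomplete database produces systematically biased factor results, with no warning.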

Source: https://github.com/zvtvz/zvt/issues/179

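### `AP-ZVT-161`: Full-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable-count cap. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.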

Source: https://github.com/zvtvz/zvt/issues/161
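
The batching fix can be sketched with the standard `sqlite3` module. The table name `kdata`, its columns, and the helper names below are assumptions for illustration, not ZVT's schema; the batch size of 500 is simply a value below the 999 limit quoted above (the actual cap depends on how SQLite was built).

```python
import sqlite3


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(conn, entity_ids, batch_size=500):
    """Run one IN query per batch so no statement exceeds the variable limit."""
    rows = []
    for batch in chunked(entity_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


# In-memory demo with 5000 hypothetical entity IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])

rows = query_in_batches(conn, ids)
```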

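### `AP-ZVT-129`: Wildcard imports hide API version changes, and enums such as AdjustType vanish unexpectedly <sub>(medium)</sub>

ZVT's documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors an enum (for example moving AdjustType into a submodule), the wildcard import no longer includes the symbol and an AttributeError follows. Users assume an installation problem, but it is actually a breaking API change between versions that the CHANGELOG does not flag, and the wildcard import obscures where the symbol came from. Import enum classes explicitly.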

Source: https://github.com/zvtvz/zvt/issues/129

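### `AP-ZVT-187`: The backtest engine does not stop early on an empty data-layer result, causing cascading null-pointer crashes <sub>(medium)</sub>

When ZVT's Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame to the selector computation, triggering a chain of crashes on NoneType operations. The stack is deep and the root cause hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.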

Source: https://github.com/zvtvz/zvt/issues/187

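### `AP-ZVT-183B`: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users who mix two tables when computing price-momentum or moving-average factors (for example, moving averages on unadjusted data and returns on backward-adjusted data) get jumps in factor values around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.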

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-091--czsc
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 33, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [data_collection_layer](components/data_collection_layer.md): 5 classes
- [chan_theory_analysis](components/chan_theory_analysis.md): 5 classes
- [signal_computation](components/signal_computation.md): 5 classes
- [event_&_position_management](components/event_-_position_management.md): 5 classes
- [trading_execution](components/trading_execution.md): 7 classes
- [backtest_&_performance_analysis](components/backtest_-_performance_analysis.md): 6 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 162
  fatal_constraints_count: 46
  non_fatal_constraints_count: 189
  use_cases_count: 10
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (71)

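- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share stocks settle T+1: shares bought on day T can be sold on T+1 at the earliest, while cash from a sale on day T can be used to buy again the same day. A backtest framework that does not enforce the T+1 position lock overstates turnover and win rate, and is especially damaging to the realism of intraday reversal strategies.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: The daily price limit for Shanghai and Shenzhen main-board stocks is ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up, sell orders dry up and buys cannot fill; when it is locked at limit-down, buy orders dry up and sells cannot fill. A backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: The STAR Market and ChiNext (after the August 2020 reform) have a ±20% limit on normal trading days; the Beijing Stock Exchange has ±30%; newly listed stocks have no price limit for their first 5 trading days. Applying a uniform ±10% filter to all stocks wrongly excludes or wrongly includes fills in these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily limit and very poor liquidity, so their fill assumptions must not be shared with normal stocks. Leaving historical ST stocks (eventually delisted) out of the backtest creates survivorship bias; including them without distinguishing the ST limit mis-simulates fills.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the price is set by the maximum-volume principle rather than continuous matching. Assuming immediate full fills at the open or close understates real slippage risk, particularly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Suspensions: during a long A-share suspension (up to several months before 2018), the capital in the position is locked and cannot be rebalanced, an opportunity cost that backtests usually ignore. Filter suspension days before factor computation (volume == 0 or is_suspended == True) and emit no signals during suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: Newly listed stocks have no price limit for their first 5 trading days (first-day gains can exceed 300%) and lack a full history (moving-average, volatility, and turnover factors cannot be computed). Filter out stocks listed for fewer than N trading days (typically 60-252) before factor computation.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share programmatic-trading rules (effective July 7, 2025): an account that submits or cancels ≥ 300 orders per second, or ≥ 20000 orders per day, is classed as high-frequency trading and must report to the exchange. AI-generated quant strategies that exceed these frequencies cannot run compliantly; flag this at strategy design time.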
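- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a book adjustment, not a real loss. Adjustment choices: unadjusted prices inflate strategy losses; forward adjustment embeds future dividend information in historical prices (lookahead bias); backward adjustment accumulates from the listing date and is the best choice for quantitative backtests.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A-share listed companies disclose financial reports with statutory delays: annual reports by April 30 of the following year, half-year reports by August 31, and quarterly reports by April 30 (Q1) and October 31 (Q3). When using financial data in a backtest, the data becomes available on the actual disclosure date (announcement_date), not the end of the accounting period; otherwise point-in-time lookahead bias is introduced.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Bonus shares, capitalization issues, and rights issues increase share capital after the ex-date; the historical share count held stays the same while the price shrinks proportionally. If the backtest system does not adjust the held share count in step, it records spurious losses or gains on the ex-date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Block trades versus auction trades: block-trade prices may be discounted by up to 10% from the market price (main board), but this price does not affect the next day's auction open. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal close and volume calculations.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Margin trading and short-selling limits: A-share retail investors cannot short directly, the pool of borrowable securities is limited (mostly large-cap blue chips; small and mid caps are extremely scarce), and securities-lending rates are far above margin-financing rates. A backtest that assumes any stock can be shorted yields unexecutable strategies and diverges badly from live trading.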
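- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For shares bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30%, with a warning line at 28%. When foreign ownership reaches 28%, SEHK suspends new buy orders in the stock until the ratio falls to 26%. Strategies that are heavy in foreign-favored stocks (consumer and pharma leaders) need to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% disclosure rule: a single investor holding more than 5% of a listed company's issued shares must report to the CSRC and the exchange and make a public announcement within 3 days, and may not trade the stock during that period or for 2 days after the announcement. A quant stock-selection system that ignores this rule will face a forced halt in buying once a heavy position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: Mutual-fund "double ten" rule: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same manager together may hold no more than 10% of a company's issued shares. Quant portfolios deployed in mutual funds must enforce these compliance caps as optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: every input to an AI-assisted quant system must come from publicly disclosed information. Automated trades triggered through non-public channels (private data services, inside tips, advance knowledge of a restructuring) constitute insider trading under Articles 80-83 of the Securities Law and the Guidelines for the Recognition of Insider Trading.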
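- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using current A-share constituents (such as today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated in 2020-2024 (a record 41 per year), so the bias keeps growing. Use point-in-time historical snapshots.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index reconstitution effect: CSI 300, CSI 500, and similar indexes are rebalanced every six months (June and December). Added stocks typically rise markedly between the announcement date and the effective date (forced buying by passive funds); removed stocks do the opposite. Use historical constituent snapshots for the backtest universe and mark the adjustment windows.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, holdings overlap heavily, and under a market shock they sell together and stampede. The A-share quant crisis of February 2024 is a typical case (a small-cap index fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share quant hedging strategies commonly go long or short IF/IC/IM stock-index futures to hedge systematic risk. But A-share index futures trade at a persistent discount (futures price < spot), and the annualized IC discount can reach 10-20%. A backtest that considers only price returns and ignores the futures discount or premium seriously overstates the net return of a hedged strategy.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The A-share monthly momentum factor runs in the opposite direction to the US one: stocks that performed best over the past month are likely to reverse the next month (a reversal effect, not momentum). Institutional research (Huatai, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.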
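- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors: they tend to sell winners too early and hold losers too long. Shanghai Stock Exchange empirical research confirms the effect in over 90% of individual accounts. AI-assisted tools should not indulge the intuition of "holding a loser until it breaks even"; they should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: The A-share market is retail-dominated (individual accounts contribute over 80% of trading volume) and herding is pronounced: retail investors tend to follow the crowd, causing irrational price swings (such as the leveraged bull and bear market of 2015). Quant strategies should avoid using volume rankings, popularity rankings, or similar indicators that may amplify herding as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: annual retail turnover exceeds 500%, and institutions significantly outperform retail investors over the long run. High-turnover strategies often have lower net returns after trading costs. AI should not encourage frequent trading; it should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and the 1-3 days after the holiday) and the turn-of-the-month effect (the first 1-5 trading days of a month outperform mid-month and month-end) have academic empirical support (Nanjing University of Finance and Economics, among others). Strategies should lower signal confidence in special calendar windows, or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small- and micro-cap stocks have average daily turnover of only a few million yuan, so large orders cause severe price impact and actual strategy capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to hundreds of millions; add a cap on participation as a share of traded volume.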
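- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share transaction cost structure (after the August 2023 adjustment): stamp duty 0.05%, on sells only; commission about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai) 0.001%; slippage/impact cost for small caps 0.1%-0.5% per trade. Annualized returns from backtests that ignore costs are deceptive, especially for high-frequency and high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact is usually missing entirely from backtests but may be the largest cost in live trading. Buying 1 million yuan of an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size rather than a linear relation; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New share-reduction rules for major shareholders, directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): shareholders holding 5% or more who sell through centralized auction must disclose the plan 15 trading days in advance and may sell no more than 1% of total shares within 3 months. Selling pressure from unlocked shares is a systematic risk factor specific to A-shares; ignoring the lock-up expiry calendar in a backtest understates the holding risk of the affected stocks.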
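- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the natural calendar: statutory holiday swaps create "make-up workdays" (working Saturdays), and there are temporary market closures (the emergency closure of July 8-10, 2015 during the market crash). Deriving A-share trading days from a generic weekday calendar introduces errors; use an A-share-specific trading calendar (such as exchange_calendars or tushare's trading-day API).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After delisting, an A-share stock code may be reused by a new stock (rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or a listing-date range, can confuse historical data with the current stock; long-horizon backtests need particular care.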
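- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms are: (1) computing a signal from the close and filling at that same day's close; (2) stamping an indicator computed after day T's close on that same bar; (3) using the day's high or low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after day T's close and fill at the T+1 open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Warmup period handling: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. Require the indicator's warmup_period to equal the longest lookback, and hold zero position during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series data splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they leak future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (such as a momentum strategy's "take profit at the high") is obvious lookahead, because the intraday high and low are only confirmed after the close. Fills may use only the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas operations such as rolling/shift easily introduce subtle one-period offsets. Record each series' "observation time" explicitly in the code and verify key time alignments with assert.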
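- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio among the trials grows with the number of backtests even when the true Sharpe ratio is zero, so the best in-sample result is inflated. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.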
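- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use a point-in-time snapshot (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample/out-of-sample split: strategy development and parameter selection must be done in-sample; out-of-sample data is for final validation only. Do not keep tuning after repeatedly "peeking" at out-of-sample data (that turns it into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended or missing data: do not simply forward-fill suspension-day prices with the previous close, because that lets zero-volume days take part in factor computation and signal generation. Filter missing trading days explicitly at the factor layer, and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain source errors (such as late ex-rights adjustments or extreme prices from manual entry errors). Feeding it uncleaned into factor computation produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.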
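- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics that ignore costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: without slippage, a backtest assumes every order fills at the ideal price, and high-frequency strategies will lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (such as no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed against cost. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions; a 10 bps change in cost can flip the strategy's profit-and-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: simulate the actual share counts held under a fixed amount of capital (rounded to whole lots) rather than assuming fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight; simulate this rounding effect in the backtest.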
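- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timezone unification: mixing UTC and local time when merging data sources is a common source of data corruption. Convert all timestamps to one timezone at the pipeline entry (UTC for storage and market-local time for display is recommended); do not mix timezones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge on an explicit trading calendar. Do not outer join and then fillna; that creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental-update boundary check: on an incremental update, query the database for the latest stored date and download only data after it. Re-downloading and appending existing data creates rows with duplicate timestamps and corrupts the backtest's time ordering. Verify that there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Use a passive benchmark the strategy can actually invest in (such as an HS300 ETF) rather than a non-investable price index (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark's holding return.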
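- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Summing log returns distorts the drawdown depth, because a log return and a simple return differ in size on a decline.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Systems default differently; confirm the annualization factor before comparing across systems, otherwise the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: The Calmar and Sortino ratios are preferable to the Sharpe ratio as risk-adjusted return measures: Sharpe assumes normally distributed returns, while A-share and crypto returns are markedly left-skewed (fat-tailed), so Sharpe understates downside risk. Report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) rather than relying on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot tell "a good strategy" from "a favorable market where the beta happened to be right".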
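- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is preliminary evidence of predictive power; ICIR > 0.5 indicates stability. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the 1/5/10/20-day IC series to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor unsuited to monthly rebalancing, and vice versa. Using the wrong holding period seriously harms live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires t-stat > 3.0 (rather than the traditional 1.96), or independent replication across periods or markets, or a clear economic rationale. Factors that meet none of these conditions are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have negative net IC after trading costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.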
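- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry-rotation returns, making it hard to tell whether the factor itself or the industry exposure drove the return. The operation is: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor is highly correlated with market cap, selection tilts systematically toward small caps and the return comes from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes that distort quantile analysis (such as Q1/Q10 deciles). Winsorize the raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Apply Gram-Schmidt orthogonalization or PCA before combining to remove multicollinearity among the factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaN values in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean introduces lookahead bias (the mean itself contains future information); dropping them entirely creates survivorship bias. The correct approach is to fill with the cross-sectional median (the median of all stocks on that day, which does not depend on the future) or to exclude that stock for the day.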
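- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: evaluate a factor primarily by the long-short return spread (top minus bottom spread) of Q1/Q5 (quintile) or Q1/Q10 (decile) groups, not by simple long-only returns. The monotonicity test of long Q1, short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one market state is unsuited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods of factor failure.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate a lot of wasted trading cost. Set a buffer zone in selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile spread (Q1-Q5 spread) can be large in historical data and still be due to chance; test its significance with a bootstrap or t-test (p-value < 0.05). Spreads from short backtest periods (< 3 years) are especially unreliable.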
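- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying US factors directly to A-shares, or stock factors to futures or crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US stocks.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Temporal stability of factor validity: once-effective factors gradually fail through market learning and arbitrage (McLean & Pontiff 2016 show an average 58% decay after publication). Re-evaluate factor IC regularly (quarterly or annually) and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the interest-rate cycle, business cycle, and market sentiment significantly affect factor effectiveness. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's optimal conditions.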
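
The cost structure in `SHARED-CN-ASTOCK-COST-001` can be turned into a per-trade estimate. The sketch below uses the rates quoted in that constraint as defaults; `trade_cost` is a hypothetical helper, slippage and impact are left out, and the transfer fee is applied to every trade for simplicity even though the constraint lists it for Shanghai only.

```python
def trade_cost(
    amount: float,
    side: str,
    *,
    stamp_duty: float = 0.0005,       # 0.05%, charged on sells only
    commission_rate: float = 0.0001,  # about 0.01%, charged on both sides
    min_commission: float = 5.0,      # minimum commission in yuan
    transfer_fee: float = 0.00001,    # 0.001% (assumed here for all trades)
) -> float:
    """Estimate explicit costs in yuan for one trade of `amount` yuan."""
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    commission = max(amount * commission_rate, min_commission)
    stamp = amount * stamp_duty if side == "sell" else 0.0
    return commission + stamp + amount * transfer_fee


# Round trip on a 100,000 yuan position: about 11 yuan to buy, 61 to sell.
round_trip = trade_cost(100_000, "buy") + trade_cost(100_000, "sell")
```

On small orders the 5 yuan minimum commission dominates, which is one reason cost sensitivity grows quickly for high-turnover strategies with small positions.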

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
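
A minimal sketch of this ordering, assuming orders are plain dictionaries with a `side` key (a hypothetical shape, not ZVT's order type). Python's sort is stable, so the original order is kept within each side.

```python
def order_for_execution(orders):
    """Return orders with all sells first, then buys, preserving relative order."""
    return sorted(orders, key=lambda order: 0 if order["side"] == "sell" else 1)


pending = [
    {"entity_id": "stock_sh_600000", "side": "buy"},
    {"entity_id": "stock_sz_000001", "side": "sell"},
    {"entity_id": "stock_sh_600519", "side": "buy"},
    {"entity_id": "stock_sz_000002", "side": "sell"},
]

cycle = order_for_execution(pending)
```

Selling first frees cash within the same cycle, which matters because proceeds from a day-T sale can fund day-T purchases.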

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
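
A format check for this lock could look like the sketch below. The allowed character classes are an assumption inferred from the examples in this document (`stock_sh_600000`, `stockus_nasdaq_AAPL`); `parse_entity_id` is a hypothetical helper.

```python
import re

# Assumed pattern: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID = re.compile(
    r"^(?P<entity_type>[a-z]+)_(?P<exchange>[a-z]+)_(?P<code>[A-Za-z0-9.]+)$"
)


def parse_entity_id(entity_id: str) -> dict:
    """Split an entity ID into its three parts, or raise ValueError."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(
            f"entity id {entity_id!r} does not follow entity_type_exchange_code"
        )
    return match.groupdict()


parts = parse_entity_id("stock_sh_600000")
```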

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
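
Locks such as `SL-03` and `SL-05` are easy to check mechanically before any backtest code runs. The sketch below is only an illustration under stated assumptions: the regular expression is a guess at what `entity_type_exchange_code` permits, and `SignalSizing` is a hypothetical stand-in for the sizing fields of a trading signal, not a class from zvt.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape for SL-03, e.g. stock_sh_600000 or stockus_nasdaq_AAPL.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9.]+")


def is_valid_entity_id(entity_id: str) -> bool:
    """Return True when the ID looks like entity_type_exchange_code."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


@dataclass
class SignalSizing:
    """Hypothetical container for the three mutually exclusive sizing fields."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        fields = (self.position_pct, self.order_money, self.order_amount)
        given = [v for v in fields if v is not None]
        # SL-05: exactly one sizing field may be set.
        if len(given) != 1:
            raise ValueError(
                "SL-05: set exactly one of position_pct, order_money, order_amount"
            )
```

Raising at construction time keeps an ambiguous signal from ever reaching the execution step.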

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **10**

## `KUC-101`
**Source**: `docs/source/conf.py`

Configuring Sphinx documentation builder for the czsc project, ensuring proper Python path setup and Rust version priority.

## `KUC-102`
**Source**: `examples/develop/czsc_benchmark.py`

Benchmarking CZSC analysis performance with varying K-line counts to measure initialization speed and memory usage.

## `KUC-103`
**Source**: `examples/develop/test_trading_view_kline.py`

Testing and demonstrating K-line visualization using trading_view_kline function with mock data.

## `KUC-104`
**Source**: `examples/signals_dev/bar_volatility_V241013.py`

Classifying market volatility into three tiers (low/middle/high) based on recent K-line price ranges for signal generation.

## `KUC-105`
**Source**: `examples/signals_dev/signal_match.py`

Parsing and analyzing signal definitions from czsc.signals module using SignalsParser for research and configuration purposes.

## `KUC-106`
**Source**: `examples/use_backtest_report.py`

Generating HTML and PDF backtest reports from trading strategy performance data for analysis and presentation.

## `KUC-107`
**Source**: `examples/use_cta_research.py`

Using CTAResearch framework to develop and test CTA trading strategies with mock data through backtesting.

## `KUC-108`
**Source**: `examples/use_html_report_builder.py`

Creating flexible HTML reports with custom headers, performance metrics, charts, and tables using HtmlReportBuilder.

## `KUC-109`
**Source**: `examples/use_optimize.py`

Optimizing entry and exit trading signals by systematically searching candidate signal combinations to find optimal parameters.

## `KUC-110`
**Source**: `examples/事件策略研究工具使用案例.ipynb`

Researching event-based trading strategies using CZSC objects for K-line analysis, bi (stroke) detection, and chart visualization.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

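## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as the single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies over multiple data sources. zvt's StockTrader currently binds one set of factors per instance and lacks a unified orchestration layer for multi-strategy portfolios; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.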

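## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report simply by calling cerebro.addanalyzer(SharpeRatio).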

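## `CW-BT-003` — Sizer: position sizing kept separate
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares, or what proportion, to buy on each entry) into a separate Sizer that is fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.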

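## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration, fuller order-type support would substantially improve backtest realism.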

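## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample low-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive strategies in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently implements multiple timeframes by pre-recording K-lines at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests over arbitrary combinations of time granularity without recording the data again.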

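## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy equity curve, maximum drawdown, daily P&L, and trade details, with no Jupyter Notebook required. zvt's backtest visualization currently relies on calling Plotly through the draw_result method and has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.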

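## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy 4.0's vnpy.alpha.lab provides an integrated workflow covering data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.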

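## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was actually known at t (financial report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection may be biased by financial data from the future; introducing the PIT pattern would substantially improve backtest credibility.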

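## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment/Recorder, which automatically record the hyperparameters, features, metrics, and predictions of every model training run and support cross-experiment comparison and model version management. zvt currently lacks an ML experiment-tracking mechanism, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment, supporting quick reproduction and version comparison.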

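## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling intraday optimal-execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently assume daily-bar fills only and lack execution-algorithm modeling; borrowing the nested framework would let zvt separate two questions: "which stocks to hold and when" and "how to build the position with minimal market impact".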

FILE:references/components/backtest_-_performance_analysis.md
# backtest_&_performance_analysis (6 classes)

## `WeightBacktest.evaluate`
`backtest_&_performance_analysis/weightbacktest-evaluate.py:0`

## `PairsPerformance.calculate`
`backtest_&_performance_analysis/pairsperformance-calculate.py:0`

## `KlineChart.render`
`backtest_&_performance_analysis/klinechart-render.py:0`

## `evaluate_holds`
`backtest_&_performance_analysis/evaluate-holds.py:0`

## `report_format`
`backtest_&_performance_analysis/report-format.py:0`

## `performance_metrics`
`backtest_&_performance_analysis/performance-metrics.py:0`

FILE:references/components/chan_theory_analysis.md
# chan_theory_analysis (5 classes)

## `CZSC.update`
`chan_theory_analysis/czsc-update.py:0`

## `check_bi`
`chan_theory_analysis/check-bi.py:0`

## `remove_include`
`chan_theory_analysis/remove-include.py:0`

## `bi_recognition_algorithm`
`chan_theory_analysis/bi-recognition-algorithm.py:0`

## `kline_processing`
`chan_theory_analysis/kline-processing.py:0`

FILE:references/components/data_collection_layer.md
# data_collection_layer (5 classes)

## `DataClient.get_bars`
`data_collection_layer/dataclient-get-bars.py:0`

## `BarGenerator.update`
`data_collection_layer/bargenerator-update.py:0`

## `FeishuApiBase.upload_file`
`data_collection_layer/feishuapibase-upload-file.py:0`

## `data_source`
`data_collection_layer/data-source.py:0`

## `cache_backend`
`data_collection_layer/cache-backend.py:0`

FILE:references/components/event_-_position_management.md
# event_&_position_management (5 classes)

## `Event.is_match`
`event_&_position_management/event-is-match.py:0`

## `Position.update`
`event_&_position_management/position-update.py:0`

## `Position._can_close_today`
`event_&_position_management/position-can-close-today.py:0`

## `risk_controls`
`event_&_position_management/risk-controls.py:0`

## `reentry_policy`
`event_&_position_management/reentry-policy.py:0`

FILE:references/components/signal_computation.md
# signal_computation (5 classes)

## `CzscSignals.update_signals`
`signal_computation/czscsignals-update-signals.py:0`

## `SignalsParser.parse`
`signal_computation/signalsparser-parse.py:0`

## `get_signals_by_conf`
`signal_computation/get-signals-by-conf.py:0`

## `signal_library`
`signal_computation/signal-library.py:0`

## `frequency_selection`
`signal_computation/frequency-selection.py:0`

FILE:references/components/trading_execution.md
# trading_execution (7 classes)

## `CzscTrader.update`
`trading_execution/czsctrader-update.py:0`

## `CzscStrategyBase.positions`
`trading_execution/czscstrategybase-positions.py:0`

## `CzscTrader.get_ensemble_pos`
`trading_execution/czsctrader-get-ensemble-pos.py:0`

## `DummyBacktest.on_sig`
`trading_execution/dummybacktest-on-sig.py:0`

## `ensemble_method`
`trading_execution/ensemble-method.py:0`

## `execution_mode`
`trading_execution/execution-mode.py:0`

## `strategy_base`
`trading_execution/strategy-base.py:0`


Cuemacro Finmarket

Skill

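A financial market backtesting framework that supports backtesting technical-indicator strategies on FX G10 currency pairs, storing high-frequency tick data in ArcticDB locally and on S3, and fetching and caching market data from sources such as Quandl.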

---
name: cuemacro-finmarket
description: |-
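  A financial market backtesting framework that supports backtesting technical-indicator strategies on FX G10 currency pairs, storing high-frequency tick data in ArcticDB locally and on S3, and fetching and caching market data from sources such as Quandl.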
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-108"
  compiled_at: "2026-04-22T13:00:51.768652+00:00"
  capability_markets: "multi-market"
  capability_activities: "portfolio-analytics"
  sop_version: "crystal-compilation-v6.1"
---
# Cuemacro Market Tools (cuemacro-finmarket)

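> A financial market backtesting framework that supports backtesting technical-indicator strategies on FX G10 currency pairs, storing high-frequency tick data in ArcticDB locally and on S3, and fetching and caching market data from sources such as Quandl.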

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (4 total)

### ArcticDB Tick Data Storage (`UC-101`)
Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both local LMDB and S3 cloud storage backends for efficient time series data management.
**Triggers**: arcticdb, tick data storage, time series database

### Market Data Fetching from Vendors (`UC-103`)
Fetches economic and financial market data from external vendors like Quandl, demonstrating how to request and cache market data with specific fields and date ranges.
**Triggers**: market data, quandl, fetch data

### S3 Cloud Storage for Tick Data (`UC-104`)
Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet format for efficient compression and retrieval of historical FX tick data.
**Triggers**: s3 storage, aws, parquet

For all **4** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-PORTFOLIO-ANALYTICS-001`**: Division by zero in price ratio calculations corrupts rebalancing
- **`AP-PORTFOLIO-ANALYTICS-002`**: Look-ahead bias from unshifted signal generation and position calculations
- **`AP-PORTFOLIO-ANALYTICS-003`**: Non-positive-semidefinite covariance matrix breaks CVXPY optimization

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-108. Evidence verify ratio = 32.0% and audit fail total = 18. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full source of truth | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-108` blueprint at 2026-04-22T13:00:51.768652+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Market Data Fetching from Vendors', 'FX G10 Cross Backtesting', 'ArcticDB Tick Data Storage', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-066--wealthbot (2)

### `AP-PORTFOLIO-ANALYTICS-001` — Division by zero in price ratio calculations corrupts rebalancing <sub>(high)</sub>

When calculating price_diff using current_price divided by old_price without validating old_price is non-zero, the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals zero produces NaN/infinity that propagates to all subsequent trade decisions.

### `AP-PORTFOLIO-ANALYTICS-004` — Incorrect portfolio value tracking destroys time-series integrity <sub>(high)</sub>

Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent ClientPortfolio via proper relationships to avoid orphaned records.

## finance-bp-068--xalpha (1)

### `AP-PORTFOLIO-ANALYTICS-006` — FIFO sell order violation corrupts cost basis and XIRR <sub>(high)</sub>

Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment, leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied, causing direct financial loss.

## finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)

### `AP-PORTFOLIO-ANALYTICS-010` — Missing DataFrame schema validation causes KeyError propagation <sub>(medium)</sub>

Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError, or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue, comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing columns cause backtest calculations to fail with NaN values or KeyError.

## finance-bp-082--stock-screener (1)

### `AP-PORTFOLIO-ANALYTICS-007` — Score validation bypass allows invalid composite calculations <sub>(medium)</sub>

Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable screening results that violate the fundamental score contract. When combined with division-by-zero guards that return 0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations and incorrect Strong Buy/Buy/Watch/Pass ratings.

## finance-bp-093--PyPortfolioOpt (1)

### `AP-PORTFOLIO-ANALYTICS-008` — Convex optimization constraints violate DCP rules <sub>(high)</sub>

Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats (not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum or target_return above maximum achievable return make problems infeasible.

## finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)

### `AP-PORTFOLIO-ANALYTICS-003` — Non-positive-semidefinite covariance matrix breaks CVXPY optimization <sub>(high)</sub>

Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require explicit PSD validation before optimization.

## finance-bp-106--pyfolio-reloaded (2)

### `AP-PORTFOLIO-ANALYTICS-005` — Allocation denominator excludes cash, corrupting portfolio composition <sub>(medium)</sub>

When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to inappropriate risk management decisions.

### `AP-PORTFOLIO-ANALYTICS-009` — Transaction data corruption from missing columns and invalid dates <sub>(medium)</sub>

Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol) causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day trades to be incorrectly split across days.

## finance-bp-107--empyrical-reloaded (1)

### `AP-PORTFOLIO-ANALYTICS-011` — Wrong annualization factors distort cross-frequency metric comparison <sub>(high)</sub>

Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies) produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated capital. The Sharpe ratio formula must use correct annualization with sample standard deviation (ddof=1), otherwise producing misleading risk-adjusted return estimates.

## finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit (1)

### `AP-PORTFOLIO-ANALYTICS-012` — Misaligned time series in alpha/beta calculation produces invalid factor analysis <sub>(high)</sub>

Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels (pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values of approximately 1.0.

## finance-bp-108--finmarketpy (1)

### `AP-PORTFOLIO-ANALYTICS-013` — Forward-filling spot prices creates look-ahead bias in TRI construction <sub>(high)</sub>

Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns, invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index chain.

## finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded (1)

### `AP-PORTFOLIO-ANALYTICS-002` — Look-ahead bias from unshifted signal generation and position calculations <sub>(high)</sub>

Generating trading signals from current-period technical indicators (RSI, moving averages) without proper shift(-1) creates look-ahead bias, causing live trading returns to fall far below backtested results. Similarly, when estimating intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated with end-of-day values, making results unrepresentative of actual trading.
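
A small sketch may help make the timing concrete. It uses plain lists rather than any particular library, because the direction of a `shift` call depends on which series is moved and on the library's sign convention. The function names are hypothetical, and the assumption is that a signal computed on bar t can only be traded on bar t+1.

```python
def next_bar_positions(signals):
    """Lag signals by one bar: the position held during bar t is the signal from bar t-1."""
    # No signal exists before the first bar, so start flat.
    return [0] + list(signals[:-1])


def strategy_returns(signals, bar_returns):
    """Per-bar strategy returns with next-bar execution (no look-ahead)."""
    positions = next_bar_positions(signals)
    return [p * r for p, r in zip(positions, bar_returns)]


signals = [1, 1, 0, 1]             # computed at the close of each bar
bar_returns = [0.1, -0.2, 0.3, 0.4]
realized = strategy_returns(signals, bar_returns)
```

Multiplying `signals` by `bar_returns` directly, without the lag, would credit the strategy with the return of the same bar that produced the signal, which is the bias described above.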

## finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt (1)

### `AP-PORTFOLIO-ANALYTICS-014` — Unsupported solver selection breaks advanced risk calculations <sub>(medium)</sub>

Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt both require careful solver selection.

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-108--finmarketpy
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 33, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [market_data_collection](components/market_data_collection.md): 4 classes
- [technical_indicator_&_signal_generation](components/technical_indicator_-_signal_generation.md): 5 classes
- [total_return_index_construction](components/total_return_index_construction.md): 6 classes
- [fx_volatility_surface_&_pricing](components/fx_volatility_surface_-_pricing.md): 6 classes
- [strategy_backtesting_engine](components/strategy_backtesting_engine.md): 7 classes
- [trade_analysis_&_reporting](components/trade_analysis_-_reporting.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 82
  fatal_constraints_count: 18
  non_fatal_constraints_count: 132
  use_cases_count: 4
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
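
For `SL-08`, pinning the MACD parameters in one place keeps them from drifting between call sites. The sketch below is a plain-Python illustration and not the implementation used by any referenced library: the EMA is seeded with the first value and uses `alpha = 2 / (span + 1)`, which is one common convention and is assumed here.

```python
# SL-08: locked MACD parameters.
MACD_FAST, MACD_SLOW, MACD_SIGNAL = 12, 26, 9


def ema(values, span):
    """Exponential moving average, seeded with the first value (assumed convention)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes):
    """Return (diff, signal, histogram) lists using the locked parameters."""
    fast = ema(closes, MACD_FAST)
    slow = ema(closes, MACD_SLOW)
    diff = [f - s for f, s in zip(fast, slow)]
    signal = ema(diff, MACD_SIGNAL)
    hist = [d - s for d, s in zip(diff, signal)]
    return diff, signal, hist


rising = [float(i) for i in range(1, 61)]
diff, signal, hist = macd(rising)
```

On a steadily rising series the fast EMA stays above the slow one, so `diff` ends positive; on a flat series all three outputs are zero.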

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **4**

## `KUC-101`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/arcticdb_example.ipynb`

Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both local LMDB and S3 cloud storage backends for efficient time series data management.

## `KUC-102`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/backtest_example.ipynb`

Enables historical backtesting of FX trading strategies using G10 currency pairs with technical indicator-based signal generation to evaluate strategy performance.

## `KUC-103`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/market_data_example.ipynb`

Fetches economic and financial market data from external vendors like Quandl, demonstrating how to request and cache market data with specific fields and date ranges.

## `KUC-104`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/s3_bucket_example.ipynb`

Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet format for efficient compression and retrieval of historical FX tick data.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

## `CW-PORTFOLIO-ANALYTICS-001` — Defensive zero-division guards with explicit handling
**From**: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt · **Applicable to**: portfolio-analytics

Always guard division operations with explicit zero-value checks before executing. In price ratio calculations, filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream calculations and crashes pipelines.
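
The two guards described above can be sketched as follows. The function names are illustrative and do not come from wealthbot or stock-screener; returning `None` for an unusable price is an assumption made so that callers can filter such securities out explicitly.

```python
def price_ratio(current_price, old_price):
    """Return current/old, or None when old_price is zero or missing."""
    if not old_price:
        return None
    return current_price / old_price


def composite_score(scores, weights):
    """Weighted average that returns 0.0 for empty input or zero total weight."""
    total_weight = sum(weights)
    if not scores or total_weight == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


prices = {"A": (10.0, 5.0), "B": (3.0, 0.0)}
ratios = {k: price_ratio(cur, old) for k, (cur, old) in prices.items()}
# Keep only securities with a usable ratio before any rebalancing step.
usable = {k: r for k, r in ratios.items() if r is not None}
```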

## `CW-PORTFOLIO-ANALYTICS-002` — Covariance matrix positive-semidefiniteness verification
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Always verify covariance matrix is positive-semidefinite before passing to CVXPY optimization. Apply eigenvalue clipping if violated, as non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib enforce this constraint to prevent optimizer from finding mathematically invalid solutions or crashing entirely.

## `CW-PORTFOLIO-ANALYTICS-003` — Geometric compounding for cumulative returns
**From**: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics

Compute cumulative returns using geometric compounding via cumprod(1 + returns), never arithmetic cumulation via cumsum. Arithmetic cumulative sum overstates gains and understates losses, causing cumulative returns to diverge significantly from actual portfolio performance over volatile periods. This principle applies to total return index construction and any cumulative performance calculation.
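
A two-period example shows the gap between the two methods. This is a standard-library sketch; `cumulative_returns` is a hypothetical helper that mirrors what `cumprod(1 + returns) - 1` computes.

```python
import operator
from itertools import accumulate


def cumulative_returns(returns):
    """Geometric compounding: running product of (1 + r), minus 1."""
    growth = accumulate((1.0 + r for r in returns), operator.mul)
    return [g - 1.0 for g in growth]


returns = [0.5, -0.5]
geometric = cumulative_returns(returns)   # [0.5, -0.25]
arithmetic = sum(returns)                 # 0.0
```

A 50% gain followed by a 50% loss leaves the portfolio 25% below its starting value, yet the arithmetic sum reports break-even.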

## `CW-PORTFOLIO-ANALYTICS-004` — Temporal shift enforcement to prevent look-ahead bias
**From**: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded · **Applicable to**: portfolio-analytics

Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from backtested results.

## `CW-PORTFOLIO-ANALYTICS-005` — DCP-compliant convex optimization construction
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility, return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing optimization entirely.

## `CW-PORTFOLIO-ANALYTICS-006` — Correct Sharpe ratio formula with risk-free rate subtraction
**From**: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit · **Applicable to**: portfolio-analytics

Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.
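
A minimal NumPy sketch of the formula, assuming a constant per-period risk-free rate and 252 periods per year; the function name and defaults are choices made for this example rather than the empyrical or FinanceToolkit API.

```python
import numpy as np


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio using the sample standard deviation (ddof=1)."""
    excess = np.asarray(returns, dtype=float) - risk_free
    if excess.size < 2:
        return float("nan")  # the sample standard deviation needs two points
    volatility = excess.std(ddof=1)
    if volatility == 0.0:
        return float("nan")  # avoid dividing by zero on a flat series
    return float(excess.mean() / volatility * np.sqrt(periods_per_year))
```

With a constant `risk_free`, the standard deviation of the excess returns equals the standard deviation of the raw returns, so the sketch matches the formula as stated.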

## `CW-PORTFOLIO-ANALYTICS-007` — Immutable FIFO position tracking with chronological ordering
**From**: finance-bp-068--xalpha, finance-bp-066--wealthbot · **Applicable to**: portfolio-analytics

Maintain FIFO position tracking with strictly increasing date order for position entries. Use copy() function to create independent copies before mutating remtable to avoid side effects. Enforce chronological ordering in sell operations to ensure correct cost basis and holding period calculation, particularly important for funds with tiered fees by holding period.
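
A simplified sketch of the idea in plain Python. It is not xalpha's `remtable` implementation; the lot layout `(purchase date, remaining shares)` and the function name `sell_fifo` are assumptions for illustration.

```python
from copy import deepcopy
from datetime import date
from typing import List, Tuple

Lot = Tuple[date, float]  # (purchase date, remaining shares)


def sell_fifo(lots: List[Lot], shares_to_sell: float) -> Tuple[List[Lot], List[Lot]]:
    """Return (remaining lots, consumed lots) without mutating the input."""
    dates = [lot_date for lot_date, _ in lots]
    if any(later <= earlier for earlier, later in zip(dates, dates[1:])):
        raise ValueError("lots must be in strictly increasing date order")

    remaining = deepcopy(lots)  # independent copy; the caller's list is untouched
    consumed: List[Lot] = []
    left = shares_to_sell
    while left > 0 and remaining:
        lot_date, lot_shares = remaining[0]  # the oldest lot is sold first
        take = min(lot_shares, left)
        consumed.append((lot_date, take))
        left -= take
        if take == lot_shares:
            remaining.pop(0)
        else:
            remaining[0] = (lot_date, lot_shares - take)
    if left > 0:
        raise ValueError("cannot sell more shares than are held")
    return remaining, consumed


lots = [(date(2024, 1, 1), 100.0), (date(2024, 2, 1), 50.0)]
remaining, consumed = sell_fifo(lots, 120.0)
```

The consumed lots keep their purchase dates, which is what a holding-period fee schedule would need to price each slice of the sale.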

## `CW-PORTFOLIO-ANALYTICS-008` — Validation at system boundaries with descriptive errors
**From**: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics

Enforce validation at system boundaries with descriptive error messages. Validate that expected returns match the covariance matrix dimensions, that score values are within [0, 100] and confidence values within [0, 1], and that required DataFrame columns are present. Invalid inputs should raise ValueError with a descriptive message that lists the valid options, preventing silent failures and corrupted calculations.
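
These boundary checks can be sketched as small standalone validators. All function names are hypothetical, and the required column names are borrowed from the transactions example in `AP-PORTFOLIO-ANALYTICS-009` purely for illustration.

```python
from typing import Iterable, Sequence

REQUIRED_COLUMNS = ("amount", "price", "symbol")  # illustrative column set


def validate_score(score: float) -> float:
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within [0, 100], got {score}")
    return score


def validate_confidence(confidence: float) -> float:
    if not 0 <= confidence <= 1:
        raise ValueError(f"confidence must be within [0, 1], got {confidence}")
    return confidence


def validate_columns(
    columns: Iterable[str], required: Sequence[str] = REQUIRED_COLUMNS
) -> None:
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(
            f"missing required columns {missing}; expected all of {list(required)}"
        )


def validate_dimensions(
    expected_returns: Sequence[float], covariance: Sequence[Sequence[float]]
) -> None:
    n_assets = len(expected_returns)
    if len(covariance) != n_assets or any(len(row) != n_assets for row in covariance):
        raise ValueError(
            f"covariance must be {n_assets}x{n_assets} to match expected returns"
        )
```

With a pandas DataFrame, `validate_columns(df.columns)` would be the natural call, since the helper only needs an iterable of column names.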

## `CW-PORTFOLIO-ANALYTICS-009` — Decimal rounding for monetary calculations
**From**: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics

Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material over many transactions.
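
A sketch using the standard library `decimal` module. The helper `round_money` is a stand-in written for this example and is not xalpha's `myround`; the choice of `ROUND_HALF_UP` and a one-cent quantum are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_money(value, quantum=CENT):
    """Convert to Decimal via str() and round half-up to the given quantum."""
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is not 0.3.
float_total = 0.1 + 0.2

# Decimal arithmetic on rounded amounts stays exact.
decimal_total = round_money("0.1") + round_money("0.2")

# Shares bought with 1000 at a net value of 1.2345, rounded at this step.
shares = round_money(Decimal("1000") / Decimal("1.2345"))
```

Rounding at each step, rather than once at the end, keeps every intermediate amount equal to what a statement would actually show.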

## `CW-PORTFOLIO-ANALYTICS-010` — Cash flow sign convention enforcement
**From**: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha · **Applicable to**: portfolio-analytics

Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs cause NPV calculation to invert, producing negative returns for profitable trades and vice versa. Verify sum of round trip PnLs equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.
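
The effect of the sign convention on NPV can be shown in a few lines of plain Python. The cash flows are invented, and `npv` is a textbook discounting helper written for this example, not the xalpha or pyfolio implementation.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value with one cash flow per period, starting at t=0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Buy for 100 today (outflow, negative), sell for 110 one period later
# (inflow, positive).
correct_flows = [-100.0, 110.0]

# The same trade recorded with the signs inverted.
flipped_flows = [100.0, -110.0]

profit_correct = npv(0.0, correct_flows)  # profitable trade, positive NPV
profit_flipped = npv(0.0, flipped_flows)  # same trade now looks like a loss
```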

FILE:references/components/fx_volatility_surface_-_pricing.md
# fx_volatility_surface_&_pricing (6 classes)

## `FXVolSurface.build_vol_surface`
`fx_volatility_surface_&_pricing/fxvolsurface-build-vol-surface.py:0`

## `FXOptionsPricer.price_option`
`fx_volatility_surface_&_pricing/fxoptionspricer-price-option.py:0`

## `FXForwardsPricer.price_odd_date`
`fx_volatility_surface_&_pricing/fxforwardspricer-price-odd-date.py:0`

## `VolStats.calculate`
`fx_volatility_surface_&_pricing/volstats-calculate.py:0`

## `vol_function_type`
`fx_volatility_surface_&_pricing/vol-function-type.py:0`

## `pricing_engine`
`fx_volatility_surface_&_pricing/pricing-engine.py:0`

FILE:references/components/market_data_collection.md
# market_data_collection (4 classes)

## `MarketDataRequest.__init__`
`market_data_collection/marketdatarequest-init.py:0`

## `Market.fetch_market_data`
`market_data_collection/market-fetch-market-data.py:0`

## `SpeedCache.generate_key`
`market_data_collection/speedcache-generate-key.py:0`

## `data_source`
`market_data_collection/data-source.py:0`

FILE:references/components/strategy_backtesting_engine.md
# strategy_backtesting_engine (7 classes)

## `Backtest.calculate_trading_PnL`
`strategy_backtesting_engine/backtest-calculate-trading-pnl.py:0`

## `TradingModel.construct_strategy`
`strategy_backtesting_engine/tradingmodel-construct-strategy.py:0`

## `PortfolioWeightConstruction.optimize_portfolio_weights`
`strategy_backtesting_engine/portfolioweightconstruction-optimize-por.py:0`

## `RiskEngine.calculate_leverage_factor`
`strategy_backtesting_engine/riskengine-calculate-leverage-factor.py:0`

## `portfolio_combination`
`strategy_backtesting_engine/portfolio-combination.py:0`

## `signal_delay`
`strategy_backtesting_engine/signal-delay.py:0`

## `portfolio_vol_adjust`
`strategy_backtesting_engine/portfolio-vol-adjust.py:0`

FILE:references/components/technical_indicator_-_signal_generation.md
# technical_indicator_&_signal_generation (5 classes)

## `TechIndicator.create_tech_ind`
`technical_indicator_&_signal_generation/techindicator-create-tech-ind.py:0`

## `TechParams.__init__`
`technical_indicator_&_signal_generation/techparams-init.py:0`

## `EventsFactory.create_event_signal`
`technical_indicator_&_signal_generation/eventsfactory-create-event-signal.py:0`

## `indicator_type`
`technical_indicator_&_signal_generation/indicator-type.py:0`

## `signal_direction_filter`
`technical_indicator_&_signal_generation/signal-direction-filter.py:0`

FILE:references/components/total_return_index_construction.md
# total_return_index_construction (6 classes)

## `FXSpotCurve.construct_total_returns_index`
`total_return_index_construction/fxspotcurve-construct-total-returns-inde.py:0`

## `FXForwardsCurve.roll_contracts`
`total_return_index_construction/fxforwardscurve-roll-contracts.py:0`

## `FXOptionsCurve.construct_tri`
`total_return_index_construction/fxoptionscurve-construct-tri.py:0`

## `AbstractCurve.generate_key`
`total_return_index_construction/abstractcurve-generate-key.py:0`

## `roll_event`
`total_return_index_construction/roll-event.py:0`

## `construct_via_currency`
`total_return_index_construction/construct-via-currency.py:0`

FILE:references/components/trade_analysis_-_reporting.md
# trade_analysis_&_reporting (5 classes)

## `TradeAnalysis.analyse_strategy`
`trade_analysis_&_reporting/tradeanalysis-analyse-strategy.py:0`

## `BacktestComparison.compare`
`trade_analysis_&_reporting/backtestcomparison-compare.py:0`

## `Report.generate`
`trade_analysis_&_reporting/report-generate.py:0`

## `Seasonality.detect`
`trade_analysis_&_reporting/seasonality-detect.py:0`

## `analysis_engine`
`trade_analysis_&_reporting/analysis-engine.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-108-v5.3
  version: v6.1
  blueprint_id: finance-bp-108
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:51.768652+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  upgraded_from: finance-bp-108-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:29.151803+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-108--finmarketpy/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-108--finmarketpy/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-PORTFOLIO-ANALYTICS-001
  title: Division by zero in price ratio calculations corrupts rebalancing
  description: When calculating price_diff using current_price divided by old_price without validating old_price is non-zero,
    the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions
    based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals
    zero produces NaN/infinity that propagates to all subsequent trade decisions.
  project_source: finance-bp-066--wealthbot
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-002
  title: Look-ahead bias from unshifted signal generation and position calculations
  description: Generating trading signals from current-period technical indicators (RSI, moving averages) without proper shift(-1)
    creates look-ahead bias, causing live trading returns to fall far below backtested results. Similarly, when estimating
    intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated
    with end-of-day values, making results unrepresentative of actual trading.
  project_source: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-003
  title: Non-positive-semidefinite covariance matrix breaks CVXPY optimization
  description: Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect
    results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce
    garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require
    explicit PSD validation before optimization.
  project_source: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-004
  title: Incorrect portfolio value tracking destroys time-series integrity
  description: Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity
    needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations
    and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent
    ClientPortfolio via proper relationships to avoid orphaned records.
  project_source: finance-bp-066--wealthbot
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-005
  title: Allocation denominator excludes cash, corrupting portfolio composition
  description: When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages
    will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially
    skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to
    inappropriate risk management decisions.
  project_source: finance-bp-106--pyfolio-reloaded
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-006
  title: FIFO sell order violation corrupts cost basis and XIRR
  description: Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment,
    leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based
    on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied,
    causing direct financial loss.
  project_source: finance-bp-068--xalpha
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-007
  title: Score validation bypass allows invalid composite calculations
  description: Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable
    screening results that violate the fundamental score contract. When combined with division-by-zero guards that return
    0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations
    and incorrect Strong Buy/Buy/Watch/Pass ratings.
  project_source: finance-bp-082--stock-screener
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-008
  title: Convex optimization constraints violate DCP rules
  description: Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely
    preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats
    (not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum
    or target_return above maximum achievable return make problems infeasible.
  project_source: finance-bp-093--PyPortfolioOpt
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-009
  title: Transaction data corruption from missing columns and invalid dates
  description: Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol)
    causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate
    data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day
    trades to be incorrectly split across days.
  project_source: finance-bp-106--pyfolio-reloaded
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-010
  title: Missing DataFrame schema validation causes KeyError propagation
  description: Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError,
    or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue,
    comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing
    columns cause backtest calculations to fail with NaN values or KeyError.
  project_source: finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-011
  title: Wrong annualization factors distort cross-frequency metric comparison
  description: Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies)
    produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated
    capital. The Sharpe ratio formula must use correct annualization with sample standard deviation (ddof=1), otherwise producing
    misleading risk-adjusted return estimates.
  project_source: finance-bp-107--empyrical-reloaded
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-012
  title: Misaligned time series in alpha/beta calculation produces invalid factor analysis
  description: Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels
    (pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched
    periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values
    of approximately 1.0.
  project_source: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-013
  title: Forward-filling spot prices creates look-ahead bias in TRI construction
  description: Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns,
    invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using
    cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index
    chain.
  project_source: finance-bp-108--finmarketpy
  severity: high
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-014
  title: Unsupported solver selection breaks advanced risk calculations
  description: Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail
    with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR
    calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt
    both require careful solver selection.
  project_source: finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt
  severity: medium
  applicable_to_tags:
    markets:
    - multi-market
    activities:
    - portfolio-analytics
  _source_file: anti-patterns/portfolio-analytics.yaml
cross_project_wisdom:
- wisdom_id: CW-PORTFOLIO-ANALYTICS-001
  source_project: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt
  pattern_name: Defensive zero-division guards with explicit handling
  description: Always guard division operations with explicit zero-value checks before executing. In price ratio calculations,
    filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against
    total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream
    calculations and crashes pipelines.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-002
  source_project: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
  pattern_name: Covariance matrix positive-semidefiniteness verification
  description: Always verify covariance matrix is positive-semidefinite before passing to CVXPY optimization. Apply eigenvalue
    clipping if violated, as non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib
    enforce this constraint to prevent optimizer from finding mathematically invalid solutions or crashing entirely.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-003
  source_project: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded
  pattern_name: Geometric compounding for cumulative returns
  description: Compute cumulative returns using geometric compounding via cumprod(1 + returns), never arithmetic cumulation
    via cumsum. An arithmetic cumulative sum ignores compounding, so it diverges from actual portfolio performance, most
    visibly over volatile periods (a gain of 50% followed by a loss of 50% sums to 0% but compounds to a 25% loss). This
    principle applies to total return index construction and any cumulative performance calculation.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-004
  source_project: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded
  pattern_name: Temporal shift enforcement to prevent look-ahead bias
  description: Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals
    to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and
    backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from
    backtested results.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-005
  source_project: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
  pattern_name: DCP-compliant convex optimization construction
  description: Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions
    accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility,
    return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing
    optimization entirely.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-006
  source_project: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit
  pattern_name: Correct Sharpe ratio formula with risk-free rate subtraction
  description: Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard
    deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation
    produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-007
  source_project: finance-bp-068--xalpha, finance-bp-066--wealthbot
  pattern_name: Immutable FIFO position tracking with chronological ordering
  description: Maintain FIFO position tracking with strictly increasing date order for position entries. Use copy() function
    to create independent copies before mutating remtable to avoid side effects. Enforce chronological ordering in sell operations
    to ensure correct cost basis and holding period calculation, particularly important for funds with tiered fees by holding
    period.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-008
  source_project: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
  pattern_name: Validation at system boundaries with descriptive errors
  description: Enforce validation at system boundaries with descriptive error messages. Validate that expected returns
    match the covariance matrix dimensions, that score values are within [0, 100] and confidence values within [0, 1],
    and that required DataFrame columns are present. Invalid inputs should raise ValueError with a descriptive message
    that lists the valid options, preventing silent failures and corrupted calculations.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-009
  source_project: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded
  pattern_name: Decimal rounding for monetary calculations
  description: Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that
    cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio
    valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material
    over many transactions.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-010
  source_project: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha
  pattern_name: Cash flow sign convention enforcement
  description: Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs cause NPV
    calculation to invert, producing negative returns for profitable trades and vice versa. Verify sum of round trip PnLs
    equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.
  applicable_to_activity: portfolio-analytics
  _source_file: cross-project-wisdom/portfolio-analytics.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: finmarketpy_examples/finmarketpy_notebooks/arcticdb_example.ipynb
  business_problem: Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both local
    LMDB and S3 cloud storage backends for efficient time series data management.
  intent_keywords:
  - arcticdb
  - tick data storage
  - time series database
  - lmdb
  - market data persistence
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
- kuc_id: KUC-102
  source_file: finmarketpy_examples/finmarketpy_notebooks/backtest_example.ipynb
  business_problem: Enables historical backtesting of FX trading strategies using G10 currency pairs with technical indicator-based
    signal generation to evaluate strategy performance.
  intent_keywords:
  - backtest
  - fx trading
  - g10 currency
  - technical indicators
  - strategy testing
  stage: backtesting
  data_domain: trading_data
  type: trading_strategy
- kuc_id: KUC-103
  source_file: finmarketpy_examples/finmarketpy_notebooks/market_data_example.ipynb
  business_problem: Fetches economic and financial market data from external vendors like Quandl, demonstrating how to request
    and cache market data with specific fields and date ranges.
  intent_keywords:
  - market data
  - quandl
  - fetch data
  - vendor data
  - interest rates
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
- kuc_id: KUC-104
  source_file: finmarketpy_examples/finmarketpy_notebooks/s3_bucket_example.ipynb
  business_problem: Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet format for
    efficient compression and retrieval of historical FX tick data.
  intent_keywords:
  - s3 storage
  - aws
  - parquet
  - cloud storage
  - tick data
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
component_capability_map:
  project: finance-bp-108--finmarketpy
  scan_date: '2026-04-22'
  stats:
    total_files: 6
    total_classes: 33
    total_functions: 0
    total_stages: 6
  modules:
    market_data_collection:
      class_count: 4
      stage_id: data_collection
      stage_order: 1
      responsibility: Fetches market data from external vendors (Bloomberg, FRED, Quandl) via findatapy MarketDataRequest
        abstraction layer. Provides raw time series for downstream processing, abstracting vendor-specific ticker formats
        from strategy logic.
      classes:
      - name: MarketDataRequest.__init__
        file: market_data_collection/marketdatarequest-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: Market.fetch_market_data
        file: market_data_collection/market-fetch-market-data.py
        line: 0
        kind: required_method
        signature: ''
      - name: SpeedCache.generate_key
        file: market_data_collection/speedcache-generate-key.py
        line: 0
        kind: required_method
        signature: ''
      - name: data_source
        file: market_data_collection/data-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
    technical_indicator_&_signal_generation:
      class_count: 5
      stage_id: signal_generation
      stage_order: 2
      responsibility: Computes technical indicators (SMA, EMA, RSI, Bollinger Bands) and converts them to discrete +1/-1 trading
        signals. Acts as the core alpha generation engine, transforming raw price data into actionable directional signals.
      classes:
      - name: TechIndicator.create_tech_ind
        file: technical_indicator_&_signal_generation/techindicator-create-tech-ind.py
        line: 0
        kind: required_method
        signature: ''
      - name: TechParams.__init__
        file: technical_indicator_&_signal_generation/techparams-init.py
        line: 0
        kind: required_method
        signature: ''
      - name: EventsFactory.create_event_signal
        file: technical_indicator_&_signal_generation/eventsfactory-create-event-signal.py
        line: 0
        kind: required_method
        signature: ''
      - name: indicator_type
        file: technical_indicator_&_signal_generation/indicator-type.py
        line: 0
        kind: replaceable_point
      - name: signal_direction_filter
        file: technical_indicator_&_signal_generation/signal-direction-filter.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    total_return_index_construction:
      class_count: 6
      stage_id: curve_construction
      stage_order: 3
      responsibility: Builds continuous time series of total return indices for FX spot, forwards, and options by incorporating
        carry/roll costs and handling rolling around contract expiry dates. Provides the asset return stream for P&L calculation.
      classes:
      - name: FXSpotCurve.construct_total_returns_index
        file: total_return_index_construction/fxspotcurve-construct-total-returns-inde.py
        line: 0
        kind: required_method
        signature: ''
      - name: FXForwardsCurve.roll_contracts
        file: total_return_index_construction/fxforwardscurve-roll-contracts.py
        line: 0
        kind: required_method
        signature: ''
      - name: FXOptionsCurve.construct_tri
        file: total_return_index_construction/fxoptionscurve-construct-tri.py
        line: 0
        kind: required_method
        signature: ''
      - name: AbstractCurve.generate_key
        file: total_return_index_construction/abstractcurve-generate-key.py
        line: 0
        kind: required_method
        signature: ''
      - name: roll_event
        file: total_return_index_construction/roll-event.py
        line: 0
        kind: replaceable_point
      - name: construct_via_currency
        file: total_return_index_construction/construct-via-currency.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    fx_volatility_surface_&_pricing:
      class_count: 6
      stage_id: volatility_pricing
      stage_order: 4
      responsibility: Builds interpolated FX volatility surface from market quotes (ATM vol, 25d/10d risk reversals/strangles).
        Prices vanilla FX options using FinancePy. Computes realized volatility and volatility risk premium for vol-targeting
        adjustments.
      classes:
      - name: FXVolSurface.build_vol_surface
        file: fx_volatility_surface_&_pricing/fxvolsurface-build-vol-surface.py
        line: 0
        kind: required_method
        signature: ''
      - name: FXOptionsPricer.price_option
        file: fx_volatility_surface_&_pricing/fxoptionspricer-price-option.py
        line: 0
        kind: required_method
        signature: ''
      - name: FXForwardsPricer.price_odd_date
        file: fx_volatility_surface_&_pricing/fxforwardspricer-price-odd-date.py
        line: 0
        kind: required_method
        signature: ''
      - name: VolStats.calculate
        file: fx_volatility_surface_&_pricing/volstats-calculate.py
        line: 0
        kind: required_method
        signature: ''
      - name: vol_function_type
        file: fx_volatility_surface_&_pricing/vol-function-type.py
        line: 0
        kind: replaceable_point
      - name: pricing_engine
        file: fx_volatility_surface_&_pricing/pricing-engine.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    strategy_backtesting_engine:
      class_count: 7
      stage_id: backtesting
      stage_order: 5
      responsibility: Combines signals with asset returns to compute P&L. Applies volatility targeting, position limits, and
        transaction costs. Aggregates into portfolio returns with exposure tracking and leverage management. Core engine for
        strategy evaluation.
      classes:
      - name: Backtest.calculate_trading_PnL
        file: strategy_backtesting_engine/backtest-calculate-trading-pnl.py
        line: 0
        kind: required_method
        signature: ''
      - name: TradingModel.construct_strategy
        file: strategy_backtesting_engine/tradingmodel-construct-strategy.py
        line: 0
        kind: required_method
        signature: ''
      - name: PortfolioWeightConstruction.optimize_portfolio_weights
        file: strategy_backtesting_engine/portfolioweightconstruction-optimize-por.py
        line: 0
        kind: required_method
        signature: ''
      - name: RiskEngine.calculate_leverage_factor
        file: strategy_backtesting_engine/riskengine-calculate-leverage-factor.py
        line: 0
        kind: required_method
        signature: ''
      - name: portfolio_combination
        file: strategy_backtesting_engine/portfolio-combination.py
        line: 0
        kind: replaceable_point
      - name: signal_delay
        file: strategy_backtesting_engine/signal-delay.py
        line: 0
        kind: replaceable_point
      - name: portfolio_vol_adjust
        file: strategy_backtesting_engine/portfolio-vol-adjust.py
        line: 0
        kind: replaceable_point
      design_decision_count: 7
    trade_analysis_&_reporting:
      class_count: 5
      stage_id: analysis_reporting
      stage_order: 6
      responsibility: Post-backtest analysis including return statistics, sensitivity analysis to parameters and transaction
        costs, day-of-month effects, and comparison across multiple models. Transforms raw P&L into actionable insights.
      classes:
      - name: TradeAnalysis.analyse_strategy
        file: trade_analysis_&_reporting/tradeanalysis-analyse-strategy.py
        line: 0
        kind: required_method
        signature: ''
      - name: BacktestComparison.compare
        file: trade_analysis_&_reporting/backtestcomparison-compare.py
        line: 0
        kind: required_method
        signature: ''
      - name: Report.generate
        file: trade_analysis_&_reporting/report-generate.py
        line: 0
        kind: required_method
        signature: ''
      - name: Seasonality.detect
        file: trade_analysis_&_reporting/seasonality-detect.py
        line: 0
        kind: required_method
        signature: ''
      - name: analysis_engine
        file: trade_analysis_&_reporting/analysis-engine.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.32
    evidence_invalid: 51
    evidence_verified: 24
    evidence_auto_fixed: 0
    audit_coverage: 60/60 (100%)
    audit_pass_rate: 8/60 (13%)
    audit_fail_total: 18
    audit_finance_universal:
      pass: 5
      warn: 11
      fail: 4
    audit_subdomain_totals:
      pass: 3
      warn: 23
      fail: 14
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-108. Evidence verify ratio
    = 32.0% and audit fail total = 18. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
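# Worked check of the declared figures above (illustrative comment, not part of the compiled blueprint).
# Assumption: evidence_verify_ratio is evidence_verified / (evidence_verified + evidence_invalid); the
# source does not state the formula, but the declared numbers are consistent with it.
#   evidence_verify_ratio = 24 / (24 + 51) = 24 / 75 = 0.32
#   audit pass  = 5 + 3   = 8
#   audit warn  = 11 + 23 = 34
#   audit fail  = 4 + 14  = 18   (matches audit_fail_total)
#   audit total = 8 + 34 + 18 = 60, so audit_pass_rate = 8 / 60 ≈ 13%
# Because 0.32 < 0.5, the EQ-01 trigger fires for this crystal.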
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-108-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize the data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: ArcticDB Tick Data Storage
    positive_terms:
    - arcticdb
    - tick data storage
    - time series database
    - lmdb
    - market data persistence
    data_domain: market_data
    negative_terms:
    - backtest strategy
    - screening
    - live trading
    - screening factors
    - ml prediction
    ambiguity_question: Are you looking to store and retrieve tick-level market data from a time series database, or are you
      running a trading strategy or backtest?
  - uc_id: UC-102
    name: FX G10 Cross Backtesting
    positive_terms:
    - backtest
    - fx trading
    - g10 currency
    - technical indicators
    - strategy testing
    data_domain: trading_data
    negative_terms:
    - arcticdb storage
    - s3 bucket
    - screening
    - live trading
    - data collection only
    ambiguity_question: Do you want to test a trading strategy on historical FX data (backtesting), or do you need to just
      fetch and store market data?
  - uc_id: UC-103
    name: Market Data Fetching from Vendors
    positive_terms:
    - market data
    - quandl
    - fetch data
    - vendor data
    - interest rates
    data_domain: market_data
    negative_terms:
    - backtest
    - strategy
    - arcticdb
    - s3 storage
    - live trading
    ambiguity_question: Are you trying to download market data from a vendor for analysis, or are you running a backtest or
      trading strategy?
  - uc_id: UC-104
    name: S3 Cloud Storage for Tick Data
    positive_terms:
    - s3 storage
    - aws
    - parquet
    - cloud storage
    - tick data
    data_domain: market_data
    negative_terms:
    - backtest strategy
    - arcticdb
    - quandl
    - live trading
    - screening
    ambiguity_question: Do you need to store tick data in AWS S3 cloud storage, or are you running a trading strategy or backtest?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to
    the user. The SL-01 sell-before-buy ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
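  # Worked examples for SL-02 and SL-05 (illustrative comment; the concrete values are assumptions).
  # SL-02, assuming a daily level for which level.to_second() returns 86400:
  #   happen_timestamp = 2024-01-02 00:00:00
  #   due_timestamp    = happen_timestamp + 86400 s = 2024-01-03 00:00:00   (next bar, no look-ahead)
  # SL-05, exactly one sizing field per TradingSignal:
  #   valid:   position_pct=0.2
  #   invalid: position_pct=0.2 together with order_money=10000   (two sizing fields set)
  #   invalid: none of position_pct, order_money, order_amount set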
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 82
    fatal_constraints_count: 18
    non_fatal_constraints_count: 132
    use_cases_count: 4
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes; otherwise
        initialization fails with an assertion error (fatal violation finance-C-001).'
    business_decisions:
    - id: BD-001
      type: B/DK
      summary: Separate vendor tickers from internal tickers via MarketDataRequest
    - id: BD-002
      type: BA/DK
      summary: Use 'close' as the default field across all data sources
    - id: BD-031
      type: B/DK
      summary: Left join with fill-down for asset-signal alignment
    - id: BD-032
      type: B/BA
      summary: FFILL for carry/deposit data, not spot
    - id: BD-GAP-001
      type: B
      summary: 'Missing: Train/test time split integrity'
    - id: BD-GAP-002
      type: B
      summary: 'Missing: Immutable event log'
    - id: BD-GAP-003
      type: B
      summary: 'Missing: Immutable event log'
    - id: BD-GAP-004
      type: B
      summary: 'Missing: Default Definition & IFRS 9 Staging'
    - id: BD-GAP-005
      type: B
      summary: 'Missing: Stress Test Macro Variables'
    - id: BD-GAP-006
      type: B
      summary: 'Missing: Funds Transfer Pricing (FTP)'
    - id: BD-GAP-007
      type: B
      summary: 'Missing: Cash Pooling Legal Structure'
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9), using the standard Appel parameters rather than adaptive ones because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose a drawer_rects subclass override for custom annotations rather than hardcoded markers, which
        lets traders define entry/exit visuals without modifying the base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 31 source groups: analysis_reporting(5),
        asset_selection(1), backtesting(7), carry_calculation(1), cost_modeling(1), curve_construction(6), and 25 more.'
      key_decisions: 71 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-020
      type: BA
      summary: Output path defaults to 'output_data/YYYYMMDD' with timestamp
    - id: BD-021
      type: M
      summary: PyFolio integration optional (try/except import)
    - id: BD-052
      type: B
      summary: GraphicalLassoCV for covariance-based network learning
    - id: BD-053
      type: B/DK
      summary: Affinity propagation for clustering assets in network
    - id: BD-054
      type: B
      summary: Locally Linear Embedding for 2D network visualization
    - id: BD-029
      type: B
      summary: G10 USD crosses basket for FX trend following
    - id: BD-014
      type: B/BA
      summary: 10% vol target as default for both signal and portfolio
    - id: BD-015
      type: BA/DK
      summary: 252 annualization factor for daily data
    - id: BD-016
      type: BA
      summary: Max leverage capped at 5x for vol-targeted strategies
    - id: BD-017
      type: BA
      summary: Signal delayed by 0 periods (trade same day as signal)
    - id: BD-018
      type: B/BA
      summary: Position limits applied via element-wise clip adjustment
    - id: BD-019
      type: BA/DK
      summary: Transaction costs in basis points (bp) not decimals
    - id: BD-028
      type: B/BA
      summary: Only allow longs in EURUSD single-currency strategy
    - id: BD-039
      type: B/BA
      summary: ON (overnight) tenor for spot carry calculation
    - id: BD-023
      type: B/BA
      summary: Default spot transaction cost of 2.5 basis points
    - id: BD-006
      type: B/BA
      summary: Depos tenor 'ON' (overnight) as default for spot curve
    - id: BD-007
      type: BA
      summary: Multiplicative cum_index by default ('mult' not 'add')
    - id: BD-008
      type: BA/DK
      summary: FX options roll on 'expiry-date' not month-end
    - id: BD-009
      type: BA/DK
      summary: Construct crosses via 'no' (direct) or via domestic currency
    - id: BD-027
      type: B
      summary: Rebalance frequency of 'BM' (business month end) for vol targeting
    - id: BD-050
      type: B/BA
      summary: Portfolio combination default is None (equal weight)
    - id: BD-040
      type: B/BA
      summary: 'Currencies with 365-day basis: AUD, CAD, GBP, NZD'
    - id: BD-074
      type: BA/M
      summary: Cumulative index default 'mult' vs 'add' changes P&L scaling fundamentally
    - id: BD-055
      type: B/BA
      summary: Event study uses NYC 10am cutoff for economic releases
    - id: BD-030
      type: B/BA
      summary: Signal delay of 0 (same-day execution)
    - id: BD-038
      type: B/BA
      summary: 1M tenor as default for options and forwards trading
    - id: BD-071
      type: RC
      summary: Hardcoded 365-day count currencies affect ALL FX curve calculations
    - id: BD-072
      type: B
      summary: Stop loss/take profit signals MUST be applied BEFORE portfolio weight optimization
    - id: BD-075
      type: B
      summary: Signal delay via signal_delay shift MUST occur BEFORE non-trading day masking
    - id: BD-073
      type: BA/DK
      summary: Transaction costs divided by 2 assumes symmetric round-trip costs (entry + exit)
    - id: BD-059
      type: B/BA
      summary: Parallel backtesting with multiprocessing on Linux (8 threads)
    - id: BD-060
      type: B/BA
      summary: Output calculation fields disabled by default for performance
    - id: BD-033
      type: B
      summary: Numba JIT compilation for total return index calculation
    - id: BD-049
      type: B/BA
      summary: Multiplicative cumulative index starting at 100
    - id: BD-024
      type: B
      summary: Vol target of 10% annualised with 20-day lookback period
    - id: BD-025
      type: B/BA
      summary: Maximum leverage of 5x for vol-adjusted signals
    - id: BD-051
      type: B/BA
      summary: Position clip resample to BM (business month) by default
    - id: BD-057
      type: B
      summary: Use expiry-date roll event for options strategy
    - id: BD-058
      type: B
      summary: Roll 5 days before roll event
    - id: BD-003
      type: B/BA
      summary: Signal uses +1 for buy/above, -1 for sell/below (not 1/0)
    - id: BD-004
      type: BA/DK
      summary: fillna=True by default in TechParams
    - id: BD-005
      type: BA/DK
      summary: Forward-fill signals on non-trading days (hold previous position)
    - id: BD-022
      type: B/BA
      summary: SMA period of 200 for trend following FX strategy
    - id: BD-044
      type: B/BA
      summary: Buy if spot above SMA, sell if below
    - id: BD-045
      type: B/BA
      summary: First n-1 periods set to NaN for rolling indicators
    - id: BD-046
      type: B
      summary: GMMA uses EMA spans [3,5,7,10,12,15] short and [30,35,40,45,50,60] long
    - id: BD-047
      type: B/BA
      summary: RSI period of 14 for momentum calculation
    - id: BD-048
      type: B/BA
      summary: ATR period of 14 for volatility-adjusted signals
    - id: BD-062
      type: B/BA
      summary: Volatility-targeting via rolling realized vol with max leverage cap
    - id: BD-065
      type: B/BA
      summary: Risk stop signals with stop-loss and take-profit levels
    - id: BD-066
      type: B/BA
      summary: Position clipping to enforce net/total exposure limits
    - id: BD-068
      type: B/BA
      summary: Black-Scholes model for FX vanilla option pricing
    - id: BD-064
      type: B
      summary: FX implied vol surface interpolation with polynomial/Clark5 methods
    - id: BD-069
      type: T
      summary: Rolling realized volatility with annualization factor
    - id: BD-070
      type: T
      summary: Volatility risk premium as implied minus realized vol
    - id: BD-063
      type: B/DK
      summary: Guppy Multiple Moving Average with 12 EMA components
    - id: BD-067
      type: B
      summary: Graphical Lasso for sparse covariance estimation in network
    - id: BD-026
      type: B/BA
      summary: Annualisation factor of 252 for daily data
    - id: BD-041
      type: B/BA
      summary: Multiplier of sqrt(3) for Friday ON vol adjustment
    - id: BD-056
      type: B/BA
      summary: Realised vol rolling window of tenor_days for daily data
    - id: BD-042
      type: B
      summary: Weighted median model for implied vol addon estimation
    - id: BD-043
      type: B/BA
      summary: Model window of 20 days for vol addon calculation
    - id: BD-010
      type: B/BA
      summary: CLARK5 interpolation for vol surface by default
    - id: BD-011
      type: B/BA
      summary: Fwd-delta-neutral-premium-adj ATM method
    - id: BD-012
      type: M/BA
      summary: Nelder-Mead-Numba solver (faster but less accurate)
    - id: BD-013
      type: M
      summary: Uses FinancePy externally for pricing engine
    - id: BD-034
      type: B
      summary: CLARK5 interpolation for FX vol surface
    - id: BD-035
      type: B/BA
      summary: Forward delta neutral premium adjusted for ATM method
    - id: BD-036
      type: B/BA
      summary: Spot delta premium adjusted for delta quoting
    - id: BD-037
      type: B/DK
      summary: Nelder-Mead Numba solver for vol surface calibration
    - id: BD-061
      type: B/BA
      summary: Premium output in pct-for (base currency percentage)
resources:
  packages:
  - name: blosc
    version_pin: latest
  - name: chartpy
    version_pin: latest
  - name: findatapy
    version_pin: latest
  - name: matplotlib
    version_pin: latest
  - name: numba
    version_pin: latest
  - name: numpy
    version_pin: latest
  - name: pandas
    version_pin: latest
  - name: scikit-learn
    version_pin: latest
  - name: seasonal
    version_pin: latest
  - name: financepy
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install blosc
    - python3 -m pip install chartpy
    - python3 -m pip install findatapy
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-008
    when: When accessing Bloomberg Terminal for market data
    action: Attempt to use Bloomberg Desktop API on non-Windows platforms
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: Bloomberg Terminal/DAPI is Windows-only; attempting to use it on Linux/Mac causes immediate failure with
      cryptic errors
    stage_ids:
    - data_collection
  - id: finance-C-011
    when: When using finmarketpy for live trading decisions
    action: Claim finmarketpy provides real-time trading signals or live execution capability
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: finmarketpy is a backtesting library; claiming live trading capability misleads users into making trading
      decisions based on unverified backtest-only code
    stage_ids:
    - data_collection
  - id: finance-C-012
    when: When presenting backtest results
    action: Present backtest returns as guaranteed future trading performance
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Backtest results reflect historical conditions and perfect execution assumptions; presenting them as future
      guarantees leads to unexpected live trading losses
    stage_ids:
    - data_collection
  - id: finance-C-016
    when: When computing discrete trading signals from technical indicators
    action: output exclusively +1 (long), -1 (short), or NaN (flat) values
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Signal values outside {+1, -1, NaN} will cause incorrect position sizing in the backtest engine, as the PnL
      calculation at backtestengine.py:201-216 assumes symmetric long/short encoding
    stage_ids:
    - signal_generation
  - id: finance-C-017
    when: When implementing indicator-based signal generation
    action: introduce inherent 1-period lag through rolling window or shift operations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Signals generated without lag will exhibit look-ahead bias, causing live trading returns to fall far below
      backtested results because the strategy would have traded on information not yet available
    stage_ids:
    - signal_generation
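  # Illustrative sketch (editor's assumption: a pandas Series named `signal`, indexed by bar time; not from the source):
  #   lagged = signal.shift(1)
  # With a positive shift, the position held on bar t uses only the signal computed on bar t-1.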
  - id: finance-C-022
    when: When constructing signals from RSI indicator
    action: use shift(-1) for RSI exit signals to prevent look-ahead bias
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Using current-period RSI values for signal generation creates look-ahead bias, as the signal would fire based
      on price movements that haven't occurred yet in the current period
    stage_ids:
    - signal_generation
  - id: finance-C-029
    when: When implementing total return index construction with multiplicative cumulation
    action: Initialize TRI at base value 100 and compound returns forward using cumprod
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Additive cumulation (cumsum) allows negative cumulative returns to break the index chain, causing TRI values
      to become meaningless for P&L calculation and potentially producing misleading backtest results
    stage_ids:
    - curve_construction
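  # Worked example (illustrative, assuming simple per-period returns r_t with the first return set to 0):
  #   TRI_t = 100 * (1 + r_1) * (1 + r_2) * ... * (1 + r_t)
  #   returns [0.00, +0.10, -0.10] give TRI [100.0, 110.0, 99.0]
  # In pandas this is commonly written as 100 * (1 + rets).cumprod(), where `rets` is a hypothetical returns Series.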
  - id: finance-C-035
    when: When handling missing deposit rate data at start of TRI series
    action: Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Forward-filling spot prices injects stale prices and artificial zero returns into the history, distorting
      historical returns and invalidating all TRI-based backtest results
    stage_ids:
    - curve_construction
  - id: finance-C-046
    when: When calculating option delta hedging P&L
    action: Use previous period's delta for current period's spot return hedging
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using current delta with current spot return creates circular dependency where hedge ratio is set after price
      is known, producing fictitious hedging profits
    stage_ids:
    - curve_construction
  - id: finance-C-048
    when: When computing cross-currency returns via intermediate currency
    action: Subtract term currency returns from base currency returns (base_rets - terms_rets)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect sign in cross-currency return calculation inverts the strategy direction, causing long positions
      to be treated as shorts and producing completely wrong P&L attribution
    stage_ids:
    - curve_construction
  - id: finance-C-059
    when: When pricing options on high-vol event dates
    action: Build vol surface before extracting surface or pricing instruments
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Calling extract_vol_surface or price_instrument without prior build_vol_surface results in NoneType errors,
      causing complete failure of option pricing pipeline
    stage_ids:
    - volatility_pricing
  - id: finance-C-061
    when: When processing vol surface quotes for JPY pairs
    action: Apply divisor of 100 to JPY rates to convert from percentage to decimal
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: JPY market quotes are in percentage (e.g., 3.46%) while the code expects decimals; omitting the divisor
      produces rates 100x too large, corrupting discount curves and option prices
    stage_ids:
    - volatility_pricing
  - id: finance-C-085
    when: When outputting strategy comparison results
    action: validate that all models are TradingModel instances before plotting
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Non-TradingModel input causes AttributeError at runtime when plotting methods are called, blocking comparison
      reports
    stage_ids:
    - analysis_reporting
  - id: finance-C-089
    when: When presenting backtest results to stakeholders
    action: present simulated backtest returns as guaranteed future performance
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Misrepresenting backtest results as predictive leads to inappropriate strategy allocation, potentially causing
      significant financial losses when live trading differs from historical simulation
    stage_ids:
    - analysis_reporting
  - id: finance-C-102
    when: When implementing signal generation logic for bidirectional trading strategies
    action: Use +1 for buy/above-threshold signals and -1 for sell/below-threshold signals — do not use 1/0 binary encoding
      which lacks directional information for short positions
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using 1/0 binary encoding causes undefined behavior at signal boundaries and eliminates the ability to represent
      short positions, breaking long/short strategy symmetry and producing incorrect trading signals
    derived_from_bd_id: BD-003
  - id: finance-C-106
    when: When implementing option pricing functionality in FX options trading
    action: Verify that the FinancePy library is installed and importable; the framework delegates all option pricing
      calculations to FinancePy, so pricing functionality will fail without it
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Attempting to use option pricing without FinancePy causes import failures, breaking the entire options pricing
      pipeline and preventing vol surface calibration and option valuation
    derived_from_bd_id: BD-013
  - id: finance-C-109
    when: When implementing FX curve calculations for AUD, CAD, GBP, or NZD currencies
    action: Apply the ACT/365 (Actual/365) day count convention for accruals and discounting on AUD, CAD, GBP, and NZD
      currencies; do not use 30/360 or ACT/360
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Applying incorrect day count conventions violates market conventions and regulatory reporting requirements,
      causing miscalculated funding costs and incorrect risk valuations that may constitute regulatory violations
    derived_from_bd_id: BD-071
  - id: finance-C-139
    when: When validating Black-Scholes model inputs for FX vanilla option pricing
    action: 'Verify that volatility inputs conform to Black-Scholes constant-vol assumption: volatility_std > 0.001 (0.1%),
      volatility < 2.0 (200%), and volatility_surface_skew < 0.05 (5%) — inputs outside these bounds indicate smile dynamics
      incompatible with the model'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Invalid volatility inputs cause Black-Scholes to produce nonsensical prices (near-zero or infinite values
      when vol approaches zero or infinity), making all downstream Greeks and hedge ratios unreliable
    derived_from_bd_id: BD-068
  regular:
  - id: finance-C-001
    when: When implementing data collection for backtesting
    action: Use 'close' field as the default trading field for price calculations
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-'close' fields as default may cause incorrect backtest results, since close prices are most reliable
      for end-of-day backtesting and strategies are typically designed around close-to-close returns
    stage_ids:
    - data_collection
  - id: finance-C-002
    when: When processing market data with non-trading days
    action: Produce NaN values for missing data on non-trading days rather than raising errors
    severity: high
    kind: domain_rule
    modality: must
    consequence: Throwing errors on non-trading days will halt the entire backtest pipeline; NaN values allow the system to
      continue and fill forward from the last valid price
    stage_ids:
    - data_collection
  - id: finance-C-003
    when: When creating DataFrame output from market data collection
    action: Verify returned DataFrame has a DatetimeIndex aligned to the requested date range
    severity: high
    kind: domain_rule
    modality: must
    consequence: Non-DatetimeIndex or misaligned index causes downstream indicator calculations and signal generation to fail
      or produce incorrect results
    stage_ids:
    - data_collection
  - id: finance-C-004
    when: When collecting data from external market data vendors
    action: Use MarketDataRequest abstraction to separate vendor-specific tickers from internal tickers
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Mixing vendor tickers with internal logic creates tight coupling; switching vendors requires rewriting strategy
      code instead of just updating ticker mappings
    stage_ids:
    - data_collection
  - id: finance-C-005
    when: When loading market data via TradingModel
    action: Use the load_assets method as the sole data entry point for the TradingModel
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Direct access to market data bypassing load_assets bypasses data validation and standardization, causing
      inconsistent behavior across strategies
    stage_ids:
    - data_collection
  - id: finance-C-006
    when: When fetching market data from external data sources
    action: Implement fallback mechanism when data fetch returns None
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Network failures, API errors, or missing credentials cause fetch_market to return None; without fallback,
      the entire backtest fails without generating any results
    stage_ids:
    - data_collection
  - id: finance-C-007
    when: When using external market data vendors
    action: Assume real-time data availability from vendors that only provide delayed data
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Most vendors (FRED, Quandl, Yahoo Finance) provide delayed data; assuming real-time causes live trading strategies
      to fail or trade on stale prices
    stage_ids:
    - data_collection
  - id: finance-C-009
    when: When downloading market data from vendors that require authentication
    action: Configure API keys for FRED, Quandl, and other vendors requiring authentication
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Missing API keys cause authentication failures; data downloads fail and backtest cannot proceed
    stage_ids:
    - data_collection
  - id: finance-C-010
    when: When standardizing column names from different data vendors
    action: Verify DataFrame columns match standardized names regardless of vendor source
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Bloomberg returns 'PX_LAST', FRED returns 'close'; if not normalized, downstream signal generation and P&L
      calculations fail with KeyError
    stage_ids:
    - data_collection
  - id: finance-C-013
    when: When using parallel data fetching from external vendors
    action: Use excessive parallel threads that trigger vendor rate limits
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Some data providers limit concurrent requests; excessive threads cause HTTP 429 errors, data fetch failures,
      or temporary API key suspension
    stage_ids:
    - data_collection
  - id: finance-C-014
    when: When configuring market data collection for backtesting
    action: Set signal_delay parameter to prevent look-ahead bias in strategy signals
    severity: high
    kind: domain_rule
    modality: must
    consequence: Signals generated at end-of-day using same-day close prices cannot be executed; without signal_delay, backtest
      assumes impossible execution timing
    stage_ids:
    - data_collection
  - id: finance-C-015
    when: When running multiple parallel backtests with market data fetching
    action: Use Redis caching to reduce redundant API calls to data vendors
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Repeated data fetches for same tickers consume API quota, slow down backtesting, and may hit rate limits
    stage_ids:
    - data_collection
  - id: finance-C-018
    when: When initializing TechParams for signal generation
    action: verify indicator warmup periods produce NaN signals for the initial window
    severity: high
    kind: domain_rule
    modality: must
    consequence: Signals generated before the warmup period completes will use incomplete indicator calculations, producing
      unreliable trading signals that can cause significant financial losses
    stage_ids:
    - signal_generation
  - id: finance-C-019
    when: When processing signals across non-trading days
    action: forward-fill the previous valid signal to maintain position continuity
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without forward-fill, signals on non-trading days will be NaN, causing unintended position flat states that
      break position continuity and distort backtested PnL calculations
    stage_ids:
    - signal_generation
  - id: finance-C-020
    when: When configuring signal direction filters
    action: simultaneously set both only_allow_longs and only_allow_shorts to True
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Setting both direction filters simultaneously results in all signals being zeroed out, as each filter eliminates
      the other's signals, producing a dead strategy with zero returns
    stage_ids:
    - signal_generation
  - id: finance-C-021
    when: When creating technical indicators with fillna enabled
    action: forward-fill missing prices before computing indicators to ensure continuous signals
    severity: high
    kind: domain_rule
    modality: must
    consequence: Computing indicators without forward-filling prices creates NaN gaps that propagate through rolling calculations,
      causing discontinuous indicator values and erratic signal generation
    stage_ids:
    - signal_generation
  - id: finance-C-023
    when: When backtesting strategies with technical indicators
    action: claim that backtest returns equal expected live trading returns
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting backtest results as proof of live trading performance ignores a fundamental limitation of
      backtests, which cannot fully account for the slippage, liquidity constraints, and execution delays of live markets
    stage_ids:
    - signal_generation
  - id: finance-C-024
    when: When configuring signal_delay parameter
    action: apply signal shift in the backtest engine after signal generation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Skipping signal_delay application causes signals to execute on the same bar as indicator calculation, creating
      look-ahead bias in the backtest that won't occur in live trading
    stage_ids:
    - signal_generation
  - id: finance-C-025
    when: When computing Bollinger Bands signals
    action: forward-fill flat signals between band touches to maintain position state
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without forward-fill, Bollinger Band signals become NaN between touch events, causing unintended position
      flat states that break the trend-following logic
    stage_ids:
    - signal_generation
  - id: finance-C-026
    when: When constructing signal DataFrames
    action: preserve the original price DataFrame index (DatetimeIndex) for temporal alignment
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using a different index causes misaligned signal-to-price multiplication in the backtest, producing NaN PnL
      values because signal and price timestamps don't match
    stage_ids:
    - signal_generation
  - id: finance-C-027
    when: When implementing custom technical indicators
    action: override create_custom_tech_ind method and call parent implementation for standard indicators
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Custom indicators that don't follow the _signal/_techind naming convention will break the TradingModel.construct_strategy
      call chain, causing PnL calculations to fail
    stage_ids:
    - signal_generation
  - id: finance-C-028
    when: When using volatility-adjusted signals
    action: claim real-time signal generation capability when using polling-based data fetching
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Polling-based data fetching cannot provide true real-time signals; claiming real-time capability misleads
      users about the system's latency characteristics
    stage_ids:
    - signal_generation
  - id: finance-C-030
    when: When constructing total return indices for FX spot
    action: Use overnight deposit tenor (ON) for carry calculation to represent true daily carry cost
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using longer deposit tenors (1M, 3M) misrepresents daily carry cost, causing TRI to over/understate true
      overnight FX position returns and leading to incorrect strategy P&L attribution
    stage_ids:
    - curve_construction
  - id: finance-C-031
    when: When calculating carry accrual for TRI construction
    action: Apply correct day count convention based on currency (365 for AUD/CAD/GBP/NZD, 360 for others)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect day count causes carry returns to be miscalculated by approximately 1.4%, leading to systematic
      TRI drift from true values and invalid strategy performance comparison
    stage_ids:
    - curve_construction
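  # Worked arithmetic (illustrative): a 5% annual rate accrued over 1 day is
  #   0.05 / 360 = 0.00013889 under ACT/360 and 0.05 / 365 = 0.00013699 under ACT/365,
  #   a ratio of 365 / 360 = 1.0139, which matches the roughly 1.4% error cited in the consequence above.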
  - id: finance-C-032
    when: When rolling FX forwards contracts in TRI construction
    action: Roll 1M contracts 5 business days before month-end, using month-end as the roll trigger
    severity: high
    kind: domain_rule
    modality: must
    consequence: Rolling on incorrect dates causes exposure to expiring contracts, resulting in gap risk at delivery and misalignment
      with broker execution dates
    stage_ids:
    - curve_construction
  - id: finance-C-033
    when: When rolling FX options contracts in TRI construction
    action: Use expiry-date as roll trigger to avoid gamma exposure near expiration
    severity: high
    kind: domain_rule
    modality: must
    consequence: Rolling at month-end instead of expiry-date creates positions with high gamma near expiration, causing delta
      hedging costs to spike and TRI to overstate option strategy returns
    stage_ids:
    - curve_construction
  - id: finance-C-034
    when: When joining TRI series with different currency pairs
    action: Use outer join to preserve all dates across currency pairs
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Inner join drops dates where only some pairs have data, creating gaps in multi-currency portfolio TRI and
      causing signal generation to skip valid trading days
    stage_ids:
    - curve_construction
  - id: finance-C-036
    when: When constructing cross rates via intermediate currency
    action: Handle USDUSD special case by returning zero returns and apply correct sign for base/terms currency
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing USDUSD special case causes division by zero or incorrect cross-rate calculation, producing NaN or
      wrong TRI values for pairs involving the reference currency
    stage_ids:
    - curve_construction
  - id: finance-C-037
    when: When processing FX forward tickers for TRI construction
    action: Convert ticker notation to market convention before processing (e.g., USDEUR -> EURUSD)
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Processing non-standard ticker notation causes incorrect delivery date calculation and wrong roll timing,
      leading to positions held past expiration
    stage_ids:
    - curve_construction
  - id: finance-C-038
    when: When calculating time differences for daily carry accrual
    action: Floor time difference to whole days and set first value to zero
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Using sub-daily time differences causes incorrect carry scaling for intraday data, making TRI inconsistent
      between daily and intraday backtests
    stage_ids:
    - curve_construction
  - id: finance-C-039
    when: When marking FX forwards to market at roll date
    action: Use previous contract's interpolated forward price for MTM calculation on roll dates
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using current (new) contract price for MTM at roll date causes artificial return spike/discontinuity, inflating
      TRI and misrepresenting actual P&L at roll
    stage_ids:
    - curve_construction
  - id: finance-C-040
    when: When working with NDF currencies (e.g., BRL, INR, KRW)
    action: Forward-fill missing deposit data since NDF fixings may have gaps
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without forward-filling NDF deposit data, TRI construction fails on business days without fixing, creating
      NaN carry values and breaking the cumulative index chain
    stage_ids:
    - curve_construction
  - id: finance-C-041
    when: When aligning deposit data with spot prices
    action: Join carry with spot using inner join to ensure carry only exists when spot is available
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Left join with spot causes carry values to exist on days without spot data, leading to carry accrual without
      price movement and TRI overstatement
    stage_ids:
    - curve_construction
  - id: finance-C-042
    when: When handling first return in TRI series
    action: Set first return to zero (0) rather than leaving as NaN from shift operation
    severity: high
    kind: domain_rule
    modality: must
    consequence: NaN first return propagates through cumprod, corrupting entire TRI series and causing downstream signal generation
      and P&L calculation failures
    stage_ids:
    - curve_construction
  - id: finance-C-043
    when: When implementing FX options TRI with delta hedging
    action: Price exiting option using previous day's strike and expiry to avoid look-ahead bias
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using current contract parameters for MTM on exit date reveals future information, creating artificial hedging
      profits/losses not achievable in live trading
    stage_ids:
    - curve_construction
  - id: finance-C-044
    when: When using freeze_implied_vol for options pricing
    action: Freeze implied vol outside of sensitivity-testing scenarios
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Freezing implied vol causes TRI to ignore vol surface evolution, misrepresenting realized option P&L and
      creating discrepancies with live trading where vol changes affect delta
    stage_ids:
    - curve_construction
  - id: finance-C-045
    when: When comparing constructed TRI with external benchmarks (Bloomberg)
    action: Accept approximate tracking rather than exact match due to timing and convention differences
    severity: medium
    kind: claim_boundary
    modality: must
    consequence: Presenting constructed TRI as identical to Bloomberg indices overstates accuracy; timing differences (NYC
      vs. LDN cut) and convention handling create measurable tracking error
    stage_ids:
    - curve_construction
  - id: finance-C-047
    when: When handling FX option expiry dates for backtest
    action: Adjust expiry to nearest available market data date if expiry falls on non-trading day
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using calendar expiry date when market data doesn't exist causes NaN pricing and TRI breaks at expiry, leading
      to missing P&L around roll dates
    stage_ids:
    - curve_construction
  - id: finance-C-049
    when: When building an FX volatility surface from market quotes
    action: Use CLARK5 interpolation function type for smoother vol surface without arbitrage
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using BBG interpolation may produce jagged vol surfaces with potential arbitrage violations, causing incorrect
      option pricing and PnL calculation errors
    stage_ids:
    - volatility_pricing
  - id: finance-C-050
    when: When quoting ATM volatility for FX options
    action: Use fwd-delta-neutral-premium-adj ATM method to account for premium difference between calls/puts
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using spot or forward ATM methods without premium adjustment will misalign delta-neutral strikes, causing
      systematic pricing errors in FX options strategies
    stage_ids:
    - volatility_pricing
  - id: finance-C-051
    when: When interpolating volatility surface across tenors
    action: Interpolate linearly in variance space (sigma^2 * T), not in vol space
    severity: high
    kind: domain_rule
    modality: must
    consequence: Linear interpolation in vol space creates biased vol surface, causing material pricing errors especially
      for long-dated options where variance interpolation is mathematically correct
    stage_ids:
    - volatility_pricing
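  # Worked formula (illustrative, standard total-variance interpolation for T1 <= T <= T2; symbols are the editor's):
  #   w = (T - T1) / (T2 - T1)
  #   sigma(T)^2 * T = (1 - w) * sigma1^2 * T1 + w * sigma2^2 * T2
  # Example with sigma1 = 10% at T1 = 0.25y, sigma2 = 12% at T2 = 0.50y, T = 0.375y
  #   total variance = 0.5 * 0.0025 + 0.5 * 0.0072 = 0.00485, so sigma(T) = sqrt(0.00485 / 0.375) = 11.37%
  #   (linear interpolation in vol space would instead give 11.00%)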
  - id: finance-C-052
    when: When calculating volatility risk premium
    action: Align implied vol and realized vol periods using BDay offset, not simple shift
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using pandas shift() for VRP calculation introduces look-ahead bias, causing VRP estimates to include future
      information and misrepresent true vol risk premium
    stage_ids:
    - volatility_pricing
  - id: finance-C-053
    when: When building vol surface for unstable market periods
    action: Increase solver tolerance to fill sparse vol surface areas during high-vol events
    severity: high
    kind: domain_rule
    modality: must
    consequence: During market stress (Brexit, elections), default tolerance 1e-8 causes solver non-convergence, leaving gaps
      in vol surface that corrupt downstream option pricing
    stage_ids:
    - volatility_pricing
  - id: finance-C-054
    when: When calculating realized volatility
    action: Strip time component from datetime index before returning realized vol series
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Realized vol series retains timestamp information causing join failures with daily implied vol data, producing
      NaN VRP values and broken downstream calculations
    stage_ids:
    - volatility_pricing
  - id: finance-C-055
    when: When using FX vol surface for option pricing
    action: Install FinancePy as optional dependency separately from finmarketpy
    severity: high
    kind: resource_boundary
    modality: must
    consequence: FinancePy has strict version dependencies (llvmlite) that conflict with other libraries; installing via pip
      with --no-deps prevents dependency conflicts that break vol surface construction
    stage_ids:
    - volatility_pricing
  - id: finance-C-056
    when: When choosing a volatility surface interpolation method
    action: Use SABR model for production vol surface fitting
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: SABR volatility function type is not fully implemented in current FinancePy version, causing unpredictable
      behavior and potential crashes during build_vol_surface calls
    stage_ids:
    - volatility_pricing
  - id: finance-C-057
    when: When calibrating FX vol surface with nelder-mead-numba solver
    action: Accept accuracy-speed tradeoff; numba version is faster but less precise
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Using nelder-mead-numba for real-time pricing gives speed but introduces calibration imprecision that accumulates
      across vol surface strikes, causing systematic mispricing
    stage_ids:
    - volatility_pricing
  - id: finance-C-058
    when: When running FX vol surface code with Numba JIT compilation
    action: Delete __pycache__ folders if Numba frontend errors occur
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Stale Numba cache causes 'Failed in nopython mode pipeline' errors that prevent vol surface construction,
      blocking all option pricing functionality
    stage_ids:
    - volatility_pricing
  - id: finance-C-060
    when: When calling calculate_vol_for_strike_expiry
    action: Pass either expiry_date or tenor parameter for vol interpolation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Passing neither expiry_date nor tenor returns None, causing downstream NaN vol values that corrupt option
      pricing calculations
    stage_ids:
    - volatility_pricing
  - id: finance-C-062
    when: When working with 10d delta strikes in vol surface
    action: Expect full 10d strike interpolation support in price_instrument
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: 10d OTM strike pricing has incomplete implementation (TODO comment), causing incorrect strikes to be used
      for 10d butterfly and strangle constructions
    stage_ids:
    - volatility_pricing
  - id: finance-C-063
    when: When comparing implied vol with realized vol
    action: Present VRP as guaranteed predictor of future realized vol
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Vol risk premium is a statistical relationship that frequently breaks down during market regime changes,
      and presenting it as predictive causes overconfident risk estimates
    stage_ids:
    - volatility_pricing
  - id: finance-C-064
    when: When computing vol surface across multiple dates
    action: Use extract_vol_surface_across_dates for batch processing, not individual calls
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Building the vol surface individually for each date without the batch method causes redundant computation and
      potentially inconsistent tracking of extreme values across the surface
    stage_ids:
    - volatility_pricing
  - id: finance-C-065
    when: When implementing volatility targeting with rolling window periods
    action: account for NaN values in the first N periods before the rolling window completes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Volatility-adjusted leverage will be NaN for the initial periods equal to the rolling window (vol_periods),
      causing backtest P&L series to contain NaN values for warmup periods and potentially causing downstream calculation
      errors in portfolio aggregation
    stage_ids:
    - backtesting
  - id: finance-C-066
    when: When calculating transaction costs from basis points input
    action: convert bp to decimal by dividing by (2.0 * 100.0 * 100.0) for spot_tc_bp and by (100.0 * 100.0) for spot_rc_bp
    severity: high
    kind: domain_rule
    modality: must
    consequence: Transaction costs will be incorrectly applied (off by factor of 10000), severely overstating or understating
      actual trading costs and producing misleading backtest P&L results that do not reflect realistic transaction cost drag
    stage_ids:
    - backtesting
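  # Worked example for finance-C-066 (illustrative numbers, not taken from the source):
  # with spot_tc_bp = 2.0 and spot_rc_bp = 5.0,
  #   tc = 2.0 / (2.0 * 100.0 * 100.0) = 0.0001
  #   rc = 5.0 / (100.0 * 100.0)       = 0.0005
  # Reading the extra factor 2.0 as "half the quoted bid/ask spread per trade" is an assumption.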
  - id: finance-C-067
    when: When configuring annualization factor for daily return statistics
    action: use 252 as the annualization factor for daily data to match standard trading days in a year
    severity: high
    kind: domain_rule
    modality: must
    consequence: Annualized return and volatility statistics will be overstated if using 365 (calendar days) or understated
      if using a smaller factor, leading to incorrect Sharpe ratios and risk metrics that mischaracterize strategy performance
    stage_ids:
    - backtesting
  - id: finance-C-068
    when: When aligning signals with asset returns for P&L calculation
    action: apply forward-fill (ffill) to asset holidays and signal gaps to carry forward the last valid value
    severity: medium
    kind: domain_rule
    modality: must
    consequence: P&L will contain NaN values on asset holidays even though the signal remains valid, causing discontinuities
      in cumulative returns and incorrect trade counts when assets resume trading after gaps
    stage_ids:
    - backtesting
  - id: finance-C-069
    when: When applying stop loss and take profit risk management
    action: apply stop loss/take profit signals BEFORE portfolio weight optimization and volatility targeting
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Risk stops applied after portfolio optimization will cause incorrect signal cascades where leverage adjustments
      trigger unnecessary stops, distorting the true risk-adjusted returns and overstating the frequency of stop-out events
    stage_ids:
    - backtesting
  - id: finance-C-070
    when: When implementing signal delay for execution timing
    action: apply signal_delay via pandas shift operation to delay signal execution by the configured number of periods
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Backtest will exhibit look-ahead bias if signal_delay=0 is used with same-day execution assumption, causing
      P&L alignment errors when the strategy is deployed live with next-day execution, overstating historical returns
    stage_ids:
    - backtesting
  - id: finance-C-071
    when: When calculating trade counts from signal changes
    action: compute trade counts as the absolute difference of signal values using shift(1) operation
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Trade counts will not match actual signal changes if computed differently, causing discrepancies between
      reported turnover and P&L attribution analysis, making it impossible to accurately measure transaction cost drag
    stage_ids:
    - backtesting
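  # Sketch for finance-C-071 (assumes a pandas Series named signal; the name is hypothetical):
  #   trades = (signal - signal.shift(1)).abs()
  # For signal values 0, 1, 1, -1 this gives NaN, 1, 0, 2. The first row has no predecessor,
  # and a flip from long to short counts as two units of turnover.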
  - id: finance-C-072
    when: When applying position limits via element-wise clip adjustment
    action: scale positions proportionally using the position_clip_adjustment to respect net and total exposure limits
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Portfolio will exceed configured position limits causing unintended concentrated exposure, potentially leading
      to significant losses during adverse market movements when leverage exceeds risk parameters
    stage_ids:
    - backtesting
  - id: finance-C-073
    when: When configuring maximum leverage for volatility-targeted strategies
    action: set max_leverage to 5.0 as the conservative default to prevent runaway leverage in low-volatility periods
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Unlimited leverage during low-volatility periods will create excessive position sizes that amplify losses
      during volatility spikes, potentially causing margin calls and forced liquidation at the worst possible time
    stage_ids:
    - backtesting
  - id: finance-C-074
    when: When using vol targeting without sufficient historical data
    action: verify the backtest period contains at least vol_periods observations before computing meaningful leverage
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Leverage calculation will produce unreliable or extreme values due to insufficient sample for volatility
      estimation, causing unstable position sizing that either under-allocates capital or generates excessive leverage
    stage_ids:
    - backtesting
  - id: finance-C-075
    when: When applying portfolio-level volatility targeting
    action: apply portfolio leverage AFTER individual signal leverage to correctly scale the combined position
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Double application or incorrect ordering of leverage will distort portfolio-level risk controls, causing
      either excessive or insufficient total exposure compared to the intended vol target
    stage_ids:
    - backtesting
  - id: finance-C-076
    when: When interpreting backtest results as indicators of live trading performance
    action: present backtest returns as guaranteed expected live trading returns without specified caveats
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Backtest results can contain look-ahead bias, often ignore slippage, and do not account for execution variability,
      leading to unrealistic expectations that can cause poor risk management decisions and significant losses when deploying
      strategies live
    stage_ids:
    - backtesting
  - id: finance-C-077
    when: When using finmarketpy backtest results for regulatory or compliance reporting
    action: represent simulated backtest P&L as proof of actual trading performance without independent verification
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting backtest-only results as verified trading records violates compliance requirements and can constitute
      misleading marketing, exposing the practitioner to regulatory action and reputational damage
    stage_ids:
    - backtesting
  - id: finance-C-078
    when: When comparing strategies with different trade frequencies
    action: account for transaction costs proportionally to trade frequency when evaluating strategy profitability
    severity: high
    kind: operational_lesson
    modality: must
    consequence: High-frequency signal changes will incur disproportionate transaction costs that may exceed gross returns,
      causing strategies to appear profitable in backtest but lose money in live trading due to cost drag
    stage_ids:
    - backtesting
  - id: finance-C-079
    when: When configuring the portfolio combination method
    action: verify portfolio_combination method is explicitly set to 'sum', 'mean', or 'weighted' to avoid implicit equal-weighting
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Implicit mean weighting will silently combine signals with potentially unintended equal contribution, distorting
      risk-adjusted returns and causing position sizing that doesn't match the intended portfolio construction methodology
    stage_ids:
    - backtesting
  - id: finance-C-080
    when: When implementing sensitivity analysis with parameter sweeps
    action: reset trading model parameters after completing the sweep
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without parameter reset, subsequent analyses use incorrectly overridden parameter values, causing corrupted
      P&L reports and invalid Sharpe ratios across model runs
    stage_ids:
    - analysis_reporting
  - id: finance-C-081
    when: When running strategy return statistics with finmarketpy engine
    action: save SCALE_FACTOR before report generation and restore it afterwards
    severity: medium
    kind: domain_rule
    modality: must
    consequence: SCALE_FACTOR remains at 0.75 for subsequent plots, causing incorrectly scaled charts and potentially misleading
      visual representation of performance metrics
    stage_ids:
    - analysis_reporting
  - id: finance-C-082
    when: When importing TradeAnalysis for PyFolio-based reporting
    action: assume PyFolio is always available in the environment
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Code crashes with ImportError when PyFolio is not installed, preventing report generation entirely
    stage_ids:
    - analysis_reporting
  - id: finance-C-083
    when: When configuring chart rendering engine
    action: specify a valid chart engine from the supported set
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Invalid engine name causes chart generation failure, blocking all plot outputs including Sharpe ratios and
      drawdown visualizations
    stage_ids:
    - analysis_reporting
  - id: finance-C-084
    when: When conducting parameter sensitivity analysis
    action: verify parameter_list matches pretty_portfolio_names in length
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Index mismatch causes IndexError during sensitivity loop iteration, causing analysis to fail mid-sweep and
      produce incomplete CSV exports
    stage_ids:
    - analysis_reporting
  - id: finance-C-086
    when: When calculating Sharpe ratios from strategy returns
    action: annualize the Sharpe ratio by multiplying the per-period ratio (mean return over standard deviation) by the
      square root of the annualization factor
    severity: high
    kind: domain_rule
    modality: must
    consequence: Misaligned annualization produces misleading Sharpe ratios that misrepresent strategy performance, leading
      to incorrect investment decisions
    stage_ids:
    - analysis_reporting
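  # Worked example for finance-C-086 (illustrative numbers, not taken from the source):
  # with a daily mean return of 0.0005, a daily standard deviation of 0.01 and an annualization factor of 252,
  #   annualized return     = 0.0005 * 252               = 0.126
  #   annualized volatility = 0.01 * sqrt(252)           ≈ 0.1587
  #   annualized Sharpe     = (0.0005 / 0.01) * sqrt(252) ≈ 0.794
  # Returns scale linearly with the factor and volatility with its square root, so the ratio scales with the square root.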
  - id: finance-C-087
    when: When using parallel processing for sensitivity analysis
    action: check platform-specific thread configuration before enabling
    severity: medium
    kind: resource_boundary
    modality: should
    consequence: Incorrect thread count for platform causes poor parallel performance or resource exhaustion, slowing down
      parameter sweeps significantly
    stage_ids:
    - analysis_reporting
  - id: finance-C-088
    when: When generating drawdown metrics for reporting
    action: compute drawdowns from cumulative returns series without look-ahead
    severity: high
    kind: domain_rule
    modality: must
    consequence: Drawdown calculation using future information produces artificially optimistic risk metrics, misrepresenting
      actual historical drawdown exposure
    stage_ids:
    - analysis_reporting
  - id: finance-C-090
    when: When resampling time series for annualized statistics
    action: align resample_ann_factor with actual data frequency
    severity: high
    kind: domain_rule
    modality: must
    consequence: Mismatched annualization factor produces incorrect annualized returns and Sharpe ratios, distorting strategy
      comparison across different data frequencies
    stage_ids:
    - analysis_reporting
  - id: finance-C-091
    when: When exporting CSV data from sensitivity analysis
    action: verify DUMP_PATH directory exists before writing
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Missing output directory causes FileNotFoundError, preventing CSV export of requested statistics and blocking
      automated analysis pipelines
    stage_ids:
    - analysis_reporting
  - id: finance-C-092
    when: When computing Information Ratio for strategy comparison
    action: use returns excess over benchmark, not absolute returns
    severity: high
    kind: domain_rule
    modality: must
    consequence: IR calculation using absolute returns instead of excess returns misrepresents manager skill, leading to flawed
      strategy selection decisions
    stage_ids:
    - analysis_reporting
  - id: finance-C-093
    when: When creating multi-model comparison plots
    action: align strategy PnL time series by date before comparison
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Unaligned time series cause incorrect difference calculations in comparison plots, producing misleading relative
      performance charts
    stage_ids:
    - analysis_reporting
  - id: finance-C-094
    when: When processing day-of-month seasonality analysis
    action: resample to business days before calculating bus_day seasonality
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Calendar day resampling includes non-trading days, distorting day-of-month seasonality patterns and leading
      to incorrect trading signal timing
    stage_ids:
    - analysis_reporting
  - id: finance-C-095
    when: When setting up report output path with timestamp
    action: include execution timestamp in output directory name
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Without timestamp, repeated runs overwrite previous analysis outputs, losing historical comparison data and
      complicating audit trails
    stage_ids:
    - analysis_reporting
  - id: finance-C-096
    when: When computing Sharpe ratios over long-running backtests
    action: ignore floating-point precision in Sharpe ratio calculations
    severity: medium
    kind: domain_rule
    modality: must_not
    consequence: Cumulative floating-point errors in long-running backtests distort final Sharpe ratio calculations, causing
      subtle performance misrepresentation
    stage_ids:
    - analysis_reporting
  - id: finance-C-097
    when: When filtering return statistics by date range
    action: apply plot_start and plot_finish filters after cumulative index calculation
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Premature filtering before cumulative calculation produces incomplete equity curves with incorrect starting
      values, misrepresenting historical performance
    stage_ids:
    - analysis_reporting
  - id: finance-C-098
    when: When calculating individual trade P&L from signals and returns
    action: use signal changes to identify trade boundaries, not daily returns
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Trade identification using incorrect boundary markers splits trades at wrong points, producing inaccurate
      individual trade statistics and misleading win/loss ratios
    stage_ids:
    - analysis_reporting
  - id: finance-C-099
    when: When claiming the system can analyze any trading strategy
    action: overstate analysis capabilities without considering data quality requirements
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Analyzing strategies with insufficient historical data or missing price points produces unreliable Sharpe
      ratios and drawdown metrics that appear statistically valid but lack sufficient sample size
    stage_ids:
    - analysis_reporting
  - id: finance-C-101
    when: When implementing position sizing logic in the vol-targeting system
    action: Use vol_target=0.10 (10% annualized) and lookback_period=20 days as specified — these are the standard parameters
      for risk normalization across currency pairs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-standard vol target values changes position sizes proportionally, causing the strategy to either
      under-risk or over-risk across currency pairs with different volatilities, leading to inconsistent risk-adjusted returns
    derived_from_bd_id: BD-024
  - id: finance-C-103
    when: When implementing signal transitions between directional positions
    action: Use explicit zero assignment for neutral signals — do not leave signal value undefined or rely on implicit zero-crossing
      behavior during transitions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Leaving signal transitions undefined creates ambiguity that can introduce silent directional bias, causing
      incorrect position assignments in backtesting and misalignment with live trading execution
    derived_from_bd_id: BD-003
  - id: finance-C-104
    when: When constructing equity curve or cumulative return index in backtesting
    action: Use multiplicative compounding mode (cum_index='mult') for cumulative index construction — this accurately represents
      real trading where profits and losses are reinvested proportionally
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using additive cumulation mode can produce negative index values and misrepresents portfolio growth, distorting
      drawdown severity and return distribution in backtest results
    derived_from_bd_id: BD-007
  - id: finance-C-105
    when: When configuring P&L scaling mode in BacktestRequest
    action: Use cum_index='mult' (multiplicative) mode; multiplicative compounding scales P&L geometrically and correctly
      represents percentage returns compounded period by period
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using additive ('add') mode treats returns as linear increments, misrepresenting drawdown severity and return
      distribution, leading to incorrect risk assessment and strategy evaluation
    derived_from_bd_id: BD-074
  - id: finance-C-107
    when: When generating portfolio analytics and tear sheets
    action: Be aware that PyFolio integration is optional — the core backtesting engine functions without PyFolio, but advanced
      analytics (tear sheets) will be unavailable if PyFolio is not installed
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Attempting to generate PyFolio tear sheets without PyFolio installed raises an error, but it is handled gracefully
      and does not crash the backtesting engine
    derived_from_bd_id: BD-021
  - id: finance-C-108
    when: When calibrating volatility surfaces for FX options pricing
    action: Use Nelder-Mead optimization with Numba JIT compilation as the default solver — the Numba JIT acceleration is
      essential for the speed/accuracy tradeoff; alternative solvers without JIT will be significantly slower
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using alternative solvers without Numba JIT causes calibration to run significantly slower, making real-time
      vol surface updates infeasible for backtesting workflows
    derived_from_bd_id: BD-012
  - id: finance-C-110
    when: When implementing rebalancing frequency logic for volatility targeting in backtesting
    action: Use business month end (BM) rebalancing frequency aligned with data frequency — BM provides monthly calibration
      for vol calculations without excessive transaction costs; weekly was rejected due to higher costs and quarterly was
      rejected as too infrequent
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using incorrect rebalancing frequency (e.g., weekly or quarterly) desynchronizes position adjustments from
      vol calculation periods, causing inaccurate volatility targeting and mis-sized positions in backtesting
    derived_from_bd_id: BD-027
  - id: finance-C-111
    when: When selecting currency pairs for FX trend following strategies
    action: Use G10 USD crosses as the FX universe (7-10 pairs) — G10 represents the most liquid FX universe with sufficient
      volatility and data quality for trend following; EM currencies were rejected due to liquidity concerns
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-G10 currencies (e.g., EM crosses) may introduce liquidity risk with wider spreads and data gaps,
      causing execution prices to deviate significantly from backtest assumptions
    derived_from_bd_id: BD-029
  - id: finance-C-112
    when: When constructing spot curves for FX carry calculations
    action: Use overnight (ON) deposit tenor as the default starting point for spot curve construction — ON deposits are the
      most liquid tenor representing true overnight carry cost for position funding; override only for emerging market currencies
      with illiquid ON markets
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using 1W or 1M deposits as the default relies on less liquid tenors and complicates carry interpretation, causing carry
      calculation errors that misrepresent true funding costs for FX positions
    derived_from_bd_id: BD-006
  - id: finance-C-113
    when: When constructing volatility surfaces for FX options pricing
    action: Use CLARK5 interpolation algorithm as the default vol surface interpolation method — CLARK5 provides smoother
      interpolation avoiding wing oscillation artifacts compared to cubic spline; override only for exotic surfaces with discontinuities
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using cubic spline (BBG) interpolation introduces oscillation artifacts at the wings of the vol surface,
      causing incorrect option pricing especially for deep ITM/OTM strikes
    derived_from_bd_id: BD-010
  - id: finance-C-114
    when: When aligning asset price data with trading signals in backtesting
    action: Use left join with forward-fill (fill direction='left') for asset-signal alignment; left join preserves asset
      observations while forward-fill ensures alignment without introducing look-ahead bias; inner join drops assets without
      signals and right join drops asset observations that have no matching signal
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using inner join drops assets without signals causing survivorship bias; using right join drops asset observations
      that have no matching signal; an incorrect join type introduces data leakage or survivorship bias in backtest results
    derived_from_bd_id: BD-031
  - id: finance-C-115
    when: When calibrating volatility surface parameters for FX options pricing
    action: Use Nelder-Mead simplex solver for vol surface calibration — this derivative-free method handles noisy objective
      functions from market quotes without requiring gradient computation; L-BFGS-B requires gradients and Levenberg-Marquardt
      requires specific problem structure
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Gradient-based solvers may fail to converge on noisy vol surface calibration data, producing incorrect surface
      parameters and systematic option pricing errors
    derived_from_bd_id: BD-037
  - id: finance-C-116
    when: When configuring volatility targeting with maximum leverage limits in backtesting
    action: Verify that signal_vol_max_leverage=5x is configured correctly for vol-targeted strategies; this 5x cap prevents
      runaway leverage during low-volatility periods, when the ratio of target vol to realized vol grows large, and reflects
      common regulatory limits; only override with explicit documentation and regulatory compliance review
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without proper leverage caps, low-volatility periods trigger extreme position sizing that can exceed account
      capacity and regulatory limits, causing catastrophic losses when volatility spikes and compliance violations
    derived_from_bd_id: BD-016
  - id: finance-C-117
    when: When implementing signal delay configuration for execution timing in backtesting
    action: Verify that signal_delay=0 (same-bar execution) matches strategy timing assumptions — for close-to-close strategies
      requiring end-of-day signal evaluation, set delay=1 to avoid same-bar look-ahead; high-frequency strategies may use
      sub-day delays
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Same-bar execution (delay=0) may introduce subtle look-ahead bias if signal generation uses same-bar closing
      prices, causing backtest results to appear more favorable than actual trading would achieve
    derived_from_bd_id: BD-017
  - id: finance-C-118
    when: When calculating technical indicators with default fillna behavior
    action: Verify that fillna=True (default forward-fill of missing values) matches strategy requirements — for high-frequency
      or momentum strategies sensitive to stale data, set fillna=False to prevent carrying stale weekend/holiday prices across
      multiple days
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Forward-filling through weekends carries stale prices across multiple days, causing momentum strategies to
      hold outdated positions with artificially smoothed indicators and increased risk exposure
    derived_from_bd_id: BD-004
  - id: finance-C-119
    when: When processing signals across non-trading periods in backtesting
    action: Verify that forward-fill (ffill) strategy aligns with position management requirements — for momentum strategies
      requiring periodic rebalancing or signal decay, implement explicit signal decay logic instead of relying on ffill to
      hold positions through thin market days
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Forward-filling signals through non-trading periods maintains stale positions without rebalancing, causing
      momentum strategies to underperform during trending markets with periodic volatility spikes
    derived_from_bd_id: BD-005
  - id: finance-C-120
    when: When implementing cumulative return index calculation for backtesting
    action: Use Numba JIT compilation for total return index calculation — do not replace with pure Python loops or untested
      optimization approaches
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Pure Python loops are 10-100x slower than Numba JIT; for backtests with >1000 observations, this causes unacceptable
      runtime making iterative development impractical
    derived_from_bd_id: BD-033
  - id: finance-C-121
    when: When constructing FX volatility surfaces for option pricing
    action: Use CLARK5 interpolation for FX vol surface fitting — do not substitute with alternative methods without validating
      accuracy against industry standard
    severity: high
    kind: domain_rule
    modality: must
    consequence: CLARK5 provides smooth vol surface fitting across strikes and tenors; alternative methods may introduce pricing
      errors in vanilla FX options, causing systematic mispricing
    derived_from_bd_id: BD-034
  - id: finance-C-122
    when: When estimating implied vol addon for option pricing
    action: Use weighted median model for vol addon estimation — do not replace with mean or unweighted median as they are
      sensitive to outliers
    severity: high
    kind: domain_rule
    modality: must
    consequence: Weighted median is robust to outliers in vol addon estimation; using mean causes tail observations to distort
      addon values, leading to incorrect option pricing and delta hedging errors
    derived_from_bd_id: BD-042
  - id: finance-C-123
    when: When determining ATM strike for vanilla FX options pricing
    action: Use forward delta-neutral with premium adjustment method for ATM strike determination — do not use spot or forward
      ATM without premium adjustment
    severity: high
    kind: domain_rule
    modality: must
    consequence: Forward delta-neutral with premium adjustment ensures accurate delta hedging across the vol surface; using
      spot or forward ATM without premium adjustment causes delta hedging errors in FX options
    derived_from_bd_id: BD-011
  - id: finance-C-124
    when: When configuring volatility targeting for signal or portfolio level
    action: Explicitly verify and set vol_target parameter rather than relying on the 10% default — higher-volatility strategies
      (e.g., short-vol, carry) must explicitly increase the target
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default 10% vol target is appropriate for moderate-risk FX trend-following but causes unlimited leverage
      risk for high-volatility strategies if not explicitly overridden
    derived_from_bd_id: BD-014
  - id: finance-C-125
    when: When clustering assets in network analysis
    action: Use affinity propagation algorithm for asset clustering — do not replace with K-means or hierarchical clustering
      that require pre-specifying cluster count
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Affinity propagation automatically determines cluster numbers without requiring pre-specified k; using K-means
      with arbitrary k would produce incorrect natural groupings in FX markets
    derived_from_bd_id: BD-053
  - id: finance-C-126
    when: When implementing Guppy Multiple Moving Average technical indicator
    action: Use exactly 12 EMA components with windows (3,5,7,9,11,14,21,28,35,42,49,56) for Guppy MMA — do not reduce component
      count or change EMA windows
    severity: high
    kind: domain_rule
    modality: must
    consequence: The fixed 12-component structure distinguishes rapid trend changes from sustained directional moves via short-group
      vs long-group crossovers; changing component count alters entry/exit signal boundaries
    derived_from_bd_id: BD-063
  - id: finance-C-127
    when: When configuring output paths for backtest analysis
    action: Override the default 'output_data/YYYYMMDD' path for production pipelines requiring consistent output paths —
      production systems must use deterministic paths
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default timestamped paths prevent overwrites in iterative workflows but production pipelines require consistent
      paths for downstream processing; failing to override causes missing data in automated workflows
    derived_from_bd_id: BD-020
  - id: finance-C-128
    when: When implementing FX options roll logic for backtesting
    action: Trigger FX options rolling at actual expiry dates rather than calendar month-end — use 'expiry-date' roll event,
      not month-end
    severity: high
    kind: domain_rule
    modality: must
    consequence: Options gamma peaks at expiry; rolling at month-end misaligns with actual option lifecycle, causing unintended
      gamma exposure and misaligned variance-targeting in strategies
    derived_from_bd_id: BD-008
  - id: finance-C-129
    when: When constructing cross-currency pairs for return attribution
    action: Verify that the required cross rates are available when using 'no' (direct) mode; if they are unavailable, fall
      back to domestic-currency conversion, which requires complete market data for the domestic currency
    severity: high
    kind: domain_rule
    modality: must
    consequence: Direct triangulation ('no' mode) requires all required cross rates; using it with missing rates produces
      incorrect USD-denominated P&L attribution across currency exposures
    derived_from_bd_id: BD-009
  - id: finance-C-130
    when: When implementing rolling technical indicators in signal generation
    action: Set first n-1 periods to NaN to create warmup periods equal to indicator period — verify rolling indicators require
      full lookback window before generating signals
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without NaN warmup periods, rolling indicators generate unstable signals from incomplete history, causing
      look-ahead bias where signals use future information not available at signal time
    derived_from_bd_id: BD-045
  - id: finance-C-131
    when: When implementing RSI momentum calculation for trading signals
    action: Use RSI period of 14 (Wilder's original specification) — verify the period meets minimum of 1 and balances signal
      stability with responsiveness for momentum signals
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using non-standard RSI period causes signal characteristics to deviate from historical performance data validated
      with 14-period settings, leading to unreliable momentum signals
    derived_from_bd_id: BD-047
  - id: finance-C-132
    when: When implementing volatility-adjusted signals using ATR
    action: Use ATR period of 14 (Wilder's original specification) — verify the period meets minimum of 1 and captures approximately
      two weeks of daily price action
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using non-standard ATR period causes volatility-adjusted signals to be either too noisy or too slow, reducing
      signal quality and strategy performance for volatility-managed trades
    derived_from_bd_id: BD-048
  - id: finance-C-133
    when: When implementing event study logic for economic releases
    action: Set NYC 10am cutoff for economic releases (8:30am ET typical release time) — verify timezone conversion uses America/New_York
      local market time to capture same-day market response
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Incorrect cutoff time causes event study to misalign economic releases with market price responses, leading
      to incorrect event attribution and distorted strategy performance metrics
    derived_from_bd_id: BD-055
  - id: finance-C-134
    when: When implementing volatility calculation for realized volatility using rolling window
    action: Use tenor_days as the rolling window for realized volatility on daily data — verify window scales proportionally
      with instrument maturity to maintain consistency with day count conventions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using fixed rolling window instead of tenor_days causes volatility estimates to be inconsistent across instruments
      with different maturities, leading to incorrect risk assessment and strategy performance degradation
    derived_from_bd_id: BD-056
  - id: finance-C-135
    when: When implementing volatility-targeting position sizing logic in backtesting
    action: Apply max leverage cap (typically 1.0-2.0x) when scaling positions based on realized volatility — the cap prevents
      unbounded position growth during low-volatility periods that would otherwise exceed prudent risk limits
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without a max leverage cap, volatility-targeting produces oversized positions during low-volatility regimes,
      amplifying losses when volatility reverts and potentially exceeding available capital or margin limits
    derived_from_bd_id: BD-062
  - id: finance-C-136
    when: When implementing trade exit logic in backtesting
    action: Enforce stop-loss and take-profit levels to bound PnL distributions per trade — positions must be closed when
      price reaches the stop-loss level (capping maximum loss) or take-profit level (locking predetermined gains)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without stop-loss and take-profit enforcement, trades remain open indefinitely, producing unbounded loss
      potential and making backtest results incompatible with live trading discipline where positions require manual or automated
      exit
    derived_from_bd_id: BD-065
  - id: finance-C-137
    when: When implementing position sizing and capital allocation logic in backtesting
    action: Apply position clipping to enforce hard limits on net exposure (directional risk) and gross exposure (sum of absolute
      positions) — prevent positions from exceeding predetermined risk budgets that bound margin requirements and counterparty
      exposure
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without position clipping, favorable signals drive unbounded position accumulation that exceeds available
      capital, causing margin calls in live trading and making backtest results unreproducible due to capital constraint violations
    derived_from_bd_id: BD-066
  - id: finance-C-138
    when: When pricing FX vanilla options using Black-Scholes model
    action: Apply Black-Scholes for instruments with significant volatility smile or skew — the constant-vol assumption under
      log-normal dynamics systematically misprices deep OTM options where smile effects are pronounced, causing 5-15% pricing
      errors on risk-reversal strategies
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Black-Scholes systematically underprices out-of-the-money puts and overprices out-of-the-money calls when
      volatility smile exists, causing hedge ratios and premium estimates to deviate significantly from market-quoted prices
    derived_from_bd_id: BD-068
  - id: finance-C-140
    when: When using the framework's default premium output format for FX option pricing
    action: Verify that fx_options_premium_output='pct_for' matches user expectations for G10 pairs — if pct_dom or abs format
      is required for specific use cases, explicitly override the default parameter to avoid misinterpretation of premium
      quotes
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default pct-for format expresses premium as percentage of foreign currency notional, which may be misinterpreted
      as domestic-currency percentage for pairs with significant cross rates, causing hedge ratio and PnL calculation errors
    derived_from_bd_id: BD-061
  - id: finance-C-141
    when: When implementing options roll management in the framework
    action: Use expiry-date roll event for options rolling — roll_event must be one of 'expiry', 'delivery', or 'value'; expiry
      is the standard for FX options per the standardized expiration calendar
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using non-expiry roll events for FX options causes positions to miss the standardized expiration calendar,
      potentially resulting in unwanted physical delivery for cash-settled instruments or incorrect position tracking
    derived_from_bd_id: BD-057
  - id: finance-C-142
    when: When implementing options roll timing logic in the framework
    action: Set roll_days parameter to 5 as the pre-roll buffer — roll_days must be >= 1; 5 days provides optimal buffer for
      execution without excessive stale positions or last-minute execution risk
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using fewer than 5 roll days creates tight execution windows that risk missing the roll event due to execution
      delays; using more than 5 days leaves positions stale too early, reducing carry returns unnecessarily
    derived_from_bd_id: BD-058
  - id: finance-C-143
    when: When implementing FX implied volatility surface interpolation for pricing and risk management
    action: Use polynomial interpolation or Clark5 model for FX implied vol surface — these methods provide smooth interpolation
      between discrete strike-tenor nodes and respect theoretical no-arbitrage structure
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using cubic splines or SABR for vol surface interpolation introduces arbitrage violations in the wings, causing
      economically invalid prices that lead to systematic mispricing and incorrect risk calculations
    derived_from_bd_id: BD-064
  - id: finance-C-144
    when: When implementing covariance estimation for financial network analysis with high-dimensional asset sets
    action: Use Graphical Lasso for sparse covariance estimation — the L1 penalty on precision matrix produces sparse estimates
      essential for revealing genuine conditional relationships and reducing overfitting
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using sample covariance or Ledoit-Wolf shrinkage produces dense matrices that overfit to historical noise
      in high-dimensional settings, causing poor out-of-sample portfolio performance and incorrect risk attribution
    derived_from_bd_id: BD-067
  - id: finance-C-145
    when: When implementing the backtesting workflow that combines signal generation with portfolio optimization
    action: Apply stop loss and take profit signals to returns_df BEFORE calling optimize_portfolio_weights — stop loss/take
      profit constraints modify the return distribution and must be applied pre-optimization
    severity: high
    kind: domain_rule
    modality: must
    consequence: Applying stop loss/take profit after optimization uses unconstrained returns for weight calculation, then
      projects constraints onto already-optimized weights, violating optimality conditions and producing suboptimal portfolios
    derived_from_bd_id: BD-072
  - id: finance-C-146
    when: When using the framework's default annualization factor for risk metric calculations
    action: Verify that ann_factor=252 matches the trading calendar assumption (~21 trading days/month) for the target market;
      keep 252 for US equity/FX and override it (e.g., 250 for some European exchanges) where the trading-day count differs
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using 252 for markets with different trading day conventions (e.g., 250 for some European exchanges) systematically
      overstates annualized returns by about 0.8% (252/250) and annualized Sharpe ratios by about 0.4% (sqrt(252/250)), skewing
      strategy selection toward misleading performance metrics
    derived_from_bd_id: BD-026
  - id: finance-C-147
    when: When implementing or refactoring execution direction logic for EURUSD single-currency strategies
    action: Enforce only_allow_longs=True to restrict execution to long positions only; must reject or filter any short signals
      before order generation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Removing the longs-only constraint allows short positions which introduce funding costs, overnight borrowing
      fees, and counterparty risk not accounted for in the EURUSD baseline strategy design, causing live P&L to diverge significantly
      from backtest
    derived_from_bd_id: BD-028
  - id: finance-C-148
    when: When implementing data filling logic for multi-asset backtesting with spot and carry data
    action: Forward-fill spot prices across missing values; restrict forward-fill (FFILL) to carry rates, deposit rates, and
      yield curve data, and leave spot gaps unfilled so that only actually traded prices enter the backtest
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Forward-filling spot prices creates look-ahead bias where future prices are used retroactively in historical
      backtests; this causes strategy signals to appear earlier than they could in live trading, systematically overstating
      returns by 2-5% in volatile periods
    derived_from_bd_id: BD-032
  - id: finance-C-149
    when: When pricing FX options and determining at-the-money strike for delta hedging calculations
    action: 'Use forward delta neutral ATM method: derive the delta-neutral ATM strike from the forward (spot * exp(rate_diff * tenor))
      rather than from spot; verify fx_options_atm_method is set to ''forward_delta_neutral'' for major currency pairs'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using spot delta neutral ATM instead of forward delta neutral systematically misprices options by ignoring
      interest rate differential; for high-yielding currency pairs, this causes 3-8% mispricing in option premiums, leading
      to incorrect hedge ratios and P&L attribution errors
    derived_from_bd_id: BD-035
  - id: finance-C-151
    when: When recording trading events and backtesting results for audit or reproduction
    action: Assume the framework provides an immutable event log — the framework does not implement audit trail functionality;
      events can be modified or deleted after execution
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without immutable event logging, backtest results cannot be audited or reproduced; regulatory compliance
      requirements for trading systems are violated and forensic analysis becomes impossible
    derived_from_bd_id: BD-GAP-002
  - id: finance-C-152
    when: When implementing event logging for trading and backtesting operations
    action: Implement append-only event storage with cryptographic integrity checks; log entries must include timestamp, event_type,
      payload hash, and previous_entry_hash to detect tampering
    severity: high
    kind: domain_rule
    modality: must
    consequence: Implementing immutable event logging ensures full auditability and reproducibility of backtest results; regulators
      and internal auditors can verify strategy execution, and disputes can be resolved with cryptographic evidence
    derived_from_bd_id: BD-GAP-002
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-108 / ArcticDB Tick Data Storage
    version: v5.3
    intent_keywords:
    - arcticdb
    - tick data storage
    - time series database
    - lmdb
    - market data persistence
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
      groups:
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 3
        ucs:
        - uc_id: UC-101
          name: ArcticDB Tick Data Storage
          short_description: Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both
            local LMDB and S3 cloud storage backends for efficient
          sample_triggers:
          - arcticdb
          - tick data storage
          - time series database
        - uc_id: UC-103
          name: Market Data Fetching from Vendors
          short_description: 'Fetches economic and financial market data from external vendors like Quandl, demonstrating
            how to request and cache market data with specific fields '
          sample_triggers:
          - market data
          - quandl
          - fetch data
        - uc_id: UC-104
          name: S3 Cloud Storage for Tick Data
          short_description: Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet
            format for efficient compression and retrieval of histori
          sample_triggers:
          - s3 storage
          - aws
          - parquet
      - group_id: trading_strategy
        name: Trading Strategy
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-102
          name: FX G10 Cross Backtesting
          short_description: Enables historical backtesting of FX trading strategies using G10 currency pairs with technical
            indicator-based signal generation to evaluate strategy
          sample_triggers:
          - backtest
          - fx trading
          - g10 currency
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try arcticdb tick data storage
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try fx g10 cross backtesting
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try market data fetching from vendors
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 4 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Market Data Fetching from Vendors
    - FX G10 Cross Backtesting
    - ArcticDB Tick Data Storage
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

Cryptofeed Ws Feeds

Skill

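Streams real-time market data from multiple cryptocurrency exchanges, with async callback handling and persistence of trades, tickers, order books, and other data to the ArcticDB time-series database.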

---
name: cryptofeed-ws-feeds
description: |-
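  Streams real-time market data from multiple cryptocurrency exchanges, with async callback handling and persistence of trades, tickers, order books, and other data to the ArcticDB time-series database.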
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-110"
  compiled_at: "2026-04-22T13:00:52.892309+00:00"
  capability_markets: "crypto"
  capability_activities: "crypto-trading"
  sop_version: "crystal-compilation-v6.1"
---
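# Real-Time Cryptocurrency Market Data (cryptofeed-ws-feeds)

> Streams real-time market data from multiple cryptocurrency exchanges, with async callback handling and persistence of trades, tickers, order books, and other data to the ArcticDB time-series database.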

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (40 total)

### General Callback Handler Demo (`UC-101`)
Demonstrates how to define and use async callback handlers for receiving real-time market data updates from cryptocurrency exchanges
**Triggers**: callback handler, ticker callback, async handler
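
The async callback pattern in this use case can be sketched with the standard library alone. The block below is a hedged illustration, not cryptofeed's own API: `Ticker`, `on_ticker`, `fake_feed`, and `run_demo` are hypothetical names, and the fake feed stands in for an exchange websocket so the example runs without network access.

```python
import asyncio
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Ticker:
    # Hypothetical message type for illustration; not a cryptofeed class.
    exchange: str
    symbol: str
    bid: Decimal
    ask: Decimal


received = []


async def on_ticker(ticker, receipt_timestamp):
    """Async handler: awaited once per ticker update."""
    received.append((ticker.symbol, ticker.bid, ticker.ask, receipt_timestamp))


async def fake_feed(callback, updates):
    """Stand-in for an exchange stream: awaits the callback for each update."""
    for ts, update in enumerate(updates):
        await callback(update, float(ts))
        await asyncio.sleep(0)  # yield to the event loop between messages


def run_demo():
    updates = [
        Ticker("DEMO", "BTC-USD", Decimal("100.0"), Decimal("100.5")),
        Ticker("DEMO", "BTC-USD", Decimal("100.2"), Decimal("100.6")),
    ]
    asyncio.run(fake_feed(on_ticker, updates))
    return received
```

Calling `run_demo()` drives both updates through the handler in order. In a real deployment the handler would be registered with the feed library instead of the fake feed, with the exact registration call taken from that library's documentation.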

### ArcticDB Data Storage (`UC-102`)
Stores cryptocurrency trade, funding, and ticker data to ArcticDB (Arctic) time-series database for persistence and later analysis
**Triggers**: ArcticDB, arctic storage, time series database

### Bequant/HitBTC Exchange Features (`UC-103`)
Demonstrates each supported feature (ticker, trades, order book, candles) for the Bequant and HitBTC exchanges, which share the same API
**Triggers**: Bequant, HitBTC, Bitcoin.com exchange

For all **40** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (13 total)

- **`AP-CRYPTO-TRADING-001`**: Float Arithmetic for Monetary Values
- **`AP-CRYPTO-TRADING-002`**: Missing Market Initialization Before Access
- **`AP-CRYPTO-TRADING-003`**: Bypassing API Facade Layer

All 13 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-110. Evidence verify ratio = 53.1% and audit fail total = 18. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
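| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 13 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |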

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-110` blueprint at 2026-04-22T13:00:52.892309+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Bequant/HitBTC Exchange Features', 'ArcticDB Data Storage', 'General Callback Handler Demo', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **13**

## ccxt (1)

### `AP-CRYPTO-TRADING-002` — Missing Market Initialization Before Access <sub>(high)</sub>

Attempting to access market data via symbol lookups before load_markets() is called leaves self.markets empty, causing KeyError or BadSymbol exceptions on all trading operations and data retrieval. This breaks the entire trading workflow at the first market interaction.
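
A minimal sketch of the guard this anti-pattern calls for. `StubExchange` and `MarketsNotLoaded` are hypothetical stand-ins written for illustration, not the ccxt classes; only the `load_markets()` name and the `markets` attribute come from the text above.

```python
class MarketsNotLoaded(RuntimeError):
    """Raised when a symbol lookup happens before markets are loaded."""


class StubExchange:
    """Hypothetical stand-in for an exchange client (not the ccxt class)."""

    def __init__(self) -> None:
        self.markets: dict[str, dict] = {}
        self._loaded = False

    def load_markets(self) -> dict[str, dict]:
        # A real client would fetch this table from the exchange.
        self.markets = {"BTC/USDT": {"base": "BTC", "quote": "USDT"}}
        self._loaded = True
        return self.markets

    def market(self, symbol: str) -> dict:
        # Fail loudly with a clear message instead of a bare KeyError.
        if not self._loaded:
            raise MarketsNotLoaded("call load_markets() before symbol lookups")
        return self.markets[symbol]


exchange = StubExchange()
exchange.load_markets()             # initialize first
btc = exchange.market("BTC/USDT")   # lookup is now safe
print(btc["quote"])
```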

## cryptofeed (3)

### `AP-CRYPTO-TRADING-009` — Applying Order Book Deltas Before Snapshot <sub>(high)</sub>

Processing order book delta messages before receiving a snapshot for the symbol applies updates to an uninitialized or stale book state. Price levels are incorrectly added/removed, corrupting the local book representation with no way to recover without full reset.
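
One way to avoid this, sketched under assumptions: a local book that refuses deltas until a snapshot has been applied. `LocalBook` and its method names are hypothetical and are not cryptofeed's order book implementation; some exchanges require buffering early deltas rather than dropping them, which this sketch does not do.

```python
from decimal import Decimal


class LocalBook:
    """Hypothetical local order book that ignores deltas until a snapshot arrives."""

    def __init__(self) -> None:
        self.bids: dict[Decimal, Decimal] = {}
        self.asks: dict[Decimal, Decimal] = {}
        self.has_snapshot = False
        self.dropped = 0  # count of deltas rejected before the snapshot

    def on_snapshot(self, bids, asks) -> None:
        # Replace the whole book state with the snapshot levels.
        self.bids = {Decimal(p): Decimal(s) for p, s in bids}
        self.asks = {Decimal(p): Decimal(s) for p, s in asks}
        self.has_snapshot = True

    def on_delta(self, side: str, price: str, size: str) -> bool:
        if not self.has_snapshot:
            self.dropped += 1
            return False
        levels = self.bids if side == "bid" else self.asks
        p, s = Decimal(price), Decimal(size)
        if s == 0:
            levels.pop(p, None)  # size 0 removes the price level
        else:
            levels[p] = s
        return True
```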

### `AP-CRYPTO-TRADING-010` — Silent HTTP Error Handling <sub>(medium)</sub>

Ignoring non-200 HTTP response status codes without raising exceptions causes silent failures for data requests. Market data is missing or corrupted, failed requests are not retried, and downstream consumers receive incomplete data with no indication of failure.

### `AP-CRYPTO-TRADING-011` — Missing Sequence Number Validation <sub>(medium)</sub>

Not validating that order book sequence numbers increment by exactly 1 allows out-of-order or missing messages to corrupt local book state. Stale or incorrect price levels persist in the book, leading to wrong trading signals and corrupted market depth data.
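
The check described above can be sketched as a small tracker. `SequenceTracker` and `SequenceGap` are illustrative names, and the "increment by exactly 1" rule is taken from the text; individual exchanges may define sequence semantics differently.

```python
class SequenceGap(Exception):
    """Raised when a message arrives out of order or a message was missed."""


class SequenceTracker:
    """Hypothetical helper: verifies that sequence numbers increase by exactly 1."""

    def __init__(self) -> None:
        self.last: int | None = None

    def check(self, seq: int) -> None:
        if self.last is not None and seq != self.last + 1:
            expected = self.last + 1
            self.last = None  # force the caller to resync from a fresh snapshot
            raise SequenceGap(f"expected sequence {expected}, got {seq}")
        self.last = seq
```

On `SequenceGap`, the caller would typically discard the local book and request a new snapshot before applying further updates.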

## hummingbot (5)

### `AP-CRYPTO-TRADING-005` — Unvalidated Collateral for Order Execution <sub>(high)</sub>

Submitting orders without checking collateral requirements including order cost, percent fees, and fixed fees against available balance causes orders to exceed margin. This triggers immediate liquidation or forced position closure at unfavorable prices with partial or total loss of collateral.
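
A simplified sketch of the pre-submission check, assuming collateral equals order cost plus a percent fee plus a fixed fee, all in the same currency and without leverage. The function names are hypothetical and this is not hummingbot's budget checker.

```python
from decimal import Decimal


def required_collateral(price: Decimal, amount: Decimal,
                        percent_fee: Decimal, fixed_fee: Decimal) -> Decimal:
    """Order cost plus percent and fixed fees (assumed formula, no leverage)."""
    order_cost = price * amount
    return order_cost + order_cost * percent_fee + fixed_fee


def can_submit(available: Decimal, price: Decimal, amount: Decimal,
               percent_fee: Decimal, fixed_fee: Decimal) -> bool:
    """True only when the available balance covers the full requirement."""
    return required_collateral(price, amount, percent_fee, fixed_fee) <= available


need = required_collateral(Decimal("100"), Decimal("2"), Decimal("0.001"), Decimal("1"))
print(need)
```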

### `AP-CRYPTO-TRADING-006` — Close Order Placed Before Open Order Fills <sub>(high)</sub>

Placing a close order before verifying the open order is fully filled causes mismatched position sizes. The executor attempts to close a larger or smaller position than actually exists, leading to unintended directional exposure and potential losses exceeding the configured risk parameters.

### `AP-CRYPTO-TRADING-007` — Arbitrage Across Non-Interchangeable Tokens <sub>(high)</sub>

Executing arbitrage trades between tokens that appear similar but are not interchangeable causes permanent loss of funds. The received tokens cannot be used to close the opposing position, stranding capital and creating one-sided exposure with no recovery path.

### `AP-CRYPTO-TRADING-008` — Skipping Triple Barrier Evaluations <sub>(high)</sub>

Omitting control_stop_loss, control_take_profit, or control_time_limit calls in the control_barriers cycle leaves positions unprotected. Losses exceed configured thresholds as barrier checks never trigger, positions remain open beyond risk tolerance, resulting in amplified losses.

### `AP-CRYPTO-TRADING-012` — Wrong Position Key for Perpetual Modes <sub>(medium)</sub>

Using trading_pair only as the position key in HEDGE mode causes different position sides to collide and overwrite each other. Position tracking becomes incorrect, leading to wrong order matching and potential financial loss when the system misidentifies position direction.
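
The collision can be illustrated with a small key function. The enums and the `pair:SIDE` key format below are assumptions made for this sketch, not hummingbot's actual types or key layout.

```python
from enum import Enum


class PositionMode(Enum):
    ONEWAY = "ONEWAY"
    HEDGE = "HEDGE"


class PositionSide(Enum):
    LONG = "LONG"
    SHORT = "SHORT"


def position_key(trading_pair: str, mode: PositionMode, side: PositionSide) -> str:
    """Include the side in HEDGE mode so long and short positions do not collide."""
    if mode is PositionMode.HEDGE:
        return f"{trading_pair}:{side.name}"
    return trading_pair


positions: dict[str, str] = {}
positions[position_key("BTC-USDT", PositionMode.HEDGE, PositionSide.LONG)] = "long 1.0"
positions[position_key("BTC-USDT", PositionMode.HEDGE, PositionSide.SHORT)] = "short 0.5"
print(len(positions))
```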

## rotki (3)

### `AP-CRYPTO-TRADING-003` — Bypassing API Facade Layer <sub>(high)</sub>

Directly accessing internal service methods without routing through the RestAPI facade bypasses authentication, task tracking, and error handling mechanisms. Anonymous requests can execute privileged operations, creating critical security vulnerabilities where unauthorized users access sensitive financial data or execute trades.

### `AP-CRYPTO-TRADING-004` — Non-Checksummed EVM Addresses <sub>(high)</sub>

Passing lowercase or mixed-case Ethereum addresses to RPC nodes causes InvalidAddress exceptions since nodes enforce EIP-55 checksum format. This results in RemoteError failures that halt all blockchain data collection for the affected chain, with no graceful degradation or fallback.

### `AP-CRYPTO-TRADING-013` — Overwriting User-Customized Event Classifications <sub>(medium)</sub>

Re-decoding operations silently replace user-modified events marked as CUSTOMIZED without explicit user action. User edits to event classifications are permanently lost, causing incorrect accounting treatment and potential tax reporting errors that may not be detected until audit.

## rotki, hummingbot, cryptofeed, ccxt (1)

### `AP-CRYPTO-TRADING-001` — Float Arithmetic for Monetary Values <sub>(high)</sub>

Using Python float type instead of Decimal for price, amount, balance, PnL, and other financial calculations causes precision errors due to binary floating-point representation. Rounding errors compound across multiple calculations, leading to incorrect order sizing, wrong profit/loss reporting, and potentially incorrect trading decisions or tax calculations.
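
A short illustration of the precision problem, using only the standard library. Note that `Decimal` values are built from strings; constructing them from floats would carry the binary rounding error along.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
float_matches = float_total == 0.3

# Decimal built from strings keeps exact decimal digits.
decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_total == Decimal("0.3")

print(float_matches)    # False
print(decimal_matches)  # True
```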

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-110--cryptofeed
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 30, 'total_functions': 0, 'total_stages': 8}

## Modules (8)

- [feed_handler_orchestration](components/feed_handler_orchestration.md): 4 classes
- [connection_management](components/connection_management.md): 4 classes
- [exchange_interface_layer](components/exchange_interface_layer.md): 5 classes
- [data_normalization](components/data_normalization.md): 4 classes
- [order_book_processing](components/order_book_processing.md): 3 classes
- [nbbo_aggregation](components/nbbo_aggregation.md): 2 classes
- [callback_dispatch](components/callback_dispatch.md): 3 classes
- [backend_storage](components/backend_storage.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 103
  fatal_constraints_count: 33
  non_fatal_constraints_count: 196
  use_cases_count: 40
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
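
A minimal sketch of next-bar execution, assuming a signal computed on bar `i` is filled at the open of bar `i + 1`. The function name and the choice of the open price as fill price are illustrative assumptions.

```python
def next_bar_fills(signals: list, opens: list) -> list:
    """Pair each signal from bar i with the open price of bar i + 1.

    A signal on the final bar has no next bar and is therefore not filled.
    """
    fills = []
    for i, signal in enumerate(signals[:-1]):
        if signal is not None:
            fills.append((i + 1, signal, opens[i + 1]))
    return fills


fills = next_bar_fills([None, "BUY", None, "SELL"], [10, 11, 12, 13])
print(fills)
```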

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
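
A hedged sketch of a validator for this format, based on the `stock_sh_600000` example used elsewhere in this document. `parse_entity_id` is a hypothetical helper; splitting on the first two underscores is an assumption so that codes may themselves contain underscores.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its three parts or raise ValueError."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"entity id must look like entity_type_exchange_code: {entity_id!r}")
    return EntityId(*parts)


parsed = parse_entity_id("stock_sh_600000")
print(parsed)
```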

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
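
The "exactly one" rule can be enforced at construction time. `SizedSignal` below is a hypothetical dataclass written for illustration; it is not the `TradingSignal` class itself, and the field types are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

_SIZE_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass(frozen=True)
class SizedSignal:
    """Hypothetical signal that accepts exactly one sizing field."""

    position_pct: Optional[Decimal] = None
    order_money: Optional[Decimal] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        provided = [name for name in _SIZE_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(f"exactly one of {_SIZE_FIELDS} is required, got {provided}")


ok = SizedSignal(position_pct=Decimal("0.2"))
print(ok)
```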

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
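
The three-valued semantics can be sketched as a small mapping function. `action_for` and the action labels are illustrative names; the True/False/None/NaN meanings come from the rule above.

```python
import math


def action_for(value) -> str:
    """Map a filter_result cell to an action: True=BUY, False=SELL, None/NaN=NO_ACTION."""
    if value is None:
        return "NO_ACTION"
    if isinstance(value, float) and math.isnan(value):
        return "NO_ACTION"
    return "BUY" if value else "SELL"


actions = [action_for(v) for v in (True, False, None, float("nan"))]
print(actions)
```

Checking for missing values first matters because `bool(float("nan"))` is `True` in Python, so a naive truthiness test would turn a missing value into a buy.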

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
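
For reference, a plain-Python sketch of MACD with the locked parameters. It assumes the common definition (EMA with smoothing factor 2/(span+1), seeded with the first value) and is not taken from any particular library's implementation.

```python
def ema(values: list[float], span: int) -> list[float]:
    """Exponential moving average with alpha = 2 / (span + 1), seeded by the first value."""
    alpha = 2.0 / (span + 1)
    out: list[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: list[float], fast: int = 12, slow: int = 26, signal: int = 9):
    """Return (macd_line, signal_line, histogram) for a list of closing prices."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


rising = [float(i) for i in range(1, 61)]
macd_line, signal_line, histogram = macd(rising)
print(macd_line[-1] > 0)
```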

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **40**

## `KUC-101`
**Source**: `examples/demo.py`

Demonstrates how to define and use async callback handlers for receiving real-time market data updates from cryptocurrency exchanges.
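
The callback pattern can be sketched with the standard library alone. The `dispatch` function and the dictionary-shaped update below are hypothetical stand-ins for illustration; they are not cryptofeed's API or data types.

```python
import asyncio

received: list[str] = []


async def on_ticker(update: dict) -> None:
    """Async handler invoked once per ticker update."""
    received.append(f"{update['symbol']} bid={update['bid']} ask={update['ask']}")


async def dispatch(callbacks: dict, channel: str, update: dict) -> None:
    """Await every handler registered for the channel, in registration order."""
    for callback in callbacks.get(channel, []):
        await callback(update)


callbacks = {"ticker": [on_ticker]}
asyncio.run(dispatch(callbacks, "ticker", {"symbol": "BTC-USD", "bid": "100.0", "ask": "100.5"}))
print(received)
```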

## `KUC-102`
**Source**: `examples/demo_arctic.py`

Stores cryptocurrency trade, funding, and ticker data to ArcticDB (Arctic) time-series database for persistence and later analysis.

## `KUC-103`
**Source**: `examples/demo_bequant_bitcoincom_hitbtc.py`

Demonstrates all supported features (ticker, trades, order book, candles) for the Bequant and HitBTC exchanges, which share the same API.

## `KUC-104`
**Source**: `examples/demo_binance_authenticated.py`

Demonstrates authenticated access to Binance, Binance Delivery, and Binance Futures for receiving account balances, positions, and order updates in real-time.

## `KUC-105`
**Source**: `examples/demo_binance_delivery.py`

Shows data subscription for Binance Delivery perpetual futures including order book, ticker, and trade data.

## `KUC-106`
**Source**: `examples/demo_binancetr.py`

Demonstrates data subscription for Binance TR exchange including ticker, trades, order book with delta updates, and candle data.

## `KUC-107`
**Source**: `examples/demo_bitfinex_authenticated.py`

Demonstrates synchronous authenticated Bitfinex trading operations including balance queries, order management, and trade execution.

## `KUC-108`
**Source**: `examples/demo_bybit_authenticated.py`

Demonstrates Bybit authenticated feeds for receiving order updates and trade fills in real-time for account monitoring.

## `KUC-109`
**Source**: `examples/demo_check_trade_timestamps.py`

Monitors and compares trade timestamps across multiple exchanges to verify timestamp consistency and identify potential synchronization issues.

## `KUC-110`
**Source**: `examples/demo_concurrent_proxy.py`

Demonstrates using HTTP proxy to bypass exchange rate limits when subscribing to many symbols, enabling concurrent order book and open interest data collection.

## `KUC-111`
**Source**: `examples/demo_custom_agg.py`

Demonstrates custom aggregation of trade data over time windows, tracking min/max prices for each symbol within the aggregation period.

## `KUC-112`
**Source**: `examples/demo_deribit_authenticated.py`

Demonstrates Deribit authenticated feeds for order info, trade fills, and balance updates for comprehensive account monitoring.

## `KUC-113`
**Source**: `examples/demo_elastic.py`

Stores order book, funding, and trade data to Elasticsearch for search and analytics capabilities.

## `KUC-114`
**Source**: `examples/demo_existing_loop.py`

Demonstrates integrating cryptofeed with an existing asyncio event loop, allowing concurrent execution with other async tasks.

## `KUC-115`
**Source**: `examples/demo_gateiofutures.py`

Demonstrates subscription to Gate.io futures exchange for ticker, trades, order book, funding, and candle data.

## `KUC-116`
**Source**: `examples/demo_gcppubsub.py`

Publishes trade data to Google Cloud Platform Pub/Sub for event-driven architectures and cloud-based processing.

## `KUC-117`
**Source**: `examples/demo_influxdb.py`

Stores funding, order book, trades, ticker, and candles to InfluxDB time-series database for monitoring and analysis.

## `KUC-118`
**Source**: `examples/demo_kafka.py`

Streams order book and trade data to Apache Kafka with custom topic and partition routing for scalable event processing.

## `KUC-119`
**Source**: `examples/demo_liquidations.py`

Monitors and displays liquidations across all exchanges that support this channel, useful for identifying market stress and volatility.

## `KUC-120`
**Source**: `examples/demo_loop.py`

Demonstrates dynamic addition of feeds to a running event loop and scheduled callbacks for adding/removing feeds over time.

## `KUC-121`
**Source**: `examples/demo_mongo.py`

Stores order book, trades, and ticker data to MongoDB document database with flexible schema for JSON storage.

## `KUC-122`
**Source**: `examples/demo_multicb.py`

Demonstrates registering multiple callback handlers for a single data channel, enabling parallel processing of the same data.

## `KUC-123`
**Source**: `examples/demo_nbbo.py`

Calculates National Best Bid and Offer (NBBO) by aggregating best bid/ask prices across Coinbase, Gemini, and Kraken for a given symbol.
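
The aggregation idea can be sketched as follows: the best bid is the highest bid across exchanges and the best ask is the lowest ask. The `nbbo` function and the quote layout are assumptions for illustration, not the example script's code.

```python
from decimal import Decimal


def nbbo(quotes: dict[str, tuple[Decimal, Decimal]]):
    """Given {exchange: (bid, ask)}, return ((bid_exchange, bid), (ask_exchange, ask))."""
    bid_exchange = max(quotes, key=lambda ex: quotes[ex][0])
    ask_exchange = min(quotes, key=lambda ex: quotes[ex][1])
    return (bid_exchange, quotes[bid_exchange][0]), (ask_exchange, quotes[ask_exchange][1])


quotes = {
    "COINBASE": (Decimal("100.0"), Decimal("100.5")),
    "GEMINI": (Decimal("100.2"), Decimal("100.6")),
    "KRAKEN": (Decimal("100.1"), Decimal("100.4")),
}
best_bid, best_ask = nbbo(quotes)
print(best_bid, best_ask)
```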

## `KUC-124`
**Source**: `examples/demo_ohlcv.py`

Aggregates trade data into OHLCV (Open, High, Low, Close, Volume) candles over configurable time windows for charting.
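
A minimal sketch of trade-to-candle aggregation, assuming trades arrive in time order as `(timestamp_seconds, price, size)` tuples. The function name and tuple layout are illustrative choices, not the example script's interface.

```python
from decimal import Decimal


def ohlcv(trades, window: int) -> dict:
    """Bucket time-ordered trades into candles keyed by window start time."""
    candles: dict[int, dict] = {}
    for ts, price, size in trades:
        start = int(ts // window) * window
        candle = candles.get(start)
        if candle is None:
            candles[start] = {"open": price, "high": price, "low": price,
                              "close": price, "volume": size}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price      # last trade seen in the window
            candle["volume"] += size
    return candles


trades = [
    (0, Decimal("100"), Decimal("1")),
    (30, Decimal("105"), Decimal("2")),
    (59, Decimal("99"), Decimal("1")),
    (60, Decimal("101"), Decimal("3")),
]
candles = ohlcv(trades, window=60)
print(candles[0])
```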

## `KUC-125`
**Source**: `examples/demo_okx_authenticated.py`

Demonstrates authenticated OKX feeds for receiving real-time order updates for account monitoring.

## `KUC-126`
**Source**: `examples/demo_playback.py`

Plays back historical market data from captured PCAP files through the callback system for backtesting and analysis.

## `KUC-127`
**Source**: `examples/demo_postgres.py`

Stores comprehensive market data (candles, index, ticker, trades, open interest, liquidations, funding, order books) to PostgreSQL with custom column mapping.

## `KUC-128`
**Source**: `examples/demo_quasardb.py`

Stores ticker, trades, candles, open interest, index, and liquidation data to QuasarDB for high-performance time-series analytics.

## `KUC-129`
**Source**: `examples/demo_questdb.py`

Stores order book, candles, funding, ticker, and trade data to QuestDB for high-performance time-series database operations.

## `KUC-130`
**Source**: `examples/demo_rabbitmq_exchange.py`

Publishes order book data to RabbitMQ using topic exchange routing for flexible message filtering and distribution.

## `KUC-131`
**Source**: `examples/demo_rabbitmq_queue.py`

Publishes order book data to RabbitMQ using queue-based delivery for point-to-point message distribution.

## `KUC-132`
**Source**: `examples/demo_raw_data.py`

Collects raw WebSocket data to files for offline analysis, debugging, or historical data preservation.

## `KUC-133`
**Source**: `examples/demo_redis.py`

Stores trades, funding, candles, order books, open interest, and ticker data to Redis with both pub/sub and persistent storage backends.

## `KUC-134`
**Source**: `examples/demo_renko.py`

Transforms trade data into Renko chart bricks based on fixed price movements for trend visualization independent of time.
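
A simplified Renko sketch: a new brick is emitted each time the price moves a full brick size away from the last brick's close. This version omits the two-brick reversal rule that some charting conventions use, and the function name is hypothetical.

```python
from decimal import Decimal


def renko(prices: list[Decimal], brick: Decimal) -> list[tuple[str, Decimal]]:
    """Return (direction, close) bricks for fixed-size price moves, ignoring time."""
    if not prices:
        return []
    bricks: list[tuple[str, Decimal]] = []
    anchor = prices[0]  # close of the most recent brick
    for price in prices[1:]:
        while price >= anchor + brick:
            anchor += brick
            bricks.append(("up", anchor))
        while price <= anchor - brick:
            anchor -= brick
            bricks.append(("down", anchor))
    return bricks


bricks = renko([Decimal("100"), Decimal("103"), Decimal("101")], Decimal("1"))
print(bricks)
```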

## `KUC-135`
**Source**: `examples/demo_tcp.py`

Streams trade data over TCP sockets for network-based data distribution to remote systems or applications.

## `KUC-136`
**Source**: `examples/demo_throttle.py`

Limits the rate of order book callbacks to a specified number per time window, useful for managing downstream system load.
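
A sketch of fixed-window throttling, written synchronously for brevity. The `Throttle` class here is a hypothetical wrapper, not cryptofeed's own; the injectable `clock` parameter is an assumption added so the behavior can be tested without sleeping.

```python
import time


class Throttle:
    """Forward at most max_calls invocations per window (in seconds); drop the rest."""

    def __init__(self, callback, max_calls: int, window: float, clock=time.monotonic):
        self.callback = callback
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def __call__(self, *args) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window begins: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count >= self.max_calls:
            return False  # dropped
        self.count += 1
        self.callback(*args)
        return True
```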

## `KUC-137`
**Source**: `examples/demo_udp.py`

Streams order book and trade data over UDP datagrams for low-latency network distribution to remote systems.

## `KUC-138`
**Source**: `examples/demo_uds.py`

Streams ticker and trade data over Unix domain sockets for high-performance inter-process communication on the same host.

## `KUC-139`
**Source**: `examples/demo_victoriametrics.py`

Stores trade, ticker, order book, and candle data to VictoriaMetrics for Prometheus-compatible time-series monitoring and analytics.

## `KUC-140`
**Source**: `examples/demo_zmq.py`

Publishes order book and ticker data over ZeroMQ pub/sub for lightweight message distribution to multiple subscribers.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-CRYPTO-TRADING-001` — Decimal Type for All Monetary Values
**From**: rotki, hummingbot, cryptofeed, ccxt · **Applicable to**: crypto-trading

All four projects mandate Decimal type for price, amount, balance, quantity, and PnL fields. Float arithmetic causes rounding errors that compound across financial calculations, leading to incorrect order sizing and reporting. Always use Decimal for any value representing money in crypto trading systems.

## `CW-CRYPTO-TRADING-002` — Initialize Data Structures Before Access
**From**: ccxt, cryptofeed, rotki · **Applicable to**: crypto-trading

Projects consistently require explicit initialization before data access: load_markets() before symbol lookups, check symbol population before mapping access, establish RPC connections before queries. Skipping initialization causes KeyError, AttributeError, or silent data corruption that breaks downstream operations.

## `CW-CRYPTO-TRADING-003` — Precise String Arithmetic for Financial Calculations
**From**: ccxt · **Applicable to**: crypto-trading

CCXT mandates Precise.string_* static methods (string_mul, string_div, string_add, string_sub) for monetary calculations to avoid floating-point precision errors. This is especially critical for high-precision exchange data where rounding errors cause incorrect order costs, fees, and balances that may result in financial loss.

## `CW-CRYPTO-TRADING-004` — Respect Exchange Rate Limits
**From**: ccxt · **Applicable to**: crypto-trading

Disabling rate limiting via enableRateLimit=False causes HTTP 429 responses and potential temporary or permanent API key suspension by exchanges. CCXT enforces rate limits per IP/API key pair, and bypassing throttle() gates results in compliance violations that disrupt all trading activity until exchanges lift bans.

## `CW-CRYPTO-TRADING-005` — Inverse Contract Price Adjustment
**From**: ccxt, hummingbot · **Applicable to**: crypto-trading

Perpetual swap cost calculations require applying inverse price adjustment (1/price) before multiplying by contractSize for inverse contracts. Incorrect cost calculation causes wrong position sizing, leading to unexpected liquidation or insufficient margin for perpetual trading positions.
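
The adjustment can be sketched with `Decimal`. The function below is a hypothetical helper that assumes cost is `amount * contractSize * price` for linear contracts and `amount * contractSize / price` for inverse contracts; it is not the ccxt or hummingbot implementation.

```python
from decimal import Decimal


def position_cost(amount: Decimal, contract_size: Decimal,
                  price: Decimal, inverse: bool) -> Decimal:
    """Cost of a perpetual position; inverse contracts divide by price."""
    notional = amount * contract_size
    if inverse:
        return notional / price   # equivalent to applying 1/price
    return notional * price


# 10 inverse contracts of 100 USD each at a price of 50000 USD per coin.
inverse_cost = position_cost(Decimal("10"), Decimal("100"), Decimal("50000"), inverse=True)
linear_cost = position_cost(Decimal("10"), Decimal("0.001"), Decimal("50000"), inverse=False)
print(inverse_cost, linear_cost)
```

In this example the inverse position costs 0.02 in the base coin, while the linear position costs 500 in the quote currency; mixing up the two formulas gives sizes that are off by a factor of price squared.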

## `CW-CRYPTO-TRADING-006` — Strict Connection Lifecycle Ordering
**From**: cryptofeed, ccxt · **Applicable to**: crypto-trading

Both projects enforce strict execution order for connection operations: cryptofeed requires authenticate -> subscribe -> message handler sequence, while ccxt mandates connect -> on_connected_callback -> subscriptions -> on_close_callback. Out-of-order operations cause subscription failures and no data flow through connections.

## `CW-CRYPTO-TRADING-007` — Validate Input Data Structure Before Processing
**From**: rotki, cryptofeed · **Applicable to**: crypto-trading

Rotki validates EVM address checksum format before RPC calls; cryptofeed checks Symbols.populated() before symbol mapping access. Validating data structure before processing prevents downstream crashes (KeyError, InvalidAddress) and data corruption that is harder to debug when symptoms appear in unrelated code paths.

## `CW-CRYPTO-TRADING-008` — Validate Order Sizes Against Exchange Minimums
**From**: hummingbot · **Applicable to**: crypto-trading

DCAExecutor amounts must be validated against min_notional_size and amounts_quote/prices against min_order_size before execution. Orders below exchange minimums are rejected, breaking strategy execution and potentially leaving positions partially unfilled at unfavorable prices.

FILE:references/components/backend_storage.md
# backend_storage (5 classes)

## `Backend.write`
`backend_storage/backend-write.py:0`

## `Backend.start`
`backend_storage/backend-start.py:0`

## `BackendQueue.start`
`backend_storage/backendqueue-start.py:0`

## `storage_backend`
`backend_storage/storage-backend.py:0`

## `ipc_mechanism`
`backend_storage/ipc-mechanism.py:0`

FILE:references/components/callback_dispatch.md
# callback_dispatch (3 classes)

## `Callback.__call__`
`callback_dispatch/callback-call.py:0`

## `AggregateCallback.__init__`
`callback_dispatch/aggregatecallback-init.py:0`

## `callback_type`
`callback_dispatch/callback-type.py:0`

FILE:references/components/connection_management.md
# connection_management (4 classes)

## `WebsocketEndpoint.connect`
`connection_management/websocketendpoint-connect.py:0`

## `WebsocketEndpoint.read`
`connection_management/websocketendpoint-read.py:0`

## `WebsocketEndpoint.write`
`connection_management/websocketendpoint-write.py:0`

## `websocket_library`
`connection_management/websocket-library.py:0`

FILE:references/components/data_normalization.md
# data_normalization (4 classes)

## `Trade.__init__`
`data_normalization/trade-init.py:0`

## `Book.__init__`
`data_normalization/book-init.py:0`

## `Callback.__call__`
`data_normalization/callback-call.py:0`

## `type_validation`
`data_normalization/type-validation.py:0`

FILE:references/components/exchange_interface_layer.md
# exchange_interface_layer (5 classes)

## `Binance.subscribe`
`exchange_interface_layer/binance-subscribe.py:0`

## `Coinbase._connect`
`exchange_interface_layer/coinbase-connect.py:0`

## `Exchange.standardize_symbol`
`exchange_interface_layer/exchange-standardize-symbol.py:0`

## `symbol_mapping_backend`
`exchange_interface_layer/symbol-mapping-backend.py:0`

## `rest_rate_limit_handling`
`exchange_interface_layer/rest-rate-limit-handling.py:0`

FILE:references/components/feed_handler_orchestration.md
# feed_handler_orchestration (4 classes)

## `FeedHandler.add_feed`
`feed_handler_orchestration/feedhandler-add-feed.py:0`

## `FeedHandler.add_nbbo`
`feed_handler_orchestration/feedhandler-add-nbbo.py:0`

## `FeedHandler.run`
`feed_handler_orchestration/feedhandler-run.py:0`

## `async_event_loop`
`feed_handler_orchestration/async-event-loop.py:0`

FILE:references/components/nbbo_aggregation.md
# nbbo_aggregation (2 classes)

## `NBBO._update`
`nbbo_aggregation/nbbo-update.py:0`

## `nbbo_source`
`nbbo_aggregation/nbbo-source.py:0`

FILE:references/components/order_book_processing.md
# order_book_processing (3 classes)

## `Book.callback`
`order_book_processing/book-callback.py:0`

## `Book.update`
`order_book_processing/book-update.py:0`

## `book_depth`
`order_book_processing/book-depth.py:0`

ClawHub Coding Data Analysis+2

@clawhub-tangweigang-jpg-8679fec286

Credit Transition Matrix

Skill

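Processes credit rating transition matrices, with support for Not-Rated state redistribution, annual-to-monthly matrix conversion, state space definition, and dataset characterization.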

---
name: credit-transition-matrix
description: |-
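  Processes credit rating transition matrices, with support for Not-Rated state redistribution, annual-to-monthly matrix conversion, state space definition, and dataset characterization.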
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-119"
  compiled_at: "2026-04-22T13:00:58.228711+00:00"
  capability_markets: "global"
  capability_activities: "credit-risk"
  sop_version: "crystal-compilation-v6.1"
---
# Credit Transition Matrix (credit-transition-matrix)

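> Processes credit rating transition matrices, with support for Not-Rated state redistribution, annual-to-monthly matrix conversion, state space definition, and dataset characterization.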

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (22 total)

### Adjust Not-Rated State in Credit Migration Matrices (`UC-101`)
Credit rating transition matrices often contain 'not-rated' (NR) observations that need to be redistributed to rated states for downstream risk calcul
**Triggers**: not-rated, NR adjustment, credit migration

### Adjust Not-Rated State via Python Script (`UC-104`)
Corporate credit rating migration data contains NR (not-rated) states that must be removed using noninformative redistribution method before calculati
**Triggers**: not-rated, NR removal, credit rating

### Clean and Prepare Transition Data (`UC-108`)
Raw credit rating data requires preprocessing including column renaming, state validation, and absorbing state verification before it can be used for
**Triggers**: data cleaning, preprocessing, validation

For all **22** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-CREDIT-RISK-001`**: Empty DataFrame passed to bucketing pipeline
- **`AP-CREDIT-RISK-002`**: Multi-dimensional target array causing WoE shape mismatch
- **`AP-CREDIT-RISK-003`**: OptimalBucketer receiving high-cardinality numerical features

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-119. Evidence verify ratio = 35.8% and audit fail total = 15. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
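| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |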

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-119` blueprint at 2026-04-22T13:00:58.228711+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Convert Annual to Monthly Transition Matrices via Generator
  - Transition Matrix Operations Demonstration
  - Adjust Not-Rated State in Credit Migration Matrices
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-050--skorecard (5)

### `AP-CREDIT-RISK-001` — Empty DataFrame passed to bucketing pipeline <sub>(high)</sub>

When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
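The missing defensive check can be sketched as below, assuming pandas input. `require_non_empty` is a hypothetical helper introduced for illustration; it is not part of skorecard.

```python
import pandas as pd


def require_non_empty(X: pd.DataFrame) -> pd.DataFrame:
    """Fail fast, with a clear message, before data reaches a bucketer.

    Hypothetical helper for illustration only.
    """
    if not isinstance(X, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(X).__name__}")
    n_rows, n_cols = X.shape
    if n_rows == 0 or n_cols == 0:
        raise ValueError(f"empty input: {n_rows} rows x {n_cols} columns")
    return X
```

Calling the guard at the pipeline entry point turns a late validation failure into an early, descriptive one.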

### `AP-CREDIT-RISK-002` — Multi-dimensional target array causing WoE shape mismatch <sub>(high)</sub>

When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation, downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with incorrect credit risk scores that misrepresent default probability estimates.
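As a rough illustration of the normalization step, the sketch below flattens a single-column target and rejects anything else that is not 1D. `to_1d_target` is a hypothetical stand-in and may differ from what skorecard's `_check_y` actually does.

```python
import numpy as np


def to_1d_target(y) -> np.ndarray:
    """Return y as a 1D array, accepting only (n,) and (n, 1) shapes.

    Hypothetical helper; not skorecard's own validation.
    """
    arr = np.asarray(y)
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr.ravel()  # a single-column target is safe to flatten
    if arr.ndim != 1:
        raise ValueError(f"target must be 1D, got shape {arr.shape}")
    return arr
```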

### `AP-CREDIT-RISK-003` — OptimalBucketer receiving high-cardinality numerical features <sub>(high)</sub>

When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
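A cardinality check along these lines could run before choosing a bucketer. This is a sketch under assumptions: `needs_prebucketing` is a hypothetical helper, and ignoring missing values when counting is a choice made here, not something the source specifies.

```python
import numpy as np

MAX_UNIQUE = 100  # limit quoted in the text above


def needs_prebucketing(values, max_unique: int = MAX_UNIQUE) -> bool:
    """True when a numerical column has more distinct values than the limit.

    Hypothetical helper; NaN handling is an assumption of this sketch.
    """
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # do not count missing values as a distinct value
    return np.unique(arr).size > max_unique
```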

### `AP-CREDIT-RISK-004` — Special values distorting optimal bin boundaries <sub>(high)</sub>

When implementing fit() for bucketers without filtering special values from X before computing bin boundaries using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence calculations and unreliable credit risk scores that misrepresent borrower default probabilities.

### `AP-CREDIT-RISK-005` — Two-phase bucketing ordering violation causing special value loss <sub>(high)</sub>

When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline, special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials() after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.

## finance-bp-072--lending (3)

### `AP-CREDIT-RISK-006` — Loan amount exceeding product and collateral limits <sub>(high)</sub>

When validating the loan amount for loan applications without enforcing that loan_amount does not exceed the maximum_loan_amount from the loan product or proposed securities, disbursing amounts that exceed product or collateral limits exposes the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized lending.

### `AP-CREDIT-RISK-007` — Disbursement validation failures creating unauthorized exposure <sub>(high)</sub>

When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory compliance violations.

### `AP-CREDIT-RISK-008` — Interest accrual on written-off loans inflating income <sub>(high)</sub>

When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.

## finance-bp-112--openLGD (2)

### `AP-CREDIT-RISK-009` — Loop index errors in federated parameter averaging <sub>(high)</sub>

When implementing federated parameter averaging logic, using the final index n instead of the loop variable k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates across all nodes.
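The two indexing mistakes are easiest to see next to a correct loop. The sketch below is plain Python and is not openLGD code; `weighted_average` and its arguments are hypothetical names.

```python
from typing import Sequence


def weighted_average(params: Sequence[float], weights: Sequence[float]) -> float:
    """Average one parameter across servers, weighting server k by weights[k].

    Hypothetical helper for illustration only.
    """
    if not params or len(params) != len(weights):
        raise ValueError("params and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    acc = 0.0
    for k in range(len(params)):       # start at 0 so the first server is included
        acc += weights[k] * params[k]  # index with the loop variable k, not the final index
    return acc / total
```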

### `AP-CREDIT-RISK-010` — API response format inconsistency breaking federated coordination <sub>(high)</sub>

When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and 'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
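One way to keep key names from drifting is to build and parse every payload through a single pair of functions. The sketch below assumes only the two key names quoted above; `make_response` and `parse_response` are hypothetical helpers, not the project's API.

```python
REQUIRED_KEYS = ("intercept", "coefficient")  # key names taken from the text above


def make_response(intercept: float, coefficient: float) -> dict:
    """Build the payload that both endpoints would return (hypothetical helper)."""
    return {"intercept": float(intercept), "coefficient": float(coefficient)}


def parse_response(payload: dict) -> tuple:
    """Validate a payload on the coordinator side before reading it."""
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return payload["intercept"], payload["coefficient"]
```

Raising a descriptive `ValueError` at the boundary is a design choice of this sketch; it replaces the bare `KeyError` that would otherwise surface deep inside the coordinator.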

## finance-bp-119--transitionMatrix (4)

### `AP-CREDIT-RISK-011` — Invalid transition probabilities corrupting Markov matrices <sub>(high)</sub>

When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0, 1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable credit risk estimates.
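The two conditions (entries in [0, 1], rows summing to 1) can be checked with a few lines of NumPy. This is an illustrative sketch; `is_stochastic` and its tolerance are assumptions and not the `TransitionMatrix.validate` method of the library.

```python
import numpy as np


def is_stochastic(P, tol: float = 1e-8) -> bool:
    """Check that P is square, has entries in [0, 1], and has rows summing to 1.

    Hypothetical helper; the tolerance is an arbitrary choice for this sketch.
    """
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False
    in_range = np.all((P >= -tol) & (P <= 1.0 + tol))
    rows_ok = np.allclose(P.sum(axis=1), 1.0, atol=tol)
    return bool(in_range and rows_ok)
```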

### `AP-CREDIT-RISK-012` — Unsorted event data causing incorrect transition matrix estimates <sub>(high)</sub>

When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate the Markov property.
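The required ordering can be sketched with pandas as follows. The column names `ID` and `Time` are assumptions made for this example, and `sort_events` is a hypothetical helper.

```python
import pandas as pd


def sort_events(events: pd.DataFrame, id_col: str = "ID", time_col: str = "Time") -> pd.DataFrame:
    """Order events by entity first, then by ascending time within each entity.

    Column names are assumptions of this sketch.
    """
    ordered = events.sort_values([id_col, time_col])
    return ordered.reset_index(drop=True)
```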

### `AP-CREDIT-RISK-013` — Zero-count division causing NaN in transition matrices <sub>(high)</sub>

When normalizing counts to produce transition probabilities without checking source state population count is greater than zero before division, division by zero occurs and causes NaN values in the transition matrix. These NaN values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
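A guarded normalization might look like the sketch below. Leaving unobserved source states as all-zero rows is an assumption made here; such rows still fail the row-sum check, so the caller has to decide how to treat them. `normalize_counts` is a hypothetical helper.

```python
import numpy as np


def normalize_counts(counts) -> np.ndarray:
    """Turn a matrix of transition counts into row-wise probabilities.

    Rows with a zero total are left as zeros instead of producing NaN
    (an assumption of this sketch, not the library's behavior).
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    probs = np.zeros_like(counts)
    # Divide only where the source state was actually observed.
    np.divide(counts, row_totals, out=probs, where=row_totals > 0)
    return probs
```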

### `AP-CREDIT-RISK-014` — Wrong matrix logarithm method producing invalid generator matrices <sub>(medium)</sub>

When implementing generator() method without using scipy.linalg.logm for matrix logarithm computation, using numpy.log or other approximation methods produces invalid generator matrices with row sums not equal to zero. This violates the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
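The difference between the matrix logarithm and an element-wise logarithm can be illustrated with `scipy.linalg.logm`. The sketch below is not the library's `generator()` method; `generator_from` is a hypothetical wrapper and the 2x2 matrix is made up for the example.

```python
import numpy as np
from scipy.linalg import logm


def generator_from(P) -> np.ndarray:
    """Matrix logarithm of a transition matrix (hypothetical wrapper).

    np.log would instead apply the logarithm element by element, which
    yields -inf for zero entries and rows that do not sum to zero.
    """
    G = logm(np.asarray(P, dtype=float))
    return np.real_if_close(G)  # drop negligible imaginary parts, if any


# Made-up example: state 2 is absorbing.
P = np.array([[0.9, 0.1],
              [0.0, 1.0]])
G = generator_from(P)
row_sums = G.sum(axis=1)  # should be numerically zero for a valid generator
```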

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-119--transitionMatrix
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 32, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [state_space_definition](components/state_space_definition.md): 2 classes
- [data_preprocessing](components/data_preprocessing.md): 5 classes
- [synthetic_data_generation](components/synthetic_data_generation.md): 4 classes
- [matrix_estimation](components/matrix_estimation.md): 5 classes
- [matrix_representation](components/matrix_representation.md): 6 classes
- [matrix_operations](components/matrix_operations.md): 7 classes
- [credit_curve_analysis](components/credit_curve_analysis.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 131
  fatal_constraints_count: 46
  non_fatal_constraints_count: 147
  use_cases_count: 22
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
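Next-bar execution can be sketched as a one-bar delay between the signal and the position that acts on it. This is a generic pandas illustration, not the framework's trader; `next_bar_positions` is a hypothetical helper and the signal is assumed to be 0/1 per bar.

```python
import pandas as pd


def next_bar_positions(signal: pd.Series) -> pd.Series:
    """Delay each signal by one bar: a decision made on bar t is acted on at bar t+1.

    Hypothetical helper; the first bar has no prior signal, so it holds no position.
    """
    return signal.shift(1).fillna(0)
```

Without the shift, a signal computed from a bar's close would be traded at that same bar's price, which is the look-ahead the lock forbids.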

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
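A format check for this lock could be sketched with a regular expression. The character classes below are assumptions inferred from the examples `stock_sh_600000` and `stockus_nasdaq_AAPL` that appear elsewhere in this bundle; `parse_entity_id` is a hypothetical helper.

```python
import re

# Assumed shape: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID = re.compile(
    r"^(?P<entity_type>[a-z]+)_(?P<exchange>[a-z]+)_(?P<code>[A-Za-z0-9.]+)$"
)


def parse_entity_id(entity_id: str) -> dict:
    """Split an entity ID into its three parts, or raise if it does not match."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not in entity_type_exchange_code format: {entity_id!r}")
    return match.groupdict()
```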

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
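The exactly-one rule can be enforced at construction time. The dataclass below is a hypothetical stand-in that carries only the three sizing fields named above; it is not the framework's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SignalSizing:
    """Hypothetical stand-in holding only the three sizing fields."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        given = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(given) != 1:
            raise ValueError(f"exactly one of {SIZING_FIELDS} is required, got {given}")
```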

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **22**

## `KUC-101`
**Source**: `examples/notebooks/Adjust_NotRated_State.ipynb`

Credit rating transition matrices often contain 'not-rated' (NR) observations that need to be redistributed to rated states for downstream risk calculations and regulatory reporting.

## `KUC-102`
**Source**: `examples/notebooks/Matrix_Operations.ipynb`

Users need to understand how to initialize, validate, and work with transition matrices for credit risk modeling.

## `KUC-103`
**Source**: `examples/notebooks/Monthly_from_Annual.ipynb`

Credit risk models require transition matrices at different time horizons (monthly, quarterly, annual) but only annual matrices may be available; matrix exponentiation of generators enables temporal scaling.
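The temporal scaling described here can be sketched with SciPy: take the matrix logarithm of the annual matrix to get a generator, divide it by 12, and exponentiate. `monthly_from_annual` is a hypothetical helper, not the notebook's code, and it assumes the annual matrix has a valid real generator; in practice the logarithm can contain negative off-diagonal entries that need correcting first.

```python
import numpy as np
from scipy.linalg import expm, logm


def monthly_from_annual(P_annual) -> np.ndarray:
    """Scale an annual transition matrix to a monthly one via its generator.

    Hypothetical helper; assumes logm(P_annual) is a valid real generator.
    """
    G = np.real_if_close(logm(np.asarray(P_annual, dtype=float)))  # annual generator
    return expm(G / 12.0)  # one twelfth of a year


# Made-up annual matrix for illustration.
P_annual = np.array([[0.9, 0.1],
                     [0.0, 1.0]])
P_monthly = monthly_from_annual(P_annual)
```

Raising `P_monthly` to the 12th power should recover `P_annual` up to floating-point error, which makes a convenient sanity check.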

## `KUC-104`
**Source**: `examples/python/adjust_nr_state.py`

Corporate credit rating migration data contains NR (not-rated) states that must be removed using noninformative redistribution method before calculating regulatory capital requirements.

## `KUC-105`
**Source**: `examples/python/characterize_datasets.py`

Data scientists need to understand the characteristics of credit rating transition datasets before applying estimation methods or building models.

## `KUC-106`
**Source**: `examples/python/compare_estimators.py`

Different transition matrix estimation methods produce different results; researchers need to compare cohort-based vs duration-based (Aalen-Johansen) estimators to choose appropriate methods for their data.

## `KUC-107`
**Source**: `examples/python/credit_curves.py`

Credit risk management requires visualization of how default probabilities and credit quality evolve over time through multi-period transition matrices.

## `KUC-108`
**Source**: `examples/python/data_cleaning_example.py`

Raw credit rating data requires preprocessing including column renaming, state validation, and absorbing state verification before it can be used for matrix estimation.

## `KUC-109`
**Source**: `examples/python/deterministic_paths.py`

Testing and validation of transition matrix estimators requires reproducible deterministic transition paths with known outcomes.

## `KUC-110`
**Source**: `examples/python/empirical_transition_matrix.py`

Credit risk modeling requires empirical transition matrix estimation from continuous-time duration data where observation times vary across entities.

## `KUC-111`
**Source**: `examples/python/estimate_matrix.py`

Complete workflow for estimating credit rating transition matrices from historical data using multiple estimation approaches with generator extraction.

## `KUC-112`
**Source**: `examples/python/fix_multiperiod_matrix.py`

Historical credit migration matrices may have structural issues (non-square, missing states, negative probabilities) that must be corrected before use in risk models.

## `KUC-113`
**Source**: `examples/python/generate_full_multiperiod_set.py`

Risk models require complete transition matrices across all time horizons; sparse historical observations must be expanded using matrix exponentiation.

## `KUC-114`
**Source**: `examples/python/generate_synthetic_data.py`

Development and testing of transition matrix estimators requires synthetic data with known properties for validation and benchmarking.

## `KUC-115`
**Source**: `examples/python/generate_visuals.py`

Stakeholders require visual representations of credit migration patterns including Sankey diagrams, heatmaps, and step plots for reporting and presentations.

## `KUC-116`
**Source**: `examples/python/matrix_from_cohort_data.py`

Credit rating agencies publish migration data in cohort format; estimation from this data format requires cohort-based transition matrix estimation.

## `KUC-117`
**Source**: `examples/python/matrix_from_duration_data.py`

Individual credit observations with varying timestamps require duration-based transition matrix estimation using time-to-event methodology.

## `KUC-118`
**Source**: `examples/python/matrix_lendingclub.py`

Peer-to-peer lending platforms like LendingClub have unique grade states; these require specialized transition matrix estimation from loan performance data.

## `KUC-119`
**Source**: `examples/python/matrix_operations.py`

Transition matrices require various mathematical operations including power, validation, printing, and generator extraction for risk calculations.

## `KUC-120`
**Source**: `examples/python/matrix_set_lendingclub.py`

P2P lending risk models require transition matrix sets across multiple periods to capture evolving loan portfolio behavior over time.

## `KUC-121`
**Source**: `examples/python/matrix_set_operations.py`

Multi-period risk models require operations on collections of transition matrices including copying, power-based cumulation, and validation.

## `KUC-122`
**Source**: `examples/python/state_space_operations.py`

Different credit rating agencies use different rating scales; users need to convert between S&P, Moody's, DBRS and other rating systems for portfolio analysis.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-CREDIT-RISK-001` — Strict input DataFrame schema validation
**From**: finance-bp-050--skorecard, finance-bp-112--openLGD · **Applicable to**: credit-risk

Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation stages where downstream modules access columns by name without defensive checking. Always validate column existence before pipeline execution.

## `CW-CREDIT-RISK-002` — Explicit random_state for ML model reproducibility
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.

## `CW-CREDIT-RISK-003` — Mandatory data sorting before multi-stage estimation
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce processing order in multi-stage pipelines.

## `CW-CREDIT-RISK-004` — Consistent API response key naming across all endpoints
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define a schema contract upfront and enforce key naming consistency across all response types.

## `CW-CREDIT-RISK-005` — Cardinality bounds checking before array operations
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations. Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array initialization.

## `CW-CREDIT-RISK-006` — Financial validation gates before transaction execution
**From**: finance-bp-072--lending · **Applicable to**: credit-risk

Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.

## `CW-CREDIT-RISK-007` — Mathematical constraint validation for probability outputs
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Credit risk models must validate mathematical constraints on outputs: skorecard's WoE requires valid bin assignments, transitionMatrix's transition matrices require row sums equal to 1.0, and generator matrices require row sums equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning results.

## `CW-CREDIT-RISK-008` — Port-to-ID mapping consistency in distributed model serving
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port 5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
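A single shared function is one way to keep the mapping consistent. The base port of 5000 below is an assumption inferred from the "port 5001 maps to server ID 1" example; `server_id_from_port` is a hypothetical helper.

```python
BASE_PORT = 5000  # assumption inferred from "port 5001 maps to server ID 1"


def server_id_from_port(port: int, base_port: int = BASE_PORT) -> int:
    """Derive a server ID from its port (hypothetical helper).

    Every component should call this one function instead of recomputing the ID.
    """
    server_id = port - base_port
    if server_id < 1:
        raise ValueError(f"port {port} does not map to a valid server ID")
    return server_id
```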

FILE:references/components/credit_curve_analysis.md
# credit_curve_analysis (3 classes)

## `CreditCurve.default_curves`
`credit_curve_analysis/creditcurve-default-curves.py:0`

## `CreditCurve.validate`
`credit_curve_analysis/creditcurve-validate.py:0`

## `absorbing_state_detection`
`credit_curve_analysis/absorbing-state-detection.py:0`

FILE:references/components/data_preprocessing.md
# data_preprocessing (5 classes)

## `to_canonical`
`data_preprocessing/to-canonical.py:0`

## `to_compact`
`data_preprocessing/to-compact.py:0`

## `bin_timestamps`
`data_preprocessing/bin-timestamps.py:0`

## `generate_cohort_bounds`
`data_preprocessing/generate-cohort-bounds.py:0`

## `cohort_assignment`
`data_preprocessing/cohort-assignment.py:0`

FILE:references/components/matrix_estimation.md
# matrix_estimation (5 classes)

## `BaseEstimator.fit`
`matrix_estimation/baseestimator-fit.py:0`

## `CohortEstimator.fit`
`matrix_estimation/cohortestimator-fit.py:0`

## `AalenJohansenEstimator.fit`
`matrix_estimation/aalenjohansenestimator-fit.py:0`

## `confidence_interval_method`
`matrix_estimation/confidence-interval-method.py:0`

## `estimator_type`
`matrix_estimation/estimator-type.py:0`

FILE:references/components/matrix_operations.md
# matrix_operations (7 classes)

## `TransitionMatrix.generator`
`matrix_operations/transitionmatrix-generator.py:0`

## `TransitionMatrix.power`
`matrix_operations/transitionmatrix-power.py:0`

## `TransitionMatrixSet.cumulate`
`matrix_operations/transitionmatrixset-cumulate.py:0`

## `TransitionMatrixSet.incremental`
`matrix_operations/transitionmatrixset-incremental.py:0`

## `TransitionMatrix.remove`
`matrix_operations/transitionmatrix-remove.py:0`

## `TransitionMatrixSet.default_curves`
`matrix_operations/transitionmatrixset-default-curves.py:0`

## `generator_fix_negative`
`matrix_operations/generator-fix-negative.py:0`

FILE:references/components/matrix_representation.md
# matrix_representation (6 classes)

## `TransitionMatrix.validate`
`matrix_representation/transitionmatrix-validate.py:0`

## `TransitionMatrix.fix_rowsums`
`matrix_representation/transitionmatrix-fix-rowsums.py:0`

## `TransitionMatrix.characterize`
`matrix_representation/transitionmatrix-characterize.py:0`

## `TransitionMatrixSet.to_json`
`matrix_representation/transitionmatrixset-to-json.py:0`

## `validation_accuracy`
`matrix_representation/validation-accuracy.py:0`

## `matrix_set_method`
`matrix_representation/matrix-set-method.py:0`

FILE:references/components/state_space_definition.md
# state_space_definition (2 classes)

## `StateSpace.get_states`
`state_space_definition/statespace-get-states.py:0`

## `state_inference_strategy`
`state_space_definition/state-inference-strategy.py:0`

FILE:references/components/synthetic_data_generation.md
# synthetic_data_generation (4 classes)

## `exponential_transitions`
`synthetic_data_generation/exponential-transitions.py:0`

## `markov_chain`
`synthetic_data_generation/markov-chain.py:0`

## `long_format`
`synthetic_data_generation/long-format.py:0`

## `time_distribution`
`synthetic_data_generation/time-distribution.py:0`


Credit Scorecard

Skill

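Automatically generates optimal binning boundaries for scorecard variables using algorithms such as supervised learning, decision trees, or clustering, with support for monotonicity constraints and missing-value handling.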

---
name: credit-scorecard
description: |-
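  Automatically generates optimal binning boundaries for scorecard variables using algorithms such as supervised learning, decision trees, or clustering, with support for monotonicity constraints and missing-value handling.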
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-050"
  compiled_at: "2026-04-22T13:00:17.518473+00:00"
  capability_markets: "global"
  capability_activities: "credit-risk"
  sop_version: "crystal-compilation-v6.1"
---
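# Credit Scorecard (credit-scorecard)

> Automatically generates optimal binning boundaries for scorecard variables using algorithms such as supervised learning, decision trees, or clustering, with support for monotonicity constraints and missing-value handling.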

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (43 total)

### Optimal Supervised Bucketing (`UC-1`)
Automatically find optimal bucket boundaries that maximize predictive power while respecting monotonicity constraints
**Triggers**: optimal, supervised, monotonic

### Decision Tree Supervised Bucketing (`UC-2`)
Use supervised learning to find bucket boundaries based on target variable correlation
**Triggers**: decision tree, supervised, pre-bin

### Equal Width Unsupervised Bucketing (`UC-3`)
Divide numerical features into N equally spaced intervals regardless of data distribution
**Triggers**: equal width, unsupervised, histogram
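
Equal-width bucketing itself is simple enough to sketch with NumPy. This is an illustration of the technique, not skorecard's bucketer; `equal_width_buckets` is a hypothetical helper.

```python
import numpy as np


def equal_width_buckets(values, n_bins: int):
    """Split the value range into n_bins equally spaced intervals.

    Returns the bin edges and a 0-based bucket label for each value.
    Hypothetical helper for illustration only.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    x = np.asarray(values, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Digitize against interior edges only, so the maximum lands in the last bucket.
    labels = np.digitize(x, edges[1:-1])
    return edges, labels
```

Because the edges depend only on the minimum and maximum, skewed data can leave some buckets nearly empty, which is the trade-off of ignoring the distribution.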

For all **43** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-CREDIT-RISK-001`**: Empty DataFrame passed to bucketing pipeline
- **`AP-CREDIT-RISK-002`**: Multi-dimensional target array causing WoE shape mismatch
- **`AP-CREDIT-RISK-003`**: OptimalBucketer receiving high-cardinality numerical features

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-050. Evidence verify ratio = 78.6% and audit fail total = 24. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain and fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-050` blueprint at 2026-04-22T13:00:17.518473+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Equal Width Unsupervised Bucketing
  - Decision Tree Supervised Bucketing
  - Optimal Supervised Bucketing
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-050--skorecard (5)

### `AP-CREDIT-RISK-001` — Empty DataFrame passed to bucketing pipeline <sub>(high)</sub>

When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline from executing. The root cause is missing defensive validation before data enters the bucketing workflow.

### `AP-CREDIT-RISK-002` — Multi-dimensional target array causing WoE shape mismatch <sub>(high)</sub>

When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation, downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with incorrect credit risk scores that misrepresent default probability estimates.

### `AP-CREDIT-RISK-003` — OptimalBucketer receiving high-cardinality numerical features <sub>(high)</sub>

When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.

### `AP-CREDIT-RISK-004` — Special values distorting optimal bin boundaries <sub>(high)</sub>

When implementing fit() for bucketers without filtering special values from X before computing bin boundaries using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence calculations and unreliable credit risk scores that misrepresent borrower default probabilities.

### `AP-CREDIT-RISK-005` — Two-phase bucketing ordering violation causing special value loss <sub>(high)</sub>

When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline, special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials() after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.

## finance-bp-072--lending (3)

### `AP-CREDIT-RISK-006` — Loan amount exceeding product and collateral limits <sub>(high)</sub>

When validating loan applications without enforcing that loan_amount does not exceed the maximum_loan_amount from the loan product or proposed securities, disbursed amounts can exceed product or collateral limits and expose the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized lending.

### `AP-CREDIT-RISK-007` — Disbursement validation failures creating unauthorized exposure <sub>(high)</sub>

When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory compliance violations.

### `AP-CREDIT-RISK-008` — Interest accrual on written-off loans inflating income <sub>(high)</sub>

When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.

## finance-bp-112--openLGD (2)

### `AP-CREDIT-RISK-009` — Loop index errors in federated parameter averaging <sub>(high)</sub>

When implementing federated parameter averaging logic, using the final index n instead of the loop variable k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates across all nodes.
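The two index mistakes described above can be avoided as in this sketch. It assumes a scalar intercept and a scalar coefficient per server, weighted by a per-server weight such as a sample count; `average_parameters` and that weighting scheme are assumptions for illustration, not openLGD's actual code.

```python
def average_parameters(intercepts, coefficients, weights):
    """Weighted average over all servers; index k runs from 0, not 1."""
    n = len(weights)
    total = sum(weights)
    avg_intercept = 0.0
    avg_coefficient = 0.0
    for k in range(n):              # starts at 0 so the first server counts
        share = weights[k] / total  # weights[k], never weights[n - 1]
        avg_intercept += share * intercepts[k]
        avg_coefficient += share * coefficients[k]
    return avg_intercept, avg_coefficient


b0, b1 = average_parameters([1.0, 2.0, 3.0], [0.5, 0.5, 2.0], [1, 1, 2])
```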

### `AP-CREDIT-RISK-010` — API response format inconsistency breaking federated coordination <sub>(high)</sub>

When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and 'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
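One way to enforce the shared key names is a single parser used for every reply. The sketch below is framework-agnostic and works on an already-decoded JSON dictionary; `parse_model_response` is a hypothetical name, and treating both values as scalars is an assumption.

```python
REQUIRED_KEYS = ("intercept", "coefficient")  # key names taken from the text above


def parse_model_response(payload: dict) -> tuple:
    """Hypothetical coordinator-side check run on every /start and /update reply."""
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise KeyError(f"response is missing keys: {missing}")
    return float(payload["intercept"]), float(payload["coefficient"])


ok = parse_model_response({"intercept": 0.1, "coefficient": 0.8})
```

A reply that uses `coef` instead of `coefficient` is rejected with a message naming the missing key, rather than failing later in the coordination loop.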

## finance-bp-119--transitionMatrix (4)

### `AP-CREDIT-RISK-011` — Invalid transition probabilities corrupting Markov matrices <sub>(high)</sub>

When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0, 1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable credit risk estimates.
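The two conditions named above (entries in [0, 1], rows summing to 1.0) can be checked directly. This is a minimal sketch assuming NumPy; `is_stochastic` and its tolerance are illustrative and not taken from transitionMatrix.

```python
import numpy as np


def is_stochastic(P, tol: float = 1e-8) -> bool:
    """True if P is square, entries lie in [0, 1], and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False
    in_range = np.all((P >= -tol) & (P <= 1.0 + tol))
    rows_ok = np.allclose(P.sum(axis=1), 1.0, rtol=0.0, atol=tol)
    return bool(in_range and rows_ok)


good = [[0.9, 0.1], [0.2, 0.8]]
bad = [[0.9, 0.2], [0.2, 0.8]]  # first row sums to 1.1
```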

### `AP-CREDIT-RISK-012` — Unsorted event data causing incorrect transition matrix estimates <sub>(high)</sub>

When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate the Markov property.
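The required ordering (entity ID first, then ascending time) maps onto a tuple sort key. The record layout below, with `id`, `time`, and `state` fields, is a hypothetical example and not the input format of any particular estimator.

```python
events = [
    {"id": 2, "time": 1.0, "state": "B"},
    {"id": 1, "time": 3.0, "state": "C"},
    {"id": 1, "time": 0.5, "state": "A"},
    {"id": 2, "time": 0.0, "state": "A"},
]

# Sort key is (entity ID, time): entities are grouped, each in ascending time.
ordered = sorted(events, key=lambda e: (e["id"], e["time"]))
order = [(e["id"], e["time"]) for e in ordered]
```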

### `AP-CREDIT-RISK-013` — Zero-count division causing NaN in transition matrices <sub>(high)</sub>

When normalizing counts to produce transition probabilities without checking source state population count is greater than zero before division, division by zero occurs and causes NaN values in the transition matrix. These NaN values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
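A guarded normalization might look like the sketch below, assuming NumPy. Leaving unobserved source states as all-zero rows is an assumption made here for illustration; such rows are not valid probability rows, so the caller still has to decide how to treat them.

```python
import numpy as np


def normalize_counts(counts) -> np.ndarray:
    """Row-normalize transition counts; rows with no observations stay all-zero."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    probs = np.zeros_like(counts)
    # Divide only where the source state was observed at least once.
    np.divide(counts, totals, out=probs, where=totals > 0)
    return probs


P = normalize_counts([[8, 2, 0], [0, 0, 0], [1, 1, 2]])
```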

### `AP-CREDIT-RISK-014` — Wrong matrix logarithm method producing invalid generator matrices <sub>(medium)</sub>

When implementing generator() method without using scipy.linalg.logm for matrix logarithm computation, using numpy.log or other approximation methods produces invalid generator matrices with row sums not equal to zero. This violates the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
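The difference between the matrix logarithm and an element-wise logarithm can be seen on a small example. The sketch assumes SciPy and NumPy and uses an illustrative 2x2 transition matrix; it is not the `generator()` implementation itself.

```python
import numpy as np
from scipy.linalg import expm, logm

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Matrix logarithm; real_if_close drops a negligible imaginary part if present.
G = np.real_if_close(logm(P))

# Element-wise log is a different operation and does not give a generator.
elementwise = np.log(P)

generator_row_sums = G.sum(axis=1)
elementwise_row_sums = elementwise.sum(axis=1)
```

For this matrix the rows of `G` sum to zero up to floating-point error and `expm(G)` recovers `P`, while the element-wise result has clearly nonzero row sums.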

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-050--skorecard
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 18, 'total_functions': 0, 'total_stages': 9}

## Modules (9)

- [data_preparation](components/data_preparation.md): 2 classes
- [feature_pre-bucketing_(optional)](components/feature_pre-bucketing_-optional.md): 2 classes
- [feature_bucketing_/_binning](components/feature_bucketing_-_binning.md): 2 classes
- [weight_of_evidence_(woe)_encoding](components/weight_of_evidence_-woe-_encoding.md): 2 classes
- [feature_selection](components/feature_selection.md): 2 classes
- [logistic_regression_model_training](components/logistic_regression_model_training.md): 2 classes
- [scorecard_rescaling](components/scorecard_rescaling.md): 2 classes
- [validation_and_reporting](components/validation_and_reporting.md): 2 classes
- [model_deployment](components/model_deployment.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 124
  fatal_constraints_count: 64
  non_fatal_constraints_count: 194
  use_cases_count: 43
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
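A format check for this lock could be sketched as below. `parse_entity_id` is a hypothetical helper, not a ZVT function, and allowing underscores inside the code part is an assumption.

```python
def parse_entity_id(entity_id: str) -> tuple:
    """Split 'entity_type_exchange_code' into its three parts."""
    parts = entity_id.split("_", 2)  # at most 3 parts, so the code may contain '_'
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return tuple(parts)


parsed = parse_entity_id("stock_sh_600000")
```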

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
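Lock `SL-05` (exactly one sizing field) lends itself to a constructor-time check. The dataclass below is a hypothetical container that only mirrors the three field names from the lock; it is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical container mirroring the three sizing fields named in SL-05."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        given = [v for v in (self.position_pct, self.order_money, self.order_amount)
                 if v is not None]
        if len(given) != 1:
            raise ValueError(f"exactly one sizing field required, got {len(given)}")


sizing = SignalSizing(position_pct=0.2)
```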

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **43**

## `KUC-1`
**Source**: `skorecard/bucketers/bucketers.py`

Automatically find optimal bucket boundaries that maximize predictive power while respecting monotonicity constraints

## `KUC-2`
**Source**: `skorecard/bucketers/bucketers.py`

Use supervised learning to find bucket boundaries based on target variable correlation

## `KUC-3`
**Source**: `skorecard/bucketers/bucketers.py`

Divide numerical features into N equally spaced intervals regardless of data distribution

## `KUC-4`
**Source**: `skorecard/bucketers/bucketers.py`

Divide numerical features into N buckets with equal number of observations (quantiles)

## `KUC-5`
**Source**: `skorecard/bucketers/bucketers.py`

Use agglomerative clustering to find natural groupings in numerical data

## `KUC-6`
**Source**: `skorecard/bucketers/bucketers.py`

Convert categorical variables into ordered ordinal numbers based on target rate or frequency

## `KUC-7`
**Source**: `skorecard/bucketers/bucketers.py`

Treat existing unique categories as pre-defined buckets without transformation

## `KUC-8`
**Source**: `skorecard/bucketers/bucketers.py`

Treat existing unique numerical values as bucket boundaries (for pre-bucketed data)

## `KUC-9`
**Source**: `skorecard/bucketers/bucketers.py`

Apply manually defined bucket boundaries from YAML or dictionary to new data

## `KUC-10`
**Source**: `skorecard/pipeline/bucketing_process.py`

First pre-bucket high-cardinality features, then apply final bucketing strategy

## `KUC-11`
**Source**: `skorecard/bucketers/base_bucketer.py`

Handle missing values by assigning them to specific buckets or treating separately

## `KUC-12`
**Source**: `skorecard/bucketers/base_bucketer.py`

Assign specific outlier or important values to their own dedicated buckets

## `KUC-13`
**Source**: `skorecard/bucketers/base_bucketer.py`

Visually explore and manually adjust bucket boundaries using Dash web app

## `KUC-14`
**Source**: `skorecard/preprocessing/_WoEEncoder.py`

Transform bucket IDs into Weight of Evidence values for logistic regression

## `KUC-15`
**Source**: `skorecard/metrics/metrics.py`

Measure the predictive power of individual features for credit risk

## `KUC-16`
**Source**: `skorecard/reporting/report.py`

Monitor distribution drift between training and production data

## `KUC-17`
**Source**: `skorecard/linear_model/linear_model.py`

Build logistic regression model with statistical significance for credit scoring

## `KUC-18`
**Source**: `skorecard/skorecard.py`

Build complete credit scoring scorecard in one step

## `KUC-19`
**Source**: `skorecard/rescale/rescale.py`

Convert model probabilities to traditional scorecard scale (e.g., 300-850)
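One common convention for this conversion is points-to-double-the-odds (PDO) scaling, sketched below in plain Python. It is shown as a general illustration and is not necessarily what `skorecard/rescale/rescale.py` does; `probability_to_score` and all three default parameters are assumptions.

```python
import math


def probability_to_score(p_bad: float, base_score: float = 600.0,
                         base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Map a default probability to a score via points-to-double-the-odds scaling.

    base_score is awarded at good:bad odds of base_odds; every doubling of the
    odds adds pdo points. All three defaults are illustrative assumptions.
    """
    odds = (1.0 - p_bad) / p_bad  # good:bad odds
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)


score_at_base = probability_to_score(1.0 / 51.0)   # odds of 50:1
score_doubled = probability_to_score(1.0 / 101.0)  # odds of 100:1
```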

## `KUC-20`
**Source**: `skorecard/reporting/report.py`

Generate detailed bucket tables with event rate, WoE, IV for documentation

## `KUC-21`
**Source**: `skorecard/reporting/plotting.py`

Visualize bucket distributions with event rate or WoE trends

## `KUC-22`
**Source**: `skorecard/bucket_mapping.py`

Export bucket mappings to YAML for production deployment

## `KUC-23`
**Source**: `skorecard/pipeline/pipeline.py`

Integrate skorecard bucketers into existing scikit-learn ML pipelines

## `KUC-24`
**Source**: `docs/tutorials/2_feature_selection.ipynb`

Select most predictive and stable features using IV and PSI metrics

## `KUC-25`
**Source**: `skorecard/utils/validation.py`

Detect suppressor effects and multicollinearity between features

## `KUC-26`
**Source**: `docs/tutorials/categoricals.ipynb`

Handle categorical variables with many categories in credit scoring

## `KUC-27`
**Source**: `docs/discussion/benchmark_with_EBM.ipynb`

Compare skorecard performance against Explainable Boosting Machines

## `KUC-101`
**Source**: `docs/discussion/benchmark_stats_feature.ipynb`

Compare performance of different machine learning classifiers on credit card default prediction using AUC metrics.

## `KUC-103`
**Source**: `docs/discussion/benchmarks.ipynb`

Run comprehensive benchmarks comparing multiple classifiers on credit card data with timing analysis.

## `KUC-104`
**Source**: `docs/howto/Optimizations.ipynb`

Find optimal bucketing parameters (max_n_bins, min_bin_size) using grid search with Information Value scoring.

## `KUC-105`
**Source**: `docs/howto/mix_with_other_packages.ipynb`

Combine skorecard bucketing with external packages like category_encoders and sklearn transformers in a pipeline.

## `KUC-106`
**Source**: `docs/howto/psi_and_iv.ipynb`

Calculate Population Stability Index (PSI) and Information Value (IV) to validate feature stability and predictive power.
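PSI over matching buckets is the sum of (actual share minus expected share) times the log of their ratio. A minimal sketch in plain Python follows; `psi` is a hypothetical helper rather than skorecard's implementation, and the `eps` floor for empty buckets is an illustrative choice.

```python
import math


def psi(expected_share, actual_share, eps: float = 1e-6) -> float:
    """Population Stability Index over matching buckets of two distributions."""
    total = 0.0
    for e, a in zip(expected_share, actual_share):
        e = max(e, eps)  # avoid log(0) for empty buckets (illustrative choice)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


stable = psi([0.25, 0.25, 0.5], [0.25, 0.25, 0.5])
shifted = psi([0.25, 0.25, 0.5], [0.5, 0.25, 0.25])
```

Identical distributions give a PSI of zero, and the value grows as the bucket shares drift apart.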

## `KUC-107`
**Source**: `docs/howto/save_buckets_to_file.ipynb`

Persist bucketer configurations to YAML files for reuse and deployment across environments.

## `KUC-108`
**Source**: `docs/howto/using_manually_defined_buckets.ipynb`

Define custom bucket boundaries manually for specific business requirements without automatic binning.

## `KUC-109`
**Source**: `docs/tutorials/1_bucketing.ipynb`

Learn fundamental bucketing concepts for credit card data including categorical and numerical feature handling.

## `KUC-111`
**Source**: `docs/tutorials/3_skorecard_model.ipynb`

Build an end-to-end scorecard model combining bucketing with logistic regression for credit scoring.

## `KUC-113`
**Source**: `docs/tutorials/interactive_bucketing.ipynb`

Learn interactive bucketing approach for manual adjustment of bin boundaries in a pipeline.

## `KUC-114`
**Source**: `docs/tutorials/methods.ipynb`

Explore bucketer methods including summary statistics, bucket tables, plots, and YAML export.

## `KUC-115`
**Source**: `docs/tutorials/missing_values.ipynb`

Handle missing values in bucketing with various treatment strategies like neutral, similar, least_risky.

## `KUC-116`
**Source**: `docs/tutorials/reporting.ipynb`

Generate reports and visualizations for scorecard models including bucket tables and weight plots.

## `KUC-117`
**Source**: `docs/tutorials/specials.ipynb`

Define special values and ranges in bucketing that require separate treatment from regular bins.

## `KUC-118`
**Source**: `docs/tutorials/the_basics.ipynb`

Introduction to basic bucketing operations including DecisionTree and EqualWidth bucketers.

## `KUC-119`
**Source**: `docs/tutorials/using-bucketing-process.ipynb`

Learn the BucketingProcess workflow with pre-bucketing and bucketing stages for complex credit scoring.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-CREDIT-RISK-001` — Strict input DataFrame schema validation
**From**: finance-bp-050--skorecard, finance-bp-112--openLGD · **Applicable to**: credit-risk

Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation stages where downstream modules access columns by name without defensive checking. Always validate column existence before pipeline execution.

## `CW-CREDIT-RISK-002` — Explicit random_state for ML model reproducibility
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.

## `CW-CREDIT-RISK-003` — Mandatory data sorting before multi-stage estimation
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce processing order in multi-stage pipelines.

## `CW-CREDIT-RISK-004` — Consistent API response key naming across all endpoints
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define a schema contract upfront and enforce key naming consistency across all response types.

## `CW-CREDIT-RISK-005` — Cardinality bounds checking before array operations
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations. Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array initialization.

## `CW-CREDIT-RISK-006` — Financial validation gates before transaction execution
**From**: finance-bp-072--lending · **Applicable to**: credit-risk

Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.

## `CW-CREDIT-RISK-007` — Mathematical constraint validation for probability outputs
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Credit risk models must validate mathematical constraints on outputs: skorecard's WoE requires valid bin assignments, transitionMatrix's transition matrices require row sums equal to 1.0, and generator matrices require row sums equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning results.

## `CW-CREDIT-RISK-008` — Port-to-ID mapping consistency in distributed model serving
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port 5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
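Keeping the mapping in one place is the simplest way to make it consistent. The sketch below assumes a base port of 5000, inferred from the "port 5001 maps to server ID 1" example; both function names are hypothetical.

```python
BASE_PORT = 5000  # assumption: inferred from "port 5001 maps to server ID 1"


def server_id_from_port(port: int) -> int:
    """Single shared rule for deriving a server ID from its port."""
    server_id = port - BASE_PORT
    if server_id < 1:
        raise ValueError(f"port {port} does not map to a valid server ID")
    return server_id


def port_from_server_id(server_id: int) -> int:
    """Inverse mapping, so both directions come from the same constant."""
    return BASE_PORT + server_id
```

Importing these two functions everywhere, instead of repeating the arithmetic, keeps data directory selection and parameter exchange aligned on the same IDs.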

FILE:references/components/data_preparation.md
# data_preparation (2 classes)

## `ensure_dataframe`
`data_preparation/ensure-dataframe.py:0`

## `slot`
`data_preparation/slot.py:0`

FILE:references/components/feature_bucketing_-_binning.md
# feature_bucketing_/_binning (2 classes)

## `BaseBucketer.fit`
`feature_bucketing_/_binning/basebucketer-fit.py:0`

## `slot`
`feature_bucketing_/_binning/slot.py:0`

FILE:references/components/feature_pre-bucketing_-optional.md
# feature_pre-bucketing_(optional) (2 classes)

## `BaseBucketer.fit`
`feature_pre-bucketing_(optional)/basebucketer-fit.py:0`

## `slot`
`feature_pre-bucketing_(optional)/slot.py:0`

FILE:references/components/feature_selection.md
# feature_selection (2 classes)

## `iv`
`feature_selection/iv.py:0`

## `slot`
`feature_selection/slot.py:0`

FILE:references/components/logistic_regression_model_training.md
# logistic_regression_model_training (2 classes)

## `LogisticRegression.fit`
`logistic_regression_model_training/logisticregression-fit.py:0`

## `slot`
`logistic_regression_model_training/slot.py:0`

FILE:references/components/model_deployment.md
# model_deployment (2 classes)

## `FeaturesBucketMapping.save_yml`
`model_deployment/featuresbucketmapping-save-yml.py:0`

## `slot`
`model_deployment/slot.py:0`

FILE:references/components/scorecard_rescaling.md
# scorecard_rescaling (2 classes)

## `ScoreCardPoints.fit`
`scorecard_rescaling/scorecardpoints-fit.py:0`

## `slot`
`scorecard_rescaling/slot.py:0`

FILE:references/components/validation_and_reporting.md
# validation_and_reporting (2 classes)

## `SkorecardPipeline.bucket_table`
`validation_and_reporting/skorecardpipeline-bucket-table.py:0`

## `slot`
`validation_and_reporting/slot.py:0`

FILE:references/components/weight_of_evidence_-woe-_encoding.md
# weight_of_evidence_(woe)_encoding (2 classes)

## `WoeEncoder.fit`
`weight_of_evidence_(woe)_encoding/woeencoder-fit.py:0`

## `slot`
`weight_of_evidence_(woe)_encoding/slot.py:0`

Credit Lgd Model

Skill

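Build and train LGD (loss given default) machine learning models, supporting quantitative credit risk assessment and prediction based on historical default data.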

---
name: credit-lgd-model
description: |-
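  Build and train LGD (loss given default) machine learning models, supporting quantitative credit risk assessment and prediction based on historical default data.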
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-112"
  compiled_at: "2026-04-22T13:00:54.441302+00:00"
  capability_markets: "global"
  capability_activities: "credit-risk"
  sop_version: "crystal-compilation-v6.1"
---
# Credit Loss Given Default Model (credit-lgd-model)

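> Build and train LGD (loss given default) machine learning models, supporting quantitative credit risk assessment and prediction based on historical default data.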

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (1 total)

### Sphinx Documentation Configuration (`UC-101`)
This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata, version information, and path configuratio
**Triggers**: documentation, sphinx, configuration

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (14 total)

- **`AP-CREDIT-RISK-001`**: Empty DataFrame passed to bucketing pipeline
- **`AP-CREDIT-RISK-002`**: Multi-dimensional target array causing WoE shape mismatch
- **`AP-CREDIT-RISK-003`**: OptimalBucketer receiving high-cardinality numerical features

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-112. Evidence verify ratio = 21.0% and audit fail total = 23. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative record (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons drawn from other projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-112` blueprint at 2026-04-22T13:00:54.441302+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT, from data fetch to backtest, in one flow. Just tell me what you want; I'll write the code, so you don't have to dig through the docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks, such as stockus_nasdaq_AAPL, are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Sphinx Documentation Configuration', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-050--skorecard (5)

### `AP-CREDIT-RISK-001` — Empty DataFrame passed to bucketing pipeline <sub>(high)</sub>

When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline from executing. The root cause is missing defensive validation before data enters the bucketing workflow.

### `AP-CREDIT-RISK-002` — Multi-dimensional target array causing WoE shape mismatch <sub>(high)</sub>

When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation, downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with incorrect credit risk scores that misrepresent default probability estimates.

### `AP-CREDIT-RISK-003` — OptimalBucketer receiving high-cardinality numerical features <sub>(high)</sub>

When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.

### `AP-CREDIT-RISK-004` — Special values distorting optimal bin boundaries <sub>(high)</sub>

When implementing fit() for bucketers without first filtering special values out of X with _filter_specials_for_fit(), outlier special values distort the optimal bin boundaries. This causes incorrect weight-of-evidence calculations and unreliable credit risk scores that misrepresent borrower default probabilities.

### `AP-CREDIT-RISK-005` — Two-phase bucketing ordering violation causing special value loss <sub>(high)</sub>

When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline, special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials() after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.

## finance-bp-072--lending (3)

### `AP-CREDIT-RISK-006` — Loan amount exceeding product and collateral limits <sub>(high)</sub>

When validating loan applications without enforcing that loan_amount does not exceed the maximum_loan_amount from the loan product or proposed securities, disbursed amounts can exceed product or collateral limits and expose the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized lending.

### `AP-CREDIT-RISK-007` — Disbursement validation failures creating unauthorized exposure <sub>(high)</sub>

When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory compliance violations.

### `AP-CREDIT-RISK-008` — Interest accrual on written-off loans inflating income <sub>(high)</sub>

When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.

## finance-bp-112--openLGD (2)

### `AP-CREDIT-RISK-009` — Loop index errors in federated parameter averaging <sub>(high)</sub>

When implementing federated parameter averaging logic, using the final index n instead of the loop variable k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates across all nodes.

### `AP-CREDIT-RISK-010` — API response format inconsistency breaking federated coordination <sub>(high)</sub>

When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and 'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.

## finance-bp-119--transitionMatrix (4)

### `AP-CREDIT-RISK-011` — Invalid transition probabilities corrupting Markov matrices <sub>(high)</sub>

When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0, 1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable credit risk estimates.
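
The definition above translates directly into a check that can run before a matrix is handed to downstream code. This is a sketch using plain nested lists; `is_stochastic` and its tolerance are illustrative choices, not part of the transitionMatrix API.

```python
import math


def is_stochastic(matrix: list[list[float]], tol: float = 1e-9) -> bool:
    """True if the matrix is square, entries lie in [0, 1], and each row sums to 1."""
    n = len(matrix)
    if n == 0:
        return False
    for row in matrix:
        if len(row) != n:
            return False
        if any(p < 0.0 or p > 1.0 or math.isnan(p) for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True


print(is_stochastic([[0.9, 0.1], [0.0, 1.0]]))
print(is_stochastic([[0.9, 0.2], [0.0, 1.0]]))
```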

### `AP-CREDIT-RISK-012` — Unsorted event data causing incorrect transition matrix estimates <sub>(high)</sub>

When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate the Markov property.
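
The required ordering can be enforced with a two-part sort key plus a cheap guard that runs just before estimation. The `(entity_id, time, state)` tuple layout below is an assumption for illustration; real input may be a DataFrame with differently named columns.

```python
from operator import itemgetter

# Each event is (entity_id, time, state); this layout is an assumption for illustration.
events = [
    (2, 1.5, "B"),
    (1, 3.0, "C"),
    (1, 0.0, "A"),
    (2, 0.0, "A"),
]


def sort_events(rows):
    """Order by entity ID first, then by ascending time within each entity."""
    return sorted(rows, key=itemgetter(0, 1))


def is_sorted(rows) -> bool:
    """Cheap guard to run right before handing data to an estimator."""
    keys = [(r[0], r[1]) for r in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


ordered = sort_events(events)
print(ordered, is_sorted(ordered))
```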

### `AP-CREDIT-RISK-013` — Zero-count division causing NaN in transition matrices <sub>(high)</sub>

When normalizing counts to produce transition probabilities without checking source state population count is greater than zero before division, division by zero occurs and causes NaN values in the transition matrix. These NaN values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
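
A sketch of count normalization with the zero-population guard in place. How an empty source state should be treated is a modeling decision the text does not settle. Here, as an assumption, such rows are left as zeros and reported back so the caller can decide.

```python
def normalize_counts(counts: list[list[int]]):
    """Turn transition counts into probabilities, guarding against empty source states.

    Returns (matrix, empty_states). Rows for states with no observations are left
    as all zeros and reported, so the caller decides how to treat them.
    """
    matrix, empty_states = [], []
    for state, row in enumerate(counts):
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            matrix.append([0.0] * len(row))
            empty_states.append(state)
    return matrix, empty_states


probs, empty = normalize_counts([[8, 2, 0], [0, 0, 0], [1, 1, 2]])
print(probs, empty)
```

The all-zero row is not a valid probability row, so the reported states must be resolved before the matrix is used as a transition matrix.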

### `AP-CREDIT-RISK-014` — Wrong matrix logarithm method producing invalid generator matrices <sub>(medium)</sub>

When the generator() method computes the matrix logarithm with element-wise numpy.log or another approximation instead of scipy.linalg.logm, the resulting generator matrices are invalid, with row sums not equal to zero. This violates the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
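
The difference between an element-wise log and a true matrix logarithm can be seen on a 2-state chain, where the matrix logarithm has a closed form. The sketch below uses that closed form only to stay dependency-free. It is a special case for illustration, not the library's method, and a general implementation would call `scipy.linalg.logm` as stated above. The function names are hypothetical.

```python
import math


def elementwise_log(p):
    """What numpy.log would compute: the log of each entry, not a matrix logarithm."""
    return [[math.log(x) for x in row] for row in p]


def two_state_generator(p):
    """Matrix logarithm of a 2x2 stochastic matrix via its closed form.

    For P = [[1-a, a], [b, 1-b]] with 0 < a + b < 1, log(P) = s / (a + b) * (P - I)
    where s = -ln(1 - a - b).
    """
    a, b = p[0][1], p[1][0]
    if not 0.0 < a + b < 1.0:
        raise ValueError("closed form needs 0 < a + b < 1")
    scale = -math.log(1.0 - a - b) / (a + b)
    return [[-a * scale, a * scale], [b * scale, -b * scale]]


P = [[0.9, 0.1], [0.2, 0.8]]
print([sum(row) for row in elementwise_log(P)])      # row sums far from zero
print([sum(row) for row in two_state_generator(P)])  # row sums of zero
```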

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-112--openLGD
**Scan date**: 2026-04-22
**Stats**: 5 files, 12 classes, 0 functions, 5 stages

## Modules (5)

- [data_acquisition](components/data_acquisition.md): 2 classes
- [model_estimation](components/model_estimation.md): 2 classes
- [model_serving](components/model_serving.md): 3 classes
- [federated_coordination](components/federated_coordination.md): 3 classes
- [standalone_execution](components/standalone_execution.md): 2 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 91
  fatal_constraints_count: 31
  non_fatal_constraints_count: 99
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
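
Locks such as `SL-05` can be checked mechanically before a signal is submitted. In the sketch below a plain dict stands in for the signal object, since its real class layout is not shown here. `check_exactly_one_sizing` is a hypothetical helper, and only the three field names come from the lock text.

```python
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_exactly_one_sizing(signal: dict) -> str:
    """Return the single sizing field that is set, or raise if zero or several are."""
    present = [name for name in SIZING_FIELDS if signal.get(name) is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {SIZING_FIELDS}, got {present}")
    return present[0]


print(check_exactly_one_sizing({"position_pct": 0.2, "order_money": None}))
```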

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **1**

## `KUC-101`
**Source**: `docs/source/conf.py`

This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata, version information, and path configurations needed to generate developer documentation.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-CREDIT-RISK-001` — Strict input DataFrame schema validation
**From**: finance-bp-050--skorecard, finance-bp-112--openLGD · **Applicable to**: credit-risk

Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation stages where downstream modules access columns by name without defensive checking. Always validate column existence before pipeline execution.
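
A minimal sketch of an exact-schema check that runs before the pipeline starts. It works on any iterable of column names (with pandas, `df.columns` could be passed in). `check_columns` is a hypothetical helper, not part of skorecard or openLGD.

```python
def check_columns(actual, expected) -> None:
    """Raise if the column names differ from the expected set in any way."""
    actual_set, expected_set = set(actual), set(expected)
    missing = sorted(expected_set - actual_set)
    extra = sorted(actual_set - expected_set)
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={extra}")


check_columns(["X", "Y"], ["X", "Y"])  # passes silently
```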

## `CW-CREDIT-RISK-002` — Explicit random_state for ML model reproducibility
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.

## `CW-CREDIT-RISK-003` — Mandatory data sorting before multi-stage estimation
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce processing order in multi-stage pipelines.

## `CW-CREDIT-RISK-004` — Consistent API response key naming across all endpoints
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define a schema contract upfront and enforce key naming consistency across all response types.

## `CW-CREDIT-RISK-005` — Cardinality bounds checking before array operations
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations. Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array initialization.

## `CW-CREDIT-RISK-006` — Financial validation gates before transaction execution
**From**: finance-bp-072--lending · **Applicable to**: credit-risk

Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.

## `CW-CREDIT-RISK-007` — Mathematical constraint validation for probability outputs
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk

Credit risk models must validate mathematical constraints on outputs: skorecard's WoE requires valid bin assignments, transitionMatrix's transition matrices require row sums equal to 1.0, and generator matrices require row sums equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning results.

## `CW-CREDIT-RISK-008` — Port-to-ID mapping consistency in distributed model serving
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk

When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port 5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
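
Keeping the mapping in one pair of functions is one way to make it consistent across components. The sketch below assumes a base port of 5000, inferred only from the "port 5001 maps to server ID 1" example. `BASE_PORT` and both function names are hypothetical.

```python
BASE_PORT = 5000  # assumed offset, chosen so that port 5001 -> server ID 1


def server_id_from_port(port: int, n_servers: int) -> int:
    """Derive the server ID from its port and reject ports outside the deployment."""
    server_id = port - BASE_PORT
    if not 1 <= server_id <= n_servers:
        raise ValueError(f"port {port} does not map to a server ID in 1..{n_servers}")
    return server_id


def port_from_server_id(server_id: int) -> int:
    """Inverse mapping, kept next to the forward one so the two cannot drift apart."""
    return BASE_PORT + server_id


print(server_id_from_port(5001, n_servers=3))
```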

FILE:references/components/data_acquisition.md
# data_acquisition (2 classes)

## `dataSource`
`data_acquisition/datasource.py:0`

## `Data Transport Layer`
`data_acquisition/data-transport-layer.py:0`

FILE:references/components/federated_coordination.md
# federated_coordination (3 classes)

## `federated_run`
`federated_coordination/federated-run.py:0`

## `Aggregation Algorithm`
`federated_coordination/aggregation-algorithm.py:0`

## `Communication Pattern`
`federated_coordination/communication-pattern.py:0`

FILE:references/components/model_estimation.md
# model_estimation (2 classes)

## `lgdModel`
`model_estimation/lgdmodel.py:0`

## `Regression Algorithm`
`model_estimation/regression-algorithm.py:0`

FILE:references/components/model_serving.md
# model_serving (3 classes)

## `start_calculation`
`model_serving/start-calculation.py:0`

## `update_calculation`
`model_serving/update-calculation.py:0`

## `Server Framework`
`model_serving/server-framework.py:0`

FILE:references/components/standalone_execution.md
# standalone_execution (2 classes)

## `standalone_run`
`standalone_execution/standalone-run.py:0`

## `Execution Mode`
`standalone_execution/execution-mode.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-112-v5.3
  version: v6.1
  blueprint_id: finance-bp-112
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:54.441302+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - credit-risk
  upgraded_from: finance-bp-112-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:30.477210+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-112--openLGD/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-112--openLGD/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-CREDIT-RISK-001
  title: Empty DataFrame passed to bucketing pipeline
  description: When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate
    ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline
    from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-002
  title: Multi-dimensional target array causing WoE shape mismatch
  description: When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation,
    downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with
    incorrect credit risk scores that misrepresent default probability estimates.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-003
  title: OptimalBucketer receiving high-cardinality numerical features
  description: When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique
    values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer
    fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-004
  title: Special values distorting optimal bin boundaries
  description: When implementing fit() for bucketers without filtering special values from X before computing bin boundaries
    using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence
    calculations and unreliable credit risk scores that misrepresent borrower default probabilities.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-005
  title: Two-phase bucketing ordering violation causing special value loss
  description: When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline,
    special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials()
    after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-006
  title: Loan amount exceeding product and collateral limits
  description: When validating loan amount for loan applications without enforcing loan_amount does not exceed maximum_loan_amount
    from loan product or proposed securities, disbursing amounts exceeding product or collateral limits exposes the lender
    to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized
    lending.
  project_source: finance-bp-072--lending
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-007
  title: Disbursement validation failures creating unauthorized exposure
  description: When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned
    security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit
    loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory
    compliance violations.
  project_source: finance-bp-072--lending
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-008
  title: Interest accrual on written-off loans inflating income
  description: When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan
    write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates
    provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.
  project_source: finance-bp-072--lending
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-009
  title: Loop index errors in federated parameter averaging
  description: When implementing federated parameter averaging logic, using the final index n instead of the loop variable
    k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop
    index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates
    across all nodes.
  project_source: finance-bp-112--openLGD
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-010
  title: API response format inconsistency breaking federated coordination
  description: When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and
    'coefficient' keys in JSON responses, the federated coordinator fails to parse the responses and raises KeyError. Divergent
    key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
  project_source: finance-bp-112--openLGD
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-011
  title: Invalid transition probabilities corrupting Markov matrices
  description: When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0,
    1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic
    transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable
    credit risk estimates.
  project_source: finance-bp-119--transitionMatrix
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-012
  title: Unsorted event data causing incorrect transition matrix estimates
  description: When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending
    time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes
    the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate
    the Markov property.
  project_source: finance-bp-119--transitionMatrix
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-013
  title: Zero-count division causing NaN in transition matrices
  description: When normalizing counts to produce transition probabilities without checking source state population count
    is greater than zero before division, division by zero occurs and causes NaN values in the transition matrix. These NaN
    values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
  project_source: finance-bp-119--transitionMatrix
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-014
  title: Wrong matrix logarithm method producing invalid generator matrices
  description: When the generator() method computes the matrix logarithm with element-wise numpy.log or another approximation
    instead of scipy.linalg.logm, the resulting generator matrices are invalid, with row sums not equal to zero. This violates
    the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
  project_source: finance-bp-119--transitionMatrix
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
cross_project_wisdom:
- wisdom_id: CW-CREDIT-RISK-001
  source_project: finance-bp-050--skorecard, finance-bp-112--openLGD
  pattern_name: Strict input DataFrame schema validation
  description: Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns
    (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation
    stages where downstream modules access columns by name without defensive checking. Always validate column existence before
    pipeline execution.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-002
  source_project: finance-bp-112--openLGD
  pattern_name: Explicit random_state for ML model reproducibility
  description: In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due
    to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set
    random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-003
  source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
  pattern_name: Mandatory data sorting before multi-stage estimation
  description: Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in
    a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting
    by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce
    processing order in multi-stage pipelines.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-004
  source_project: finance-bp-112--openLGD
  pattern_name: Consistent API response key naming across all endpoints
  description: In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names
    for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define
    a schema contract upfront and enforce key naming consistency across all response types.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-005
  source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
  pattern_name: Cardinality bounds checking before array operations
  description: Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality
    matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations.
    Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array
    initialization.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-006
  source_project: finance-bp-072--lending
  pattern_name: Financial validation gates before transaction execution
  description: Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized
    periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing
    these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-007
  source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
  pattern_name: Mathematical constraint validation for probability outputs
  description: 'Credit risk models must validate mathematical constraints on outputs: skorecard''s WoE requires valid bin
    assignments, transitionMatrix''s transition matrices require row sums equal to 1.0, and generator matrices require row
    sums equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before
    returning results.'
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-008
  source_project: finance-bp-112--openLGD
  pattern_name: Port-to-ID mapping consistency in distributed model serving
  description: When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port
    5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause
    incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: docs/source/conf.py
  business_problem: This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata,
    version information, and path configurations needed to generate developer documentation.
  intent_keywords:
  - documentation
  - sphinx
  - configuration
  - build docs
  - project setup
  stage: documentation
  data_domain: mixed
  type: extension_example
component_capability_map:
  project: finance-bp-112--openLGD
  scan_date: '2026-04-22'
  stats:
    total_files: 5
    total_classes: 12
    total_functions: 0
    total_stages: 5
  modules:
    data_acquisition:
      class_count: 2
      stage_id: data_acquisition
      stage_order: 1
      responsibility: Retrieves LGD regression data from either local CSV files or REST API endpoints. Supports two transport
        modes enabling development and production deployments without code changes.
      classes:
      - name: dataSource
        file: data_acquisition/datasource.py
        line: 0
        kind: required_method
        signature: ''
      - name: Data Transport Layer
        file: data_acquisition/data-transport-layer.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    model_estimation:
      class_count: 2
      stage_id: model_estimation
      stage_order: 2
      responsibility: Executes iterative linear regression using stochastic gradient descent. Supports warm-start mode for
        federated learning where prior averaged parameters initialize local estimation.
      classes:
      - name: lgdModel
        file: model_estimation/lgdmodel.py
        line: 0
        kind: required_method
        signature: ''
      - name: Regression Algorithm
        file: model_estimation/regression-algorithm.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    model_serving:
      class_count: 3
      stage_id: model_serving
      stage_order: 3
      responsibility: Flask-based HTTP server that exposes LGD estimation via REST endpoints. Each server instance maintains
        local data access and provides cold-start and warm-start estimation paths.
      classes:
      - name: start_calculation
        file: model_serving/start-calculation.py
        line: 0
        kind: required_method
        signature: ''
      - name: update_calculation
        file: model_serving/update-calculation.py
        line: 0
        kind: required_method
        signature: ''
      - name: Server Framework
        file: model_serving/server-framework.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    federated_coordination:
      class_count: 3
      stage_id: federated_coordination
      stage_order: 4
      responsibility: 'Orchestrates federated learning across multiple model servers using parameter averaging. Implements
        the FedAvg algorithm: local estimation, parameter collection, weighted averaging, and broadcast to all servers.'
      classes:
      - name: federated_run
        file: federated_coordination/federated-run.py
        line: 0
        kind: required_method
        signature: ''
      - name: Aggregation Algorithm
        file: federated_coordination/aggregation-algorithm.py
        line: 0
        kind: replaceable_point
      - name: Communication Pattern
        file: federated_coordination/communication-pattern.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    standalone_execution:
      class_count: 2
      stage_id: standalone_execution
      stage_order: 5
      responsibility: Single-process LGD estimation loop for development and testing. Validates environment setup and core
        estimation logic without federation overhead.
      classes:
      - name: standalone_run
        file: standalone_execution/standalone-run.py
        line: 0
        kind: required_method
        signature: ''
      - name: Execution Mode
        file: standalone_execution/execution-mode.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.20987654320987653
    evidence_invalid: 64
    evidence_verified: 17
    evidence_auto_fixed: 0
    audit_coverage: 30/30 (100%)
    audit_pass_rate: 1/30 (3%)
    audit_fail_total: 23
    audit_finance_universal:
      pass: 1
      warn: 3
      fail: 15
    audit_subdomain_totals:
      pass: 0
      warn: 3
      fail: 8
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-112. Evidence verify ratio
    = 21.0% and audit fail total = 23. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-112-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Sphinx Documentation Configuration
    positive_terms:
    - documentation
    - sphinx
    - configuration
    - build docs
    - project setup
    data_domain: mixed
    negative_terms:
    - trading strategy
    - screening
    - data pipeline
    - monitoring
    - live trading
    - factor computation
    - machine learning
    ambiguity_question: Are you looking to configure documentation build tools, or are you trying to implement a trading strategy,
      data pipeline, or analytical workflow?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the
    user. The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 91
    fatal_constraints_count: 31
    non_fatal_constraints_count: 99
    use_cases_count: 1
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes; otherwise
        initialization fails with an assertion error (fatal violation finance-C-001).'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 27 source groups: API(3), Aggregation(1),
        Algorithm(5), Architecture(2), Configuration(4), Deployment(2), and 21 more.'
      key_decisions: 91 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-035
      type: B/BA
      summary: Use GET /start endpoint to initiate cold start and retrieve initial local estimates
    - id: BD-036
      type: B
      summary: Use POST /update endpoint to receive averaged parameters and return new estimates
    - id: BD-037
      type: B
      summary: Use GET / as health check endpoint to verify server liveness
    - id: BD-024
      type: B
      summary: Use equal weights (0.25 each) for federated parameter averaging across 4 servers
    - id: BD-020
      type: B
      summary: Use SGDRegressor from scikit-learn for linear regression with stochastic gradient descent
    - id: BD-021
      type: B
      summary: Set max_iter=1 per epoch for incremental/online learning style updates
    - id: BD-022
      type: B/BA
      summary: Disable the convergence tolerance (tol=None) and early stopping for pure empirical loss
    - id: BD-052
      type: B
      summary: Set warm_start=False to erase previous solution on each fit call
    - id: BD-053
      type: B
      summary: Use verbose=0 for silent training output
    - id: BD-019
      type: BA/DK
      summary: Use federated learning architecture where data stays local and only model parameters are aggregated
    - id: BD-038
      type: BA/M
      summary: Design stateless model servers where each request is computed independently
    - id: BD-041
      type: B
      summary: Use YAML configuration file for cluster parameters (hosts, epochs, servers)
    - id: BD-042
      type: B/RC
      summary: Run Flask in debug mode (debug=True) for development
    - id: BD-057
      type: B
      summary: Configure base URL as 'http://127.0.0.1:500' in config.yml for local demo
    - id: BD-058
      type: B/RC
      summary: Use ruamel.yaml for YAML parsing with safe loading
    - id: BD-039
      type: B
      summary: Use Fabric deployment tool for cluster management tasks
    - id: BD-040
      type: B/BA
      summary: Use Docker containers for openNPL data backend deployment
    - id: BD-031
      type: B/BA
      summary: Run Flask model servers on ports 5001-5004 for the federated cluster
    - id: BD-032
      type: B
      summary: Run openNPL data backend servers on ports 8001-8004 for database-backed demo
    - id: BD-033
      type: B
      summary: 'Derive server ID from port number: server_id = port - 5000'
    - id: BD-034
      type: B/BA
      summary: Configure 4 federated servers as default cluster size
    - id: BD-054
      type: B
      summary: Print server estimates and averaged parameters after each epoch
    - id: BD-047
      type: B/DK
      summary: Provide /stop endpoint for graceful server shutdown
    - id: BD-048
      type: B
      summary: Recommend Linux environment for running the federated demo
    - id: BD-049
      type: B/DK
      summary: Use virtual environment for dependency isolation
    - id: BD-050
      type: B/DK
      summary: Use XTerm windows for displaying model server output during demo
    - id: BD-045
      type: B
      summary: Run separate client/coordinator process to orchestrate federated rounds
    - id: BD-046
      type: B
      summary: Check each model server health before starting federated calculation
    - id: BD-028
      type: B/DK
      summary: 'Exchange two parameters between coordinator and servers: intercept and coefficient'
    - id: BD-029
      type: B/BA
      summary: Use cold start (no initial params) for first iteration, warm start thereafter
    - id: BD-030
      type: B/DK
      summary: Use JSON serialization for parameter exchange in federated protocol
    - id: BD-051
      type: B/BA
      summary: Use HTTP requests library for client-server communication
    - id: BD-061
      type: DK/B
      summary: 'TODO: Implement fractional regression variations for LGD models'
    - id: BD-062
      type: DK/B
      summary: 'TODO: Adopt different data loading strategies for standalone vs federated learning'
    - id: BD-059
      type: DK/B
      summary: 'TODO: Remove hardcoded weights - fetch node data shape via controlled API'
    - id: BD-060
      type: DK
      summary: 'TODO: Remove file/URL path hardwiring in dataSource'
    - id: BD-055
      type: B/BA
      summary: Provide standalone_run.py as single-server validation before federated demo
    - id: BD-023
      type: B/BA
      summary: Set 10 epochs as default training iterations
    - id: BD-056
      type: B
      summary: Iterate federated rounds by calling lgdModel with previous averaged params
    - id: BD-001
      type: B
      summary: Choice parameter controls data transport rather than separate functions
    - id: BD-002
      type: BA/DK
      summary: Port-derived server ID convention (port - 5000 = server number)
    - id: BD-003
      type: B/BA
      summary: Hardcoded data schema (X, Y column names)
    - id: BD-025
      type: B
      summary: 'Provide two data source modes: local filesystem (choice=1) and REST API (choice=2)'
    - id: BD-026
      type: B/BA
      summary: Store CSV data in server_dirs/{server_id}/regression_data.csv pattern
    - id: BD-027
      type: B/BA
      summary: Define CSV data format with X column as target and Y as explanatory variable
    - id: BD-043
      type: B/RC
      summary: Query openNPL API endpoint /api/npl_data/counterparties for data backend
    - id: BD-044
      type: B/BA
      summary: Extract current_assets and cash_and_cash_equivalent_items as X and Y features
    - id: BD-073
      type: BA/DK
      summary: 'SGDRegressor defaults encode iterative forcing: max_iter=1, tol=None, early_stopping=False'
    - id: BD-075
      type: BA/DK
      summary: 'Server ID derived from port via hardcoded offset: n = int(port) - 5000'
    - id: BD-077
      type: BA/DK
      summary: Data source choice=1 loads from ./server_dirs/{server}/regression_data.csv
    - id: BD-081
      type: BA
      summary: Epochs count hardcoded in config.yml (10) vs standalone_run.py (10) - dual maintenance risk
    - id: BD-012
      type: M/BA
      summary: Federated Averaging (FedAvg) algorithm
    - id: BD-013
      type: B/BA
      summary: Equal weighting across servers
    - id: BD-014
      type: M/BA
      summary: Per-epoch parameter collection and averaging
    - id: BD-015
      type: B/BA
      summary: Hardcoded weight dictionary
    - id: BD-016
      type: B/BA
      summary: Blocking sequential server communication
    - id: BD-082
      type: B/BA
      summary: 'INTERACTION: BD-038 (stateless servers) × BD-005/BD-029 (warm-start via intercept_init/coef_init) → Paradox:
        warm-start REQUIRES state persistence across requests, contradicting stateless server design'
    - id: BD-083
      type: B/BA
      summary: 'INTERACTION: BD-003/BD-078 (X=target, Y=explanatory column convention) × BD-025/BD-043 (dual data source modes)
        → Convention fragility amplified by data source variability'
    - id: BD-084
      type: BA
      summary: 'INTERACTION: BD-002/BD-033/BD-075 (port-derived server ID: n = port - 5000) × BD-080 (exactly 4 servers required)
        → Port availability dependency creates cascading failure risk'
    - id: BD-085
      type: B
      summary: 'INTERACTION: BD-013/BD-024/BD-076 (equal 0.25 weighting) × BD-016 (blocking sequential communication) → Unequal
        convergence quality with linear latency penalty'
    - id: BD-086
      type: B
      summary: 'INTERACTION: BD-004/BD-021/BD-052/BD-064/BD-073 (SGDRegressor single-epoch settings) × BD-019 (federated learning
        architecture) → Training limitation undermines federated convergence benefit'
    - id: BD-087
      type: B/RC
      summary: 'INTERACTION: BD-072 (start BEFORE update ordering) × BD-074 (averaging BEFORE next epoch) × BD-016 (sequential
        blocking) → Single slow server creates cascading deadlock risk in federated rounds'
    - id: BD-088
      type: BA
      summary: 'INTERACTION: BD-023 (epochs: 10) × BD-081 (epochs dual-hardcoded) → Configuration inconsistency risk between
        federated and standalone modes'
    - id: BD-089
      type: B/BA
      summary: 'RISK CASCADE: BD-076 (equal weighting) → BD-085 (latency amplification) → BD-087 (cascading deadlock) → federation
        failure when data is heterogeneous'
    - id: BD-090
      type: BA
      summary: 'RISK CASCADE: BD-075 (port-derived ID) → BD-080 (4-server hardcode) → BD-046 (health check) → deployment failure
        cascades to federation inability'
    - id: BD-091
      type: BA
      summary: 'CONTRADICTION: BD-038 (stateless servers) states ''each request is computed independently'' while BD-072 (start
        BEFORE update) mandates stateful request ordering across federated rounds'
    - id: BD-074
      type: B
      summary: Federated averaging MUST complete before sending averaged params to next epoch
    - id: BD-076
      type: B/BA
      summary: Equal federated weights (0.25 each) hardcoded for 4 servers - no API to fetch data size
    - id: BD-063
      type: B/BA
      summary: Linear regression model using SGDRegressor instead of closed-form OLS
    - id: BD-064
      type: B/BA
      summary: SGD optimization with max_iter=1 per fit call and warm_start disabled
    - id: BD-065
      type: B/BA
      summary: Early stopping disabled with no convergence tolerance criterion
    - id: BD-066
      type: B
      summary: No explicit regularization penalty applied to loss function
    - id: BD-067
      type: B
      summary: Server-based data source selection with file vs REST API input method
    - id: BD-068
      type: B/RC
      summary: 'Variable assignment convention: X as target, y as explanatory variables'
    - id: BD-069
      type: B/BA
      summary: Default squared error loss (squared_loss, renamed squared_error in scikit-learn 1.0) with the default invscaling learning rate schedule
    - id: BD-070
      type: B/BA
      summary: Model parameter initialization supported via coef_init and intercept_init
    - id: BD-071
      type: B/BA
      summary: Fitted parameters returned as dictionary with predictions and metadata
    - id: BD-004
      type: M/BA
      summary: SGDRegressor with max_iter=1 for iterative control
    - id: BD-005
      type: M/DK
      summary: Warm-start via intercept_init/coef_init parameters
    - id: BD-006
      type: B/BA
      summary: None-checking as cold/warm start toggle
    - id: BD-007
      type: B/BA
      summary: SGDRegressor hardcoded (not abstracted or configurable)
    - id: BD-008
      type: B/BA
      summary: Port-to-server-ID derivation at runtime
    - id: BD-009
      type: BA
      summary: Server ID derived from request.host header parsing
    - id: BD-010
      type: B/DK
      summary: Signal-based shutdown via SIGKILL/SIGTERM selection
    - id: BD-011
      type: B/BA
      summary: Three-endpoint API design (/, /start, /update)
    - id: BD-072
      type: RC
      summary: Federated workflow REQUIRES /start cold-start BEFORE /update warm-start calls per epoch
    - id: BD-078
      type: RC
      summary: 'CSV column convention: X=target, Y=explanatory; extraction order matters for regression'
    - id: BD-079
      type: DK/B
      summary: Standalone and federated modes implement identical iterative SGD loop - code duplication
    - id: BD-017
      type: B
      summary: Identical epoch loop structure to federated_run
    - id: BD-018
      type: B
      summary: Direct lgdModel imports without server abstraction
    - id: BD-080
      type: B/BA
      summary: server_dirs/X/ requires exactly 4 subdirectories with identical CSV structure
resources:
  packages:
  - name: Flask
    version_pin: latest
  - name: scikit-learn
    version_pin: latest
  - name: numpy
    version_pin: latest
  - name: pandas
    version_pin: latest
  - name: scipy
    version_pin: latest
  - name: requests
    version_pin: latest
  - name: ruamel.yaml
    version_pin: latest
  - name: fabric
    version_pin: latest
  - name: Sphinx
    version_pin: latest
  - name: sphinx-rtd-theme
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install Flask
    - python3 -m pip install scikit-learn
    - python3 -m pip install numpy
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing data acquisition for LGD regression model
    action: Return a DataFrame containing exactly 'X' and 'Y' columns
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: The downstream lgdModel.py module accesses df[['X']] and df['Y'] columns without validation, causing KeyError
      exceptions if column names are different
    stage_ids:
    - data_acquisition
  - id: finance-C-002
    when: When implementing local file mode (choice=1) in dataSource
    action: Read CSV file from server_dirs/{server_id}/regression_data.csv path
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: pandas.read_csv will raise FileNotFoundError if the file path is incorrect, and there is no try-except handler
      to provide meaningful error messages
    stage_ids:
    - data_acquisition
  - id: finance-C-004
    when: When configuring data transport for the LGD model
    action: Pass choice values other than 1 or 2 to dataSource
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: If choice is neither 1 nor 2, the function returns None implicitly, causing lgdModel.py to fail when trying
      to access df[['X']] columns
    stage_ids:
    - data_acquisition
  - id: finance-C-005
    when: When deploying the federated model server infrastructure
    action: Start model servers on ports 5001-500N matching server IDs for correct port-to-ID mapping
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: model_server.py:35 computes n = int(port) - 5000 to derive server ID; wrong port causes incorrect data directory
      selection
    stage_ids:
    - data_acquisition
  - id: finance-C-011
    when: When implementing SGDRegressor for federated LGD estimation
    action: Set random_state parameter explicitly to ensure reproducibility across federated nodes
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without random_state, each lgdModel call produces non-deterministic results due to random data shuffling.
      This breaks federated learning convergence guarantees as different nodes will reach different
      local minima.
    stage_ids:
    - model_estimation
  - id: finance-C-013
    when: When providing input data to lgdModel
    action: Verify data source contains columns exactly named 'X' (explanatory) and 'Y' (target)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: lgdModel accesses df[['X']] and df['Y'] without validation. Missing or misnamed columns will raise KeyError
      at runtime, breaking both standalone and federated execution flows.
    stage_ids:
    - model_estimation
  - id: finance-C-014
    when: When implementing federated parameter averaging logic
    action: Iterate over each participating server (k from 1 to n) when computing weighted average
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: federated_run.py uses weights[str(n)] for all servers instead of weights[str(k)] for each server k. This
      causes the averaged parameters to use only the last server's weight, corrupting federated model convergence and producing
      incorrect global LGD estimates.
    stage_ids:
    - model_estimation
  - id: finance-C-018
    when: When returning fitted parameters from lgdModel
    action: Return dict with keys 'intercept' and 'coefficient' containing scalar values
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: model_server.py and standalone_run.py access params['intercept'] and params['coefficient']. Returning different
      key names (e.g., 'coef' or 'coefficients') would cause KeyError in all downstream consumers, breaking both standalone
      and federated modes.
    stage_ids:
    - model_estimation
  - id: finance-C-033
    when: When implementing GET /start endpoint for cold-start LGD estimation
    action: Return JSON with 'intercept' and 'coefficient' keys from lgdModel cold-start calculation
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Federated coordinator fails to parse response causing KeyError in federated_run.py:58-59 when accessing data['coefficient']
    stage_ids:
    - model_serving
  - id: finance-C-034
    when: When implementing POST /update endpoint for warm-start LGD estimation
    action: Accept JSON body with 'intercept' and 'coefficient' fields and return updated parameters in same JSON structure
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Federated coordination loop breaks when /update response format differs from /start response format
    stage_ids:
    - model_serving
  - id: finance-C-037
    when: When presenting LGD estimation results from model server for regulatory credit risk reporting
    action: Claim that backtest model parameters equal live production model parameters
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Regulatory non-compliance when presenting simulated model estimates as actual risk quantification without
      noting estimation methodology differences
    stage_ids:
    - model_serving
  - id: finance-C-041
    when: When implementing the initial parameter averaging loop
    action: Skip the first server by starting loop index at 1 instead of 0
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: The first server's parameters are excluded from initial averaging, causing all subsequent averaged parameters
      to be incorrect and breaking federated convergence. Server 0 contribution is completely lost.
    stage_ids:
    - federated_coordination
  - id: finance-C-042
    when: When implementing the epoch averaging loop
    action: Use the loop variable k to index weights instead of the final index n
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Epoch averaging uses weights[str(n)] inside the loop instead of weights[str(k)], causing only the last server's
      weight (0.0 for n=4) to be applied repeatedly, producing meaningless averaged parameters.
    stage_ids:
    - federated_coordination
  - id: finance-C-043
    when: When initializing SGDRegressor with warm start parameters
    action: Pass intercept_init and coef_init to the SGDRegressor constructor or to partial_fit()
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: In scikit-learn only SGDRegressor.fit() accepts coef_init and intercept_init. The constructor and partial_fit()
      do not, so passing them there raises a TypeError, breaking all federated update cycles and preventing convergence.
    stage_ids:
    - federated_coordination
  - id: finance-C-053
    when: When implementing SGDRegressor warm-start parameter initialization
    action: Pass initial coefficient and intercept values through the coef_init and intercept_init arguments of fit(), not
      through set_params(), which only sets constructor hyperparameters
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Neither set_params() nor the SGDRegressor constructor accepts coef_init or intercept_init, so supplying
      pre-existing parameter values there fails at runtime instead of warm-starting the model
    stage_ids:
    - standalone_execution
  - id: finance-C-054
    when: When implementing warm-start SGDRegressor with external parameter initialization
    action: Set clf.coef_ and clf.intercept_ attributes directly before calling fit(), or use partial_fit() for stateful updates
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Attempting to pass custom initial parameters through arguments a method does not define (for example, coef_init
      on partial_fit()) will raise TypeError, breaking the epoch iteration loop
    stage_ids:
    - standalone_execution
  - id: finance-C-057
    when: When configuring dataSource for local CSV mode
    action: Set choice parameter to 1 and verify server_dirs/{server}/regression_data.csv exists before calling dataSource()
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Missing data directory or incorrect file path will trigger FileNotFoundError, preventing model estimation
      from executing
    stage_ids:
    - standalone_execution
  - id: finance-C-066
    when: When implementing data acquisition, ensure DataFrame schema matches model expectations
    action: Return a pandas DataFrame with columns exactly named 'X' (explanatory variable) and 'Y' (target variable)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: If column names differ, lgdModel.py line 73 df[['X']] and line 75 df['Y'] will raise KeyError, causing model
      estimation to crash
    stage_ids:
    - data_acquisition
    - model_estimation
  - id: finance-C-067
    when: When passing SGDRegressor parameters to HTTP endpoints, ensure proper type extraction
    action: Extract scalar values from numpy arrays using index [0] before returning dict
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: NumPy arrays are not JSON serializable, so Flask's jsonify raises TypeError when coef_ or intercept_ is returned
      as an array, causing HTTP endpoint failures
    stage_ids:
    - model_estimation
    - model_serving
  - id: finance-C-078
    when: When implementing or validating DataFrame inputs to the LGD estimation model
    action: Provide DataFrames containing both 'X' (explanatory variable) and 'Y' (target/LGD variable) columns, as the model
      extracts X=df[['X']] and y=df['Y']
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: KeyError or incorrect regression results when the model tries to access missing 'X' or 'Y' columns, breaking
      the LGD estimation pipeline
  - id: finance-C-079
    when: When initializing or updating LGD model parameters in federated mode
    action: Pass parameter dictionaries containing both 'intercept' and 'coefficient' keys as required by the sklearn SGDRegressor
      warm-start interface
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: KeyError or TypeError when sklearn fit() receives incorrect parameter dict structure, breaking federated
      parameter exchange
  - id: finance-C-080
    when: When spawning Flask model servers for federated LGD estimation
    action: Assign server ports in the range 5001-5004, as the server code derives server ID via n = int(port) - 5000 to map
      to server_dirs/N/data
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect data directory mapping causing FileNotFoundError or loading wrong server's data, breaking the entire
      federated estimation
  - id: finance-C-081
    when: When executing a federated LGD training epoch
    action: Call /start (cold-start) before any /update (warm-start) calls, as the model requires initial parameters to be
      established first
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect model parameters propagated to all servers when warm-start is called without prior cold-start,
      leading to divergent or invalid federated estimates
  - id: finance-C-082
    when: When implementing the SGDRegressor-based LGD model iteration
    action: Set max_iter=1 and tol=None to enforce a single epoch per fit() call, as each gradient step must be performed independently
      across federated nodes
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Multi-epoch convergence within a single fit() call breaks the federated averaging contract, causing incorrect
      parameter aggregation across nodes
  - id: finance-C-085
    when: When deploying the openLGD Flask model servers
    action: Run Flask with debug=True in production or any security-sensitive environment, as this enables code execution
      through the interactive debugger
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Remote code execution vulnerability when the Werkzeug debugger is exposed in production, allowing attackers to
      execute arbitrary code on the server
  - id: finance-C-086
    when: When presenting or marketing openLGD's capabilities to users
    action: Claim that openLGD is suitable for production deployment, as it is explicitly documented as early alpha software
      with unstable API
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users deploy alpha software in production, experiencing unexpected API breaking changes, unhandled edge cases,
      and security vulnerabilities
  - id: finance-C-099
    when: When implementing or evaluating federated learning architecture decisions
    action: Verify raw data remains local to each server node and only model parameters (intercept and coefficient) traverse
      the network — must NOT implement centralized data pooling even if technically feasible
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Centralizing raw financial data violates data sovereignty requirements for multi-institution scenarios, causing
      regulatory non-compliance with GDPR, banking secrecy laws, and institutional data sharing prohibitions
    derived_from_bd_id: BD-019
  - id: finance-C-102
    when: When implementing data loading and regression preparation in lgdModel.py and dataSource.py
    action: Extract column 'X' as explanatory variables and column 'Y' as target values in the exact order specified at lgdModel.py:73-75;
      must NOT swap, rename, or use alternative column mappings
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inverting the X/Y column convention produces an inverted regression model where target and predictor variables
      are swapped, causing completely incorrect LGD estimates and invalid credit risk assessments
    derived_from_bd_id: BD-078
  - id: finance-C-106
    when: When parsing YAML configuration files in federated_run.py
    action: Use ruamel.yaml with safe loading (typ='safe') for YAML parsing; never use yaml.load without specifying a safe
      Loader to prevent arbitrary code execution through YAML deserialization vulnerabilities
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Unsafe YAML loading allows arbitrary code execution from malicious configuration files, creating remote code
      execution vulnerability in production deployments
    derived_from_bd_id: BD-058
  - id: finance-C-108
    when: When implementing federated learning parameter synchronization across distributed servers
    action: Implement per-epoch parameter collection and averaging with explicit epochs configuration; verify each server's
      partial_fit() results are collected and averaged before the next round begins
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing per-epoch synchronization causes parameter drift across servers, leading to inconsistent model states
      and failed federated convergence
    derived_from_bd_id: BD-014
  - id: finance-C-121
    when: When implementing or refactoring federated learning aggregation logic
    action: Verify federated averaging operation (parameter aggregation from servers) completes entirely before sending averaged
      parameters to the next epoch — do not parallelize or reorder the averaging step with subsequent epoch processing
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Skipping or parallelizing the averaging step causes stale or inconsistent parameters to propagate, corrupting
      federated learning convergence and producing models that do not represent true global consensus
    derived_from_bd_id: BD-074
  regular:
  - id: finance-C-003
    when: When implementing API mode (choice=2) in dataSource
    action: Construct URL using localhost:800{server_id} pattern and /api/npl_data/counterparties endpoint
    severity: high
    kind: resource_boundary
    modality: must
    consequence: requests.get will raise ConnectionError if the target server is not running, with no error handling in the
      code
    stage_ids:
    - data_acquisition
  - id: finance-C-006
    when: When running openLGD in production
    action: Expect production-grade API stability from openLGD
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: README.md:18 explicitly states 'early alpha release' and CHANGELOG.rst:3 warns 'API IS STILL VERY UNSTABLE
      AS MORE USE CASES / FEATURES ARE ADDED REGULARLY'
    stage_ids:
    - data_acquisition
  - id: finance-C-007
    when: When deploying federated mode with API data source (choice=2)
    action: Verify target API server is running before calling dataSource with choice=2
    severity: high
    kind: resource_boundary
    modality: must
    consequence: requests.get() in dataSource.py:41 will raise ConnectionError if the localhost:800X API server is not running,
      and there is no error handling
    stage_ids:
    - data_acquisition
  - id: finance-C-008
    when: When implementing local file mode (choice=1) data acquisition
    action: Verify server_dirs/{server_id} directory exists before attempting to read CSV
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Missing directory will cause pandas.read_csv to raise FileNotFoundError with no custom error message or recovery
      mechanism
    stage_ids:
    - data_acquisition
  - id: finance-C-009
    when: When adding new data sources or modifying data acquisition
    action: Hardcode file paths, URLs, or column names directly in dataSource implementation
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: dataSource.py:25 TODO comment explicitly states 'remove file / url path hardwiring', hardcoded paths make
      deployment brittle and non-portable
    stage_ids:
    - data_acquisition
  - id: finance-C-010
    when: When using API mode (choice=2) data acquisition
    action: Handle the nested API call pattern (counterparty list then individual records)
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: dataSource.py:40-47 makes two sequential requests.get calls; if any individual data_url fails, the loop continues
      with incomplete data
    stage_ids:
    - data_acquisition
  - id: finance-C-012
    when: When implementing the cold/warm start toggle logic
    action: Verify both intercept and coef parameters are provided together for warm-start mode
    severity: high
    kind: domain_rule
    modality: must
    consequence: The condition 'if intercept is None or coef is None' triggers cold-start if either parameter is missing.
      Partial initialization with only one parameter will silently fall back to random initialization, producing incorrect
      model updates in the federated loop.
    stage_ids:
    - model_estimation
  - id: finance-C-015
    when: When deploying federated model servers
    action: Use ports other than 5001-5004 for the default configuration without updating both server and client code
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: model_server.py:35 derives server ID as 'int(port) - 5000', and Federated_Demo.md documents ports 5001-5004.
      Port mismatches between server and controller cause requests to reach wrong servers, breaking federated coordination.
    stage_ids:
    - model_estimation
  - id: finance-C-016
    when: When selecting regression algorithm for LGD estimation
    action: Accept that SGDRegressor is the only available algorithm (not abstracted or configurable)
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: The regression algorithm is hardcoded to sklearn.linear_model.SGDRegressor. Replacing it requires modifying
      lgdModel.py directly. This creates a tight coupling and prevents using alternative algorithms (e.g., ridge regression,
      ElasticNet) without code changes.
    stage_ids:
    - model_estimation
  - id: finance-C-017
    when: When running lgdModel in iterative fashion for federated learning
    action: Set max_iter=1 to ensure exactly one gradient step per function call
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The max_iter=1 setting is critical for the federated learning architecture where each call represents one
      epoch. Increasing max_iter would perform multiple gradient steps per call, breaking the per-epoch parameter update contract
      required by federated averaging.
    stage_ids:
    - model_estimation
  - id: finance-C-019
    when: When claiming LGD estimation capabilities
    action: Claim statistical rigor equivalent to pooled dataset analysis
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Federated LGD estimation with SGD produces parameters that may not converge to the pooled optimum due to
      data heterogeneity across servers. Presenting federated estimates as equivalent to centralized estimation would misrepresent
      the statistical properties of the model.
    stage_ids:
    - model_estimation
  - id: finance-C-020
    when: When considering replacing the SGDRegressor implementation
    action: Claim that federated learning produces identical results to centralized estimation
    severity: medium
    kind: claim_boundary
    modality: should_not
    consequence: Federated averaging with SGD is an approximation that depends on data distribution across servers. Different
      server configurations will produce different model parameters even with identical hyperparameters, which is expected
      behavior, not a bug.
    stage_ids:
    - model_estimation
  - id: finance-C-021
    when: When evaluating federated averaging convergence
    action: Skip monitoring parameter stability across epochs
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Without tracking parameter change magnitude across epochs, users cannot determine if the federated process
      has converged. Parameters oscillating or diverging indicate misconfiguration or data quality issues that would go unnoticed.
    stage_ids:
    - model_estimation
  - id: finance-C-022
    when: When performing initial cold-start call
    action: Pass None for both intercept and coef parameters
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The first federated iteration requires random initialization. Passing non-None values on cold-start would
      improperly seed the federated process with arbitrary values, corrupting the initial global model state.
    stage_ids:
    - model_estimation
  - id: finance-C-023
    when: When implementing port-to-server-ID derivation in Flask endpoints
    action: Validate that port can be converted to integer before subtracting 5000
    severity: high
    kind: domain_rule
    modality: must
    consequence: ValueError exception when Host header contains non-numeric port, causing HTTP 500 response to clients
    stage_ids:
    - model_serving
  - id: finance-C-024
    when: When implementing POST /update endpoint that parses JSON request body
    action: Validate JSON parsing result and check required fields 'intercept' and 'coefficient' exist
    severity: high
    kind: domain_rule
    modality: must
    consequence: KeyError exception when client sends JSON without 'intercept' or 'coefficient' fields, causing HTTP 500 response
    stage_ids:
    - model_serving
  - id: finance-C-026
    when: When implementing Flask model server endpoint that accesses local data
    action: Use server_dirs/{port-5000}/regression_data.csv as the data directory path pattern
    severity: high
    kind: domain_rule
    modality: must
    consequence: FileNotFoundError when server tries to access non-existent data directory, causing cold-start estimation
      to fail
    stage_ids:
    - model_serving
  - id: finance-C-027
    when: When implementing /update endpoint that expects model parameters
    action: Verify request Content-Type is application/json before parsing request body
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Malformed JSON response or HTTP 415 Unsupported Media Type when client sends non-JSON data
    stage_ids:
    - model_serving
  - id: finance-C-028
    when: When implementing model server in federated cluster topology
    action: Run server instances on ports 5001-5004 matching server_dirs/1 through server_dirs/4
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Server with port 5005 incorrectly maps to server_dirs/5 which may not exist, causing data loading failure
    stage_ids:
    - model_serving
  - id: finance-C-029
    when: When deploying Flask-based model server for federated LGD estimation
    action: Accept that Flask development server is single-threaded and not suitable for high-concurrency production workloads
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: HTTP request blocking causing federated coordination timeouts when multiple clients connect simultaneously
    stage_ids:
    - model_serving
  - id: finance-C-030
    when: When configuring model servers for federated estimation workflow
    action: Start all model servers before executing the federated_run.py coordinator script
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: ConnectionError when coordinator attempts GET /start or POST /update on unavailable server, breaking federated
      iteration
    stage_ids:
    - model_serving
  - id: finance-C-031
    when: When using openLGD in early alpha release for credit risk estimation
    action: Expect API instability and prepare for breaking changes in each release cycle
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Silent model parameter changes causing inconsistent LGD estimates across federated nodes after library upgrade
    stage_ids:
    - model_serving
  - id: finance-C-032
    when: When implementing federated LGD estimation with multiple server instances
    action: Verify each server instance has unique port and corresponding server_dirs/{n}/data directory provisioned
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multiple servers accessing same data directory causing race conditions in CSV file read operations
    stage_ids:
    - model_serving
  - id: finance-C-035
    when: When implementing Flask model server that derives server ID from HTTP Host header
    action: Use request.host header for port extraction to ensure multi-tenant isolation per server instance
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Wrong server ID used for data directory access causing data contamination between federated nodes
    stage_ids:
    - model_serving
  - id: finance-C-036
    when: When implementing root endpoint (/) for health check
    action: Return HTTP 200 OK with JSON response indicating server liveness and identity
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Health check monitoring tools fail to detect server availability, causing false alarms in federated cluster
      monitoring
    stage_ids:
    - model_serving
  - id: finance-C-038
    when: When deploying model_server.py in a federated credit risk production system
    action: Advertise the Flask development server as production-ready HTTP service
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Security audit failure and operational risk when relying on the Flask development server, which lacks production
      hardening features
    stage_ids:
    - model_serving
  - id: finance-C-039
    when: When using model server for openLGD federated estimation in alpha stage
    action: Assume API compatibility between minor version upgrades without regression testing
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Silent breaking changes in JSON response format causing federated coordination to fail silently or produce
      incorrect averaged parameters
    stage_ids:
    - model_serving
  - id: finance-C-040
    when: When estimating LGD model parameters with federated learning across multiple servers
    action: Claim federated-averaged parameters are equivalent to centrally-computed parameters without mathematical proof
      of convergence
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Incorrect credit risk estimates when federated averaging assumptions (data homogeneity, equal weighting)
      are violated in practice
    stage_ids:
    - model_serving
  - id: finance-C-044
    when: When making HTTP requests to federated servers
    action: Include timeout parameters to prevent indefinite blocking on unreachable servers
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without HTTP timeouts, a single unresponsive server causes the entire federated run to hang indefinitely.
      In production, this blocks all participating servers waiting for the coordinator.
    stage_ids:
    - federated_coordination
  - id: finance-C-045
    when: When configuring server weights for federated averaging
    action: Dynamically calculate weights based on actual data volumes or sample counts per server
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Equal weighting assumes equal data volumes across servers. If servers have unequal data (e.g., 100 vs 10000
      samples), the weighted average under-represents larger datasets, producing biased LGD estimates that misrepresent actual
      credit risk.
    stage_ids:
    - federated_coordination
  - id: finance-C-046
    when: When running Flask model servers in production
    action: Run with debug=True enabled in production environments
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Flask debug mode enables code reloading and the Werkzeug debugger, exposing Python tracebacks and an interactive
      console to attackers. This creates remote code execution vulnerabilities in production deployments.
    stage_ids:
    - federated_coordination
  - id: finance-C-047
    when: When configuring the number of servers in config.yml
    action: Verify the server count matches exactly the number of weights defined in the weights dictionary
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The weights dictionary is hardcoded for 4 servers. If config.yml specifies servers != 4, the URL construction
      and weight indexing will fail, causing KeyError or IndexError exceptions.
    stage_ids:
    - federated_coordination
  - id: finance-C-048
    when: When presenting federated learning results
    action: Claim that the system provides production-ready real-time federated credit risk modeling
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The README explicitly states 'This is an early alpha release. openLGD is still in active development'. Presenting
      alpha software as production-ready violates user expectations and regulatory requirements for credit risk models.
    stage_ids:
    - federated_coordination
  - id: finance-C-049
    when: When processing JSON responses from federated servers
    action: Validate response structure before accessing dictionary keys
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Without validation, if a server returns malformed JSON or missing keys ('coefficient', 'intercept'), the
      code raises KeyError, crashing the entire federated run mid-epoch.
    stage_ids:
    - federated_coordination
  - id: finance-C-050
    when: When handling HTTP errors from server communication
    action: Check HTTP status codes and implement retry logic for transient failures
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Network partitions, server overload, or temporary unavailability cause HTTP errors that crash the federated
      run. Without error handling, a single epoch failure prevents any parameter updates from being applied.
    stage_ids:
    - federated_coordination
  - id: finance-C-051
    when: When implementing data source abstraction
    action: Externalize file paths and URL patterns to configuration instead of hardcoding in source
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Hardcoded paths like './server_dirs/' and 'http://localhost:800' prevent the system from running in different
      environments without code modifications.
    stage_ids:
    - federated_coordination
  - id: finance-C-052
    when: When scaling the federated system to more than 4 servers
    action: assume the hardcoded weights dictionary remains valid without modification
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Increasing servers > 4 in config.yml causes KeyError when accessing weights beyond the 4 hardcoded keys,
      crashing the federated run.
    stage_ids:
    - federated_coordination
  - id: finance-C-055
    when: When configuring SGDRegressor for iterative model estimation
    action: Set warm_start=True to enable parameter reuse across consecutive fit() calls
    severity: high
    kind: domain_rule
    modality: must
    consequence: With warm_start=False, each fit() call reinitializes the coefficients (to zeros unless coef_init is supplied),
      preventing convergence across epochs and producing non-monotonic parameter estimates
    stage_ids:
    - standalone_execution
  - id: finance-C-056
    when: When preparing CSV data for LGD model estimation
    action: Verify data files contain exactly two columns named 'X' (target variable) and 'Y' (explanatory variable) without
      missing values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Mismatched column names or missing values will cause KeyError during DataFrame extraction or produce NaN
      coefficients, invalidating the LGD estimation
    stage_ids:
    - standalone_execution
  - id: finance-C-058
    when: When running standalone execution for environment validation
    action: Execute standalone_run.py first to verify paths, dependencies, and core estimation logic before launching federated
      servers
    severity: medium
    kind: architecture_guardrail
    modality: should
    consequence: Skipping standalone validation may lead to cryptic errors during federated execution when environment issues
      could have been caught earlier
    stage_ids:
    - standalone_execution
  - id: finance-C-059
    when: When configuring standalone execution epochs
    action: Hardcode Epochs value in standalone_run.py when it should be configurable via config.yml like federated_run.py
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Hardcoded 10 epochs prevents testing different convergence behaviors and creates inconsistency between standalone
      and federated execution configurations
    stage_ids:
    - standalone_execution
  - id: finance-C-060
    when: When comparing standalone vs federated estimation results
    action: Use identical epoch loop structure in standalone_run.py and federated_run.py to enable deterministic result comparison
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Different loop structures prevent meaningful validation that the standalone lgdModel produces identical results
      to the model_server endpoint, defeating the purpose of standalone execution as a validation framework
    stage_ids:
    - standalone_execution
  - id: finance-C-061
    when: When accessing LGD estimation logic from standalone execution
    action: Import lgdModel directly without model_server abstraction to validate core estimation works independently of federation
      infrastructure
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Using model_server endpoints in standalone mode introduces unnecessary HTTP overhead and masks potential
      lgdModel issues behind server interface complexity
    stage_ids:
    - standalone_execution
  - id: finance-C-062
    when: When presenting standalone execution as production system
    action: Claim standalone execution produces production-ready LGD estimates equivalent to enterprise financial systems
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: openLGD is explicitly documented as 'early alpha' research software; presenting alpha results as production-ready
      violates the project's stated development status
    stage_ids:
    - standalone_execution
  - id: finance-C-063
    when: When claiming standalone results validate federated production deployments
    action: Claim that single-server standalone LGD estimates equal federated multi-server estimates without accounting for
      data partitioning and averaging differences
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Standalone runs use consolidated data while federated runs partition data across servers and average parameters,
      producing different estimation landscapes
    stage_ids:
    - standalone_execution
  - id: finance-C-064
    when: When executing standalone_run.py without understanding SGDRegressor convergence
    action: Assume 10 epochs produces converged parameters without verifying coefficient stability across consecutive epochs
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: With tol=None and max_iter=1 per fit() call, 10 external epochs may be insufficient for convergence with
      complex datasets, leading to unreliable LGD estimates
    stage_ids:
    - standalone_execution
  - id: finance-C-065
    when: When using sklearn SGDRegressor with stochastic gradient descent for financial modeling
    action: Set the random_state parameter explicitly to ensure reproducible coefficient estimates across executions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit random_state, SGDRegressor will produce different coefficient estimates on each run due
      to random shuffling of training samples, preventing reproducible validation
    stage_ids:
    - standalone_execution
  - id: finance-C-068
    when: When sending parameters to POST /update endpoint, ensure Content-Type header is set
    action: 'Set HTTP header ''Content-Type'': ''application/json'' when posting JSON data'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without proper Content-Type header, Flask may parse request.data incorrectly, causing json.loads to fail
      with UnicodeDecodeError
    stage_ids:
    - federated_coordination
    - model_serving
  - id: finance-C-069
    when: When implementing federated averaging, where the loop must iterate over all servers
    action: Index the weights with n instead of the loop variable k inside the averaging loop
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Loop on line 64 uses range(1, n) which excludes server 0, then uses weights[str(n)] which may be undefined,
      causing KeyError or incorrect weighted averaging
    stage_ids:
    - federated_coordination
  - id: finance-C-070
    when: When configuring federated workflow, ensure weights sum to 1.0 for proper averaging
    action: Validate that the weights sum to 1.0 or are proportional across all participating servers
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect weights cause model parameters to be improperly averaged, leading to biased LGD estimates and incorrect
      credit risk capital calculations
    stage_ids:
    - federated_coordination
  - id: finance-C-071
    when: When loading config.yml for federated coordination, validate that all required keys exist
    action: Check that config contains 'hosts', 'epochs', and 'servers' keys before accessing them
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Missing config keys cause KeyError when accessing config['hosts'], config['epochs'], or config['servers'],
      preventing federated coordination from starting
    stage_ids:
    - federated_coordination
  - id: finance-C-072
    when: When mapping server port to server ID, ensure port follows the 5000+ convention
    action: Format model server URL as base URL plus server number without trailing slash
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect URL format causes requests.get() to fail with ConnectionError, preventing federated parameter aggregation
    stage_ids:
    - federated_coordination
    - model_serving
  - id: finance-C-073
    when: When loading data from CSV, ensure the file has exactly 2 columns with proper headers
    action: Validate that the loaded DataFrame has exactly the columns 'X' and 'Y'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: CSV parsing errors or missing columns cause a KeyError in lgdModel.py, while misassigned columns produce incorrect
      LGD estimates without warning
    stage_ids:
    - data_acquisition
  - id: finance-C-074
    when: When receiving JSON in POST request, ensure both 'intercept' and 'coefficient' keys exist
    action: Validate params dictionary contains both 'intercept' and 'coefficient' keys before passing to lgdModel
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing keys cause KeyError in model_server.py:44-45, crashing the update endpoint and halting federated
      training
    stage_ids:
    - model_serving
  - id: finance-C-075
    when: When returning parameters from GET /start endpoint, ensure dict can be serialized
    action: Return dict with float values (not numpy scalars) for JSON serialization compatibility
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Flask jsonify fails on numpy arrays and on numpy scalars that are not native Python numbers (such as float32
      or int64), returning 500 Internal Server Error and breaking federated coordination
    stage_ids:
    - model_serving
  - id: finance-C-076
    when: When using openLGD for credit risk decisions
    action: Present simulated or early alpha results as validated outcomes; describe them as being for 'research and validation
      purposes', not production credit risk quantification
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Presenting early alpha LGD estimates as validated credit risk parameters may violate regulatory requirements
      for capital calculation under Basel frameworks
    stage_ids:
    - model_estimation
    - model_serving
  - id: finance-C-077
    when: When encountering model convergence warnings
    action: Skip investigation on the assumption that 'early iterations are normal'; investigate all convergence warnings before
      continuing federated iterations
    severity: low
    kind: rationalization_guard
    modality: must_not
    consequence: Skipping convergence investigation may hide numerical instability or data quality issues, producing unreliable
      LGD estimates
    stage_ids:
    - model_estimation
    - federated_coordination
  - id: finance-C-083
    when: When completing a federated averaging round before sending parameters to the next epoch
    action: Complete the averaging of all node parameters before distributing averaged params to any node, as premature sending
      of partial averages corrupts model convergence
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Partial averages sent to nodes cause parameter drift and non-convergence in subsequent epochs, producing
      invalid LGD model coefficients
  - id: finance-C-084
    when: When configuring federated averaging weights across model servers
    action: Adjust server weights proportionally when changing the number of participating servers, as the default 0.25 weight
      assumes exactly 4 equal-data-volume nodes
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect weighted averaging causes biased LGD model parameters when server data volumes differ from the
      4-node equal-weight assumption
  - id: finance-C-087
    when: When using openLGD for credit risk decision-making
    action: Claim that federated LGD model estimates are equivalent to centralized model estimates, as data distribution assumptions
      differ between modes
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Incorrect regulatory capital calculations when federated estimates diverge from centralized benchmarks without
      proper validation methodology
  - id: finance-C-088
    when: When selecting an LGD modeling approach
    action: Claim that openLGD supports non-linear LGD modeling, as the implementation uses only sklearn SGDRegressor with
      linear loss
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Incorrect credit risk assessments when users expect non-linear LGD capabilities (GLM with binomial, beta
      regression) that openLGD does not provide
  - id: finance-C-089
    when: When scaling the federated LGD deployment beyond the demo configuration
    action: Claim that openLGD supports large-scale federated deployments with many servers, as the sequential communication
      architecture creates a bottleneck
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Severe performance degradation or timeout failures when scaling beyond 4 servers due to sequential HTTP-based
      parameter exchange
  - id: finance-C-090
    when: When using SGDRegressor with the configured hyperparameters
    action: Claim that the model will converge to an optimal solution within any specific number of epochs, as tol=None disables
      convergence checking
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users may stop training prematurely expecting convergence, leading to under-fitted LGD models with suboptimal
      coefficient estimates
  - id: finance-C-091
    when: When implementing federated learning workflows based on this blueprint
    action: Fetch actual node data shapes via a controlled API instead of hardcoding weights, as TODO at federated_run.py:33
      acknowledges this limitation
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Incorrect model averaging when actual server data volumes differ from the assumed equal distribution, producing
      biased LGD estimates
  - id: finance-C-092
    when: When sourcing LGD training data through the dataSource abstraction
    action: Use parametric paths (server_dirs/N/datafile.csv or proper openNPL API endpoints) as hardcoded in dataSource.py,
      or risk data loading failures
    severity: high
    kind: resource_boundary
    modality: must
    consequence: FileNotFoundError or data loading failures when data files are not in the expected parametric locations,
      breaking both standalone and federated modes
  - id: finance-C-093
    when: When selecting data loading method in the LGD estimation workflow
    action: Use choice=1 for local CSV files or choice=2 for openNPL REST API, as the dataSource function branches on these
      discrete values only
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Unexpected behavior or data loading failure when using unsupported choice values for data sourcing
  - id: finance-C-094
    when: When implementing or refactoring training loop logic in standalone_run.py
    action: Maintain identical epoch loop structure and Epochs configuration as federated_run.py to ensure a valid comparison
      between standalone and federated training results
    severity: high
    kind: domain_rule
    modality: must
    consequence: Modifying the epoch loop structure independently in standalone_run.py breaks the comparison guarantee between
      standalone and federated training modes, making it impossible to verify that federation complexity does not introduce
      behavioral changes
    derived_from_bd_id: BD-017
  - id: finance-C-095
    when: When implementing or refactoring model initialization logic in lgdModel.py
    action: Assume parameters can be passed as None when intending to use existing values — always distinguish between 'parameter
      not provided' (use existing) and 'parameter explicitly set to None' (cold start)
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Confusing None vs not-present causes silent cold starts that reset model state, producing incorrect LGD estimates
      and invalidating credit risk calculations
    derived_from_bd_id: BD-006
  - id: finance-C-096
    when: When implementing federated protocol communication between coordinator and servers
    action: Exchange only the two scalar parameters (intercept and coefficient) per communication round — must NOT add gradient
      vectors, Hessian information, or additional statistics to the payload
    severity: high
    kind: domain_rule
    modality: must
    consequence: Adding extra parameters to federated exchanges increases bandwidth requirements and attack surface for parameter
      tampering, violating the minimal payload design essential for bandwidth-constrained environments
    derived_from_bd_id: BD-028
  - id: finance-C-097
    when: When implementing data acquisition for LGD model training
    action: Query the openNPL API endpoint /api/npl_data/counterparties for structured entity data including financial metrics
      — must NOT use alternative data sources without validation against openNPL schema
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using mismatched data sources causes schema incompatibilities with downstream LGD estimation, potentially
      producing meaningless regression results or silent data corruption
    derived_from_bd_id: BD-043
  - id: finance-C-098
    when: When modifying training epoch configuration across the federated learning system
    action: Update epoch count in both config.yml and standalone_run.py simultaneously — implement a centralized constant
      or import from a shared module to prevent dual maintenance drift
    severity: medium
    kind: architecture_guardrail
    modality: should
    consequence: Updating epochs in only one location causes divergent training duration between federated and standalone
      modes, invalidating comparative results and producing non-reproducible experiments
    derived_from_bd_id: BD-081
  - id: finance-C-100
    when: When implementing FedAvg aggregation logic in federated_run.py
    action: Fetch actual node data shapes (sample counts) via controlled API and apply weighted averaging proportional to
      local data volumes — must NOT use hardcoded equal weights (0.25 each) in production environments
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using equal weights when datasets have heterogeneous sizes causes model convergence bias toward smaller nodes,
      producing suboptimal LGD estimates that systematically underestimate risk for larger institutions
    derived_from_bd_id: BD-059
  - id: finance-C-101
    when: When implementing federated coordination and model aggregation in federated_run.py
    action: Use Federated Averaging (FedAvg) algorithm with synchronous rounds and parameter-level averaging only — must NOT
      implement asynchronous averaging, differential privacy mechanisms, or secure aggregation without explicit architectural
      approval
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using alternative aggregation methods without architectural review breaks the synchronous FedAvg assumption,
      potentially causing parameter staleness, convergence failures, or compatibility issues with existing server implementations
    derived_from_bd_id: BD-012
  - id: finance-C-103
    when: When importing and using the lgdModel module for standalone execution
    action: Import lgdModel directly via 'from lgdModel import lgdModel' to ensure standalone execution works without network
      dependencies; do not introduce HTTP client abstractions that would couple core estimation to the model_server layer
    severity: high
    kind: domain_rule
    modality: must
    consequence: Refactoring to use HTTP client for lgdModel would break standalone execution, preventing unit testing and
      local development without network infrastructure
    derived_from_bd_id: BD-018
  - id: finance-C-104
    when: When implementing linear regression for LGD estimation in federated learning
    action: Use SGDRegressor from scikit-learn with partial_fit() for incremental learning; do not replace with OLS closed-form
      solution or other batch-only algorithms that require centralized data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Switching to OLS or batch-only regression breaks the federated learning architecture, requiring centralized
      data aggregation that violates distributed processing assumptions
    derived_from_bd_id: BD-020
  - id: finance-C-105
    when: When implementing LGD estimation logic
    action: Use SGDRegressor as the regression algorithm; be aware that switching to alternative algorithms (e.g., RandomForest,
      neural networks) requires implementing abstract base class, factory pattern, and serialization abstraction layers
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded SGDRegressor assumption means alternative algorithms require significant refactoring; estimation
      accuracy depends on regression model choice and must be validated independently
    derived_from_bd_id: BD-007
  - id: finance-C-107
    when: When configuring federated learning server infrastructure
    action: Use explicit server ID configuration via environment variable instead of port-derived ID (n = port - 5000); verify
      port availability in the 5001-5004 range before startup; implement graceful degradation when ports are unavailable
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Port-derived server ID creates cascading failure risk where port conflicts prevent server startup, causing
      health check failures that halt entire federated execution
    derived_from_bd_id: BD-084
  - id: finance-C-109
    when: When implementing model training with partial_fit() in lgdModel.py
    action: Use max_iter=1 for each partial_fit() call to maintain explicit per-epoch control in the federated orchestration
      loop — do not change to higher values as this blurs the distinction between local and global epochs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Increasing max_iter beyond 1 causes local optimization iterations to blend with global federated rounds,
      making convergence analysis unreliable and breaking the federated coordination contract
    derived_from_bd_id: BD-021
  - id: finance-C-110
    when: When implementing server ID derivation in model_server.py
    action: Assume consecutive port allocation starting from 5001 for server ID derivation — use explicit server ID configuration
      instead
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Port-based ID derivation assumes a specific port numbering scheme that may not hold in all deployment scenarios,
      causing server ID mismatches when ports are allocated non-consecutively
    derived_from_bd_id: BD-008
  - id: finance-C-111
    when: When designing APIs for model serving endpoints
    action: Implement external state management for distributed deployments — the /, /start, /update three-endpoint design
      assumes stateless operation and does not handle distributed state coordination
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Stateless API design fails in distributed scenarios where multiple server instances require coordination,
      causing inconsistent state across requests
    derived_from_bd_id: BD-011
  - id: finance-C-112
    when: When implementing server lifecycle management
    action: Implement the /stop endpoint for graceful shutdown to ensure in-flight requests complete and state is properly
      finalized before termination
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Abrupt server termination without graceful shutdown risks data corruption and leaves clients with incomplete
      responses
    derived_from_bd_id: BD-047
  - id: finance-C-113
    when: When setting up project dependencies
    action: Use virtual environment isolation for dependency management to prevent conflicts with system packages and ensure
      reproducible builds
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: System-wide package installation risks breaking system packages and causes dependency conflicts across projects
    derived_from_bd_id: BD-049
  - id: finance-C-114
    when: When integrating with sklearn utilities or ML pipelines
    action: Assume the standard sklearn convention where X=features and y=target; the code uses the reversed convention, with X
      as the target and y as the explanatory variables
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Reversed X/y convention causes silent failures when using sklearn utilities expecting standard ordering,
      producing incorrect model predictions or cryptic errors
    derived_from_bd_id: BD-068
  - id: finance-C-115
    when: When implementing federated round orchestration with sequential blocking
    action: Implement timeout handling and retry logic for individual server calls — sequential blocking with BD-072 (/start
      before /update) and BD-074 (averaging before next epoch) creates cascading deadlock if any server becomes unresponsive
      mid-round
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: A single slow or unresponsive server during /start or /update blocks the entire federated round with no timeout
      mechanism, causing cascading timeouts across all rounds
    derived_from_bd_id: BD-087
  - id: finance-C-116
    when: When configuring training epochs for LGD model
    action: Centralize epoch configuration in config.yml and import from it in both standalone_run.py and federated_run.py
      — do not hardcode epoch values separately
    severity: high
    kind: domain_rule
    modality: must
    consequence: Dual-hardcoded epoch values create maintenance hazard; updating epochs in one file but not the other causes
      federated and standalone modes to train for different durations, invalidating BD-055 validation baseline
    derived_from_bd_id: BD-088
  - id: finance-C-117
    when: When parsing server port numbers to derive server IDs
    action: Hardcode the magic formula n = int(port) - 5000 — the port-to-ID mapping depends on a specific port range allocation
      that must remain consistent
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: Hardcoded port offset makes server ID derivation brittle; changing port allocation scheme breaks ID mapping
      silently throughout the system
    derived_from_bd_id: BD-075
  - id: finance-C-118
    when: When refactoring training loop code
    action: Consider extracting the shared SGD training loop from standalone_run.py and federated_run.py into a common module
      to eliminate duplication — duplicate logic in epoch_loop across both files creates maintenance risk
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Identical training loop logic duplicated in two files requires synchronized updates; changes applied to one
      file but not the other cause divergent behavior between modes
    derived_from_bd_id: BD-079
  - id: finance-C-119
    when: When implementing federated learning fit logic using partial_fit() calls
    action: Enable warm_start on the SGDRegressor; warm_start must remain False to ensure each partial_fit() call starts
      fresh without leveraging optimizer state from previous iterations
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Setting warm_start=True preserves optimizer state across partial_fit() calls, causing unintended state carryover
      between federated rounds and breaking the parameter averaging protocol semantics
    derived_from_bd_id: BD-052
  - id: finance-C-120
    when: When selecting features for Loss Given Default (LGD) credit risk estimation
    action: Verify that current_assets and cash_and_cash_equivalent_items are the intended features; if replacing them with
      alternative features, confirm that liquidity characteristics are still captured, as these are fundamental to credit risk modeling
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using alternative features without liquidity coverage may cause the LGD model to underestimate default losses
      for asset-heavy borrowers, leading to insufficient provision calculations in production use
    derived_from_bd_id: BD-044
  - id: finance-C-122
    when: When implementing warm-start functionality using coef_init or intercept_init parameters
    action: Explicitly set warm_start=True before calling fit() to enable parameter reuse — without warm_start=True, coef_init
      and intercept_init only apply to the first fit() call and subsequent calls will reinitialize parameters, silently discarding
      warm-start behavior
    severity: high
    kind: domain_rule
    modality: must
    consequence: If warm_start=False (default), coef_init and intercept_init parameters are ignored after the first fit()
      call, causing warm-start attempts to silently fail and lose previously learned parameter state
    derived_from_bd_id: BD-070
  - id: finance-C-123
    when: When consuming model output from lgdModel.fit()
    action: Verify that the consumer code expects dictionary return type from fitted_params — if integrating with downstream
      systems, verify dict interface compatibility or implement explicit type handling; coordinate with team before changing
      return format to tuple or dataclass
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Return format assumes dict interface consumer; if downstream systems expect a different type or the return
      format changes, data consumption breaks silently causing downstream processing failures
    derived_from_bd_id: BD-071
  - id: finance-C-124
    when: When implementing stateless server endpoints with warm-start functionality
    action: Do not rely on stateless server architecture for /update endpoints that require warm-start — implement state persistence
      via coordinator tracking iteration state, sticky sessions with persistent server instances, or shared state store; parameters
      received via intercept_init/coef_init must be preserved across requests
    severity: high
    kind: domain_rule
    modality: must
    consequence: Stateless server design initializes fresh state per request, but warm-start requires parameter state preservation;
      parameters passed via intercept_init/coef_init are silently discarded, causing the federated protocol to produce inconsistent
      models across rounds
    derived_from_bd_id: BD-082
  - id: finance-C-125
    when: When integrating data from multiple sources using X/Y column conventions
    action: Implement schema validation for column mapping between openNPL API fields and X/Y conventions — verify that hardcoded
      field mappings (current_assets, cash_and_cash_equivalent_items) in dataSource function remain synchronized with upstream
      API schema; add explicit error handling if expected columns are missing or renamed
    severity: high
    kind: domain_rule
    modality: must
    consequence: Hardcoded X/Y column convention breaks silently when openNPL API schema changes, causing incorrect feature
      extraction with no obvious error; downstream models train on misaligned data producing invalid LGD estimates
    derived_from_bd_id: BD-083
  - id: finance-C-126
    when: When implementing federated learning coordination logic in the framework
    action: Implement sequential blocking without timeout handling for inter-server ordering contracts — this creates a cascading
      deadlock vulnerability when servers have heterogeneous data volumes
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Sequential blocking with no timeouts causes the federation to deadlock when any server experiences extended
      training time due to larger datasets; all participating servers hang waiting for the slowest server, causing complete
      federation failure
    derived_from_bd_id: BD-089
  - id: finance-C-127
    when: When implementing federated learning round coordination in the framework
    action: Implement timeout handling for inter-server ordering contracts and add data volume heterogeneity checks before
      initiating coordination rounds
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without timeout handling and heterogeneity checks, the federation will experience cascading deadlocks when
      servers have significantly different dataset sizes, causing complete round failures and federation collapse
    derived_from_bd_id: BD-089
  - id: finance-C-128
    when: When implementing or modifying model serving logic
    action: Initialize model from scratch for each request using server identifier — do not cache model state in server memory
      between requests
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: 'Stateful model servers cause backtest-live inconsistency: in containerized deployments, instances may be
      created/destroyed with stale cached state, and load-balanced multi-instance setups may route requests to instances with
      outdated models, leading to unpredictable execution results'
    derived_from_bd_id: BD-038
  - id: finance-C-129
    when: When deploying model servers in containerized or load-balanced environments
    action: Verify each request loads model parameters from persistent storage (server_dirs/{server_id}) independently — confirm
      no in-memory model caching across requests
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'In-memory caching of model state causes platform-dependent behavior: different container instances may have
      different cached states, making backtest results non-reproducible across deployment configurations'
    derived_from_bd_id: BD-038
  - id: finance-C-130
    when: When implementing federated averaging weight configuration
    action: Verify that sample_count is equal across all servers before using equal weights; if sample counts differ significantly,
      implement proportional weighting based on actual sample counts per server
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Equal weighting (25% each) silently distorts federated model accuracy when servers have unequal data volumes;
      in production, servers with smaller datasets are over-weighted while larger datasets are under-weighted, leading to
      suboptimal model convergence and degraded prediction accuracy
    derived_from_bd_id: BD-013/BD-015
  - id: finance-C-131
    when: When scaling the federated learning system beyond demo scale (>4 servers)
    action: Replace blocking sequential HTTP requests with async parallel execution (asyncio with aiohttp) or thread pool
      to reduce latency from O(n) linear scaling to near-constant time
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Sequential blocking communication creates linear latency growth O(n) with server count; for 8+ servers, round-trip
      time doubles compared to parallel execution, causing unacceptable delays in production federated training rounds
    derived_from_bd_id: BD-016
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      i+1 < len(result.trade_log) and result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-112 / Sphinx Documentation Configuration
    version: v5.3
    intent_keywords:
    - documentation
    - sphinx
    - configuration
    - build docs
    - project setup
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-101
          name: Sphinx Documentation Configuration
          short_description: This file configures the Sphinx documentation builder for the openLGD project, setting up project
            metadata, version information, and path configuratio
          sample_triggers:
          - documentation
          - sphinx
          - configuration
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try sphinx documentation configuration
      auto_selected: true
    - uc_id: UC-100
      beginner_prompt: Try capability UC-100
      auto_selected: true
    - uc_id: UC-101
      beginner_prompt: Try capability UC-101
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 1 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Sphinx Documentation Configuration
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
    - Custom Transformer + Accumulator factor with per-entity rolling state
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

Climate Esg Investing

Skill

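Performs climate ESG investment analysis with the Fama-French factor model: downloads monthly stock price data, computes factor correlations, runs OLS regression diagnostics and significance screening, and helps users build factor portfolios and assess risk.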

---
name: climate-esg-investing
description: |-
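  Performs climate ESG investment analysis with the Fama-French factor model: downloads monthly stock price data, computes factor correlations, runs OLS regression diagnostics and significance screening, and helps users build factor portfolios and assess risk.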
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-105"
  compiled_at: "2026-04-22T13:00:49.775031+00:00"
  capability_markets: "global"
  capability_activities: "macro-data"
  sop_version: "crystal-compilation-v6.1"
---
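# Climate ESG Investing (climate-esg-investing)

> Performs climate ESG investment analysis with the Fama-French factor model: downloads monthly stock price data, computes factor correlations, runs OLS regression diagnostics and significance screening, and helps users build factor portfolios and assess risk.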

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (9 total)

### Sector Stock Count and Significant Factor Regression Analyzer (`UC-101`)
Identifies how many stocks from an index fall into each sector and screens for stocks with statistically significant factor regression results based o
**Triggers**: sector composition, significant regression, p-value screening

### Factor Correlation Calculator (`UC-102`)
Computes correlations between different factors over time to understand factor relationship dynamics and potential multicollinearity issues
**Triggers**: factor correlation, correlation matrix, factor relationships

### OLS Regression with Diagnostic Statistics (`UC-103`)
Performs ordinary least squares regression on factor data with comprehensive diagnostic tests including Durbin-Watson, Jarque-Bera, and Breusch-Pagan
**Triggers**: OLS regression, diagnostic tests, statistical tests

For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
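
As an illustration of how a lock such as `SL-05` could be checked before a signal reaches the trader, here is a minimal sketch. The function `check_sl05` is a hypothetical helper written for this example and is not part of ZVT; only the three field names come from the table above.

```python
def check_sl05(position_pct=None, order_money=None, order_amount=None):
    """Return the name of the single sizing field that is set.

    Hypothetical helper: raises ValueError unless exactly one of the three
    sizing fields named in SL-05 is provided.
    """
    sizing = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = [name for name, value in sizing.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"SL-05: expected exactly one sizing field, got {provided or 'none'}"
        )
    return provided[0]
```

Comparing against `None` rather than testing truthiness is deliberate here: a value of `0` still counts as "provided", so an explicit zero is not mistaken for a missing field.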

## Top Anti-Patterns (14 total)

- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift

All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-105. Evidence verify ratio = 3.3% and audit fail total = 20. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed across projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-105` blueprint at 2026-04-22T13:00:49.775031+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - OLS Regression with Diagnostic Statistics
  - Factor Correlation Calculator
  - Sector Stock Count and Significant Factor Regression Analyzer
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **14**

## finance-bp-074--FinRobot (1)

### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>

SEC API calls made without a rate-limiting decorator can exceed the SEC EDGAR limit of 10 requests per second. SEC EDGAR then blocks the offending IP, which prevents all subsequent access to critical financial filings and halts the data collection pipeline. FinRobot demonstrates that the SEC enforces this limit strictly; a missing User-Agent header compounds the problem by causing requests to fail silently.
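
One way to stay under such a limit is to space calls out on the client side. The sketch below is an assumption-laden illustration, not FinRobot's decorator: `MinIntervalLimiter` is a hypothetical class, and the injectable `clock` and `sleep` arguments exist only so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: keeps consecutive calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
```

A caller would invoke `limiter.wait()` immediately before each HTTP request, with `limiter = MinIntervalLimiter(10)` for the 10 requests per second figure quoted above, and would also set a descriptive User-Agent header on every request.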

## finance-bp-077--Open_Source_Economic_Model (2)

### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>

When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
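
The convergence-point rule quoted above is simple enough to pin down in a unit-testable helper. This is a minimal sketch that assumes `U` denotes the last liquid point in years; the function name is hypothetical and the rest of the Smith-Wilson algorithm is out of scope here.

```python
def convergence_point(last_liquid_point_years):
    """Return max(U + 40, 60) in years, with U assumed to be the last liquid point."""
    return max(last_liquid_point_years + 40, 60)
```

For example, a last liquid point of 20 years gives 60, while 30 years gives 70.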

### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>

Importing CSV portfolio files that start with a UTF-8 BOM marker (common in files with special characters) without the 'utf-8-sig' encoding silently corrupts the first column name read by pandas. Lookups of that row field then raise KeyError, so the portfolio data does not load at all.
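
The effect is easy to reproduce with the standard library alone. The sketch below uses made-up tickers and weights and the `csv` module rather than pandas; `pandas.read_csv` accepts the same `encoding="utf-8-sig"` argument.

```python
import csv
import io

# Illustrative file contents; encoding with "utf-8-sig" prepends a BOM.
raw = "ticker,weight\nAAA,0.6\nBBB,0.4\n".encode("utf-8-sig")


def read_rows(data, encoding):
    """Parse CSV bytes into a list of dicts using the given text encoding."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.DictReader(text))


bad_rows = read_rows(raw, "utf-8")        # first key becomes "\ufeffticker"
good_rows = read_rows(raw, "utf-8-sig")   # BOM is stripped, key is "ticker"
```

With plain `utf-8`, `bad_rows[0]["ticker"]` raises `KeyError` because the BOM is glued to the first header name; with `utf-8-sig` the lookup succeeds.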

## finance-bp-080--FinDKG (3)

### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>

When temporal knowledge graph data is divided with non-temporal train/val/test splits, the model sees future events during training. All train_edges must precede val_edges and test_edges in time; violating this ordering inflates metrics so that they no longer reflect real-world performance. The resulting overfit models fail catastrophically when deployed for actual temporal prediction tasks.
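
A chronological split can be sketched as follows. The edge layout `(head, relation, tail, timestamp)` and the split fractions are assumptions made for this example, not FinDKG's actual data format.

```python
def temporal_split(edges, train_frac=0.7, val_frac=0.15):
    """Split edges chronologically into train, val, and test lists.

    Each edge is assumed to be a tuple whose fourth element is a sortable
    timestamp. Sorting first guarantees no training edge is later than
    any validation or test edge.
    """
    ordered = sorted(edges, key=lambda edge: edge[3])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

The key design point is that the split boundaries are positions in the time-sorted sequence, never random indices.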

### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>

When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.

### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>

When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.

## finance-bp-083--Economic-Dashboard (3)

### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>

When SMA crossover detection (golden/death cross) does not use shift(1) to compare the current bar's state with the prior bar's state, the detection relies on current-bar data and introduces look-ahead bias. Signals appear to fire on the same bar on which the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
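
The prior-bar comparison can be shown without pandas. The sketch below is a plain-Python illustration with hypothetical function names and short windows chosen for readability; in pandas the `fast_ma[t - 1]` lookups correspond to `shift(1)`.

```python
def sma(values, window):
    """Simple moving average; positions without a full window are None."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window:i + 1]) / window)
    return averages


def golden_cross_entries(closes, fast=2, slow=3):
    """Return bar indices at which to act on a golden cross.

    A cross is confirmed on bar t by comparing bar t with bar t - 1, and the
    entry is scheduled for bar t + 1 so that no same-bar data is traded on.
    """
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    entries = []
    for t in range(1, len(closes) - 1):  # the last bar has no next bar to trade on
        if fast_ma[t - 1] is None or slow_ma[t - 1] is None:
            continue
        was_below = fast_ma[t - 1] <= slow_ma[t - 1]
        is_above = fast_ma[t] > slow_ma[t]
        if was_below and is_above:
            entries.append(t + 1)
    return entries
```

With closes of `[10, 9, 8, 9, 11, 12]`, the fast average moves above the slow average on bar 4, so the only entry is scheduled for bar 5.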

### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>

Calculating technical indicators from OHLCV data without first verifying that the required columns (open, high, low, close, volume) are present raises ValueError for any ticker with a missing column. This blocks downstream regime classification and pattern detection for all tickers with incomplete data.
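
A pre-flight check along these lines lets the pipeline skip or report incomplete tickers instead of failing mid-calculation. The helper name is hypothetical, and the case-insensitive matching is an assumption of this sketch rather than something the Economic-Dashboard project is known to do.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV column names absent from `columns`."""
    present = {str(name).lower() for name in columns}
    return [name for name in REQUIRED_OHLCV if name not in present]
```

With a pandas DataFrame the call would be `missing_ohlcv_columns(df.columns)`; an empty list means the frame is safe to pass on to indicator code.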

### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>

When storing derived features in DuckDB with a different primary key schema than technical_features table, inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.

## finance-bp-105--open-climate-investing (5)

### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>

When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.

### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>

When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2, meaning 5.2%) by 100 before regression leaves the estimated coefficients off by a factor of 100. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.

### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>

When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.
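
A guard placed after the join, and before any regression call, makes this failure explicit. The sketch below assumes each series is a plain `{date: value}` mapping; both helper names are hypothetical, and the threshold of 20 is the figure given above.

```python
MIN_OBSERVATIONS = 20  # threshold named in the anti-pattern above


def align_on_dates(returns_by_date, factors_by_date):
    """Inner-join two {date: value} mappings on their shared dates."""
    shared = sorted(returns_by_date.keys() & factors_by_date.keys())
    return [(d, returns_by_date[d], factors_by_date[d]) for d in shared]


def require_enough_observations(rows, minimum=MIN_OBSERVATIONS):
    """Raise ValueError if too few rows survive filtering; otherwise return them."""
    if len(rows) < minimum:
        raise ValueError(
            f"only {len(rows)} observations after filtering; need at least {minimum}"
        )
    return rows
```

Raising early, rather than returning NaN statistics, keeps an under-sampled ticker from being mistaken for one with no significant exposure.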

### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>

When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.

### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>

When parallel regression execution uses the multiprocessing fork start method while psycopg2 database connections are open, child processes inherit those connections in a corrupted state. This surfaces as 'connection already closed' errors in the children, resulting in failed database writes and incomplete factor regression calculations.
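
One common way to avoid inheriting a parent's connection is to use the spawn start method and open a fresh connection inside each worker. The sketch below is an assumption about how that could look, not the project's code: `open_connection` is a hypothetical stand-in for a real driver call such as `psycopg2.connect(...)`, and `run_regression` only reports whether a connection exists.

```python
import multiprocessing as mp

_conn = None  # one connection per worker process, created after the process starts


def open_connection():
    """Hypothetical stand-in for a real driver call such as psycopg2.connect(...)."""
    return {"pid": mp.current_process().pid}


def init_worker():
    """Pool initializer: runs once inside each worker, never in the parent."""
    global _conn
    _conn = open_connection()


def run_regression(ticker):
    """Placeholder task: a real version would query and write through _conn."""
    return ticker, _conn is not None


def make_pool(processes=2):
    """Create a pool whose workers start clean (spawn) and connect on their own."""
    ctx = mp.get_context("spawn")
    return ctx.Pool(processes=processes, initializer=init_worker)
```

A script would call `make_pool()` under an `if __name__ == "__main__":` guard, which the spawn start method requires, and then map `run_regression` over the tickers.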

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-105--open-climate-investing
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 25, 'total_functions': 0, 'total_stages': 7}

## Modules (7)

- [stock_data_collection](components/stock_data_collection.md): 2 classes
- [database_setup_&_data_import](components/database_setup_-_data_import.md): 4 classes
- [bmg_factor_computation](components/bmg_factor_computation.md): 4 classes
- [factor_regression_analysis](components/factor_regression_analysis.md): 4 classes
- [bulk_regression_execution](components/bulk_regression_execution.md): 5 classes
- [factor_correlation_&_orthogonalization](components/factor_correlation_-_orthogonalization.md): 3 classes
- [regression_results_analysis](components/regression_results_analysis.md): 3 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 124
  fatal_constraints_count: 39
  non_fatal_constraints_count: 139
  use_cases_count: 9
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run `python3 -m pip install zvt`, then run `python3 -m zvt.init_dirs` to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run the recorder first: `python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000` (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run `python3 -m zvt.init_dirs`
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions with `chmod u+w ~/.zvt`, or set the `ZVT_HOME` environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **9**

## `KUC-101`
**Source**: `scripts/bmg_analyze.py`

Identifies how many stocks from an index fall into each sector and screens for stocks with statistically significant factor regression results based on p-values.

## `KUC-102`
**Source**: `scripts/correlate.py`

Computes correlations between different factors over time to understand factor relationship dynamics and potential multicollinearity issues.

## `KUC-103`
**Source**: `scripts/regression_function.py`

Performs ordinary least squares regression on factor data with comprehensive diagnostic tests including Durbin-Watson, Jarque-Bera, and Breusch-Pagan tests.

## `KUC-104`
**Source**: `scripts/bulk_script.py`

Builds custom Fama-French style factor models by merging stock returns, Fama-French factors, and carbon risk factors into unified datasets for analysis.

## `KUC-105`
**Source**: `scripts/stock_price_function.py`

Downloads historical stock price data from Yahoo Finance with support for daily and monthly frequencies, including automatic retry on timeout.

## `KUC-106`
**Source**: `scripts/bmg_series.py`

Creates the Brown-Minus-Green (BMG) factor series by calculating the return differential between brown (high-carbon) and green (low-carbon) stocks for carbon risk analysis.
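
The return differential can be sketched as follows. Equal weighting within each group and the dictionary layout are assumptions made for illustration; `scripts/bmg_series.py` may weight or align the series differently.

```python
from statistics import mean
from typing import Dict, List


def bmg_series(
    brown: Dict[str, List[float]], green: Dict[str, List[float]]
) -> Dict[str, float]:
    """Per-period mean brown return minus mean green return.

    Each input maps a period label to the returns of the stocks in that group.
    Only periods present in both groups are kept.
    """
    periods = sorted(brown.keys() & green.keys())
    return {p: mean(brown[p]) - mean(green[p]) for p in periods}


brown_returns = {"2020-01": [0.02, 0.04], "2020-02": [-0.01, 0.01]}
green_returns = {"2020-01": [0.01], "2020-02": [0.02], "2020-03": [0.00]}
bmg = bmg_series(brown_returns, green_returns)
```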

## `KUC-107`
**Source**: `scripts/get_regressions.py`

Executes factor regression analysis across multiple stocks in parallel using multiprocessing, loading Fama-French and carbon risk factors from database.

## `KUC-108`
**Source**: `scripts/get_stocks.py`

Imports stock return data from CSV or downloads it from yfinance, with support for incremental updates that keep the stock-returns database current.

## `KUC-109`
**Source**: `scripts/setup_db.py`

Initializes database schema and imports Fama-French, bond, and carbon risk factors into PostgreSQL tables, including BMG factor data.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.
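
A sketch of a split that keeps the ordering strict by cutting on timestamps rather than on row counts, so tied timestamps can never straddle a boundary. The function name and the `(timestamp, payload)` row layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, str]  # (timestamp, payload)


def split_by_time(
    rows: Sequence[Row], val_start: int, test_start: int
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Train is strictly before val_start, validation strictly before test_start."""
    if not val_start < test_start:
        raise ValueError("val_start must be earlier than test_start")
    ordered = sorted(rows, key=lambda r: r[0])  # never shuffle temporal data
    train = [r for r in ordered if r[0] < val_start]
    val = [r for r in ordered if val_start <= r[0] < test_start]
    test = [r for r in ordered if r[0] >= test_start]
    return train, val, test


events = [(t, f"e{t}") for t in (5, 1, 9, 3, 7, 2)]
train, val, test = split_by_time(events, val_start=4, test_start=8)
```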

## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120; factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.
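
For the factor-regression half of this pattern, the scaling and the excess-return step fit in a few lines. The sketch assumes stock returns are already decimals while the risk-free rate arrives in percent, as it does in typical Fama-French CSV files; the function name is hypothetical.

```python
from typing import List, Sequence


def excess_returns(
    stock_returns: Sequence[float], rf_percent: Sequence[float]
) -> List[float]:
    """Return - Rf, with Rf converted from percent to decimal first."""
    if len(stock_returns) != len(rf_percent):
        raise ValueError("return and risk-free series must be the same length")
    return [r - rf / 100.0 for r, rf in zip(stock_returns, rf_percent)]


excess = excess_returns([0.02, -0.01], [0.5, 0.4])
```

Skipping the division would subtract 0.5 instead of 0.005 from the first return, which is the unit mismatch described in `AP-MACRO-DATA-006`.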

## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.
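
A minimal fail-loudly check at the ingestion point, assuming lower-case column names and CSV text already in memory (both are assumptions of this sketch, as is the function name `read_ohlcv`):

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ("open", "high", "low", "close", "volume")


def read_ohlcv(csv_text: str) -> List[Dict[str, str]]:
    """Parse OHLCV rows, raising before any row is used if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing OHLCV columns: {missing}")
    return list(reader)


good_csv = "date,open,high,low,close,volume\n2020-01-02,1,2,0.5,1.5,100\n"
rows = read_ohlcv(good_csv)
```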

## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.
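
The effect of a composite key is easy to see in a throwaway database. This sketch swaps in the standard-library `sqlite3` module for the PostgreSQL and DuckDB stores used by the source projects, and the table and column names are hypothetical.

```python
import sqlite3


def build_returns_table() -> sqlite3.Connection:
    """Create an in-memory table keyed on (ticker, date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE stock_returns (
            ticker TEXT NOT NULL,
            date   TEXT NOT NULL,
            ret    REAL NOT NULL,
            PRIMARY KEY (ticker, date)
        )
        """
    )
    return conn


def insert_return(conn: sqlite3.Connection, ticker: str, date: str, ret: float) -> bool:
    """Insert one observation; return False if (ticker, date) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO stock_returns VALUES (?, ?, ?)", (ticker, date, ret))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_returns_table()
first = insert_return(conn, "AAPL", "2020-01-31", 0.01)
other = insert_return(conn, "MSFT", "2020-01-31", 0.02)
duplicate = insert_return(conn, "AAPL", "2020-01-31", 0.05)
```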

## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data

When accessing external financial APIs (SEC EDGAR, data vendors), implement strict rate limiting before deployment. SEC EDGAR caps traffic at 10 requests per second and blocks IPs that exceed it. Use rate-limiting decorators and a proper User-Agent header. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.
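
A rate-limiting decorator can be as small as the sketch below. It is single-threaded, spaces calls evenly rather than allowing bursts, and takes injectable `clock` and `sleep` callables purely so it can be exercised without waiting; all names are hypothetical.

```python
import functools
import time
from typing import Any, Callable, Optional


def rate_limited(
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable:
    """Space successive calls at least 1 / max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorator(func: Callable) -> Callable:
        last_call: Optional[float] = None

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            nonlocal last_call
            if last_call is not None:
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited(max_per_second=10)
def fetch_filing(url: str) -> str:
    # Placeholder body: a real client would issue the HTTP request here,
    # sending a descriptive User-Agent header with contact details.
    return url
```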

## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data

When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.

## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data

Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.
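
A guard of this kind belongs between the join and the regression. In the sketch below the threshold of 20 comes from the text above, while the dictionary layout and function names are assumptions.

```python
from typing import Dict, List, Tuple

MIN_OBSERVATIONS = 20


def inner_join_by_date(
    returns: Dict[str, float], factors: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Keep only dates present in both series, in chronological order."""
    common = sorted(returns.keys() & factors.keys())
    return [(d, returns[d], factors[d]) for d in common]


def require_min_observations(rows: List, min_obs: int = MIN_OBSERVATIONS) -> List:
    """Raise before regression if too few observations survived filtering."""
    if len(rows) < min_obs:
        raise ValueError(f"only {len(rows)} observations, need at least {min_obs}")
    return rows


returns = {f"2020-{m:02d}": 0.01 for m in range(1, 13)}
factors = {f"2020-{m:02d}": 0.5 for m in range(4, 13)}
joined = inner_join_by_date(returns, factors)
```

Here twelve monthly returns shrink to nine joined rows, so `require_min_observations(joined)` raises instead of letting a nine-point regression through.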

## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data

Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.

FILE:references/components/bmg_factor_computation.md
# bmg_factor_computation (4 classes)

## `add_bmg_series`
`bmg_factor_computation/add-bmg-series.py:0`

## `get_bmg_series`
`bmg_factor_computation/get-bmg-series.py:0`

## `load_stocks_returns_from_db`
`bmg_factor_computation/load-stocks-returns-from-db.py:0`

## `factor_definition`
`bmg_factor_computation/factor-definition.py:0`

FILE:references/components/bulk_regression_execution.md
# bulk_regression_execution (5 classes)

## `run_regression`
`bulk_regression_execution/run-regression.py:0`

## `run_regression_internal`
`bulk_regression_execution/run-regression-internal.py:0`

## `store_regression_into_db`
`bulk_regression_execution/store-regression-into-db.py:0`

## `window_type`
`bulk_regression_execution/window-type.py:0`

## `parallelization`
`bulk_regression_execution/parallelization.py:0`

FILE:references/components/database_setup_-_data_import.md
# database_setup_&_data_import (4 classes)

## `load_data_files`
`database_setup_&_data_import/load-data-files.py:0`

## `import_monthly_ff_data_factors_into_sql`
`database_setup_&_data_import/import-monthly-ff-data-factors-into-sql.py:0`

## `import_msci_constituents_into_sql`
`database_setup_&_data_import/import-msci-constituents-into-sql.py:0`

## `database`
`database_setup_&_data_import/database.py:0`

FILE:references/components/factor_correlation_-_orthogonalization.md
# factor_correlation_&_orthogonalization (3 classes)

## `process_factor`
`factor_correlation_&_orthogonalization/process-factor.py:0`

## `execute_batch`
`factor_correlation_&_orthogonalization/execute-batch.py:0`

## `orthogonalization_method`
`factor_correlation_&_orthogonalization/orthogonalization-method.py:0`

FILE:references/components/factor_regression_analysis.md
# factor_regression_analysis (4 classes)

## `run_regression`
`factor_regression_analysis/run-regression.py:0`

## `regression_input_output`
`factor_regression_analysis/regression-input-output.py:0`

## `merge_data`
`factor_regression_analysis/merge-data.py:0`

## `regression_model`
`factor_regression_analysis/regression-model.py:0`

FILE:references/components/regression_results_analysis.md
# regression_results_analysis (3 classes)

## `get_stocks_with_significant_regressions`
`regression_results_analysis/get-stocks-with-significant-regressions.py:0`

## `get_sectors_with_significant_final_regression`
`regression_results_analysis/get-sectors-with-significant-final-regre.py:0`

## `significance_criteria`
`regression_results_analysis/significance-criteria.py:0`

FILE:references/components/stock_data_collection.md
# stock_data_collection (2 classes)

## `stock_grabber`
`stock_data_collection/stock-grabber.py:0`

## `data_source`
`stock_data_collection/data-source.py:0`

FILE:references/seed.yaml
meta:
  id: finance-bp-105-v5.3
  version: v6.1
  blueprint_id: finance-bp-105
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:49.775031+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - macro-data
  upgraded_from: finance-bp-105-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:28.159836+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-105--open-climate-investing/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-105--open-climate-investing/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-MACRO-DATA-001
  title: SEC EDGAR Rate Limit Violation
  description: When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10
    requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial
    filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits
    and missing User-Agent headers compound this by causing silent request failures.
  project_source: finance-bp-074--FinRobot
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-002
  title: Temporal Knowledge Graph Look-Ahead Bias
  description: When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes
    the model to see future events during training. Violating the requirement that train_edges precede val_edges, which in
    turn precede test_edges, inflates metrics so that they no longer reflect real-world performance. This produces overfit
    models that fail catastrophically when deployed for actual temporal prediction tasks.
  project_source: finance-bp-080--FinDKG
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-003
  title: Technical Indicator Look-Ahead Bias via Missing Shift
  description: When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar
    state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire
    at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this
    with 'we need the current bar signal immediately' leads to future information leaking into current signals.
  project_source: finance-bp-083--Economic-Dashboard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-004
  title: EIOPA Non-Compliant Curve Extrapolation
  description: When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant
    formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use
    max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability
    calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
  project_source: finance-bp-077--Open_Source_Economic_Model
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-005
  title: Factor Regression Using Raw Returns Instead of Excess Returns
  description: When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting
    the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns
    (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure.
    This leads to fundamentally flawed risk attribution and portfolio construction decisions.
  project_source: finance-bp-105--open-climate-investing
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-006
  title: Percentage vs Decimal Unit Mismatch in Factor Data
  description: When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2)
    by 100 before regression leaves coefficients mis-scaled by a factor of 100. This produces statistically invalid
    inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta
    calculations downstream.
  project_source: finance-bp-105--open-climate-investing
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-007
  title: Insufficient Regression Observations for Statistical Validity
  description: When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join,
    winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations
    produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise.
    This commonly occurs when combining multiple data sources with missing values.
  project_source: finance-bp-105--open-climate-investing
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-008
  title: DGL Graph Attribute Propagation Failure in Temporal Batching
  description: When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations,
    num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing
    attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs,
    causing training to fail with AttributeError.
  project_source: finance-bp-080--FinDKG
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-009
  title: CSV BOM Encoding Corruption in Data Import
  description: When CSV portfolio files that start with a UTF-8 byte order mark (BOM) are imported without the 'utf-8-sig'
    encoding, the BOM silently corrupts the first column name read by pandas. Lookups on that field then raise KeyError,
    preventing portfolio data from loading entirely.
  project_source: finance-bp-077--Open_Source_Economic_Model
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-010
  title: OHLCV Data Quality Validation Failure
  description: When calculating technical indicators from OHLCV data without verifying required columns (open, high, low,
    close, volume), missing required OHLCV columns causes ValueError and prevents technical indicator calculation for affected
    tickers. This blocks downstream regime classification and pattern detection for all tickers with incomplete data.
  project_source: finance-bp-083--Economic-Dashboard
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-011
  title: Inconsistent Primary Key Schema Causing JOIN Failures
  description: When storing derived features in DuckDB with a different primary key schema than technical_features table,
    inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection
    pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying
    and data integrity.
  project_source: finance-bp-083--Economic-Dashboard
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-012
  title: Frequency Column Enforcement Missing in Time Series Schema
  description: When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY'
    or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies
    produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data
    corruption.
  project_source: finance-bp-105--open-climate-investing
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-013
  title: PostgreSQL Fork in Multiprocessing Context
  description: When implementing multiprocessing for parallel regression execution using fork start method with psycopg2 database
    connections, child processes inherit the parent's open connection. This causes 'connection already closed' errors or
    corrupted connection state in child processes, resulting in failed database writes and incomplete factor regression
    calculations.
  project_source: finance-bp-105--open-climate-investing
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-014
  title: Temporal DataLoader Shuffling Breaking Graph Ordering
  description: When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering
    required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking
    the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
  project_source: finance-bp-080--FinDKG
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - macro-data
  _source_file: anti-patterns/macro-data.yaml
cross_project_wisdom:
- wisdom_id: CW-MACRO-DATA-001
  source_project: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
  pattern_name: Temporal Ordering Enforcement
  description: Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test
    splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test
    edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline
    to prevent look-ahead bias that inflates evaluation metrics.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-002
  source_project: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing
  pattern_name: Regulatory Formula Compliance
  description: When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French),
    use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph
    120; factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will
    be used for regulatory reporting or investment decision-making.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-003
  source_project: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model
  pattern_name: Strict Data Schema Enforcement
  description: Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns,
    CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed
    schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch
    errors early before downstream calculations use bad data.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-004
  source_project: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
  pattern_name: Composite Primary Key Uniqueness
  description: Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable
    efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply
    this pattern when designing any financial database schema involving time-series measurements with multiple entities.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-005
  source_project: finance-bp-074--FinRobot
  pattern_name: External API Rate Limiting
  description: When accessing external financial APIs (SEC EDGAR, data vendors), implement strict rate limiting before
    deployment. SEC EDGAR caps traffic at 10 requests per second and blocks IPs that exceed it. Use rate-limiting decorators
    and a proper User-Agent header. Apply this pattern when integrating any external financial data API to prevent service
    disruption that blocks critical data access.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-006
  source_project: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing
  pattern_name: Graph Attribute Propagation in Batching
  description: When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes,
    num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these
    attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks
    to prevent training failures.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-007
  source_project: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard
  pattern_name: Statistical Validity Thresholds
  description: Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable
    inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient
    data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful
    rather than spurious.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-008
  source_project: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model
  pattern_name: Data Type Strictness for ML Operations
  description: Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for
    node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time
    interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline
    to catch dtype issues early.
  applicable_to_activity: macro-data
  _source_file: cross-project-wisdom/macro-data.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: scripts/bmg_analyze.py
  business_problem: Identifies how many stocks from an index fall into each sector and screens for stocks with statistically
    significant factor regression results based on p-values.
  intent_keywords:
  - sector composition
  - significant regression
  - p-value screening
  - stock sectors
  - factor analysis
  stage: factor_computation
  data_domain: holding_data
  type: screening
- kuc_id: KUC-102
  source_file: scripts/correlate.py
  business_problem: Computes correlations between different factors over time to understand factor relationship dynamics and
    potential multicollinearity issues.
  intent_keywords:
  - factor correlation
  - correlation matrix
  - factor relationships
  - multicollinearity
  - time series correlation
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-103
  source_file: scripts/regression_function.py
  business_problem: Performs ordinary least squares regression on factor data with comprehensive diagnostic tests including
    Durbin-Watson, Jarque-Bera, and Breusch-Pagan tests.
  intent_keywords:
  - OLS regression
  - diagnostic tests
  - statistical tests
  - regression analysis
  - residual analysis
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-104
  source_file: scripts/bulk_script.py
  business_problem: Builds custom Fama-French style factor models by merging stock returns, Fama-French factors, and carbon
    risk factors into unified datasets for analysis.
  intent_keywords:
  - Fama-French model
  - factor model
  - carbon risk
  - factor construction
  - data merging
  stage: data_collection
  data_domain: mixed
  type: data_pipeline
- kuc_id: KUC-105
  source_file: scripts/stock_price_function.py
  business_problem: Downloads historical stock price data from Yahoo Finance with support for daily and monthly frequencies,
    including automatic retry on timeout.
  intent_keywords:
  - stock prices
  - price download
  - yfinance
  - historical data
  - market data
  stage: data_collection
  data_domain: market_data
  type: data_pipeline
- kuc_id: KUC-106
  source_file: scripts/bmg_series.py
  business_problem: Creates the Brown-Minus-Green (BMG) factor series by calculating the return differential between brown
    (high-carbon) and green (low-carbon) stocks for carbon risk analysis.
  intent_keywords:
  - BMG factor
  - carbon risk
  - brown green stocks
  - factor creation
  - environmental factor
  stage: factor_computation
  data_domain: financial_data
  type: builtin_factor
- kuc_id: KUC-107
  source_file: scripts/get_regressions.py
  business_problem: Executes factor regression analysis across multiple stocks in parallel using multiprocessing, loading
    Fama-French and carbon risk factors from database.
  intent_keywords:
  - regression
  - multiprocessing
  - parallel analysis
  - factor regression
  - batch analysis
  stage: factor_computation
  data_domain: financial_data
  type: research_analysis
- kuc_id: KUC-108
  source_file: scripts/get_stocks.py
  business_problem: Imports stock return data from CSV or downloads it from yfinance, with support for incremental updates
    that keep the stock-returns database current.
  intent_keywords:
  - import stocks
  - stock data
  - data import
  - stock returns
  - database update
  stage: data_collection
  data_domain: trading_data
  type: data_pipeline
- kuc_id: KUC-109
  source_file: scripts/setup_db.py
  business_problem: Initializes database schema and imports Fama-French, bond, and carbon risk factors into PostgreSQL tables,
    including BMG factor data.
  intent_keywords:
  - database setup
  - schema initialization
  - factor import
  - carbon data
  - bond factors
  stage: data_collection
  data_domain: financial_data
  type: data_pipeline
component_capability_map:
  project: finance-bp-105--open-climate-investing
  scan_date: '2026-04-22'
  stats:
    total_files: 7
    total_classes: 25
    total_functions: 0
    total_stages: 7
  modules:
    stock_data_collection:
      class_count: 2
      stage_id: data_collection
      stage_order: 1
      responsibility: Fetches stock price data from Yahoo Finance API with monthly/daily frequency support. Provides retry
        logic and incomplete data filtering. Acts as the primary data ingestion point for all downstream analysis.
      classes:
      - name: stock_grabber
        file: stock_data_collection/stock-grabber.py
        line: 0
        kind: required_method
        signature: ''
      - name: data_source
        file: stock_data_collection/data-source.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    database_setup_&_data_import:
      class_count: 4
      stage_id: data_import
      stage_order: 2
      responsibility: Initializes PostgreSQL schema, imports Fama-French factors, carbon risk factors, stock constituents,
        and other reference data. Provides batch CSV ingestion with idempotent upsert behavior.
      classes:
      - name: load_data_files
        file: database_setup_&_data_import/load-data-files.py
        line: 0
        kind: required_method
        signature: ''
      - name: import_monthly_ff_data_factors_into_sql
        file: database_setup_&_data_import/import-monthly-ff-data-factors-into-sql.py
        line: 0
        kind: required_method
        signature: ''
      - name: import_msci_constituents_into_sql
        file: database_setup_&_data_import/import-msci-constituents-into-sql.py
        line: 0
        kind: required_method
        signature: ''
      - name: database
        file: database_setup_&_data_import/database.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    bmg_factor_computation:
      class_count: 4
      stage_id: bmg_factor_computation
      stage_order: 3
      responsibility: Computes Brown-Minus-Green (BMG) carbon risk factor from index constituent returns. BMG = Brown Returns
        - Green Returns. Positive BMG means brown stocks outperform green, indicating carbon risk premium.
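      # Worked example (illustrative figures, not taken from the source data): if the brown
      # portfolio returns 0.03 in a month and the green portfolio returns 0.01, then
      # BMG = 0.03 - 0.01 = 0.02, so brown stocks outperformed green by 2 percentage points.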
      classes:
      - name: add_bmg_series
        file: bmg_factor_computation/add-bmg-series.py
        line: 0
        kind: required_method
        signature: ''
      - name: get_bmg_series
        file: bmg_factor_computation/get-bmg-series.py
        line: 0
        kind: required_method
        signature: ''
      - name: load_stocks_returns_from_db
        file: bmg_factor_computation/load-stocks-returns-from-db.py
        line: 0
        kind: required_method
        signature: ''
      - name: factor_definition
        file: bmg_factor_computation/factor-definition.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    factor_regression_analysis:
      class_count: 4
      stage_id: factor_regression
      stage_order: 4
      responsibility: Runs OLS regressions of stock returns on carbon risk factor and Fama-French factors. Computes coefficients,
        t-stats, p-values, and diagnostic statistics with proper data alignment and validation.
      classes:
      - name: run_regression
        file: factor_regression_analysis/run-regression.py
        line: 0
        kind: required_method
        signature: ''
      - name: regression_input_output
        file: factor_regression_analysis/regression-input-output.py
        line: 0
        kind: required_method
        signature: ''
      - name: merge_data
        file: factor_regression_analysis/merge-data.py
        line: 0
        kind: required_method
        signature: ''
      - name: regression_model
        file: factor_regression_analysis/regression-model.py
        line: 0
        kind: replaceable_point
      design_decision_count: 6
    bulk_regression_execution:
      class_count: 5
      stage_id: bulk_regression
      stage_order: 5
      responsibility: Runs rolling-window regressions across multiple stocks with multiprocessing support. Stores results
        incrementally in PostgreSQL using UPSERT pattern. Enables analysis of large stock universes.
      classes:
      - name: run_regression
        file: bulk_regression_execution/run-regression.py
        line: 0
        kind: required_method
        signature: ''
      - name: run_regression_internal
        file: bulk_regression_execution/run-regression-internal.py
        line: 0
        kind: required_method
        signature: ''
      - name: store_regression_into_db
        file: bulk_regression_execution/store-regression-into-db.py
        line: 0
        kind: required_method
        signature: ''
      - name: window_type
        file: bulk_regression_execution/window-type.py
        line: 0
        kind: replaceable_point
      - name: parallelization
        file: bulk_regression_execution/parallelization.py
        line: 0
        kind: replaceable_point
      design_decision_count: 6
    factor_correlation_&_orthogonalization:
      class_count: 3
      stage_id: factor_orthogonalization
      stage_order: 6
      responsibility: Analyzes correlation between BMG factor and other factors (Fama-French), then orthogonalizes BMG by
        regressing on significantly correlated factors and storing residuals. Removes factor contamination.
      classes:
      - name: process_factor
        file: factor_correlation_&_orthogonalization/process-factor.py
        line: 0
        kind: required_method
        signature: ''
      - name: execute_batch
        file: factor_correlation_&_orthogonalization/execute-batch.py
        line: 0
        kind: required_method
        signature: ''
      - name: orthogonalization_method
        file: factor_correlation_&_orthogonalization/orthogonalization-method.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    regression_results_analysis:
      class_count: 3
      stage_id: results_analysis
      stage_order: 7
      responsibility: Queries stored regression results to identify stocks and sectors with significant carbon risk exposure.
        Aggregates statistics by sector and significance level for actionable insights.
      classes:
      - name: get_stocks_with_significant_regressions
        file: regression_results_analysis/get-stocks-with-significant-regressions.py
        line: 0
        kind: required_method
        signature: ''
      - name: get_sectors_with_significant_final_regression
        file: regression_results_analysis/get-sectors-with-significant-final-regre.py
        line: 0
        kind: required_method
        signature: ''
      - name: significance_criteria
        file: regression_results_analysis/significance-criteria.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.03260869565217391
    evidence_invalid: 89
    evidence_verified: 3
    evidence_auto_fixed: 0
    audit_coverage: 39/39 (100%)
    audit_pass_rate: 0/39 (0%)
    audit_fail_total: 20
    audit_finance_universal:
      pass: 0
      warn: 9
      fail: 11
    audit_subdomain_totals:
      pass: 0
      warn: 10
      fail: 9
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-105. Evidence verify ratio
    = 3.3% and audit fail total = 20. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-105-v5.0.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt  then re-run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000  (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc:
  - UC-101
  - UC-102
  - UC-103
  - UC-106
  - UC-107
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt  or set ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Sector Stock Count and Significant Factor Regression Analyzer
    positive_terms:
    - sector composition
    - significant regression
    - p-value screening
    - stock sectors
    - factor analysis
    data_domain: holding_data
    negative_terms:
    - stock price download
    - correlation calculation
    - factor series creation
    - database setup
    - bulk factor model
    ambiguity_question: Are you looking to screen stocks by sector distribution from an index, or to find stocks with statistically
      significant factor relationships?
  - uc_id: UC-102
    name: Factor Correlation Calculator
    positive_terms:
    - factor correlation
    - correlation matrix
    - factor relationships
    - multicollinearity
    - time series correlation
    data_domain: financial_data
    negative_terms:
    - stock screening
    - price download
    - BMG factor
    - regression execution
    - database setup
    ambiguity_question: Do you want to see how different factors correlate with each other, or are you looking to run factor
      regressions on individual stocks?
  - uc_id: UC-103
    name: OLS Regression with Diagnostic Statistics
    positive_terms:
    - OLS regression
    - diagnostic tests
    - statistical tests
    - regression analysis
    - residual analysis
    data_domain: financial_data
    negative_terms:
    - correlation
    - stock screening
    - data download
    - factor creation
    - database utilities
    ambiguity_question: Are you trying to run regression analysis with statistical diagnostics, or do you need something else
      like factor correlations or data loading?
  - uc_id: UC-104
    name: Fama-French Factor Model Generator
    positive_terms:
    - Fama-French model
    - factor model
    - carbon risk
    - factor construction
    - data merging
    data_domain: mixed
    negative_terms:
    - stock screening
    - correlation
    - price download
    - database setup
    - BMG factor
    ambiguity_question: Are you building a custom factor model combining multiple data sources, or are you analyzing existing
      factors for correlations or regressions?
  - uc_id: UC-105
    name: Stock Price Data Downloader
    positive_terms:
    - stock prices
    - price download
    - yfinance
    - historical data
    - market data
    data_domain: market_data
    negative_terms:
    - regression
    - correlation
    - screening
    - BMG factor
    - database setup
    ambiguity_question: Are you trying to download raw stock price data, or are you looking to perform analysis like regressions,
      correlations, or screening?
  - uc_id: UC-106
    name: BMG Factor Series Creator
    positive_terms:
    - BMG factor
    - carbon risk
    - brown green stocks
    - factor creation
    - environmental factor
    data_domain: financial_data
    negative_terms:
    - regression
    - correlation
    - stock prices
    - screening
    - database setup
    ambiguity_question: Are you creating a new BMG (brown-minus-green) carbon risk factor series, or are you using an existing
      factor for analysis like regressions or correlations?
  - uc_id: UC-107
    name: Multi-Stock Factor Regression Runner
    positive_terms:
    - regression
    - multiprocessing
    - parallel analysis
    - factor regression
    - batch analysis
    data_domain: financial_data
    negative_terms:
    - correlation
    - screening
    - price download
    - BMG creation
    - database setup
    ambiguity_question: Are you running factor regressions on multiple stocks at scale, or are you looking for single-stock
      analysis, factor correlations, or data loading?
  - uc_id: UC-108
    name: Stock Data Import and Update
    positive_terms:
    - import stocks
    - stock data
    - data import
    - stock returns
    - database update
    data_domain: trading_data
    negative_terms:
    - regression
    - correlation
    - screening
    - BMG factor
    - factor model
    ambiguity_question: Are you loading or updating stock return data in the database, or are you performing analysis like
      regressions, screening, or factor computations?
  - uc_id: UC-109
    name: Database Schema Initialization and Data Import
    positive_terms:
    - database setup
    - schema initialization
    - factor import
    - carbon data
    - bond factors
    data_domain: financial_data
    negative_terms:
    - regression
    - correlation
    - screening
    - stock prices
    - factor analysis
    ambiguity_question: Are you setting up the database schema and loading factor data, or are you performing analysis like
      regressions, correlations, or screening?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to earlier state MUST be announced to user.
    The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
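    # Worked example (assumes a daily level whose to_second() returns 86400; that value is an
    # assumption, not stated in this file): a signal with happen_timestamp 2024-01-02 00:00:00
    # gets due_timestamp = 2024-01-03 00:00:00, so the order can only fill on the next bar.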
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 124
    fatal_constraints_count: 39
    non_fatal_constraints_count: 139
    use_cases_count: 9
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes
        else initialization fails with assertion error; finance-C-001 fatal violation.'
    business_decisions:
    - id: BD-001
      type: B/BA
      summary: Monthly interval uses 1mo yfinance interval (NOT 1d)
    - id: BD-002
      type: BA
      summary: Drops last entry assuming incomplete candle
    - id: BD-003
      type: BA/DK
      summary: Monthly dates aligned to MonthEnd via pandas offset
    - id: BD-004
      type: B
      summary: 3 retry attempts on JSON decode errors
    - id: BD-031
      type: T
      summary: Default data frequency is MONTHLY across all operations
    - id: BD-035
      type: T
      summary: Minimum 20 data points required for valid regression
    - id: BD-036
      type: T
      summary: Returns capped at ±50% (0.5) during regression to remove outliers
    - id: BD-037
      type: T
      summary: Abnormal returns > 100% (>1) are filtered out from stock data
    - id: BD-038
      type: T
      summary: Inner join used when merging stock returns with factor data
    - id: BD-039
      type: B/DK
      summary: Use Yahoo Finance (yfinance) as source for stock price data
    - id: BD-040
      type: B
      summary: Store BMG series in database for reuse across analyses
    - id: BD-047
      type: T
      summary: Drop last (incomplete) data point when fetching stock history
    - id: BD-048
      type: T
      summary: For composites with no price data, compute and store only returns
    - id: BD-054
      type: T
      summary: Skip components with missing percentage when computing composite returns
    - id: BD-058
      type: T
      summary: Fama-French factors converted from percentage to decimal (divided by 100)
    - id: BD-061
      type: B/RC
      summary: 'Store stock details: EBITDA, enterprise value, P/E, cash, debt, shares outstanding'
    - id: BD-GAP-001
      type: RC
      summary: 'Missing: Timezone explicit annotation + UTC'
    - id: BD-GAP-002
      type: DK
      summary: 'Missing: Point-in-Time data availability'
    - id: BD-GAP-003
      type: DK
      summary: 'Missing: Stale data detection and expiry'
    - id: BD-GAP-004
      type: B
      summary: 'Missing: PnL conservation (realized + unrealized)'
    - id: BD-GAP-005
      type: B
      summary: 'Missing: Train/test time split integrity'
    - id: BD-GAP-006
      type: DK
      summary: 'Missing: Random seed full coverage'
    - id: BD-GAP-007
      type: RC
      summary: 'Missing: Settlement and delivery time convention'
    - id: BD-GAP-008
      type: DK
      summary: 'Missing: Rebalancing Trigger Mechanism'
    - id: BD-GAP-009
      type: M
      summary: 'Missing: Transition Matrix Time Homogeneity & Conditioning'
    - id: BD-GAP-010
      type: DK
      summary: 'Missing: Versioned Writes & Snapshot Semantics'
    - id: BD-GAP-011
      type: DK
      summary: 'Missing: Implement UTC timezone normalization for all datetime fields. Add tzinfo awareness to the stock_data,
        carbon_risk_factor, and ff_factor tables.'
    - id: BD-GAP-012
      type: DK
      summary: 'Missing: Add random.seed(42) or equivalent to all stochastic operations. Document reproducibility requirements
        in regression_function.py and correlate.py.'
    - id: BD-GAP-013
      type: RC
      summary: 'Missing: Replace ON CONFLICT DO UPDATE with versioned INSERT (add valid_from/valid_to timestamps). Implement
        an append-only audit table for carbon_risk_factor changes.'
    - id: BD-GAP-014
      type: B
      summary: 'Missing: Add a data_version column to all factor tables. Track run_id and experiment_id in the regression results
        table.'
    - id: BD-GAP-015
      type: DK
      summary: 'Missing: Stale data detection and expiry'
    - id: BD-GAP-016
      type: B
      summary: 'Missing: PnL conservation (realized + unrealized)'
    - id: BD-GAP-017
      type: M/DK
      summary: 'Missing: Day count convention'
    - id: BD-GAP-018
      type: M
      summary: 'Missing: Covariance Matrix PSD Fix Strategy'
    - id: BD-GAP-019
      type: B
      summary: 'Missing: Default Definition & IFRS 9 Staging'
    - id: BD-GAP-020
      type: B
      summary: 'Missing: PD/LGD/EAD Estimation (IRB vs Standard)'
    - id: BD-GAP-021
      type: B
      summary: 'Missing: Vasicek Single-Factor Asset Correlation'
    - id: BD-GAP-022
      type: M
      summary: 'Missing: Implement a covariance matrix PSD repair strategy (nearest correlation, eigenvalue clipping, or
        shrinkage estimator).'
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 25 source groups: Composite
        Calculation(1), Date Handling(1), Date Normalization(2), Index Definition(1), Model Diagnostics(1), Model Validity(1),
        and 19 more.'
      key_decisions: 86 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-043
      type: T
      summary: 'Composite stock returns calculated as weighted average: sum(pct*return) / sum(pct)'
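      # Worked example with hypothetical weights and returns (not taken from the source data):
      #   pct = [60, 40], return = [0.02, -0.01]
      #   sum(pct*return) / sum(pct) = (60*0.02 + 40*(-0.01)) / (60 + 40) = 0.8 / 100 = 0.008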
    - id: BD-053
      type: T
      summary: Use minimum available date when user start_date precedes data range
    - id: BD-051
      type: T
      summary: Monthly dates anchored to end-of-month using MonthEnd(1)
    - id: BD-052
      type: T
      summary: WML factors adjusted to month-end by subtracting 1 day from date
    - id: BD-042
      type: B/DK
      summary: 'Index stocks: IVV = S&P 500, XWD.TO = MSCI World'
    - id: BD-046
      type: T
      summary: 'Store three statistical tests: Jarque-Bera, Breusch-Pagan, Durbin-Watson'
    - id: BD-060
      type: T
      summary: Require at least N+10 data points where N is number of factors in regression
    - id: BD-049
      type: T
      summary: '''DEFAULT'' is reserved factor name, cannot be used for custom BMG series'
    - id: BD-050
      type: T
      summary: Bulk regression mode loads all stocks from the DB and joins them in memory
    - id: BD-041
      type: T
      summary: Convert stock prices to excess returns by subtracting risk-free rate (Rf)
    - id: BD-032
      type: T
      summary: Default significance threshold = 0.05 for regression analysis
    - id: BD-010
      type: B
      summary: BMG = Brown Returns - Green Returns
    - id: BD-011
      type: B
      summary: Composite stock returns = weighted average of components
    - id: BD-012
      type: BA
      summary: NaN values removed before storing BMG
    - id: BD-030
      type: B/BA
      summary: BMG (Brown-Minus-Green) factor = Brown stock returns minus Green stock returns, calculated as return_x - return_y
    - id: BD-044
      type: T
      summary: Orthogonalize BMG factor by regressing on FF factors, storing residuals
    - id: BD-019
      type: BA/M
      summary: Default interval = 60 months (5 years) for MONTHLY
    - id: BD-020
      type: BA/DK
      summary: Default interval = 730 days (2 years) for DAILY
    - id: BD-021
      type: B/DK
      summary: Rolling window advances by interval (NOT by 1 month)
    - id: BD-022
      type: B/RC
      summary: multiprocessing.set_start_method('spawn')
    - id: BD-023
      type: B/BA
      summary: Connection pool size = 20
    - id: BD-024
      type: BA
      summary: Data loaded once, passed to workers via itertools.repeat
    - id: BD-005
      type: B
      summary: ON CONFLICT DO NOTHING for data imports
    - id: BD-006
      type: BA
      summary: Uses COPY FROM PROGRAM with grep to filter raw FF files
    - id: BD-007
      type: B
      summary: WML (Winners-Minus-Losers momentum) stored in same ff_factor table
    - id: BD-008
      type: B
      summary: Stores both close price AND computed return in stock_data
    - id: BD-009
      type: B
      summary: Frequency column on every time series table
    - id: BD-057
      type: T
      summary: CSV dates must be in YYYYMMDD or YYYYMM format for import
    - id: BD-059
      type: T
      summary: Delete existing data before importing new factor data
    - id: BD-080
      type: B/BA
      summary: Frequency defaults to MONTHLY; DAILY requires explicit specification
    - id: BD-081
      type: B/BA
      summary: Factor data divided by 100 before regression; Close adjusted by risk-free rate
    - id: BD-088
      type: M/BA
      summary: '''DEFAULT'' is reserved factor_name; cannot be used for user-defined BMG series'
    - id: BD-092
      type: B/BA
      summary: Default interval 0 becomes 60 months (MONTHLY) or 730 days (DAILY)
    - id: BD-025
      type: BA/M
      summary: Significance threshold filter (default 0.1)
    - id: BD-026
      type: B
      summary: Orthogonalized factor stored with -ORTHO suffix
    - id: BD-027
      type: B
      summary: Residuals become the orthogonalized factor
    - id: BD-033
      type: T
      summary: Orthogonalization significance threshold = 0.10
    - id: BD-013
      type: B
      summary: Inner join on dates for all DataFrame merges
    - id: BD-014
      type: B
      summary: Excess returns = Close - Rf (risk-free rate)
    - id: BD-015
      type: BA
      summary: 'Winsorization: keep only |value| < 0.5'
    - id: BD-016
      type: BA/M
      summary: Minimum 20 data points required
    - id: BD-017
      type: B/BA
      summary: FF factors and RF stored as percentage, divided by 100 before use
    - id: BD-018
      type: M/DK
      summary: Custom DateInRangeError for date boundary validation
    - id: BD-062
      type: B/BA
      summary: Pearson correlation for factor correlation matrix
    - id: BD-063
      type: B/BA
      summary: OLS regression for factor orthogonalization
    - id: BD-064
      type: B/BA
      summary: Significance threshold for variable selection at p-value 0.1
    - id: BD-065
      type: B
      summary: Two-stage sequential regression for orthogonalization
    - id: BD-066
      type: B/BA
      summary: OLS residuals as orthogonalized factor values
    - id: BD-094
      type: B/DK
      summary: 'INTERACTION: BD-021 (rolling window by interval) × BD-055 (rolling window by 1 period) → Contradictory window
        advancement logic'
    - id: BD-095
      type: B/BA
      summary: 'INTERACTION: BD-036 (±0.5 winsorization) × BD-091 (≤1.0 abnormal filter) → Inconsistent outlier handling across
        pipeline stages'
    - id: BD-096
      type: BA
      summary: 'INTERACTION: BD-009 (frequency column on tables) × BD-013 (inner join on dates) × BD-038 (inner join for stock-factor
        merge) → Hidden data loss cascade'
    - id: BD-097
      type: B/BA
      summary: 'INTERACTION: BD-021 (rolling window by interval) × BD-019 (60-month default) × BD-020 (730-day default) ×
        BD-080 (MONTHLY default) → Risk cascade of window size misinterpretation'
    - id: BD-098
      type: BA
      summary: 'INTERACTION: BD-039 (Yahoo Finance source) × BD-008 (precomputed returns stored) × BD-047 (drop last incomplete
        entry) → Amplification of data quality risks'
    - id: BD-099
      type: BA/M
      summary: 'INTERACTION: BD-017 (FF factors /100) × BD-041 (excess returns = Close - Rf) × BD-081 (factor /100 and Close
        - Rf) → Overlapping normalization decisions with latent contradiction risk'
    - id: BD-100
      type: BA/DK
      summary: 'INTERACTION: BD-022 (spawn multiprocessing) × BD-090 (connection pool get/put pairing) × BD-024 (data loaded
        once, passed via repeat) → Hidden dependency on spawn-specific behavior'
    - id: BD-101
      type: B
      summary: 'INTERACTION: BD-059 (DELETE before INSERT) × BD-005 (ON CONFLICT DO NOTHING) × BD-093 (refresh_views after
        completion) → Risk cascade of atomicity violations'
    - id: BD-102
      type: BA
      summary: 'INTERACTION: BD-015 (±0.5 winsorization) × BD-072 (outlier filter [-0.5, 0.5]) × BD-087 ([-0.5, 0.5] bounds)
        → Redundant outlier decisions across scripts'
    - id: BD-085
      type: DK
      summary: Composite stocks compute returns via component weighted average BEFORE regression
    - id: BD-082
      type: BA
      summary: Sliding window regression requires data_end_date <= end_date to terminate loop
    - id: BD-083
      type: BA/M
      summary: Minimum 20 data points required for valid OLS regression
    - id: BD-087
      type: BA
      summary: Outlier filter bounds returns to [-0.5, 0.5] before regression
    - id: BD-091
      type: BA/DK
      summary: 'Abnormal returns filtered: data[''r''] <= 1 before DB insert'
    - id: BD-079
      type: B/BA
      summary: BMG factor MUST be brown minus green (subtraction order is critical)
    - id: BD-084
      type: B/DK
      summary: 'Date range validation: start_date < max_data AND end_date > min_data required'
    - id: BD-089
      type: B
      summary: 'Data merge order: stock → carbon → ff → rf (inner join on dates)'
    - id: BD-086
      type: BA
      summary: Multiprocessing uses spawn method with explicit data duplication per worker
    - id: BD-090
      type: BA
      summary: 'Connection pool lifecycle: getconn() must be paired with putconn()'
    - id: BD-093
      type: M
      summary: Database refresh_views called after all regressions complete
    - id: BD-071
      type: B/BA
      summary: OLS regression for stock factor analysis
    - id: BD-072
      type: B
      summary: Data clipping to [-0.5, 0.5] range for outlier removal
    - id: BD-073
      type: B
      summary: Minimum sample size requirement n > k + 10 for regression
    - id: BD-074
      type: B/BA
      summary: Jarque-Bera test for normality of residuals
    - id: BD-075
      type: B/BA
      summary: Breusch-Pagan test for heteroscedasticity
    - id: BD-076
      type: B/BA
      summary: Durbin-Watson test for autocorrelation
    - id: BD-077
      type: B
      summary: R-squared for model fit assessment
    - id: BD-028
      type: B/BA
      summary: Significance defined as bmg_p_gt_abs_t < threshold
    - id: BD-029
      type: BA
      summary: 'Majority rule: >50% of periods significant'
    - id: BD-034
      type: T
      summary: 'Regression interval defaults: 60 months for MONTHLY, 730 days for DAILY'
    - id: BD-045
      type: T
      summary: 'Run two-stage OLS: first with all factors, then with only the significant factors'
    - id: BD-055
      type: T
      summary: Rolling window advances by 1 period (month or day) between regressions
    - id: BD-056
      type: T
      summary: Daily frequency uses 730-day (2-year) interval when not specified
    - id: BD-067
      type: B
      summary: Simple percentage change for return calculation
    - id: BD-068
      type: B
      summary: Return cap at 100% for abnormal return filtering
    - id: BD-069
      type: B/BA
      summary: Weighted average for composite stock returns
    - id: BD-070
      type: B/RC
      summary: Outer join for merging component stock data
    - id: BD-078
      type: B
      summary: Decimal type for percentage-weighted return multiplication
resources:
  packages:
  - name: pandas
    version_pin: ==1.5.3
  - name: numpy
    version_pin: ==1.24.4
  - name: matplotlib
    version_pin: '>=2'
  - name: requests
    version_pin: ==2.31.0
  - name: scipy
    version_pin: '>=1.3.0'
  - name: scikit-learn
    version_pin: '>1.4.2'
  - name: pytest
    version_pin: '>=8.3'
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When fetching monthly frequency stock data
    action: Use yfinance interval='1mo' (monthly candles), NOT interval='1d'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using daily data for monthly frequency causes incorrect MonthEnd date alignment, leading to misaligned returns
      that corrupt downstream factor regressions
    stage_ids:
    - data_collection
  - id: finance-C-002
    when: When fetching stock price data from yfinance
    action: Drop the last entry assuming it is an incomplete candle
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Partial last-period candle causes incomplete return calculations, producing NaN or incorrect values when
      computing pct_change in downstream stages
    stage_ids:
    - data_collection
  - id: finance-C-007
    when: When fetching stock data with yfinance
    action: Limit frequency to DAILY or MONTHLY only (raises Exception for other values)
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Unsupported frequency triggers Exception at stock_price_function.py:25, aborting the data fetch and leaving
      downstream stages without required data
    stage_ids:
    - data_collection
  - id: finance-C-011
    when: When computing returns from stock price data
    action: Verify Close column contains numeric float values (not strings or objects)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Non-numeric Close values cause pct_change() to return NaN for all rows, producing empty regression inputs
      and invalid factor loadings
    stage_ids:
    - data_collection
  - id: finance-C-016
    when: When importing Fama-French factor data from raw CSV files
    action: apply grep preprocessing to filter header rows before COPY import
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Raw FF CSV files contain header rows that would corrupt the database with invalid date strings, causing all
      factor data imports to fail or contain garbage values
    stage_ids:
    - data_import
  - id: finance-C-017
    when: When creating PostgreSQL schema for time series tables
    action: define frequency column with text type and enforce 'MONTHLY' or 'DAILY' values
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without explicit frequency enforcement, mixed frequency data causes regression calculations to combine incompatible
      data, producing statistically invalid results
    stage_ids:
    - data_import
  - id: finance-C-020
    when: When importing Fama-French factors
    action: remove rows with any null factor values (mkt_rf, smb, hml, wml) after import
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incomplete factor rows cause matrix inversion failures in regressions, as factor covariance matrix cannot
      be computed with null values
    stage_ids:
    - data_import
  - id: finance-C-022
    when: When setting up the data infrastructure
    action: use PostgreSQL as the database backend with COPY FROM PROGRAM support
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: The import scripts rely on PostgreSQL-specific features (COPY FROM PROGRAM, materialized views, ON CONFLICT).
      Using incompatible databases causes all imports to fail
    stage_ids:
    - data_import
  - id: finance-C-029
    when: When initializing database schema
    action: 'create all seven required tables: stocks, stock_data, carbon_risk_factor, ff_factor, risk_free, stock_components,
      stock_stats'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Missing any required table causes downstream scripts to fail with relation does not exist errors, blocking
      all factor regression calculations
    stage_ids:
    - data_import
  - id: finance-C-046
    when: When implementing factor regression data alignment
    action: use inner join to merge stock returns, carbon factor, FF factors, and risk-free rate on matching dates
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Regression coefficients become statistically invalid when data points lack complete factor coverage, causing
      biased estimates and unreliable inference
    stage_ids:
    - factor_regression
  - id: finance-C-047
    when: When computing excess returns for factor regression
    action: subtract risk-free rate (Rf) from stock Close price to compute excess returns, not raw returns
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: CAPM/FF regression requires excess returns (Close - Rf); using raw returns violates standard financial econometric
      methodology and produces incorrect beta estimates
    stage_ids:
    - factor_regression
  - id: finance-C-048
    when: When preparing Fama-French factors and risk-free rate for OLS regression
    action: divide percentage-formatted FF factors and risk-free rate by 100 before using in regression
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: FF factors stored as percentages (e.g., 5.2) must be converted to decimals (0.052); using raw percentages
      produces coefficients scaled by 100x and invalid inference
    stage_ids:
    - factor_regression
  - id: finance-C-050
    when: When validating regression data sufficiency
    action: require at least 20 data points after all filtering steps (inner join, winsorization, date range)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: OLS with fewer than 20 observations produces unreliable t-statistics and p-values; standard practice requires
      minimum 20 monthly observations for statistical validity
    stage_ids:
    - factor_regression
  - id: finance-C-061
    when: When implementing regression analysis with factor models
    action: Use at least 20 data points for statistical validity of regression coefficients
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Regression statistics (t-stats, p-values, R²) become unreliable or undefined with fewer than 20 observations,
      producing meaningless or misleading factor loadings
    stage_ids:
    - bulk_regression
  - id: finance-C-065
    when: When implementing multiprocessing for parallel regression execution
    action: Use spawn method instead of fork for multiprocessing start
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Forking a process with psycopg2 connections causes 'connection already closed' errors or corrupted connection
      state in child processes, resulting in failed database writes
    stage_ids:
    - bulk_regression
  - id: finance-C-067
    when: When acquiring database connections from the pool
    action: Return connections to the pool after use with putconn()
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Connection leak exhausts the pool, causing subsequent operations to block indefinitely waiting for available
      connections
    stage_ids:
    - bulk_regression
  - id: finance-C-075
    when: When implementing factor orthogonalization using OLS regression
    action: Add a constant term (intercept) to the regression model via x.insert(0, 'Constant', 1)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Regression without an intercept produces biased estimates of factor loadings, because the mean level that
      the constant term would absorb is instead attributed to the factors and to the residuals stored as the BMG factor
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-076
    when: When selecting factors for orthogonalization regression
    action: Use p-values from the initial full regression to filter factors where p-value is below the significance threshold
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Including non-significant factors in orthogonalization introduces noise and may overfit the regression model,
      producing unreliable residuals that contaminate downstream factor analysis
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-077
    when: When naming the orthogonalized factor
    action: Append the -ORTHO suffix to the original factor name to distinguish it from raw factor values
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without the -ORTHO suffix, downstream regression scripts cannot distinguish between raw and orthogonalized
      BMG factors, causing incorrect factor selection and contaminated regression results
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-078
    when: When storing orthogonalization results in the database
    action: Delete any existing orthogonalized factor with the same name before inserting new results to prevent duplicate
      primary key violations
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Duplicate entries in carbon_risk_factor table will cause primary key constraint violations and corrupt downstream
      regression analysis that depends on unique factor-date combinations
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-080
    when: When storing orthogonalized factor values
    action: Store the regression residuals (model.resid.values) as the bmg column in carbon_risk_factor table
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Storing fitted values instead of residuals defeats the purpose of orthogonalization, as the resulting factor
      still contains variance explained by correlated factors
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-082
    when: When ensuring orthogonalized factors are available for downstream regressions
    action: Store orthogonalized factors in the carbon_risk_factor table with matching frequency parameter so get_regressions.py
      can query them by factor_name
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Storing orthogonalized factors in a separate table or with different schema breaks the data access pattern
      used by subsequent regression scripts, making the orthogonalized factors unavailable for selection
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-091
    when: When implementing significance filtering for BMG coefficient
    action: Use bmg_p_gt_abs_t < threshold to identify significant results
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using incorrect significance test direction (e.g., bmg_p_gt_abs_t > threshold) would include non-significant
      results, causing false positive climate risk identifications
    stage_ids:
    - results_analysis
  - id: finance-C-092
    when: When implementing majority significance counting across periods
    action: Use HAVING count(CASE WHEN bmg_p_gt_abs_t < threshold THEN 1 END) > count(CASE WHEN bmg_p_gt_abs_t >= threshold
      THEN 1 END) for majority rule
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Using >= 50% threshold instead of > 50% would misclassify stocks with exactly 50% significant periods as
      having significant exposure
    stage_ids:
    - results_analysis
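    # Worked example with a hypothetical stock observed over 10 periods:
    #   5 significant, 5 not significant -> 5 > 5 is false, so the stock is excluded
    #   6 significant, 4 not significant -> 6 > 4 is true, so the stock is included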
  - id: finance-C-101
    when: When presenting analysis results to users
    action: Present backtest regression results as definitive proof of live trading performance
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Presenting backtest results as live performance would mislead users into expecting similar trading returns,
      violating financial regulatory guidance and causing potential financial loss
    stage_ids:
    - results_analysis
  - id: finance-C-102
    when: When using analysis results for investment decisions
    action: Treat BMG factor regression results as guaranteed investment returns
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Regression coefficients reflect historical market pricing of climate risk and do not predict future returns,
      leading to potential financial losses if used for direct trading
    stage_ids:
    - results_analysis
  - id: finance-C-128
    when: When implementing time series data handling across all stages
    action: Use a Date index named 'Date' for all time series DataFrames
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Merging and correlation calculations fail when date index naming is inconsistent, causing regression to produce
      incorrect factor loadings and invalid climate risk measurements
  - id: finance-C-129
    when: When storing or processing time series data across all stages
    action: Use only 'MONTHLY' or 'DAILY' values for the frequency field
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Invalid frequency values cause unsupported frequency exceptions, preventing data storage and blocking all
      regression analysis
  - id: finance-C-130
    when: When storing or processing percentage values across all stages
    action: Store percentage values as percentage format (e.g., 5.2) NOT decimal format (e.g., 0.052)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Factor values multiplied by 100 during calculation but stored without conversion, causing regression coefficients
      to be 100x too large and producing meaningless BMG climate risk loadings
  - id: finance-C-131
    when: When running regression analysis across all stages
    action: Verify inner join on dates across stock returns, carbon risk factor, and Fama-French factors
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing date alignment causes rows with NaN values in regression, producing invalid OLS coefficients and
      unreliable climate risk factor loadings
  - id: finance-C-132
    when: When running regression analysis across all stages
    action: Require at least 20 data points for valid statistical regression results
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Regression with insufficient data points produces unreliable t-statistics and p-values, causing incorrect
      conclusions about climate risk factor significance
  - id: finance-C-134
    when: When naming BMG climate risk factors across all stages
    action: Use 'DEFAULT' as a custom factor name — it is reserved as the system default BMG factor
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Overwriting DEFAULT factor causes all downstream regressions to use wrong climate risk data, invalidating
      entire analysis
  - id: finance-C-143
    when: When operating this system in production
    action: Require PostgreSQL database infrastructure; the system stores all stock data, factor data, and regression results
      in database tables
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Without PostgreSQL, no data can be stored or retrieved, completely blocking all analysis and regression workflows
    stage_ids:
    - data_import
  - id: finance-C-148
    when: When executing the regression workflow
    action: 'Follow the required data flow order: load factors from DB → merge on dates using inner join → calculate stock
      returns with pct_change → filter outliers → run OLS regression'
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Deviation from the required workflow order causes data misalignment, resulting in incorrect factor loadings
      and invalid climate risk measurements
  - id: finance-C-165
    when: When implementing BMG (Brown-Minus-Green) factor calculation for carbon risk regression analysis
    action: Calculate BMG factor as return_x minus return_y (Brown stock returns minus Green stock returns) to construct a
      long-short portfolio capturing pure carbon risk premium without contamination from other factor exposures
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Incorrect BMG calculation (e.g., reversed subtraction order or different weighting) corrupts the primary
      dependent variable in carbon risk analysis, causing strategies to be built on wrong factor exposures and potentially
      losing capital on mispriced carbon risk
    derived_from_bd_id: BD-030
  - id: finance-C-174
    when: When implementing or refactoring BMG factor computation logic
    action: Maintain BMG formula as Brown Returns minus Green Returns (Brown - Green); changing the order to Green minus Brown
      inverts the entire framework's interpretation of positive values
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inverting the BMG formula causes all climate factor analysis to report opposite results, producing false
      carbon alpha signals and misidentifying brown stock outperformance as green stock outperformance
    derived_from_bd_id: BD-010
  - id: finance-C-177
    when: When implementing or using rolling window regression API
    action: Explicitly specify overlap parameter behavior when requesting rolling window advancement; do not rely on ambiguous
      window advancement logic, in which BD-021 (non-overlapping for statistical independence) contradicts BD-055 (1-period
      advancement)
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Contradictory window advancement logic between non-overlapping (BD-021) and overlapping (BD-055) specifications
      produces invalid rolling regression results with silent errors in beta estimates and statistical inference
    derived_from_bd_id: BD-094
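    # Worked example, assuming 120 months of MONTHLY data and the 60-month default window (hypothetical data length):
    #   advance by interval (BD-021): windows start at months 1 and 61       -> 2 regressions
    #   advance by 1 period (BD-055): windows start at months 1, 2, ..., 61  -> 61 regressions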
  - id: finance-C-181
    when: When normalizing factor returns for regression
    action: 'Verify consistent application of factor normalization: FF factors must be divided by 100 AND excess returns must
      be computed as Close minus risk-free rate; verify both operations are applied to maintain consistent scale between factors
      and returns'
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inconsistent normalization (applying only one of factor/100 or Close-Rf) creates scale mismatches in regression
      coefficients, causing systematic pricing errors and invalid factor risk premium estimates
    derived_from_bd_id: BD-099
  - id: finance-C-205
    when: When configuring rolling window regression parameters
    action: Verify window advancement size matches the interval default (60 months for MONTHLY, 730 days for DAILY) — BD-019,
      BD-020, BD-080, and BD-021 must be consistent; verify that interval advancement logic uses the same unit (months vs
      days) as the window size
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Window size misinterpretation due to unit mismatch (month-based window advancing by day-based interval) causes
      incorrect beta estimates, invalid t-statistics, and wrong BMG premium conclusions that appear statistically valid but
      are structurally flawed
    derived_from_bd_id: BD-097
  regular:
  - id: finance-C-003
    when: When aligning monthly stock data timestamps
    action: Add MonthEnd(1) offset to index for proper financial convention alignment
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without MonthEnd alignment, monthly return dates do not match financial reporting periods, causing misalignment
      with Fama-French factors and BMG climate data
    stage_ids:
    - data_collection
  - id: finance-C-004
    when: When processing stock data with the Close column
    action: Drop NaN values from the Close series before returning
    severity: high
    kind: domain_rule
    modality: must
    consequence: NaN values in Close column propagate to pct_change calculations, producing all-NaN rows and corrupting downstream
      regression inputs
    stage_ids:
    - data_collection
  - id: finance-C-005
    when: When returning stock data from stock_df_grab
    action: Verify the DataFrame index is named 'Date' as per interface contract
    severity: high
    kind: domain_rule
    modality: must
    consequence: Downstream stages like bulk_script.py:15 expect index.rename('Date'), and merge operations fail when the
      index name is missing or incorrect
    stage_ids:
    - data_collection
  - id: finance-C-006
    when: When using yfinance API for stock data
    action: Assume yfinance provides real-time data without delay
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: yfinance data has inherent delays (15+ minutes for US stocks), presenting historical data as real-time causes
      incorrect trading signals and performance claims
    stage_ids:
    - data_collection
  - id: finance-C-008
    when: When yfinance JSON decode fails (transient network error)
    action: Retry up to 3 attempts before raising ValueError timeout
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without retry logic, transient network failures cause immediate data fetch failures, preventing bulk stock
      imports from completing
    stage_ids:
    - data_collection
  - id: finance-C-009
    when: When importing stock data into the database
    action: Skip stocks that fail to fetch (raise ValueError and catch in calling code)
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Uncaught exceptions from invalid tickers or API failures halt the entire batch import process, preventing
      valid stocks from being processed
    stage_ids:
    - data_collection
  - id: finance-C-010
    when: When the fetch ticker is invalid or data unavailable
    action: Return an empty DataFrame rather than crashing the calling process
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Invalid ticker causing unhandled exception terminates multiprocessing pool workers, losing progress on all
      pending stocks in bulk imports
    stage_ids:
    - data_collection
  - id: finance-C-012
    when: When representing fetched stock data as results
    action: Claim yfinance historical data represents real-time trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting historical backtest results as live trading performance violates financial regulations and misleads
      investors about expected returns
    stage_ids:
    - data_collection
  - id: finance-C-013
    when: When handling fetched stock price data
    action: Use raw yfinance data as the sole source without validation against other data providers
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Relying exclusively on yfinance without cross-validation risks data quality issues (delays, missing data,
      incorrect splits) being undetected and propagated to analysis
    stage_ids:
    - data_collection
  - id: finance-C-014
    when: When accessing stock price data in downstream stages
    action: Use stock_df_grab() wrapper function as the data entry point, not yfinance directly
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Bypassing the data ingestion layer causes missing Date column formatting, incorrect index naming, and inconsistent
      data format across downstream stages
    stage_ids:
    - data_collection
  - id: finance-C-015
    when: When formatting stock data output
    action: Convert Date index to date objects and position Date column as first column in DataFrame
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Inconsistent date handling causes merge operations in bulk_script.py and factor_regression.py to fail, producing
      incorrect or empty merged datasets
    stage_ids:
    - data_collection
  - id: finance-C-018
    when: When storing financial factor values in PostgreSQL
    action: use DECIMAL data type with specified precision instead of FLOAT
    severity: high
    kind: domain_rule
    modality: must
    consequence: Floating-point representation causes rounding errors in financial calculations, leading to incorrect factor
      loadings and misstated investment performance metrics
    stage_ids:
    - data_import
  - id: finance-C-019
    when: When inserting stock return data
    action: accept returns greater than 100% or NaN values into stock_data table
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Abnormal returns (>100%) and NaN values cause division errors and corrupt downstream regressions, producing
      meaningless beta/alpha estimates
    stage_ids:
    - data_import
  - id: finance-C-021
    when: When performing data imports via COPY command
    action: use staging tables with ON CONFLICT DO NOTHING to prevent duplicate key violations on re-runs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without idempotent upsert behavior, re-running data imports throws unique constraint violations, blocking
      incremental updates and requiring full database recreation
    stage_ids:
    - data_import
  - id: finance-C-023
    when: When importing MSCI constituents and weights
    action: populate both stocks table (with ticker, name, sector) and stock_components table (with parent ticker, component
      stock, percentage)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing either stocks or stock_components records breaks the parent-component relationship, preventing composite
      ETF return calculations and sector analysis
    stage_ids:
    - data_import
  - id: finance-C-024
    when: When importing Fama-French factors
    action: store all four factors (Mkt-RF, SMB, HML, WML) in the ff_factor table with a unified structure
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without WML (winners minus losers, the momentum factor) in the same table, factor regressions omit momentum,
      producing incorrect four-factor model estimates
    stage_ids:
    - data_import
  - id: finance-C-025
    when: When storing stock price data
    action: store both close price and pre-computed return in stock_data table
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Without pre-computed returns, every query must recalculate pct_change, causing repeated O(n) scans over large
      time series and slow dashboard queries
    stage_ids:
    - data_import
  - id: finance-C-026
    when: When creating database tables
    action: define an explicit PRIMARY KEY constraint on every table
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without primary keys, duplicate rows can accumulate silently, causing one-to-many relationship failures in
      JOIN queries and incorrect regression results
    stage_ids:
    - data_import
  - id: finance-C-027
    when: When importing index constituent data
    action: clear existing constituents for the same parent ticker before inserting new ones
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without deleting old constituents first, the stock_components table accumulates stale entries, causing outdated
      index weights to pollute regression analysis
    stage_ids:
    - data_import
  - id: finance-C-028
    when: When importing bond factor data
    action: delete all existing rows before import (non-idempotent full replacement)
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Unlike other tables, bond_factor uses DELETE + COPY without ON CONFLICT, so stale data remains if import
      fails midway through the file
    stage_ids:
    - data_import
  - id: finance-C-030
    when: When performing data imports
    action: claim that imported historical data equals real-time market conditions
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting historical backtest results as equivalent to live trading violates regulatory standards and investor
      protection requirements, as past performance does not guarantee future results
    stage_ids:
    - data_import
  - id: finance-C-031
    when: When using the system for investment decisions
    action: claim that simulated portfolio returns from imported data represent actual trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Simulated returns exclude transaction costs, slippage, and liquidity constraints, causing material overstatement
      of expected live trading profits
    stage_ids:
    - data_import
  - id: finance-C-049
    when: When filtering extreme returns from regression input
    action: exclude observations where |return| >= 0.5 (50%) using winsorization to remove outliers
    severity: high
    kind: domain_rule
    modality: must
    consequence: Extreme returns exceeding 50% likely represent data errors or corporate actions that dominate regression
      coefficients, causing spurious factor loadings
    stage_ids:
    - factor_regression
  - id: finance-C-051
    when: When generating output DataFrames from factor regression
    action: set DateTime index name to 'Date' in merged DataFrame to maintain naming consistency
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Downstream scripts (get_regressions.py) expect 'Date' index name; mismatched index names cause merge failures
      and silent data loss
    stage_ids:
    - factor_regression
  - id: finance-C-052
    when: When validating date range inputs against available data
    action: raise custom DateInRangeError when start_date >= max_date or end_date <= min_date
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Invalid date boundaries cause empty regression datasets or silent data truncation, producing meaningless
      or missing results without clear error messages
    stage_ids:
    - factor_regression
  - id: finance-C-053
    when: When ensuring OLS has sufficient degrees of freedom
    action: require number of observations to exceed number of factors by at least 10 (shape[0] > shape[1] + 10)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Insufficient degrees of freedom causes OLS estimation to fail or produce unstable estimates; the +10 buffer
      ensures reliable standard error estimation
    stage_ids:
    - factor_regression
  - id: finance-C-054
    when: When computing factor regression coefficients
    action: include intercept term (constant) as first column in regression design matrix
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: FF factor models require an intercept to capture abnormal returns; omitting the constant produces biased
      coefficient estimates
    stage_ids:
    - factor_regression
  - id: finance-C-055
    when: When performing rolling/interval regression analysis
    action: only run regression when overlapping data exists between stock, FF factors, and carbon factor after start_date
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Regression without complete factor coverage produces invalid coefficients; attempting regression with missing
      data raises ValueError in code
    stage_ids:
    - factor_regression
  - id: finance-C-056
    when: When presenting factor regression results
    action: claim that backtest regression coefficients predict future stock performance or guarantee investment returns
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Past factor loadings do not guarantee future results; regression captures historical relationships that may
      change due to market regime shifts, structural breaks, or factor decay
    stage_ids:
    - factor_regression
  - id: finance-C-057
    when: When using regression coefficients for investment decisions
    action: present the software output as financial investment advice or recommendation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Disclaimer explicitly states this is not investment advice; presenting regression output as recommendations
      violates the stated purpose and creates legal/regulatory risk
    stage_ids:
    - factor_regression
  - id: finance-C-058
    when: When running bulk regressions for multiple stocks
    action: enable silent mode to suppress verbose output and improve processing performance
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Verbose printing for each of hundreds of stocks creates excessive I/O overhead, significantly slowing bulk
      regression processing
    stage_ids:
    - factor_regression
  - id: finance-C-059
    when: When converting date inputs for factor regression
    action: parse date strings in YYYY-MM-DD format; non-string dates are passed through unchanged
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Incorrectly formatted date strings cause ValueError in datetime.strptime; silent pass-through of non-string
      dates prevents type checking
    stage_ids:
    - factor_regression
  - id: finance-C-060
    when: When handling edge case of empty merged DataFrame
    action: raise ValueError immediately when merged DataFrame length is zero before attempting regression
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Attempting OLS on empty DataFrame causes downstream errors; explicit validation with clear error message
      prevents cryptic failures
    stage_ids:
    - factor_regression
  - id: finance-C-062
    when: When implementing bulk data processing with rolling windows
    action: Handle missing data by dropping rows with NaN values before regression
    severity: high
    kind: domain_rule
    modality: must
    consequence: Inner join on dates with missing factor data produces empty DataFrames, causing silent failures and missing
      regression results for affected stocks
    stage_ids:
    - bulk_regression
  - id: finance-C-063
    when: When setting the default regression interval
    action: Use 60 months (5 years) as default for MONTHLY frequency and 730 days (2 years) for DAILY frequency
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Using insufficient observations reduces statistical significance; using too many observations includes stale
      data that degrades factor model accuracy
    stage_ids:
    - bulk_regression
  - id: finance-C-064
    when: When configuring PostgreSQL connection pool for bulk regression
    action: Limit connection pool to maximum 20 concurrent connections
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Exceeding pool size causes connection exhaustion errors, blocking all database operations and halting regression
      execution mid-process
    stage_ids:
    - bulk_regression
  - id: finance-C-066
    when: When loading factor data for bulk regression workers
    action: Load all factor data once and share it across workers via itertools.repeat
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Per-worker data loading causes redundant database queries, multiplying I/O load by number of workers and
      drastically slowing bulk regression execution
    stage_ids:
    - bulk_regression
  - id: finance-C-068
    when: When storing regression results in the database
    action: Use DELETE before INSERT pattern for UPSERT semantics on stock_stats table
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Duplicate key violations prevent regression storage; without idempotent writes, re-running regressions produces
      constraint errors instead of updates
    stage_ids:
    - bulk_regression
  - id: finance-C-069
    when: When completing bulk regression processing
    action: Refresh materialized views (stock_and_stats, stock_component_and_stats, stock_parent_and_stats) after bulk completion
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Stale materialized view data causes queries to return outdated regression results, misleading downstream
      analysis and portfolio construction
    stage_ids:
    - bulk_regression
  - id: finance-C-070
    when: When implementing rolling window regression advancement
    action: Advance window by interval_freq (1 day/month) to create overlapping windows
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Non-overlapping windows reduce time-series observations; overlapping windows provide more data points for
      statistical significance while maintaining temporal ordering
    stage_ids:
    - bulk_regression
  - id: finance-C-071
    when: When configuring frequency parameter for regression
    action: Use only MONTHLY or DAILY as valid frequency values
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'Unsupported frequency causes Exception with message ''Unsupported frequency: {value}'', preventing any regression
      from running'
    stage_ids:
    - bulk_regression
  - id: finance-C-072
    when: When presenting regression results to users
    action: Claim that backtest regression results predict future trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Presenting historical factor loadings as predictive of future returns violates regulatory guidance and misleads
      investors about actual expected performance
    stage_ids:
    - bulk_regression
  - id: finance-C-073
    when: When displaying regression coefficient results
    action: Present statistical results as investment recommendations
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Factor loadings (beta, t-stats) are analytical outputs, not buy/sell signals; presenting them as recommendations
      violates the project's stated purpose and legal disclaimers
    stage_ids:
    - bulk_regression
  - id: finance-C-074
    when: When choosing a psycopg2 connection pool for multi-threaded database operations
    action: Use ThreadedConnectionPool (not SimpleConnectionPool) to support concurrent thread access
    severity: high
    kind: resource_boundary
    modality: must
    consequence: SimpleConnectionPool does not handle concurrent thread access safely, causing race conditions and intermittent
      'connection already closed' errors
    stage_ids:
    - bulk_regression
  - id: finance-C-079
    when: When merging factor dataframes for orthogonalization
    action: Use left join when merging additional factors to preserve all BMG observations, then call dropna() to remove
      incomplete rows
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using outer join or skipping dropna() will include rows with missing factor values, producing NaN residuals
      that corrupt the orthogonalized factor series
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-081
    when: When running batch processing of each BMG series
    action: Skip factor names that already end with -ORTHO to prevent re-orthogonalization of already-orthogonalized factors
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Re-orthogonalizing already-orthogonalized factors produces doubly-transformed residuals with degraded statistical
      properties and unpredictable factor loadings
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-083
    when: When executing orthogonalization batch processing
    action: Use psycopg2.extras.execute_batch() with parameterized queries to safely insert DataFrame rows into the database
    severity: high
    kind: resource_boundary
    modality: must
    consequence: String formatting SQL queries exposes the system to SQL injection attacks and will fail when factor names
      contain special characters like dashes or underscores
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-084
    when: When requiring data availability for orthogonalization
    action: Verify carbon_risk_factor, ff_factor, and additional_factors tables contain overlapping date ranges for the selected
      frequency
    severity: high
    kind: resource_boundary
    modality: must
    consequence: If date ranges do not overlap, the inner merge will produce an empty DataFrame, the regression will fail,
      and no orthogonalized factor will be generated
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-085
    when: When processing orthogonalization in batch mode with --each flag
    action: Verify that the database connection pool is properly initialized before accessing it in process_factor()
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Accessing uninitialized connection pool will raise an AttributeError, causing the batch orthogonalization
      script to fail entirely
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-086
    when: When interpreting orthogonalization results
    action: Claim that orthogonalized factor returns equal or predict live trading performance
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Backtested orthogonalization results do not guarantee future live trading returns. Orthogonalization removes
      historical correlation structure but market dynamics may change, invalidating the factor's predictive power in forward-looking
      trading
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-087
    when: When using the orthogonalization system
    action: Present orthogonalization results as investment advice or specific security recommendations
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The system is for informational and research purposes only. Orthogonalization is a mathematical transformation
      that removes statistical correlation—it does not constitute financial advice about particular securities or investment
      strategies
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-088
    when: When running orthogonalization with the --each flag
    action: Process factors that are already orthogonalized (names ending with -ORTHO), which would apply a double transformation
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Processing already-orthogonalized factors produces doubly-transformed residuals that have degraded statistical
      properties and unpredictable behavior in subsequent regressions
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-089
    when: When configuring the significance threshold for orthogonalization
    action: Use a threshold of 0.1 (10%) as the default p-value cutoff for including factors in orthogonalization
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using an overly strict threshold (e.g., 0.05) may exclude factors that have meaningful correlation with the
      BMG factor, leaving residual correlation that contaminates downstream analysis
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-090
    when: When verifying orthogonalization completeness
    action: Confirm the correlation between the orthogonalized factor and each FF factor is below the significance threshold
      after orthogonalization
    severity: high
    kind: domain_rule
    modality: must
    consequence: If orthogonalized factor still shows significant correlation with any FF factor, the orthogonalization was
      incomplete, leading to factor contamination in subsequent regression analysis
    stage_ids:
    - factor_orthogonalization
  - id: finance-C-093
    when: When retrieving the most recent regression results
    action: Query max(thru_date) within each (ticker, bmg_factor_name) group to identify final period results
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without max(thru_date) filtering, results may include stale regression periods causing outdated analysis
      and incorrect conclusions
    stage_ids:
    - results_analysis
  - id: finance-C-094
    when: When joining stock_stats with stocks table
    action: Use LEFT JOIN for stocks table to preserve stocks without sector assignments
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using INNER JOIN would exclude stocks with missing sector information, causing incomplete sector aggregation
      and missing climate risk identification
    stage_ids:
    - results_analysis
  - id: finance-C-095
    when: When specifying BMG factor name for analysis
    action: Use 'DEFAULT' as a factor_name parameter value
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: DEFAULT is a reserved factor name in the carbon_risk_factor table; passing it as a filter parameter would
      return no results or unintended data
    stage_ids:
    - results_analysis
  - id: finance-C-096
    when: When accessing regression results from the database
    action: Run REFRESH MATERIALIZED VIEW before querying so that the latest results are available
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Without refreshing materialized views, query results may be stale, showing outdated regression periods instead
      of current analysis
    stage_ids:
    - results_analysis
  - id: finance-C-097
    when: When analyzing sector-level climate risk exposure
    action: Group by both sector and bmg_factor_name to avoid mixing different climate factors
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without proper grouping by factor_name, sector aggregations would mix results from different BMG series,
      causing misleading climate risk assessments
    stage_ids:
    - results_analysis
  - id: finance-C-098
    when: When aggregating stock-level results to sector level
    action: Count distinct tickers per sector, not raw rows, to avoid double-counting
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without distinct ticker counting, stocks appearing in multiple regression periods would be counted multiple
      times, inflating sector significance numbers
    stage_ids:
    - results_analysis
  - id: finance-C-099
    when: When accepting p-value threshold input from users
    action: Use default significance threshold of 0.05 (5% level) when no threshold specified
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Without a reasonable default, users might accidentally filter results with threshold 0 (no significant results)
      or fail to specify threshold at all
    stage_ids:
    - results_analysis
  - id: finance-C-100
    when: When calculating ratio of significant stocks to index stocks by sector
    action: Handle missing or zero sector counts by returning None or a zero ratio instead of raising a division error
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Division by sector count of zero would crash the analysis output, preventing users from identifying climate
      risk patterns
    stage_ids:
    - results_analysis
  - id: finance-C-103
    when: When interpreting BMG coefficient significance results
    action: Consider p-value significance only within the context of model assumption diagnostics (Jarque-Bera, Breusch-Pagan,
      Durbin-Watson)
    severity: high
    kind: claim_boundary
    modality: must
    consequence: Ignoring model diagnostics and treating p-value alone as proof of climate risk could identify spurious correlations
      due to non-normal residuals or heteroskedasticity
    stage_ids:
    - results_analysis
  - id: finance-C-104
    when: When evaluating statistical significance of BMG coefficients
    action: Skip validation of Jarque-Bera p-value > 0.05 confirming Gaussian residuals
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: OLS regression assumes Gaussian-distributed residuals; violating this assumption makes p-values unreliable
      and significance conclusions invalid
    stage_ids:
    - results_analysis
  - id: finance-C-105
    when: When comparing climate risk results across different BMG factor series
    action: Separate analysis by bmg_factor_name to avoid mixing orthogonalized and non-orthogonalized climate factors
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Mixing results from different BMG series (e.g., XOP-SMOG vs XOP-SMOG_HML_HYGIEI) would produce inconsistent
      and incomparable climate risk assessments
    stage_ids:
    - results_analysis
  - id: finance-C-106
    when: When interpreting sector-level climate risk aggregation
    action: Conclude that all stocks in a sector have the same climate risk profile based on aggregate counts
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Sector aggregation shows distribution of significant exposures, not uniform risk; individual stock-level
      analysis is required for portfolio construction
    stage_ids:
    - results_analysis
  - id: finance-C-133
    when: When handling composite stock tickers across all stages
    action: Populate stock_components table with component_stock and percentage weights before processing composite tickers
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing stock_components entries cause division by zero or NaN composite returns, breaking portfolio-level
      BMG climate risk analysis
  - id: finance-C-135
    when: When processing stock return data across all stages
    action: Exclude returns exceeding 100% (return > 1) as abnormal data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Abnormal returns from data errors distort regression coefficients, leading to incorrect BMG factor loadings
      and misguided climate risk assessments
  - id: finance-C-136
    when: When running OLS regression across all stages
    action: Cap regression input values to the [-0.5, 0.5] range to filter extreme outliers before estimation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Extreme outliers in returns corrupt OLS estimation, producing unreliable coefficients and invalid statistical
      tests (Jarque-Bera, Breusch-Pagan, Durbin-Watson)
  - id: finance-C-137
    when: When presenting or reporting this system's regression results to users
    action: Claim that backtested factor loadings equal expected live trading returns — regression analysis ignores transaction
      costs, slippage, market impact, and execution delays
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users allocate live capital based on inflated backtest returns, leading to severe underperformance in live
      trading and potential financial loss
  - id: finance-C-138
    when: When marketing or describing this system's capabilities
    action: Claim real-time trading support — this is a historical factor analysis and regression system, NOT a live trading
      execution platform
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Users purchase or implement this system expecting live trading capabilities that do not exist, leading to
      implementation failures and missed investment opportunities
  - id: finance-C-139
    when: When applying this system's models to non-equity asset classes
    action: Claim BMG climate risk factor applicability to bonds, commodities, or other non-equity assets — the Fama-French-Carhart
      model is designed for equity analysis
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Invalid factor model application to non-equity assets produces meaningless risk loadings, leading to incorrect
      portfolio construction and financial losses
  - id: finance-C-140
    when: When presenting statistical significance results
    action: Claim predictive certainty based on p-values < 0.05 — statistical significance in historical regression does not
      guarantee future factor effectiveness
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Overconfidence in statistical significance leads to over-trading, concentrated positions, and losses when
      market regimes change
  - id: finance-C-141
    when: When using yfinance for stock data retrieval
    action: Acknowledge that yfinance provides delayed data (approximately 15 minutes for US stocks, longer for international)
      — it is NOT real-time market data
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Backtested returns computed with delayed data differ from real-time prices, causing look-ahead bias and overestimated
      strategy performance
  - id: finance-C-142
    when: When depending on external factor data providers
    action: Verify Fama-French factor data availability from Ken French Data Library — external factors have coverage limitations
      and may not extend to current dates
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Missing factor data blocks regression execution, leaving portfolios without updated climate risk loadings
      during critical market periods
  - id: finance-C-144
    when: When computing BMG factor loadings for stocks
    action: Use rolling window intervals of at least 60 months for MONTHLY frequency or 730 days for DAILY frequency to ensure
      statistical validity
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Shorter intervals produce statistically unreliable factor loadings due to insufficient sample size, leading
      to incorrect climate risk rankings
  - id: finance-C-145
    when: When updating stock or factor data in the database
    action: Refresh materialized views (stock_and_stats, stock_component_and_stats, stock_parent_and_stats) after data modifications
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Stale materialized views show outdated regression results, causing users to make portfolio decisions based
      on old climate risk assessments
  - id: finance-C-146
    when: When loading international stock tickers
    action: Use correct exchange suffix format for non-US tickers (e.g., .L for London, .TO for Toronto, .AS for Amsterdam,
      .SW for Switzerland)
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Incorrect ticker format causes data retrieval failures, blocking climate risk analysis for international
      equities
  - id: finance-C-147
    when: When accessing data across all analysis stages
    action: Access all stock data through database load functions (load_stocks_from_db, load_stocks_data_with_returns_from_db)
      to ensure consistent data format and caching
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Direct data access bypasses caching and format conversion, causing inconsistent regression results across
      different analysis runs
  - id: finance-C-149
    when: When accessing database resources
    action: Use database connection pool for concurrent operations — never create new connections in parallel processing loops
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Unmanaged connection creation exhausts database connections, causing connection failures and blocking all
      analysis operations
  - id: finance-C-150
    when: When maintaining data integrity across database operations
    action: Use autocommit mode for read-only database operations to avoid holding locks on materialized views
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Transaction locks on materialized views block concurrent analysis requests, causing analysis delays and timeout
      failures
  - id: finance-C-151
    when: When collecting monthly interval data using yfinance
    action: Verify that the 1mo yfinance interval provides MonthEnd-aligned data for the specific stock universe being analyzed;
      validate that returned dates match known MonthEnd dates (e.g., 2023-01-31, 2023-02-28) and not arbitrary calendar month
      boundaries
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using 1mo interval assumes proper MonthEnd alignment, but yfinance may return non-aligned dates causing monthly
      return calculations to shift by days, leading to systematic misalignment between backtest returns and actual financial
      reporting periods
    derived_from_bd_id: BD-001
  - id: finance-C-152
    when: When implementing multiprocessing for bulk regression processing
    action: Use multiprocessing.set_start_method('spawn') to create fresh processes without inherited state; must NOT use 'fork'
      method as it causes psycopg2 connection corruption on Linux, is unsafe on macOS, and is unavailable on Windows
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Forking processes with psycopg2 connections causes connection sharing corruption, leading to 'connection
      already closed' errors or data corruption in database operations during bulk regression batches
    derived_from_bd_id: BD-022
  - id: finance-C-153
    when: When implementing database materialized view refresh strategy for batch regression workloads
    action: Call refresh_views only after each regression batch completes to batch materialized view updates into single operations;
      must NOT refresh during regression processing as concurrent queries may see partially-updated views
    severity: high
    kind: domain_rule
    modality: must
    consequence: Refreshing views during regression processing creates a window where concurrent queries receive partially-updated
      data, causing inconsistent factor values and potentially generating incorrect regression coefficients that differ between
      runs
    derived_from_bd_id: BD-093
  - id: finance-C-154
    when: When validating factor_name input for BMG series insertion
    action: Reject or sanitize any user-provided factor_name equal to 'DEFAULT' at insert time; enforce schema constraint
      that 'DEFAULT' is reserved for system use and cannot be user-defined
    severity: high
    kind: domain_rule
    modality: must
    consequence: Allowing 'DEFAULT' as a user-defined factor_name creates ambiguity in queries that rely on the schema default
      value, causing queries to return unexpected mixed results instead of the intended default factor
    derived_from_bd_id: BD-088
  - id: finance-C-155
    when: When implementing factor data storage or modifying database schema in the data_import stage
    action: Store WML (winners minus losers, the momentum factor) in the same ff_factor table as other Fama-French factors;
      do not separate it into a different table or external storage
    severity: high
    kind: domain_rule
    modality: must
    consequence: Separating WML into a different table breaks existing JOIN queries that assume unified factor storage, causing
      factor data access failures and requiring schema refactoring across multiple modules
    derived_from_bd_id: BD-007
  - id: finance-C-156
    when: When configuring database connection pool for bulk_regression workers
    action: Set connection pool size to exactly 20 to support up to 20 concurrent regression workers — do not exceed PostgreSQL
      connection limits or reduce below this threshold
    severity: high
    kind: domain_rule
    modality: must
    consequence: Connection pool smaller than 20 causes worker queuing and dramatically slows bulk regression; pool larger
      than 20 exceeds PostgreSQL connection limits and causes database connection failures
    derived_from_bd_id: BD-023
  - id: finance-C-157
    when: When using the framework's default connection pool size parameter for bulk regression processing
    action: Verify that connection pool size = 20 matches the available PostgreSQL connection limits and actual worker count
      requirements; adjust based on system resources if needed
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Hardcoded pool size of 20 may exhaust connections on systems with limited PostgreSQL allocation or cause
      insufficient parallelism when running more than 20 workers concurrently
    derived_from_bd_id: BD-023
  - id: finance-C-158
    when: When defining index benchmarks for factor model estimation in the Index Definition stage
    action: Use IVV for S&P 500 US market proxy and XWD.TO for MSCI World global market proxy — do not substitute with SPY,
      EEM, or other ETFs without re-evaluating factor model representativeness
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using SPY instead of IVV increases expense ratio costs; using EEM changes the benchmark from developed markets
      to emerging markets, distorting global factor exposure estimates
    derived_from_bd_id: BD-042
  - id: finance-C-159
    when: When implementing stock data merging logic in the returns.calculation stage
    action: Use OUTER JOIN when merging component stock data to preserve all observations from each stock; do not use INNER
      JOIN as it loses data when trading dates misalign
    severity: high
    kind: domain_rule
    modality: must
    consequence: Inner join silently drops observations where dates don't align, reducing sample size unpredictably and causing
      survivorship bias in factor calculations
    derived_from_bd_id: BD-070
  - id: finance-C-160
    when: When computing BMG factor values in the bmg_factor_computation stage
    action: Remove NaN values before storing BMG; inner join on returns to ensure valid paired brown and green observations,
      and do not forward-fill or leave NaN values
    severity: high
    kind: domain_rule
    modality: must
    consequence: NaN values in BMG series corrupt factor time series and cause regression analysis to fail or produce invalid
      coefficients due to missing observation pairs
    derived_from_bd_id: BD-012
  - id: finance-C-161
    when: When using the framework's default abnormal return filter for data validation in the invariant stage
    action: Verify that the threshold data['r'] <= 1 correctly identifies data errors versus valid extreme returns for your
      asset universe; manually review returns above 100% before inclusion
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: The 100% threshold may filter legitimate intraday spikes or overnight gap returns in volatile markets, removing
      valid observations that could significantly affect factor calculations
    derived_from_bd_id: BD-091
  - id: finance-C-162
    when: When implementing rolling window logic for MONTHLY factor estimation
    action: Use exactly 60 months (5 years) as the default rolling window for MONTHLY frequency — this window provides sufficient
      data points for statistical significance while remaining responsive to regime changes
    severity: high
    kind: domain_rule
    modality: must
    consequence: Windows shorter than 60 months for MONTHLY frequency have insufficient data points for statistically significant
      factor detection; windows longer than 60 months introduce regime shift contamination
    derived_from_bd_id: BD-019
  - id: finance-C-163
    when: When implementing factor orthogonalization logic in the factor_orthogonalization stage
    action: Only orthogonalize against factors whose correlation has a p-value below 0.1; correlations with higher p-values
      are likely due to chance, and those factors should be retained as-is to avoid destroying valid signal
    severity: high
    kind: domain_rule
    modality: must
    consequence: Orthogonalizing factors with non-significant correlation removes useful information from the model, causing
      factor exposure estimates to be systematically biased and backtest returns to be overstated
    derived_from_bd_id: BD-025
  - id: finance-C-164
    when: When implementing or modifying statistical significance testing in carbon risk regression analysis
    action: Define statistical significance using the two-tailed p-value condition bmg_p_gt_abs_t < threshold (typically 0.05),
      where bmg_p is the p-value for the BMG coefficient and abs_t is the absolute value of the t-statistic
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using one-tailed test or different p-value formulations produces incorrect significance decisions, causing
      strategies to incorrectly accept or reject carbon risk factors based on flawed statistical inference
    derived_from_bd_id: BD-028
  - id: finance-C-166
    when: When implementing date range validation for factor regression input data
    action: 'Enforce strict inequality validation: start_date < max_data AND end_date > min_data, ensuring at least one data
      point exists on each side of the requested window; raise DateInRangeError if violated instead of silent truncation'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Omitting the bidirectional date range check allows regressions to run with truncated input data, producing
      biased coefficients and misleading performance attribution that leads to strategies with incorrect risk factor loadings
    derived_from_bd_id: BD-084
  - id: finance-C-167
    when: When applying winsorization to factor regression input data
    action: Apply winsorization threshold of |value| < 0.5 to remove extreme returns beyond ±50% in a single period as likely
      data errors (stock splits, delistings); verify the threshold matches actual data error patterns and adjust if needed
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using a different winsorization threshold or disabling it entirely causes outliers to dominate regression
      coefficients, leading to strategies that overweight stocks with data errors and underweight legitimate high-volatility
      periods
    derived_from_bd_id: BD-015
  - id: finance-C-168
    when: When implementing multiprocessing for production database operations
    action: Maintain the interaction contract between BD-022 (spawn start method), BD-090 (connection get/put pairing), and
      BD-024 (data pre-loading via itertools.repeat); do not change spawn to fork, do not reorder data pre-loading relative
      to worker spawning, and preserve connection pairing discipline
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Breaking the spawn/get-put/preload interaction causes connection pool corruption or N×memory consumption
      from deep-copying DataFrames to each worker, leading to database connection errors or memory exhaustion in production
    derived_from_bd_id: BD-100
  - id: finance-C-169
    when: When implementing OLS regression for factor analysis
    action: Enforce minimum 20 data points for valid OLS regression; reject or warn when fewer observations are available
      as standard errors become unreliable, t-statistics lose validity, and confidence intervals widen unpredictably
    severity: high
    kind: domain_rule
    modality: must
    consequence: Running regression with fewer than 20 data points produces unreliable standard errors and invalid t-statistics,
      causing strategies to be selected based on statistically meaningless regressions that appear valid
    derived_from_bd_id: BD-083
  - id: finance-C-170
    when: When processing data in production data collection pipeline
    action: Assume the framework handles stale data detection and automatic expiry — this capability is confirmed absent;
      the framework does not validate data freshness or expire outdated records
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Using stale data in regression analysis produces incorrect coefficients based on outdated market conditions,
      leading to strategies with wrong factor loadings and significant capital losses in live trading
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-171
    when: When implementing data collection pipeline for factor regression
    action: 'Implement timestamp validation for each data record: compare record timestamp against current time, flag records
      exceeding configured expiry_threshold (e.g., 30 days for daily data), and prevent processing of stale data without explicit
      user acknowledgment via a data_freshness_warning parameter'
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Without stale data detection, regression analysis uses outdated market data leading to wrong factor coefficients
      and strategies that fail to adapt to current market conditions
    derived_from_bd_id: BD-GAP-003
  - id: finance-C-172
    when: When implementing or modifying any code involving random number generation
    action: Assume all random number generators have reproducible seeds by default; random seed coverage is confirmed incomplete,
      and numpy.random, random module, and other RNG calls may lack explicit seed configuration
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Using non-reproducible RNG seeds causes backtest results to vary between runs, making it impossible to reproduce
      strategy performance or verify code changes against established benchmarks
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-173
    when: When implementing any code involving random number generation for factor analysis
    action: Audit all numpy.random, random, and other RNG calls in the codebase; verify each has an explicit seed parameter
      or inherits from a global RNG with a configurable seed; call set_random_seed(global_seed) at initialization and document
      seed requirements in code comments
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Missing random seed coverage causes backtest results to be non-reproducible, preventing verification of strategy
      changes and making it impossible to distinguish code changes from random variation in results
    derived_from_bd_id: BD-GAP-006
  - id: finance-C-175
    when: When calculating composite index fund returns (IVV, XWD.TO)
    action: Use weighted average of components based on index constituent weights, not equal weighting; weighted averaging
      accurately represents actual index composition while equal weighting over-weights small-cap stocks
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using equal weighting instead of weighted averaging over-weights small-cap stocks that comprise small portions
      of indices, causing composite returns to diverge from actual index performance and distorting backtest results
    derived_from_bd_id: BD-011
  - id: finance-C-176
    when: When performing factor orthogonalization using OLS regression
    action: Validate homoscedasticity and absence of autocorrelation in factor residuals before applying OLS; if these assumptions
      are violated, use Generalized Least Squares (GLS) to handle heteroscedastic errors
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: OLS assumes homoscedastic errors and no autocorrelation; when these assumptions are violated, coefficient estimates
      are inefficient and standard errors are unreliable, so the orthogonalization may attribute variance incorrectly between market and idiosyncratic components
    derived_from_bd_id: BD-063
  - id: finance-C-178
    when: When loading data for parallel bulk regression processing
    action: Load all data before spawning worker processes and pass pre-loaded DataFrames via reference (e.g., itertools.repeat);
      do not implement per-worker data loading as it creates race conditions if underlying data changes between worker start
      times
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Per-worker data loading introduces race conditions where different workers may process inconsistent snapshots
      of data if the dataset changes between worker initialization, causing inconsistent regression results across parallel
      windows
    derived_from_bd_id: BD-024
  - id: finance-C-179
    when: When determining factor significance using rolling window analysis
    action: Apply majority rule threshold (>50% of non-overlapping periods significant) for factor significance determination;
      do not use stricter 100% threshold (every period significant) or lenient single-period threshold
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Using incorrect significance thresholds produces false positives (single-period) or false negatives (100%
      rule); majority rule guards against finding statistical significance by chance in small numbers of non-overlapping windows
      while remaining robust to power limitations
    derived_from_bd_id: BD-029
  - id: finance-C-180
    when: When implementing factor normalization across the framework
    action: Centralize normalization logic into a single, documented normalization function that applies both operations (FF factors
      divided by 100, and excess returns computed as Close minus risk-free rate); avoid distributed normalization code that applies
      these operations inconsistently across different modules
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Distributed normalization decisions (BD-017, BD-041, BD-081) create latent contradiction risk where future
      modifications may apply division/subtraction incorrectly, masking errors across multiple overlapping decisions and causing
      silent factor mis-specification
    derived_from_bd_id: BD-099
  - id: finance-C-182
    when: When implementing data merging logic for factor regression
    action: Use inner join to merge DataFrames by date; only periods with all required data should be included in regression
      analysis
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using outer join introduces NaN values that cause regression failures or produce biased coefficient estimates,
      making factor analysis unreliable and non-reproducible
    derived_from_bd_id: BD-013
  - id: finance-C-183
    when: When calculating returns for factor regression input
    action: Calculate excess returns as Close minus risk-free rate (Rf) to isolate the risk premium component that factors
      aim to explain
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using raw returns violates standard CAPM/FF financial econometrics methodology, producing meaningless factor
      loadings that cannot be compared to academic literature or used for portfolio construction
    derived_from_bd_id: BD-014
  - id: finance-C-184
    when: When storing orthogonalized factors in database or analysis outputs
    action: Append -ORTHO suffix to orthogonalized factor names to distinguish processed factors from raw factors in database
      queries and analysis scripts
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without the -ORTHO suffix, raw and orthogonalized factors become indistinguishable in database queries, causing
      data misuse where raw factors are accidentally used where orthogonalized ones are required
    derived_from_bd_id: BD-026
  - id: finance-C-185
    when: When implementing variable selection logic for factor regression
    action: Use p-value 0.1 as the significance threshold for variable inclusion — factors with p-value below 0.1 should be
      included in the model
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using a stricter 0.05 threshold may exclude economically meaningful factors, while a looser 0.2 threshold may
      introduce overfitting; incorrect threshold selection leads to either underfitting or overfitting
    derived_from_bd_id: BD-064
  - id: finance-C-186
    when: When implementing factor orthogonalization for multi-factor models
    action: Use OLS residuals as orthogonalized factor values — regress each factor against market and use residuals to capture
      the component orthogonal to market
    severity: high
    kind: domain_rule
    modality: must
    consequence: OLS orthogonalization assumes a linear relationship between factors and market; non-linear patterns are captured
      in residuals, enabling multi-factor models without multicollinearity; alternative methods (Schmidt, Gram-Schmidt) may
      produce different results
    derived_from_bd_id: BD-066
  - id: finance-C-187
    when: When implementing sliding window regression loop termination logic
    action: Check that data_end_date <= end_date as the loop termination condition — return (None, False) when data extends
      beyond requested range to prevent silent failures
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without this termination check, the loop produces incomplete or misleading regression results when data extends
      beyond requested range, causing silent data loss at boundaries
    derived_from_bd_id: BD-082
  - id: finance-C-188
    when: When implementing parallel OLS computations using multiprocessing
    action: Use spawn method with explicit data duplication via itertools.repeat for each worker — pass carbon_data, ff_data,
      and rf_data as independent copies to prevent shared-state corruption
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using fork method with shared memory causes race conditions in pandas operations, producing non-reproducible
      results across runs and potentially incorrect factor coefficients
    derived_from_bd_id: BD-086
  - id: finance-C-189
    when: When implementing outlier filtering for returns data before regression
    action: Clip returns to [-0.5, 0.5] (±50% single-period returns) to exclude data entry errors and survivorship bias artifacts
      while capturing legitimate extreme events
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without proper outlier bounds, extreme returns disproportionately influence OLS coefficients, which are sensitive
      to leverage points; standard-deviation based bounds are rejected due to sensitivity to the very outliers they would
      filter
    derived_from_bd_id: BD-087
  - id: finance-C-190
    when: When processing datetime fields from multi-timezone data sources (US, EU, Asia markets)
    action: Assume the framework handles UTC timezone normalization for datetime fields — the framework does not implement
      timezone awareness for stock_data, carbon_risk_factor, and ff_factor tables
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without UTC timezone normalization, multi-timezone data ingestion produces incorrect timestamp alignment,
      causing cross-market factor regression to use mismatched dates and produce invalid results
    derived_from_bd_id: BD-GAP-011
  - id: finance-C-191
    when: When processing datetime fields from multi-timezone data sources
    action: Add explicit timezone conversion ensuring all datetime fields are normalized to UTC before storage; add tzinfo
      awareness to stock_data, carbon_risk_factor, and ff_factor tables using standard library timezone utilities
    severity: high
    kind: domain_rule
    modality: must
    consequence: Multi-timezone data ingestion without explicit timezone handling causes timestamp misalignment across markets,
      making factor regression results unreliable and non-reproducible
    derived_from_bd_id: BD-GAP-011
  - id: finance-C-192
    when: When implementing or refactoring stochastic operations in factor regression scripts
    action: Assume the framework sets random seeds for reproducibility — stochastic operations do not have deterministic behavior
      by default
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without random seed initialization, stochastic operations produce different results on each run, compromising
      research reproducibility, model comparison, and peer review
    derived_from_bd_id: BD-GAP-012
  - id: finance-C-193
    when: When implementing or refactoring stochastic operations in factor regression scripts
    action: Add random.seed(42) or equivalent seed initialization to all stochastic operations in regression_function.py
      and correlate.py; document reproducibility requirements in both modules
    severity: high
    kind: domain_rule
    modality: must
    consequence: Research results without seeded randomness cannot be reproduced, compromising model comparison and peer review
      — different runs produce different factor coefficients, making validation impossible
    derived_from_bd_id: BD-GAP-012
  - id: finance-C-194
    when: When implementing factor regression or using FF factor data in calculations
    action: Divide stored FF factor values (RF, market premium, SMB, HML, etc.) by 100 before passing to OLS or statistical
      functions — factors are stored as percentages (e.g., 5.2) but must be converted to decimals (0.052) for proper model
      estimation
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using undivided percentage values in OLS regression causes coefficients to be off by a factor of 100, making
      factor loadings meaningless and alpha estimates completely incorrect
    derived_from_bd_id: BD-017
  - id: finance-C-195
    when: When selecting or switching stock price data sources in production pipelines
    action: Verify any data source alternative to yfinance provides adjusted close prices that account for dividends and splits
      — verify the source delivers total return data, not just price returns
    severity: high
    kind: domain_rule
    modality: must
    consequence: Switching to a data source that provides unadjusted close prices causes return calculations to overstate
      actual investment performance by ignoring dividend reinvestment, leading to systematic overestimation of backtested
      returns
    derived_from_bd_id: BD-039
  - id: finance-C-196
    when: When constructing value-investing factors or screening stocks by fundamentals
    action: Verify that EBITDA, enterprise value, P/E ratio, cash, debt, and shares outstanding fields are populated for all
      target stocks before executing EV/EBITDA or P/E screening; handle NULL/missing values explicitly (do not assume default
      zero)
    severity: high
    kind: domain_rule
    modality: must
    consequence: Screening for value stocks using incomplete fundamental data causes stocks with missing P/E or EV fields
      to be incorrectly included or excluded, distorting factor returns and potentially selecting overvalued stocks
    derived_from_bd_id: BD-061
  - id: finance-C-197
    when: When importing Fama-French factor files into the database
    action: Preprocess FF CSV files with grep to remove header rows before executing COPY FROM PROGRAM — the raw FF files
      contain metadata headers that would cause parsing errors or corrupt factor data if loaded directly
    severity: high
    kind: domain_rule
    modality: must
    consequence: Loading FF factor files without header removal causes PostgreSQL to interpret header strings as numeric factor
      values, corrupting all factor data and producing NaN coefficients in regression outputs
    derived_from_bd_id: BD-006
  - id: finance-C-198
    when: When implementing BMG factor calculation in factor analysis
    action: Verify BMG is calculated as brown minus green (merged['return_x'] - merged['return_y']), NOT green minus brown
    severity: high
    kind: domain_rule
    modality: must
    consequence: Reversing the subtraction order flips the sign of every BMG return and inverts the factor direction,
      corrupting all downstream alpha calculations and portfolio exposure estimates
    derived_from_bd_id: BD-079
  - id: finance-C-199
    when: When running factor regressions without explicit frequency parameter
    action: Verify frequency matches the actual data source periodicity; be aware that MONTHLY default uses 60 months while
      DAILY uses 730 days with 12x memory footprint
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using DAILY frequency on monthly-sourced fundamental data produces misleadingly precise estimates that don't
      reflect actual data availability, inflating apparent statistical power while degrading model validity
    derived_from_bd_id: BD-080
  - id: finance-C-200
    when: When preparing data for factor regression analysis
    action: Divide factor returns by 100 to convert percentage to decimal form AND subtract risk-free rate from Close to isolate
      excess returns
    severity: high
    kind: domain_rule
    modality: must
    consequence: Omitting the /100 normalization produces coefficients 100x larger than expected, making interpretation impossible
      and comparisons across factors misleading
    derived_from_bd_id: BD-081
  - id: finance-C-201
    when: When using Durbin-Watson test to detect autocorrelation in factor regression residuals
    action: Be aware that DW test has a 'dead zone' (1.5 to 2.5) where it cannot detect autocorrelation; use Breusch-Godfrey
      test or Newey-West HAC standard errors when DW is inconclusive
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: DW values between 1.5 and 2.5 indicate inconclusive results where autocorrelation may be present but undetected,
      leading to underestimated standard errors and spurious statistical significance
    derived_from_bd_id: BD-076
  - id: finance-C-202
    when: When implementing or refactoring interval parameter handling in rolling regression logic
    action: Treat interval=0 as a request for default history length (60 months for MONTHLY, 730 days for DAILY) — do not
      interpret interval=0 as zero-length window or skip processing
    severity: high
    kind: domain_rule
    modality: must
    consequence: Interpreting interval=0 as zero produces empty windows, causing rolling regressions to return NaN or default
      to incomplete samples, making all beta estimates and BMG premium conclusions unreliable
    derived_from_bd_id: BD-092
  - id: finance-C-203
    when: When using the framework's default interval parameter for rolling window calculations
    action: Verify that interval=0 defaults align with the intended statistical sample size (60 months for monthly data, 730
      days for daily data) and document any override of these defaults
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Default interval=0 produces 5-year monthly or 2-year daily samples; changing defaults to shorter windows
      reduces statistical power while longer windows may include structural breaks, both distorting BMG premium estimates
    derived_from_bd_id: BD-092
  - id: finance-C-204
    when: When implementing outlier filtering logic in the returns processing pipeline
    action: 'Verify consistent outlier threshold application: winsorization threshold (±0.5 or ±50%) must match or be derived
      from the abnormal return filter threshold (<=1.0 or 100%) — document any intentional asymmetry'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Inconsistent thresholds (±50% winsorization vs <=100% filter) cause the same return data to be treated differently
      across pipeline stages, biasing factor loadings toward moderate-return stocks and distorting BMG premium conclusions
    derived_from_bd_id: BD-095
  - id: finance-C-206
    when: When implementing return calculation logic that multiplies percentage values
    action: Use Decimal type for percentage-weighted return multiplication to prevent floating-point rounding errors — do
      not replace Decimal with float even if performance concerns are raised
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using float instead of Decimal in percentage multiplication introduces rounding errors that accumulate across
      thousands of transactions, causing reported returns to differ from actual returns by basis points that compound into
      material losses in high-frequency strategies
    derived_from_bd_id: BD-078
  - id: finance-C-207
    when: When implementing data merge logic in the factor regression pipeline
    action: 'Maintain the sequential merge order: stock → carbon → ff → rf using inner joins on dates — do not reorder or
      use different join types'
    severity: high
    kind: domain_rule
    modality: must
    consequence: Reversing the merge order changes which dates are retained at each step, producing different filtered datasets
      and invalidating regression results. The invariant that all final rows contain dates present across all four datasets
      would be broken.
    derived_from_bd_id: BD-089
  - id: finance-C-208
    when: When calculating portfolio returns, equity curves, or position PnL
    action: Assume the framework automatically conserves PnL across realized and unrealized components — the framework does
      not implement PnL conservation validation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without PnL conservation validation, realized and unrealized PnL components can diverge silently, causing
      equity curve misstatement and incorrect performance attribution in backtesting results
    derived_from_bd_id: BD-GAP-004
  - id: finance-C-209
    when: When implementing PnL tracking and portfolio accounting
    action: Verify total_pnl = realized_pnl + unrealized_pnl is mathematically conserved across all calculations; validate
      that closing a position correctly transfers unrealized PnL to realized PnL and maintains the invariant at every rebalance
      point
    severity: high
    kind: domain_rule
    modality: must
    consequence: PnL components that fail to reconcile indicate accounting errors; in live trading this causes position discrepancies
      and incorrect profit/loss reporting
    derived_from_bd_id: BD-GAP-004
  - id: finance-C-210
    when: When performing model training, backtesting, or evaluating strategy performance
    action: Assume the framework enforces strict temporal train/test split integrity without data leakage — the framework
      does not implement temporal split validation
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without temporal split integrity enforcement, future data can leak into training sets, causing look-ahead
      bias where backtest results are systematically inflated compared to live trading
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-211
    when: When implementing train/test split logic for factor regression models
    action: Verify that all training samples precede test samples chronologically using split_date as the boundary; verify
      no temporal overlap exists and that the split is deterministic based on experiment_id for reproducibility
    severity: high
    kind: domain_rule
    modality: must
    consequence: Temporal data leakage causes strategies to appear profitable in backtesting but fail in live trading, leading
      to direct financial losses from deploying overfitted models
    derived_from_bd_id: BD-GAP-005
  - id: finance-C-212
    when: When running factor regressions, model experiments, or any reproducible analysis
    action: Assume the framework tracks data versions and experiment lineage automatically — the framework does not implement
      data versioning or run tracking columns
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Without data versioning and run tracking, exact model state cannot be reproduced, making A/B testing impossible
      and invalidating scientific claims about strategy improvements
    derived_from_bd_id: BD-GAP-014
  - id: finance-C-213
    when: When defining factor tables and regression metadata schema
    action: Add data_version column to each factor table to track which input dataset was used; populate run_id and experiment_id
      columns in the regression_results table for every execution to enable traceability and A/B comparison
    severity: high
    kind: domain_rule
    modality: must
    consequence: Missing versioning metadata makes it impossible to reproduce model state, conduct valid A/B tests between
      strategy versions, or diagnose why performance changed over time
    derived_from_bd_id: BD-GAP-014
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-105 / Sector Stock Count and Significant Factor Regression Analyzer
    version: v5.3
    intent_keywords:
    - sector composition
    - significant regression
    - p-value screening
    - stock sectors
    - factor analysis
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: auto-grouped by UC.type (4 distinct values, balanced distribution)
      groups:
      - group_id: screening
        name: Screening
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-101
          name: Sector Stock Count and Significant Factor Regression Analyzer
          short_description: Identifies how many stocks from an index fall into each sector and screens for stocks with statistically
            significant factor regression results based o
          sample_triggers:
          - sector composition
          - significant regression
          - p-value screening
      - group_id: research_analysis
        name: Research Analysis
        description: ''
        emoji: 📦
        uc_count: 3
        ucs:
        - uc_id: UC-102
          name: Factor Correlation Calculator
          short_description: Computes correlations between different factors over time to understand factor relationship dynamics
            and potential multicollinearity issues
          sample_triggers:
          - factor correlation
          - correlation matrix
          - factor relationships
        - uc_id: UC-103
          name: OLS Regression with Diagnostic Statistics
          short_description: 'Performs ordinary least squares regression on factor data with comprehensive diagnostic tests
            including Durbin-Watson, Jarque-Bera, and Breusch-Pagan '
          sample_triggers:
          - OLS regression
          - diagnostic tests
          - statistical tests
        - uc_id: UC-107
          name: Multi-Stock Factor Regression Runner
          short_description: Executes factor regression analysis across multiple stocks in parallel using multiprocessing,
            loading Fama-French and carbon risk factors from databas
          sample_triggers:
          - regression
          - multiprocessing
          - parallel analysis
      - group_id: data_pipeline
        name: Data Pipeline
        description: ''
        emoji: 📊
        uc_count: 4
        ucs:
        - uc_id: UC-104
          name: Fama-French Factor Model Generator
          short_description: Builds custom Fama-French style factor models by merging stock returns, Fama-French factors,
            and carbon risk factors into unified datasets for analysi
          sample_triggers:
          - Fama-French model
          - factor model
          - carbon risk
        - uc_id: UC-105
          name: Stock Price Data Downloader
          short_description: Downloads historical stock price data from Yahoo Finance with support for daily and monthly frequencies,
            including automatic retry on timeout
          sample_triggers:
          - stock prices
          - price download
          - yfinance
        - uc_id: UC-108
          name: Stock Data Import and Update
          short_description: Imports stock return data from CSV or downloads from yfinance, with support for incremental updates
            to maintain current database with stock returns
          sample_triggers:
          - import stocks
          - stock data
          - data import
        - uc_id: UC-109
          name: Database Schema Initialization and Data Import
          short_description: Initializes database schema and imports Fama-French, bond, and carbon risk factors into PostgreSQL
            tables, including BMG factor data
          sample_triggers:
          - database setup
          - schema initialization
          - factor import
      - group_id: builtin_factor
        name: Builtin Factor
        description: ''
        emoji: 🧮
        uc_count: 1
        ucs:
        - uc_id: UC-106
          name: BMG Factor Series Creator
          short_description: Creates Brown-Minus-Green (BMG) factor series by calculating the return differential between brown
            (high carbon) and green (low carbon) stocks for carbon ri
          sample_triggers:
          - BMG factor
          - carbon risk
          - brown green stocks
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try sector stock count and significant factor regression analyzer
      auto_selected: true
    - uc_id: UC-102
      beginner_prompt: Try factor correlation calculator
      auto_selected: true
    - uc_id: UC-103
      beginner_prompt: Try OLS regression with diagnostic statistics
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all 9 capabilities.
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
      verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - OLS Regression with Diagnostic Statistics
    - Factor Correlation Calculator
    - Sector Stock Count and Significant Factor Regression Analyzer
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds

ClawHub Coding Testing+2

T@clawhub-tangweigang-jpg-8679fec286

Ccxt Crypto Api

Skill

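The CCXT library provides a unified wrapper around the trading APIs of major cryptocurrency exchanges worldwide, supporting core operations such as order management, market data queries, account balance monitoring, and automated lending.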

---
name: ccxt-crypto-api
description: |-
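  The CCXT library provides a unified wrapper around the trading APIs of major cryptocurrency exchanges worldwide, supporting core operations such as order management, market data queries, account balance monitoring, and automated lending.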
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-111"
  compiled_at: "2026-04-22T13:00:53.651332+00:00"
  capability_markets: "crypto"
  capability_activities: "crypto-trading"
  sop_version: "crystal-compilation-v6.1"
---
# CCXT Crypto Trading API (ccxt-crypto-api)

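> The CCXT library provides a unified wrapper around the trading APIs of major cryptocurrency exchanges worldwide, supporting core operations such as order management, market data queries, account balance monitoring, and automated lending.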

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (100 total)

### Bitfinex fUST Lending Bot (`UC-101`)
Automates cryptocurrency lending on Bitfinex by checking for lending opportunities and executing market orders to deploy funds into lending markets
**Triggers**: lending, bot, bitfinex

### Cross-Exchange Spot Arbitrage Bot (`UC-102`)
Scans multiple exchanges (OKX, Bybit, Binance, KuCoin, BitMart, Gate.io) for price discrepancies in spot markets and executes arbitrage trades
**Triggers**: arbitrage, spot trading, cross-exchange

### Binance Create and Cancel Order (`UC-103`)
Demonstrates creating a limit order on Binance and then canceling it, useful for testing order workflows
**Triggers**: create order, cancel order, binance

For all **100** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
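
Two of the locks above are simple enough to check mechanically. The following is a minimal sketch, assuming a trading signal is represented as a plain `dict` and an entity ID as a string; the function names `check_entity_id` and `check_sl05` are hypothetical and are not part of ZVT or of this skill's validator.

```python
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id: str) -> tuple[str, str, str]:
    """SL-03 sketch: split an ID into (entity_type, exchange, code)."""
    # Split on the first two underscores only, so the code part may contain "_".
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"SL-03 violation: {entity_id!r} is not entity_type_exchange_code")
    return parts[0], parts[1], parts[2]


def check_sl05(signal: dict) -> None:
    """SL-05 sketch: exactly one sizing field may be set (not None)."""
    set_fields = [name for name in SIZING_FIELDS if signal.get(name) is not None]
    if len(set_fields) != 1:
        raise ValueError(f"SL-05 violation: expected exactly one sizing field, got {set_fields}")
```

For example, `check_entity_id("stock_sh_600000")` returns the three components, while a signal carrying both `position_pct` and `order_money` is rejected.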

## Top Anti-Patterns (13 total)

- **`AP-CRYPTO-TRADING-001`**: Float Arithmetic for Monetary Values
- **`AP-CRYPTO-TRADING-002`**: Missing Market Initialization Before Access
- **`AP-CRYPTO-TRADING-003`**: Bypassing API Facade Layer

All 13 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
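
As an illustration of `AP-CRYPTO-TRADING-001`, the sketch below keeps monetary values in `decimal.Decimal` rather than `float`. The helper names `order_cost` and `quantize_amount` and the string-typed inputs are assumptions made for this example; they are not taken from CCXT.

```python
from decimal import Decimal, ROUND_DOWN


def order_cost(price: str, amount: str, fee_rate: str) -> Decimal:
    """Notional plus proportional fee, computed without binary floats."""
    notional = Decimal(price) * Decimal(amount)
    return notional + notional * Decimal(fee_rate)


def quantize_amount(amount: Decimal, step: str) -> Decimal:
    """Round an order amount down to a multiple of the exchange step size."""
    step_dec = Decimal(step)
    return (amount / step_dec).to_integral_value(rounding=ROUND_DOWN) * step_dec
```

Building each `Decimal` from a string matters: `Decimal(0.1)` would inherit the float's binary representation error, whereas `Decimal("0.1")` is exact.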

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-111. Evidence verify ratio = 60.5% and audit fail total = 9. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
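| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative definition (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 13 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |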

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-111` blueprint at 2026-04-22T13:00:53.651332+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Binance Create and Cancel Order
  - Cross-Exchange Spot Arbitrage Bot
  - Bitfinex fUST Lending Bot
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **13**

## ccxt (1)

### `AP-CRYPTO-TRADING-002` — Missing Market Initialization Before Access <sub>(high)</sub>

Attempting to access market data via symbol lookups before load_markets() is called leaves self.markets empty, causing KeyError or BadSymbol exceptions on all trading operations and data retrieval. This breaks the entire trading workflow at the first market interaction.
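A minimal sketch of the guard this anti-pattern calls for. It relies only on the `markets` attribute and `load_markets()` method named above; `get_market` and `FakeExchange` are hypothetical names introduced here so the example runs offline, and are not part of ccxt.

```python
from typing import Any


def get_market(exchange: Any, symbol: str) -> dict:
    """Return the market entry for `symbol`, loading markets first if needed."""
    # `markets` stays None/empty until load_markets() has run.
    if not getattr(exchange, "markets", None):
        exchange.load_markets()
    try:
        return exchange.markets[symbol]
    except KeyError:
        raise KeyError(f"unknown symbol after load_markets(): {symbol}") from None


class FakeExchange:
    """Hypothetical stand-in for an exchange object; not a ccxt class."""

    def __init__(self) -> None:
        self.markets = None
        self.load_calls = 0

    def load_markets(self) -> dict:
        self.load_calls += 1
        self.markets = {
            "BTC/USDT": {"symbol": "BTC/USDT", "base": "BTC", "quote": "USDT"}
        }
        return self.markets
```

Routing every symbol lookup through one helper like this keeps the initialization check in a single place instead of relying on call order.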

## cryptofeed (3)

### `AP-CRYPTO-TRADING-009` — Applying Order Book Deltas Before Snapshot <sub>(high)</sub>

Processing order book delta messages before receiving a snapshot for the symbol applies updates to an uninitialized or stale book state. Price levels are incorrectly added/removed, corrupting the local book representation with no way to recover without full reset.

### `AP-CRYPTO-TRADING-010` — Silent HTTP Error Handling <sub>(medium)</sub>

Ignoring non-200 HTTP response status codes without raising exceptions causes silent failures for data requests. Market data is missing or corrupted, failed requests are not retried, and downstream consumers receive incomplete data with no indication of failure.

### `AP-CRYPTO-TRADING-011` — Missing Sequence Number Validation <sub>(medium)</sub>

Not validating that order book sequence numbers increment by exactly 1 allows out-of-order or missing messages to corrupt local book state. Stale or incorrect price levels persist in the book, leading to wrong trading signals and corrupted market depth data.
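The snapshot-first and sequence rules above can be sketched together in one small class. This is an illustrative model, not cryptofeed's implementation; `LocalOrderBook`, `BookOutOfSync`, and the `(side, price, size, seq)` delta shape are assumptions made for the example.

```python
from decimal import Decimal


class BookOutOfSync(Exception):
    """Raised when the local book cannot safely accept an update."""


class LocalOrderBook:
    def __init__(self) -> None:
        self.bids: dict[Decimal, Decimal] = {}
        self.asks: dict[Decimal, Decimal] = {}
        self.last_seq = None  # None means "no snapshot yet"

    def apply_snapshot(self, bids, asks, seq: int) -> None:
        self.bids = dict(bids)
        self.asks = dict(asks)
        self.last_seq = seq

    def apply_delta(self, side: str, price: Decimal, size: Decimal, seq: int) -> None:
        if self.last_seq is None:
            raise BookOutOfSync("delta received before snapshot")
        if seq != self.last_seq + 1:
            # Drop the state so the caller must request a fresh snapshot.
            expected = self.last_seq + 1
            self.last_seq = None
            raise BookOutOfSync(f"sequence gap: expected {expected}, got {seq}")
        levels = self.bids if side == "bid" else self.asks
        if size == 0:
            levels.pop(price, None)  # size 0 removes the price level
        else:
            levels[price] = size
        self.last_seq = seq
```

Resetting `last_seq` to `None` on a gap makes every later delta fail until a new snapshot arrives, which is the "full reset" recovery path described above.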

## hummingbot (5)

### `AP-CRYPTO-TRADING-005` — Unvalidated Collateral for Order Execution <sub>(high)</sub>

Submitting orders without checking collateral requirements including order cost, percent fees, and fixed fees against available balance causes orders to exceed margin. This triggers immediate liquidation or forced position closure at unfavorable prices with partial or total loss of collateral.
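A minimal sketch of the pre-submission check, assuming the three cost components listed above simply add up. The function names are hypothetical and this is not hummingbot's budget checker, which handles more cases.

```python
from decimal import Decimal


def required_collateral(
    price: Decimal, amount: Decimal, percent_fee: Decimal, fixed_fee: Decimal
) -> Decimal:
    """Order cost plus percent fee plus fixed fee, all in quote currency."""
    order_cost = price * amount
    return order_cost + order_cost * percent_fee + fixed_fee


def can_submit(
    available: Decimal,
    price: Decimal,
    amount: Decimal,
    percent_fee: Decimal,
    fixed_fee: Decimal,
) -> bool:
    """True only if the available balance covers the full requirement."""
    return required_collateral(price, amount, percent_fee, fixed_fee) <= available
```

For example, buying 2 units at 100 with a 0.1% fee and a 0.5 fixed fee needs 200.7 of balance, so an account holding exactly 200 should be rejected before the order reaches the exchange.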

### `AP-CRYPTO-TRADING-006` — Close Order Placed Before Open Order Fills <sub>(high)</sub>

Placing a close order before verifying the open order is fully filled causes mismatched position sizes. The executor attempts to close a larger or smaller position than actually exists, leading to unintended directional exposure and potential losses exceeding the configured risk parameters.

### `AP-CRYPTO-TRADING-007` — Arbitrage Across Non-Interchangeable Tokens <sub>(high)</sub>

Executing arbitrage trades between tokens that appear similar but are not interchangeable causes permanent loss of funds. The received tokens cannot be used to close the opposing position, stranding capital and creating one-sided exposure with no recovery path.

### `AP-CRYPTO-TRADING-008` — Skipping Triple Barrier Evaluations <sub>(high)</sub>

Omitting control_stop_loss, control_take_profit, or control_time_limit calls in the control_barriers cycle leaves positions unprotected. Losses exceed configured thresholds as barrier checks never trigger, positions remain open beyond risk tolerance, resulting in amplified losses.
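To illustrate why all three checks belong in the same cycle, here is a simplified barrier evaluation for a long position. It is a sketch under stated assumptions: `evaluate_barriers` and its parameters are hypothetical, thresholds are fractions of the entry price, and this is not hummingbot's executor code.

```python
from decimal import Decimal


def evaluate_barriers(
    entry_price: Decimal,
    last_price: Decimal,
    opened_at: float,
    now: float,
    stop_loss: Decimal,
    take_profit: Decimal,
    time_limit_s: float,
):
    """Return the first barrier hit for a long position, or None."""
    pnl_pct = (last_price - entry_price) / entry_price
    if pnl_pct <= -stop_loss:
        return "STOP_LOSS"
    if pnl_pct >= take_profit:
        return "TAKE_PROFIT"
    if now - opened_at >= time_limit_s:
        return "TIME_LIMIT"
    return None
```

Dropping any one branch leaves a class of exits that never fires, which is the failure mode described above.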

### `AP-CRYPTO-TRADING-012` — Wrong Position Key for Perpetual Modes <sub>(medium)</sub>

Using trading_pair only as the position key in HEDGE mode causes different position sides to collide and overwrite each other. Position tracking becomes incorrect, leading to wrong order matching and potential financial loss when the system misidentifies position direction.
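The collision can be avoided by including the position side in the key whenever HEDGE mode is active. The helper below is a hypothetical sketch; the mode and side strings and the `(pair, side)` tuple key are assumptions for illustration, not hummingbot's actual key format.

```python
def position_key(trading_pair: str, position_mode: str, side=None):
    """Build a dictionary key that is unique per tracked position."""
    if position_mode == "HEDGE":
        if side not in ("LONG", "SHORT"):
            raise ValueError("HEDGE mode requires side to be 'LONG' or 'SHORT'")
        return (trading_pair, side)
    # ONEWAY mode holds at most one position per pair.
    return (trading_pair, None)
```

With this key, a long and a short on `BTC-USDT` occupy separate entries in HEDGE mode instead of overwriting each other.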

## rotki (3)

### `AP-CRYPTO-TRADING-003` — Bypassing API Facade Layer <sub>(high)</sub>

Directly accessing internal service methods without routing through the RestAPI facade bypasses authentication, task tracking, and error handling mechanisms. Anonymous requests can execute privileged operations, creating critical security vulnerabilities where unauthorized users access sensitive financial data or execute trades.

### `AP-CRYPTO-TRADING-004` — Non-Checksummed EVM Addresses <sub>(high)</sub>

Passing lowercase or mixed-case Ethereum addresses to RPC nodes causes InvalidAddress exceptions since nodes enforce EIP-55 checksum format. This results in RemoteError failures that halt all blockchain data collection for the affected chain, with no graceful degradation or fallback.

### `AP-CRYPTO-TRADING-013` — Overwriting User-Customized Event Classifications <sub>(medium)</sub>

Re-decoding operations silently replace user-modified events marked as CUSTOMIZED without explicit user action. User edits to event classifications are permanently lost, causing incorrect accounting treatment and potential tax reporting errors that may not be detected until audit.

## rotki, hummingbot, cryptofeed, ccxt (1)

### `AP-CRYPTO-TRADING-001` — Float Arithmetic for Monetary Values <sub>(high)</sub>

Using Python float type instead of Decimal for price, amount, balance, PnL, and other financial calculations causes precision errors due to binary floating-point representation. Rounding errors compound across multiple calculations, leading to incorrect order sizing, wrong profit/loss reporting, and potentially incorrect trading decisions or tax calculations.
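A short demonstration of the compounding error, using only the standard library. The variable names are illustrative.

```python
from decimal import Decimal

# Ten additions of 0.1 as binary floats do not sum to exactly 1.0.
float_total = sum([0.1] * 10)

# The same sum with Decimal values built from strings is exact.
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

# Pitfall: building a Decimal from a float carries the float's error along.
from_float = Decimal(0.1)
from_string = Decimal("0.1")
```

Here `float_total == 1.0` is false while `decimal_total == Decimal("1")` is true, and `from_float` differs from `from_string`, so monetary values should be parsed from strings rather than converted from floats.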

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-111--ccxt
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 21, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [market_and_currency_loading](components/market_and_currency_loading.md): 3 classes
- [api_request_construction](components/api_request_construction.md): 3 classes
- [network_request_execution](components/network_request_execution.md): 3 classes
- [response_parsing_and_normalization](components/response_parsing_and_normalization.md): 3 classes
- [websocket_real-time_streaming](components/websocket_real-time_streaming.md): 4 classes
- [trading_operations](components/trading_operations.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 138
  fatal_constraints_count: 46
  non_fatal_constraints_count: 175
  use_cases_count: 100
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
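A minimal sketch of a format check for this lock. `parse_entity_id` is a hypothetical helper written for illustration, not a ZVT function; it assumes the entity type and exchange contain no underscores, as in `stock_sh_600000`.

```python
def parse_entity_id(entity_id: str) -> tuple[str, str, str]:
    """Split 'entity_type_exchange_code' into its three parts."""
    parts = entity_id.split("_", 2)  # the code itself may contain underscores
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    entity_type, exchange, code = parts
    return entity_type, exchange, code
```

For instance, `parse_entity_id("stock_sh_600000")` yields the parts `stock`, `sh`, and `600000`, while a bare ticker such as `600000` is rejected.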

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
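The "exactly one" rule can be checked before a signal is constructed. The function below is a hypothetical validator using the three field names from the lock; it does not reproduce ZVT's `TradingSignal` class.

```python
def validate_sizing(position_pct=None, order_money=None, order_amount=None) -> str:
    """Return the name of the single sizing field set, or raise ValueError."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = [name for name, value in fields.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"exactly one sizing field is required, got {provided or 'none'}"
        )
    return provided[0]
```

Comparing against `None` rather than truthiness means an explicit `0` still counts as a provided value.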

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
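The three-valued semantics are easy to get wrong because `NaN` is truthy in Python. A minimal sketch of the mapping, where `action_for` is a hypothetical helper and values are assumed to be `bool`, `None`, or a float `NaN` (other missing-value markers are not handled):

```python
import math


def action_for(filter_result):
    """Map a filter_result cell to 'BUY', 'SELL', or None (no action)."""
    if filter_result is None:
        return None
    # bool(float('nan')) is True, so NaN must be checked before truthiness.
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return None
    return "BUY" if bool(filter_result) else "SELL"
```

Treating a missing value as `False` would turn "no action" into a sell, so the missing-value checks come first.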

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
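`PC-03` and `PC-04` can be combined into one reusable check. The sketch below mirrors the one-liners above using only the standard library; `resolve_zvt_home` and `check_zvt_home` are hypothetical helper names, and the `.write_test` probe file follows `PC-04`.

```python
import os
from pathlib import Path


def resolve_zvt_home() -> Path:
    """ZVT_HOME if set, otherwise ~/.zvt (same default as PC-03)."""
    return Path(os.environ.get("ZVT_HOME", Path.home() / ".zvt"))


def check_zvt_home(zvt_home: Path) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not zvt_home.exists():
        problems.append(f"ZVT home not found: {zvt_home}")
        return problems
    probe = zvt_home / ".write_test"
    try:
        probe.touch()
        probe.unlink()
    except OSError as exc:
        problems.append(f"ZVT home is not writable: {zvt_home} ({exc})")
    return problems


if __name__ == "__main__":
    for problem in check_zvt_home(resolve_zvt_home()):
        print(problem)
```

Returning messages instead of asserting lets a caller report every failed precondition alongside its `on_fail` hint.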

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **100**

## `KUC-101`
**Source**: `examples/bots/py/bitfinex-lending-bot.py`

Automates cryptocurrency lending on Bitfinex by checking for lending opportunities and executing market orders to deploy funds into lending markets.

## `KUC-102`
**Source**: `examples/bots/py/spot-arbitrage-bot.py`

Scans multiple exchanges (OKX, Bybit, Binance, KuCoin, BitMart, Gate.io) for price discrepancies in spot markets and executes arbitrage trades.

## `KUC-103`
**Source**: `examples/ccxt.pro/py/binance-create-order-cancel-order.py`

Demonstrates creating a limit order on Binance and then canceling it, useful for testing order workflows.

## `KUC-104`
**Source**: `examples/ccxt.pro/py/binance-fetch-balance-snapshot-watch-balance-updates.py`

Captures initial balance snapshot and continuously monitors for balance updates via WebSocket, printing changes when they occur.

## `KUC-105`
**Source**: `examples/ccxt.pro/py/binance-futures-watch-balance.py`

Continuously watches futures, delivery, and spot balances on Binance simultaneously using asyncio.

## `KUC-106`
**Source**: `examples/ccxt.pro/py/binance-futures-watch-order-book.py`

Streams real-time order book updates for BTC/USDT futures contract on Binance.

## `KUC-107`
**Source**: `examples/ccxt.pro/py/binance-futures.py`

Continuously monitors and prints order book updates with timestamps for BTC/USDT on Binance.

## `KUC-108`
**Source**: `examples/ccxt.pro/py/binance-reload-markets.py`

Periodically reloads market data from Binance while simultaneously watching order books, ensuring market data stays current.

## `KUC-109`
**Source**: `examples/ccxt.pro/py/binance-spot-and-futures.py`

Watches multiple order books across different market types (spot, futures) and displays them together.

## `KUC-110`
**Source**: `examples/ccxt.pro/py/binance-watch-many-orderbooks.py`

Subscribes to order book updates for multiple trading pairs simultaneously on Binance, printing each update.

## `KUC-111`
**Source**: `examples/ccxt.pro/py/binance-watch-margin-balance.py`

Monitors margin account balance changes on Binance via WebSocket, printing updates when margin positions change.

## `KUC-112`
**Source**: `examples/ccxt.pro/py/binance-watch-ohlcv.py`

Streams real-time OHLCV (candlestick) data for ETH/USDT on Binance with configurable timeframe and limit.

## `KUC-113`
**Source**: `examples/ccxt.pro/py/binance-watch-order-book-individual-updates.py`

Captures and displays individual high-frequency order book updates by subclassing Binance exchange to intercept messages.

## `KUC-114`
**Source**: `examples/ccxt.pro/py/binance-watch-orderbook-watch-balance.py`

Simultaneously monitors order book and balance updates, displaying them together with common handler logic.

## `KUC-115`
**Source**: `examples/ccxt.pro/py/binance-watch-orders-being-placed.py`

Watches active orders and balance updates while also placing delayed orders to demonstrate order lifecycle monitoring.

## `KUC-116`
**Source**: `examples/ccxt.pro/py/binance-watch-spot-futures-balances-continuously.py`

Continuously monitors balances across multiple Binance accounts (spot, USD-M futures, COIN-M futures) and prints all currency totals.

## `KUC-117`
**Source**: `examples/ccxt.pro/py/bitmex_watch_ohlcv.py`

Streams real-time OHLCV candlestick data for BTC/USD perpetual contract on Bitmex with formatted table output.

## `KUC-118`
**Source**: `examples/ccxt.pro/py/bitmex_watch_ticker_and_ohlcv.py`

Simultaneously streams ticker data and OHLCV candlesticks on Bitmex with color-coded output for visual distinction.

## `KUC-119`
**Source**: `examples/ccxt.pro/py/bitvavo-watch-order-book.py`

Streams real-time order book updates for BTC/EUR on Bitvavo European exchange with nonce verification.

## `KUC-120`
**Source**: `examples/ccxt.pro/py/build-ohlcv-many-symbols.py`

Constructs OHLCV candlesticks from individual trades for multiple symbols, supporting both complete and incomplete candles.

## `KUC-121`
**Source**: `examples/ccxt.pro/py/coinbase-watch-all-trades.py`

Watches all trade updates on Coinbase for BTC/USD and tracks the last trade ID to avoid duplicates.

## `KUC-122`
**Source**: `examples/ccxt.pro/py/coinbase-watch-trades.py`

Streams trade data for BTC/USD on Coinbase, printing the latest trade along with the count of cached trades.

## `KUC-123`
**Source**: `examples/ccxt.pro/py/consume-all-trades.py`

Continuously consumes and prints all trade updates for BTC/USD on Bitmex, clearing the trade cache after processing.

## `KUC-124`
**Source**: `examples/ccxt.pro/py/gateio-watch-trades.py`

Watches trade updates on Gate.io for BTC/USDT with timestamp-based pagination to fetch incremental updates.

## `KUC-125`
**Source**: `examples/ccxt.pro/py/intercept-original-ohlcv-updates.py`

Subclasses Binance to intercept and process raw OHLCV WebSocket messages before passing to standard handler.

## `KUC-126`
**Source**: `examples/ccxt.pro/py/kucoin-watch-multiple-orderbooks.py`

Watches order books for multiple symbols (KDA/USDT, KDA/BTC, BTC/USDT) simultaneously on KuCoin with authentication.

## `KUC-127`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-different-streams.py`

Monitors multiple data streams (order book, ticker, trades) across multiple exchanges simultaneously.

## `KUC-128`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-orderbooks-synchronized.py`

Watches order books across multiple exchanges and displays all current order books together in a synchronized view.

## `KUC-129`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-orderbooks-throttled.py`

Watches order books across multiple exchanges with throttled output every 5 seconds to manage display bandwidth.

## `KUC-130`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-streams-with-keys.py`

Advanced multi-exchange monitoring supporting both symbol-specific and global streams (like server time) with API authentication.

## `KUC-131`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-streams.py`

Watches order books for multiple symbols across OKX and Binance simultaneously using async gather patterns.

## `KUC-132`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-symbols-watch-trades.py`

Streams trade data for multiple symbols across OKX and Binance with incremental updates enabled.

## `KUC-133`
**Source**: `examples/ccxt.pro/py/many-exchanges.py`

Simple example watching BTC/USDT order books across Kraken, Binance, and Bitmex simultaneously.

## `KUC-134`
**Source**: `examples/ccxt.pro/py/multiple-exchanges-watch-orderbook-continuously.py`

Monitors CELO/USD order books across Coinbase Pro, OKCoin, and Bittrex, printing when top bid changes.

## `KUC-135`
**Source**: `examples/ccxt.pro/py/okex-create-swap-order.py`

Places a market order for BTC/USDT perpetual swap contract on OKX with configurable position direction.

## `KUC-136`
**Source**: `examples/ccxt.pro/py/okex-watch-margin-balance-with-params.py`

Watches the OKX margin account balance for a specific symbol (BTC/USDT) using a params-based approach with verbose output.

## `KUC-137`
**Source**: `examples/ccxt.pro/py/okex-watch-margin-balance.py`

Continuously monitors OKX margin account balance changes for BTC/USDT with verbose debugging enabled.

## `KUC-138`
**Source**: `examples/ccxt.pro/py/okx-bbo-tbt.py`

Streams best bid/ask data tick-by-tick on OKX for high-frequency price monitoring.

## `KUC-139`
**Source**: `examples/ccxt.pro/py/on-connected-user-hook.py`

Demonstrates WebSocket connection lifecycle hook by placing an order immediately upon connection establishment.

## `KUC-140`
**Source**: `examples/ccxt.pro/py/one-exchange-different-streams.py`

Watches both order book and trades streams simultaneously for BTC/USD on Bitstamp.

## `KUC-141`
**Source**: `examples/ccxt.pro/py/one-exchange-many-streams.py`

Watches order books for multiple symbols (BTC/USDT, ETH/USDT, ETH/BTC) on FTX exchange with throttling.

## `KUC-142`
**Source**: `examples/ccxt.pro/py/phemex-cancel-all-orders.py`

Cancels all open orders for a specific symbol (UNI/USDT) on Phemex exchange.

## `KUC-143`
**Source**: `examples/ccxt.pro/py/spot-vs-future-arbitrage-bitmart.py`

Monitors both spot and futures order books on BitMart to detect arbitrage opportunities between the two markets.

## `KUC-144`
**Source**: `examples/ccxt.pro/py/watch-all-symbols.py`

Watches order books for all available trading pairs on Kraken, printing every 100th update to manage output.

## `KUC-145`
**Source**: `examples/ccxt.pro/py/watch-custom-exchange-specific-streams.py`

Implements custom WebSocket handler for Binance mini ticker stream not natively supported in CCXT Pro.

## `KUC-146`
**Source**: `examples/ccxt.pro/py/watch-many-exchanges-many-tickers.py`

Streams ticker data (bid/ask/last) for multiple symbols across Binance and FTX simultaneously.

## `KUC-147`
**Source**: `examples/ccxt.pro/py/watch-ticker-to-csv.py`

Streams ticker data for multiple symbols and writes results to CSV files for historical analysis.

## `KUC-148`
**Source**: `examples/py/aiohttp-custom-session-connector.py`

Configures CCXT async support to use SOCKS proxy for exchanges that require it for connectivity.

## `KUC-149`
**Source**: `examples/py/all-exchanges.py`

Lists all cryptocurrency exchanges supported by the CCXT library for discovery purposes.

## `KUC-150`
**Source**: `examples/py/arbitrage-pairs.py`

Scans multiple exchanges to find arbitrage opportunities by comparing prices across different trading pairs.

## `KUC-151`
**Source**: `examples/py/asciichart.py`

Provides terminal-based charting capability to visualize price data using ASCII art.

## `KUC-152`
**Source**: `examples/py/async-analyse-augur-v1-vs-v2-exchanges.py`

Compares trading pairs across Augur v1 and v2 exchanges to identify differences in available markets.

## `KUC-153`
**Source**: `examples/py/async-balance-coinbasepro.py`

Fetches account balance from Coinbase Pro exchange using sandbox environment for testing.

## `KUC-154`
**Source**: `examples/py/async-balance-gdax.py`

Fetches account balance from GDAX (Coinbase) exchange using sandbox mode.

## `KUC-155`
**Source**: `examples/py/async-balance.py`

Fetches account balance from Bittrex exchange asynchronously.

## `KUC-156`
**Source**: `examples/py/async-balances.py`

Fetches balances from multiple exchanges (Kraken, Bitfinex) concurrently.

## `KUC-157`
**Source**: `examples/py/async-basic-callchain.py`

Demonstrates sequential async operations pattern: load markets, fetch ticker, fetch order book on multiple exchanges.

## `KUC-158`
**Source**: `examples/py/async-basic-orderbook.py`

Fetches order book data from OKX exchange asynchronously.

## `KUC-159`
**Source**: `examples/py/async-basic-rate-limiter.py`

Demonstrates CCXT's built-in rate limiting by making 100 consecutive API calls without hitting exchange limits.

## `KUC-160`
**Source**: `examples/py/async-basic.py`

Simple example demonstrating async market loading from Binance exchange.

## `KUC-161`
**Source**: `examples/py/async-binance-cancel-option-order.py`

Cancels a specific options order on Binance using the implicit API for options trading.

## `KUC-162`
**Source**: `examples/py/async-binance-create-margin-order.py`

Places a limit buy order on Binance using margin trading account type.

## `KUC-163`
**Source**: `examples/py/async-binance-create-option-order.py`

Places a call options order on Binance USDT Options market.

## `KUC-164`
**Source**: `examples/py/async-binance-create-trailing-percent-order.py`

Places a trailing percent stop order on Binance USD-M futures with reduce-only flag.

## `KUC-165`
**Source**: `examples/py/async-binance-fetch-margin-balance-with-options.py`

Fetches margin account balance from Binance using options-based configuration.

## `KUC-166`
**Source**: `examples/py/async-binance-fetch-margin-balance-with-params.py`

Fetches Binance margin balance using params-based approach specifying type.

## `KUC-167`
**Source**: `examples/py/async-binance-fetch-option-OHLCV.py`

Fetches historical candlestick data for Binance options contracts.

## `KUC-168`
**Source**: `examples/py/async-binance-fetch-option-details.py`

Fetches options market details (mark price, etc.) from Binance using implicit API.

## `KUC-169`
**Source**: `examples/py/async-binance-fetch-option-order.py`

Fetches open options orders from Binance with pagination support.

## `KUC-170`
**Source**: `examples/py/async-binance-fetch-option-orderbook.py`

Fetches options order book from Binance USDT Options market.

## `KUC-171`
**Source**: `examples/py/async-binance-fetch-option-position.py`

Fetches options position information from Binance.

## `KUC-172`
**Source**: `examples/py/async-binance-fetch-option-ticker.py`

Fetches ticker/price information for Binance options contracts.

## `KUC-173`
**Source**: `examples/py/async-binance-fetch-ticker-continuously.py`

Continuously fetches ticker data from Binance with error handling and retry logic for robustness.

## `KUC-174`
**Source**: `examples/py/async-binance-futures-vs-spot.py`

Compares account data (balance, orders, trades) between Binance spot and futures accounts.

## `KUC-175`
**Source**: `examples/py/async-binance-margin-borrow.py`

Borrows cryptocurrency from Binance margin account for trading or other purposes.

## `KUC-176`
**Source**: `examples/py/async-binance-margin-repay.py`

Repays borrowed cryptocurrency on Binance margin account to reduce margin debt.

## `KUC-177`
**Source**: `examples/py/async-binance-usdm-fetch-continuous-klines-ohlcv.py`

Fetches continuous klines (perpetual contract) data from Binance USD-M futures.

## `KUC-178`
**Source**: `examples/py/async-bitfinex-public-get-symbols.py`

Fetches list of trading symbols available on Bitfinex exchange via public API.

## `KUC-179`
**Source**: `examples/py/async-bitget-perpetual-futures-swaps.py`

Places perpetual swap orders and fetches balance on Bitget exchange with API authentication.

## `KUC-180`
**Source**: `examples/py/async-bitstamp-create-limit-buy-order.py`

Places a limit buy order on Bitstamp exchange with configurable price and amount.

## `KUC-181`
**Source**: `examples/py/async-bitstamp-create-order-cancel-order.py`

Places a sell limit order on Bitstamp then cancels it, demonstrating full order lifecycle.

## `KUC-182`
**Source**: `examples/py/async-bittrex-orderbook.py`

Async generator that continuously polls order book data from Bittrex exchange.

## `KUC-183`
**Source**: `examples/py/async-bybit-transfer.py`

Fetches transfer history and executes internal transfers between Bybit account wallets (spot, derivatives, options).

## `KUC-184`
**Source**: `examples/py/async-fetch-balance.py`

Simple async example to fetch account balance from Bitstamp exchange.

## `KUC-185`
**Source**: `examples/py/async-fetch-many-orderbooks-continuously.py`

Continuously fetches order books for multiple symbols across OKX and Binance exchanges.

## `KUC-186`
**Source**: `examples/py/async-fetch-ohlcv-indicators-discord-webhook.py`

Fetches OHLCV data, calculates RSI indicator, and sends alerts to Discord when RSI conditions are met.

## `KUC-187`
**Source**: `examples/py/async-fetch-ohlcv-multiple-symbols-continuously.py`

Continuously fetches latest OHLCV candles for multiple symbols on Binance in a loop.

## `KUC-188`
**Source**: `examples/py/async-fetch-order-book-from-many-exchanges.py`

Fetches order book from multiple exchanges (Binance, KuCoin, Huobi) concurrently for the same symbol.

## `KUC-189`
**Source**: `examples/py/async-fetch-ticker.py`

Simple one-liner to fetch current ticker price from Binance.

## `KUC-190`
**Source**: `examples/py/async-gather-concurrency.py`

Demonstrates concurrent API calls using asyncio.gather to fetch order books from multiple symbols efficiently.

## `KUC-191`
**Source**: `examples/py/async-gdax-fetch-order-book-continuously.py`

Continuously polls order book data from Binance (mislabeled as GDAX in example) in a while loop.

## `KUC-192`
**Source**: `examples/py/async-generator-basic.py`

Demonstrates async generator pattern to continuously yield ticker data from Poloniex.

## `KUC-193`
**Source**: `examples/py/async-generator-multiple-tickers.py`

Async generator that cycles through multiple tickers on Kraken with round-robin approach.

## `KUC-194`
**Source**: `examples/py/async-generator-ticker-poller.py`

Authenticated async generator that polls BTC/USD ticker from Kraken continuously.

## `KUC-195`
**Source**: `examples/py/async-hollaex-sandbox.py`

Tests Hollaex API connectivity using sandbox mode with test API keys.

## `KUC-196`
**Source**: `examples/py/async-instantiate-all-at-once.py`

Creates instances of all CCXT-supported exchanges and demonstrates accessing one.

## `KUC-197`
**Source**: `examples/py/async-kucoin-rate-limit.py`

Demonstrates robust OHLCV fetching from KuCoin with proper rate limit handling and retry logic.

## `KUC-198`
**Source**: `examples/py/async-macd.py`

Calculates MACD (Moving Average Convergence Divergence) indicator on live OHLCV data for trading decisions.

## `KUC-199`
**Source**: `examples/py/async-market-making-symbols.py`

Scans all exchanges to find symbols with 0% maker fees, useful for market making strategies.

## `KUC-200`
**Source**: `examples/py/async-multiple-accounts.py`

Manages multiple exchange accounts simultaneously, fetching balance data from each account.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **8**

## `CW-CRYPTO-TRADING-001` — Decimal Type for All Monetary Values
**From**: rotki, hummingbot, cryptofeed, ccxt · **Applicable to**: crypto-trading

All four projects mandate Decimal type for price, amount, balance, quantity, and PnL fields. Float arithmetic causes rounding errors that compound across financial calculations, leading to incorrect order sizing and reporting. Always use Decimal for any value representing money in crypto trading systems.

## `CW-CRYPTO-TRADING-002` — Initialize Data Structures Before Access
**From**: ccxt, cryptofeed, rotki · **Applicable to**: crypto-trading

Projects consistently require explicit initialization before data access: load_markets() before symbol lookups, check symbol population before mapping access, establish RPC connections before queries. Skipping initialization causes KeyError, AttributeError, or silent data corruption that breaks downstream operations.

## `CW-CRYPTO-TRADING-003` — Precise String Arithmetic for Financial Calculations
**From**: ccxt · **Applicable to**: crypto-trading

CCXT mandates Precise.string_* static methods (string_mul, string_div, string_add, string_sub) for monetary calculations to avoid floating-point precision errors. This is especially critical for high-precision exchange data where rounding errors cause incorrect order costs, fees, and balances that may result in financial loss.

## `CW-CRYPTO-TRADING-004` — Respect Exchange Rate Limits
**From**: ccxt · **Applicable to**: crypto-trading

Disabling rate limiting via enableRateLimit=False causes HTTP 429 responses and potential temporary or permanent API key suspension by exchanges. CCXT enforces rate limits per IP/API key pair, and bypassing throttle() gates results in compliance violations that disrupt all trading activity until exchanges lift bans.
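To show what a throttle gate does conceptually, here is a minimum-interval limiter with an injectable clock. It is an illustrative sketch only and not ccxt's throttler; `MinIntervalThrottle` and its parameters are hypothetical. In ccxt itself, the advice above amounts to leaving `enableRateLimit` switched on.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive calls are at least `min_interval_s` apart."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self._interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self) -> float:
        """Sleep if needed and return the delay that was applied."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            delay, start = 0.0, now
        else:
            delay, start = self._next_allowed - now, self._next_allowed
            self._sleep(delay)
        self._next_allowed = start + self._interval
        return delay
```

A caller would invoke `throttle.wait()` immediately before each HTTP request; injecting `clock` and `sleep` keeps the class testable without real waiting.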

## `CW-CRYPTO-TRADING-005` — Inverse Contract Price Adjustment
**From**: ccxt, hummingbot · **Applicable to**: crypto-trading

Perpetual swap cost calculations require applying inverse price adjustment (1/price) before multiplying by contractSize for inverse contracts. Incorrect cost calculation causes wrong position sizing, leading to unexpected liquidation or insufficient margin for perpetual trading positions.
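A sketch of the adjustment, assuming cost is expressed in the contract's settlement currency. `contract_cost` is a hypothetical helper, not a ccxt or hummingbot function; it divides by price last, which is algebraically the same as multiplying by `1/price` first but avoids rounding the reciprocal.

```python
from decimal import Decimal


def contract_cost(
    amount: Decimal, price: Decimal, contract_size: Decimal, inverse: bool
) -> Decimal:
    """Cost of `amount` contracts at `price`.

    Linear:  amount * contract_size * price   (settled in quote currency)
    Inverse: amount * contract_size / price   (settled in base currency)
    """
    if price <= 0:
        raise ValueError("price must be positive")
    if inverse:
        return amount * contract_size / price
    return amount * contract_size * price
```

As an example under these assumptions, 100 inverse contracts of size 1 USD at a price of 50000 cost 0.002 in the base currency, whereas treating them as linear would report 5000000.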

## `CW-CRYPTO-TRADING-006` — Strict Connection Lifecycle Ordering
**From**: cryptofeed, ccxt · **Applicable to**: crypto-trading

Both projects enforce strict execution order for connection operations: cryptofeed requires authenticate -> subscribe -> message handler sequence, while ccxt mandates connect -> on_connected_callback -> subscriptions -> on_close_callback. Out-of-order operations cause subscription failures and no data flow through connections.

## `CW-CRYPTO-TRADING-007` — Validate Input Data Structure Before Processing
**From**: rotki, cryptofeed · **Applicable to**: crypto-trading

Rotki validates EVM address checksum format before RPC calls; cryptofeed checks Symbols.populated() before symbol mapping access. Validating data structure before processing prevents downstream crashes (KeyError, InvalidAddress) and data corruption that is harder to debug when symptoms appear in unrelated code paths.

## `CW-CRYPTO-TRADING-008` — Validate Order Sizes Against Exchange Minimums
**From**: hummingbot · **Applicable to**: crypto-trading

DCAExecutor amounts must be validated against min_notional_size and amounts_quote/prices against min_order_size before execution. Orders below exchange minimums are rejected, breaking strategy execution and potentially leaving positions partially unfilled at unfavorable prices.
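The two comparisons described above can be sketched as a single pre-flight check. `check_order_minimums` is a hypothetical helper rather than hummingbot code; it takes the quote amount and price of one order level and the two limits named in the text.

```python
from decimal import Decimal


def check_order_minimums(
    amount_quote: Decimal,
    price: Decimal,
    min_notional_size: Decimal,
    min_order_size: Decimal,
) -> list[str]:
    """Return the violated limits for one order level (empty if valid)."""
    problems = []
    if amount_quote < min_notional_size:
        problems.append("notional below min_notional_size")
    amount_base = amount_quote / price  # size in base currency
    if amount_base < min_order_size:
        problems.append("base amount below min_order_size")
    return problems
```

Running this over every level before the executor starts avoids a partially placed ladder where only the larger orders are accepted.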

FILE:references/components/api_request_construction.md
# api_request_construction (3 classes)

## `Exchange.sign`
`api_request_construction/exchange-sign.py:0`

## `Entry descriptor`
`api_request_construction/entry-descriptor.py:0`

## `sign`
`api_request_construction/sign.py:0`

FILE:references/components/market_and_currency_loading.md
# market_and_currency_loading (3 classes)

## `Exchange.load_markets`
`market_and_currency_loading/exchange-load-markets.py:0`

## `binance.fetch_markets`
`market_and_currency_loading/binance-fetch-markets.py:0`

## `fetch_markets`
`market_and_currency_loading/fetch-markets.py:0`

FILE:references/components/network_request_execution.md
# network_request_execution (3 classes)

## `Exchange.fetch`
`network_request_execution/exchange-fetch.py:0`

## `Throttler.wait`
`network_request_execution/throttler-wait.py:0`

## `rateLimit`
`network_request_execution/ratelimit.py:0`

FILE:references/components/response_parsing_and_normalization.md
# response_parsing_and_normalization (3 classes)

## `Exchange.safe_float`
`response_parsing_and_normalization/exchange-safe-float.py:0`

## `Precise.string_mul`
`response_parsing_and_normalization/precise-string-mul.py:0`

## `parse_* methods`
`response_parsing_and_normalization/parse-methods.py:0`

FILE:references/components/trading_operations.md
# trading_operations (5 classes)

## `binance.create_order`
`trading_operations/binance-create-order.py:0`

## `binance.create_orders`
`trading_operations/binance-create-orders.py:0`

## `Exchange.check_order_arguments`
`trading_operations/exchange-check-order-arguments.py:0`

## `margin modes`
`trading_operations/margin-modes.py:0`

## `position mode`
`trading_operations/position-mode.py:0`

FILE:references/components/websocket_real-time_streaming.md
# websocket_real-time_streaming (4 classes)

## `Client.connect`
`websocket_real-time_streaming/client-connect.py:0`

## `Future.race`
`websocket_real-time_streaming/future-race.py:0`

## `binance.watch_trades`
`websocket_real-time_streaming/binance-watch-trades.py:0`

## `handle_* methods`
`websocket_real-time_streaming/handle-methods.py:0`

Bt Portfolio Backtest

Skill

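Build and backtest multi-strategy portfolios with the bt framework, supporting portfolio construction methods such as risk parity, equal risk contribution, and inverse-volatility weighting, as well as simulated backtests of rolling government bond trades.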

---
name: bt-portfolio-backtest
description: |-
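  Use the bt framework to build and backtest multi-strategy portfolios, with support for portfolio construction methods such as risk parity, equal risk contribution, and inverse-volatility weighting, plus simulated backtests of rolling government bond trades.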
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-125"
  compiled_at: "2026-04-22T13:01:02.252610+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
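# bt Portfolio Backtest (bt-portfolio-backtest)

> Use the bt framework to build and backtest multi-strategy portfolios, with support for portfolio construction methods such as risk parity, equal risk contribution, and inverse-volatility weighting, plus simulated backtests of rolling government bond trades.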

## Pipeline

`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`

## Top Use Cases (20 total)

### Buy and Hold Monthly Rebalancing Strategy (`UC-101`)
Implements a passive buy-and-hold strategy with monthly rebalancing to fixed target weights, demonstrating core backtesting framework capabilities
**Triggers**: buy and hold, monthly rebalance, fixed weights

### Equal Risk Contribution Portfolio Construction (`UC-102`)
Demonstrates Equal Risk Contribution (ERC) portfolio weighting using multivariate normal returns and covariance matrix inputs for risk parity allocati
**Triggers**: equal risk contribution, risk parity, covariance

### Fixed Income Government Bond Rolling Strategy (`UC-103`)
Simulates rolling government bond trading with synthetic price-to-yield calculations and bond lifecycle management for fixed income backtesting
**Triggers**: fixed income, government bonds, rolling bonds

For all **20** use cases, see [references/USE_CASES.md](references/USE_CASES.md).

**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`

## What I'll Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

## Semantic Locks (Fatal)

| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |

Full lock definitions: [references/LOCKS.md](references/LOCKS.md)

## Top Anti-Patterns (25 total)

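- **`AP-ZVT-183`**: Ex-rights factors of inf/NaN are multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API hits its quota, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kline table causes factor drift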

All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)

## Evidence Quality Notice

> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-125. Evidence verify ratio = 10.0% and audit fail total = 14. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).

## Reference Files

| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |

---

*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-125` blueprint at 2026-04-22T13:01:02.252610+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*

FILE:human_summary.md
# Human Summary

> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)

## What I Can Do

- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Fixed Income Government Bond Rolling Strategy', 'Equal Risk Contribution Portfolio Construction', 'Buy and Hold Monthly Rebalancing Strategy', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']

## What I Ask You

- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?

FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)

Total: **25**

## qlib (9)

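### `AP-QLIB-1930`: Backtest results do not depend on the model because a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>

When multiple models in Qlib reuse the same already-fitted DatasetH instance, the normalization parameters inside the dataset (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without reinitializing the dataset means every model ends up using the same prediction signal. The symptom: whether you use LightGBM, XGBoost, or a DNN, the backtest NAV curves are identical. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.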

Source: https://github.com/microsoft/qlib/issues/1930

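### `AP-QLIB-2090`: fit_start_time and the train segment are configured separately, causing implicit data leakage <sub>(high)</sub>

Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.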

Source: https://github.com/microsoft/qlib/issues/2090
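
The coupling described above can be checked before any model is trained. The following is a minimal sketch using only the standard library; `check_fit_window` is a hypothetical helper (not part of Qlib), and the `segments` dict only mirrors the shape of the `segments.train` setting named in the text, with made-up dates.

```python
from datetime import date


def check_fit_window(fit_end_time: str, segments: dict) -> None:
    """Raise ValueError if the normalizer fit window runs past the train segment.

    Hypothetical helper: `segments` maps a segment name to (start, end) ISO dates.
    """
    train_end = date.fromisoformat(segments["train"][1])
    fit_end = date.fromisoformat(fit_end_time)
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include valid/test data"
        )


# Illustrative dates only.
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}
check_fit_window("2014-12-31", segments)  # passes silently
```

Running the check at configuration time turns a silent look-ahead bias into an immediate, explainable failure.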

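### `AP-QLIB-2036`: MACD factor formula is wrong in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>

The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. After cross-sectional standardization, a MACD factor built from this documented formula differs markedly from the correct one, and IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.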

Source: https://github.com/microsoft/qlib/issues/2036

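### `AP-QLIB-2184`: Suspension days not set to NaN per convention before importing custom A-share data, causing downstream factor noise <sub>(high)</sub>

Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields should all be NaN, so the framework can recognize and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common for data exported directly from Eastmoney/Wind), price momentum factors show "false signals" during the suspension (price unchanged but factor non-zero). Qlib does not validate this convention, so the error flows silently into training data.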

Source: https://github.com/microsoft/qlib/issues/2184

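### `AP-QLIB-1892`: PIT (Point-In-Time) financial data collector depends on an external stock list API, so full A-share coverage is incomplete <sub>(high)</sub>

Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to fetch the Shanghai and Shenzhen stock list. The function depends on the Eastmoney API, which often returns only a partial list instead of the full 5000+ stocks, and the function raises ValueError outright when the list is incomplete. Users who follow the documented steps end up with a financial dataset covering only part of the market, and backtests built on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).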

Source: https://github.com/microsoft/qlib/issues/1892

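### `AP-QLIB-2097`: Whole-market instrument="all" runs out of memory on a 32 GB machine while CSI300 works <sub>(medium)</sub>

When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use for instrument="csi300" (300 stocks) versus instrument="all" (5000+ stocks) differs by about 16x. A 32 GB machine running the whole market hits OOM in the init_instance_by_config stage, and the error message does not point to memory. Users tend to assume a configuration error, when what they actually need is batched loading or streaming feature computation.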

Source: https://github.com/microsoft/qlib/issues/2097

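### `AP-QLIB-1984`: LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>

Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and throws an obscure error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.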

Source: https://github.com/microsoft/qlib/issues/1984
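
The dimensionality point above can be reproduced with plain pandas, independent of Qlib. This is a small illustration with made-up label names (`LABEL0`, `LABEL1`); it does not reproduce Qlib's own code path.

```python
import pandas as pd

# Illustrative label frame with two label columns.
labels = pd.DataFrame({"LABEL0": [0.1, 0.2], "LABEL1": [0.3, 0.4]})

single = labels["LABEL0"]             # selecting one column yields a Series
multi = labels[["LABEL0", "LABEL1"]]  # selecting a list of columns yields a DataFrame

series_ndim = single.values.ndim  # always 1 for a Series
frame_ndim = multi.values.ndim    # always 2 for a DataFrame
```

A check written as `ndim == 2` can only fire if the object is still a DataFrame at that point, which is why testing the type or the column count before extracting values is the safer pattern.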

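### `AP-QLIB-1915`: After dump_bin on custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>

Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (for example 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, DataHandler's processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.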

Source: https://github.com/microsoft/qlib/issues/1915

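### `AP-QLIB-1949`: Multiprocessing backend on Colab/Linux conflicts with Qlib's ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>

In non-fork environments (Windows or Google Colab), when DataHandler uses joblib to load features in parallel, ParallelExt fails on accessing the _backend_args attribute during initialization (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-layer nested exception from D.features calls, and users cannot tell from the stack whether the problem lies in the parallel backend or in the data.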

Source: https://github.com/microsoft/qlib/issues/1949

## vnpy (4)

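### `AP-VNPY-3691`: Bar generator's first bar has a misaligned timestamp, so the first period's signal is wrong <sub>(high)</sub>

When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is timestamped with "the minute of the current tick" instead of "the end of a complete N-minute period". Concretely: a tick at 09:59 triggers a push of an incomplete 5-minute bar (which should have waited until 10:04). If the strategy filters in on_bar directly with datetime.minute % 5, the first bar happens to pass, but it holds less than a full period of data, and using it for signal computation produces a wrong entry signal.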

Source: https://github.com/vnpy/vnpy/issues/3691

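### `AP-VNPY-3669`: Alpha module's incremental save of historical data throws SchemaError because old and new DataFrame schemas are incompatible <sub>(medium)</sub>

When saving kline data to Parquet files, vnpy's Alpha module runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns). Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (for example, volume is an integer from some feeds and a float from others), and there is no schema alignment step before concat. This affects every historical data build workflow that uses vnpy alpha for backtesting.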

Source: https://github.com/vnpy/vnpy/issues/3669

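### `AP-VNPY-3685`: Spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>

In vnpy 4.10, run_backtesting() in the SpreadTrading module has an event-loop conflict under Jupyter (asyncio already running), so parts of the backtest engine do not execute, yet no exception is raised and normal-looking backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where "the backtest looks right but is actually incomplete".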

Source: https://github.com/vnpy/vnpy/issues/3685

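### `AP-VNPY-3700`: Install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>

vnpy's install.bat installs straight into the system/conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as recent scipy and pytorch). There is no requirements.txt, so dependency boundaries are opaque. In a quant research environment where several tools coexist, vnpy's install script is a common root cause of "global environment pollution".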

Source: https://github.com/vnpy/vnpy/issues/3700

## zipline (6)

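### `AP-ZIPLINE-138`: Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>

Zipline's tutorial uses an AAPL price chart for its demo, but the bundle stores unadjusted (raw) prices, not prices adjusted for splits and dividends. The historical prices on the chart differ from actual market prices by roughly 4x (Apple's cumulative split factor), and users mistake "price up 4x" for strategy return. The A-share case is worse: the price jump around an ex-rights date creates a huge "signal" in unadjusted data, drawing technical indicators into false breakout signals on ex-rights days.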

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138

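### `AP-ZIPLINE-235`: Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>

Zipline's default slippage model fills at the close of the same bar on which the signal fires (current bar close fill). In live trading a signal can only be filled near the next bar's open (T+1 order execution). For A-share daily bars, backtesting with the close overstates daily return by about 0.1-0.3% on average versus filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.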

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235

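### `AP-ZIPLINE-190`: Setting calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>

When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (such as New Year's Day, 1998-01-01), Calendar validation raises DateOutOfBounds ("cannot be earlier than the first session"). The error message shows only the calendar's first trading day and does not say "change it to the first trading day". A-share case: with the SSE/SZSE calendars, a start_date that falls on the day after the last trading day before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.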

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190

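### `AP-ZIPLINE-181`: After the asset db goes stale, Pipeline reports "no assets traded", sending users off to check the data range <sub>(high)</sub>

Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a newer date range. A-share case: if only price data is updated after re-downloading quotes and the asset db is not rebuilt, date ranges for delisted/newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, creating survivorship bias.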

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181

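### `AP-ZIPLINE-285`: week_start()/week_end() silently fail under custom (non-US) calendars <sub>(medium)</sub>

date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function depend on the trading calendar's logic for the first/last day of the week, but under non-US calendars (such as ASX, SSE) that logic is incompatible with the offset computation for the NYSE calendar, so the schedule never fires or fires on the wrong date. A-share case: with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the whole holiday week without rebalancing, and users cannot find the missed schedule in the logs.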

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285

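### `AP-ZIPLINE-240`: Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>

Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no time zone, such as pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it triggers AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data in local CST time hit this trap very easily and need to call tz_localize('UTC') explicitly at bundle registration.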

Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
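
A defensive normalization step at the entry point avoids the deep failure described above. This is a minimal pandas sketch; `as_utc` is a hypothetical helper name, and treating a naive value as already being in UTC is an assumption that fits session dates but not local wall-clock times.

```python
import pandas as pd


def as_utc(value) -> pd.Timestamp:
    """Return a UTC-aware Timestamp.

    Naive inputs are labeled as UTC (assumption: they are session dates);
    aware inputs are converted from their own zone.
    """
    ts = pd.Timestamp(value)
    if ts.tzinfo is None:
        return ts.tz_localize("UTC")
    return ts.tz_convert("UTC")


start = as_utc("2020-01-01")
# 08:00 in Shanghai (UTC+8) is the same instant as 00:00 UTC.
from_local = as_utc(pd.Timestamp("2020-01-01 08:00", tz="Asia/Shanghai"))
```

For intraday data recorded in local exchange time, localize to the exchange zone first and then convert, since labeling local wall-clock times as UTC would shift every bar.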

## zvt (6)

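### `AP-ZVT-183`: Ex-rights factors of inf/NaN are multiplied in directly, so price adjustment fails silently <sub>(high)</sub>

When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (a new listing's first day or missing data) the factor is inf; when kdata.open itself is None (suspension day not filled) the multiplication raises TypeError. The result: adjustment for the whole entity is interrupted and all subsequent klines are lost, but the main flow only logs ERROR and keeps going, so users often do not know that data for many stocks is already corrupted.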

Source: https://github.com/zvtvz/zvt/issues/183
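
One way to keep a bad factor from propagating is to validate every input before multiplying. The sketch below uses only the standard library; `safe_adjust` is a hypothetical helper and does not reflect ZVT's internal implementation.

```python
import math


def safe_adjust(price, new_price, old_price):
    """Return price * (new_price / old_price), or None if the factor is unusable.

    Hypothetical guard: None inputs, a zero denominator, and non-finite
    factors all yield None so the caller can flag the bar and move on.
    """
    if price is None or new_price is None or old_price is None:
        return None
    if old_price == 0:
        return None
    factor = new_price / old_price
    if not math.isfinite(factor):
        return None
    return price * factor


adjusted = safe_adjust(10.0, 5.0, 10.0)  # factor 0.5
```

Returning a sentinel per bar, and counting how many bars were skipped per entity, makes the damage visible instead of losing all later klines for that entity.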

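### `AP-ZVT-179`: Exceptions are swallowed after a third-party data API hits its quota, so data goes missing silently <sub>(high)</sub>

When ZVT uses JoinQuant's jqdatasdk to bulk-fetch whole-market klines (4000+ stocks), it hits JoinQuant's daily maximum query count (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the quota is hit, that day's data for every remaining stock is silently missing. A backtest run on this incomplete database produces systematically biased factor values with no warning.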

Source: https://github.com/zvtvz/zvt/issues/179

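### `AP-ZVT-161`: Whole-market batch factor computation on SQLite triggers the "too many SQL variables" error <sub>(medium)</sub>

When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable-count ceiling. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.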

Source: https://github.com/zvtvz/zvt/issues/161
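
The batched-query fix mentioned above can be sketched with the standard-library `sqlite3` module. The table name `kdata`, its columns, and the helper `fetch_in_chunks` are illustrative assumptions, not ZVT's schema or API.

```python
import sqlite3


def fetch_in_chunks(conn, entity_ids, chunk_size=500):
    """Query rows for many entity IDs without exceeding SQLite's variable limit.

    Each statement binds at most `chunk_size` parameters (kept below 999).
    """
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# Illustrative in-memory table with 1200 entities, more than one chunk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(1200)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])
rows = fetch_in_chunks(conn, ids)
```

Only the placeholders are interpolated into the SQL text; the IDs themselves are still bound as parameters, so chunking does not weaken parameterization.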

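### `AP-ZVT-129`: Wildcard imports hide API version changes, and enums such as AdjustType vanish without explanation <sub>(medium)</sub>

ZVT's documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors enums (for example moving AdjustType into a submodule), the wildcard import no longer includes the symbol and an AttributeError is triggered. Users assume an installation problem, when in fact it is an API breaking change between versions that was not flagged in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.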

Source: https://github.com/zvtvz/zvt/issues/129

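### `AP-ZVT-187`: Backtest engine does not terminate early on an empty data-layer result, causing a cascading null-pointer crash <sub>(medium)</sub>

After load_data completes and the data turns out to be empty, ZVT's Trader does not exit early; it passes the empty DataFrame into selector computation, triggering a chain of NoneType crashes downstream. The stack is deep and the root cause hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.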

Source: https://github.com/zvtvz/zvt/issues/187

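### `AP-ZVT-183B`: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kline table causes factor drift <sub>(high)</sub>

ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users mix two tables when computing price momentum or moving-average factors (for example, unadjusted for the moving average and backward-adjusted for returns), so factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.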

Source: https://github.com/zvtvz/zvt/issues/183

FILE:references/COMPONENTS.md
# Component Capability Map

**Project**: finance-bp-125--bt
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 30, 'total_functions': 0, 'total_stages': 6}

## Modules (6)

- [data_input_&_preprocessing](components/data_input_-_preprocessing.md): 3 classes
- [tree_structure_construction](components/tree_structure_construction.md): 6 classes
- [strategy_logic_execution_(algostack)](components/strategy_logic_execution_-algostack.md): 6 classes
- [capital_allocation_&_rebalancing](components/capital_allocation_-_rebalancing.md): 6 classes
- [value_propagation_&_stale_resolution](components/value_propagation_-_stale_resolution.md): 4 classes
- [result_analysis_&_benchmarking](components/result_analysis_-_benchmarking.md): 5 classes

FILE:references/CONSTRAINTS.md
# Constraints

## preservation_manifest

```yaml
required_objects:
  business_decisions_count: 105
  fatal_constraints_count: 34
  non_fatal_constraints_count: 129
  use_cases_count: 20
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5

```

## Domain Constraints Injected (39)

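- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing the signal from the close and filling at the same day's close; (2) stamping an indicator computed after day T's close onto the same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must line up: compute the signal after day T's close, fill at day T+1's open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period is required to equal the longest lookback, and positions should be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (for example, a momentum strategy's "take profit at the high") is plain look-ahead, because the intraday high/low is only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in code and verify key time-alignment relationships with assert.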
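- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum in-sample Sharpe ratio grows with the number of trials even when no strategy has real skill, so a peak Sharpe becomes less credible the more backtests were run. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR), not the peak Sharpe.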
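- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, systematically overstating historical strategy returns. The backtest universe must use historical point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample; out-of-sample data is for final validation only. Do not keep tuning after repeatedly "peeking" at out-of-sample data (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended/missing data: suspension-day prices must not simply be forward-filled with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation. Filter missing trading days explicitly at the factor-computation layer and do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data can contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.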
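- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must appear in the backtest performance report and be analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions; every 10 bps change in cost can flip the profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual share counts held under a fixed capital amount (rounded to whole lots), not assume fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes achievable holdings deviate from target weights, and the backtest should simulate this rounding effect.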
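- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Unify timestamp time zones: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. All timestamps must be converted to a single time zone at the pipeline entry (UTC for storage and market-local time for display is recommended); do not mix time zones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, or fake rows will be created on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: when updating history incrementally, query the database for the latest stored date and download only data after it. Re-downloading and appending existing data produces duplicate-timestamp rows and breaks backtest ordering. Verify there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta. Use a passive benchmark the strategy could actually invest in (such as an HS300 ETF), not a price index that cannot be held directly (such as the HS300 index). A price index excludes reinvested dividends and understates the return of holding the benchmark.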
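- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net asset value series (portfolio value), not from a cumulative return series. Differences in a cumulative sum of log returns are log changes, not percentage declines, so reading them as drawdown misstates the depth (a log drop of about 0.69 corresponds to a 50% simple decline).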
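- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, or the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, while returns in A-share and crypto markets are markedly left-skewed and fat-tailed, so it understates downside risk. Quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should be decomposed into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot tell "a good strategy" from "a tailwind market where beta happened to be right".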
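- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns; ICIR = mean(IC) / std(IC). |IC| > 0.05 counts as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.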
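- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's best holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-horizon factor and unsuited to monthly rebalancing, and vice versa. Using the wrong holding period seriously damages live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (not the traditional 1.96); or independent replication in different periods/markets; or a clear economic rationale. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover can have negative net IC after transaction costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly sets the best rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-horizon factor monthly lets much of the alpha dissipate during the holding period.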
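- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry-rotation returns, and it is hard to tell whether the factor itself or the industry exposure drove the result. Operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor is highly correlated with market cap, selection tilts systematically toward small caps and returns come from size exposure, not the factor. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes, which distort quantile analysis (such as Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping, then rank/neutralize.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors amounts to overweighting a single factor and dilutes signal diversity. Orthogonalize before combining, using Gram-Schmidt or PCA, to remove multicollinearity among factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaNs in factor computation (suspension/new listing/data gap) with the cross-sectional mean introduces look-ahead bias (the mean itself contains future information); dropping them entirely creates survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks that day, which does not depend on the future) or to exclude the stock for that day.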
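- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread between Q1/Q5 (quintile) or Q1/Q10 (decile) groups (top minus bottom spread) as the main metric, not plain long-only return. The monotonicity test of long Q1, short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: stability of a factor's monthly IC across regimes (bull/bear/range-bound markets) is important evidence of robustness. A factor whose IC holds only in one market state is not suited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: stocks ranked near the middle (49th-51st percentile) trigger rebalancing on small rank moves, generating many wasted trades. Set a buffer zone at selection time: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile spread (Q1-Q5 spread), however large historically, may be chance, and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtests (< 3 years) are especially unreliable.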
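- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability check: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures/crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually fail through market learning and arbitrage (McLean & Pontiff 2016 find an average 58% decay after publication). Re-evaluate factor IC regularly (quarterly/yearly) and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the rate cycle, the economic cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's best habitat.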
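
The IC definition in `SHARED-FR-IC-001` can be sketched in a few lines of pandas. The example assumes wide frames (rows are dates, columns are assets) with no missing values, computes Spearman correlation as the Pearson correlation of ranks, and takes ICIR as mean(IC) / std(IC); `daily_ic` and the toy numbers are illustrative, not from any source dataset.

```python
import pandas as pd


def daily_ic(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> pd.Series:
    """Cross-sectional Spearman IC for each date.

    Spearman correlation equals the Pearson correlation of the ranks.
    Assumes both frames share the same index and columns and have no NaNs.
    """
    return pd.Series(
        {d: factor.loc[d].rank().corr(fwd_ret.loc[d].rank()) for d in factor.index}
    )


dates = ["2024-01-02", "2024-01-03", "2024-01-04"]
assets = ["A", "B", "C", "D"]
factor = pd.DataFrame([[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]], index=dates, columns=assets)
# Next-period returns, already aligned so that row t holds the return after t.
fwd_ret = pd.DataFrame([[0.01, 0.02, 0.03, 0.04]] * 3, index=dates, columns=assets)

ic = daily_ic(factor, fwd_ret)
icir = ic.mean() / ic.std()
```

The forward-return frame must be shifted so that each row holds the return realized after the factor observation; computing IC against same-period returns would violate `SHARED-BT-LAB-001`.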

FILE:references/LOCKS.md
# Semantic Locks + Preconditions

## Semantic Locks (12)

### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle

### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
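
A minimal pandas sketch of next-bar execution: the signal is computed from information available at each bar's close and then shifted by one bar before it is treated as a position. The price series and the simple "close above previous close" rule are illustrative assumptions.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0])

# Signal known only once each bar has closed.
signal = close > close.shift(1)

# Position actually held: the signal from the previous bar, never the current one.
position = signal.shift(1, fill_value=False)
```

Shifting the signal, not the prices, keeps the price series on its original timestamps, which makes alignment checks with `assert` straightforward.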

### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
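
The format rule can be enforced with a small parser. This is a sketch using only the standard library; `parse_entity_id` is a hypothetical helper, and splitting on the first two underscores is an assumption that lets the code part itself contain underscores.

```python
def parse_entity_id(entity_id: str) -> tuple:
    """Split an ID of the form entity_type_exchange_code into its three parts.

    Hypothetical validator: raises ValueError unless all three parts are non-empty.
    """
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return tuple(parts)


entity_type, exchange, code = parse_entity_id("stock_sh_600000")
```

Rejecting other conventions such as `600000.SH` at the boundary is cheaper than debugging an empty query result later.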

### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)

### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
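
The exactly-one rule is easy to enforce at construction time. The dataclass below is a hypothetical stand-in that only borrows the three field names from the lock text; it is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical stand-in carrying the three mutually exclusive sizing fields."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        given = [
            name
            for name in ("position_pct", "order_money", "order_amount")
            if getattr(self, name) is not None
        ]
        if len(given) != 1:
            raise ValueError(f"exactly one sizing field required, got {given}")


sizing = SignalSizing(position_pct=0.2)
```

Comparing against `None` instead of truthiness matters here, because a value of `0.0` is a legitimate explicit choice.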

### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
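
The three-valued semantics can be made explicit with a small mapping function. This sketch uses pandas only for the missing-value test; `to_action` and the action labels are illustrative names.

```python
import pandas as pd


def to_action(value) -> str:
    """Map a filter_result cell to an action: True=buy, False=sell, missing=no action."""
    # Check for missing first: bool(float("nan")) is True, so truthiness alone is wrong.
    if pd.isna(value):
        return "no_action"
    return "buy" if value else "sell"


filter_result = pd.Series([True, False, None, float("nan")], dtype=object)
actions = [to_action(v) for v in filter_result]
```

Filling missing cells with `False` before this step would silently turn "no action" into "sell", which is exactly what the lock is meant to prevent.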

### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline

### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
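
A sketch of MACD with the locked parameters, using pandas exponential moving averages. Two choices here are assumptions and not taken from the lock: EMAs use `adjust=False` (the recursive form), and the histogram is `dif - dea` without the factor of 2 that some Chinese charting conventions apply.

```python
import pandas as pd

FAST, SLOW, SIGNAL = 12, 26, 9  # locked by SL-08


def macd(close: pd.Series) -> pd.DataFrame:
    """Return DIF, DEA, and histogram columns for a close-price series."""
    ema_fast = close.ewm(span=FAST, adjust=False).mean()
    ema_slow = close.ewm(span=SLOW, adjust=False).mean()
    dif = ema_fast - ema_slow
    dea = dif.ewm(span=SIGNAL, adjust=False).mean()
    return pd.DataFrame({"dif": dif, "dea": dea, "hist": dif - dea})


rising = pd.Series(range(1, 61), dtype=float)
out = macd(rising)
```

Keeping the three numbers as module-level constants makes any deviation from the lock visible in code review.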

### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001

### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)

### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes

### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column

## Preconditions (4)

- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location

FILE:references/USE_CASES.md
# Known Use Cases (KUC)

Total: **20**

## `KUC-101`
**Source**: `docs/source/Buy_and_hold.ipynb`

Implements a passive buy-and-hold strategy with monthly rebalancing to fixed target weights, demonstrating core backtesting framework capabilities.

## `KUC-102`
**Source**: `docs/source/ERC.ipynb`

Demonstrates Equal Risk Contribution (ERC) portfolio weighting using multivariate normal returns and covariance matrix inputs for risk parity allocation.

## `KUC-103`
**Source**: `docs/source/Fixed_Income.ipynb`

Simulates rolling government bond trading with synthetic price-to-yield calculations and bond lifecycle management for fixed income backtesting.

## `KUC-104`
**Source**: `docs/source/PTE.ipynb`

Implements inverse volatility weighting to allocate more capital to lower-volatility assets, with 3-month lookback and 1-day lag for rebalancing.
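
The weighting rule can be sketched in plain pandas without the bt library. The helper name, the use of 63 trading days as "3 months", and the toy returns are assumptions; the notebook's actual implementation may differ.

```python
import pandas as pd


def inverse_vol_weights(returns: pd.DataFrame, lookback: int = 63, lag: int = 1) -> pd.Series:
    """Weights proportional to 1 / volatility over a lagged lookback window.

    Assumes ~63 trading days per 3 months; `lag` drops the most recent rows
    so weights only use data available before the rebalance.
    """
    window = returns.shift(lag).dropna().tail(lookback)
    inv_vol = 1.0 / window.std()
    return inv_vol / inv_vol.sum()


base = pd.Series([0.01, -0.01, 0.02, -0.02, 0.01])
returns = pd.DataFrame({"A": base, "B": 2 * base})  # B is twice as volatile as A
weights = inverse_vol_weights(returns)
```

Because B has exactly twice the volatility of A in this toy data, A receives two thirds of the capital and B one third.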

## `KUC-105`
**Source**: `docs/source/Strategy_Combination.ipynb`

Combines multiple trading strategies into a single portfolio to test strategy allocation and diversification across different algorithmic approaches.

## `KUC-106`
**Source**: `docs/source/Target_Volatility.ipynb`

Controls portfolio-level volatility to a target annualized level (10%) using weekly rebalancing with inverse volatility asset weights.
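
The portfolio-level scaling step can be sketched as a leverage factor equal to target volatility divided by realized volatility. The 252-day annualization, the leverage cap, and the helper name are assumptions for illustration; the asset-level inverse-volatility weights and weekly schedule are outside this sketch.

```python
import math

import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor for daily equity returns


def target_vol_leverage(returns: pd.Series, target: float = 0.10, max_leverage: float = 2.0) -> float:
    """Scale factor that brings annualized realized volatility to `target`."""
    realized = returns.std() * math.sqrt(TRADING_DAYS)
    if realized == 0 or math.isnan(realized):
        return 0.0  # no usable volatility estimate: stay out of the market
    return min(target / realized, max_leverage)


daily = pd.Series([0.01, -0.01, 0.02, -0.02, 0.0])
leverage = target_vol_leverage(daily)
```

Capping leverage prevents a quiet window from producing an arbitrarily large position.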

## `KUC-107`
**Source**: `docs/source/Trend_1.ipynb`

Implements trend following using a rolling 12-month median as a moving average signal for asset selection and timing decisions.
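
A rolling-median trend signal can be sketched in pandas as follows. The example assumes monthly prices, so a 12-row window stands in for 12 months; the toy price series is illustrative.

```python
import pandas as pd

# Illustrative monthly prices, steadily rising.
price = pd.Series(range(1, 25), dtype=float)

# 12-month rolling median used as the moving-average baseline.
baseline = price.rolling(12).median()

# In trend when price is above the baseline; warmup rows compare against NaN and are False.
in_trend = price > baseline
```

Before trading on `in_trend`, shift it by one period so that decisions use only the previous bar's signal, in line with `SL-02`.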

## `KUC-108`
**Source**: `docs/source/Trend_2.ipynb`

Demonstrates custom algorithm creation by implementing a Signal algo that calculates total returns over a lookback period for monthly rebalancing decisions.

## `KUC-109`
**Source**: `docs/source/examples-nb.ipynb`

Demonstrates the SelectWhere algorithm for selecting securities based on custom signal DataFrames, using 50-day rolling mean as a sample indicator.

## `KUC-110`
**Source**: `docs/source/intro.ipynb`

Educational example comparing monthly equal-weight vs weekly inverse-volatility strategies using real market data (AAPL, MSFT, SPY, AGG).

## `KUC-111`
**Source**: `examples/pairs_trading.py`

Implements statistical arbitrage pairs trading by identifying cointegrated pairs whose indicator exceeds threshold for long/short positioning.

## `KUC-112`
**Source**: `examples/buy_and_hold.py`

Executable Python version of buy-and-hold strategy with monthly rebalancing, demonstrating standalone script execution for portfolio backtesting.

## `KUC-113`
**Source**: `examples/fixed_income.ipynb`

Creates synthetic government bond data with rolling maturity schedules, price-to-yield calculations, and bond lifecycle management for fixed income backtesting.

## `KUC-114`
**Source**: `examples/ERC.ipynb`

Equal Risk Contribution portfolio using multivariate normal returns and explicit covariance matrix for risk parity weighting across assets.

## `KUC-115`
**Source**: `examples/PTE.ipynb`

Inverse volatility weighting strategy using 3-month historical data and 1-day lag to reduce risk concentration in high-volatility assets.

## `KUC-116`
**Source**: `examples/Strategy_Combination.ipynb`

Combines multiple strategies into a unified portfolio allocation framework for testing strategy diversification and correlation effects.

## `KUC-117`
**Source**: `examples/Target_Volatility.ipynb`

Controls portfolio volatility to 10% annualized target using weekly rebalancing and inverse volatility asset weighting with 12-month lookback.

## `KUC-118`
**Source**: `examples/buy_and_hold.ipynb`

Basic buy-and-hold strategy with monthly rebalancing to fixed weights (60/40), demonstrating core framework rebalancing mechanics.

## `KUC-119`
**Source**: `examples/trend_1.ipynb`

Trend following strategy using 12-month rolling median as a baseline indicator, visualizing price vs moving average crossover signals.

## `KUC-120`
**Source**: `examples/trend_2.ipynb`

Custom Signal algorithm that calculates total returns over configurable lookback periods for monthly rebalancing decisions.

FILE:references/WISDOM.md
# Cross-Project Wisdom

Total: **10**

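## `CW-BT-001`: Cerebro as a unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting

backtrader uses Cerebro as the single entry point, managing the lifecycle of data feeds, strategies, analyzers, and observers in one place, and supports multiple strategies and multiple data sources in a single cerebro.run(). zvt's StockTrader currently binds to a single factor set per instance and has no unified layer for orchestrating multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.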

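## `CW-BT-002`: Pluggable Analyzers for performance evaluation
**From**: backtrader · **Applicable to**: backtesting

backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, so any performance metric can be attached without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would give users a risk-adjusted return report from a single cerebro.addanalyzer(SharpeRatio).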

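## `CW-BT-003`: Sizer separates position sizing from signals
**From**: backtrader · **Applicable to**: backtesting

backtrader abstracts position sizing (how many shares or what fraction to buy per entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.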

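## `CW-BT-004`: Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting

backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack limit orders and combined-order simulation; for high-frequency or live-trading scenarios, fuller order types would greatly improve backtest realism.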

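## `CW-BT-005`: Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting

backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) in real time and drive strategies in sync, or replay tick by tick to simulate how OHLC bars form, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording klines at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests over any combination of time granularities without recording data twice.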

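## `CW-VN-003`: Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting

vnpy's cta_backtester provides a GUI that directly shows the strategy NAV curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.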

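## `CW-VN-004`: vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research

vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso/LightGBM/MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.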

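## `CW-QL-001`: Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting

qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was truly knowable at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection may be biased by financial-report data from the future; introducing the PIT pattern would greatly improve backtest credibility.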

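## `CW-QL-002` — Recorder + Experiment Management (MLflow Style)
**From**: qlib · **Applicable to**: factor-research

qlib's workflow module provides Experiment and Recorder objects that automatically log the hyperparameters, features, metrics, and predictions of every training run, and support cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, so each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment, enabling quick reproduction and version comparison.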
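
The overwrite problem disappears once every run is written under its own identifier. The following is a minimal file-based sketch, not qlib's Recorder or MLflow; the `Recorder` class, its methods, the JSON layout, and the example parameters and metric values are all assumptions made for illustration.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List


class Recorder:
    """Hypothetical run log: one JSON file per experiment run."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params: Dict[str, Any], metrics: Dict[str, float]) -> str:
        run_id = uuid.uuid4().hex  # unique per run, so reruns never collide
        record = {
            "run_id": run_id,
            "created_at": time.time(),
            "params": params,
            "metrics": metrics,
        }
        path = self.root / f"{run_id}.json"
        path.write_text(json.dumps(record, indent=2), encoding="utf-8")
        return run_id

    def load_runs(self) -> List[Dict[str, Any]]:
        return [
            json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(self.root.glob("*.json"))
        ]

    def best(self, metric: str) -> Dict[str, Any]:
        """Return the stored run with the highest value of `metric`."""
        return max(self.load_runs(), key=lambda r: r["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    recorder = Recorder(tmp)
    recorder.log_run({"window": 10}, {"sharpe": 0.8})
    recorder.log_run({"window": 20}, {"sharpe": 1.3})
    run_count = len(recorder.load_runs())
    best_run = recorder.best("sharpe")
```

Storing the parameters next to the metrics is what makes a result reproducible later: the winning configuration can be read back from `best_run["params"]`.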

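## `CW-QL-003` — Nested Decision Framework (Multi-Level Nested Decision Execution)
**From**: qlib · **Applicable to**: backtesting

qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing). The two layers are optimized independently yet run together, enabling intraday execution algorithms such as TWAP or VWAP rebalancing. zvt backtests currently assume daily-bar fills only and do not model execution algorithms; borrowing the nested framework would let zvt separate two questions: which stocks to hold and when, and how to build the position at minimal market-impact cost.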
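
The two layers can be sketched as an outer function that decides the day's parent order and an inner function that works it with TWAP, that is, in equal slices across the intraday bars. This is a simplified illustration rather than qlib's nested executor: `twap_slices`, `execute_twap`, and `run_day` are hypothetical names, only buys are handled, and each slice is assumed to fill in full at its bar price with no market impact.

```python
from typing import List, Optional


def twap_slices(total_qty: int, n_slices: int) -> List[int]:
    """Split a parent order into near-equal child orders that sum to total_qty."""
    base, remainder = divmod(total_qty, n_slices)
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]


def execute_twap(total_qty: int, minute_prices: List[float]) -> float:
    """Inner layer: fill one slice per minute bar and return the average price."""
    slices = twap_slices(total_qty, len(minute_prices))
    cost = sum(qty * price for qty, price in zip(slices, minute_prices))
    return cost / total_qty


def run_day(
    target_qty: int, current_qty: int, minute_prices: List[float]
) -> Optional[float]:
    """Outer layer: the daily decision yields a parent order for the inner layer."""
    parent_qty = target_qty - current_qty
    if parent_qty <= 0:
        return None  # this sketch handles buys only
    return execute_twap(parent_qty, minute_prices)


slices = twap_slices(1000, 3)
avg_price = run_day(target_qty=1000, current_qty=0, minute_prices=[10.0, 11.0, 12.0])
```

Swapping `execute_twap` for a volume-weighted variant would change how the position is built without touching the daily decision in `run_day`, which is the separation the nested framework provides.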

FILE:references/components/capital_allocation_-_rebalancing.md
# capital_allocation_&_rebalancing (6 classes)

## `StrategyBase.allocate`
`capital_allocation_&_rebalancing/strategybase-allocate.py:0`

## `StrategyBase.rebalance`
`capital_allocation_&_rebalancing/strategybase-rebalance.py:0`

## `SecurityBase.transact`
`capital_allocation_&_rebalancing/securitybase-transact.py:0`

## `SecurityBase.outlay`
`capital_allocation_&_rebalancing/securitybase-outlay.py:0`

## `weight_mode`
`capital_allocation_&_rebalancing/weight-mode.py:0`

## `commission_model`
`capital_allocation_&_rebalancing/commission-model.py:0`

FILE:references/components/data_input_-_preprocessing.md
# data_input_&_preprocessing (3 classes)

## `Backtest.run`
`data_input_&_preprocessing/backtest-run.py:0`

## `Backtest._process_data`
`data_input_&_preprocessing/backtest-process-data.py:0`

## `data_source`
`data_input_&_preprocessing/data-source.py:0`

FILE:references/components/result_analysis_-_benchmarking.md
# result_analysis_&_benchmarking (5 classes)

## `Result.__init__`
`result_analysis_&_benchmarking/result-init.py:0`

## `RandomBenchmarkResult.__init__`
`result_analysis_&_benchmarking/randombenchmarkresult-init.py:0`

## `RenormalizedFixedIncomeResult._price`
`result_analysis_&_benchmarking/renormalizedfixedincomeresult-price.py:0`

## `Result.get_transactions`
`result_analysis_&_benchmarking/result-get-transactions.py:0`

## `normalization`
`result_analysis_&_benchmarking/normalization.py:0`

FILE:references/components/strategy_logic_execution_-algostack.md
# strategy_logic_execution_(algostack) (6 classes)

## `Algo.__call__`
`strategy_logic_execution_(algostack)/algo-call.py:0`

## `AlgoStack.__call__`
`strategy_logic_execution_(algostack)/algostack-call.py:0`

## `Strategy.run`
`strategy_logic_execution_(algostack)/strategy-run.py:0`

## `run_timing`
`strategy_logic_execution_(algostack)/run-timing.py:0`

## `selection_logic`
`strategy_logic_execution_(algostack)/selection-logic.py:0`

## `weighting_logic`
`strategy_logic_execution_(algostack)/weighting-logic.py:0`

FILE:references/components/tree_structure_construction.md
# tree_structure_construction (6 classes)

## `Node.__init__`
`tree_structure_construction/node-init.py:0`

## `StrategyBase.__init__`
`tree_structure_construction/strategybase-init.py:0`

## `SecurityBase.__init__`
`tree_structure_construction/securitybase-init.py:0`

## `Node._add_children`
`tree_structure_construction/node-add-children.py:0`

## `security_type`
`tree_structure_construction/security-type.py:0`

## `lazy_creation`
`tree_structure_construction/lazy-creation.py:0`

FILE:references/components/value_propagation_-_stale_resolution.md
# value_propagation_&_stale_resolution (4 classes)

## `StrategyBase.update`
`value_propagation_&_stale_resolution/strategybase-update.py:0`

## `SecurityBase.update`
`value_propagation_&_stale_resolution/securitybase-update.py:0`

## `StrategyBase._sync_data`
`value_propagation_&_stale_resolution/strategybase-sync-data.py:0`

## `price_source`
`value_propagation_&_stale_resolution/price-source.py:0`
