@clawhub-tangweigang-jpg-8679fec286
Load multi-exchange OHLCV historical data with the Freqtrade framework and run strategy backtest analysis.
---
name: freqtrade-crypto-bot
description: |-
  Load multi-exchange OHLCV historical data with the Freqtrade framework and run strategy backtest analysis.
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-085"
  compiled_at: "2026-04-22T13:00:34.948027+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
# Freqtrade Crypto Backtesting (freqtrade-crypto-bot)
> Load multi-exchange OHLCV historical data with the Freqtrade framework and run strategy backtest analysis.
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (1 total)
### Strategy Analysis Template (`UC-101`)
Users need a template to load historical market data and analyze trading strategy performance using Freqtrade's configuration and history loading capabilities before live deployment.
**Triggers**: strategy analysis, backtesting template, historical data loading
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
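
As an illustration of how a lock such as `SL-05` might be enforced in code, the sketch below rejects a signal unless exactly one sizing field is set. The `TradingSignal` dataclass here is a hypothetical stand-in, not the framework's own class; only the three field names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TradingSignal:
    """Hypothetical stand-in for a trading signal.

    Only the three sizing field names are taken from SL-05.
    """

    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self) -> None:
        sizing = (self.position_pct, self.order_money, self.order_amount)
        provided = sum(value is not None for value in sizing)
        if provided != 1:
            # SL-05 is fatal, so fail loudly instead of guessing a default.
            raise ValueError(
                f"SL-05 violated: expected exactly one sizing field, got {provided}"
            )
```

Validating in `__post_init__` means an invalid signal can never be constructed, which matches the "halt" behavior listed for the lock.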
## Top Anti-Patterns (25 total)
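- **`AP-ZVT-183`**: An adjustment factor of inf/NaN enters the multiplication directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table causes factor values to drift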
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-085. Evidence verify ratio = 43.3% and audit fail total = 1. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
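| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |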
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-085` blueprint at 2026-04-22T13:00:34.948027+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Strategy Analysis Template
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
  - Institutional fund holdings tracker via joinquant_fund_runner pattern
  - Custom Transformer + Accumulator factor with per-entity rolling state
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
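### `AP-QLIB-1930` — Backtest results are independent of the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>
When several models in Qlib reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without reinitializing the dataset means every model actually uses the same prediction signal. The symptom is that the backtest equity curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.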
Source: https://github.com/microsoft/qlib/issues/1930
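### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two settings are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.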
Source: https://github.com/microsoft/qlib/issues/2090
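
A guard for this coupling can be sketched in a few lines. The helper below is a hypothetical pre-flight check, not part of Qlib; it assumes both dates are ISO `YYYY-MM-DD` strings and that the train segment is a `(start, end)` pair.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple[str, str]) -> None:
    """Raise if the normalizer's fit window extends past the training segment.

    The function name and signature are illustrative and not part of Qlib.
    """
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )
```

Calling it before the dataset is built turns a silent leak into an immediate configuration error.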
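### `AP-QLIB-2036` — MACD factor formula is wrong in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in the official Qlib documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs markedly from the correct formula after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.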
Source: https://github.com/microsoft/qlib/issues/2036
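### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that the open/close/high/low/volume/factor fields should all be NaN on suspension days, so that the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce false signals during the suspension (the price does not move but the factor is nonzero). Qlib does not validate this convention, so the error flows silently into the training data.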
Source: https://github.com/microsoft/qlib/issues/2184
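### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API and retrieves an incomplete A-share universe <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the list of Shanghai and Shenzhen stocks. The function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. If users follow the documented steps, the financial dataset covers only some stocks, and backtests based on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).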
Source: https://github.com/microsoft/qlib/issues/1892
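### `AP-QLIB-2097` — Full-market instrument="all" runs out of memory on a 32GB machine, while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use with instrument="all" (5000+ stocks) is roughly 16 times that with instrument="csi300" (300 stocks). A 32GB machine running the full market hits OOM during init_instance_by_config, and the error message does not point to memory. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.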
Source: https://github.com/microsoft/qlib/issues/2097
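### `AP-QLIB-1984` — The LightGBM model's label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.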
Source: https://github.com/microsoft/qlib/issues/1984
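### `AP-QLIB-1915` — After dump_bin on custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.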
Source: https://github.com/microsoft/qlib/issues/1915
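### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from the D.features call, and the stack trace does not let users tell whether the problem is the parallel backend or the data.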
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
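### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, so the first period's signal is wrong <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with the minute of the current tick rather than the end of a complete N-minute period. Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass, but it contains less than a full period of data, and using it for signal computation produces a wrong entry signal.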
Source: https://github.com/vnpy/vnpy/issues/3691
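### `AP-VNPY-3669` — Incompatible schemas between new and stored DataFrames cause SchemaError during incremental saves in the Alpha module <sub>(medium)</sub>
When the vnpy Alpha module saves bar data to Parquet files, it runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns). Polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (for example, volume is an integer from some feeds and a float from others) and there is no schema-alignment step before concat. This affects every historical-data build workflow that uses vnpy alpha for backtesting.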
Source: https://github.com/vnpy/vnpy/issues/3669
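### `AP-VNPY-3685` — The spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not execute, yet no exception is raised and the returned backtest statistics look normal. The same code has no problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.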
Source: https://github.com/vnpy/vnpy/issues/3685
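### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy is downgraded and breaks other dependencies <sub>(medium)</sub>
vnpy's install.bat installs directly into the system or conda base environment and forcibly downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer versions of scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.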
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
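### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by roughly a factor of 4 (Apple's cumulative split factor), and users mistake a "4x price increase" for strategy returns. The A-share case is worse: the price jump around an ex-rights date creates a huge "signal" in unadjusted data and leads technical indicators to produce false breakout signals on the ex-rights date.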
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
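### `AP-ZIPLINE-235` — Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>
With Zipline's default slippage model, after a signal fires on a bar the order fills at that same bar's close (current bar close fill). In live trading, the signal can only be filled near the next bar's open (T+1 order execution). Taking A-share daily bars as an example, backtesting at the close overstates daily returns by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.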
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
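
The next-bar rule itself is framework-independent and can be sketched in plain Python. The `Bar` type and `next_bar_fills` function below are hypothetical illustrations and do not use Zipline's API.

```python
from typing import NamedTuple, Optional


class Bar(NamedTuple):
    """Minimal illustrative bar; real bars carry more fields."""

    open_price: float
    close_price: float


def next_bar_fills(bars: list[Bar], signals: list[bool]) -> list[Optional[float]]:
    """Return the fill price for each signal bar, or None where nothing fills.

    A signal observed on bar i (known only at its close) is filled at the
    open of bar i + 1. A signal on the final bar has no next bar, so it
    stays unfilled.
    """
    fills: list[Optional[float]] = [None] * len(bars)
    for i, fired in enumerate(signals):
        if fired and i + 1 < len(bars):
            fills[i] = bars[i + 1].open_price
    return fills
```

Leaving the last-bar signal unfilled, rather than falling back to that bar's close, keeps the backtest from quietly reintroducing the same-bar fill.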
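### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm in Zipline, if start_session falls on a non-trading day (such as New Year's Day, 1998-01-01), calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing the value to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the holiday right after the last trading day before Spring Festival triggers the same kind of error, and debugging is very costly.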
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
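### `AP-ZIPLINE-181` — After the asset db goes stale, Pipeline reports "no assets traded", sending users off to check the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records the start and end trading dates of each stock. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated; Pipeline filtering quietly excludes those stocks and introduces survivorship bias.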
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
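### `AP-ZIPLINE-285` — week_start()/week_end() silently fail under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function depend on the trading calendar's logic for determining the first and last day of a week. Under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation used for the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks that contain long consecutive holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot discover the missed schedule from the logs.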
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
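### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises an AssertionError deep in the stack <sub>(medium)</sub>
Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), no error is raised at the entry point; instead, AssertionError: Algorithm should have a utc datetime is triggered deep in algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data from local CST time hit this trap very easily and need to call tz_localize('UTC') explicitly when registering the bundle.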
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
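
One way to surface this problem at the boundary rather than deep in the engine is an explicit check on every incoming timestamp. The sketch below uses only the standard library; `require_utc` and the `CST` constant are hypothetical names, not Zipline API.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time as a fixed UTC+8 offset (illustrative constant).
CST = timezone(timedelta(hours=8))


def require_utc(ts: datetime) -> datetime:
    """Return ts converted to UTC, rejecting naive datetimes up front."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError(f"naive datetime {ts!r}: attach a timezone before use")
    return ts.astimezone(timezone.utc)
```

For example, `require_utc(datetime(2020, 1, 1, 8, tzinfo=CST))` yields midnight UTC on the same date, while a bare `datetime(2020, 1, 1)` is rejected immediately.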
## zvt (6)
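### `AP-ZVT-183` — An adjustment factor of inf/NaN enters the multiplication directly, so price adjustment fails silently <sub>(high)</sub>
When computing forward-adjustment factors, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (a new stock's first day, or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. As a result, the adjustment calculation for the whole entity is aborted and all subsequent bars are lost, but the main flow only logs an ERROR and keeps going, so users are often unaware that data for many stocks is already corrupted.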
Source: https://github.com/zvtvz/zvt/issues/183
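
A defensive version of the ratio can be sketched as follows. `safe_adjust_factor` is a hypothetical helper, not ZVT code; it assumes a missing price is represented as `None` and that returning `None` is how the caller learns the factor is unusable.

```python
import math
from typing import Optional


def safe_adjust_factor(
    new_price: Optional[float], old_price: Optional[float]
) -> Optional[float]:
    """Return new/old as an adjustment factor, or None if it is unusable.

    Covers the failure modes described above: a missing price (None),
    a zero denominator, and a non-finite result (inf or NaN).
    """
    if new_price is None or old_price is None or old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None
```

The caller can then decide explicitly whether to skip the bar or stop processing the entity, instead of letting a bad factor propagate through the multiplication.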
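### `AP-ZVT-179` — Exceptions raised after a third-party data API quota is exceeded are swallowed, so data goes missing silently <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query-count limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the limit is reached that day's data for every remaining stock is silently missing. A backtest that uses this incomplete database produces systematically biased factor results, with no warning.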
Source: https://github.com/zvtvz/zvt/issues/179
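
The alternative to swallowing the error is to stop at the first quota failure and report exactly which entities were not fetched. The sketch below is a generic pattern, not ZVT or jqdatasdk code: `QuotaExceeded` and `fetch_all` are hypothetical names, and `fetch_one` stands for whatever provider call is in use.

```python
from typing import Callable, Iterable


class QuotaExceeded(Exception):
    """Hypothetical error type standing in for a provider's rate-limit error."""


def fetch_all(
    entity_ids: Iterable[str], fetch_one: Callable[[str], list]
) -> tuple[dict[str, list], list[str]]:
    """Fetch each entity; stop at the first quota error and report what is missing."""
    ids = list(entity_ids)
    results: dict[str, list] = {}
    for i, entity_id in enumerate(ids):
        try:
            results[entity_id] = fetch_one(entity_id)
        except QuotaExceeded:
            # Every later request would fail too; return the unfetched ids
            # so the caller can alert or resume instead of losing data silently.
            return results, ids[i:]
    return results, []
```

A non-empty second element is the signal that the database is incomplete for that day and should not be used for a backtest until the fetch is resumed.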
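### `AP-ZVT-161` — Full-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT concatenates every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits the SQLite limit SQLITE_MAX_VARIABLE_NUMBER=999 (the default before SQLite 3.32.0). Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on the number of variables. The correct fix is to query in batches, but early versions of ZVT did not handle this boundary.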
Source: https://github.com/zvtvz/zvt/issues/161
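
Batched querying can be sketched with the standard-library `sqlite3` module. The table name `kdata`, its columns, and the chunk size of 500 are assumptions for illustration, not ZVT's schema.

```python
import sqlite3
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_by_entity_ids(
    conn: sqlite3.Connection, entity_ids: Sequence[str], chunk_size: int = 500
) -> list[tuple]:
    """Run one IN query per chunk so no statement exceeds the variable limit."""
    rows: list[tuple] = []
    for chunk in chunked(entity_ids, chunk_size):
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT entity_id, close FROM kdata "  # hypothetical table/columns
            f"WHERE entity_id IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, list(chunk)).fetchall())
    return rows
```

Keeping `chunk_size` well below 999 leaves room for other bound parameters in the same statement.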
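### `AP-ZVT-129` — Wildcard imports hide API changes between versions, and enums such as AdjustType inexplicably disappear <sub>(medium)</sub>
ZVT documentation examples import all symbols with `from zvt import *`. After a ZVT upgrade refactors enums (for example, moving AdjustType into a submodule), the wildcard import no longer includes the symbol, which triggers AttributeError. Users assume an installation problem, when in fact it is a breaking API change between versions that was not noted in the CHANGELOG, and the wildcard import obscures where the symbol came from. Import enum classes explicitly.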
Source: https://github.com/zvtvz/zvt/issues/129
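### `AP-ZVT-187` — The backtest engine does not terminate early on an empty data-layer result, causing a cascading null-pointer-style crash <sub>(medium)</sub>
When ZVT's Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame into selector computation, which triggers a chain of failures from subsequent NoneType operations. The stack trace is deep and the root cause is hard to locate, so users assume a strategy-logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.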
Source: https://github.com/zvtvz/zvt/issues/187
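### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table causes factor values to drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two tables while computing price-momentum or moving-average factors (for example, moving averages on unadjusted prices and returns on backward-adjusted prices), factor values jump around ex-rights dates. ZVT performs no cross-table check that adjustment types are consistent, so the mix passes silently.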
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-085--freqtrade
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 46, 'total_functions': 0, 'total_stages': 8}
## Modules (8)
- [data_ingestion_&_history_management](components/data_ingestion_-_history_management.md): 6 classes
- [strategy_analysis_&_signal_generation](components/strategy_analysis_-_signal_generation.md): 8 classes
- [freqai_ml_training_&_inference](components/freqai_ml_training_-_inference.md): 6 classes
- [order_execution_&_trade_management](components/order_execution_-_trade_management.md): 7 classes
- [backtesting_engine](components/backtesting_engine.md): 5 classes
- [hyperoptimization](components/hyperoptimization.md): 5 classes
- [rpc_communication](components/rpc_communication.md): 5 classes
- [configuration_loading_&_validation](components/configuration_loading_-_validation.md): 4 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
  business_decisions_count: 174
  fatal_constraints_count: 77
  non_fatal_constraints_count: 202
  use_cases_count: 1
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
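- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at that same day's close; (2) marking an indicator computed after day T's close on that same bar; (3) using the day's high or low as the assumed fill price. Signal computation and fill time must be aligned: compute the signal after day T's close and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: a rolling-window indicator is NaN for the first N bars, and those bars should not take part in signal computation or position decisions. Require the indicator's warmup_period to equal the longest lookback, and hold zero positions during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (e.g. a momentum strategy that "takes profit at the high") is plain look-ahead, because the intraday high and low are only confirmed after the close. Fill prices may only use the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas operations such as rolling and shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in code and verify key time-alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials rises with the number of backtests even when no strategy has real skill. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.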
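- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use historical point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample. Out-of-sample data is for final validation only; do not "look" at it repeatedly and keep tuning (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspensions and missing data: do not simply forward-fill suspension-day prices with the previous close, because during the backtest replay this lets "zero-volume" days take part in factor computation and signal generation. Filter out missing trading days explicitly at the factor-computation layer instead of filling them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.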
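- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross returns).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed alongside costs. When monthly turnover exceeds 50% (600%+ annualized), net returns are extremely sensitive to cost assumptions: every 10 bps change in cost can flip the conclusion about profitability, so a cost-sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate actual share counts (rounded to whole lots) under a fixed amount of capital, rather than assuming fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.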
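- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone consistency: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to a single timezone at the pipeline entry (storing in UTC and displaying in the market's local timezone is recommended); do not mix timezones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, or spurious rows will be created on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: when updating historical data incrementally, query the database for the latest stored date and download only data after that date. Re-downloading and appending existing data creates rows with duplicate timestamps and breaks the time ordering of the backtest. Verify before and after the update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts alpha/beta calculations. Choose a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark return of the holdings.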
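- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net asset value series (portfolio value), not from a cumulative return series. Summing log returns misstates drawdown depth, because a log return differs from the corresponding simple return (for a 50% decline the log return is about -0.69).
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Different systems use different defaults, so confirm the annualization factor before comparing across systems; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, while returns in A-share and crypto markets are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".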
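- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is treated as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-horizon factor and is unsuitable for monthly rebalancing, and vice versa. Using the wrong holding period severely damages live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (rather than the traditional 1.96); or independent replication in different periods or markets; or a clear economic rationale. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, net IC after trading costs may be negative. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-horizon factor monthly lets much of the alpha dissipate during the holding period.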
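- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns are mixed with industry-rotation returns, and it is hard to tell whether the factor itself or industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and returns come from size exposure rather than the factor itself. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (winsorize/MAD): raw factor values usually contain extremes that distort quantile analysis (such as Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or clip using MAD (median absolute deviation) before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors before combining (Gram-Schmidt or PCA) to remove multicollinearity among them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaNs in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean introduces look-ahead bias (the mean itself contains future information), while dropping them entirely creates survivorship bias. The correct approach is to use the cross-sectional median (the median over all stocks that day, which does not depend on the future) or to exclude the stock for that day.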
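- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, rather than the plain long-only return. The "monotonicity" test of long Q1, short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC is effective only in one particular market state is unsuitable for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate a lot of wasted trading cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of quantile returns (bootstrap test): even a large historical quantile return spread (Q1-Q5) may be due to chance and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.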
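- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability of factors: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; cross-market generality cannot be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually stop working as the market learns and arbitrages them away (McLean & Pontiff 2016 show an average decay of 58% after publication). Re-evaluate factor IC periodically (quarterly or yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: interest-rate cycles, business cycles, and market sentiment significantly affect factor validity. Value factors (low P/B) are more effective when rates are rising; momentum factors are more effective in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the factor's best operating environment.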
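
Three of the formulas stated in the list above (`SHARED-BT-PERF-001`, `SHARED-BT-PERF-002`, `SHARED-FR-IC-004`) are small enough to sketch directly. The function names are hypothetical, the risk-free rate is assumed to be zero, and the standard deviation is the sample standard deviation; other systems may use different conventions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def annualized_sharpe(daily_returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Daily Sharpe (risk-free rate assumed 0) scaled by sqrt(periods_per_year).

    Use 252 for equities and 365 for crypto, per SHARED-BT-PERF-002.
    """
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)


def max_drawdown(nav: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a NAV series, as a negative fraction."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def net_ic(ic: float, turnover: float, cost_per_turn: float) -> float:
    """Turnover-adjusted IC: net_IC = IC - turnover * cost_per_turn."""
    return ic - turnover * cost_per_turn
```

Passing `periods_per_year` explicitly at every call site is one way to keep cross-system Sharpe comparisons honest.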
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
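
As a sketch of how `SL-03` could be checked mechanically, the parser below splits an entity ID into its three parts. The exact character classes in the regular expression are an assumption (lowercase type and exchange, alphanumeric code); only the `entity_type_exchange_code` layout comes from the lock itself.

```python
import re
from typing import NamedTuple

# Assumed pattern: lowercase type and exchange, then a code without underscores.
ENTITY_ID_RE = re.compile(
    r"^(?P<entity_type>[a-z]+)_(?P<exchange>[a-z]+)_(?P<code>[A-Za-z0-9.]+)$"
)


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split an entity ID into its parts, raising if the format is wrong."""
    match = ENTITY_ID_RE.match(entity_id)
    if match is None:
        raise ValueError(
            f"SL-03 violated: {entity_id!r} is not entity_type_exchange_code"
        )
    return EntityId(**match.groupdict())
```

Both `stock_sh_600000` and `stockus_nasdaq_AAPL` parse under this pattern, while a vendor-style symbol such as `600000.SH` is rejected.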
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
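
The write check in `PC-04` can also be expressed as a reusable function rather than a one-liner. This is a minimal sketch using only the standard library; `is_writable_dir` is a hypothetical helper and does not depend on zvt being installed.

```python
import tempfile
from pathlib import Path


def is_writable_dir(path: Path) -> bool:
    """True if path is an existing directory where a temp file can be created."""
    if not path.is_dir():
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```

Using a uniquely named temporary file avoids clobbering an existing `.write_test` file if one happens to be present.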
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **1**
## `KUC-101`
**Source**: `freqtrade/templates/strategy_analysis_example.ipynb`
Users need a template to load historical market data and analyze trading strategy performance using Freqtrade's configuration and history loading capabilities before live deployment.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
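## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as its single entry point to manage the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies and data sources. zvt's StockTrader currently binds only one set of factors per instance and lacks a unified orchestration layer for multi-strategy portfolios; borrowing the Cerebro pattern would let users combine multiple Trader instances in one runner for comparative evaluation.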
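## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with just cerebro.addanalyzer(SharpeRatio).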
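## `CW-BT-003` — Sizer: separating position sizing
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be reused in combination.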
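## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration scenarios, completing the order types would greatly improve backtest realism.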
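## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (e.g. 1 min) into a higher timeframe (e.g. 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how an OHLC bar forms, enabling fine-grained intraday backtests. zvt currently implements multiple timeframes by pre-recording bars at each level and lacks dynamic runtime resampling; borrowing this pattern would support backtests over any combination of time granularities without recording the data repeatedly.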
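## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a GUI that directly shows the strategy equity curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt's backtest visualization currently relies on the draw_result method calling Plotly, with no unified backtest report page; borrowing this pattern would make it possible to package an out-of-the-box strategy performance dashboard.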
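## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.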
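## `CW-QL-001` — Point-in-Time database (preventing future data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias from backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection may be biased by future financial-report data; introducing the PIT pattern would greatly improve backtest credibility.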
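## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of each model training run and supports cross-experiment comparison and model version management. zvt currently lacks ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of each factor experiment and support quick reproduction and version comparison.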
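## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing). The two layers are optimized independently and can run in combination, enabling intraday optimal-execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution-algorithm modeling; borrowing the nested framework would let zvt separate the two questions of "which stocks to hold and when" and "how to build positions with minimal market impact".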
FILE:references/components/backtesting_engine.md
# backtesting_engine (5 classes)
## `Backtesting.backtest_loop`
`backtesting_engine/backtesting-backtest-loop.py:0`
## `Backtesting._run_funding_fees`
`backtesting_engine/backtesting-run-funding-fees.py:0`
## `trade_list_to_dataframe`
`backtesting_engine/trade-list-to-dataframe.py:0`
## `fill_model`
`backtesting_engine/fill-model.py:0`
## `protections`
`backtesting_engine/protections.py:0`
FILE:references/components/configuration_loading_-_validation.md
# configuration_loading_&_validation (4 classes)
## `Configuration.load_config`
`configuration_loading_&_validation/configuration-load-config.py:0`
## `Configuration.deep_merge_dicts`
`configuration_loading_&_validation/configuration-deep-merge-dicts.py:0`
## `CONF_SCHEMA.validate`
`configuration_loading_&_validation/conf-schema-validate.py:0`
## `config_validation`
`configuration_loading_&_validation/config-validation.py:0`
FILE:references/components/data_ingestion_-_history_management.md
# data_ingestion_&_history_management (6 classes)
## `Exchange._init_subclasses`
`data_ingestion_&_history_management/exchange-init-subclasses.py:0`
## `IDataHandler.get_file_extension`
`data_ingestion_&_history_management/idatahandler-get-file-extension.py:0`
## `DataProvider.get_df`
`data_ingestion_&_history_management/dataprovider-get-df.py:0`
## `load_data`
`data_ingestion_&_history_management/load-data.py:0`
## `data_handler_implementation`
`data_ingestion_&_history_management/data-handler-implementation.py:0`
## `exchange_adapter`
`data_ingestion_&_history_management/exchange-adapter.py:0`
FILE:references/components/freqai_ml_training_-_inference.md
# freqai_ml_training_&_inference (6 classes)
## `IFreqaiModel.train`
`freqai_ml_training_&_inference/ifreqaimodel-train.py:0`
## `IFreqaiModel.predict`
`freqai_ml_training_&_inference/ifreqaimodel-predict.py:0`
## `FreqaiDataKitchen.check_if_new_training_required`
`freqai_ml_training_&_inference/freqaidatakitchen-check-if-new-training-.py:0`
## `FreqaiDataDrawer.load_historic_predictions_from_disk`
`freqai_ml_training_&_inference/freqaidatadrawer-load-historic-predictio.py:0`
## `prediction_model`
`freqai_ml_training_&_inference/prediction-model.py:0`
## `compute_device`
`freqai_ml_training_&_inference/compute-device.py:0`
FILE:references/components/hyperoptimization.md
# hyperoptimization (5 classes)
## `Hyperopt.run`
`hyperoptimization/hyperopt-run.py:0`
## `HyperOptimizer.hyperopt_pickle_magic`
`hyperoptimization/hyperoptimizer-hyperopt-pickle-magic.py:0`
## `IHyperOptLoss.__call__`
`hyperoptimization/ihyperoptloss-call.py:0`
## `optimizer`
`hyperoptimization/optimizer.py:0`
## `loss_function`
`hyperoptimization/loss-function.py:0`
FILE:references/components/order_execution_-_trade_management.md
# order_execution_&_trade_management (7 classes)
## `FreqtradeBot.execute_entry`
`order_execution_&_trade_management/freqtradebot-execute-entry.py:0`
## `FreqtradeBot.handle_trade`
`order_execution_&_trade_management/freqtradebot-handle-trade.py:0`
## `Trade.calc_profit`
`order_execution_&_trade_management/trade-calc-profit.py:0`
## `Trade.adjust_trade_position`
`order_execution_&_trade_management/trade-adjust-trade-position.py:0`
## `order_type`
`order_execution_&_trade_management/order-type.py:0`
## `stoploss_placement`
`order_execution_&_trade_management/stoploss-placement.py:0`
## `position_sizing`
`order_execution_&_trade_management/position-sizing.py:0`
FILE:references/components/rpc_communication.md
# rpc_communication (5 classes)
## `RPC._rpc_force_entry`
`rpc_communication/rpc-rpc-force-entry.py:0`
## `RPC._rpc_force_exit`
`rpc_communication/rpc-rpc-force-exit.py:0`
## `RPC._ws_request_analyzed_df`
`rpc_communication/rpc-ws-request-analyzed-df.py:0`
## `RPCManager.start`
`rpc_communication/rpcmanager-start.py:0`
## `rpc_transport`
`rpc_communication/rpc-transport.py:0`
FILE:references/components/strategy_analysis_-_signal_generation.md
# strategy_analysis_&_signal_generation (8 classes)
## `IStrategy.populate_indicators`
`strategy_analysis_&_signal_generation/istrategy-populate-indicators.py:0`
## `IStrategy.populate_entry_trend`
`strategy_analysis_&_signal_generation/istrategy-populate-entry-trend.py:0`
## `IStrategy.populate_exit_trend`
`strategy_analysis_&_signal_generation/istrategy-populate-exit-trend.py:0`
## `IStrategy.get_entry_signal`
`strategy_analysis_&_signal_generation/istrategy-get-entry-signal.py:0`
## `IStrategy.should_exit`
`strategy_analysis_&_signal_generation/istrategy-should-exit.py:0`
## `strategy_implementation`
`strategy_analysis_&_signal_generation/strategy-implementation.py:0`
## `pairlist_filter`
`strategy_analysis_&_signal_generation/pairlist-filter.py:0`
## `custom_callbacks`
`strategy_analysis_&_signal_generation/custom-callbacks.py:0`
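Run a system-wide bank stress test: compute CET1 ratios and leverage ratios from real EBA 2018 data and simulate balance-sheet resilience under a fire-sale scenario.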
---
name: firesale-stress-test
description: |-
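Run a system-wide bank stress test: compute CET1 ratios and leverage ratios from real EBA 2018 data and simulate balance-sheet resilience under a fire-sale scenario.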
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-067"
compiled_at: "2026-04-22T13:00:22.380878+00:00"
capability_markets: "global"
capability_activities: "regtech-compliance"
sop_version: "crystal-compilation-v6.1"
---
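# Bank Stress Test (firesale-stress-test)
> Run a system-wide bank stress test: compute CET1 ratios and leverage ratios from real EBA 2018 data and simulate balance-sheet resilience under a fire-sale scenario.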
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (0 total)
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (15 total)
- **`AP-REGTECH-001`**: Missing attribute initialization on data structures
- **`AP-REGTECH-002`**: Self-loops in transaction graphs violate domain rules
- **`AP-REGTECH-003`**: Unvalidated floating-point inputs cause runtime crashes
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-067. Evidence verify ratio = 56.1% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
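| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when a behavior or decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |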
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-067` blueprint at 2026-04-22T13:00:22.380878+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state', 'Bollinger Band mean-reversion factor with BollTransformer (window=20, window_dev=2)']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## finance-bp-060--AMLSim (1)
### `AP-REGTECH-011` — Mismatched configuration parameters across coupled components <sub>(medium)</sub>
When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence: AML typology patterns placed on wrong accounts, invalidating simulation results.
## finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest (1)
### `AP-REGTECH-002` — Self-loops in transaction graphs violate domain rules <sub>(high)</sub>
When generating directed transaction graphs or AML typologies, allowing source == destination edges creates self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology validation.
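A minimal sketch of guarding against this anti-pattern, assuming edges are plain `(source, destination)` tuples. The helper name `drop_self_loops` is hypothetical and does not come from either source project.

```python
def drop_self_loops(edges):
    """Split edges into (kept, rejected), where rejected are self-loops."""
    kept, rejected = [], []
    for src, dst in edges:
        # A self-loop is any edge whose source equals its destination.
        (rejected if src == dst else kept).append((src, dst))
    return kept, rejected


edges = [("A", "B"), ("B", "B"), ("C", "A")]
clean_edges, self_loops = drop_self_loops(edges)
```

Returning the rejected edges alongside the clean ones lets a caller log or fail on them instead of silently discarding data.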
## finance-bp-060--AMLSim, finance-bp-071--opensanctions (1)
### `AP-REGTECH-001` — Missing attribute initialization on data structures <sub>(high)</sub>
When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes (e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.
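One way to make the initialization path and the read path agree is to declare the containers once, with defaults, on the record type. The sketch below assumes a hypothetical `Account` dataclass; `normal_models` is the attribute named above, while `statement_ids` is an illustrative field name.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    account_id: str
    # default_factory gives every instance its own empty list, so readers
    # can rely on the attribute existing even when nothing was loaded.
    normal_models: list = field(default_factory=list)
    statement_ids: list = field(default_factory=list)


first = Account("acct-1")
first.normal_models.append("fan_out")
second = Account("acct-2")
```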
## finance-bp-062--ifrs9 (3)
### `AP-REGTECH-005` — Incorrect amortization windows violate IFRS 9 compliance <sub>(high)</sub>
Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence: regulatory non-compliance and materially incorrect loan loss provisions.
### `AP-REGTECH-010` — Incorrect cumulative PD ordering corrupts lifetime ECL term structure <sub>(high)</sub>
Using cumprod(1-conPD) without shift(1) and fillna(1) produces corrupted first-period survival probability. This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements. Consequence: systematically incorrect provisions across all remaining tenor periods.
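The ordering issue can be illustrated in plain Python. This sketch assumes `conditional_pd[t]` is the default probability in period `t` given survival to the start of that period. The list shift below plays the role that `shift(1)` and `fillna(1)` play on a pandas Series; the function names are illustrative.

```python
from itertools import accumulate
from operator import mul


def survival_at_period_start(conditional_pd):
    """Probability of surviving up to the START of each period."""
    if not conditional_pd:
        return []
    cumulative = list(accumulate((1.0 - p for p in conditional_pd), mul))
    # Shift by one period and fill the first slot with 1.0:
    # nothing can have defaulted before the first period begins.
    return [1.0] + cumulative[:-1]


def marginal_pd(conditional_pd):
    """Unconditional default probability per period."""
    starts = survival_at_period_start(conditional_pd)
    return [s * p for s, p in zip(starts, conditional_pd)]


con_pd = [0.10, 0.20, 0.50]
starts = survival_at_period_start(con_pd)
marginals = marginal_pd(con_pd)
```

A useful sanity check is that the marginal PDs sum to one minus the final cumulative survival probability.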
### `AP-REGTECH-015` — Missing EAD component in ECL formula produces incomplete provisions <sub>(high)</sub>
IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning and reporting processes.
## finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest (2)
### `AP-REGTECH-003` — Unvalidated floating-point inputs cause runtime crashes <sub>(high)</sub>
When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN values. Consequence: entire model crashes before simulation or corrupted downstream calculations.
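A hedged sketch of validating before computing. It swaps in the standard library's `statistics.NormalDist().inv_cdf` for `norm.ppf` so the example has no third-party dependency; the helper names and the `eps` margin are assumptions, not values taken from the source projects.

```python
import math
from statistics import NormalDist


def parse_finite_float(text):
    """Parse a CSV cell, rejecting non-numeric, NaN, and infinite values."""
    value = float(text)  # raises ValueError on non-numeric text
    if not math.isfinite(value):
        raise ValueError(f"non-finite value: {text!r}")
    return value


def safe_inverse_normal(p, eps=1e-9):
    """Inverse normal CDF that refuses probabilities at or near 0 and 1."""
    if not (eps <= p <= 1.0 - eps):  # also False for NaN
        raise ValueError(f"probability {p!r} outside [{eps}, {1.0 - eps}]")
    return NormalDist().inv_cdf(p)


ddp = parse_finite_float("0.025")
z_score = safe_inverse_normal(ddp)
```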
### `AP-REGTECH-004` — Division by zero in financial calculations produces inf/NaN <sub>(high)</sub>
When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators (total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations, corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.
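A small guard of the following shape keeps zero denominators from turning into inf/NaN downstream. The function name is hypothetical, and the `eps=1e-9` default mirrors the tolerance named in `CW-REGTECH-005`; returning `None` (rather than raising) is a design assumption that forces the caller to decide what a missing ratio means.

```python
def safe_ratio(numerator, denominator, eps=1e-9):
    """Return numerator / denominator, or None when the denominator is ~0."""
    if abs(denominator) < eps:
        return None
    return numerator / denominator


ddp = safe_ratio(3, 120)          # downgrade observations / total observations
empty_bucket = safe_ratio(0, 0)   # no observations: ratio is undefined
```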
## finance-bp-067--firesale_stresstest (4)
### `AP-REGTECH-006` — Wrong leverage formula in threshold-based decisions <sub>(high)</sub>
Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values. This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue operating with negative equity, or healthy banks unnecessarily deleverage.
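To see how the two formulas diverge near a threshold, consider this sketch. It assumes equity is assets minus liabilities; the function names are illustrative, and the 4% comparison value is the deleveraging buffer mentioned in `AP-REGTECH-007`.

```python
def leverage_ratio(assets, liabilities):
    """Equity-to-assets (E/A), the formula the anti-pattern calls correct."""
    if assets <= 0:
        raise ValueError("assets must be positive")
    return (assets - liabilities) / assets


def equity_to_liabilities(assets, liabilities):
    """Equity-to-liabilities (E/L), shown only to illustrate the mistake."""
    if liabilities <= 0:
        raise ValueError("liabilities must be positive")
    return (assets - liabilities) / liabilities


correct = leverage_ratio(100.0, 96.1)        # about 0.039
wrong = equity_to_liabilities(100.0, 96.1)   # about 0.0406
```

With these inputs the E/A value falls below a 4% buffer while the E/L value sits above it, so the wrong formula would skip a deleveraging step the correct one triggers.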
### `AP-REGTECH-007` — Confusing deleveraging buffer threshold with insolvency threshold <sub>(high)</sub>
Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence: excessive bank failures amplify systemic contagion.
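The two thresholds can be kept apart with separate named constants and an explicit check order. This is a sketch under the thresholds stated above (3% and 4%); the function name and the returned labels are hypothetical, and treating a ratio of exactly 3% as not yet insolvent follows the "below 3%" wording.

```python
INSOLVENCY_THRESHOLD = 0.03  # below this: default
BUFFER_THRESHOLD = 0.04      # below this, but solvent: deleverage


def classify_bank(leverage):
    """Map an equity-to-assets ratio to the action the bank must take."""
    # Insolvency is checked first so an insolvent bank never reaches
    # the deleveraging branch.
    if leverage < INSOLVENCY_THRESHOLD:
        return "default"
    if leverage < BUFFER_THRESHOLD:
        return "delever"
    return "ok"


actions = [classify_bank(x) for x in (0.025, 0.035, 0.05)]
```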
### `AP-REGTECH-013` — Order-dependent execution creates first-mover advantage bias <sub>(medium)</sub>
Without separating step() and act() phases, first-acting banks sell assets before others decide, creating systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable systemic risk estimates that understate contagion for late-acting banks.
### `AP-REGTECH-014` — Immediate asset sales cause double-selling and undefined state <sub>(medium)</sub>
Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect cash transfers in market clearing.
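The last two anti-patterns can be sketched together: every bank decides first, then every bank submits, and the market settles all queued orders at one price. This assumes `step()` records a decision and `act()` submits it; the `ToyMarket` and `ToyBank` classes and the linear price-impact rule are illustrative stand-ins, not the source project's model.

```python
class ToyMarket:
    def __init__(self, price, impact_per_unit):
        self.price = price
        self.impact_per_unit = impact_per_unit
        self.pending = []  # buffer of (seller, quantity) sale orders

    def queue_sale(self, seller, quantity):
        self.pending.append((seller, quantity))

    def clear(self):
        """Settle every queued order at a single post-impact price."""
        total = sum(q for _, q in self.pending)
        # Hypothetical linear impact, floored at zero.
        self.price *= max(0.0, 1.0 - self.impact_per_unit * total)
        proceeds = {}
        for seller, quantity in self.pending:
            proceeds[seller] = proceeds.get(seller, 0.0) + quantity * self.price
        self.pending = []
        return proceeds


class ToyBank:
    def __init__(self, name, holdings, sell_fraction):
        self.name, self.holdings, self.sell_fraction = name, holdings, sell_fraction
        self.planned = 0.0

    def step(self):  # phase 1: decide, touching no shared state
        self.planned = self.holdings * self.sell_fraction

    def act(self, market):  # phase 2: submit the decision to the buffer
        market.queue_sale(self.name, self.planned)
        self.holdings -= self.planned


def run_round(banks, market):
    for bank in banks:
        bank.step()
    for bank in banks:
        bank.act(market)
    return market.clear()


banks = [ToyBank("A", 100.0, 0.1), ToyBank("B", 200.0, 0.1)]
proceeds = run_round(banks, ToyMarket(price=1.0, impact_per_unit=0.001))
```

Because the price moves only once, in `clear()`, the result does not depend on the order in which banks are iterated.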
## finance-bp-071--opensanctions (3)
### `AP-REGTECH-008` — Cache keys omit request body for state-changing methods <sub>(high)</sub>
Using only URL for cache fingerprints on POST/PATCH requests means different request bodies return identical cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence: sanctions matches missed or false positives from stale entity data.
### `AP-REGTECH-009` — ID collision in entity construction creates false sanctions matches <sub>(high)</sub>
When constructing entity IDs from source identifiers, insufficient identifying attributes cause different real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned entity's ID matches an innocent entity, causing false positive compliance alerts.
### `AP-REGTECH-012` — Reverse property assignment corrupts entity construction <sub>(medium)</sub>
Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting to add values to stub properties instead of forward properties causes ValueError, aborting entity construction. Consequence: entities lost from output, incomplete compliance datasets.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-067--firesale_stresstest
**Scan date**: 2026-04-22
**Stats**: {'total_files': 5, 'total_classes': 23, 'total_functions': 0, 'total_stages': 5}
## Modules (5)
- [model_initialization](components/model_initialization.md): 4 classes
- [shock_application](components/shock_application.md): 5 classes
- [agent_decision_phase](components/agent_decision_phase.md): 5 classes
- [market_clearing](components/market_clearing.md): 6 classes
- [default_handling](components/default_handling.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 143
fatal_constraints_count: 42
non_fatal_constraints_count: 127
use_cases_count: 0
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
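A minimal validation sketch for this lock. It checks plain keyword arguments rather than a real `TradingSignal` object, whose constructor is not shown here; the function name is hypothetical.

```python
def validate_sizing(position_pct=None, order_money=None, order_amount=None):
    """Return the name of the single sizing field set, else raise."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    given = [name for name, value in fields.items() if value is not None]
    if len(given) != 1:
        raise ValueError(f"expected exactly one sizing field, got {given}")
    return given[0]


chosen = validate_sizing(position_pct=0.2)
```

Comparing against `None` rather than truthiness means an explicit `0` still counts as a supplied value.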
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **0**
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-REGTECH-001` — Input bounds validation before statistical computation
**From**: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts > 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
## `CW-REGTECH-002` — Graph/topology invariant verification before construction
**From**: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees) = sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before expensive graph construction operations. Apply to any bipartite or directed graph generation.
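Two of the invariants named above can be checked in a few lines before any graph is built. The function names and the `tol` default are assumptions made for illustration.

```python
def check_degree_sequences(in_degrees, out_degrees):
    """Every directed edge adds one in-degree and one out-degree."""
    if len(in_degrees) != len(out_degrees):
        raise ValueError("degree sequences differ in length")
    if sum(in_degrees) != sum(out_degrees):
        raise ValueError("sum(in-degrees) != sum(out-degrees)")


def check_row_stochastic(matrix, tol=1e-9):
    """Each row of a transition matrix must be non-negative and sum to 1."""
    for i, row in enumerate(matrix):
        if any(x < 0 for x in row):
            raise ValueError(f"row {i} has a negative entry")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {i} sums to {sum(row)}, expected 1.0")


check_degree_sequences([1, 2, 0], [0, 1, 2])
check_row_stochastic([[0.9, 0.1], [0.5, 0.5]])
```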
## `CW-REGTECH-003` — Regulatory amortization window discipline
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance
IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations), full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window logic explicitly rather than reusing a single loop variable across stages.
## `CW-REGTECH-004` — Fingerprint composition must include all request dimensions
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance
Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication headers, and request body for state-changing methods. POST requests with different bodies that return the same cached response are a silent data corruption bug. Always compose fingerprints from the union of all content-affecting parameters.
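A sketch of such a fingerprint using only `hashlib`. The function name, the parameter names, and the choice of SHA-256 are assumptions for illustration and not the source project's cache API.

```python
import hashlib


def request_fingerprint(method, url, auth="", body=b""):
    """Hash every request dimension that can change the response."""
    digest = hashlib.sha256()
    for part in (method.upper(), url, auth):
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so adjacent fields cannot blur together
    digest.update(body)
    return digest.hexdigest()


url = "https://example.test/search"
fp_a = request_fingerprint("POST", url, body=b'{"q": "alice"}')
fp_b = request_fingerprint("POST", url, body=b'{"q": "bob"}')
fp_get = request_fingerprint("GET", url)
```

Two POSTs to the same URL with different bodies now produce different keys, which is the case a URL-only fingerprint gets wrong.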
## `CW-REGTECH-005` — Floating-point zero-equivalence with explicit epsilon tolerance
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This prevents division-by-zero crashes and incorrect cash transfers.
## `CW-REGTECH-006` — Stage classification threshold ordering enforcement
**From**: finance-bp-062--ifrs9 · **Applicable to**: regtech-compliance
IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering and document bucket-to-stage mapping explicitly.
## `CW-REGTECH-007` — Initialization-before-use dependency ordering
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration, CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.
## `CW-REGTECH-008` — Sufficient entity ID collision prevention
**From**: finance-bp-071--opensanctions · **Applicable to**: regtech-compliance
Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number) to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive sanctions matches. Include the maximum available discriminating attributes in ID construction.
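As an illustration, an ID can be derived by hashing every available discriminating attribute under a dataset prefix. The helper `make_entity_id` below is hypothetical and is not presented as the source project's API; the SHA-1 digest and the normalization rules are assumptions.

```python
import hashlib


def make_entity_id(dataset, *parts):
    """Build a deterministic ID from a dataset prefix plus identifying parts."""
    cleaned = [str(p).strip().lower() for p in parts if p is not None]
    cleaned = [p for p in cleaned if p]
    if not cleaned:
        # Refuse to mint an ID from nothing: that is how collisions start.
        raise ValueError("at least one identifying attribute is required")
    digest = hashlib.sha1("|".join(cleaned).encode("utf-8")).hexdigest()
    return f"{dataset}-{digest}"


id_one = make_entity_id("xx-list", "registry", "passport", "A1234567")
id_two = make_entity_id("xx-list", "registry", "passport", "B7654321")
```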
## `CW-REGTECH-009` — Hub selection with candidate removal before addition
**From**: finance-bp-060--AMLSim · **Applicable to**: regtech-compliance
When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment across overlapping patterns. Apply to any allocation algorithm with candidate pooling.
## `CW-REGTECH-010` — Insolvency detection before operational decisions
**From**: finance-bp-067--firesale_stresstest · **Applicable to**: regtech-compliance
Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate operational decisions on prior insolvency state.
FILE:references/components/agent_decision_phase.md
# agent_decision_phase (5 classes)
## `Bank.act`
`agent_decision_phase/bank-act.py:0`
## `BankLeverageConstraint.check`
`agent_decision_phase/bankleverageconstraint-check.py:0`
## `do_delever.execute`
`agent_decision_phase/do-delever-execute.py:0`
## `delever_strategy`
`agent_decision_phase/delever-strategy.py:0`
## `leverage_threshold`
`agent_decision_phase/leverage-threshold.py:0`
FILE:references/components/default_handling.md
# default_handling (3 classes)
## `Bank.do_trigger_default`
`default_handling/bank-do-trigger-default.py:0`
## `sell_assets_proportionally`
`default_handling/sell-assets-proportionally.py:0`
## `default_treatment`
`default_handling/default-treatment.py:0`
FILE:references/components/market_clearing.md
# market_clearing (6 classes)
## `AssetMarket.clear_the_market`
`market_clearing/assetmarket-clear-the-market.py:0`
## `Order.settle`
`market_clearing/order-settle.py:0`
## `compute_price_impact`
`market_clearing/compute-price-impact.py:0`
## `clearing_mode`
`market_clearing/clearing-mode.py:0`
## `price_impact_function`
`market_clearing/price-impact-function.py:0`
## `execution_price`
`market_clearing/execution-price.py:0`
FILE:references/components/model_initialization.md
# model_initialization (4 classes)
## `Model.initialize`
`model_initialization/model-initialize.py:0`
## `Bank.__init__`
`model_initialization/bank-init.py:0`
## `AssetMarket.__init__`
`model_initialization/assetmarket-init.py:0`
## `data_source`
`model_initialization/data-source.py:0`
FILE:references/components/shock_application.md
# shock_application (5 classes)
## `Model.apply_initial_shock`
`shock_application/model-apply-initial-shock.py:0`
## `AssetMarket.update_price`
`shock_application/assetmarket-update-price.py:0`
## `Tradable.update_price`
`shock_application/tradable-update-price.py:0`
## `shock_asset`
`shock_application/shock-asset.py:0`
## `shock_fraction`
`shock_application/shock-fraction.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-067-v5.3
version: v6.1
blueprint_id: finance-bp-067
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:22.380878+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- regtech-compliance
upgraded_from: finance-bp-067-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:13.710708+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-067--firesale_stresstest/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-067--firesale_stresstest/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-REGTECH-001
title: Missing attribute initialization on data structures
description: 'When loading account lists or creating entity dictionaries, failing to initialize required list/dict attributes
(e.g., normal_models, statement IDs) causes KeyError or ValueError at runtime. The code path that reads these structures
assumes they exist, but the initialization path omits them. Consequence: pipeline crashes or data loss for affected entities.'
project_source: finance-bp-060--AMLSim, finance-bp-071--opensanctions
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-002
title: Self-loops in transaction graphs violate domain rules
description: 'When generating directed transaction graphs or AML typologies, allowing source == destination edges creates
self-loops. In AML simulation, self-loops represent accounts sending money to themselves, which is not a valid money laundering
pattern. In fire-sale models, self-loops cause undefined behavior. Consequence: corrupted graph topology and invalid typology
validation.'
project_source: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-003
title: Unvalidated floating-point inputs cause runtime crashes
description: 'When parsing CSV files or computing statistical functions on raw data, failing to validate inputs against
acceptable ranges (e.g., DDP near 0 or 1 for norm.ppf, unvalidated floats from CSV) causes ValueError or infinite/NaN
values. Consequence: entire model crashes before simulation or corrupted downstream calculations.'
project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-004
title: Division by zero in financial calculations produces inf/NaN
description: 'When calculating ratios like DDP (downgrade observations / total observations) or price impact denominators
(total_quantities), zero-denominator cases are not guarded. The resulting inf/NaN propagates through all downstream calculations,
corrupting CCI, ECL, or market clearing. Consequence: systematic data corruption across the entire calculation pipeline.'
project_source: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-005
title: Incorrect amortization windows violate IFRS 9 compliance
description: 'Stage 1 ECL requires exactly 12-month amortization (11 zero-indexed iterations) while Stage 2/3 requires full
remaining tenor (tenor-1 iterations). Using identical windows for all stages causes ECL over/understatement. Consequence:
regulatory non-compliance and materially incorrect loan loss provisions.'
project_source: finance-bp-062--ifrs9
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-006
title: Wrong leverage formula in threshold-based decisions
description: 'Computing leverage as equity-to-liabilities (E/L) instead of equity-to-assets (E/A) produces different values.
This causes deleveraging triggers and insolvency detection to fire at wrong thresholds. Consequence: zombie banks continue
operating with negative equity, or healthy banks unnecessarily deleverage.'
project_source: finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-007
title: Confusing deleveraging buffer threshold with insolvency threshold
description: 'Banks below 3% leverage are insolvent and must default, but deleveraging should trigger at 4% buffer. Using
the same threshold eliminates the buffer zone, causing immediate default with no intermediate corrective action. Consequence:
excessive bank failures amplify systemic contagion.'
project_source: finance-bp-067--firesale_stresstest
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-008
title: Cache keys omit request body for state-changing methods
description: 'Using only URL for cache fingerprints on POST/PATCH requests means different request bodies return identical
cached content. This causes stale data, missing entities, and data corruption in compliance screening pipelines. Consequence:
sanctions matches missed or false positives from stale entity data.'
project_source: finance-bp-071--opensanctions
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-009
title: ID collision in entity construction creates false sanctions matches
description: 'When constructing entity IDs from source identifiers, insufficient identifying attributes cause different
real-world entities to receive identical IDs. The database then merges them into one entity. Consequence: a sanctioned
entity''s ID matches an innocent entity, causing false positive compliance alerts.'
project_source: finance-bp-071--opensanctions
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-010
title: Incorrect cumulative PD ordering corrupts lifetime ECL term structure
description: 'Using cumprod(1-conPD) without shift(1) and fillna(1) produces a corrupted first-period survival probability.
This cascades into all subsequent marginal and cumulative PD calculations, violating IFRS 9 lifetime ECL requirements.
Consequence: systematically incorrect provisions across all remaining tenor periods.'
project_source: finance-bp-062--ifrs9
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-011
title: Mismatched configuration parameters across coupled components
description: 'When TransactionGenerator and Nominator use different degree_threshold values, Nominator identifies hub accounts
using different criteria than TransactionGenerator. This causes incorrect fan-in/fan-out candidate selection. Consequence:
AML typology patterns placed on wrong accounts, invalidating simulation results.'
project_source: finance-bp-060--AMLSim
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-012
title: Reverse property assignment corrupts entity construction
description: 'Stub (reverse) properties represent inverse relationships and raise InvalidData when directly assigned. Attempting
to add values to stub properties instead of forward properties causes ValueError, aborting entity construction. Consequence:
entities lost from output, incomplete compliance datasets.'
project_source: finance-bp-071--opensanctions
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-013
title: Order-dependent execution creates first-mover advantage bias
description: 'Without separating step() and act() phases, first-acting banks sell assets before others decide, creating
systematic first-mover advantage. This distorts the competitive equilibrium and fire-sale dynamics. Consequence: unreliable
systemic risk estimates that understate contagion for late-acting banks.'
project_source: finance-bp-067--firesale_stresstest
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-014
title: Immediate asset sales cause double-selling and undefined state
description: 'Executing asset sales immediately rather than queuing them to a buffer allows multiple banks holding the same
asset to sell simultaneously without accounting for concurrent intentions. Consequence: undefined price impact and incorrect
cash transfers in market clearing.'
project_source: finance-bp-067--firesale_stresstest
severity: medium
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
- id: AP-REGTECH-015
title: Missing EAD component in ECL formula produces incomplete provisions
description: 'IFRS 9 requires ECL = PD x LGD x EAD. When the EAD module is missing or not integrated, the ECL calculation
is incomplete and unusable for provisioning. Consequence: regulatory rejection of ECL calculations, blocking of provisioning
and reporting processes.'
project_source: finance-bp-062--ifrs9
severity: high
applicable_to_tags:
markets:
- global
activities:
- regtech-compliance
_source_file: anti-patterns/regtech.yaml
cross_project_wisdom:
- wisdom_id: CW-REGTECH-001
source_project: finance-bp-062--ifrs9, finance-bp-067--firesale_stresstest
pattern_name: Input bounds validation before statistical computation
description: Statistical functions like norm.ppf() and cumprod() have strict input requirements that, if violated, produce
infinite or NaN values corrupting entire pipelines. Always validate inputs against domain constraints (DDP in (0,1), counts
> 0) before passing to statistical functions. Apply to any statistical or inverse-CDF computation.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-002
source_project: finance-bp-060--AMLSim, finance-bp-067--firesale_stresstest
pattern_name: Graph/topology invariant verification before construction
description: 'Before constructing graph structures (transaction networks, transition matrices), verify invariants: sum(in-degrees)
= sum(out-degrees), matrix row sums = 1.0, degree sequence length divisibility. This catches data corruption early before
expensive graph construction operations. Apply to any bipartite or directed graph generation.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-003
source_project: finance-bp-062--ifrs9
pattern_name: Regulatory amortization window discipline
description: 'IFRS 9 mandates different ECL calculation windows: exactly 12-month for Stage 1 (11 zero-indexed iterations),
full remaining tenor for Stage 2/3. Mixing these up violates compliance requirements. Always encode stage-specific window
logic explicitly rather than reusing a single loop variable across stages.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-004
source_project: finance-bp-071--opensanctions
pattern_name: Fingerprint composition must include all request dimensions
description: 'Cache keys must include all request parameters that affect response content: URL, HTTP method, authentication
headers, and request body for state-changing methods. When POST requests with different bodies return the same cached response,
the result is silent data corruption. Always compose fingerprints from the union of all content-affecting parameters.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-005
source_project: finance-bp-067--firesale_stresstest
pattern_name: Floating-point zero-equivalence with explicit epsilon tolerance
description: IEEE 754 floating-point precision causes exact zero comparisons to fail in financial calculations. Always use
eps=1e-9 tolerance for zero-equivalence checks in market clearing, leverage ratios, and price impact calculations. This
prevents division-by-zero crashes and incorrect cash transfers.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-006
source_project: finance-bp-062--ifrs9
pattern_name: Stage classification threshold ordering enforcement
description: 'IFRS 9 SICR thresholds must be ordered: BUCKETS 2-3 trigger Stage 2, BUCKETS >=4 trigger Stage 3. Applying
thresholds in wrong order or omitting absolute DPD triggers causes material ECL misstatement. Validate threshold ordering
and document bucket-to-stage mapping explicitly.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-007
source_project: finance-bp-067--firesale_stresstest
pattern_name: Initialization-before-use dependency ordering
description: 'Operational dependencies must initialize before dependent objects use them: AssetMarket before bank registration,
CSV file existence before parsing, entity ID before statement addition. Violations cause AttributeError or FileNotFoundError
that abort entire initialization. Always encode dependency ordering explicitly in initialization sequences.'
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-008
source_project: finance-bp-071--opensanctions
pattern_name: Sufficient entity ID collision prevention
description: Entity IDs must include enough identifying attributes (dataset prefix, source, identifier type, document number)
to guarantee uniqueness. Collisions create false equivalence between unrelated entities, directly causing false positive
sanctions matches. Include the maximum available discriminating attributes in ID construction.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-009
source_project: finance-bp-060--AMLSim
pattern_name: Hub selection with candidate removal before addition
description: When selecting hub accounts for typology placement, always call remove_typology_candidate BEFORE add_node for
each selected account. Reversing this order causes hub self-selection (accounts choosing themselves) and duplicate assignment
across overlapping patterns. Apply to any allocation algorithm with candidate pooling.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
- wisdom_id: CW-REGTECH-010
source_project: finance-bp-067--firesale_stresstest
pattern_name: Insolvency detection before operational decisions
description: Banks below the insolvency threshold (3% leverage) must trigger default immediately, not enter the deleveraging
decision logic. Checking operational thresholds before insolvency creates zombie banks with negative equity. Always gate
operational decisions on prior insolvency state.
applicable_to_activity: regtech-compliance
_source_file: cross-project-wisdom/regtech.yaml
domain_constraints_injected: []
resources_injected: {}
component_capability_map:
project: finance-bp-067--firesale_stresstest
scan_date: '2026-04-22'
stats:
total_files: 5
total_classes: 23
total_functions: 0
total_stages: 5
modules:
model_initialization:
class_count: 4
stage_id: initialization
stage_order: 1
responsibility: 'Load bank balance sheets from EBA data and initialize market infrastructure. WHY: Provides reproducible
starting state from real European banking data.'
classes:
- name: Model.initialize
file: model_initialization/model-initialize.py
line: 0
kind: required_method
signature: ''
- name: Bank.__init__
file: model_initialization/bank-init.py
line: 0
kind: required_method
signature: ''
- name: AssetMarket.__init__
file: model_initialization/assetmarket-init.py
line: 0
kind: required_method
signature: ''
- name: data_source
file: model_initialization/data-source.py
line: 0
kind: replaceable_point
design_decision_count: 4
shock_application:
class_count: 5
stage_id: shock_application
stage_order: 2
responsibility: 'Apply exogenous initial shock to asset prices, triggering potential deleveraging cascade. WHY: Models
contagion from external market shock (e.g., sovereign debt crisis).'
classes:
- name: Model.apply_initial_shock
file: shock_application/model-apply-initial-shock.py
line: 0
kind: required_method
signature: ''
- name: AssetMarket.update_price
file: shock_application/assetmarket-update-price.py
line: 0
kind: required_method
signature: ''
- name: Tradable.update_price
file: shock_application/tradable-update-price.py
line: 0
kind: required_method
signature: ''
- name: shock_asset
file: shock_application/shock-asset.py
line: 0
kind: replaceable_point
- name: shock_fraction
file: shock_application/shock-fraction.py
line: 0
kind: replaceable_point
design_decision_count: 3
agent_decision_phase:
class_count: 5
stage_id: agent_decision
stage_order: 3
responsibility: 'Each bank evaluates solvency and chooses deleveraging actions. WHY: Separating decision from execution
ensures order independence.'
classes:
- name: Bank.act
file: agent_decision_phase/bank-act.py
line: 0
kind: required_method
signature: ''
- name: BankLeverageConstraint.check
file: agent_decision_phase/bankleverageconstraint-check.py
line: 0
kind: required_method
signature: ''
- name: do_delever.execute
file: agent_decision_phase/do-delever-execute.py
line: 0
kind: required_method
signature: ''
- name: delever_strategy
file: agent_decision_phase/delever-strategy.py
line: 0
kind: replaceable_point
- name: leverage_threshold
file: agent_decision_phase/leverage-threshold.py
line: 0
kind: replaceable_point
design_decision_count: 6
market_clearing:
class_count: 6
stage_id: market_clearing
stage_order: 4
responsibility: 'Execute all queued sell orders and compute price impact. WHY: Batch execution isolates market mechanics
from agent decision-making.'
classes:
- name: AssetMarket.clear_the_market
file: market_clearing/assetmarket-clear-the-market.py
line: 0
kind: required_method
signature: ''
- name: Order.settle
file: market_clearing/order-settle.py
line: 0
kind: required_method
signature: ''
- name: compute_price_impact
file: market_clearing/compute-price-impact.py
line: 0
kind: required_method
signature: ''
- name: clearing_mode
file: market_clearing/clearing-mode.py
line: 0
kind: replaceable_point
- name: price_impact_function
file: market_clearing/price-impact-function.py
line: 0
kind: replaceable_point
- name: execution_price
file: market_clearing/execution-price.py
line: 0
kind: replaceable_point
design_decision_count: 6
default_handling:
class_count: 3
stage_id: default_handling
stage_order: 5
responsibility: 'Process bank defaults and redistribute assets. WHY: Defaults are terminal events that affect systemic
risk calculations.'
classes:
- name: Bank.do_trigger_default
file: default_handling/bank-do-trigger-default.py
line: 0
kind: required_method
signature: ''
- name: sell_assets_proportionally
file: default_handling/sell-assets-proportionally.py
line: 0
kind: required_method
signature: ''
- name: default_treatment
file: default_handling/default-treatment.py
line: 0
kind: replaceable_point
design_decision_count: 3
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.5607476635514018
evidence_invalid: 47
evidence_verified: 60
evidence_auto_fixed: 0
audit_coverage: 32/32 (100%)
audit_pass_rate: 1/32 (3%)
audit_fail_total: 22
audit_finance_universal:
pass: 1
warn: 6
fail: 13
audit_subdomain_totals:
pass: 0
warn: 3
fail: 9
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-067. Evidence verify ratio
= 56.1% and audit fail total = 22. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-067-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc: []
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries: []
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 143
fatal_constraints_count: 42
non_fatal_constraints_count: 127
use_cases_count: 0
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (fatal violation finance-C-001).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 22 source groups: agent_decision(6),
behaviour_strategy(5), behaviours(2), constraint_definition(7), constraints(1), default_handling(11), and 16 more.'
key_decisions: 143 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-008
type: B
summary: Two-phase step (step + act) for order independence
- id: BD-009
type: B/BA
summary: Insolvent banks trigger default (raise exception)
- id: BD-010
type: B
summary: Order-independent buffer putForSale_
- id: BD-011
type: B/BA
summary: 'Deleveraging priority: pay loans first, then sell'
- id: BD-012
type: B/BA
summary: 'Threshold model: act when leverage < buffer (4%)'
- id: BD-013
type: B
summary: Perform proportionally across all actions of the same type
- id: BD-043
type: B/DK
summary: Proportional delevering across all assets/liabilities
- id: BD-044
type: B/BA
summary: Pay liabilities first, then sell assets to raise liquidity
- id: BD-059
type: B
summary: Perform proportional delevering by max-amount weighting
- id: BD-061
type: B
summary: Truncate loan payment to notional to prevent overpayment
- id: BD-063
type: B
summary: Available actions reconstructed from scratch each step
- id: BD-085
type: B
summary: Sell/proportional deleveraging strategy
- id: BD-086
type: B
summary: 'Two-step delever: pay liabilities first, then sell assets'
- id: BD-025
type: B/BA
summary: Minimum leverage (insolvency threshold) = 3%
- id: BD-026
type: B/BA
summary: Leverage buffer threshold = 4% triggers delevering behavior
- id: BD-027
type: B/BA
summary: Target leverage = 5% when delevering
- id: BD-041
type: B/BA
summary: Solvency measured purely by leverage ratio (equity/assets)
- id: BD-064
type: B/DK
summary: Asset valuation = quantity * price for tradables
- id: BD-065
type: B/BA
summary: Loan valuation = principal (face value)
- id: BD-066
type: B/DK
summary: Other assets/liabilities use principal amount as valuation
- id: BD-071
type: B/DK
summary: Use leverage ratio lambda = E/A for insolvency detection
- id: BD-020
type: B/BA
summary: Default deferred to next step() phase
- id: BD-021
type: B/BA
summary: Bank alive flag prevents further actions
- id: BD-022
type: B/BA
summary: Default sells all assets proportionally
- id: BD-046
type: B/BA
summary: Upon default, sell ALL tradable assets immediately
- id: BD-092
type: BA/DK
summary: SIMULTANEOUS_FIRESALE=True batches all sells before price impact
- id: BD-093
type: BA/DK
summary: PRICE_IMPACTS defaults to 0.05 (5% price drop per 5% market sold)
- id: BD-094
type: BA
summary: BANK_LEVERAGE_BUFFER=0.04 is threshold for initiating deleveraging
- id: BD-095
type: BA
summary: BANK_LEVERAGE_MIN=0.03 is insolvency trigger (leverage < 3%)
- id: BD-101
type: BA
summary: ASSET_TO_SHOCK defaults to GOV_BONDS for initial price shock
- id: BD-105
type: BA
summary: Loan/OtherLiability split 50-50 from total liability
- id: BD-107
type: BA/DK
summary: 'Exponential price impact formula: 5% sold -> 5% drop at beta=~10.5'
- id: BD-108
type: B/BA
summary: 'INTERACTION: [BD-005/BD-029] × [BD-014/BD-032] × [BD-015/BD-038] → Amplification of cascade severity through
simultaneous fire sale compression'
- id: BD-109
type: B/BA
summary: 'INTERACTION: [BD-026/BD-073] → [BD-043] → [BD-014] → [BD-015] → [BD-026] → Risk Cascade feedback loop: deleveraging
buffer triggers fire sales that erode buffer again'
- id: BD-110
type: BA
summary: 'INTERACTION: [BD-026/BD-073] vs [BD-027/BD-074] → Contradiction: 1% buffer between trigger (4%) and target
(5%) is insufficient for stabilization'
- id: BD-111
type: BA
summary: 'INTERACTION: [BD-043] × [BD-017/BD-083] × [BD-015] → Hidden dependency: Proportional deleveraging with midpoint
pricing undervalues assets under stress'
- id: BD-112
type: BA/DK
summary: 'INTERACTION: [BD-014/BD-032] × [BD-048] → Hidden dependency: Simultaneous firesale requires price impact computed
BEFORE settlement, breaking if order reversed'
- id: BD-113
type: BA/M
summary: 'INTERACTION: [BD-045] × [BD-051] × [BD-052] → Hidden dependency: Random shuffle for fairness requires fixed
seed AND sufficient Monte Carlo runs for validity'
- id: BD-114
type: BA/M
summary: 'INTERACTION: [BD-018] × [BD-022] → Hidden dependency: Per-asset-type price impact assumes fungibility that
breaks under default liquidation'
- id: BD-115
type: BA
summary: 'INTERACTION: [BD-002/BD-067] × [BD-071] × [BD-003/BD-033] → Risk Cascade: Balance sheet derivation with 5%
cash creates liquidity-solvency timing mismatch'
- id: BD-116
type: BA
summary: 'INTERACTION: [BD-007] × [BD-062] × [BD-040] → Hidden dependency: Default price of 1.0 for unknown assets creates
silent failures in parameter sweep results'
- id: BD-117
type: BA/DK
summary: 'INTERACTION: [BD-041] (leverage-only insolvency) × [BD-065] (loan at face value) × [BD-064] (tradables at
market) → Contradiction: Mixed valuation creates arbitrary solvency boundaries'
- id: BD-118
type: B/BA
summary: 'INTERACTION: [BD-009] × [BD-020] × [BD-042] → Risk Cascade: Deferred default execution creates accumulation
of silent distress across timesteps'
- id: BD-119
type: BA
summary: 'INTERACTION: [BD-006] × [BD-049] → Amplification: Per-bank price update combined with asymmetric price dynamics
creates systematic underpricing'
- id: BD-106
type: M
summary: Contract extends ESLContract from external economicsl library
- id: BD-001
type: B/BA
summary: Banks derived from real EBA 2018 CSV data
- id: BD-002
type: B/BA
summary: 'Balance sheet formula: asset=CET1E/leverage, liability=asset-CET1E'
- id: BD-003
type: BA
summary: Cash fixed at 5% of total assets
- id: BD-004
type: BA/M
summary: Other liability split 50/50 with loan
- id: BD-023
type: B/BA
summary: Use 48 banks from EBA 2018 EU-wide stress test data as model population
- id: BD-033
type: B/BA
summary: Cash allocation = 5% of total assets
- id: BD-034
type: B/BA
summary: Loans and other liabilities split 50/50
- id: BD-035
type: B/RC
summary: Corporate bonds = debt securities minus government bonds
- id: BD-067
type: B/BA
summary: 'Balance sheet derived: asset = CET1E / (leverage/100)'
- id: BD-068
type: B/RC
summary: Other assets = total assets - debt securities - cash
- id: BD-072
type: B/BA
summary: Insolvency threshold at leverage < 3%
- id: BD-073
type: B/BA
summary: Trigger deleveraging when leverage < 4% (buffer zone)
- id: BD-074
type: B/BA
summary: Target leverage of 5% (100/20 leverage ratio)
- id: BD-079
type: B/BA
summary: 'Systemic risk threshold: EOSE < 5% returns 0 (no systemic event)'
- id: BD-080
type: B/BA
summary: EOSE = number_of_defaulted_banks / NBANKS (48 banks)
- id: BD-081
type: B/BA
summary: Run simulation for exactly 6 timesteps
- id: BD-084
type: B
summary: 'Two-phase execution: simultaneous vs random shuffle firesale'
- id: BD-088
type: B/BA
summary: Initial shock applied to government bonds market
- id: BD-GAP-001
type: DK
summary: 'Missing: as-of vs processing time'
- id: BD-GAP-002
type: DK
summary: 'Missing: Trading calendar isolation'
- id: BD-GAP-003
type: DK
summary: 'Missing: Timezone explicit annotation'
- id: BD-GAP-004
type: M
summary: 'Missing: Matrix ill-conditioning'
- id: BD-GAP-005
type: DK
summary: 'Missing: Point-in-Time data availability'
- id: BD-GAP-006
type: DK
summary: 'Missing: Stale data detection'
- id: BD-GAP-007
type: B
summary: 'Missing: PnL conservation'
- id: BD-GAP-008
type: DK
summary: 'Missing: Model and data version snapshot'
- id: BD-GAP-009
type: RC
summary: 'Missing: Price/quantity precision (tick/lot)'
- id: BD-GAP-010
type: M
summary: 'Missing: Transition matrix time homogeneity'
- id: BD-GAP-011
type: B
summary: 'Missing: Overdue definition (DPD 30/60/90)'
- id: BD-GAP-012
type: RC
summary: 'Missing: Collection priority & compliance'
- id: BD-GAP-013
type: DK
summary: 'Missing: Reconciliation timeliness'
- id: BD-GAP-014
type: DK
summary: 'Missing: as-of vs processing time'
- id: BD-GAP-015
type: DK
summary: 'Missing: Trading calendar isolation'
- id: BD-GAP-016
type: DK
summary: 'Missing: Timezone explicit annotation'
- id: BD-GAP-017
type: M
summary: 'Missing: Matrix ill-conditioning'
- id: BD-GAP-018
type: DK
summary: 'Missing: Stale data detection'
- id: BD-GAP-019
type: B
summary: 'Missing: PnL conservation'
- id: BD-GAP-020
type: M/DK
summary: 'Missing: Day count convention'
- id: BD-GAP-021
type: RC
summary: 'Missing: Price/quantity precision (tick/lot)'
- id: BD-GAP-022
type: B
summary: 'Missing: Default definition & IFRS 9 stages'
- id: BD-GAP-023
type: B
summary: 'Missing: PD/LGD/EAD estimation (IRB vs Standard)'
- id: BD-GAP-024
type: B
summary: 'Missing: Vasicek single-factor correlation (rho)'
- id: BD-096
type: DK/B
summary: putForSale_ tracks pending sales to ensure order independence
- id: BD-097
type: DK/B
summary: oldPrices stored before price update for mid-point settlement pricing
- id: BD-102
type: DK
summary: cash = 0.05 * asset (5% cash buffer) during balance sheet init
- id: BD-014
type: B/DK
summary: SIMULTANEOUS_FIRESALE=True batches all sales
- id: BD-015
type: B/BA
summary: Exponential price impact per Cifuentes 2005
- id: BD-016
type: BA/DK
summary: 5% market cap sold = 5% price drop by default
- id: BD-017
type: B
summary: Midpoint price execution
- id: BD-018
type: B
summary: Price impact per asset type, not per asset
- id: BD-019
type: BA
summary: Floating point tolerance eps=1e-9
- id: BD-031
type: B/BA
summary: Default price impact = 5% (5% market sold causes 5% price drop) linear baseline
- id: BD-032
type: B
summary: Simultaneous firesale batch processing enabled
- id: BD-037
type: B
summary: Asset sales settle at midpoint price (current + old price) / 2
- id: BD-038
type: B/BA
summary: Exponential price impact function per Cifuentes 2005
- id: BD-039
type: B/DK
summary: Beta calibrated so 5% market cap sold = 5% price drop
- id: BD-040
type: B/BA
summary: Default asset prices initialized at 1.0
- id: BD-048
type: B/DK
summary: Price impact computed before sales settle in clear_the_market()
- id: BD-049
type: B/BA
summary: Update asset prices only when price loss > 0
- id: BD-050
type: B/DK
summary: Cumulative quantities tracked separately from per-step
- id: BD-056
type: B/DK
summary: putForSale_ accumulator ensures order independence in asset sales
- id: BD-058
type: B/BA
summary: Price impact function uses exponential decay
- id: BD-062
type: B/BA
summary: 'Use defaultdict with lambda: 1.0 for default prices'
- id: BD-069
type: B/DK
summary: Use exponential price impact function for asset pricing
- id: BD-070
type: B/BA
summary: Calibrate price impact so 5% market sell causes 5% price drop
- id: BD-083
type: B/BA
summary: 'Settle sales at midpoint price: (current + old_price) / 2'
- id: BD-036
type: B/BA
summary: Floating point tolerance = 1e-9 EUR for zero checks
- id: BD-060
type: B/DK
summary: Do not execute action if amount is effectively zero
- id: BD-089
type: T
summary: step() MUST be called before act() per simulation tick
- id: BD-090
type: RC
summary: put_for_sale() MUST be called before clear_the_market() in same tick
- id: BD-091
type: T
summary: act() raises DefaultException, trigger_default() executes in NEXT step()
- id: BD-104
type: RC
summary: do_delever pays liabilities BEFORE selling assets (priority order)
- id: BD-098
type: B
summary: 'Contract pattern: get_action() returns action objects, is_eligible() filters'
- id: BD-099
type: B/DK
summary: perform_proportionally() distributes actions by max-amount weighting
- id: BD-103
type: B/DK
summary: random.shuffle(allAgents) ensures order independence across simulation runs
- id: BD-075
type: B
summary: Run 100 Monte Carlo simulations per parameter set
- id: BD-076
type: B
summary: Use sample mean and standard deviation for aggregating MC results
- id: BD-077
type: B/DK
summary: Set price impact parameter sweep from 0% to 10% in 21 points
- id: BD-082
type: B
summary: Use fixed random seed 1337 for reproducibility
- id: BD-024
type: B/BA
summary: Systemic event threshold = 5% average bank defaults (Gai-Kapadia 2010)
- id: BD-052
type: B
summary: Run 100 simulations for random shuffling benchmark
- id: BD-053
type: B/DK
summary: Price impact parameter sweep from 0% to 10%
- id: BD-054
type: B/BA
summary: Initial shock parameter sweep from 0% to 30%
- id: BD-055
type: B/BA
summary: Leverage buffer = 1.0 for leverage targeting comparison baseline
- id: BD-057
type: B/BA
summary: Leverage targeting comparison uses 100% buffer override
- id: BD-005
type: BA
summary: Initial shock defaults to 20% on government bonds
- id: BD-006
type: B/DK
summary: Price shock propagates by updating all bank asset prices
- id: BD-007
type: BA
summary: Default price is 1.0 for unknown asset types
- id: BD-028
type: B/BA
summary: Government bonds selected as asset to shock
- id: BD-029
type: B/BA
summary: Initial shock magnitude = 20% price drop
- id: BD-078
type: B/BA
summary: Set initial shock sweep from 0% to 30% in 21 points
- id: BD-087
type: B/BA
summary: Use 100% leverage buffer (1.0) for leverage targeting simulation
- id: BD-030
type: B/BA
summary: Simulation runs for 6 timesteps
- id: BD-042
type: B/BA
summary: Default execution deferred to step() phase
- id: BD-045
type: B/BA
summary: Random shuffle of agent order each simulation round
- id: BD-047
type: B/BA
summary: 'Step/act split: step() handles defaults, act() handles delevering'
- id: BD-051
type: B/DK
summary: Fixed random seed = 1337 for reproducibility
- id: BD-100
type: M/DK
summary: random_shuffling.py compares SIMULTANEOUS_FIRESALE vs sequential clearing
resources:
packages:
- name: numpy
version_pin: latest
- name: py-economicsl
version_pin: latest
- name: matplotlib
version_pin: latest
- name: jupytext
version_pin: latest
- name: rise
version_pin: latest
- name: py-destilledESL
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install numpy
- python3 -m pip install py-economicsl
- python3 -m pip install matplotlib
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When initializing banks from EBA_2018.csv
action: Create exactly 48 bank agents matching the NBANKS constant
severity: fatal
kind: domain_rule
modality: must
consequence: Model systemic risk calculations will be incorrect if fewer or more than 48 banks are created, as the get_extent_of_systemic_event
function divides by NBANKS=48 and the balance sheet aggregation will not match EBA 2018 reported values
stage_ids:
- initialization
- id: finance-C-002
when: When creating the AssetMarket during initialization
action: Initialize asset prices to exactly 1.0 for all asset types using defaultdict
severity: fatal
kind: domain_rule
modality: must
consequence: Initial price of 1.0 is required for consistent price impact calculations using Cifuentes 2005 formula. If
prices differ, the initial shock magnitude and contagion dynamics will be miscalibrated
stage_ids:
- initialization
- id: finance-C-003
when: When calculating bank balance sheet values from EBA data
action: Verify that all parsed CSV values convert to valid floats without errors
severity: fatal
kind: domain_rule
modality: must
consequence: CSV parsing with float() on unvalidated strings will raise ValueError at runtime, causing the entire model
to crash before any simulation can run
stage_ids:
- initialization
- id: finance-C-005
when: When initializing model state
action: Verify EBA_2018.csv file exists in the working directory
severity: fatal
kind: resource_boundary
modality: must
consequence: FileNotFoundError will crash the model initialization if EBA_2018.csv is missing, preventing any stress test
simulation from running
stage_ids:
- initialization
- id: finance-C-010
when: When initializing the model
action: Create AssetMarket instance before initializing any bank balance sheets
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Bank balance sheet initialization registers assets with the AssetMarket (institutions.py:48, 55); if AssetMarket
does not exist yet, AttributeError will crash initialization
stage_ids:
- initialization
- id: finance-C-014
when: When implementing deleveraging logic for banks
action: Check leverage ratio against the insolvency threshold (3%) first, before any deleveraging decision
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Banks below 3% leverage are insolvent and must trigger default, not deleverage. Failure to check insolvency
first causes zombie banks to continue operating with negative equity
stage_ids:
- agent_decision
- id: finance-C-015
when: When implementing the agent decision phase execution order
action: Separate step() and act() phases, executing all step() calls before any act() calls within each simulation round
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Without two-phase execution, agent decisions become order-dependent. First-acting banks can sell assets before
others decide, creating first-mover advantage that distorts systemic risk measurement
stage_ids:
- agent_decision
- id: finance-C-016
when: When implementing asset sale actions
action: Queue asset sales to the putForSale_ buffer rather than executing sales immediately
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Immediate asset sales cause double-selling when multiple banks hold the same asset, as the market cannot
account for concurrent sale intentions before execution
stage_ids:
- agent_decision
- id: finance-C-019
when: When calculating leverage for decision thresholds
action: Compute leverage ratio as equity-to-assets (λ = E/A), not equity-to-liabilities
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect leverage formula causes wrong threshold comparisons, potentially triggering deleveraging at wrong
leverage levels or failing to detect insolvency
stage_ids:
- agent_decision
- id: finance-C-020
when: When implementing the deleveraging trigger condition
action: Trigger deleveraging when leverage < 4% buffer, not when leverage < 3% minimum
severity: fatal
kind: domain_rule
modality: must
consequence: Using wrong threshold causes banks to delay deleveraging until insolvency or react prematurely, breaking
the threshold model designed to avoid excessive trading at minor losses
stage_ids:
- agent_decision
- id: finance-C-027
when: When calibrating the leverage thresholds
action: Use the same threshold value for both insolvency trigger and deleveraging buffer
severity: fatal
kind: domain_rule
modality: must_not
consequence: Equal thresholds eliminate the buffer zone, causing banks to either do nothing or immediately default with
no intermediate deleveraging option
stage_ids:
- agent_decision
- id: finance-C-029
when: When implementing the market clearing stage
action: Use a floating-point tolerance eps=1e-9 for zero-equivalence checks in financial calculations
severity: fatal
kind: domain_rule
modality: must
consequence: Without eps=1e-9 tolerance, IEEE 754 floating-point precision errors cause division-by-zero crashes or incorrect
cash transfers, corrupting the simulation's financial state
stage_ids:
- market_clearing
- id: finance-C-030
when: When adding orders to the orderbook
action: Assert quantity > 0 before adding orders to prevent invalid sales
severity: fatal
kind: domain_rule
modality: must
consequence: Zero or negative quantities in the orderbook cause incorrect price impact calculations and cash transfers,
corrupting the market clearing mechanism
stage_ids:
- market_clearing
- id: finance-C-031
when: When calculating price impact from asset sales
action: Implement the Cifuentes 2005 exponential price impact formula with beta calibrated to 5% impact at 5% sold
severity: fatal
kind: domain_rule
modality: must
consequence: Using a linear or incorrect price impact formula causes the model to misestimate fire-sale contagion dynamics,
leading to unreliable systemic risk estimates
stage_ids:
- market_clearing
- id: finance-C-042
when: When tracking cumulative sales for price impact
action: Initialize total_quantities with actual market capitalization values from bank balance sheets
severity: fatal
kind: domain_rule
modality: must
consequence: Zero or uninitialized total_quantities causes division-by-zero or undefined price impact calculations, crashing
the market clearing stage
stage_ids:
- market_clearing
- id: finance-C-043
when: When implementing bank default handling logic
action: defer default execution to the step() phase by using do_trigger_default flag
severity: fatal
kind: domain_rule
modality: must
consequence: Immediate default execution in act() phase causes order-dependent effects when default triggers bilateral
funding pulls, leading to non-reproducible simulation results depending on agent processing order
stage_ids:
- default_handling
- id: finance-C-044
when: When processing a bank marked as dead
action: prevent the dead bank from executing any further actions by checking the alive flag
severity: fatal
kind: domain_rule
modality: must
consequence: Dead banks continue to execute actions, corrupting systemic risk calculations by including already defaulted
institutions in deleveraging decisions
stage_ids:
- default_handling
- id: finance-C-045
when: When liquidating defaulting bank's assets
action: sell assets proportionally using the same sell_assets_proportionally function as normal deleveraging
severity: fatal
kind: domain_rule
modality: must
consequence: Non-proportional fire sale liquidation creates inconsistent market dynamics compared to normal deleveraging,
causing skewed price impact calculations and incorrect systemic risk measurements
stage_ids:
- default_handling
- id: finance-C-052
when: When processing agent actions in the simulation loop
action: execute all step() calls for all agents before any act() calls
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Interleaving step() and act() calls causes defaults triggered during act() to execute out of order, breaking
the deferred execution design and causing inconsistent systemic outcomes
stage_ids:
- default_handling
- id: finance-C-055
when: When validating that fire sale assets enter the market orderbook
action: confirm all assets of defaulted banks are added to the market orderbook before price clearing
severity: fatal
kind: domain_rule
modality: must
consequence: Defaulting bank assets not entering the orderbook means they are not included in price impact calculations,
causing fire sale prices to be artificially inflated and systemic risk to be understated
stage_ids:
- default_handling
- id: finance-C-056
when: When initializing the model from EBA_2018.csv balance sheet data
action: create exactly 48 Bank agents as specified by NBANKS constant
severity: fatal
kind: domain_rule
modality: must
consequence: Systemic risk calculations (eose = sum(out) / NBANKS) will produce incorrect results if the bank count differs
from 48, causing misleading stress test conclusions
stage_ids:
- initialization
- shock_application
- id: finance-C-057
when: When creating Bank balance sheets during initialization
action: populate balance sheet with exactly 4 asset components (cash, corp_bonds, gov_bonds, other_asset) and 2 liability
components (loan, other_liability)
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Balance sheet integrity violations will cause incorrect leverage calculations, leading to wrong insolvency
detection and potentially masked systemic failures
stage_ids:
- initialization
- shock_application
- id: finance-C-058
when: When AssetMarket is created during initialization
action: initialize prices dict with default value of 1.0 for all asset types and record oldPrices as empty dict
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Uninitialized price structures will cause KeyError exceptions during shock application or price impact calculations,
halting simulation
stage_ids:
- initialization
- shock_application
- id: finance-C-066
when: When checking bank solvency via leverage constraint
action: raise DefaultException when leverage ratio (equity/assets) falls below BANK_LEVERAGE_MIN threshold (3%)
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Insolvent banks will continue operating, accumulating losses and creating misleading systemic risk metrics
in the stress test
stage_ids:
- market_clearing
- default_handling
- id: finance-C-074
when: When implementing leverage ratio calculations for bank solvency checks
action: Use equity_valuation divided by asset_valuation (lambda = E/A), not debt-to-equity or other ratios
severity: fatal
kind: domain_rule
modality: must
consequence: Bank solvency status will be incorrectly assessed, causing insolvent banks to continue trading or solvent
banks to default unnecessarily
- id: finance-C-075
when: When implementing any financial comparisons involving price, quantity, or monetary values
action: Use eps = 1e-9 tolerance threshold to avoid floating-point edge cases in IEEE 754 arithmetic
severity: fatal
kind: domain_rule
modality: must
consequence: Financial comparisons may fail silently due to floating-point precision errors, causing assets with near-zero
prices to be incorrectly sold or loans with negligible amounts to be processed
- id: finance-C-076
when: When determining whether a bank has defaulted and must liquidate all tradable assets
action: Use BankLeverageConstraint.is_insolvent() which checks if lambda < BANK_LEVERAGE_MIN (3%)
severity: fatal
kind: domain_rule
modality: must
consequence: Insolvent banks will not be properly defaulted, causing them to continue deleveraging and selling assets
at depressed prices, accelerating systemic contagion
- id: finance-C-077
when: When implementing any Action class for bank deleveraging behavior
action: Implement perform() method for execution logic and get_max() method returning maximum executable amount
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Bank deleveraging actions will fail to execute or return incorrect amounts, breaking the contagion model
and producing invalid systemic risk estimates
- id: finance-C-078
when: When implementing any Contract class to be considered for agent deleveraging
action: Implement is_eligible(me) method returning boolean to filter which contracts are acted upon
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Contracts will always return False for eligibility, preventing banks from selling assets or paying loans
to delever, breaking the fire-sale cascade mechanism
- id: finance-C-080
when: When executing the simulation tick loop for agent-based model
action: Call step() on all agents before calling act() on any agent, with market clearing between them
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Order-dependent execution will produce different results based on agent ordering, invalidating simulation
reproducibility and academic results
- id: finance-C-081
when: When processing asset sales within a simulation tick
action: Call put_for_sale() on the market before calling clear_the_market() in the same tick to preserve orderbook integrity
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Assets will be sold at current prices without accumulated orders, eliminating price impact effects and breaking
the fire-sale contagion model
- id: finance-C-082
when: When tracking asset quantities during settlement in the orderbook
action: Decrement putForSale_ buffer when settling a sale to prevent double-selling the same asset
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Assets will be sold multiple times in the same tick, causing phantom cash generation and incorrect balance
sheet calculations that corrupt systemic risk metrics
- id: finance-C-086
when: When presenting or reporting this system's simulation results to users or stakeholders
action: Claim that simulation results represent real-time trading system capabilities or live market execution
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Users will make investment or policy decisions based on inapplicable simulation assumptions, leading to severe
misallocation of capital or incorrect regulatory assessments
- id: finance-C-087
when: When using this model for credit risk assessment or regulatory capital calculations
action: Claim regulatory-grade accuracy or compliance with Basel/CRR capital adequacy frameworks
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Regulatory submissions based on simplified fire-sale contagion model will violate compliance requirements,
exposing institutions to penalties and supervisory action
- id: finance-C-095
when: When implementing bank asset initialization
action: Calculate corporate bonds as total debt securities minus government bonds — government bonds must be allocated
first, corporate bonds are whatever remains in the debt securities bucket
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrectly treating corporate bonds as direct allocation instead of residual breaks regulatory data structure,
causing balance sheet mismatch and invalid stress test results
derived_from_bd_id: BD-035
- id: finance-C-099
when: When implementing contract inheritance hierarchy
action: Verify ESLContract base class is available from the economicsl library before using contract inheritance — verify
the dependency is installed and the library version is compatible
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Contract inheritance from missing external library causes immediate initialization failure, preventing any
simulation from running
derived_from_bd_id: BD-106
- id: finance-C-101
when: When implementing market operation sequence within a tick
action: Call put_for_sale() before clear_the_market() in the same tick — the pending sales must be registered before market
finalization
severity: fatal
kind: domain_rule
modality: must
consequence: Reversing the operation order causes clear_the_market() to execute with stale market state, leading to incorrect
price settlement and violating the intended market operation sequence
derived_from_bd_id: BD-090
- id: finance-C-116
when: When implementing asset valuation for tradable securities in constraint_definition
action: 'Use mark-to-market valuation: value = quantity × current price for all tradable assets — do not substitute with
amortized cost or historical cost'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Using non-market valuations during stress scenarios masks actual leverage ratios; mark-to-market reveals
true asset values needed for regulatory compliance and contagion detection
derived_from_bd_id: BD-064
- id: finance-C-124
when: When implementing the delevering behavior for financial institutions
action: Pay all liabilities first, then sell assets to raise liquidity — liability payments must occur before any asset
sales within each delevering step
severity: fatal
kind: domain_rule
modality: must
consequence: Paying liabilities first reflects legal obligation hierarchy; reversing this order (selling assets before
liabilities) violates both regulatory requirements and behavioral realism for regulated banks
derived_from_bd_id: BD-044
- id: finance-C-139
when: When configuring or verifying price impact computation models for liquidation scenarios
action: Verify price impact computation differentiates between liquid and illiquid assets within the same asset type;
DO NOT apply uniform price impact coefficients across asset-type groupings when liquidity characteristics differ
severity: fatal
kind: domain_rule
modality: must
consequence: Uniform price impact coefficients applied to mixed-liquidity asset groups cause systematic mispricing of
illiquid asset liquidations, with execution shortfalls accumulating silently in backtests
derived_from_bd_id: BD-114
- id: finance-C-160
when: When implementing insolvency detection in the banking model
action: Trigger bank default when leverage falls below 3% — banks with leverage >= 3% must remain solvent and tradable;
banks with leverage < 3% must be marked as insolvent and excluded from trading
severity: fatal
kind: domain_rule
modality: must
consequence: Setting incorrect insolvency threshold causes either premature defaults (threshold too high) or late defaults
allowing insolvent banks to continue trading (threshold too low), both violating regulatory capital adequacy assumptions
derived_from_bd_id: BD-072
- id: finance-C-162
when: When setting initial capital structure constraints in the banking model
action: Set target leverage at 5%, a 1-percentage-point buffer above the 4% deleveraging trigger — verify target > deleveraging_trigger
> insolvency_threshold (5% > 4% > 3%) hierarchical ordering is maintained
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Setting target leverage <= deleveraging trigger removes the buffer hierarchy, preventing banks from maintaining
safe capital buffers above the deleveraging activation level
derived_from_bd_id: BD-074
regular:
- id: finance-C-004
when: When initializing bank balance sheets
action: Allow banks to be created with non-positive leverage ratios
severity: high
kind: domain_rule
modality: must_not
consequence: Banks with zero or negative leverage ratios will cause division by zero in asset calculation (CET1E/leverage)
or immediately trigger insolvency, corrupting the model's initial state
stage_ids:
- initialization
- id: finance-C-006
when: When initializing the model
action: Use real EBA 2018 data as the sole source for bank balance sheet initialization
severity: high
kind: resource_boundary
modality: must
consequence: The model claims reproducibility from real European banking data; using synthetic or modified data would
invalidate the empirical grounding that distinguishes this stress test from arbitrary simulations
stage_ids:
- initialization
- id: finance-C-007
when: When initializing banks from EBA CSV data
action: Parse gov_bonds field using eval() to handle compound expressions like '13+27682'
severity: high
kind: operational_lesson
modality: must
consequence: The EBA_2018.csv uses additive notation for government bond holdings (e.g., '13+27682', which evaluates to 27695);
using simple int() parsing would fail, causing ValueError and model crash
stage_ids:
- initialization
- id: finance-C-008
when: When initializing banks from EBA CSV data
action: Split CSV rows using space delimiter as the data format specifies
severity: high
kind: operational_lesson
modality: must
consequence: The EBA_2018.csv uses space-separated values with no quoted fields; using comma splitting would misalign
columns, causing bank_name, CET1E, leverage to be incorrectly parsed and balance sheet calculations to fail
stage_ids:
- initialization
- id: finance-C-009
when: When initializing the model
action: Set random seeds before any simulation runs to ensure reproducibility
severity: medium
kind: operational_lesson
modality: must
consequence: Without deterministic random seeds, the random.shuffle() in run_simulation (model.py:89) will produce different
agent ordering each run, causing non-reproducible systemic risk measurements
stage_ids:
- initialization
- id: finance-C-011
when: When initializing bank balance sheets
action: 'Maintain correct order of balance sheet component calculations: asset→cash→liability→loan→other_liability'
severity: high
kind: architecture_guardrail
modality: must
consequence: Balance sheet components have dependencies (liability=asset-CET1E, loan=liability/2); incorrect calculation
order will produce invalid negative values or incorrect leverage ratios
stage_ids:
- initialization
- id: finance-C-012
when: When initializing the model
action: Claim that stress test results represent actual live trading outcomes
severity: high
kind: claim_boundary
modality: must_not
consequence: This is a simplified fire sale contagion model using EBA 2018 data snapshots; presenting model outputs as
predictions of actual bank behavior or market outcomes would mislead stakeholders about real-world risk
stage_ids:
- initialization
- id: finance-C-013
when: When initializing the model
action: Present model outputs as regulatory-grade capital adequacy assessments
severity: high
kind: claim_boundary
modality: must_not
consequence: The model uses simplified tier-1 leverage ratio (CET1E/leverage) rather than risk-weighted capital ratios
from Basel frameworks; presenting it as regulatory assessment would misrepresent compliance status
stage_ids:
- initialization
- id: finance-C-017
when: When implementing deleveraging priority
action: Execute loan repayment before asset sales when both actions are available
severity: high
kind: domain_rule
modality: must
consequence: Selling assets before repaying loans triggers asset price impact (per Cifuentes 2005), reducing collateral
value and potentially accelerating systemic contagion compared to liability reduction first
stage_ids:
- agent_decision
- id: finance-C-018
when: When implementing proportional allocation across multiple contracts
action: Distribute deleveraging amounts proportionally based on each contract's maximum action value
severity: high
kind: domain_rule
modality: must
consequence: Non-proportional allocation allows one large position to monopolize deleveraging, causing concentration risk
and violating Cont-Schaanning 2017 fire sale stress testing principles
stage_ids:
- agent_decision
- id: finance-C-021
when: When running batch simulations for systemic risk measurement
action: Randomly shuffle agent processing order each simulation round
severity: high
kind: architecture_guardrail
modality: must
consequence: Fixed agent order creates systematic first-mover bias across simulations, causing deterministic results that
mask true systemic risk variance and correlation effects
stage_ids:
- agent_decision
- id: finance-C-022
when: When default is triggered during the decision phase
action: Defer default execution to the step() phase rather than executing immediately
severity: high
kind: architecture_guardrail
modality: must
consequence: Immediate default execution during act() creates order dependency, as the defaulting bank's asset liquidation
affects other banks' decisions in the same round
stage_ids:
- agent_decision
- id: finance-C-023
when: When calculating monetary amounts in deleveraging operations
action: Use epsilon tolerance (eps = 1e-9) to avoid floating-point edge cases
severity: medium
kind: domain_rule
modality: must
consequence: Without epsilon tolerance, near-zero quantities trigger infinite loops or NaN values in exponential price
impact calculations, corrupting simulation results
stage_ids:
- agent_decision
- id: finance-C-024
when: When implementing the deleveraging strategy
action: Use a hardcoded fixed order for contract selection (the strategy must be parameterized instead)
severity: medium
kind: resource_boundary
modality: must_not
consequence: Hardcoded strategy prevents testing different allocation behaviors (e.g., proportional vs. equal-weighted),
limiting stress test coverage and validation
stage_ids:
- agent_decision
- id: finance-C-025
when: When running the model for firesale scenarios
action: Enable SIMULTANEOUS_FIRESALE mode to clear market between step and act phases
severity: high
kind: operational_lesson
modality: must
consequence: Without simultaneous firesale mode, asset prices update sequentially during step(), causing price discovery
timing to depend on agent order rather than aggregate demand
stage_ids:
- agent_decision
- id: finance-C-026
when: When presenting simulation results
action: Claim simulated returns represent expected live trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Backtested stress test results do not guarantee live execution outcomes due to market impact assumptions,
simplified behavioral rules, and absence of counterparty reactions
stage_ids:
- agent_decision
- id: finance-C-028
when: When determining asset eligibility for deleveraging
action: Only include assets with quantity greater than already-marked-for-sale amount
severity: high
kind: architecture_guardrail
modality: must
consequence: Including already-sold assets in available actions causes attempts to sell non-existent quantities, leading
to negative balances or invalid state
stage_ids:
- agent_decision
- id: finance-C-032
when: When settling sell orders in the market clearing stage
action: Execute sales at the midpoint of pre-sale and post-sale prices to prevent front-running
severity: high
kind: domain_rule
modality: must
consequence: Using only post-sale prices creates an unfair advantage for late sellers, violating the batch execution guarantee
and distorting market clearing fairness
stage_ids:
- market_clearing
- id: finance-C-033
when: When preventing zero-quantity or zero-price sales
action: Skip sale execution when asset price or computed quantity is effectively zero
severity: high
kind: domain_rule
modality: must
consequence: Executing sales with zero price or zero quantity causes division errors or incorrect cash accounting, corrupting
the market clearing settlement
stage_ids:
- market_clearing
- id: finance-C-034
when: When modeling market clearing behavior
action: Batch all asset sales before computing price impact when SIMULTANEOUS_FIRESALE=True
severity: high
kind: architecture_guardrail
modality: must
consequence: Sequential execution with price signals affecting subsequent sales creates look-ahead bias, preventing realistic
modeling of illiquid market dynamics
stage_ids:
- market_clearing
- id: finance-C-035
when: When executing the market clearing sequence
action: Capture oldPrices BEFORE computing price impact, then compute price impact BEFORE settling orders
severity: high
kind: architecture_guardrail
modality: must
consequence: Capturing oldPrices after price impact or settling before price impact breaks the midpoint price guarantee,
causing incorrect cash transfers to selling banks
stage_ids:
- market_clearing
- id: finance-C-036
when: When computing price impact across multiple asset types
action: Aggregate quantities sold per asset type before computing price impact for that type
severity: high
kind: architecture_guardrail
modality: must
consequence: Computing price impact per individual asset instead of per asset type violates the assumption that same-type
assets are perfect substitutes, causing incorrect price trajectories
stage_ids:
- market_clearing
- id: finance-C-037
when: When configuring the stress testing model
action: Claim the model produces real-time trading signals or actual market prices
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting model outputs as live trading signals misleads stakeholders about the model's purpose as a stress
testing and systemic risk estimation tool
stage_ids:
- market_clearing
- id: finance-C-038
when: When presenting stress testing results
action: Claim backtest returns equal expected live trading returns without specified caveats
severity: medium
kind: claim_boundary
modality: must_not
consequence: Backtested systemic risk estimates reflect model assumptions and historical data patterns, not guaranteed
future market behavior under stress conditions
stage_ids:
- market_clearing
- id: finance-C-039
when: When configuring the clearing mode
action: Understand that SIMULTANEOUS_FIRESALE=True batches all sales, preventing price signals from affecting subsequent
sales in the same round
severity: high
kind: operational_lesson
modality: must
consequence: Setting SIMULTANEOUS_FIRESALE=False allows cascading price impacts where each sale affects subsequent prices,
fundamentally changing the market clearing semantics
stage_ids:
- market_clearing
- id: finance-C-040
when: When selecting the price impact function
action: Use exponential price impact rather than linear to capture nonlinear liquidity drain where small sales have limited
impact and large sales trigger steep drops
severity: high
kind: resource_boundary
modality: must
consequence: Linear price impact underestimates fire-sale contagion for large sales and overestimates impact for small
sales, producing misleading systemic risk estimates
stage_ids:
- market_clearing
- id: finance-C-041
when: When configuring price impact parameters
action: Set the default price impact to 0.05 (5%), meaning that selling 5% of market cap causes a 5% price drop
severity: medium
kind: resource_boundary
modality: must
consequence: Using non-calibrated price impact parameters produces unrealistic liquidity assumptions, either overstating
or understating systemic risk
stage_ids:
- market_clearing
- id: finance-C-046
when: When a DefaultException is raised during agent decision phase
action: increment the bank_defaults_this_round counter for record keeping
severity: high
kind: domain_rule
modality: must
consequence: Missing counter increment breaks systemic risk tracking and makes it impossible to calculate the extent of
system-wide stress events from simulation output
stage_ids:
- default_handling
- id: finance-C-047
when: When executing agent actions in the main simulation loop
action: include the defaulting bank in the current round's agent list so that its assets enter the orderbook
severity: high
kind: architecture_guardrail
modality: must
consequence: Removing the defaulting bank before its assets are sold means fire sale liquidation never occurs, breaking
the contagion mechanism and producing incorrect systemic risk estimates
stage_ids:
- default_handling
- id: finance-C-048
when: When market clearing occurs during simultaneous fire sale mode
action: clear the market and update asset prices before the next round of agent decisions
severity: high
kind: architecture_guardrail
modality: must
consequence: Surviving banks receive stale prices from before fire sales, causing them to make incorrect deleveraging
decisions based on inflated asset valuations
stage_ids:
- default_handling
- id: finance-C-049
when: When implementing default handling for the stress testing model
action: restrict default handling to SOLVENCY type only (not liquidity or margin call failures)
severity: high
kind: resource_boundary
modality: must
consequence: Including unsupported default types (liquidity, margin call) in the model produces results that do not match
the documented model specification, misleading stakeholders about systemic risk
stage_ids:
- default_handling
- id: finance-C-050
when: When selecting the default treatment algorithm
action: treat default_treatment as a replaceable module that can be swapped without breaking the core simulation loop
severity: medium
kind: resource_boundary
modality: should
consequence: Hardcoding default treatment logic prevents experimentation with alternative resolution strategies and makes
the model less adaptable to different stress scenarios
stage_ids:
- default_handling
- id: finance-C-051
when: When running the simulation to ensure order independence
action: shuffle the agent list before each round of step() and act() calls
severity: high
kind: operational_lesson
modality: must
consequence: Without shuffling, agent processing order affects default timing and fire sale sequencing, producing order-dependent
results that do not hold up when the input ordering changes
stage_ids:
- default_handling
- id: finance-C-053
when: When interpreting simulation results for policy decisions
action: claim that backtest stress test results predict actual live trading outcomes
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting simulation outputs as expected real-world results ignores model simplification, assumptions, and
the well-documented gap between backtest and live performance
stage_ids:
- default_handling
- id: finance-C-054
when: When presenting the stress testing framework capabilities
action: claim real-time trading support for a pure backtesting simulation framework
severity: high
kind: claim_boundary
modality: must_not
consequence: The system uses polling-based simulation with no live market connectivity, so claiming real-time capabilities
would mislead users about actual system functionality
stage_ids:
- default_handling
- id: finance-C-059
when: When applying the initial shock to asset prices
action: reduce price by exactly the INITIAL_SHOCK fraction (default 20%) and propagate price changes to all Bank agents
via update_asset_price()
severity: high
kind: domain_rule
modality: must
consequence: Agents will retain stale asset valuations causing incorrect leverage calculations, leading to wrong deleveraging
decisions and distorted systemic risk results
stage_ids:
- shock_application
- agent_decision
- id: finance-C-060
when: When propagating shocked prices to agent balance sheets
action: update the price attribute of each Tradable contract to match the new market price from AssetMarket.prices
severity: high
kind: architecture_guardrail
modality: must
consequence: Agents will calculate leverage based on outdated prices, causing incorrect insolvency detection and potential
domino effects in the contagion chain
stage_ids:
- shock_application
- agent_decision
- id: finance-C-061
when: When Banks submit sell orders to the AssetMarket
action: pass only orders with quantity greater than zero (enforced by assert) and append Order objects to the orderbook
list
severity: high
kind: domain_rule
modality: must
consequence: AssertionError will terminate the simulation when zero-quantity orders are submitted, preventing completion
of stress test runs
stage_ids:
- agent_decision
- market_clearing
- id: finance-C-062
when: When SIMULTANEOUS_FIRESALE mode is enabled (default)
action: accumulate all orders in the orderbook during agent decisions before calling clear_the_market() once per timestep
severity: high
kind: architecture_guardrail
modality: must
consequence: If clear_the_market() is called per-order instead of batched, price impact calculations will be fragmented
and produce incorrect cascading price effects
stage_ids:
- agent_decision
- market_clearing
- id: finance-C-063
when: When clearing the market and computing price impact
action: save oldPrices BEFORE updating prices so that settlement uses the correct pre-impact price for calculating proceeds
severity: high
kind: architecture_guardrail
modality: must
consequence: Settlement will use incorrect price reference, causing wrong cash proceeds calculations and breaking conservation
of value across the edge
stage_ids:
- market_clearing
- agent_decision
- id: finance-C-064
when: When updating agent asset prices after market clearing
action: propagate price changes to agents only when price decreases (priceLost > 0), not on price increases
severity: medium
kind: domain_rule
modality: must
consequence: Agents will receive incorrect positive price signals on asset sales, causing artificial equity increases
and distorted deleveraging behavior
stage_ids:
- market_clearing
- agent_decision
- id: finance-C-065
when: When settling asset sale orders
action: use midpoint pricing formula (current_price + old_price) / 2 for settlement, not the new impacted price
severity: high
kind: domain_rule
modality: must
consequence: Using the post-impact price would undervalue seller proceeds, causing systematic underestimation of cash
raised and distorted leverage calculations
stage_ids:
- market_clearing
- agent_decision
- id: finance-C-067
when: When catching DefaultException in agent decision phase
action: set the bank's alive flag to False and increment bank_defaults_this_round counter for tracking
severity: high
kind: architecture_guardrail
modality: must
consequence: Dead banks will remain in active agent lists, causing errors in subsequent rounds and corrupting systemic
risk calculations
stage_ids:
- default_handling
- agent_decision
- id: finance-C-068
when: When filtering agents for the next simulation round
action: process banks marked as not alive (alive=False) through step() or act() methods
severity: high
kind: operational_lesson
modality: must_not
consequence: Processing dead banks will cause KeyError or attribute errors as their balance sheets have been liquidated,
halting the simulation
stage_ids:
- default_handling
- agent_decision
- id: finance-C-069
when: When iterating over agents in each simulation timestep
action: shuffle the agent list randomly before processing to ensure order independence in the simultaneous firesale scenario
severity: medium
kind: operational_lesson
modality: must
consequence: Systematic processing order will bias results as earlier agents get better prices in sequential clearing,
creating non-reproducible and unfair outcomes
stage_ids:
- default_handling
- agent_decision
- id: finance-C-070
when: When presenting stress test results as system-wide risk metrics
action: claim that simulated bank defaults directly predict actual bank failures in a real stress scenario
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting simulated defaults as predictions will mislead policymakers as the model uses simplified balance
sheet structures and threshold-based triggers
stage_ids:
- market_clearing
- default_handling
- agent_decision
- id: finance-C-071
when: When calibrating price impact parameters based on literature
action: set beta coefficient such that selling 5% of market cap causes exactly 5% price drop (the Cifuentes 2005 convention)
severity: medium
kind: operational_lesson
modality: must
consequence: Incorrect beta calibration will distort systemic risk estimates, leading to either underestimation of fire
sale contagion or excessive conservatism
stage_ids:
- agent_decision
- market_clearing
- id: finance-C-072
when: When interpreting simulation results for policy decisions
action: assume that backtested stress test parameters (leverage thresholds, price impacts) will remain valid without recalibration
for changing market conditions
severity: medium
kind: claim_boundary
modality: must_not
consequence: Stale calibration parameters will produce misleading risk assessments as market microstructure and bank balance
sheet compositions evolve over time
stage_ids:
- shock_application
- agent_decision
- market_clearing
- id: finance-C-073
when: When encountering unexpected insolvency behavior or price anomalies
action: assume the problem is due to data quality without investigating the leverage calculation and price update chain
severity: high
kind: rationalization_guard
modality: must_not
consequence: Misattributing cascading failures to data issues will mask implementation bugs in the contagion mechanics,
leading to persistent incorrect results
stage_ids:
- shock_application
- agent_decision
- market_clearing
- id: finance-C-079
when: When representing asset quantities in tradable contracts
action: Store quantities as floats, not integers, to support fractional sales and proportional allocation
severity: high
kind: architecture_guardrail
modality: must
consequence: Integer quantities prevent proportional deleveraging when a bank needs to sell 33.5% of its assets, breaking
the Greenwood 2015 and Cont-Schaanning 2017 behavioral models
- id: finance-C-083
when: When calculating settlement prices during asset sales
action: Use midpoint of current price and oldPrices (pre-update price) for settlement to prevent look-ahead bias
severity: high
kind: architecture_guardrail
modality: must
consequence: Settlement at updated prices allows agents to profit from their own price-impacting trades, introducing strategic
arbitrage and invalidating the economic model
- id: finance-C-084
when: When conducting agent-based simulation to ensure reproducible yet stochastic results
action: Shuffle agent execution order randomly each tick to ensure order-independent execution
severity: high
kind: operational_lesson
modality: must
consequence: Without shuffling, simulation results are determined by agent ordering, producing biased systemic risk estimates that
depend on input data ordering rather than structural factors
- id: finance-C-085
when: When using py-economicsl external library dependency
action: Pin to specific git commit or version tag as this is a research library without PyPI release
severity: high
kind: resource_boundary
modality: must
consequence: Dependency breakage or API changes will cause simulation to fail, blocking academic research and reproducibility
efforts
- id: finance-C-088
when: When interpreting the model's price impact function
action: Claim that the Cifuentes 2005 exponential price impact model equals actual market microstructure or high-frequency
trading dynamics
severity: high
kind: claim_boundary
modality: must_not
consequence: Policy recommendations based on simplified price impact will underestimate or overestimate systemic risk
in real markets with order book dynamics, dark pools, and HFT
- id: finance-C-089
when: When using this model for high-frequency trading strategy development
action: Claim applicability to tick-by-tick trading or intraday arbitrage strategies
severity: high
kind: claim_boundary
modality: must_not
consequence: Trading algorithms based on daily timestep model will fail to capture sub-second market dynamics, latency
effects, and adverse selection from HFT counterparties
- id: finance-C-090
when: When presenting simulation outputs without proper caveats
action: Omit disclosure that results are based on 2018 EBA data, simplified behavioral assumptions, and exponential price
impact approximation
severity: medium
kind: claim_boundary
modality: must_not
consequence: Stakeholders will misinterpret outputs as current market conditions rather than a historical stress test scenario,
leading to inappropriate resource allocation or policy decisions
- id: finance-C-091
when: When comparing simulation results across different parameter configurations
action: Re-initialize model from scratch (model.initialize()) before each simulation run to prevent state leakage
severity: high
kind: operational_lesson
modality: must
consequence: Residual state from previous runs will corrupt systemic risk measurements, producing false comparisons between
price impact or shock scenarios
- id: finance-C-092
when: When initializing bank balance sheets from external data
action: Parse government bond holdings correctly using eval() for the 'n+m' notation in EBA_2018.csv
severity: high
kind: domain_rule
modality: must
consequence: Banks with split government bond holdings will have incorrect balance sheets, distorting asset allocation
and systemic risk calculations
- id: finance-C-093
when: When modeling bank deleveraging behavior in the simulation
action: Pay off liabilities first, then sell assets proportionally if cash is insufficient to meet deleveraging target
severity: high
kind: architecture_guardrail
modality: must
consequence: Banks will sell assets before reducing liabilities, incorrectly amplifying fire-sale pressure and overestimating
systemic contagion
- id: finance-C-094
when: When implementing execution order in agent_decision logic
action: 'Execute actions in two-phase structure: first decide in act() method, then execute in step() method — never execute
trades directly in the decision method'
severity: high
kind: domain_rule
modality: must
consequence: Combining decision and execution in a single phase creates first-mover advantage in simultaneous firesale
scenarios, causing parallel execution to produce unfair outcomes where the first actor gets better prices
derived_from_bd_id: BD-008
- id: finance-C-096
when: When implementing bank initialization with cash buffer parameters
action: Verify that cash=0.05*total_assets ratio matches actual bank behavior for the time period being modeled; adjust
to actual historical cash ratios if analyzing different regulatory environments
severity: medium
kind: operational_lesson
modality: should
consequence: Hardcoded 5% cash buffer does not reflect actual bank behavior across different time periods and regulatory
environments, causing stress test results to overestimate or underestimate bank liquidity during crises
derived_from_bd_id: BD-003
- id: finance-C-097
when: When implementing market clearing price impact calculations
action: Verify PRICE_IMPACTS calibration matches actual liquidity conditions — beta=-1/0.05*log(1-price_impact) calibrates
to 5% impact at 5% sold; adjust parameters if modeling different liquidity regimes
severity: medium
kind: operational_lesson
modality: should
consequence: Default price impact calibration may not match actual market liquidity, causing strategies to assume execution
at unrealistic prices and produce backtest results that cannot be replicated in live trading
derived_from_bd_id: BD-016
- id: finance-C-098
when: When implementing liability allocation logic
action: Verify liability_split_ratio==0.5 when granular data is unavailable — if actual distribution differs significantly,
use historical proportions instead of equal split to avoid modeler bias
severity: high
kind: domain_rule
modality: must
consequence: Equal 50/50 split is an arbitrary assumption that may not match actual liability composition, introducing
systematic bias into balance sheet calculations that compounds over time
derived_from_bd_id: BD-004
- id: finance-C-100
when: When implementing market clearing for fire sale simulations
action: Validate that the selected clearing mode (SIMULTANEOUS_FIRESALE or sequential) matches the intended scenario —
if strategy depends on a specific mode, the alternative mode must produce comparable results for robustness
severity: high
kind: domain_rule
modality: must
consequence: Using incorrect clearing mode produces materially different market outcomes, causing strategy backtests to
be invalid for scenarios with different execution assumptions
derived_from_bd_id: BD-100
- id: finance-C-102
when: When implementing Monte Carlo simulation with agent order randomization in backtesting
action: Use random shuffle (BD-045) with fixed seed 1337 (BD-051) AND run across 100 Monte Carlo simulations (BD-052)
together; all three components are required for statistical validity - using any subset produces biased or non-reproducible
results
severity: high
kind: domain_rule
modality: must
consequence: Using random shuffle without fixed seed produces non-reproducible results; using fixed seed without sufficient
MC runs produces statistically biased results that misrepresent fair agent ordering
derived_from_bd_id: BD-113
- id: finance-C-103
when: When implementing state transition modeling with transition matrices in backtesting
action: Assume transition matrices are time-homogeneous by default - this capability is missing; matrices may vary across
different time periods without explicit homogeneity handling
severity: high
kind: claim_boundary
modality: must_not
consequence: Assuming time-homogeneous transition matrices when the framework does not enforce this causes incorrect state
evolution modeling, producing biased Monte Carlo simulation results
derived_from_bd_id: BD-GAP-010
- id: finance-C-104
when: When implementing position deleveraging logic during stress scenarios
action: Execute proportional deleveraging based on max-amount weighting — larger positions must be reduced proportionally
more than smaller ones to maintain portfolio composition structure
severity: high
kind: domain_rule
modality: must
consequence: Changing deleveraging from proportional to equal-weighting causes portfolio composition to shift toward smaller
positions, creating artificial diversification that does not reflect actual deleveraging behavior in stress scenarios
derived_from_bd_id: BD-059
- id: finance-C-105
when: When implementing loan payment calculation logic in contract execution
action: Truncate loan payment amounts at the original notional principal — accumulated interest payments must not exceed
the loan principal amount
severity: high
kind: domain_rule
modality: must
consequence: Removing the notional truncation allows accumulated interest to exceed the original loan principal, causing
banks to overpay lenders in simulation and creating unrealistic financial outcomes
derived_from_bd_id: BD-061
- id: finance-C-106
when: When implementing action generation logic for bank decision-making
action: Recompute available actions from scratch at each simulation step — do not cache or reuse action lists from previous
steps
severity: high
kind: architecture_guardrail
modality: must
consequence: Caching available actions causes stale action data when balance sheet state changes between steps, potentially
allowing banks to select actions that are no longer valid and producing inconsistent simulation behavior
derived_from_bd_id: BD-063
- id: finance-C-107
when: When implementing default/fire-sale liquidation logic in bank resolution
action: Execute fire-sale asset sales proportionally across all asset types — maintain the same relative weighting as
normal deleveraging to ensure behavioral consistency
severity: high
kind: domain_rule
modality: must
consequence: Using non-proportional asset selection during default creates inconsistent behavior compared to normal deleveraging,
introducing arbitrary asset type preferences that distort resolution outcomes and create unrealistic liquidation patterns
derived_from_bd_id: BD-022
- id: finance-C-108
when: When initializing the model population for EU banking sector simulation
action: Use the 48-bank population from EBA 2018 EU-wide stress test data — verify that any alternative bank set maintains
similar systemic representativeness characteristics (total assets, tier 1 capital ratios, geographic distribution)
severity: medium
kind: operational_lesson
modality: should
consequence: Using synthetic or alternative bank populations without empirical grounding may produce simulation results
that do not reflect actual EU banking sector behavior, creating false claims about systemic risk and resolution outcomes
derived_from_bd_id: BD-023
- id: finance-C-109
when: When implementing asset sale order execution in market clearing
action: Accumulate all putForSale orders before executing any sales — do not execute individual sales immediately upon
request, to ensure price discovery fairness across all participants
severity: high
kind: domain_rule
modality: must
consequence: Executing asset sales immediately upon request creates first-mover advantages where early sellers receive
better prices before aggregate selling pressure affects market prices, distorting simulation fairness and producing
unrealistic price distributions
derived_from_bd_id: BD-056
- id: finance-C-110
when: When implementing action execution logic for any transaction type
action: Skip execution of actions with amounts below epsilon threshold (1e-9) — do not process micro-transactions that
could cause floating-point numerical artifacts
severity: medium
kind: architecture_guardrail
modality: must
consequence: Processing zero or near-zero amounts without epsilon filtering introduces floating-point rounding errors
that accumulate across simulation steps, potentially causing incorrect balance calculations and non-deterministic results
derived_from_bd_id: BD-060
- id: finance-C-111
when: When implementing leverage targeting and deleveraging trigger logic
action: Verify that the gap between deleveraging trigger threshold and target leverage is sufficient (recommend minimum
3-5 percentage points) — the 1 percentage point buffer between 4% trigger and 5% target is mathematically insufficient
severity: high
kind: operational_lesson
modality: must
consequence: The 1% buffer causes oscillation behavior where banks overshoot below 3% leverage before reaching the 5%
target due to price impact during asset sales reducing the asset base denominator, leading to insolvency outcomes that
could be avoided with wider buffer
derived_from_bd_id: BD-110
- id: finance-C-112
when: When implementing proportional deleveraging combined with midpoint pricing in stress scenarios
action: Apply stress-adjusted pricing multiplier for proportional deleveraging under high aggregate selling pressure —
the midpoint pricing formula (old_price + new_price) / 2 undervalues assets when exponential price impact is active
severity: high
kind: domain_rule
modality: must
consequence: Proportional deleveraging combined with midpoint pricing systematically undervalues asset sales during stress
because exponential price impact makes new_price significantly lower under high aggregate selling, causing banks to
receive below-fair-value prices for proportional amounts
derived_from_bd_id: BD-111
- id: finance-C-113
when: When implementing Monte Carlo result aggregation logic in the random_shuffling module
action: Use sample mean and standard deviation to aggregate simulation results across runs — do not replace with median/MAD
or other robust statistics
severity: high
kind: domain_rule
modality: must
consequence: Replacing mean/std with median/MAD changes the statistical characterization of simulation outcomes; tail
risk identification and scenario comparison become inconsistent with the documented analytical framework
derived_from_bd_id: BD-076
- id: finance-C-114
when: When configuring the systemic risk threshold for cascade detection in risk_measurement
action: Use exactly 5% as the systemic event threshold based on Gai-Kapadia (2010) cascade model literature — do not change
to alternative values without re-validation against academic grounding
severity: high
kind: domain_rule
modality: must_not
consequence: Changing the 5% threshold alters what constitutes a systemic event; higher thresholds miss moderate cascades
while lower thresholds over-flag normal bank failures, both causing misaligned risk management responses
derived_from_bd_id: BD-024
- id: finance-C-115
when: When implementing leverage buffer logic in constraint_definition
action: Set leverage buffer threshold to exactly 4% — banks at or below this level must initiate delevering actions, maintaining
a 1% gap above the insolvency threshold
severity: high
kind: domain_rule
modality: must
consequence: The 1% buffer (4% minus 3% insolvency) provides critical reaction time for banks to reduce risk before failure;
changing this buffer reduces or eliminates the safety margin
derived_from_bd_id: BD-026
- id: finance-C-117
when: When implementing valuation for non-market assets/liabilities in constraint_definition
action: Use principal amount as valuation for Other assets and liabilities — accept that principal may differ from economic
value for items with prepayment options or embedded derivatives
severity: high
kind: architecture_guardrail
modality: must
consequence: Using principal provides consistent accounting treatment across non-market items; replacing with fair value
estimation requires additional data and adds complexity that was explicitly rejected
derived_from_bd_id: BD-066
- id: finance-C-118
when: When implementing price impact calculation in markets
action: Use exponential price impact function for asset pricing — do not replace with linear models which underestimate
market impact for large trades
severity: high
kind: architecture_guardrail
modality: must
consequence: Linear price impact models underestimate cascading losses during stress; exponential functions capture realistic
fire-sale dynamics where large sell orders cause disproportionately large price drops
derived_from_bd_id: BD-069
- id: finance-C-119
when: When running parameter sweeps over price impact or initial shock magnitudes
action: Add explicit validation to detect unknown asset types before sweep execution — do not rely on defaultdict(lambda:1.0)
default price which silently returns 1.0 for misspelled or missing asset types
severity: medium
kind: operational_lesson
modality: must
consequence: Unknown assets silently receiving price 1.0 causes parameter sweep results to include incorrect valuations;
during shock magnitude sweeps, misspelled asset types always show price 1.0 regardless of the shock parameter, corrupting
the entire analysis
derived_from_bd_id: BD-116
- id: finance-C-120
when: When implementing or modifying price impact calculations in market clearing
action: Use the exponential price impact function per Cifuentes 2005 — the exponential form ensures price impacts accelerate
as volume sold increases, capturing realistic market depth constraints in fire sale scenarios
severity: high
kind: domain_rule
modality: must
consequence: Using linear price impact underestimates fire sale severity in illiquid conditions, causing strategies to
appear more resilient than they would be in actual market stress scenarios
derived_from_bd_id: BD-038
- id: finance-C-121
when: When initializing asset prices in the market model
action: Verify that initial asset prices are normalized to 1.0 (not actual market prices), and understand this assumes
all assets start at par before percentage shocks are applied
severity: medium
kind: operational_lesson
modality: should
consequence: Using actual market prices instead of normalized values introduces heterogeneous starting conditions that
make leverage calculations and shock comparisons inconsistent across different asset price scales
derived_from_bd_id: BD-040
- id: finance-C-122
when: When implementing solvency determination logic for financial institutions
action: Determine solvency using ONLY the leverage ratio (equity/assets) — do not incorporate liquidity ratios, credit
quality, or off-balance-sheet items into insolvency determination
severity: high
kind: domain_rule
modality: must
consequence: Adding multi-factor solvency criteria changes default timing and cascade dynamics, making backtest results
inconsistent with the model's designed behavior aligned to Basel III leverage standards
derived_from_bd_id: BD-041
- id: finance-C-123
when: When implementing default resolution logic in the simulation
action: Defer default execution to the step() phase only — defaults must be resolved at step boundaries; mid-step insolvencies
must accumulate without triggering default until the next step() call
severity: high
kind: domain_rule
modality: must
consequence: Immediate default execution creates ambiguous ordering dependencies where banks observe different market
states, breaking the consistent cascade ordering the model relies on for reproducibility
derived_from_bd_id: BD-042
- id: finance-C-125
when: When implementing agent ordering in the simulation round loop
action: Randomly shuffle the agent order each simulation round — fixed ordering introduces systematic bias where earlier-acting
banks consistently gain artificial advantages or disadvantages based purely on initialization order
severity: high
kind: architecture_guardrail
modality: must
consequence: Fixed ordering creates reproducible but biased results where bank outcomes depend on initialization order
rather than actual financial position, making backtest conclusions about bank resilience unreliable
derived_from_bd_id: BD-045
- id: finance-C-126
when: When implementing default trigger handling for financial institutions
action: Immediately sell all tradable assets upon default — maximize recovery for creditors by liquidating all tradable
positions; non-tradable loans and positions should be written off
severity: high
kind: domain_rule
modality: must
consequence: Partial or gradual liquidation changes creditor recovery rates and fire sale pressure dynamics, creating
inconsistent cascade severity compared to the model's maximum fire sale scenario design
derived_from_bd_id: BD-046
- id: finance-C-127
when: When implementing simulation control flow
action: Maintain strict separation between step() and act() — step() must handle all default resolution, act() must handle
all delevering; defaults must be finalized before banks can act
severity: high
kind: architecture_guardrail
modality: must
consequence: Interleaved or concurrent execution creates circular dependencies where delevering triggers default which
triggers more delevering, causing unpredictable cascade dynamics and breaking model reproducibility
derived_from_bd_id: BD-047
- id: finance-C-128
when: When implementing asset price update logic in market clearing
action: Update asset prices only when price loss > 0 — prices can only decrease during stress events; gains during market
stress must not be applied (asymmetric dynamics)
severity: high
kind: domain_rule
modality: must
consequence: Symmetric price updates during fire sales allow prices to recover mid-crisis, underestimating the duration
and severity of fire sale cascades by allowing unrealistically quick market rebounds
derived_from_bd_id: BD-049
- id: finance-C-129
when: When configuring initial shock parameters for stress testing scenarios
action: Set initial shock parameters to sweep from 0% (no stress) to 30% (severe crisis) — verify shock values are within
the calibrated range and not using default assumptions
severity: medium
kind: operational_lesson
modality: should
consequence: Using default shock values without calibration may miss resilience thresholds; values outside 0-30% range
may capture unrealistic scenarios not observed in historical financial crises
derived_from_bd_id: BD-054
- id: finance-C-132
when: When implementing or refactoring the balance sheet initialization logic
action: Calculate total_assets as CET1E/leverage_ratio and liabilities as total_assets - CET1E, maintaining the fundamental
accounting identity assets = liabilities + equity
severity: high
kind: domain_rule
modality: must
consequence: Breaking the accounting identity causes the balance sheet to fail balancing, producing incorrect leverage
ratios and making all regulatory capital calculations meaningless
derived_from_bd_id: BD-002
- id: finance-C-133
when: When configuring the model's capital structure parameters for regulatory stress testing
action: Verify that CET1E (Common Equity Tier 1 capital) and leverage_ratio values match the intended regulatory framework
requirements, and document the source of these parameters
severity: medium
kind: operational_lesson
modality: should
consequence: Incorrect capital assumptions produce wrong leverage ratios, causing the stress test to misrepresent the
bank's actual capital adequacy and regulatory standing
derived_from_bd_id: BD-002
- id: finance-C-134
when: When implementing the fire sale clearing logic during market stress events
action: Process all fire sale orders simultaneously before computing price impact — aggregate all sell orders first,
then calculate price impact once on the combined quantity, not sequentially on each individual order
severity: high
kind: domain_rule
modality: must
consequence: Sequential price impact application underestimates the true market impact of aggregate fire sales, causing
the stress test to overstate remaining portfolio value during realistic liquidity crises
derived_from_bd_id: BD-014
- id: finance-C-135
when: When initializing the bank's asset portfolio in the stress test model
action: Calculate other_assets as total_assets - gov_bonds - corp_bonds - cash, ensuring the balance sheet identity holds
after explicit asset allocation
severity: high
kind: domain_rule
modality: must
consequence: Incorrect residual calculation breaks the balance sheet identity, causing total assets to not equal the sum
of allocated assets plus other assets, invalidating all subsequent stress calculations
derived_from_bd_id: BD-068
- id: finance-C-136
when: When configuring the initial shock parameters for sovereign debt stress testing
action: Verify that the 20% initial shock default on government bonds matches the intended stress scenario severity, and
adjust if modeling a different crisis magnitude (e.g., mild 5%, extreme 40%)
severity: medium
kind: operational_lesson
modality: should
consequence: Using the wrong shock magnitude produces non-representative stress test results — too low understates contagion
risk, too high may trigger unrealistic cascading defaults in the model
derived_from_bd_id: BD-005
- id: finance-C-137
when: When implementing the default handling and fire sale execution logic
action: Batch all sell orders together before computing price impact — set SIMULTANEOUS_FIRESALE=True to calculate impact
once on aggregate quantity, preserving the illiquidity assumption that orders don't move prices until clearing
severity: high
kind: domain_rule
modality: must
consequence: Disabling batch mode causes price signals from early sales to affect subsequent sales, breaking the illiquidity
assumption and understating the cascade severity in stress scenarios
derived_from_bd_id: BD-092
- id: finance-C-138
when: When implementing liquidation logic using default liquidation that sells all tradable assets proportionally
action: Verify that per-asset-type price impact assumptions hold for the specific assets being liquidated; assets with
different liquidity characteristics (e.g., government bonds vs corporate bonds from defaulted entity) should NOT be
treated as fungible substitutes
severity: high
kind: operational_lesson
modality: must
consequence: Treating illiquid assets as fungible with liquid counterparts underestimates liquidation costs by 10-30%
for distressed securities, causing backtest results to overstate actual recovery values
derived_from_bd_id: BD-114
- id: finance-C-140
when: When implementing or extending initialization and scheduling logic in the trading framework
action: Assume trading calendar operations work correctly without explicit isolation from system calendar operations;
trading calendar interactions with system calendar can cause incorrect scheduling and missed trading windows
severity: high
kind: claim_boundary
modality: must_not
consequence: Without explicit trading calendar isolation, batch jobs and scheduled tasks may execute on non-trading days
or miss trading windows, causing strategies to fail or positions to remain unmanaged during critical periods
derived_from_bd_id: BD-GAP-002
- id: finance-C-141
when: When implementing trading calendar management in the framework
action: Implement explicit trading calendar isolation by maintaining a separate calendar instance for trading operations,
ensuring all date/time operations involving trading schedules use this isolated calendar and validate against trading
day rules
severity: high
kind: domain_rule
modality: must
consequence: Without isolated trading calendar, system calendar changes or timezone shifts can corrupt trading schedules,
causing strategies to attempt trading on non-trading days or skip valid trading opportunities
derived_from_bd_id: BD-GAP-002
- id: finance-C-142
when: When implementing any date/time handling in the trading framework
action: Assume date/time values are implicitly in the correct timezone; implicit timezone handling leads to execution
at wrong times, causing strategies to trade before or after market opens
severity: high
kind: claim_boundary
modality: must_not
consequence: Implicit timezone handling causes trades to execute at wrong times (e.g., buying after market close, selling
before open), resulting in missed opportunities or execution at unfavorable prices
derived_from_bd_id: BD-GAP-003
- id: finance-C-143
when: When implementing any datetime or timestamp fields in the trading framework
action: Add explicit timezone annotation to all datetime fields and enforce timezone validation at data ingestion; verify
all timestamps are converted to a canonical timezone (e.g., UTC) before processing and converted to market timezone
only at display/execution time
severity: high
kind: domain_rule
modality: must
consequence: Without explicit timezone annotation, backtests use incorrect timestamps leading to trades at wrong times,
with live trading failing to match backtest behavior due to timezone mismatches
derived_from_bd_id: BD-GAP-003
- id: finance-C-144
when: When implementing default account fund collection logic
action: Assume the framework handles collection priority and compliance automatically; failure to implement explicit collection
priority rules violates regulatory requirements and may result in improper fund handling
severity: high
kind: claim_boundary
modality: must_not
consequence: Without explicit collection priority and compliance controls, fund collection may violate regulatory sequencing
requirements, leading to compliance violations, penalties, or customer disputes
derived_from_bd_id: BD-GAP-012
- id: finance-C-145
when: When implementing account fund collection and recovery operations
action: Implement explicit collection priority rules defining the sequence of fund sources (e.g., cash accounts first,
then securities, then other assets) and add compliance validation checks to verify collection follows regulatory requirements
for the applicable jurisdiction
severity: high
kind: domain_rule
modality: must
consequence: Explicit collection priority ensures regulatory compliance and prevents improper fund seizure; without it,
collection may violate customer protections or regulatory sequencing rules
derived_from_bd_id: BD-GAP-012
- id: finance-C-146
when: When implementing price execution logic for batch trade matching
action: Use midpoint pricing (average of pre/post prices) for all executions in batch mode; verify sellers receive the
midpoint price to prevent front-running and guarantee fair execution symmetry for all participants
severity: high
kind: domain_rule
modality: must
consequence: Deviating from midpoint pricing creates front-running opportunities where participants trade at favorable
prices at the expense of others, breaking batch execution fairness and potentially causing disputes or regulatory scrutiny
derived_from_bd_id: BD-017
- id: finance-C-147
when: When implementing or modifying insolvency detection logic in bank default handling
action: Preserve the insolvency trigger at leverage < 3% (BANK_LEVERAGE_MIN=0.03) as the hard stop for forced liquidation
— any modification to this threshold must be explicitly reviewed
severity: high
kind: operational_lesson
modality: must
consequence: Changing the insolvency threshold below 3% allows banks with inadequate asset coverage to continue trading,
amplifying losses that should have triggered forced liquidation and distorting systemic risk measurements
derived_from_bd_id: BD-095
- id: finance-C-148
when: When implementing firesale settlement logic that handles simultaneous bank liquidations
action: Compute price impact BEFORE settling any firesale orders — the ordering of compute_price_impact then settle is
critical and must be preserved; reversing this order corrupts every cascade result
severity: high
kind: operational_lesson
modality: must
consequence: Reversing the firesale order causes the first sale to update market prices before subsequent sales execute,
resulting in later sales receiving worse execution at artificially depressed prices and silently corrupting the entire
cascade
derived_from_bd_id: BD-112
- id: finance-C-149
when: When implementing solvency determination logic that combines market-valued and face-valued assets
action: Recognize that solvency boundaries are determined by a mixed valuation approach — tradable assets use market prices
while loans use face value, creating arbitrary cutoff points where market declines are masked
severity: medium
kind: operational_lesson
modality: should
consequence: Mixed valuation may show a bank as solvent when market-valued assets have declined substantially, because
unchanged loan face values mask actual credit deterioration and market conditions are not reflected in solvency calculations
derived_from_bd_id: BD-117
- id: finance-C-150
when: When implementing data initialization or lookback logic for historical data queries
action: Assume the framework provides point-in-time data availability — historical data queries may return current values
rather than values as of a specific date; the framework does not implement temporal data versioning
severity: high
kind: claim_boundary
modality: must_not
consequence: Without point-in-time data handling, historical backtests use current values for past dates, introducing
look-ahead bias that makes backtest results completely non-reproducible in live trading
derived_from_bd_id: BD-GAP-005
- id: finance-C-151
when: When implementing data layer initialization or historical data retrieval
action: Implement point-in-time data retrieval using temporal query fields (e.g., as_of_date, valid_start, valid_end)
in the data schema, ensuring historical queries return values as they existed at each specific point in time
severity: high
kind: operational_lesson
modality: must
consequence: Without point-in-time data retrieval, historical backtests use current values for past dates, introducing
look-ahead bias that causes live trading returns to fall far below backtested results
derived_from_bd_id: BD-GAP-005
- id: finance-C-152
when: When implementing firesale execution logic in market clearing that handles simultaneous bank liquidations
action: Process all firesale orders in a batch at the same pre-settlement price — execute all orders in the firesale
simultaneously at the price computed before any settlement occurs; do NOT interleave with other transactions or change
to sequential processing
severity: high
kind: domain_rule
modality: must
consequence: Changing from simultaneous batch processing to sequential or interleaved execution gives first-mover advantage
to early sellers and understates the liquidity pressure of simultaneous liquidations during systemic stress, producing
unrealistic stress test results
derived_from_bd_id: BD-032
- id: finance-C-153
when: When initializing asset prices in market simulation
action: 'Verify that all assets are explicitly initialized with known prices before use; the defaultdict(lambda: 1.0)
default may mask missing initialization and silently propagate placeholder values'
severity: medium
kind: operational_lesson
modality: should
consequence: Using default price of 1.0 for unseen assets can silently propagate placeholder values through calculations,
causing incorrect simulation results that appear valid without explicit validation
derived_from_bd_id: BD-062
- id: finance-C-154
when: When implementing loan valuation in banking simulation
action: Assume loans are valued at face value regardless of credit quality; impaired loans must be marked down to reflect
actual recoverable value, not held at principal
severity: high
kind: domain_rule
modality: must_not
consequence: Valuing impaired loans at face value overstates asset values, causing incorrect leverage ratios and capital
adequacy calculations; in live trading, impaired loan portfolios would trigger regulatory violations not apparent in
backtest
derived_from_bd_id: BD-065
- id: finance-C-155
when: When configuring leverage targeting sensitivity analysis
action: Document that buffer=1.0 (100%) is an extreme edge case where target equals regulatory minimum; results represent
worst-case targeting behavior and do not reflect conservative banking practices
severity: medium
kind: operational_lesson
modality: must
consequence: Using buffer=1.0 as a general-purpose baseline produces unrealistic results; actual banks maintain buffers
of 0.5-3% for safety, so leverage targeting at minimum threshold is an atypical edge case that will not generalize to
practical scenarios
derived_from_bd_id: BD-055
- id: finance-C-156
when: When configuring leverage targeting sensitivity analysis with 100% buffer
action: Document that 100% buffer represents aggressive targeting with no safety margin; results are not generalizable
to more conservative targeting approaches that maintain actual safety buffers
severity: medium
kind: operational_lesson
modality: must
consequence: Leverage targeting with 100% buffer isolates targeting from buffer effects but creates results that apply
only to edge-case configurations; practical targeting strategies maintain 0.5-3% buffers, so comparative analysis may
mislead if this boundary is not documented
derived_from_bd_id: BD-057
- id: finance-C-157
when: When implementing price impact calculations for large-volume trades
action: Use exponential decay price impact function; do not replace with linear models as they underestimate large-volume
impacts during market stress conditions
severity: high
kind: domain_rule
modality: must
consequence: Linear price impact models systematically underestimate market impact for large trades, causing backtested
execution costs to appear lower than actual costs in live trading with realistic market depth constraints
derived_from_bd_id: BD-058
- id: finance-C-158
when: When implementing balance sheet initialization in the banking model
action: 'Calculate total assets from CET1E and leverage ratio using formula: assets = CET1E / (leverage/100) — do not
reverse the calculation by starting from assets and deriving capital'
severity: high
kind: domain_rule
modality: must
consequence: Reversing the calculation order (assets → capital) breaks the iterative loop for capital ratio calculation,
causing either non-convergence or incorrect leverage ratios that misrepresent regulatory capital adequacy
derived_from_bd_id: BD-067
- id: finance-C-159
when: When implementing market stress price impact in the banking model
action: Calibrate price impact using fixed 1:1 ratio where 5% market sell causes 5% price drop — do not optimize or adjust
this ratio unless explicitly testing alternative liquidity scenarios
severity: high
kind: domain_rule
modality: must
consequence: Using higher ratios overstates fire-sale dynamics causing excessive price drops in stress tests; using lower
ratios understates liquidity risk leading to inaccurate stress scenario results
derived_from_bd_id: BD-070
- id: finance-C-161
when: When implementing deleveraging logic in the banking model
action: Activate deleveraging when leverage falls below 4% — this creates a 1% buffer zone above the 3% insolvency threshold,
giving banks an opportunity to remediate before default
severity: high
kind: architecture_guardrail
modality: must
consequence: Removing or changing the buffer zone eliminates the remediation window, causing banks to default immediately
when hitting insolvency threshold instead of attempting to reduce leverage proactively
derived_from_bd_id: BD-073
- id: finance-C-163
when: When configuring stress test shock sweep parameters
action: Verify that the shock sweep range covers 0-30% with at least 21 points (1.5% increments) to capture threshold
effects where contagion becomes systemic
severity: medium
kind: operational_lesson
modality: should
consequence: Incorrect shock range causes critical crisis scenarios to be missed entirely, making systemic risk assessments
incomplete and potentially causing underestimation of tail risk exposure
derived_from_bd_id: BD-078
- id: finance-C-164
when: When computing systemic risk classification in stress test simulations
action: 'Apply 5% EOSE threshold: values below 5% indicate no systemic event, values at or above 5% indicate systemic
crisis'
severity: high
kind: domain_rule
modality: must
consequence: Using an incorrect EOSE threshold causes misclassification of systemic events; wrong threshold may trigger
false alerts or miss critical contagion scenarios, leading to incorrect risk management decisions
derived_from_bd_id: BD-079
- id: finance-C-165
when: When configuring stress test simulation duration
action: Verify that 6 timesteps are sufficient to capture full contagion dynamics including initial shock, first wave,
stabilization, and final state resolution
severity: medium
kind: operational_lesson
modality: should
consequence: Reducing timesteps below 6 may cause late-stage contagion effects to be missed entirely, resulting in incomplete
systemic risk assessment and underestimation of cascade failures
derived_from_bd_id: BD-081
- id: finance-C-166
when: When implementing settlement price calculation for asset sales in stress test simulations
action: 'Calculate settlement price as midpoint: (current_price + old_price) / 2 — this represents fair value OTC execution
preventing extreme fire-sale valuations'
severity: high
kind: domain_rule
modality: must
consequence: Using only current price undervalues assets during fire sales while using only old price overvalues; either
deviation causes systematic misvaluation leading to incorrect loss calculations and suboptimal risk management
derived_from_bd_id: BD-083
- id: finance-C-167
when: When configuring leverage targeting simulation parameters
action: Document and validate that leverage_buffer=1.0 produces specific simulation outcomes — this creates an aggressive
deleveraging target (2x current leverage), and strategies using different buffer values will yield materially different
cascade dynamics
severity: medium
kind: operational_lesson
modality: should
consequence: Using fixed buffer=1.0 means the simulation only tests aggressive deleveraging scenarios; strategies assuming
different buffer values may produce different cascade behaviors not captured in this backtest
derived_from_bd_id: BD-087
- id: finance-C-168
when: When running stress test simulations with initial market shocks
action: Verify that initial shock targeting government bonds only (no equity shock) aligns with the stress scenario hypothesis
— other asset classes may behave differently under stress
severity: medium
kind: operational_lesson
modality: should
consequence: Applying initial shock only to government bonds means equity-driven stress scenarios produce different cascade
patterns not captured in the simulation; backtest results are specific to sovereign risk events, not general market
stress
derived_from_bd_id: BD-088
- id: finance-C-169
when: When interpreting cascade severity results from stress testing
action: Recognize that 20% initial bond shock combined with SIMULTANEOUS_FIRESALE batch processing produces 2-3x higher
cascade severity than sequential selling — this amplification effect is specific to simultaneous execution and does
not represent all fire sale scenarios
severity: high
kind: domain_rule
modality: must
consequence: The triple interaction effect (BD-108) causes cascade severity 2-3x higher than any single mechanism; interpreting
results as representative of sequential fire sales would overestimate systemic risk by a factor of 2-3
derived_from_bd_id: BD-108
- id: finance-C-170
when: When analyzing long-duration stress scenarios or cascade persistence
action: 'Account for the deleveraging feedback loop: leverage_buffer triggers fire sales, which erode asset values, which
drop leverage further, re-triggering the buffer threshold — this creates extended cascade duration not seen in single-pass
simulations'
severity: high
kind: domain_rule
modality: must
consequence: The BD-109 feedback loop causes cascades to persist until assets are fully sold or banks default; single-timestep
severity metrics underestimate total systemic impact by not capturing the iterative erosion pattern
derived_from_bd_id: BD-109
- id: finance-C-171
when: When designing or validating stress test timing mechanisms
action: 'Account for deferred default execution: insolvent banks continue operating during the interval between detection
and next step(), accumulating positions that may create additional interdependencies before defaults execute'
severity: high
kind: domain_rule
modality: must
consequence: Deferred execution (BD-118) allows insolvent banks to accumulate positions between detection and execution;
cascade timing and severity differ from immediate-execution models, potentially masking or amplifying systemic risk
depending on step granularity
derived_from_bd_id: BD-118
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-067 / UC-101
version: v5.3
intent_keywords: []
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
groups:
- group_id: all
name: All Capabilities
description: ''
emoji: 📦
uc_count: 0
ucs: []
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-100
beginner_prompt: Try capability UC-100
auto_selected: true
- uc_id: UC-101
beginner_prompt: Try capability UC-101
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try capability UC-102
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 0 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
- Institutional fund holdings tracker via joinquant_fund_runner pattern
- Custom Transformer + Accumulator factor with per-entity rolling state
- Bollinger Band mean-reversion factor with BollTransformer (window=20, window_dev=2)
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
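A multi-agent financial analysis platform that supports equity research, market forecasting, financial report interpretation, and quantitative backtesting strategy construction, covering global market data analysis.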
---
name: finrobot-multi-agent
description: |-
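A multi-agent financial analysis platform that supports equity research, market forecasting, financial report interpretation, and quantitative backtesting strategy construction, covering global market data analysis.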
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-074"
compiled_at: "2026-04-22T13:00:27.479397+00:00"
capability_markets: "global"
capability_activities: "macro-data"
sop_version: "crystal-compilation-v6.1"
---
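# FinRobot Multi-Agent (finrobot-multi-agent)
> A multi-agent financial analysis platform that supports equity research, market forecasting, financial report interpretation, and quantitative backtesting strategy construction, covering global market data analysis.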
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (14 total)
### FMP API Equity Research Report Generator (`UC-101`)
Investors need comprehensive equity research reports that combine financial statement analysis, peer comparisons, and recent news to make informed investment decisions on specific companies.
**Triggers**: equity research, financial analysis report, FMP API
### Multi-Agent Annual Report Generator (`UC-102`)
Financial analysts require automated generation of customized financial analysis reports that can interact with clients, gather requirements, and produce professional-grade documents.
**Triggers**: annual report, financial report generation, multi-agent
### OpenBB Financial Data Agent (`UC-104`)
Users need an intelligent agent interface to access OpenBB's comprehensive financial data capabilities including market data, fundamentals, and technical analysis.
**Triggers**: openbb, financial data agent, market data
For all **14** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
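SL-01 can be illustrated with a minimal sketch. The `Signal` dataclass and `order_cycle` helper below are hypothetical names introduced for illustration only; they are not part of ZVT or FinRobot, and a real trader class may enforce the ordering differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    # Hypothetical stand-in for a trading signal; not the framework's own class.
    entity_id: str
    action: str  # "buy" or "sell"


def order_cycle(signals):
    """Return one cycle's signals with every sell ahead of every buy (SL-01)."""
    # sorted() is stable, so the original order inside each group is preserved.
    return sorted(signals, key=lambda s: 0 if s.action == "sell" else 1)


cycle = [
    Signal("stock_sh_600000", "buy"),
    Signal("stock_sz_000001", "sell"),
    Signal("stock_sh_600519", "buy"),
]
ordered = order_cycle(cycle)
print([s.action for s in ordered])
```

Processing sells first frees cash and position capacity before any buy is sized, which is the motivation the lock table gives for halting on a violation.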
## Top Anti-Patterns (14 total)
- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-074. Evidence verify ratio = 11.5% and audit fail total = 36. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative spec (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed across projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-074` blueprint at 2026-04-22T13:00:27.479397+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['FinGPT Market Forecaster', 'Multi-Agent Annual Report Generator', 'FMP API Equity Research Report Generator', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-074--FinRobot (1)
### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>
When SEC API calls are implemented without rate limiting decorators, requests exceed SEC EDGAR's limit of 10 requests per second. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial filings and completely disrupting the data collection pipeline. FinRobot demonstrates that the SEC enforces strict rate limits, and missing User-Agent headers compound the problem by causing silent request failures.
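A rate limiting decorator of the kind described above might look like the following sketch. The names `rate_limited`, `fetch_filing`, and the `HEADERS` value are hypothetical; FinRobot's own decorator may differ, and the HTTP call itself is left as a placeholder.

```python
import time
from functools import wraps


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Decorator that spaces calls at least 1/max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [None]  # start time of the previous call; None before the first

        @wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()
            if last_call[0] is not None:
                wait = min_interval - (now - last_call[0])
                if wait > 0:
                    sleep(wait)
                    now = clock()
            last_call[0] = now
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical identity string: a descriptive User-Agent with contact details.
HEADERS = {"User-Agent": "Example Research admin@example.com"}


@rate_limited(max_per_second=10)
def fetch_filing(url):
    # Placeholder for the real HTTP request, which would send HEADERS along.
    return {"url": url, "headers": HEADERS}
```

The `clock` and `sleep` parameters exist only so the spacing logic can be exercised without real delays. This sketch is not thread-safe; concurrent callers would need a lock around the timing state.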
## finance-bp-077--Open_Source_Economic_Model (2)
### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>
When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
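The convergence rule quoted above is small enough to state as code. This sketch assumes `U` denotes the last liquid point in years; the function name is hypothetical, and the full Smith-Wilson extrapolation is not shown.

```python
def convergence_point(last_liquid_point_years):
    """Convergence maturity in years, using the max(U + 40, 60) rule quoted above."""
    return max(last_liquid_point_years + 40, 60)


# A short last liquid point is floored at 60 years; a long one gets U + 40.
print(convergence_point(20))
print(convergence_point(50))
```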
### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>
When CSV portfolio files that begin with a UTF-8 BOM marker are imported without the 'utf-8-sig' encoding, they fail to parse correctly. This causes KeyError exceptions when reading row fields, preventing portfolio data from loading entirely. The BOM marker silently corrupts the first column name read by pandas.
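The effect is easy to reproduce with the standard library. The sketch below uses the `csv` module rather than pandas, and the column names `ticker` and `weight` are made up for illustration.

```python
import csv
import io

# A UTF-8 CSV payload that starts with a byte-order mark.
raw = "\ufeffticker,weight\nAAPL,0.6\nMSFT,0.4\n".encode("utf-8")


def read_rows(data, encoding):
    """Decode CSV bytes with the given encoding and return a list of dict rows."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.DictReader(text))


plain = read_rows(raw, "utf-8")      # the BOM stays attached to the first header
clean = read_rows(raw, "utf-8-sig")  # the BOM is stripped while decoding

print(list(plain[0].keys()))  # first key is '\ufeffticker', so row["ticker"] raises KeyError
print(list(clean[0].keys()))  # ['ticker', 'weight']
```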
## finance-bp-080--FinDKG (3)
### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>
When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes the model to see future events during training. Violating the requirement that train_edges precede val_edges and test_edges in time results in inflated metrics that do not reflect real-world performance. This produces overfit models that fail catastrophically when deployed for actual temporal prediction tasks.
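One way to guarantee the ordering is to split by time cutoffs rather than by random index. The sketch below assumes edges are `(timestamp, head, relation, tail)` tuples; the function name and the sample data are hypothetical and do not come from FinDKG.

```python
def temporal_split(edges, val_start, test_start):
    """Split edges into train/val/test by timestamp cutoffs, never by shuffling."""
    if not val_start < test_start:
        raise ValueError("val_start must be earlier than test_start")
    train = [e for e in edges if e[0] < val_start]
    val = [e for e in edges if val_start <= e[0] < test_start]
    test = [e for e in edges if e[0] >= test_start]
    return train, val, test


# Input order is deliberately scrambled; the cutoffs still separate the periods.
edges = [
    (5, "A", "acquires", "B"),
    (1, "C", "supplies", "D"),
    (9, "E", "sues", "F"),
    (3, "A", "hires", "G"),
    (7, "B", "partners", "C"),
]
train_edges, val_edges, test_edges = temporal_split(edges, val_start=5, test_start=8)

# Every training edge is strictly earlier than every validation and test edge.
assert max(e[0] for e in train_edges) < min(e[0] for e in val_edges)
assert max(e[0] for e in val_edges) < min(e[0] for e in test_edges)
```

Cutting on timestamps instead of row counts keeps edges that share a timestamp in the same split, so ties cannot straddle a boundary.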
### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>
When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.
### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>
When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
## finance-bp-083--Economic-Dashboard (3)
### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>
When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
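A pure-Python sketch of the prior-bar comparison follows. It stands in for the pandas idiom of comparing a boolean series with its `shift(1)`. Placing the order on the bar after the cross is an added assumption taken from SL-02, and the function names and window sizes are illustrative.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def golden_cross_entries(closes, fast, slow):
    """Return the bar indices at which an order may be placed after a golden cross."""
    if not fast < slow:
        raise ValueError("fast window must be shorter than slow window")
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    above = [None if s is None else f > s for f, s in zip(fast_ma, slow_ma)]
    entries = []
    for t in range(1, len(closes) - 1):
        prev, curr = above[t - 1], above[t]  # prior-bar state vs. current-bar state
        if prev is False and curr is True:
            entries.append(t + 1)  # act on the next bar, never on bar t itself
    return entries


closes = [10, 9, 8, 7, 8, 10, 12, 11]
print(golden_cross_entries(closes, fast=2, slow=3))  # cross completes on bar 5, entry on bar 6
```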
### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>
When technical indicators are calculated from OHLCV data without first verifying that the required columns (open, high, low, close, volume) are present, any missing column raises a ValueError and prevents indicator calculation for the affected tickers. This blocks downstream regime classification and pattern detection for every ticker with incomplete data.
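A column check run before any indicator code could be sketched as follows. The helper names are hypothetical, and treating column names case-insensitively is an assumption rather than something the Economic-Dashboard project is known to do.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV columns absent from `columns` (case-insensitive)."""
    present = {str(c).lower() for c in columns}
    return [c for c in REQUIRED_OHLCV if c not in present]


def partition_tickers(frames):
    """Split {ticker: column names} into usable tickers and skipped tickers with reasons."""
    usable, skipped = [], {}
    for ticker, columns in frames.items():
        missing = missing_ohlcv_columns(columns)
        if missing:
            skipped[ticker] = missing
        else:
            usable.append(ticker)
    return usable, skipped


frames = {
    "AAPL": ["Open", "High", "Low", "Close", "Volume"],
    "XYZ": ["open", "close"],
}
usable, skipped = partition_tickers(frames)
print(usable, skipped)
```

Reporting which columns are missing per ticker lets the pipeline skip incomplete tickers explicitly instead of failing partway through indicator calculation.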
### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>
When storing derived features in DuckDB with a different primary key schema than technical_features table, inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.
## finance-bp-105--open-climate-investing (5)
### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>
When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.
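The difference shows up even in a tiny single-factor example. The sketch below uses a hand-rolled least-squares slope and made-up return series; it is not the open-climate-investing regression code, and all names are illustrative.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def capm_beta(stock_returns, market_returns, risk_free):
    """CAPM beta from excess returns: (R_i - Rf) regressed on (R_m - Rf)."""
    stock_excess = [r - rf for r, rf in zip(stock_returns, risk_free)]
    market_excess = [r - rf for r, rf in zip(market_returns, risk_free)]
    return ols_slope(market_excess, stock_excess)


# Made-up data: the stock's excess return is exactly 1.5x the market's excess return.
market = [0.02, -0.01, 0.03, 0.00]
risk_free = [0.001, 0.002, 0.004, 0.003]
stock = [rf + 1.5 * (m - rf) for m, rf in zip(market, risk_free)]

beta_excess = capm_beta(stock, market, risk_free)  # recovers the true beta
beta_raw = ols_slope(market, stock)                # biased whenever Rf varies over time
print(round(beta_excess, 4), round(beta_raw, 4))
```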
### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>
When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2 meaning 5.2%) by 100 before regression mis-scales the resulting coefficients by a factor of 100. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
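The factor-of-100 error can be demonstrated with a few made-up numbers. In this sketch the stock returns are decimals while the factor column is still in percent; the helper names are hypothetical.

```python
def to_decimal(percent_values):
    """Convert percentage-formatted values (5.2 means 5.2%) to decimals (0.052)."""
    return [v / 100.0 for v in percent_values]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sum((a - mean_x) ** 2 for a in x)


factor_pct = [5.2, -1.3, 2.4, 0.7]              # as stored in the CSV, in percent
stock_dec = [0.0624, -0.0156, 0.0288, 0.0084]   # decimal returns, 1.2x the factor

wrong = ols_slope(factor_pct, stock_dec)              # units mixed: loading is 100x too small
right = ols_slope(to_decimal(factor_pct), stock_dec)  # units aligned
print(round(wrong, 4), round(right, 4))
```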
### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>
When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.
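A guard placed after the join and before the regression could look like the sketch below. The threshold of 20 comes from the text above; the function name and the `{date: value}` input shape are assumptions made for illustration.

```python
MIN_OBSERVATIONS = 20


def align_and_check(stock, factors, min_obs=MIN_OBSERVATIONS):
    """Inner-join two {date: value} series and refuse to proceed on too few rows."""
    common = sorted(set(stock) & set(factors))
    if len(common) < min_obs:
        raise ValueError(
            f"only {len(common)} observations after join; need at least {min_obs}"
        )
    return [(d, stock[d], factors[d]) for d in common]


# 24 monthly stock observations but only 12 overlapping factor observations.
stock = {f"2023-{m:02d}": 0.01 for m in range(1, 13)}
stock.update({f"2024-{m:02d}": 0.02 for m in range(1, 13)})
factors = {f"2024-{m:02d}": 0.015 for m in range(1, 13)}

try:
    align_and_check(stock, factors)
except ValueError as exc:
    print(exc)
```

Counting rows after the inner join matters because each source can look long enough on its own while the overlap is short.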
### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>
When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.
### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>
When multiprocessing for parallel regression execution uses the fork start method with psycopg2 database connections, child processes inherit the parent's connection state. This causes 'connection already closed' errors or corrupted connections in the child processes, resulting in failed database writes and incomplete factor regression calculations.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-074--FinRobot
**Scan date**: 2026-04-22
**Stats**: {'total_files': 10, 'total_classes': 38, 'total_functions': 0, 'total_stages': 10}
## Modules (10)
- [financial_data_collection](components/financial_data_collection.md): 6 classes
- [quantitative_analysis_&_backtesting](components/quantitative_analysis_-_backtesting.md): 4 classes
- [multi-agent_workflow_orchestration](components/multi-agent_workflow_orchestration.md): 6 classes
- [financial_statement_analysis](components/financial_statement_analysis.md): 3 classes
- [valuation_analysis](components/valuation_analysis.md): 2 classes
- [sensitivity_analysis](components/sensitivity_analysis.md): 2 classes
- [market_catalyst_analysis](components/market_catalyst_analysis.md): 2 classes
- [equity_research_report_generation](components/equity_research_report_generation.md): 6 classes
- [web_application_&_user_interface](components/web_application_-_user_interface.md): 5 classes
- [rag-enhanced_document_retrieval](components/rag-enhanced_document_retrieval.md): 2 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 151
fatal_constraints_count: 38
non_fatal_constraints_count: 213
use_cases_count: 14
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
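Two of the locks above, SL-03 and SL-05, lend themselves to simple pre-flight checks. The following is a minimal sketch: the regular expression is one assumed reading of `entity_type_exchange_code`, and both helper names are hypothetical rather than part of ZVT.

```python
import re

# Assumed reading of SL-03: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID_PATTERN = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9]+")


def is_valid_entity_id(entity_id):
    """Check an ID such as 'stock_sh_600000' against the assumed SL-03 format."""
    return ENTITY_ID_PATTERN.fullmatch(entity_id) is not None


def sizing_field(position_pct=None, order_money=None, order_amount=None):
    """Return the single sizing field that was set; raise unless exactly one is (SL-05)."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = [name for name, value in fields.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(f"expected exactly one sizing field, got {provided}")
    return provided[0]


print(is_valid_entity_id("stock_sh_600000"))
print(sizing_field(position_pct=0.2))
```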
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **14**
## `KUC-101`
**Source**: `finrobot_equity/core/src/Run.ipynb`
Investors need comprehensive equity research reports that combine financial statement analysis, peer comparisons, and recent news to make informed investment decisions on specific companies.
## `KUC-102`
**Source**: `tutorials_advanced/agent_annual_report.ipynb`
Financial analysts require automated generation of customized financial analysis reports that can interact with clients, gather requirements, and produce professional-grade documents.
## `KUC-103`
**Source**: `tutorials_advanced/agent_fingpt_forecaster.ipynb`
Traders and investors need AI-powered market analysis that combines company profiles, financial data, and news to generate short-term price movement predictions.
## `KUC-104`
**Source**: `tutorials_advanced/agent_openbb.ipynb`
Users need an intelligent agent interface to access OpenBB's comprehensive financial data capabilities including market data, fundamentals, and technical analysis.
## `KUC-105`
**Source**: `tutorials_advanced/agent_trade_strategist.ipynb`
Algorithmic traders need automated assistance to develop, code, and backtest custom trading strategies using the BackTrader framework with logging for further analysis.
## `KUC-106`
**Source**: `tutorials_advanced/lmm_agent_mplfinance.ipynb`
Analysts need vision-enabled AI agents that can analyze financial charts and market visualizations alongside textual market news for comprehensive analysis.
## `KUC-107`
**Source**: `tutorials_advanced/lmm_agent_opt_smacross.ipynb`
Quantitative traders need AI-assisted optimization of Simple Moving Average crossover strategies by visually inspecting charts and iteratively refining parameters.
## `KUC-108`
**Source**: `tutorials_beginner/agent_annual_report.ipynb`
Beginners need a simple way to generate formatted PDF annual reports from SEC 10-K filings with appropriate length and professional presentation.
## `KUC-109`
**Source**: `tutorials_beginner/agent_fingpt_forecaster.ipynb`
Beginners need easy stock price movement predictions based on company news and available financial information for basic investment decision support.
## `KUC-110`
**Source**: `tutorials_beginner/agent_rag_earnings_call_sec_filings.ipynb`
Analysts need to query and analyze large collections of earnings call transcripts and SEC filings using retrieval augmented generation for insights.
## `KUC-111`
**Source**: `tutorials_beginner/agent_rag_qa.ipynb`
Users need to ask questions and get answers from annual report documents using RAG technology to quickly extract specific financial information.
## `KUC-112`
**Source**: `tutorials_beginner/agent_rag_qa_up.ipynb`
Analysts need to query across multiple financial document sources including earnings calls and SEC filings simultaneously to get comprehensive answers.
## `KUC-113`
**Source**: `tutorials_beginner/ollama function call.ipynb`
Users with privacy requirements or local infrastructure need to use local LLMs (Ollama) to call financial data functions for stock information retrieval.
## `KUC-114`
**Source**: `tutorials_beginner/ollama stock chart.ipynb`
Users need to generate stock price charts using local LLM infrastructure with yfinance data, avoiding cloud dependencies for visualization.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.
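The ordering rule above can be sketched in a few lines. This is a minimal illustration, not code from either source project: `chronological_split`, the `(timestamp, payload)` row shape, and the cut-off dates are all assumptions made for the example.

```python
from datetime import date


def chronological_split(rows, train_end, valid_end):
    """Split (timestamp, payload) rows into train/valid/test by time, without shuffling."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] <= train_end]
    valid = [r for r in ordered if train_end < r[0] <= valid_end]
    test = [r for r in ordered if r[0] > valid_end]
    # Every split must end strictly before the next one begins.
    if train and valid:
        assert train[-1][0] < valid[0][0]
    if valid and test:
        assert valid[-1][0] < test[0][0]
    return train, valid, test


# Hypothetical edges arriving out of order.
rows = [(date(2024, 1, d), f"edge-{d}") for d in (5, 1, 3, 2, 4, 6)]
train, valid, test = chronological_split(rows, date(2024, 1, 3), date(2024, 1, 5))
```

Splitting on fixed cut-off timestamps rather than on row counts keeps the boundary stable when new data is appended later.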
## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.
## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.
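A minimal sketch of failing loudly at ingestion, assuming a CSV source and the column names in `REQUIRED_OHLCV` (both are illustrative choices, not a schema taken from the source projects):

```python
import csv
import io

# Assumed column names for the example; a real pipeline would define its own.
REQUIRED_OHLCV = ("timestamp", "open", "high", "low", "close", "volume")


def load_ohlcv(text):
    """Parse OHLCV CSV text, raising ValueError on any schema problem."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_OHLCV if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing OHLCV columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        try:
            parsed = {k: float(row[k]) for k in REQUIRED_OHLCV[1:]}
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad numeric value on line {line_no}: {row}") from exc
        rows.append({"timestamp": row["timestamp"], **parsed})
    return rows


good = "timestamp,open,high,low,close,volume\n2024-01-02,10,11,9.5,10.5,1000\n"
bad = "timestamp,open,high,low,close\n2024-01-02,10,11,9.5,10.5\n"
rows = load_ohlcv(good)
```

Calling `load_ohlcv(bad)` raises `ValueError` before any downstream calculation can touch the malformed data.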
## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.
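As an illustration of the composite key, the sketch below uses an in-memory SQLite database. The table name `daily_price`, its columns, and the helper `insert_price` are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily_price (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,
        close  REAL NOT NULL,
        PRIMARY KEY (ticker, date)
    )
    """
)
conn.execute("INSERT INTO daily_price VALUES ('AAA', '2024-01-02', 10.5)")
conn.execute("INSERT INTO daily_price VALUES ('BBB', '2024-01-02', 20.0)")


def insert_price(ticker, date, close):
    """Return True if the row was stored, False if (ticker, date) already exists."""
    try:
        conn.execute("INSERT INTO daily_price VALUES (?, ?, ?)", (ticker, date, close))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = insert_price("AAA", "2024-01-02", 99.0)  # same ticker and date
new_day_accepted = insert_price("AAA", "2024-01-03", 10.7)
row_count = conn.execute("SELECT COUNT(*) FROM daily_price").fetchone()[0]
```

The database itself rejects the duplicate `(ticker, date)` row, so uniqueness does not depend on every writer remembering to check first.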
## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data
When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.
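One way to apply the decorator idea is a simple minimum-interval limiter. This is a sketch under assumptions: `rate_limited`, the injectable `clock`/`sleep` parameters, the `fetch` placeholder, and the User-Agent string are all hypothetical, and no real HTTP request is made.

```python
import functools
import time


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Decorator that spaces calls at least 1/max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [None]  # time of the previous call, None before the first

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = min_interval - (clock() - last_call[0])
                if wait > 0:
                    sleep(wait)
            last_call[0] = clock()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical identifying header; replace with your own contact details.
HEADERS = {"User-Agent": "Example Research admin@example.com"}


@rate_limited(max_per_second=10)
def fetch(url):
    # Placeholder for a real HTTP call that would send HEADERS.
    return (url, HEADERS["User-Agent"])


result = fetch("https://example.invalid/filing")
```

Passing `clock` and `sleep` as parameters is a design choice that lets the limiter be tested with a fake clock instead of real delays. The limiter is per-process and single-threaded; a shared or multi-threaded client would need a lock or an external token bucket.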
## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.
## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.
## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.
FILE:references/components/equity_research_report_generation.md
# equity_research_report_generation (6 classes)
## `EnhancedTextGenerator.generate`
`equity_research_report_generation/enhancedtextgenerator-generate.py:0`
## `EnhancedChartGenerator.generate_revenue_chart`
`equity_research_report_generation/enhancedchartgenerator-generate-revenue-.py:0`
## `EquityResearchAgentManager.generate_all_sections`
`equity_research_report_generation/equityresearchagentmanager-generate-all-.py:0`
## `HTMLRenderer.render`
`equity_research_report_generation/htmlrenderer-render.py:0`
## `output_format`
`equity_research_report_generation/output-format.py:0`
## `text_generation`
`equity_research_report_generation/text-generation.py:0`
FILE:references/components/financial_data_collection.md
# financial_data_collection (6 classes)
## `YFinanceUtils.get_stock_price`
`financial_data_collection/yfinanceutils-get-stock-price.py:0`
## `FMPUtils.get_financial_metrics`
`financial_data_collection/fmputils-get-financial-metrics.py:0`
## `SECUtils.extract_filing_section`
`financial_data_collection/secutils-extract-filing-section.py:0`
## `SECExtractor.parse_document`
`financial_data_collection/secextractor-parse-document.py:0`
## `data_source`
`financial_data_collection/data-source.py:0`
## `filing_parser`
`financial_data_collection/filing-parser.py:0`
FILE:references/components/financial_statement_analysis.md
# financial_statement_analysis (3 classes)
## `ReportAnalysisUtils.generate_analysis`
`financial_statement_analysis/reportanalysisutils-generate-analysis.py:0`
## `FinancialDataProcessor.extract_metrics`
`financial_statement_analysis/financialdataprocessor-extract-metrics.py:0`
## `metric_extractor`
`financial_statement_analysis/metric-extractor.py:0`
FILE:references/components/market_catalyst_analysis.md
# market_catalyst_analysis (2 classes)
## `CatalystAnalyzer.analyze`
`market_catalyst_analysis/catalystanalyzer-analyze.py:0`
## `classifier`
`market_catalyst_analysis/classifier.py:0`
FILE:references/components/multi-agent_workflow_orchestration.md
# multi-agent_workflow_orchestration (6 classes)
## `FinRobot.chat`
`multi-agent_workflow_orchestration/finrobot-chat.py:0`
## `SingleAssistant.chat`
`multi-agent_workflow_orchestration/singleassistant-chat.py:0`
## `MultiAssistantWithLeader.chat`
`multi-agent_workflow_orchestration/multiassistantwithleader-chat.py:0`
## `MultiAssistant.chat`
`multi-agent_workflow_orchestration/multiassistant-chat.py:0`
## `workflow_pattern`
`multi-agent_workflow_orchestration/workflow-pattern.py:0`
## `speaker_selection`
`multi-agent_workflow_orchestration/speaker-selection.py:0`
FILE:references/components/quantitative_analysis_-_backtesting.md
# quantitative_analysis_&_backtesting (4 classes)
## `BackTraderUtils.run`
`quantitative_analysis_&_backtesting/backtraderutils-run.py:0`
## `DeployedCapitalAnalyzer.analyze`
`quantitative_analysis_&_backtesting/deployedcapitalanalyzer-analyze.py:0`
## `strategy`
`quantitative_analysis_&_backtesting/strategy.py:0`
## `sizer`
`quantitative_analysis_&_backtesting/sizer.py:0`
FILE:references/components/rag-enhanced_document_retrieval.md
# rag-enhanced_document_retrieval (2 classes)
## `RetrieveUserProxyAgent.retrieve`
`rag-enhanced_document_retrieval/retrieveuserproxyagent-retrieve.py:0`
## `vector_store`
`rag-enhanced_document_retrieval/vector-store.py:0`
FILE:references/components/sensitivity_analysis.md
# sensitivity_analysis (2 classes)
## `SensitivityAnalyzer.analyze`
`sensitivity_analysis/sensitivityanalyzer-analyze.py:0`
## `sensitivity_dimension`
`sensitivity_analysis/sensitivity-dimension.py:0`
FILE:references/components/valuation_analysis.md
# valuation_analysis (2 classes)
## `ValuationEngine.calculate`
`valuation_analysis/valuationengine-calculate.py:0`
## `valuation_method`
`valuation_analysis/valuation-method.py:0`
FILE:references/components/web_application_-_user_interface.md
# web_application_&_user_interface (5 classes)
## `FastAPI endpoints`
`web_application_&_user_interface/fastapi-endpoints.py:0`
## `User.create`
`web_application_&_user_interface/user-create.py:0`
## `ReportRequest.track`
`web_application_&_user_interface/reportrequest-track.py:0`
## `auth_provider`
`web_application_&_user_interface/auth-provider.py:0`
## `session_store`
`web_application_&_user_interface/session-store.py:0`
Use ensemble deep reinforcement learning (A2C, DDPG, PPO, TD3, SAC) to execute automated multi-market stock trading with
---
name: finrl-rl-trading
description: |-
Use ensemble deep reinforcement learning (A2C, DDPG, PPO, TD3, SAC) to execute automated multi-market stock trading with
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-061"
compiled_at: "2026-04-22T13:00:18.884984+00:00"
capability_markets: "multi-market"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
# FinRL Reinforcement Learning Trading (finrl-rl-trading)
> Use ensemble deep reinforcement learning (A2C, DDPG, PPO, TD3, SAC) to execute automated multi-market stock tr。
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (14 total)
### Ensemble Stock Trading ICAIF 2020 (`UC-101`)
Executing automated stock trading using an ensemble of multiple DRL agents (A2C, DDPG, PPO, TD3, SAC) to reduce individual agent weakness and improve
**Triggers**: ensemble trading, multiple agents, stock trading
### NeurIPS 2018 DRL Training (`UC-107`)
Training deep reinforcement learning agents (A2C, DDPG, PPO, SAC, TD3) for automated stock trading using the StockTradingEnv environment
**Triggers**: DRL training, stock trading, A2C
### NeurIPS 2018 Ensemble Backtesting (`UC-108`)
Backtesting multiple trained DRL agents against baseline strategies (MVO, DJIA) to evaluate and compare ensemble trading performance
**Triggers**: backtesting, ensemble, DRL agents
For all **14** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (25 total)
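- **`AP-ZVT-183`**: Ex-rights factors of inf/NaN enter multiplication directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API quota is exceeded, leaving data silently missing
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift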
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-061. Evidence verify ratio = 18.9% and audit fail total = 32. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative definition (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-061` blueprint at 2026-04-22T13:00:18.884984+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **use_cases**:
  - Paper Trading with Alpaca API
  - Graph Portfolio Manager with GNN
  - Ensemble Stock Trading ICAIF 2020
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
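### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>
When multiple Qlib models reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same prediction signal. The symptom: whether you swap in LightGBM, XGBoost, or a DNN, the backtest equity curves are identical. This is the most dangerous kind of anti-pattern, where the experiment appears to run but every conclusion is invalid.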
Source: https://github.com/microsoft/qlib/issues/1930
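### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model is trained on). A common mistake is letting fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.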
Source: https://github.com/microsoft/qlib/issues/2090
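The coupling between the normalizer's fit range and the training range can be illustrated without Qlib. The sketch below is plain Python, not Qlib's API: `fit_zscore`, `apply_zscore`, and the toy monthly series are assumptions, and `fit_end` merely plays the role of fit_end_time.

```python
from statistics import mean, pstdev


def fit_zscore(values):
    """Fit normalization statistics on the given values only."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


# Toy series keyed by "YYYY-MM" so string comparison follows time order.
series = [("2020-01", 1.0), ("2020-02", 2.0), ("2020-03", 3.0),
          ("2020-04", 10.0), ("2020-05", 12.0)]
train_end = "2020-03"
fit_end = "2020-03"  # stands in for fit_end_time; hypothetical value

# Guard: the normalizer must never see validation or test data.
assert fit_end <= train_end, "normalizer would see validation/test data"

train_values = [v for t, v in series if t <= fit_end]
mu, sigma = fit_zscore(train_values)
normalized = apply_zscore([v for _, v in series], mu, sigma)
```

Later values are still transformed, but only with statistics learned from the training window, which is the property the anti-pattern violates.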
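### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs significantly from the correct formula after cross-sectional standardization, and IC drops. Formula errors at the documentation level get copied verbatim into production factor libraries by many users.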
Source: https://github.com/microsoft/qlib/issues/2036
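### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields should all be NaN so the framework can identify and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price momentum factors show "false signals" during the suspension (price unchanged but factor non-zero). Qlib does not validate this convention, so the error flows silently into the training data.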
Source: https://github.com/microsoft/qlib/issues/2184
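### `AP-QLIB-1892` — PIT (Point-In-Time) financial data collector depends on an external stock-list API, so the full A-share universe is fetched incompletely <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. The function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. If users follow the documented steps, the financial dataset covers only some stocks, and backtests based on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).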
Source: https://github.com/microsoft/qlib/issues/1892
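### `AP-QLIB-2097` — Full-market instrument="all" runs out of memory (OOM) on a 32GB machine, while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use for instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by roughly 16x. A 32GB machine running the full market hits OOM directly in the init_instance_by_config stage, and the error message does not indicate a memory problem. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.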
Source: https://github.com/microsoft/qlib/issues/2097
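### `AP-QLIB-1984` — LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to decide whether the task is multi-label, but a Series taken from the DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and throws a semantically unclear error deeper down. Users attempting a custom multi-label task cannot trace the error message back to this root cause.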
Source: https://github.com/microsoft/qlib/issues/1984
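### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processor triggers Length mismatch during align/reindex, while D.features succeeds because it bypasses the processor. This inconsistency between the two paths leads users to believe the data was imported correctly.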
Source: https://github.com/microsoft/qlib/issues/1915
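### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler uses joblib to load features in parallel, ParallelExt fails on accessing the _backend_args attribute during initialization (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-layer nested exception from the D.features call, and users cannot tell from the stack trace whether it is a parallel backend problem or a data problem.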
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
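### `AP-VNPY-3691` — Bar generator's first bar has a misaligned timestamp, making the first period's signal wrong <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the timestamp of the first bar pushed is "the minute of the current tick" rather than "the end time of a complete N-minute period". Concretely, a 09:59 tick triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters directly with datetime.minute % 5 in on_bar, the first bar happens to pass, but it contains less than a full period of data, and using it for signal computation produces a false entry signal.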
Source: https://github.com/vnpy/vnpy/issues/3691
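### `AP-VNPY-3669` — Alpha module's incremental history save hits SchemaError because new and old DataFrame schemas are incompatible <sub>(medium)</sub>
When saving bar data to Parquet files, vnpy's Alpha module runs polars.concat directly on newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns). polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources/versions return inconsistent field types (e.g. volume is an integer from some feeds and a float from others) and there is no schema alignment step before concat. This affects every historical data build that uses vnpy alpha for backtesting.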
Source: https://github.com/vnpy/vnpy/issues/3669
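### `AP-VNPY-3685` — Spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() of the SpreadTrading module has an event loop conflict under Jupyter (asyncio already running), so part of the backtest engine's logic does not execute, no exception is raised, and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; it is a hidden trap where backtest results look correct but are actually incomplete.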
Source: https://github.com/vnpy/vnpy/issues/3685
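### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy version is downgraded and breaks other dependencies <sub>(medium)</sub>
vnpy's install.bat installs directly into the system/conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so dependency boundaries are opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.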
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
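### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
Zipline's tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted prices (raw price), not prices adjusted for splits and dividends. The historical prices shown in the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "price up 4x" for strategy return. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data, luring technical indicators into false breakout signals on ex-rights days.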
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
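### `AP-ZIPLINE-235` — Fills default to the current bar's close, underestimating live slippage and inflating backtest returns <sub>(high)</sub>
Zipline's default slippage model, after a signal triggers on a bar, fills at that same bar's close (current bar close fill). In live trading, a signal can only be filled near the open of the next bar (T+1 order execution). For A-share daily bars, backtesting at the close overstates daily return by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.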
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
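The difference between same-bar and next-bar fills can be shown with a framework-independent sketch. `next_bar_fills` and the toy bars are assumptions for illustration; this is not Zipline's slippage API.

```python
def next_bar_fills(bars, signals):
    """Fill each signal at the open of the following bar.

    bars: list of dicts with 'open' and 'close'.
    signals: list of bools, each computed at the close of the matching bar.
    Returns (bar_index, fill_price) pairs; a signal on the last bar has no
    next bar to fill on and is dropped.
    """
    fills = []
    for i, signal in enumerate(signals):
        if signal and i + 1 < len(bars):
            fills.append((i + 1, bars[i + 1]["open"]))
    return fills


bars = [
    {"open": 10.0, "close": 10.4},
    {"open": 10.6, "close": 10.2},
    {"open": 10.1, "close": 10.9},
]
signals = [True, False, True]
fills = next_bar_fills(bars, signals)
```

Here the signal from the first bar fills at 10.6 rather than at that bar's own close of 10.4, which is the gap the anti-pattern describes.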
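### `AP-ZIPLINE-190` — Setting calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (e.g. New Year's Day, 1998-01-01), Zipline's Calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The error message shows only the trading calendar's start date and does not suggest changing to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last trading day before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.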
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
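### `AP-ZIPLINE-181` — With a stale asset db, Pipeline reports "no assets traded", misleading users into checking the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, date ranges for delisted or newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, creating survivorship bias.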
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
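### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first/last day of the week, but in non-US calendars (such as ASX, SSE) that logic is incompatible with the NYSE calendar's offset calculation, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long consecutive holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot discover the unfired schedule from the logs.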
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
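### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime triggers a deep AssertionError <sub>(medium)</sub>
Zipline internally requires all timestamps to be UTC-aware datetimes. When a user passes a naive datetime (no timezone information, e.g. pd.Timestamp('2020-01-01')), it does not raise at the entry point but triggers AssertionError: Algorithm should have a utc datetime deep in algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data from local CST time hit this trap very easily and need to call tz_localize('UTC') explicitly when registering the bundle.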
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
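Rejecting naive datetimes at the entry point, instead of deep inside the engine, can be sketched with the standard library. `to_utc` and the fixed-offset `CST` constant are assumptions for this example; Zipline itself works with pandas timestamps.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time as a fixed UTC+8 offset (no DST), for illustration.
CST = timezone(timedelta(hours=8))


def to_utc(ts, assume_tz=None):
    """Return ts as a UTC-aware datetime.

    Naive datetimes are rejected unless the caller states which timezone
    they were recorded in via assume_tz.
    """
    if ts.tzinfo is None:
        if assume_tz is None:
            raise ValueError(f"naive datetime rejected: {ts!r}")
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


local_close = datetime(2020, 1, 2, 15, 0)  # naive, recorded in CST
utc_close = to_utc(local_close, assume_tz=CST)
```

Note that this converts a local wall-clock time to the matching UTC instant. Labelling a CST wall-clock value as UTC without conversion would shift every bar by eight hours.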
## zvt (6)
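### `AP-ZVT-183` — Ex-rights factors of inf/NaN enter multiplication directly, so price adjustment fails silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old==0 (first day of a new listing, or missing data) the factor is inf; when kdata.open itself is None (suspension day not filled) the multiplication raises TypeError. The result: adjustment for the whole entity is interrupted and all subsequent bars are lost, but the main flow only logs ERROR without stopping, so users often do not know that data for many stocks is corrupted.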
Source: https://github.com/zvtvz/zvt/issues/183
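A defensive version of the factor computation might look like the sketch below. `adjust_factor` and `apply_factor` are hypothetical helpers written for this example, not ZVT functions, and the bar dictionaries are a simplified stand-in for kdata rows.

```python
import math


def adjust_factor(new_price, old_price):
    """Return new/old as an adjustment factor, or None if it is not usable."""
    if new_price is None or old_price is None or old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def apply_factor(bars, factor):
    """Scale each bar's open by factor, leaving missing opens as missing."""
    if factor is None:
        raise ValueError("invalid adjust factor; refusing to adjust")
    adjusted = []
    for bar in bars:
        if bar["open"] is None:
            adjusted.append(dict(bar))  # suspension day: keep as missing
        else:
            adjusted.append({**bar, "open": bar["open"] * factor})
    return adjusted


bars = [{"open": 10.0}, {"open": None}]
adjusted = apply_factor(bars, adjust_factor(5.0, 10.0))
```

Raising on an unusable factor makes the failure visible for that entity, rather than logging an error and continuing with truncated data.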
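### `AP-ZVT-179` — Exceptions are swallowed after a third-party data API quota is exceeded, leaving data silently missing <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query count limit (error: 已超过每日最大查询数量, i.e. "daily maximum query count exceeded"). ZVT catches the exception and continues to the next entity, so after the limit is hit that day's data for all remaining stocks is silently missing. If a backtest uses this incomplete database, factor results carry systematic bias, with no warning.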
Source: https://github.com/zvtvz/zvt/issues/179
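### `AP-ZVT-161` — Full-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT concatenates all entity_ids into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable count limit. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.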
Source: https://github.com/zvtvz/zvt/issues/161
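Batched querying can be sketched with the standard `sqlite3` module. The `kdata` table, its columns, `query_in_chunks`, and the chunk size of 500 are assumptions for the example and do not reflect ZVT's actual schema.

```python
import sqlite3


def query_in_chunks(conn, entity_ids, chunk_size=500):
    """Run an IN (...) query in batches that stay under SQLite's variable limit."""
    rows = []
    for start in range(0, len(entity_ids), chunk_size):
        chunk = entity_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 1.0) for e in ids])
rows = query_in_chunks(conn, ids)
```

A chunk size below 999 works on builds with the older default limit as well as on newer builds that allow more variables.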
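### `AP-ZVT-129` — Wildcard imports hide API version changes, and enums such as AdjustType mysteriously disappear <sub>(medium)</sub>
ZVT's documentation examples use `from zvt import *` to import all symbols. After a ZVT upgrade refactors enums (e.g. moving AdjustType into a submodule), the wildcard import no longer includes that symbol and triggers AttributeError. Users assume it is an installation problem, when in fact it is an API breaking change between versions that was not noted in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.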
Source: https://github.com/zvtvz/zvt/issues/129
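### `AP-ZVT-187` — Backtest engine does not stop early on an empty data-layer result, causing a cascading null-pointer crash <sub>(medium)</sub>
After load_data completes and the data is found to be empty, ZVT Trader does not exit early; it passes the empty DataFrame into selector computation, triggering a chain of NoneType operation crashes. The stack trace is deep and the root cause hard to locate, so users assume the strategy logic is at fault. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.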
Source: https://github.com/zvtvz/zvt/issues/187
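### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). When users mix two tables while computing price momentum or moving-average factors (e.g. unadjusted for the moving average, backward-adjusted for returns), factor values jump around ex-rights dates. ZVT performs no cross-table check for adjustment-type consistency, so the mix passes silently.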
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-061--FinRL
**Scan date**: 2026-04-22
**Stats**: 9 files, 39 classes, 0 functions, 9 stages
## Modules (9)
- [market_data_acquisition](components/market_data_acquisition.md): 4 classes
- [data_cleaning_&_alignment](components/data_cleaning_-_alignment.md): 3 classes
- [technical_indicator_computation](components/technical_indicator_computation.md): 4 classes
- [normalization_&_array_conversion](components/normalization_-_array_conversion.md): 3 classes
- [gym_environment_creation](components/gym_environment_creation.md): 6 classes
- [drl_model_training](components/drl_model_training.md): 5 classes
- [ensemble_validation](components/ensemble_validation.md): 4 classes
- [backtesting_&_paper_trading](components/backtesting_-_paper_trading.md): 5 classes
- [performance_metrics_&_visualization](components/performance_metrics_-_visualization.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 134
fatal_constraints_count: 60
non_fatal_constraints_count: 195
use_cases_count: 14
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
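- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, information that only becomes known after t must not be used. The most common forms: (1) computing a signal from the close and filling at that same day's close; (2) marking an indicator computed after day T's close on the same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after day T's close, fill at day T+1's open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. Require each indicator's warmup_period to equal the longest lookback, and set positions to zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (e.g. a momentum strategy that "takes profit at the high") is obvious look-ahead, because the intraday high/low can only be confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: operations such as pandas rolling/shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in code and verify key time alignment relationships with assert.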
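- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests even when the true Sharpe is zero, so the best in-sample result must be deflated accordingly. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.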
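- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, systematically overstating the strategy's historical returns. The backtest universe must use historical point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample. Out-of-sample data is only for final validation and must not be "looked at" repeatedly and then followed by further tuning (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill strategy for suspended/missing data: suspension-day prices must not simply be forward-filled with the previous close, because "zero volume" days then take part in factor computation and signal generation during replay. Filter missing trading days explicitly at the factor computation layer; do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (e.g. ex-rights not adjusted in time, or extreme prices from manual entry errors). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.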
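- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then suffer heavy live losses from worse fill prices. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to cost assumptions, and every 10bps change in cost may flip the strategy's profit/loss conclusion, so a cost sensitivity analysis is mandatory.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must include a capital constraint: the backtest should simulate the actual number of shares held under a fixed amount of capital (rounded), rather than assuming fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.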
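- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone unification: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. All timestamps must be converted to one timezone at the pipeline entry (UTC storage with market-local display is recommended); do not mix timezones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge with an explicit trading calendar. Do not outer join and then fillna, which creates fake rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: when incrementally updating historical data, query the database for the latest stored date and download only data after it. Re-downloading existing data and appending it produces duplicate-timestamp rows and breaks time ordering in the backtest. Verify there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta calculations. Choose a passive benchmark the strategy can actually invest in (e.g. an HS300 ETF), not a price index that cannot be invested in directly (e.g. the HS300 index). A price index excludes dividend reinvestment and understates the benchmark's holding return.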
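- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed on the net value series (portfolio value), not on a cumulative return series. Summing log returns misstates drawdown depth: on a decline a log return is more negative than the corresponding simple return (ln 0.5 ≈ -0.69 for a 50% drop), so a summed log-return series must be converted back to net value before measuring drawdown.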
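- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, otherwise the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are better risk-adjusted return metrics than Sharpe Ratio: Sharpe assumes normally distributed returns, while return distributions in A-share/crypto markets are significantly left-skewed (fat-tailed), so it understates downside risk. Quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose into: alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where beta happened to be right".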
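- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor unsuited to monthly rebalancing, and vice versa. Using the wrong holding period seriously harms live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (rather than the traditional 1.96); or independent replication across periods/markets; or a clear economic rationale. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, net IC after transaction costs may be negative. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: daily or weekly rebalancing; 5-20 days: weekly or biweekly; > 20 days: monthly. Rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.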
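- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns include industry rotation returns, making it hard to tell whether the factor itself or industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and returns come from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values, which distort quantile analysis (e.g. Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Orthogonalize factors before combining (Gram-Schmidt or PCA) to remove multicollinearity between them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill strategy: filling NaN in factor computation (suspension/new listing/data gap) with the cross-sectional mean introduces lookahead bias (the mean itself contains future information), and dropping the rows entirely creates survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks that day, which does not depend on the future) or to exclude the stock for that day.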
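- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom spread) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the main metric, rather than simple long-only return. The "monotonicity" test of long Q1 / short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull/bear/range-bound) is important evidence of robustness. A factor whose IC works only in one specific market state is not suited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small ranking fluctuations trigger rebalancing and generate large, useless transaction costs. Set a rebalancing buffer zone in selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): even a large quantile return spread (Q1-Q5 spread) in historical data may be chance, and needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns over short backtest periods (< 3 years) are especially unreliable.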
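- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying a US equity factor directly to A-shares, or an equity factor to futures or crypto, requires independent IC validation; cross-market generality must not be assumed. A-share-specific anomalies (e.g. the reversal effect, ST-stock price anomalies) do not carry over to US equities as-is.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: a factor that once worked fades as the market learns and arbitrages it away (McLean & Pontiff 2016 report that anomaly returns fall by 58% on average after publication). Re-evaluate factor IC regularly (quarterly or yearly) and replace or down-weight factors that have stopped working.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the rate cycle, the business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) tend to work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor works best.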
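
The industry-neutralization rule in `SHARED-FR-NEUT-001` (factor_neutral = factor - industry_mean(factor)) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ZVT code: the function name `industry_neutralize`, the dict-based inputs, and the sample tickers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def industry_neutralize(factors, industries):
    """Subtract each stock's industry mean from its raw factor value.

    factors:    {code: raw factor value} for one cross-section (one day)
    industries: {code: industry label}
    """
    by_industry = defaultdict(list)
    for code, value in factors.items():
        by_industry[industries[code]].append(value)
    industry_mean = {ind: mean(vals) for ind, vals in by_industry.items()}
    return {
        code: value - industry_mean[industries[code]]
        for code, value in factors.items()
    }


# Hypothetical one-day cross-section with two industries.
factors = {"A": 1.0, "B": 3.0, "C": 10.0, "D": 14.0}
industries = {"A": "bank", "B": "bank", "C": "tech", "D": "tech"}
neutral = industry_neutralize(factors, industries)
print(neutral)
```

After neutralization the values within each industry average to zero, so a ranking on `neutral` no longer favors the industry whose raw factor level happens to be higher.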
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
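
Two of the locks above, `SL-03` (entity ID format) and `SL-05` (exactly one sizing field), lend themselves to a cheap pre-flight check. The sketch below is illustrative only: it models a signal as a plain dict rather than the real `TradingSignal` class, and the regular expression is an assumption about what `entity_type_exchange_code` looks like, derived from examples such as `stock_sh_600000`.

```python
import re

# Assumed shape: lowercase type, lowercase exchange, alphanumeric code.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9.]+")
SIZE_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id):
    """SL-03: the ID must look like entity_type_exchange_code."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def check_signal_sizing(signal):
    """SL-05: exactly one of the three sizing fields may be set."""
    provided = [name for name in SIZE_FIELDS if signal.get(name) is not None]
    return len(provided) == 1


print(check_entity_id("stock_sh_600000"))
print(check_signal_sizing({"position_pct": 0.2, "order_money": 1000.0}))
```

Running such checks before a signal reaches the trader turns a fatal lock violation into an early, readable failure.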
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **14**
## `KUC-101`
**Source**: `examples/FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb`
Executing automated stock trading using an ensemble of multiple DRL agents (A2C, DDPG, PPO, TD3, SAC) to reduce individual agent weakness and improve risk-adjusted returns.
## `KUC-102`
**Source**: `examples/FinRL_GPM_Demo.ipynb`
Optimizing stock portfolios using Graph Neural Networks (GPM architecture) that capture temporal and relational dependencies between stocks in the NASDAQ market.
## `KUC-103`
**Source**: `examples/FinRL_PaperTrading_Demo.ipynb`
Executing simulated real-time stock trading with Alpaca paper trading API using a custom PPO neural network architecture to test strategies without financial risk.
## `KUC-104`
**Source**: `examples/FinRL_PaperTrading_Demo_refactored.py`
Production-ready paper trading script using Alpaca API with command-line argument parsing for automated DOW 30 stock trading.
## `KUC-105`
**Source**: `examples/FinRL_PortfolioOptimizationEnv_Demo.ipynb`
Optimizing cryptocurrency or stock portfolios using the EIIE (Ensemble of Identical Independent Evaluators) architecture, demonstrated on Brazilian market stocks.
## `KUC-106`
**Source**: `examples/FinRL_StockTrading_2026_1_data.py`
Fetching and processing stock market data from Yahoo Finance with technical indicators for automated stock trading model development.
## `KUC-107`
**Source**: `examples/FinRL_StockTrading_2026_2_train.py`
Training deep reinforcement learning agents (A2C, DDPG, PPO, SAC, TD3) for automated stock trading using the StockTradingEnv environment.
## `KUC-108`
**Source**: `examples/FinRL_StockTrading_2026_3_Backtest.py`
Backtesting multiple trained DRL agents against baseline strategies (MVO, DJIA) to evaluate and compare ensemble trading performance.
## `KUC-109`
**Source**: `finrl/applications/Stock_NeurIPS2018/Stock_NeurIPS2018_1_Data.ipynb`
Fetching DOW 30 stock data with VIX fear index and turbulence indicators for robust market condition modeling in stock trading.
## `KUC-110`
**Source**: `finrl/applications/Stock_NeurIPS2018/Stock_NeurIPS2018_2_Train.ipynb`
Training A2C reinforcement learning agent for automated stock trading with technical indicators and trading cost considerations.
## `KUC-111`
**Source**: `finrl/applications/Stock_NeurIPS2018/Stock_NeurIPS2018_3_Backtest.ipynb`
Evaluating and comparing multiple DRL trading agents (A2C, DDPG, PPO, SAC, TD3) through backtesting against market baselines.
## `KUC-112`
**Source**: `finrl/applications/imitation_learning/Imitation_Sandbox.ipynb`
Experimental sandbox for testing imitation learning algorithms (TD3+BC) combined with market factor models for stock portfolio management.
## `KUC-113`
**Source**: `finrl/applications/imitation_learning/Stock_Selection.ipynb`
Using imitation learning techniques to learn stock selection strategies from expert behavior combined with technical indicators.
## `KUC-114`
**Source**: `finrl/applications/imitation_learning/Weight_Initialization.ipynb`
Investigating weight initialization strategies for imitation learning models to improve stock portfolio management performance.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
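## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, so one cerebro.run() call can run multiple strategies over multiple data sources. zvt's StockTrader currently binds a single factor set per instance and has no unified layer for orchestrating multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner and compare them.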
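## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changes to strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; with this pattern, a call like cerebro.addanalyzer(SharpeRatio) would yield a risk-adjusted return report.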
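## `CW-BT-003` — Sizer: position sizing as a separate concern
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what fraction to buy on each entry) into a separate Sizer that is fully decoupled from signal logic; FixedSize and PercentSizer are built in and users can define their own. zvt has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control; a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.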
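## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss) orders, and simulates slippage and commission schemes. zvt backtests currently fill mainly at market and lack limit orders and combined-order simulation; for high-frequency or live-trading integration, richer order types would make backtests considerably more realistic.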
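## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample fine-grained data (e.g. 1 min) into coarser bars (e.g. 1 day) at run time and drive the strategy in sync, or replay ticks to simulate how each OHLC bar forms, which enables detailed intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks run-time resampling; this pattern would support backtests over any combination of time granularities without recording the data again.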
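## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester has a GUI that shows the strategy equity curve, maximum drawdown, daily P&L, and trade details directly, with no Jupyter Notebook required. zvt currently visualizes backtest results by calling Plotly through the draw_result method and has no unified backtest report page; this pattern could be packaged as an out-of-the-box strategy performance dashboard.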
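## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy 4.0's vnpy.alpha.lab provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's only ML entry point today is MaStockMLMachine, and it lacks a standardized Lab framework; the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.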
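## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at t (report publication lags and revision history are handled correctly), which removes this source of look-ahead bias from backtests. zvt currently timestamps financial data by reporting period and has no "publication date" dimension, so stock selection may use financial reports that had not yet been published; a PIT model would make backtests considerably more trustworthy.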
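## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically record the hyperparameters, features, metrics, and predictions of every training run and support cross-experiment comparison and model versioning. zvt currently has no ML experiment tracking, and each rerun overwrites the previous results; the Recorder pattern would persist the parameters and results of each factor experiment for quick reproduction and version comparison.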
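## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib can nest a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and run together, which enables intraday execution algorithms (e.g. TWAP or VWAP rebalancing). zvt backtests currently assume daily-bar fills only and do not model execution algorithms; a nested framework would let zvt separate two questions: which stocks to hold and when, and how to build the position at minimum market-impact cost.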
FILE:references/components/backtesting_-_paper_trading.md
# backtesting_&_paper_trading (5 classes)
## `DRL_prediction`
`backtesting_&_paper_trading/drl-prediction.py:0`
## `DRL_prediction_load_from_file`
`backtesting_&_paper_trading/drl-prediction-load-from-file.py:0`
## `AlpacaPaperTrading.run`
`backtesting_&_paper_trading/alpacapapertrading-run.py:0`
## `trade_mode`
`backtesting_&_paper_trading/trade-mode.py:0`
## `deterministic`
`backtesting_&_paper_trading/deterministic.py:0`
FILE:references/components/data_cleaning_-_alignment.md
# data_cleaning_&_alignment (3 classes)
## `YahooFinanceProcessor.clean_data`
`data_cleaning_&_alignment/yahoofinanceprocessor-clean-data.py:0`
## `FeatureEngineer.clean_data`
`data_cleaning_&_alignment/featureengineer-clean-data.py:0`
## `fill_method`
`data_cleaning_&_alignment/fill-method.py:0`
FILE:references/components/drl_model_training.md
# drl_model_training (5 classes)
## `DRLAgent.get_model`
`drl_model_training/drlagent-get-model.py:0`
## `DRLAgent.train_model`
`drl_model_training/drlagent-train-model.py:0`
## `DRLAgent (ElegantRL).get_model`
`drl_model_training/drlagent-elegantrl-get-model.py:0`
## `drl_lib`
`drl_model_training/drl-lib.py:0`
## `algorithm`
`drl_model_training/algorithm.py:0`
FILE:references/components/ensemble_validation.md
# ensemble_validation (4 classes)
## `DRLEnsembleAgent.run_ensemble_strategy`
`ensemble_validation/drlensembleagent-run-ensemble-strategy.py:0`
## `get_validation_sharpe`
`ensemble_validation/get-validation-sharpe.py:0`
## `rebalance_window`
`ensemble_validation/rebalance-window.py:0`
## `validation_metric`
`ensemble_validation/validation-metric.py:0`
FILE:references/components/gym_environment_creation.md
# gym_environment_creation (6 classes)
## `StockTradingEnv.__init__`
`gym_environment_creation/stocktradingenv-init.py:0`
## `StockTradingEnv.step`
`gym_environment_creation/stocktradingenv-step.py:0`
## `StockTradingEnv.reset`
`gym_environment_creation/stocktradingenv-reset.py:0`
## `PortfolioOptimizationEnv.step`
`gym_environment_creation/portfoliooptimizationenv-step.py:0`
## `reward_scaling`
`gym_environment_creation/reward-scaling.py:0`
## `action_space_type`
`gym_environment_creation/action-space-type.py:0`
FILE:references/components/market_data_acquisition.md
# market_data_acquisition (4 classes)
## `DataProcessor.download`
`market_data_acquisition/dataprocessor-download.py:0`
## `YahooFinanceProcessor.fetch_data`
`market_data_acquisition/yahoofinanceprocessor-fetch-data.py:0`
## `AlpacaProcessor.fetch_data`
`market_data_acquisition/alpacaprocessor-fetch-data.py:0`
## `data_source`
`market_data_acquisition/data-source.py:0`
FILE:references/components/normalization_-_array_conversion.md
# normalization_&_array_conversion (3 classes)
## `GroupByScaler.fit_transform`
`normalization_&_array_conversion/groupbyscaler-fit-transform.py:0`
## `df_to_array`
`normalization_&_array_conversion/df-to-array.py:0`
## `scaler`
`normalization_&_array_conversion/scaler.py:0`
FILE:references/components/performance_metrics_-_visualization.md
# performance_metrics_&_visualization (5 classes)
## `backtest_stats`
`performance_metrics_&_visualization/backtest-stats.py:0`
## `plot_return`
`performance_metrics_&_visualization/plot-return.py:0`
## `get_baseline`
`performance_metrics_&_visualization/get-baseline.py:0`
## `benchmark`
`performance_metrics_&_visualization/benchmark.py:0`
## `plot_format`
`performance_metrics_&_visualization/plot-format.py:0`
FILE:references/components/technical_indicator_computation.md
# technical_indicator_computation (4 classes)
## `FeatureEngineer.add_technical_indicator`
`technical_indicator_computation/featureengineer-add-technical-indicator.py:0`
## `calculate_turbulence`
`technical_indicator_computation/calculate-turbulence.py:0`
## `indicator_list`
`technical_indicator_computation/indicator-list.py:0`
## `turbulence_enabled`
`technical_indicator_computation/turbulence-enabled.py:0`
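Provides multi-market financial reinforcement-learning environments, supporting backtesting with DRL algorithms such as PPO and DQN, Markowitz portfolio optimization, and real-time paper trading, with adapters for broker APIs such as Alpaca.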
---
name: finrl-meta-envs
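description: |-
  Provides multi-market financial reinforcement-learning environments, supporting backtesting with DRL algorithms such as PPO and DQN, Markowitz portfolio optimization, and real-time paper trading, with adapters for broker APIs such as Alpaca.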
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-116"
  compiled_at: "2026-04-22T13:00:56.369548+00:00"
  capability_markets: "multi-market"
  capability_activities: "backtesting, factor-research"
  sop_version: "crystal-compilation-v6.1"
---
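# FinRL Reinforcement Learning Environments (finrl-meta-envs)
> Provides multi-market financial reinforcement-learning environments, supporting backtesting with DRL algorithms such as PPO and DQN, Markowitz portfolio optimization, and real-time paper trading, with adapters for broker APIs such as Alpaca.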
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (9 total)
### Automated Paper Trading with PPO Agent (`UC-101`)
Execute simulated paper trades in real-time using a trained PPO reinforcement learning agent connected to Alpaca brokerage API, enabling risk-free str
**Triggers**: paper trading, PPO agent, Alpaca
### Alpaca Paper Trading Demo with PPO (`UC-104`)
Demonstrate live paper trading execution using a PPO neural network agent connected to Alpaca's paper trading API, enabling real-time trade simulation
**Triggers**: paper trading, Alpaca demo, PPO
### Markowitz Mean-Variance Portfolio Optimization (`UC-102`)
Optimize portfolio allocation across multiple assets using Markowitz mean-variance optimization to maximize risk-adjusted returns, balancing expected
**Triggers**: portfolio optimization, Markowitz, mean-variance
For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (25 total)
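- **`AP-ZVT-183`**: An ex-rights factor of inf/NaN fed straight into multiplication makes price adjustment fail silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API hits its quota, and data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) vs QFQ (forward-adjusted), causes factor drift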
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-116. Evidence verify ratio = 23.2% and audit fail total = 8. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete source of truth | Required reading when behavior or a decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-116` blueprint at 2026-04-22T13:00:56.369548+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Ensemble Stock Trading with DRL Agents', 'Markowitz Mean-Variance Portfolio Optimization', 'Automated Paper Trading with PPO Agent', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
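### `AP-QLIB-1930` — Backtest results independent of the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>
When several models in Qlib reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without reinitializing the dataset means every model actually uses the same prediction signal. The symptom is an identical backtest equity curve whether the model is LightGBM, XGBoost, or a DNN. This is the most dangerous kind of anti-pattern: the experiment appears to run, but every conclusion is invalid.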
Source: https://github.com/microsoft/qlib/issues/1930
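### `AP-QLIB-2090` — fit_start_time and the train segment are configured twice, causing implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (the range the normalizer is fitted on) and segments.train (the range the model is trained on). A common mistake is letting fit_end_time extend into the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and create look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state that fit_end_time must be <= train_end.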
Source: https://github.com/microsoft/qlib/issues/2090
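
The coupling described in `AP-QLIB-2090` can be illustrated without Qlib: fit the normalizer only on rows dated on or before the end of the train segment, then apply it everywhere. This is a stdlib-only sketch; `fit_zscore`, `apply_zscore`, and the sample rows are hypothetical and do not mirror Qlib's handler API.

```python
from statistics import mean, pstdev


def fit_zscore(rows, fit_end):
    """Fit mean/std using only observations dated on or before fit_end."""
    fitted = [value for day, value in rows if day <= fit_end]
    return mean(fitted), pstdev(fitted)


def apply_zscore(rows, mu, sigma):
    """Normalize every row with statistics that were fitted on train only."""
    return [(day, (value - mu) / sigma) for day, value in rows]


# (ISO date string, raw feature value); ISO strings sort chronologically.
rows = [
    ("2020-01-02", 1.0),
    ("2020-01-03", 2.0),
    ("2020-01-06", 3.0),
    ("2021-01-04", 10.0),  # belongs to the test segment
]
train_end = "2020-12-31"
fit_end = "2020-12-31"
# Guard against the anti-pattern: the normalizer must not see post-train data.
assert fit_end <= train_end, "fit_end_time must not exceed the train segment"

mu, sigma = fit_zscore(rows, fit_end)
normalized = apply_zscore(rows, mu, sigma)
print(mu, sigma, normalized[-1])
```

The test-segment row is still normalized, but its own value never influences `mu` or `sigma`.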
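### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from the documented formula differs markedly from the correct one after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied straight into production factor libraries by many users.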
Source: https://github.com/microsoft/qlib/issues/2036
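### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspended days as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that on suspended days the open/close/high/low/volume/factor fields are all NaN, so that the framework can recognize and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspended days (common in data exported directly from Eastmoney or Wind), price-momentum factors produce "false signals" during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.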
Source: https://github.com/microsoft/qlib/issues/2184
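
A pre-import cleaning step for the convention in `AP-QLIB-2184` might look like the sketch below. It assumes each bar is a dict that already carries a boolean `suspended` flag; that flag, the function name `mask_suspended`, and the sample bars are assumptions for illustration, and how suspension is detected depends on the data vendor.

```python
import math

PRICE_FIELDS = ("open", "close", "high", "low", "volume", "factor")


def mask_suspended(bars):
    """Return copies of the bars with every price field NaN on suspended days."""
    masked = []
    for bar in bars:
        row = dict(bar)  # copy so the caller's data is left untouched
        if row["suspended"]:
            for field in PRICE_FIELDS:
                row[field] = math.nan
        masked.append(row)
    return masked


# Hypothetical export: day 2 is suspended but carries day 1's prices forward.
bars = [
    {"date": "2024-03-01", "open": 9.8, "close": 10.0, "high": 10.1,
     "low": 9.7, "volume": 1200.0, "factor": 1.0, "suspended": False},
    {"date": "2024-03-04", "open": 10.0, "close": 10.0, "high": 10.0,
     "low": 10.0, "volume": 0.0, "factor": 1.0, "suspended": True},
]
masked = mask_suspended(bars)
print(masked[1]["close"])
```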
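### `AP-QLIB-1892` — PIT (Point-In-Time) financial data collector depends on an external stock-list API; the full A-share list is fetched incompletely <sub>(high)</sub>
On initialization, Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() to get the Shanghai/Shenzhen stock list. The function depends on the Eastmoney API, which often returns only part of the list rather than all 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset that covers only some stocks, and backtests built on PIT financial factors suffer severe survivorship bias (uncollected stocks are implicitly excluded).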
Source: https://github.com/microsoft/qlib/issues/1892
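### `AP-QLIB-2097` — Whole-market instrument="all" runs out of memory on a 32GB machine while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the full feature matrix of the specified universe into memory at once. Memory use differs by about 16x between instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks). A 32GB machine running the whole market hits OOM in the init_instance_by_config stage, and the error message does not mention memory. Users easily mistake it for a configuration error; the actual fix is batched loading or streaming feature computation.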
Source: https://github.com/microsoft/qlib/issues/2097
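### `AP-QLIB-1984` — LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label input, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.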
Source: https://github.com/microsoft/qlib/issues/1984
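### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors raise Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.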
Source: https://github.com/microsoft/qlib/issues/1915
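### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, leaving DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel through joblib, ParallelExt fails on initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from D.features calls, and users cannot tell from the stack trace whether the problem is the parallel backend or the data.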
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
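### `AP-VNPY-3691` — Bar generator's first bar has a misaligned timestamp, corrupting the first period's signal <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If a strategy filters in on_bar with datetime.minute % 5, this first bar happens to pass, yet it holds less than one full period of data and produces a wrong entry signal when used for signal computation.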
Source: https://github.com/vnpy/vnpy/issues/3691
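
One defensive pattern against the partial first bar in `AP-VNPY-3691` is to aggregate on fixed window boundaries and drop any window that does not hold a full set of 1-minute bars. The sketch below is independent of vnpy's `BarGenerator`; the function name, the `(timestamp, price)` input shape, and the assumptions that `n` divides 60 and that input arrives in time order are all illustrative.

```python
from datetime import datetime


def aggregate_complete(minute_bars, n=5):
    """Group 1-minute (timestamp, price) pairs into n-minute OHLC bars.

    Windows start where minute % n == 0; any window holding fewer than
    n one-minute bars is discarded instead of being emitted early.
    """
    buckets = {}
    for ts, price in minute_bars:
        start = ts.replace(minute=ts.minute - ts.minute % n,
                           second=0, microsecond=0)
        buckets.setdefault(start, []).append(price)
    bars = []
    for start in sorted(buckets):
        prices = buckets[start]
        if len(prices) == n:  # keep complete windows only
            bars.append({"start": start, "open": prices[0],
                         "high": max(prices), "low": min(prices),
                         "close": prices[-1]})
    return bars


# The stream starts mid-window at 09:59, as in the anti-pattern.
minute_bars = [
    (datetime(2024, 1, 2, 9, 59), 100.0),
    (datetime(2024, 1, 2, 10, 0), 101.0),
    (datetime(2024, 1, 2, 10, 1), 102.0),
    (datetime(2024, 1, 2, 10, 2), 99.0),
    (datetime(2024, 1, 2, 10, 3), 103.0),
    (datetime(2024, 1, 2, 10, 4), 104.0),
]
bars = aggregate_complete(minute_bars, n=5)
print(bars)
```

The lone 09:59 bar falls in the 09:55 window and is dropped. The trade-off is that a window with a genuinely missing minute is dropped too, which may or may not be acceptable for a given strategy.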
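### `AP-VNPY-3669` — Alpha module incremental history save fails with SchemaError when old and new DataFrame schemas are incompatible <sub>(medium)</sub>
When saving bar data to Parquet, the vnpy Alpha module passes newly downloaded data (which may contain Float64 columns) and the existing file (historical Int64 columns) directly to polars.concat. polars is strictly typed and does not promote types implicitly, so it raises SchemaError. The root cause is that field types differ across data sources and versions (volume is an integer in some feeds and a float in others) and there is no schema alignment step before concat. This affects every historical-data build workflow that backtests with vnpy alpha.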
Source: https://github.com/vnpy/vnpy/issues/3669
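### `AP-VNPY-3685` — Spread trading run_backtesting() fails silently under Jupyter; results are untrustworthy <sub>(high)</sub>
In vnpy 4.10, run_backtesting() of the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running); part of the backtesting engine's logic does not execute, yet no exception is raised and normal-looking backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop. It is a hidden trap of the "results look correct but are actually incomplete" kind.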
Source: https://github.com/vnpy/vnpy/issues/3685
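### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system or conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (e.g. recent scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In research environments where several tools coexist, vnpy's install script is a common source of "global environment pollution".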
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
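### `AP-ZIPLINE-138` — Backtest prices are unadjusted; tutorial charts mislead users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices on the chart differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data and lead technical indicators to produce false breakouts on the ex-rights date.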
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
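### `AP-ZIPLINE-235` — Default fill at the current bar's close understates live slippage and inflates backtest returns <sub>(high)</sub>
After a signal fires on a bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading, the signal can only be filled near the next bar's open (T+1 order execution). With A-share daily bars, backtesting at the close overstates daily return by about 0.1-0.3% on average relative to filling at the next day's open, and the annualized gap can exceed 30%. Configure the slippage model explicitly: VolumeShareSlippage with a reasonable volume_limit, or FixedSlippage.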
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
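
The next-bar convention that `AP-ZIPLINE-235` argues for can be shown in a few lines, independent of Zipline: a signal computed from bar i's close is filled at bar i+1's open, and a signal on the final bar cannot be filled at all. The function name `next_bar_fills` and the sample bars are hypothetical.

```python
def next_bar_fills(bars, signals):
    """Map each entry signal on bar i to a fill at the open of bar i + 1.

    signals[i] is assumed to be computed after bar i has closed.
    """
    fills = []
    for i, wants_entry in enumerate(signals):
        if wants_entry and i + 1 < len(bars):
            nxt = bars[i + 1]
            fills.append((nxt["date"], nxt["open"]))
    return fills


bars = [
    {"date": "2024-01-02", "open": 10.0, "close": 10.4},
    {"date": "2024-01-03", "open": 10.6, "close": 10.5},
    {"date": "2024-01-04", "open": 10.3, "close": 10.9},
]
signals = [True, False, True]  # the last signal has no next bar to fill on
fills = next_bar_fills(bars, signals)
print(fills)
```

In this sample the first signal fills at 10.6 rather than at the signal bar's close of 10.4, which is exactly the gap a same-bar close fill would hide.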
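### `AP-ZIPLINE-190` — A start_session on a non-trading day raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if start_session falls on a non-trading day (e.g. 1998-01-01, New Year's Day), calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The message shows only the calendar's first session and does not suggest "use the first trading day instead". A-share case: with the SSE/SZSE calendars, a start_date that lands on a holiday (e.g. the day after the last trading day before Spring Festival) triggers the same error, and debugging is very costly.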
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
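
A small guard for `AP-ZIPLINE-190` is to snap the requested start date to the first session on or after it before handing it to the backtester. The sketch uses a sorted list of session dates and `bisect`; the session list here is illustrative rather than an authoritative exchange calendar, and `first_session_on_or_after` is a hypothetical helper, not a Zipline function.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions, day):
    """Return the earliest trading session that is not before `day`.

    `sessions` must be sorted ascending.
    """
    i = bisect_left(sessions, day)
    if i == len(sessions):
        raise ValueError(f"no session on or after {day}")
    return sessions[i]


# Illustrative sessions around a long holiday gap.
sessions = [date(2024, 2, 7), date(2024, 2, 8),
            date(2024, 2, 19), date(2024, 2, 20)]
start = first_session_on_or_after(sessions, date(2024, 2, 9))
print(start)
```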
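### `AP-ZIPLINE-181` — A stale asset db makes Pipeline report "no assets traded", sending users to check the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records each stock's start/end trading dates. With an old Quandl or self-built bundle that has not been re-ingested, backtesting a new date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". A-share case: if market data is re-downloaded but only prices are updated and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, Pipeline filtering quietly excludes them, and survivorship bias results.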
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
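### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last day of the week, but under non-US calendars (e.g. ASX, SSE) that logic is incompatible with the offset computation of the NYSE calendar, so the schedule never fires or fires on the wrong date. A-share case: with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the logs give no sign of the missed schedule.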
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
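### `AP-ZIPLINE-240` — Backtest datetimes must be UTC; a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone, e.g. pd.Timestamp('2020-01-01')), nothing fails at the entry point; instead, deep in algorithm execution, AssertionError: Algorithm should have a utc datetime is raised, and the deep stack makes it hard to locate. A-share developers importing data in local CST time hit this trap easily; call tz_localize('UTC') explicitly when registering the bundle.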
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
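
The conversion that `AP-ZIPLINE-240` calls for can be done at the data-ingest boundary with the standard library alone. The sketch below treats naive timestamps as China Standard Time (UTC+8, no daylight saving) and converts them to UTC-aware datetimes; the helper name `to_utc` and the fixed-offset `CST` constant are assumptions for illustration, not part of Zipline.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time as a fixed UTC+8 offset (no DST to handle).
CST = timezone(timedelta(hours=8), "CST")


def to_utc(ts, local_tz=CST):
    """Return a UTC-aware datetime; naive input is interpreted as local_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=local_tz)
    return ts.astimezone(timezone.utc)


market_open_local = datetime(2020, 1, 2, 9, 30)  # naive, local CST
market_open_utc = to_utc(market_open_local)
print(market_open_utc.isoformat())
```

Normalizing once at the entry point keeps every later comparison between timestamps well-defined.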
## zvt (6)
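### `AP-ZVT-183` — An ex-rights factor of inf/NaN fed straight into multiplication makes price adjustment fail silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (first day of a new listing or missing data), the factor is inf; when kdata.open itself is None (a suspended day left unfilled), the multiplication raises TypeError. As a result, the adjustment for the whole entity aborts and all later K-lines are lost, but the main flow only logs an ERROR and keeps going, so users often do not know that data for many stocks is already corrupted.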
Source: https://github.com/zvtvz/zvt/issues/183
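
A guard against the failure mode in `AP-ZVT-183` is to validate the ratio before it is used in any multiplication and to return an explicit "no factor" marker instead of inf or NaN. This is a hedged sketch, not ZVT's implementation; `safe_adjust_factor` and its `None` return convention are hypothetical.

```python
import math


def safe_adjust_factor(new_price, old_price):
    """Return new/old as an adjustment factor, or None when it is unusable.

    None is returned for missing inputs, a zero denominator, and any
    result that is not finite (inf or NaN).
    """
    if new_price is None or old_price is None:
        return None
    if old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


print(safe_adjust_factor(10.0, 5.0))
print(safe_adjust_factor(10.0, 0.0))
print(safe_adjust_factor(None, 5.0))
```

The caller can then count how many entities came back as `None` and fail loudly above a threshold, rather than discovering missing K-lines during a backtest.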
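### `AP-ZVT-179` — Exceptions are swallowed after a third-party data API hits its quota, and data goes missing silently <sub>(high)</sub>
When ZVT bulk-fetches whole-market K-lines (4000+ stocks) through JoinQuant's jqdatasdk, it hits JoinQuant's daily query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so after the limit is hit, that day's data for every remaining stock goes missing silently. A backtest on this incomplete database produces systematically biased factor results with no warning.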
Source: https://github.com/zvtvz/zvt/issues/179
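
Instead of continuing to the next entity after a quota error, a fetch loop can stop at the first quota failure and report exactly which entities were not fetched. The sketch below is generic: `QuotaExceeded`, `fetch_all`, and the fake fetcher are hypothetical stand-ins and do not correspond to jqdatasdk or ZVT APIs.

```python
class QuotaExceeded(Exception):
    """Raised by the (hypothetical) data client when the daily quota is hit."""


def fetch_all(entity_ids, fetch_one):
    """Fetch each entity; on a quota error, stop and return what is missing."""
    results = {}
    missing = []
    for i, entity_id in enumerate(entity_ids):
        try:
            results[entity_id] = fetch_one(entity_id)
        except QuotaExceeded:
            # Every entity from this point on is unfetched: record and stop.
            missing = list(entity_ids[i:])
            break
    return results, missing


def make_fake_fetcher(quota):
    """Build a fetcher that succeeds `quota` times, then raises."""
    calls = {"n": 0}

    def fetch_one(entity_id):
        if calls["n"] >= quota:
            raise QuotaExceeded(entity_id)
        calls["n"] += 1
        return [f"{entity_id}-bar"]

    return fetch_one


ids = ["stock_sh_600000", "stock_sh_600001", "stock_sh_600002", "stock_sh_600003"]
results, missing = fetch_all(ids, make_fake_fetcher(quota=2))
print(sorted(results), missing)
```

A non-empty `missing` list gives the caller something concrete to alert on or to retry the next day.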
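### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable limit. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.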
Source: https://github.com/zvtvz/zvt/issues/161
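
The batched-query fix mentioned in `AP-ZVT-161` can be sketched with the standard `sqlite3` module: split the ID list into chunks below the variable limit and run one parameterized `IN (...)` query per chunk. The table name `kdata`, its columns, and the batch size of 900 are assumptions chosen for the example, not ZVT's schema.

```python
import sqlite3


def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def query_in_batches(conn, entity_ids, batch_size=900):
    """Run one IN (...) query per batch so no statement exceeds the limit."""
    rows = []
    for batch in chunked(entity_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        sql = (
            "SELECT entity_id, close FROM kdata "
            f"WHERE entity_id IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


# In-memory demo table with 5000 hypothetical entities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{i:06d}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 1.0) for e in ids])

rows = query_in_batches(conn, ids)
print(len(rows))
```

Only the placeholders are interpolated into the SQL text; the IDs themselves are still bound as parameters.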
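### `AP-ZVT-129` — Wildcard import hides API version changes; enums such as AdjustType disappear mysteriously <sub>(medium)</sub>
ZVT's documentation examples use `from zvt import *` to import every symbol. After a ZVT upgrade that refactors enums (e.g. moving AdjustType into a submodule), the wildcard import no longer includes the symbol and an AttributeError follows. Users assume an installation problem, but it is in fact a breaking API change between versions that was not noted in the CHANGELOG, and the wildcard import hides where the symbol came from. Import enum classes explicitly.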
Source: https://github.com/zvtvz/zvt/issues/129
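### `AP-ZVT-187` — Backtest engine does not stop early on an empty data-layer result, causing a cascading null-reference crash <sub>(medium)</sub>
When ZVT Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame into the selector computation, which sets off a chain of NoneType failures. The stack trace is deep and the root cause is hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.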
Source: https://github.com/zvtvz/zvt/issues/187
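### `AP-ZVT-183B` — Using the wrong K-line table, HFQ (backward-adjusted) vs QFQ (forward-adjusted), causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users mix two of them when computing price momentum or moving-average factors (e.g. unadjusted for the moving average and backward-adjusted for returns), so factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, and the mix passes silently.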
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-116--FinRL-Meta
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 29, 'total_functions': 0, 'total_stages': 6}
## Modules (6)
- [data_collection](components/data_collection.md): 5 classes
- [feature_engineering](components/feature_engineering.md): 4 classes
- [environment_simulation](components/environment_simulation.md): 5 classes
- [agent_training](components/agent_training.md): 6 classes
- [order_execution_&_execution_optimization](components/order_execution_-_execution_optimization.md): 6 classes
- [paper_trading](components/paper_trading.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
  business_decisions_count: 176
  fatal_constraints_count: 40
  non_fatal_constraints_count: 218
  use_cases_count: 9
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
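- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing the signal from the close and filling at the same day's close; (2) stamping an indicator computed after the close of day T on that same bar; (3) assuming fills at the day's high or low. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions must be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order, TRAIN < VALID < TEST; random k-fold splitting is not allowed (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Fill assumptions at the open/high/low: assuming in a daily backtest that one can sell at the day's high or buy at the day's low (e.g. a momentum strategy that "takes profit at the high") is plain look-ahead, because the intraday high and low are only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one alignment: pandas rolling/shift operations easily introduce subtle one-period offset errors. Record each series's "observation time" explicitly in code and verify key alignment relationships with assert.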
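- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests are run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials rises with the number of backtests tried, even when the true Sharpe ratio is zero. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.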
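- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must be a historical snapshot (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in sample; out-of-sample data is for final validation only and must not be "peeked at" repeatedly while tuning continues (that turns the out-of-sample data into new in-sample data and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Filling suspended or missing data: suspended-day prices must not simply be forward-filled from the previous close, because "zero-volume" days would then take part in factor computation and signal generation. Filter missing trading days explicitly at the factor computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (e.g. ex-rights adjustments applied late, extreme prices from manual entry errors); feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.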
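- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not allowed. Performance metrics from a cost-free backtest are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and a high-frequency strategy will then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; for large orders use a volume-share model (e.g. no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must appear in the backtest performance report and be analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to cost assumptions, and every 10 bps change in cost may flip the profit-or-loss conclusion; a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate actual share counts (rounded) under a fixed amount of capital instead of assuming fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight; simulate the rounding effect in the backtest.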
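- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Unified timestamp time zone: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to one time zone at the pipeline entry (UTC for storage and the market's local time zone for display are recommended); do not mix time zones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (e.g. daily prices + weekly factors), reindex/merge against an explicit trading calendar; do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: when updating historical data incrementally, query the database for the latest stored date and download only data after it. Re-downloading existing data and appending it creates rows with duplicate timestamps and breaks backtest ordering. Verify before and after each update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta calculations. Use a passive benchmark the strategy could actually invest in (e.g. an HS300 ETF) rather than a price index that cannot be invested in directly (e.g. the HS300 index). A price index excludes dividend reinvestment and understates the benchmark return of holding the stocks.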
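- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Maximum drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. If log returns are summed instead, the peak-to-trough difference is a log drawdown ln(V_t / V_peak), which misstates the depth when read as a percentage (a 50% loss appears as -0.693); convert with 1 - exp(x) before reporting.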
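- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, or the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, whereas returns in the A-share and crypto markets are markedly left-skewed and fat-tailed, so Sharpe understates downside risk. Report Sortino (downside volatility only) and Calmar (annualized return / maximum drawdown) alongside Sharpe instead of relying on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). Without attribution, a backtest cannot distinguish "a good strategy" from "a favorable market in which the beta happened to be right".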
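- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 is preliminary evidence of predictive power, and ICIR > 0.5 indicates stability. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series for 1/5/10/20-day horizons to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and is unsuitable for monthly rebalancing, and vice versa. Using the wrong holding period severely damages the factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has reported 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires a t-stat > 3.0 (rather than the conventional 1.96), or independent replication across periods/markets, or a clear economic rationale. A factor that meets none of these is very likely a data-mining artifact.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, the net IC after transaction costs can be negative. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.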
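- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against the industry mean, industry rotation returns leak into the factor return, and it becomes hard to tell whether the factor itself or industry exposure drove the result. The operation is: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and the return comes from size exposure rather than the factor itself. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extreme values, which distort quantile analysis (such as Q1/Q10 deciles). Winsorize the raw values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Apply Gram-Schmidt orthogonalization or PCA before combining to remove multicollinearity between factors.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing data filling strategy: for NaN values in factor computation (suspensions, new listings, data gaps), filling with the cross-sectional mean introduces lookahead bias (the mean itself contains future information), while dropping the rows entirely produces survivorship bias. The correct approach is to fill with the cross-sectional median (the median across all stocks on that day, which does not depend on the future) or to exclude the stock for that day.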
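- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, not the long-only return. The monotonicity test of going long Q1 and short Q5 is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, and range-bound markets) is important evidence of robustness. A factor whose IC holds in only one market regime is unsuitable for all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), small rank fluctuations trigger rebalancing and generate substantial wasted transaction costs. Set a buffer zone during selection so that a position changes only when its rank moves by more than a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of quantile returns (bootstrap test): a factor's quantile return spread (Q1-Q5 spread) can be large in historical data and still be due to chance, so test its significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.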
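- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability: a factor that works in one market does not necessarily work in another. Applying a US equity factor directly to A-shares, or an equity factor to futures/crypto, requires independent IC validation; cross-market generality cannot be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Temporal stability of factor validity: a once-effective factor gradually stops working as the market learns and arbitrages it away (McLean & Pontiff 2016 show an average post-publication decay of 58%). Re-evaluate factor IC periodically (quarterly or yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the interest rate cycle, business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the one in which the factor performs best.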
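
As an illustration of `SHARED-BT-PERF-001` in the list above, the following is a minimal sketch of a max drawdown computed directly on a net value series. The function name `max_drawdown` and the sample values are hypothetical, and the sketch assumes strictly positive net values.

```python
def max_drawdown(nav):
    """Return the maximum drawdown of a net value series as a negative fraction.

    Assumes every value in `nav` is strictly positive.
    """
    if not nav:
        raise ValueError("nav must contain at least one value")
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)                 # running high-water mark
        worst = min(worst, value / peak - 1.0)  # decline from that mark
    return worst


nav = [100.0, 120.0, 60.0, 90.0]
print(max_drawdown(nav))  # -0.5 (peak 120 to trough 60)
```

Working from the net value series keeps the result in simple-return terms, so it can be read directly as a percentage loss from the peak.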
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
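
One way to enforce this ordering is a stable sort on the order side, sketched below. The `Order` record and the `sells_first` helper are hypothetical names for illustration and are not zvt classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical order record; not a zvt class."""
    entity_id: str
    side: str  # "buy" or "sell"
    amount: float


def sells_first(orders):
    """Return orders with every sell ahead of every buy.

    sorted() is stable, so the original order is kept within each side.
    """
    return sorted(orders, key=lambda order: 0 if order.side == "sell" else 1)


cycle = [
    Order("stock_sh_600000", "buy", 100),
    Order("stock_sz_000001", "sell", 200),
    Order("stock_sh_600036", "buy", 300),
    Order("stock_sz_000002", "sell", 400),
]
ordered = sells_first(cycle)
print([(order.side, order.entity_id) for order in ordered])
```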
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
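
The "exactly one" rule can be checked before a signal is built. The sketch below works on a plain dict instead of the real `TradingSignal` class, whose constructor is not shown in this document; `sizing_fields_set` and `check_signal` are hypothetical helpers.

```python
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def sizing_fields_set(signal):
    """Return the names of the sizing fields that carry a value."""
    return [name for name in SIZING_FIELDS if signal.get(name) is not None]


def check_signal(signal):
    """Raise ValueError unless exactly one sizing field is set."""
    provided = sizing_fields_set(signal)
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of {SIZING_FIELDS}, got {provided or 'none'}"
        )


check_signal({"entity_id": "stock_sh_600000", "position_pct": 0.2})  # passes
```

Testing with `is not None` instead of truthiness matters here: a value of `0` or `0.0` still counts as a field that was set.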
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **9**
## `KUC-101`
**Source**: `Paper Trading/Automated_Paper_Trading.ipynb`
Execute simulated paper trades in real-time using a trained PPO reinforcement learning agent connected to Alpaca brokerage API, enabling risk-free strategy validation before live deployment.
## `KUC-102`
**Source**: `examples/Aarons_portfolio_optimization_example.ipynb`
Optimize portfolio allocation across multiple assets using Markowitz mean-variance optimization to maximize risk-adjusted returns, balancing expected return against portfolio volatility.
## `KUC-103`
**Source**: `examples/FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb`
Train and evaluate an ensemble of deep reinforcement learning agents for stock trading, combining multiple model predictions to improve robustness and performance across varying market conditions.
## `KUC-104`
**Source**: `examples/FinRL_PaperTrading_Demo.ipynb`
Demonstrate live paper trading execution using a PPO neural network agent connected to Alpaca's paper trading API, enabling real-time trade simulation with market data feeds.
## `KUC-105`
**Source**: `examples/FinRL_PortfolioOptimizationEnv_Demo.ipynb`
Train deep reinforcement learning agents for portfolio allocation across Brazilian stocks (B3 exchange), using custom portfolio environment with deep neural network policies to optimize multi-asset holdings.
## `KUC-106`
**Source**: `examples/Stock_NeurIPS2018_SB3.ipynb`
Implement stock trading strategies using StableBaselines3 library's DRL implementations (A2C, PPO, SAC) with feature engineering on technical indicators for training and evaluation on historical market data.
## `KUC-107`
**Source**: `examples/run_markowitz_portfolio_optimization.py`
Execute Markowitz mean-variance portfolio optimization algorithm as a standalone Python script, computing optimal asset weights based on historical returns covariance and expected returns to minimize portfolio variance.
## `KUC-108`
**Source**: `examples/run_rl_portfolio_optimization.py`
Train and run reinforcement learning agent (A2C) for portfolio optimization using StockPortfolioEnv, enabling adaptive portfolio allocation that learns from market interactions rather than static optimization.
## `KUC-109`
**Source**: `meta/env_execution_optimizing/order_execution_qlib/workflow_by_code.ipynb`
Optimize order execution using Qlib's LightGBM model to predict stock movements and implement TopkDropoutStrategy, improving trade execution quality by timing orders based on predicted signals.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
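## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as the single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies and multiple data sources in one cerebro.run() call. zvt's StockTrader currently binds one set of factors per instance and lacks a unified orchestration layer for multi-strategy portfolios; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.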
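## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changes to strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with cerebro.addanalyzer(SharpeRatio).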
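## `CW-BT-003` — Sizer position sizing separation
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money management rules evolve independently and be reused in combination.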
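## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combined orders; for high-frequency or live-trading integration, fuller order type support would greatly improve backtest realism.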
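## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive the strategy in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently implements multiple timeframes by pre-recording K-lines at each level and lacks dynamic resampling at runtime; borrowing this pattern would support backtests on any combination of time granularities without recording the data again.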
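
The aggregation rule behind resampling can be sketched in a few lines. This is not backtrader's API: `resample_bars` is a hypothetical helper that groups a fixed number of consecutive bars, whereas a real implementation would group by timestamp and trading session.

```python
def resample_bars(bars, group_size):
    """Combine every `group_size` consecutive OHLCV bars into one bar.

    `bars` is a chronological list of dicts with the keys
    open, high, low, close, volume. A trailing partial group is kept.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    resampled = []
    for start in range(0, len(bars), group_size):
        chunk = bars[start:start + group_size]
        resampled.append({
            "open": chunk[0]["open"],                  # first open
            "high": max(bar["high"] for bar in chunk),
            "low": min(bar["low"] for bar in chunk),
            "close": chunk[-1]["close"],               # last close
            "volume": sum(bar["volume"] for bar in chunk),
        })
    return resampled


one_minute = [
    {"open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5, "volume": 100},
    {"open": 10.5, "high": 12.0, "low": 10.0, "close": 11.5, "volume": 150},
    {"open": 11.5, "high": 11.8, "low": 11.0, "close": 11.2, "volume": 50},
]
two_minute = resample_bars(one_minute, 2)
```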
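## `CW-VN-003` — CTA backtesting engine with built-in visualization
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a graphical interface that directly shows the strategy equity curve, max drawdown, daily PnL, and trade details, with no need for Jupyter Notebook. zvt's backtest visualization currently relies on the draw_result method calling Plotly and has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.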
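## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy 4.0's vnpy.alpha.lab provides an integrated workflow of data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.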
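## `CW-QL-001` — Point-in-Time database (prevents future data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at a given time t returns only the data actually knowable at t (report publication delays and revision history are handled correctly), which eliminates look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection may be biased by financial reports that were not yet published; introducing the PIT pattern would greatly improve backtest credibility.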
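
The core of a point-in-time lookup is to key each record by its publication date, not its reporting period. The sketch below is not qlib's API; `latest_known` and the sample `REPORTS` data are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical records: (publish_date, report_period, eps), sorted by publish_date.
REPORTS = [
    (date(2023, 4, 28), "2023Q1", 0.50),
    (date(2023, 8, 30), "2023Q2", 0.62),
    (date(2023, 10, 27), "2023Q3", 0.58),
]


def latest_known(reports, as_of):
    """Return the most recently published record on or before `as_of`, else None."""
    publish_dates = [record[0] for record in reports]
    index = bisect_right(publish_dates, as_of)
    return reports[index - 1] if index else None


# On 2023-07-15 the Q2 period has ended, but its report is not yet published.
print(latest_known(REPORTS, date(2023, 7, 15)))
```

A lookup keyed by reporting period would return the Q2 figures for any date after June 30, which is the leak the PIT pattern is meant to prevent.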
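## `CW-QL-002` — Recorder + Experiment experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of each model training run and supports cross-experiment comparison and model version management. zvt currently has no ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of each factor experiment and support fast reproduction and version comparison.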
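## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run in combination, enabling optimal intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution algorithm modeling; borrowing the nested framework would let zvt separate the two questions of "which stocks to hold and when" and "how to build the position with minimal market impact".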
FILE:references/components/agent_training.md
# agent_training (6 classes)
## `DRLAgent.get_model`
`agent_training/drlagent-get-model.py:0`
## `DRLAgent.train_model`
`agent_training/drlagent-train-model.py:0`
## `DRLAgent.DRL_prediction`
`agent_training/drlagent-drl-prediction.py:0`
## `DRLEnsembleAgent.run_ensemble_strategy`
`agent_training/drlensembleagent-run-ensemble-strategy.py:0`
## `algorithm`
`agent_training/algorithm.py:0`
## `framework`
`agent_training/framework.py:0`
FILE:references/components/data_collection.md
# data_collection (5 classes)
## `DataProcessor.fetch_data`
`data_collection/dataprocessor-fetch-data.py:0`
## `_Base.download_data`
`data_collection/base-download-data.py:0`
## `_Base.clean_data`
`data_collection/base-clean-data.py:0`
## `_Base.calc_time_zone`
`data_collection/base-calc-time-zone.py:0`
## `data_source`
`data_collection/data-source.py:0`
FILE:references/components/environment_simulation.md
# environment_simulation (5 classes)
## `StockTradingEnv.reset`
`environment_simulation/stocktradingenv-reset.py:0`
## `StockTradingEnv.step`
`environment_simulation/stocktradingenv-step.py:0`
## `StockTradingEnv.get_state`
`environment_simulation/stocktradingenv-get-state.py:0`
## `reward_function`
`environment_simulation/reward-function.py:0`
## `cost_model`
`environment_simulation/cost-model.py:0`
FILE:references/components/feature_engineering.md
# feature_engineering (4 classes)
## `_Base.add_technical_indicator`
`feature_engineering/base-add-technical-indicator.py:0`
## `_Base.add_turbulence`
`feature_engineering/base-add-turbulence.py:0`
## `_Base.df_to_array`
`feature_engineering/base-df-to-array.py:0`
## `indicator_library`
`feature_engineering/indicator-library.py:0`
FILE:references/components/order_execution_-_execution_optimization.md
# order_execution_&_execution_optimization (6 classes)
## `TWAP.execute`
`order_execution_&_execution_optimization/twap-execute.py:0`
## `VWAP.execute`
`order_execution_&_execution_optimization/vwap-execute.py:0`
## `AC.compute_AC_utility`
`order_execution_&_execution_optimization/ac-compute-ac-utility.py:0`
## `MarketEnvironment.start_transactions`
`order_execution_&_execution_optimization/marketenvironment-start-transactions.py:0`
## `execution_policy`
`order_execution_&_execution_optimization/execution-policy.py:0`
## `reward_type`
`order_execution_&_execution_optimization/reward-type.py:0`
FILE:references/components/paper_trading.md
# paper_trading (3 classes)
## `AlpacaPaperTrading.run`
`paper_trading/alpacapapertrading-run.py:0`
## `AlpacaPaperTradingMultiCrypto.test_latency`
`paper_trading/alpacapapertradingmulticrypto-test-laten.py:0`
## `broker`
`paper_trading/broker.py:0`
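Provides multi-market financial analysis capabilities, covering historical data retrieval, financial statement parsing, financial ratio calculation, fixed income analysis, portfolio performance evaluation, and stock fundamental screening.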
---
name: financial-ratios-toolkit
description: |-
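Provides multi-market financial analysis capabilities, covering historical data retrieval, financial statement parsing, financial ratio calculation, fixed income analysis, portfolio performance evaluation, and stock fundamental screening.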
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-118"
compiled_at: "2026-04-22T13:00:57.393924+00:00"
capability_markets: "multi-market"
capability_activities: "portfolio-analytics"
sop_version: "crystal-compilation-v6.1"
---
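# Financial Ratios Toolkit (financial-ratios-toolkit)
> Provides multi-market financial analysis capabilities, covering historical data retrieval, financial statement parsing, financial ratio calculation, fixed income analysis, portfolio performance evaluation, and stock fundamental screening.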
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (13 total)
### Multi-Module Financial Analysis Overview (`UC-101`)
Demonstrating comprehensive financial analysis capabilities covering multiple domains including historical data, financial statements, ratios, models,
**Triggers**: financial analysis, overview, multi-module
### Fixed Income Analysis and Bond Valuation (`UC-103`)
Analyzing fixed income securities including bond statistics, duration calculations, derivative pricing models, and government/corporate bond yield com
**Triggers**: bond, fixed income, yield
### Financial Ratio Analysis (`UC-106`)
Evaluating company financial health through profitability ratios, solvency ratios, liquidity ratios, valuation ratios, and custom ratio calculations f
**Triggers**: ratio, profitability, solvency
For all **13** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-PORTFOLIO-ANALYTICS-001`**: Division by zero in price ratio calculations corrupts rebalancing
- **`AP-PORTFOLIO-ANALYTICS-002`**: Look-ahead bias from unshifted signal generation and position calculations
- **`AP-PORTFOLIO-ANALYTICS-003`**: Non-positive-semidefinite covariance matrix breaks CVXPY optimization
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-118. Evidence verify ratio = 33.3% and audit fail total = 62. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Full V6+ authority (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-118` blueprint at 2026-04-22T13:00:57.393924+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Fixed Income Analysis and Bond Valuation
  - Basic Historical Data and Financial Statements Retrieval
  - Multi-Module Financial Analysis Overview
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-066--wealthbot (2)
### `AP-PORTFOLIO-ANALYTICS-001` — Division by zero in price ratio calculations corrupts rebalancing <sub>(high)</sub>
When calculating price_diff using current_price divided by old_price without validating old_price is non-zero, the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals zero produces NaN/infinity that propagates to all subsequent trade decisions.
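
A guard of the following shape keeps an undefined ratio from reaching the rebalancing logic. This is a minimal sketch and not wealthbot's code; `price_ratio` is a hypothetical helper, and returning `None` for an undefined ratio is an assumed convention.

```python
import math


def price_ratio(current_price, old_price):
    """Return current_price / old_price, or None when the ratio is undefined."""
    if current_price is None or old_price is None:
        return None
    if not (math.isfinite(current_price) and math.isfinite(old_price)):
        return None
    if old_price == 0:
        return None  # avoid NaN/INF leaking into later trade decisions
    return current_price / old_price


print(price_ratio(150.0, 100.0))  # 1.5
print(price_ratio(150.0, 0.0))    # None
```

Callers then have to handle `None` explicitly, for example by skipping the security for that rebalancing cycle.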
### `AP-PORTFOLIO-ANALYTICS-004` — Incorrect portfolio value tracking destroys time-series integrity <sub>(high)</sub>
Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent ClientPortfolio via proper relationships to avoid orphaned records.
## finance-bp-068--xalpha (1)
### `AP-PORTFOLIO-ANALYTICS-006` — FIFO sell order violation corrupts cost basis and XIRR <sub>(high)</sub>
Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment, leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied, causing direct financial loss.
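
FIFO matching can be sketched with a queue of lots ordered oldest first. This is not xalpha's implementation; `fifo_sell` and the lot layout `[buy_date, shares, unit_cost]` are assumptions made for illustration.

```python
from collections import deque


def fifo_sell(lots, shares_to_sell):
    """Consume the oldest lots first and return the matched pieces.

    `lots` is a deque of [buy_date, shares, unit_cost] ordered oldest first
    and is modified in place. Returns a list of (buy_date, shares, unit_cost).
    """
    if shares_to_sell > sum(lot[1] for lot in lots):
        raise ValueError("cannot sell more shares than are held")
    matched = []
    remaining = shares_to_sell
    while remaining > 0:
        buy_date, shares, unit_cost = lots[0]
        take = min(shares, remaining)
        matched.append((buy_date, take, unit_cost))
        remaining -= take
        if take == shares:
            lots.popleft()            # lot fully consumed
        else:
            lots[0][1] = shares - take  # partial sale keeps the buy date
    return matched


holdings = deque([["2024-01-02", 100, 1.0], ["2024-03-01", 50, 1.2]])
sold = fifo_sell(holdings, 120)
```

Each matched piece carries its original buy date, which is what a holding-period-based redemption fee would need.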
## finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)
### `AP-PORTFOLIO-ANALYTICS-010` — Missing DataFrame schema validation causes KeyError propagation <sub>(medium)</sub>
Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError, or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue, comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing columns cause backtest calculations to fail with NaN values or KeyError.
## finance-bp-082--stock-screener (1)
### `AP-PORTFOLIO-ANALYTICS-007` — Score validation bypass allows invalid composite calculations <sub>(medium)</sub>
Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable screening results that violate the fundamental score contract. When combined with division-by-zero guards that return 0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations and incorrect Strong Buy/Buy/Watch/Pass ratings.
## finance-bp-093--PyPortfolioOpt (1)
### `AP-PORTFOLIO-ANALYTICS-008` — Convex optimization constraints violate DCP rules <sub>(high)</sub>
Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats (not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum or target_return above maximum achievable return make problems infeasible.
## finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)
### `AP-PORTFOLIO-ANALYTICS-003` — Non-positive-semidefinite covariance matrix breaks CVXPY optimization <sub>(high)</sub>
Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require explicit PSD validation before optimization.
## finance-bp-106--pyfolio-reloaded (2)
### `AP-PORTFOLIO-ANALYTICS-005` — Allocation denominator excludes cash, corrupting portfolio composition <sub>(medium)</sub>
When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to inappropriate risk management decisions.
### `AP-PORTFOLIO-ANALYTICS-009` — Transaction data corruption from missing columns and invalid dates <sub>(medium)</sub>
Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol) causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day trades to be incorrectly split across days.
## finance-bp-107--empyrical-reloaded (1)
### `AP-PORTFOLIO-ANALYTICS-011` — Wrong annualization factors distort cross-frequency metric comparison <sub>(high)</sub>
Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies) produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated capital. The Sharpe ratio formula must use correct annualization with sample standard deviation (ddof=1); otherwise it produces misleading risk-adjusted return estimates.
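
The sketch below shows an annualized Sharpe ratio with the sample standard deviation. It is not empyrical's implementation; the `PERIODS_PER_YEAR` table uses commonly assumed values for equity markets and should be confirmed against the library and market in use.

```python
import math
import statistics

# Assumed periods per year for equity markets; confirm for your market.
PERIODS_PER_YEAR = {"daily": 252, "weekly": 52, "monthly": 12, "quarterly": 4, "yearly": 1}


def annualized_sharpe(returns, frequency="daily", risk_free_per_period=0.0):
    """Annualized Sharpe ratio using the sample standard deviation (ddof=1).

    Requires at least two returns. Returns NaN when the returns have no variation.
    """
    excess = [value - risk_free_per_period for value in returns]
    spread = statistics.stdev(excess)  # sample standard deviation, ddof=1
    if spread == 0:
        return float("nan")
    per_period = statistics.mean(excess) / spread
    return per_period * math.sqrt(PERIODS_PER_YEAR[frequency])


monthly_returns = [0.01, 0.03]
print(annualized_sharpe(monthly_returns, "monthly"))
```

The same return series labelled with the wrong frequency changes the result by the ratio of the square roots of the two factors, which is why the frequency is passed explicitly here.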
## finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit (1)
### `AP-PORTFOLIO-ANALYTICS-012` — Misaligned time series in alpha/beta calculation produces invalid factor analysis <sub>(high)</sub>
Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels (pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values of approximately 1.0.
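
Aligning on shared dates before the regression can be sketched as follows. This is not the alpha_beta function of any library named above; `aligned_beta` is a hypothetical helper that takes two dicts keyed by date string.

```python
def aligned_beta(returns, benchmark_returns):
    """Beta of `returns` against `benchmark_returns` over their shared dates only."""
    common = sorted(returns.keys() & benchmark_returns.keys())
    if len(common) < 2:
        raise ValueError("need at least two overlapping dates")
    x = [benchmark_returns[day] for day in common]
    y = [returns[day] for day in common]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    if variance == 0:
        raise ValueError("benchmark returns have no variation")
    return covariance / variance


benchmark = {"2024-01-02": 0.01, "2024-01-03": 0.02, "2024-01-04": 0.03}
asset = {"2024-01-03": 0.04, "2024-01-04": 0.06, "2024-01-05": 0.10}
beta = aligned_beta(asset, benchmark)  # uses 2024-01-03 and 2024-01-04 only
```

Pairing the two series by position instead would have matched the asset's 2024-01-03 return with the benchmark's 2024-01-02 return.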
## finance-bp-108--finmarketpy (1)
### `AP-PORTFOLIO-ANALYTICS-013` — Forward-filling spot prices creates look-ahead bias in TRI construction <sub>(high)</sub>
Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns, invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index chain.
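
The multiplicative construction can be sketched with a running product. The function name `total_return_index` is hypothetical; the base value of 100 follows the text above.

```python
from itertools import accumulate
import operator


def total_return_index(returns, base=100.0):
    """Build a total return index by compounding simple period returns."""
    growth = accumulate((1.0 + value for value in returns), operator.mul)
    return [base] + [base * factor for factor in growth]


period_returns = [0.5, -0.5]
print(total_return_index(period_returns))  # [100.0, 150.0, 75.0]
```

Summing the same two returns would bring the index back to 100, which hides the 25% loss that compounding shows.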
## finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded (1)
### `AP-PORTFOLIO-ANALYTICS-002` — Look-ahead bias from unshifted signal generation and position calculations <sub>(high)</sub>
Generating trading signals from current-period technical indicators (RSI, moving averages) without lagging them by one bar (`shift(1)` on the signal, or equivalently aligning the signal with `shift(-1)` forward returns) creates look-ahead bias, causing live trading returns to fall far below backtested results. Similarly, when estimating intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated with end-of-day values, making results unrepresentative of actual trading.
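
The one-bar lag on the signal (the `shift(1)` convention in pandas) can be sketched without pandas. The `lag` helper and the sample series are hypothetical.

```python
def lag(values, periods=1):
    """Shift values later by `periods` bars, padding the start with None.

    Pure-Python analogue of pandas Series.shift(periods) for periods >= 0.
    """
    if periods < 0:
        raise ValueError("a negative lag would pull future values into the past")
    periods = min(periods, len(values))
    return [None] * periods + list(values[: len(values) - periods])


signals = [1, 0, 1, 1]                     # computed at the close of each bar
asset_returns = [0.01, -0.02, 0.03, 0.01]  # return earned over each bar

positions = lag(signals)                   # bar t is traded on the signal from bar t-1
strategy_returns = [
    0.0 if position is None else position * bar_return
    for position, bar_return in zip(positions, asset_returns)
]
print(strategy_returns)  # [0.0, -0.02, 0.0, 0.01]
```

Multiplying `signals` by `asset_returns` directly would credit the strategy with each bar's return using information that only exists at that bar's close.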
## finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt (1)
### `AP-PORTFOLIO-ANALYTICS-014` — Unsupported solver selection breaks advanced risk calculations <sub>(medium)</sub>
Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt both require careful solver selection.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-118--FinanceToolkit
**Scan date**: 2026-04-22
**Stats**: {'total_files': 12, 'total_classes': 49, 'total_functions': 0, 'total_stages': 12}
## Modules (12)
- [data_acquisition](components/data_acquisition.md): 3 classes
- [financial_statement_normalization](components/financial_statement_normalization.md): 3 classes
- [financial_ratio_calculation](components/financial_ratio_calculation.md): 6 classes
- [performance_analysis](components/performance_analysis.md): 4 classes
- [risk_analysis](components/risk_analysis.md): 4 classes
- [technical_analysis](components/technical_analysis.md): 4 classes
- [options_pricing_&_greeks](components/options_pricing_-_greeks.md): 4 classes
- [financial_modeling](components/financial_modeling.md): 6 classes
- [security_discovery](components/security_discovery.md): 3 classes
- [fixed_income_analysis](components/fixed_income_analysis.md): 4 classes
- [economic_data](components/economic_data.md): 4 classes
- [portfolio_management](components/portfolio_management.md): 4 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 177
fatal_constraints_count: 56
non_fatal_constraints_count: 352
use_cases_count: 13
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
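
A quick format check based on the examples used elsewhere in this document (`stock_sh_600000`, `stockus_nasdaq_AAPL`) might look like the sketch below. `parse_entity_id` is a hypothetical helper and not a zvt function.

```python
def parse_entity_id(entity_id):
    """Split an entity ID into (entity_type, exchange, code) or raise ValueError."""
    parts = entity_id.split("_", 2)  # maxsplit=2 keeps underscores inside the code
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return tuple(parts)


print(parse_entity_id("stock_sh_600000"))  # ('stock', 'sh', '600000')
```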
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
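
The three-way semantics are easy to break with a plain truthiness test, because `None` is falsy while NaN is truthy in Python. A minimal sketch of an explicit mapping follows; `action_for` is a hypothetical helper that handles plain Python values only.

```python
import math


def action_for(filter_value):
    """Map one filter_result value to 'buy', 'sell', or 'no_action'."""
    if filter_value is None:
        return "no_action"
    if isinstance(filter_value, float) and math.isnan(filter_value):
        return "no_action"
    if filter_value is True:
        return "buy"
    if filter_value is False:
        return "sell"
    raise ValueError(f"unexpected filter_result value: {filter_value!r}")


print([action_for(value) for value in (True, False, None, float("nan"))])
```

With `if filter_value:` instead, a NaN row would be treated as BUY and a `None` row as SELL.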
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **13**
## `KUC-101`
**Source**: `examples/Finance Toolkit - 0. README Examples.ipynb`
Demonstrating comprehensive financial analysis capabilities covering multiple domains including historical data, financial statements, ratios, models, options, performance, risk, technical analysis, and fixed income in a single workflow.
## `KUC-102`
**Source**: `examples/Finance Toolkit - 1. Getting Started.ipynb`
Retrieving foundational financial data including historical price data, balance sheets, income statements, and cash flow statements for fundamental analysis and financial modeling.
## `KUC-103`
**Source**: `examples/Finance Toolkit - 10. Fixed Income Module.ipynb`
Analyzing fixed income securities including bond statistics, duration calculations, derivative pricing models, and government/corporate bond yield comparisons across multiple countries.
## `KUC-104`
**Source**: `examples/Finance Toolkit - 11. Portfolio Module.ipynb`
Measuring portfolio performance metrics, transaction history, risk-adjusted returns, and benchmarking against market indices like S&P 500 to evaluate investment performance.
## `KUC-105`
**Source**: `examples/Finance Toolkit - 5. Discovery Module.ipynb`
Discovering investment opportunities through stock screening based on market cap, price, beta, volume, and dividend criteria, and identifying top gainers, losers, and most active stocks.
## `KUC-106`
**Source**: `examples/Finance Toolkit - 3. Ratios Module.ipynb`
Evaluating company financial health through profitability ratios, solvency ratios, liquidity ratios, valuation ratios, and custom ratio calculations for multiple companies over time.
## `KUC-107`
**Source**: `examples/Finance Toolkit - 4. Models Module.ipynb`
Applying financial models including Extended Dupont analysis, WACC calculation, Altman Z-score for bankruptcy prediction, and Piotroski F-Score for financial health assessment.
## `KUC-108`
**Source**: `examples/Finance Toolkit - 5. Options Module.ipynb`
Computing options pricing using Black-Scholes and binomial models, simulating stock price paths with Monte Carlo methods, and analyzing Greeks (delta, gamma, etc.) for options strategy evaluation.
## `KUC-109`
**Source**: `examples/Finance Toolkit - 6. Technicals Module.ipynb`
Calculating technical indicators including Bollinger Bands, RSI, ADX, and other chart patterns to identify trends, momentum, overbought/oversold conditions, and trading signals.
## `KUC-110`
**Source**: `examples/Finance Toolkit - 7. Risk Module.ipynb`
Quantifying investment risk through Value at Risk (VaR), Conditional VaR (CVaR), maximum drawdown, and return distribution analysis to measure downside risk and tail losses.
## `KUC-111`
**Source**: `examples/Finance Toolkit - 8. Performance Module.ipynb`
Evaluating investment performance using CAPM, Fama-French multi-factor models, Sharpe ratio, and Jensen's alpha to understand risk-adjusted returns and factor-based performance attribution.
## `KUC-112`
**Source**: `examples/Finance Toolkit - 9. Economics Module.ipynb`
Tracking macroeconomic conditions through consumer confidence indices, short-term and long-term interest rates across multiple countries to inform investment decisions.
## `KUC-113`
**Source**: `examples/Finance Toolkit - Using External Datasets.ipynb`
Importing proprietary or third-party financial data from CSV files, normalizing formats, and combining multiple datasets for unified analysis within the toolkit.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-PORTFOLIO-ANALYTICS-001` — Defensive zero-division guards with explicit handling
**From**: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt · **Applicable to**: portfolio-analytics
Always guard division operations with explicit zero-value checks before executing. In price ratio calculations, filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream calculations and crashes pipelines.
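A minimal sketch of both guards described above, assuming scores, weights, and prices arrive as plain Python numbers. `composite_score` and `price_changes` are hypothetical names used for illustration; they are not the functions of the projects cited.

```python
def composite_score(scores, weights):
    """Weighted average that returns 0.0 for empty input or zero total weight."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:  # also covers empty lists, where sum() is 0
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def price_changes(old_prices, new_prices):
    """Relative change per symbol, skipping symbols whose old price is zero."""
    return {
        symbol: (new_prices[symbol] - old) / old
        for symbol, old in old_prices.items()
        if old != 0 and symbol in new_prices
    }
```

Filtering before dividing keeps NaN and infinity out of the result instead of cleaning them up afterwards.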
## `CW-PORTFOLIO-ANALYTICS-002` — Covariance matrix positive-semidefiniteness verification
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics
Always verify that the covariance matrix is positive-semidefinite before passing it to CVXPY optimization. Apply eigenvalue clipping if the check fails, because non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib enforce this constraint to keep the optimizer from returning mathematically invalid solutions or crashing entirely.
## `CW-PORTFOLIO-ANALYTICS-003` — Geometric compounding for cumulative returns
**From**: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics
Compute cumulative returns using geometric compounding via cumprod(1 + returns) - 1, never arithmetic cumulation via cumsum. An arithmetic cumulative sum ignores compounding: a +50% period followed by a -50% period sums to 0% but compounds to -25%, so the summed figure diverges from actual portfolio performance, most visibly over volatile periods. This principle applies to total return index construction and any cumulative performance calculation.
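The difference is easy to see on two periods. This sketch uses only the standard library, with `itertools.accumulate` standing in for `cumprod`; `cumulative_returns` is a hypothetical helper name.

```python
from itertools import accumulate
from operator import mul


def cumulative_returns(returns):
    """Geometric cumulative return after each period: prod(1 + r) - 1."""
    growth = accumulate((1.0 + r for r in returns), mul)
    return [g - 1.0 for g in growth]


returns = [0.5, -0.5]
geometric = cumulative_returns(returns)  # the path ends at a 25% loss
arithmetic = sum(returns)                # 0.0, which hides that loss
```

With pandas the same calculation would typically be written as `(1 + returns).cumprod() - 1`.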
## `CW-PORTFOLIO-ANALYTICS-004` — Temporal shift enforcement to prevent look-ahead bias
**From**: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded · **Applicable to**: portfolio-analytics
Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from backtested results.
## `CW-PORTFOLIO-ANALYTICS-005` — DCP-compliant convex optimization construction
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics
Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility, return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing optimization entirely.
## `CW-PORTFOLIO-ANALYTICS-006` — Correct Sharpe ratio formula with risk-free rate subtraction
**From**: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit · **Applicable to**: portfolio-analytics
Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.
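As a sketch of the formula above, assuming per-period returns in a plain list and a per-period risk-free rate that is constant over the sample. `sharpe_ratio` and its parameter names are illustrative, not the API of empyrical or FinanceToolkit.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free is a per-period rate, assumed constant over the sample.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns for a sample std")
    excess_mean = statistics.fmean(returns) - risk_free
    volatility = statistics.stdev(returns)  # sample std, i.e. ddof=1
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")
    return excess_mean / volatility * math.sqrt(periods_per_year)
```

Because the risk-free rate is constant here, the standard deviation of excess returns equals that of raw returns; with a time-varying rate, subtract it per period before taking the standard deviation.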
## `CW-PORTFOLIO-ANALYTICS-007` — Immutable FIFO position tracking with chronological ordering
**From**: finance-bp-068--xalpha, finance-bp-066--wealthbot · **Applicable to**: portfolio-analytics
Maintain FIFO position tracking with strictly increasing date order for position entries. Use the copy() function to create an independent copy of remtable before mutating it, to avoid side effects. Enforce chronological ordering in sell operations to ensure correct cost basis and holding period calculation, which is particularly important for funds with fees tiered by holding period.
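A minimal sketch of the idea, assuming each lot is a `[buy_date, shares, unit_cost]` list and share counts are integers. `fifo_sell` is a hypothetical function and is not xalpha's remtable logic.

```python
from copy import deepcopy
from datetime import date


def fifo_sell(lots, shares_to_sell):
    """Sell from the oldest lots first; return (remaining_lots, cost_basis)."""
    dates = [lot[0] for lot in lots]
    if any(a >= b for a, b in zip(dates, dates[1:])):
        raise ValueError("lots must be in strictly increasing date order")
    remaining = deepcopy(lots)  # never mutate the caller's table
    cost_basis = 0.0
    to_sell = shares_to_sell
    while to_sell > 0:
        if not remaining:
            raise ValueError("cannot sell more shares than are held")
        _, shares, unit_cost = remaining[0]
        used = min(shares, to_sell)
        cost_basis += used * unit_cost
        to_sell -= used
        if used == shares:
            remaining.pop(0)  # oldest lot fully consumed
        else:
            remaining[0][1] = shares - used
    return remaining, cost_basis


lots = [[date(2024, 1, 1), 100, 1.0], [date(2024, 2, 1), 100, 2.0]]
left, basis = fifo_sell(lots, 150)
```

The caller's `lots` list is unchanged after the call, so the same table can be replayed against a different sell sequence.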
## `CW-PORTFOLIO-ANALYTICS-008` — Validation at system boundaries with descriptive errors
**From**: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics
Enforce validation at system boundaries with descriptive error messages. Validate that expected returns match the covariance matrix dimensions, that score values are within [0, 100] and confidence values within [0, 1], and that required DataFrame columns are present. Invalid inputs should raise ValueError with a descriptive message that lists the valid options, to prevent silent failures or corrupted calculations.
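One possible shape for such a boundary check, shown on the score and confidence ranges named above. `validate_signal`, the `side` argument, and the `VALID_SIDES` option set are assumptions made for this illustration.

```python
VALID_SIDES = ("buy", "sell", "hold")  # hypothetical option set


def validate_signal(score, confidence, side):
    """Raise ValueError with a descriptive message on any out-of-range input."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within [0, 100], got {score}")
    if not 0 <= confidence <= 1:
        raise ValueError(f"confidence must be within [0, 1], got {confidence}")
    if side not in VALID_SIDES:
        raise ValueError(f"side must be one of {VALID_SIDES}, got {side!r}")
    return {"score": score, "confidence": confidence, "side": side}
```

The range checks also reject NaN, because every comparison against NaN is false.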
## `CW-PORTFOLIO-ANALYTICS-009` — Decimal rounding for monetary calculations
**From**: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics
Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material over many transactions.
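The effect can be shown with the standard `decimal` module. `round_money` below is a hypothetical stand-in for a helper like `myround`, whose actual signature is not given here; half-up rounding to two places is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_money(value, places=2):
    """Round to `places` decimals, half-up, via Decimal (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-places)  # Decimal("0.01") when places=2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


float_result = round(2.675, 2)         # binary float: 2.675 is stored slightly low
decimal_result = round_money("2.675")  # exact decimal input, rounded half-up

# Shares bought for 1000.00 at a unit price of 1.2345, rounded once, explicitly.
shares = round_money(Decimal("1000") / Decimal("1.2345"))
```

Passing amounts as strings or `Decimal` values keeps the binary float representation out of the calculation altogether.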
## `CW-PORTFOLIO-ANALYTICS-010` — Cash flow sign convention enforcement
**From**: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha · **Applicable to**: portfolio-analytics
Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs invert the NPV calculation, producing negative returns for profitable trades and vice versa. Verify that the sum of round-trip PnLs equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.
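A small worked example of the sign convention, assuming one cash flow per period with the first at time zero. The `npv` function is a generic textbook sketch, not the implementation of either project cited.

```python
def npv(rate, cashflows):
    """Net present value; cashflows[i] occurs at the end of period i."""
    return sum(cf / (1.0 + rate) ** i for i, cf in enumerate(cashflows))


correct = npv(0.05, [-100.0, 110.0])  # pay 100 now, receive 110 in one period
flipped = npv(0.05, [100.0, -110.0])  # same trade with the signs inverted
```

The profitable trade has a positive NPV under the stated convention and an equal negative NPV once the signs are flipped, which is the inversion described above.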
FILE:references/components/data_acquisition.md
# data_acquisition (3 classes)
## `Toolkit.get_historical_data`
`data_acquisition/toolkit-get-historical-data.py:0`
## `Toolkit.get_financial_statements`
`data_acquisition/toolkit-get-financial-statements.py:0`
## `data_source`
`data_acquisition/data-source.py:0`
FILE:references/components/economic_data.md
# economic_data (4 classes)
## `Economics.get_gross_domestic_product`
`economic_data/economics-get-gross-domestic-product.py:0`
## `Economics.get_real_gross_domestic_product`
`economic_data/economics-get-real-gross-domestic-produc.py:0`
## `Economics.get_inflation`
`economic_data/economics-get-inflation.py:0`
## `data_source`
`economic_data/data-source.py:0`
FILE:references/components/financial_modeling.md
# financial_modeling (6 classes)
## `Models.get_dupont_analysis`
`financial_modeling/models-get-dupont-analysis.py:0`
## `Models.get_weighted_average_cost_of_capital`
`financial_modeling/models-get-weighted-average-cost-of-capi.py:0`
## `method`
`financial_modeling/method.py:0`
## `Models.get_altman_z_score`
`financial_modeling/models-get-altman-z-score.py:0`
## `Models.get_piotroski_f_score`
`financial_modeling/models-get-piotroski-f-score.py:0`
## `discount_rate`
`financial_modeling/discount-rate.py:0`
FILE:references/components/financial_ratio_calculation.md
# financial_ratio_calculation (6 classes)
## `Ratios.collect_profitability_ratios`
`financial_ratio_calculation/ratios-collect-profitability-ratios.py:0`
## `Ratios.collect_liquidity_ratios`
`financial_ratio_calculation/ratios-collect-liquidity-ratios.py:0`
## `Ratios.collect_valuation_ratios`
`financial_ratio_calculation/ratios-collect-valuation-ratios.py:0`
## `Ratios.collect_efficiency_ratios`
`financial_ratio_calculation/ratios-collect-efficiency-ratios.py:0`
## `Ratios.collect_all_ratios`
`financial_ratio_calculation/ratios-collect-all-ratios.py:0`
## `custom_ratios`
`financial_ratio_calculation/custom-ratios.py:0`
FILE:references/components/financial_statement_normalization.md
# financial_statement_normalization (3 classes)
## `normalize_statements`
`financial_statement_normalization/normalize-statements.py:0`
## `convert_financial_statements`
`financial_statement_normalization/convert-financial-statements.py:0`
## `normalization_format`
`financial_statement_normalization/normalization-format.py:0`
FILE:references/components/fixed_income_analysis.md
# fixed_income_analysis (4 classes)
## `FixedIncome.get_bond_price`
`fixed_income_analysis/fixedincome-get-bond-price.py:0`
## `FixedIncome.get_ice_bofa_option_adjusted_spread`
`fixed_income_analysis/fixedincome-get-ice-bofa-option-adjusted.py:0`
## `FixedIncome.get_yield_to_maturity`
`fixed_income_analysis/fixedincome-get-yield-to-maturity.py:0`
## `rate_source`
`fixed_income_analysis/rate-source.py:0`
FILE:references/components/options_pricing_-_greeks.md
# options_pricing_&_greeks (4 classes)
## `Options.get_delta`
`options_pricing_&_greeks/options-get-delta.py:0`
## `Options.get_gamma`
`options_pricing_&_greeks/options-get-gamma.py:0`
## `Options.get_implied_volatility`
`options_pricing_&_greeks/options-get-implied-volatility.py:0`
## `pricing_model`
`options_pricing_&_greeks/pricing-model.py:0`
FILE:references/components/performance_analysis.md
# performance_analysis (4 classes)
## `Performance.get_sharpe_ratio`
`performance_analysis/performance-get-sharpe-ratio.py:0`
## `Performance.get_capital_asset_pricing_model`
`performance_analysis/performance-get-capital-asset-pricing-mo.py:0`
## `Performance.get_fama_french`
`performance_analysis/performance-get-fama-french.py:0`
## `risk_free_rate`
`performance_analysis/risk-free-rate.py:0`
FILE:references/components/portfolio_management.md
# portfolio_management (4 classes)
## `Portfolio.read_portfolio_dataset`
`portfolio_management/portfolio-read-portfolio-dataset.py:0`
## `Portfolio.get_portfolio_performance`
`portfolio_management/portfolio-get-portfolio-performance.py:0`
## `Portfolio.get_positions_overview`
`portfolio_management/portfolio-get-positions-overview.py:0`
## `benchmark`
`portfolio_management/benchmark.py:0`
FILE:references/components/risk_analysis.md
# risk_analysis (4 classes)
## `Risk.get_var_historic`
`risk_analysis/risk-get-var-historic.py:0`
## `Risk.get_var_gaussian`
`risk_analysis/risk-get-var-gaussian.py:0`
## `Risk.get_max_drawdown`
`risk_analysis/risk-get-max-drawdown.py:0`
## `var_method`
`risk_analysis/var-method.py:0`
FILE:references/components/security_discovery.md
# security_discovery (3 classes)
## `Discovery.search_instruments`
`security_discovery/discovery-search-instruments.py:0`
## `Discovery.screen_stocks`
`security_discovery/discovery-screen-stocks.py:0`
## `search_method`
`security_discovery/search-method.py:0`
FILE:references/components/technical_analysis.md
# technical_analysis (4 classes)
## `Technicals.get_moving_average`
`technical_analysis/technicals-get-moving-average.py:0`
## `Technicals.get_money_flow_index`
`technical_analysis/technicals-get-money-flow-index.py:0`
## `Technicals.get_bollinger_bands`
`technical_analysis/technicals-get-bollinger-bands.py:0`
## `period`
`technical_analysis/period.py:0`
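Date handling and pricing for financial instruments, built on the FinancePy framework: supports multi-country holiday calendars and day count conventions, generates bond and swap cash-flow schedules, and computes yields and prices.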
---
name: financepy-derivatives
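description: |-
  Date handling and pricing for financial instruments, built on the FinancePy framework: supports multi-country holiday calendars and day count conventions, generates bond and swap cash-flow schedules, and computes yields and prices.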
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-101"
  compiled_at: "2026-04-22T13:00:46.579380+00:00"
  capability_markets: "global"
  capability_activities: "derivatives-pricing"
  sop_version: "crystal-compilation-v6.1"
---
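# FinancePy Derivatives Pricing (financepy-derivatives)
> Date handling and pricing for financial instruments, built on the FinancePy framework: supports multi-country holiday calendars and day count conventions, generates bond and swap cash-flow schedules, and computes yields and prices.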
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (88 total)
### Holiday Calendar Usage (`UC-101`)
Determining business days and holidays for different countries to correctly schedule financial transactions and settlements
**Triggers**: calendar, holiday, business days
### Financial Date Creation and Manipulation (`UC-103`)
Creating and manipulating financial dates including adding days, months, tenors, and handling weekends for trade scheduling
**Triggers**: date creation, add days, add months
### Day Count Conventions Introduction (`UC-104`)
Calculating year fractions and day counts using various conventions (ACT/360, ACT/365, 30/360) for interest accrual calculations
**Triggers**: day count, year fraction, accrued interest
For all **88** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (15 total)
- **`AP-DERIVATIVES-PRICING-001`**: Instrument NPV called without attached pricing engine
- **`AP-DERIVATIVES-PRICING-002`**: BSM forward price ignores dividend yield
- **`AP-DERIVATIVES-PRICING-003`**: Negative discount factors passed to log-domain interpolation
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-101. Evidence verify ratio = 3.4% and audit fail total = 34. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative record (source of truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-101` blueprint at 2026-04-22T13:00:46.579380+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Financial Date Creation and Manipulation', 'Date Internal Testing', 'Holiday Calendar Usage', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## FinancePy (finance-bp-101) (3)
### `AP-DERIVATIVES-PRICING-003` — Negative discount factors passed to log-domain interpolation <sub>(high)</sub>
When Numba-jitted interpolation functions perform log transformation on discount factors, negative or zero values cause domain errors. This occurs because log(-x) and log(0) are mathematically undefined. The consequence is runtime crashes in jitted functions and complete failure of discount curve interpolation, blocking all downstream pricing calculations.
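A guard placed ahead of the log transform avoids the domain error. The sketch below is plain Python rather than Numba-jitted code, and `log_linear_df` is a hypothetical function, not FinancePy's interpolator; it assumes `times` is strictly increasing.

```python
import math


def log_linear_df(t, times, dfs):
    """Interpolate a discount factor at time t, linearly in log(df)."""
    if len(times) != len(dfs) or len(times) < 2:
        raise ValueError("need at least two (time, df) points of equal length")
    if any(df <= 0.0 for df in dfs):
        raise ValueError("discount factors must be strictly positive")
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the curve's time range")
    points = list(zip(times, dfs))
    for (t0, df0), (t1, df1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return math.exp((1.0 - w) * math.log(df0) + w * math.log(df1))
    raise ValueError("times must be sorted in increasing order")
```

Rejecting non-positive values up front gives a readable error at the curve boundary instead of a failure inside the interpolation routine.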
### `AP-DERIVATIVES-PRICING-004` — Non-monotonic time points in discount curve interpolation <sub>(high)</sub>
Interpolation over non-monotonically increasing time points produces undefined behavior at crossing times, causing discount factors to be incorrectly computed where time values overlap. This corrupts the entire term structure because the bootstrap algorithm cannot determine which discount factor corresponds to which maturity. The consequence is incorrect present value calculations across all downstream products priced against the curve.
### `AP-DERIVATIVES-PRICING-005` — Bootstrap calibration instruments not in maturity order <sub>(high)</sub>
When building yield curves from market instruments (deposits, FRAs, swaps), the instruments must be provided in strictly increasing maturity order. Out-of-order instruments cause the bootstrap algorithm to solve for discount factors at incorrect time points, corrupting the entire term structure. The consequence is wrong forward rates and discount factors that propagate into all priced instruments.
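Both ordering requirements above (time points and calibration instruments) reduce to the same check. This is a sketch under the assumption that each instrument can be represented as a `(maturity_in_years, quote)` tuple; the helper names are hypothetical and are not part of FinancePy.

```python
def require_strictly_increasing(values, label="maturities"):
    """Raise ValueError at the first value that does not exceed its predecessor."""
    for i, (prev, curr) in enumerate(zip(values, values[1:]), start=1):
        if curr <= prev:
            raise ValueError(
                f"{label}[{i}]={curr} is not greater than {label}[{i - 1}]={prev}"
            )


def sort_by_maturity(instruments):
    """Return instruments ordered by maturity, rejecting duplicate maturities."""
    ordered = sorted(instruments, key=lambda inst: inst[0])
    require_strictly_increasing([inst[0] for inst in ordered])
    return ordered
```

Sorting fixes out-of-order input, while the strictness check still catches two instruments that share a maturity, which sorting alone cannot resolve.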
## QuantLib-SWIG (finance-bp-123) (4)
### `AP-DERIVATIVES-PRICING-001` — Instrument NPV called without attached pricing engine <sub>(high)</sub>
Calling NPV() on a derivatives instrument without first calling setPricingEngine() returns uninitialized garbage values or throws null pointer exceptions. This occurs because the Instrument class relies on the attached PricingEngine to perform actual valuation logic. The consequence is silently incorrect pricing results that appear valid, potentially leading to bad trading decisions.
### `AP-DERIVATIVES-PRICING-006` — Option Exercise type mismatches VanillaOption constructor <sub>(high)</sub>
VanillaOption requires both a StrikedTypePayoff and a matching Exercise object. Using the wrong Exercise type (e.g., AmericanExercise for a European option) causes compilation failures in C++ or runtime errors in the SWIG bindings. The consequence is that the pricing system cannot initialize options, blocking all option pricing workflows.
### `AP-DERIVATIVES-PRICING-013` — Evaluation date not set before QuantLib term structure construction <sub>(medium)</sub>
QuantLib requires ql.Settings.instance().evaluationDate to be set before constructing yield term structures and instruments. Without an explicit evaluation date, the curve reference date becomes undefined, causing date calculations to fail or produce incorrect settlement dates. The consequence is wrong discount factors and NPV calculations across the entire portfolio.
### `AP-DERIVATIVES-PRICING-014` — Market quotes passed without QuoteHandle wrapper <sub>(medium)</sub>
QuantLib's observer pattern requires all market quotes to be wrapped in QuoteHandle before passing to rate helpers. Raw quote values bypass the observable notification mechanism, causing dependent instruments to never recalculate when market data updates. The consequence is stale pricing that doesn't reflect current market conditions.
## arch (finance-bp-124) (2)
### `AP-DERIVATIVES-PRICING-007` — NaN/inf values in ARCH model input data <sub>(high)</sub>
ARCH model estimation relies on recursive variance computations and scipy optimize. Non-finite input values (NaN, inf) cause optimizers to produce NaN results and recursive variance calculations to fail. The consequence is complete model estimation failure with meaningless outputs that appear valid, leading to incorrect volatility forecasts and risk misestimation.
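A pre-flight check on the input series is enough to surface the problem before estimation starts. `require_finite` is a hypothetical helper written with the standard library only; it does not call the arch package.

```python
import math


def require_finite(values, name="returns"):
    """Return the values as a list, or raise if any entry is NaN or infinite."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"{name} has non-finite values at positions {bad}")
    return list(values)
```

Reporting the offending positions makes it easier to decide whether to drop, fill, or re-fetch the affected observations.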
### `AP-DERIVATIVES-PRICING-008` — ARCH parameter array concatenation in wrong order <sub>(high)</sub>
ARCHModel composes from three components (mean, volatility, distribution) and requires parameter arrays concatenated in fixed order: [mean_params, volatility_params, distribution_params]. Incorrect ordering causes _parse_parameters to assign wrong values to wrong components, producing mathematically invalid models (e.g., volatility parameters interpreted as distribution parameters). The consequence is invalid conditional variance forecasts.
## py_vollib (finance-bp-127) (6)
### `AP-DERIVATIVES-PRICING-002` — BSM forward price ignores dividend yield <sub>(high)</sub>
When calculating option prices on dividend-paying stocks using BSM, the forward price must be adjusted as F = S * exp((r-q)*t). Omitting the dividend yield adjustment (using F = S * exp(r*t)) causes systematic mispricing for all dividend-paying assets. The consequence is consistently wrong option prices that diverge from market prices, leading to arbitrage opportunities and trading losses.
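The adjustment is a one-liner. The sketch below assumes continuously compounded `rate` and `dividend_yield` and time `t` in years; `bsm_forward` is an illustrative name, not a py_vollib function.

```python
import math


def bsm_forward(spot, rate, dividend_yield, t):
    """Forward price F = S * exp((r - q) * t) with continuous compounding."""
    return spot * math.exp((rate - dividend_yield) * t)


with_dividends = bsm_forward(100.0, 0.05, 0.02, 1.0)     # 100 * exp(0.03)
without_dividends = bsm_forward(100.0, 0.05, 0.0, 1.0)   # 100 * exp(0.05)
```

Dropping `q` raises the forward, which inflates call prices and deflates put prices for any dividend-paying underlying.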
### `AP-DERIVATIVES-PRICING-009` — Zero or negative time-to-expiration in option pricing <sub>(high)</sub>
Option pricing formulas (Black-Scholes, Black model) compute sqrt(t) in the denominator. Zero time causes division by zero; negative time produces NaN in d1/d2 calculations. The consequence is invalid option prices (NaN, inf) that break downstream Greeks calculations and hedging workflows.
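A guard in front of the d1/d2 calculation makes the failure explicit. This is a textbook Black-Scholes sketch with a hypothetical function name, assuming continuously compounded rates and no dividends.

```python
import math


def bs_d1_d2(spot, strike, rate, sigma, t):
    """Return (d1, d2) for Black-Scholes, rejecting inputs that break the formula."""
    if t <= 0:
        raise ValueError(f"time to expiration must be positive, got {t}")
    if sigma <= 0:
        raise ValueError(f"volatility must be positive, got {sigma}")
    if spot <= 0 or strike <= 0:
        raise ValueError("spot and strike must be positive")
    vol_sqrt_t = sigma * math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma**2) * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return d1, d2
```

At expiration the option is worth its intrinsic value, so a caller that needs `t == 0` can handle that case separately rather than passing it into the formula.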
### `AP-DERIVATIVES-PRICING-010` — Black model applies spot price instead of forward price <sub>(high)</sub>
The Black model is designed for options on futures/forwards and expects the futures or forward price F as input, not the spot price S. Using spot directly causes incorrect pricing because the Black formula models the forward price itself, which has zero drift under the risk-neutral measure; the cost of carry is already embedded in F. The consequence is systematically wrong prices for options on forwards and futures.
### `AP-DERIVATIVES-PRICING-011` — Missing discount factor in Black model pricing <sub>(medium)</sub>
Black model pricing must apply time value discounting with deflater = exp(-r*t) to undiscounted option prices. Omitting the discount factor produces forward option prices that exceed their fair value by the risk-free compounding amount. The consequence is violation of time value of money principles and prices that cannot be used for fair valuation or hedging.
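The following textbook Black-76 call sketch shows where the discount factor enters. It takes the forward price as its input, uses continuous compounding, and builds the normal CDF from `math.erf`; the function names are illustrative and are not py_vollib's API.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_call(forward, strike, rate, sigma, t):
    """Discounted Black-76 call price on a forward or futures price."""
    if t <= 0 or sigma <= 0:
        raise ValueError("t and sigma must be positive")
    vol_sqrt_t = sigma * math.sqrt(t)
    d1 = (math.log(forward / strike) + 0.5 * sigma**2 * t) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    undiscounted = forward * norm_cdf(d1) - strike * norm_cdf(d2)
    return math.exp(-rate * t) * undiscounted  # apply exp(-r*t) exactly once
```

Setting `rate=0.0` returns the undiscounted value, which makes it easy to confirm that the two prices differ by exactly `exp(-rate * t)`.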
### `AP-DERIVATIVES-PRICING-012` — Invalid flag parameter ('c'/'p') passed to py_vollib without validation <sub>(medium)</sub>
py_vollib binary_flag dict only contains keys 'c' and 'p'. Passing any other flag value causes KeyError exception. The library lacks input validation and crashes on invalid inputs. The consequence is unhandled exceptions in production systems when flag values come from external sources with unexpected formats.
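Validating the flag at the boundary replaces the bare KeyError with a descriptive error. `normalize_flag` and its alias table are assumptions made for this sketch; it runs before any library call and does not import py_vollib.

```python
VALID_FLAGS = ("c", "p")
FLAG_ALIASES = {"call": "c", "put": "p"}  # hypothetical convenience aliases


def normalize_flag(raw):
    """Map external input to 'c' or 'p', or raise ValueError."""
    flag = str(raw).strip().lower()
    flag = FLAG_ALIASES.get(flag, flag)
    if flag not in VALID_FLAGS:
        raise ValueError(f"flag must be one of {VALID_FLAGS}, got {raw!r}")
    return flag
```

Normalizing case and whitespace here means the pricing call downstream only ever sees one of the two keys it supports.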
### `AP-DERIVATIVES-PRICING-015` — Implied volatility computed without proper bounds validation <sub>(medium)</sub>
When computing implied volatility, option prices outside theoretical bounds (below intrinsic value or above maximum) must raise appropriate exceptions. Returning invalid IV values (negative volatility or extreme values) violates mathematical definitions and leads to incorrect pricing, risk calculations, and hedging ratios. The consequence is systemic pricing errors across all vol-dependent derivatives.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-101--FinancePy
**Scan date**: 2026-04-22
**Stats**: {'total_files': 8, 'total_classes': 45, 'total_functions': 0, 'total_stages': 8}
## Modules (8)
- [utilities](components/utilities.md): 4 classes
- [market_curves](components/market_curves.md): 6 classes
- [market_volatility](components/market_volatility.md): 5 classes
- [pricing_models](components/pricing_models.md): 6 classes
- [equity_&_fx_options](components/equity_-_fx_options.md): 6 classes
- [interest_rate_products](components/interest_rate_products.md): 6 classes
- [credit_products](components/credit_products.md): 6 classes
- [bond_products](components/bond_products.md): 6 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
  business_decisions_count: 129
  fatal_constraints_count: 89
  non_fatal_constraints_count: 216
  use_cases_count: 88
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
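The three-valued semantics of SL-06 can be made explicit in code. This is a minimal sketch for plain Python values only; `interpret_filter_result` is a hypothetical helper, and cells read from a DataFrame may first need converting from NumPy scalar types.

```python
import math


def interpret_filter_result(value) -> str:
    """Map one filter_result cell to an action per SL-06."""
    if value is None:
        return "NO_ACTION"
    if isinstance(value, float) and math.isnan(value):
        return "NO_ACTION"
    if value is True:
        return "BUY"
    if value is False:
        return "SELL"
    raise ValueError(f"unexpected filter_result value: {value!r}")
```

Checking for `None` and NaN before the boolean tests matters, because treating a missing value as falsy would turn "no action" into a sell.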
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **88**
## `KUC-101`
**Source**: `notebooks/finutils/FINCALENDAR_IntroductionToUsingCalendars.ipynb`
Determining business days and holidays for different countries to correctly schedule financial transactions and settlements.
## `KUC-102`
**Source**: `notebooks/finutils/FINDATES_TestingDateInternals.ipynb`
Testing internal date representation and Excel date serial number conversion for financial date calculations.
## `KUC-103`
**Source**: `notebooks/finutils/FINDATE_CreatingAndManipulatingFinDates.ipynb`
Creating and manipulating financial dates including adding days, months, tenors, and handling weekends for trade scheduling.
## `KUC-104`
**Source**: `notebooks/finutils/FINDAYCOUNT_Introduction.ipynb`
Calculating year fractions and day counts using various conventions (ACT/360, ACT/365, 30/360) for interest accrual calculations.
## `KUC-105`
**Source**: `notebooks/finutils/FINSCHEDULE_ExamplesOfScheduleGeneration.ipynb`
Generating payment schedules for bonds, swaps, and other fixed income instruments with proper date adjustments.
## `KUC-106`
**Source**: `notebooks/finutils/TENSIONSPLINE_Example.ipynb`
Using tension spline interpolation for smooth curve fitting with adjustable tension parameter.
## `KUC-107`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVEFLAT_ExaminationOfDiscountCurveFlat.ipynb`
Analyzing discount factors and zero rates using a flat discount curve with different compounding frequencies.
## `KUC-108`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVENSS_IntroductionToTheNelsonSiegelSvenssonCurve.ipynb`
Fitting yield curves using the Nelson-Siegel-Svensson parametric model for interest rate surface estimation.
## `KUC-109`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVENS_ExaminingTheNelsonSiegelCurve.ipynb`
Analyzing the Nelson-Siegel model factor loadings and curve fitting for yield curve construction.
## `KUC-110`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVEPOLY_SimpleAnalysis.ipynb`
Fitting discount curves using polynomial functions for yield curve construction and forward rate analysis.
## `KUC-111`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVEZERO_ConvertZeroCurveToDiscountCurve.ipynb`
Converting zero rate curves to discount factor curves for bond and derivatives pricing.
## `KUC-112`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVE_AnalysisOfInterpolationSchemes.ipynb`
Comparing different interpolation methods (linear, cubic spline) for discount curve construction.
## `KUC-113`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVE_Introduction.ipynb`
Introduction to discount curve construction and to calculating forward rates and swap rates from the curve.
## `KUC-114`
**Source**: `notebooks/market/curves/FINDISCOUNTCURVE_PieceWiseFlatOverNightForwardRateDiscountCurve.ipynb`
Building discount curves using piecewise flat overnight forward rates with bump analysis for risk management.
## `KUC-115`
**Source**: `notebooks/market/volatility/EquityVolSurfaceConstructionSVI.ipynb`
Constructing equity implied volatility surfaces using the SVI (Stochastic Volatility Inspired) parameterization.
## `KUC-116`
**Source**: `notebooks/market/volatility/FXVolSurfaceConstructionPartOne.ipynb`
Building FX implied volatility surfaces from market quotes using various volatility function types.
## `KUC-117`
**Source**: `notebooks/market/volatility/FXVolSurfaceConstructionPartTwo.ipynb`
Extended FX volatility surface construction with 10-delta and 25-delta quotes.
## `KUC-118`
**Source**: `notebooks/market/volatility/FXVolSurfaceConstructionPartThree.ipynb`
Advanced FX volatility surface construction with multiple tenors and full delta coverage.
## `KUC-119`
**Source**: `notebooks/market/volatility/SimpleBuildFXVolatilitySurface25Delta.ipynb`
Building a simple FX volatility surface using 25-delta risk reversals and strangles.
## `KUC-120`
**Source**: `notebooks/models/CHOLESKY CHECK.ipynb`
Validating Cholesky decomposition for generating correlated random variables in Monte Carlo simulations.
## `KUC-121`
**Source**: `notebooks/models/FINGBMPROCESS_generatePaths.ipynb`
Generating Geometric Brownian Motion paths for asset price simulation in Monte Carlo pricing.
## `KUC-122`
**Source**: `notebooks/models/FINITE_DIFFERENCE.ipynb`
Pricing options using finite difference methods (explicit, implicit, Crank-Nicolson) for Black-Scholes PDE.
## `KUC-123`
**Source**: `notebooks/models/FINITE_DIFFERENCE_PSOR.ipynb`
Using Projected Successive Over-Relaxation (PSOR) to solve finite difference equations for option pricing.
## `KUC-124`
**Source**: `notebooks/models/FINMODEL_GAUSSIANCOPULA_PortfolioLossDistributionBuilder.ipynb`
Building portfolio loss distributions using one-factor Gaussian copula model for credit risk analysis.
## `KUC-125`
**Source**: `notebooks/models/FINMODEL_SABRSHIFTED_InterestRates.ipynb`
Pricing interest rate swaptions using the shifted SABR model to capture volatility smile.
## `KUC-126`
**Source**: `notebooks/models/FINMODEL_SABRSHIFTED_VolatilitySmile.ipynb`
Analyzing volatility smiles using the shifted SABR model for low-rate environments.
## `KUC-127`
**Source**: `notebooks/models/FINMODEL_SABR_InterestRates.ipynb`
Implementing and analyzing the SABR stochastic volatility model for interest rate derivatives.
## `KUC-128`
**Source**: `notebooks/models/FINVOLFUNCTIONS_SSVI_MODEL.ipynb`
Analyzing the Surface SVI (SSVI) parameterization for volatility surface construction and arbitrage-free interpolation.
## `KUC-129`
**Source**: `notebooks/models/MERTON_CREDIT_MODEL.ipynb`
Structural credit risk modeling using Merton's firm value model to calculate default probability and credit spreads.
## `KUC-130`
**Source**: `notebooks/products/bonds/FINANNUITY_Valuation.ipynb`
Valuing bond annuity schedules and calculating clean/dirty prices using discount curves.
## `KUC-131`
**Source**: `notebooks/products/bonds/FINBONDCONVERTIBLE_ComparisonWithQLExample.ipynb`
Validating convertible bond pricing against QuantLib reference implementations.
## `KUC-132`
**Source**: `notebooks/products/bonds/FINBONDCONVERTIBLE_ValuationAndConvergenceTest.ipynb`
Testing convergence of convertible bond Monte Carlo valuation with varying step sizes.
## `KUC-133`
**Source**: `notebooks/products/bonds/FINBONDEMBEDDEDOPTION_Valuation.ipynb`
Valuing callable and putable bonds using interest rate tree models (Hull-White, Black-Karasinski).
## `KUC-134`
**Source**: `notebooks/products/bonds/FINBONDFRN_CitigroupExample.ipynb`
Pricing floating rate notes (FRNs) and calculating discount margin, duration, and convexity.
## `KUC-135`
**Source**: `notebooks/products/bonds/FINBONDFUTURES_ExampleContracts.ipynb`
Analyzing bond futures contracts and calculating cheapest-to-deliver and invoice prices.
## `KUC-136`
**Source**: `notebooks/products/bonds/FINBONDMARKET_DatabaseOfConventions.ipynb`
Accessing standard bond market conventions including day count, frequency, settlement days for different countries.
## `KUC-137`
**Source**: `notebooks/products/bonds/FINBONDMORTGAGE_SimpleCalculator.ipynb`
Calculating mortgage repayment schedules including interest-only and repayment modes.
## `KUC-138`
**Source**: `notebooks/products/bonds/FINBONDOPTION_All_Models_Valuation_Analysis.ipynb`
Valuing bond options (European and American) using various short rate models.
## `KUC-139`
**Source**: `notebooks/products/bonds/FINBONDOPTION_BK_ModelValuationAnalysis.ipynb`
Pricing bond options using the Black-Karasinski interest rate model.
## `KUC-140`
**Source**: `notebooks/products/bonds/FINBONDOPTION_HW_EXAMPLE_MATCH_DERIVA_GEN.ipynb`
Valuing bond options using Hull-White model validated against DerivaGem.
## `KUC-141`
**Source**: `notebooks/products/bonds/FINBONDOPTION_HW_Model_Jamshidian.ipynb`
Pricing European bond options using Hull-White model with Jamshidian decomposition.
## `KUC-142`
**Source**: `notebooks/products/bonds/FINBONDOPTION_Tree_Convergence_With_Volatility.ipynb`
Analyzing convergence of lattice tree methods for bond option pricing with varying volatility.
## `KUC-143`
**Source**: `notebooks/products/bonds/FINBONDOPTION_Tree_Convergence_Zero_Vol.ipynb`
Testing tree convergence for bond options in zero volatility (lognormal) limiting case.
## `KUC-144`
**Source**: `notebooks/products/bonds/FINBONDYIELDCURVES_FittingExample.ipynb`
Fitting yield curves to bond prices using polynomial regression.
## `KUC-145`
**Source**: `notebooks/products/bonds/FINBONDYIELDCURVE_FittingToAswAndZSpreads.ipynb`
Fitting bond yield curves to asset swap spreads and Z-spreads.
## `KUC-146`
**Source**: `notebooks/products/bonds/FINBONDYIELDCURVE_FittingToBondMarketPrices.ipynb`
Fitting yield curves directly to observable bond market prices.
## `KUC-147`
**Source**: `notebooks/products/bonds/FINBONDZEROCURVE_BootstrapOutstandingBonds.ipynb`
Bootstrapping zero coupon curves from outstanding bond prices.
## `KUC-148`
**Source**: `notebooks/products/bonds/FINBOND_CalculateOptionAdjustedSpread.ipynb`
Calculating option-adjusted spread (OAS) for callable bonds.
## `KUC-149`
**Source**: `notebooks/products/bonds/FINBOND_CalculatePriceUsingSurvivalCurve.ipynb`
Calculating bond prices using survival (credit) curves accounting for default risk.
## `KUC-150`
**Source**: `notebooks/products/bonds/FINBOND_CalculatingTheAssetSwapSpread.ipynb`
Calculating asset swap spreads for bonds relative to LIBOR.
## `KUC-151`
**Source**: `notebooks/products/bonds/FINBOND_ComparisonWithQLExample.ipynb`
Validating bond pricing implementation against QuantLib reference.
## `KUC-152`
**Source**: `notebooks/products/bonds/FINBOND_DiscountingBondCashflowsFinDiscountCurve.ipynb`
Calculating bond prices by discounting cash flows using a flat discount curve.
## `KUC-153`
**Source**: `notebooks/products/bonds/FINBOND_ExampleAppleCorp.ipynb`
Full analysis of Apple corporate bond including yield, duration, convexity, and accrued interest.
## `KUC-154`
**Source**: `notebooks/products/bonds/FINBOND_ExampleUSTreasury_CUSIP_91282CFX4.ipynb`
Valuing US Treasury bonds with proper conventions and calculating yields.
## `KUC-155`
**Source**: `notebooks/products/bonds/FINBOND_Key_Rate_Durations_Example.ipynb`
Calculating key rate durations for bond portfolio yield curve sensitivity analysis.
## `KUC-156`
**Source**: `notebooks/products/credit/FINCDSBASKET_ValuationModelComparison.ipynb`
Comparing different valuation models for CDS baskets and basket default swaps.
## `KUC-157`
**Source**: `notebooks/products/credit/FINCDSCURVE_BuildingASurvivalCurve.ipynb`
Building credit survival curves from CDS term structures for credit derivative pricing.
## `KUC-158`
**Source**: `notebooks/products/credit/FINCDSINDEXOPTION_CompareValuationApproaches.ipynb`
Comparing different approaches for valuing CDS index options.
## `KUC-159`
**Source**: `notebooks/products/credit/FINCDSINDEX_ValuingCDSIndex.ipynb`
Valuing credit default swap indices (CDX, iTraxx) and calculating par spreads.
## `KUC-160`
**Source**: `notebooks/products/credit/FINCDSOPTION_ValuingCDSOption.ipynb`
Valuing options on credit default swaps including sensitivity analysis.
## `KUC-161`
**Source**: `notebooks/products/credit/FINCDSTRANCHE_CalculatingFairSpread.ipynb`
Calculating fair spreads for CDS index tranches with different attachment/detachment points.
## `KUC-162`
**Source**: `notebooks/products/credit/FINCDS_ComparisonWithMarkitCDSModel.ipynb`
Validating CDS valuation against Markit CDS model reference implementation.
## `KUC-163`
**Source**: `notebooks/products/credit/FINCDS_CreatingAndValuingACDS.ipynb`
Creating and valuing credit default swaps including par spread and PV calculations.
## `KUC-164`
**Source**: `notebooks/products/credit/FINCDS_CreatingAndValuingACDSFlatCurves.ipynb`
CDS valuation using simplified flat discount and survival curves.
## `KUC-165`
**Source**: `notebooks/products/credit/FINCDS_ForwardAndBackward.ipynb`
Understanding CDS cash flow generation using forward vs backward date generation rules.
## `KUC-166`
**Source**: `notebooks/products/equity/EQUITY_AMERICANOPTION_BARONE_ADESI_WHALEY_APPROX.ipynb`
Pricing American options using Barone-Adesi Whaley approximation method.
## `KUC-167`
**Source**: `notebooks/products/equity/EQUITY_AMERICANOPTION_BJERKSUND_STENSLAND_APPROX.ipynb`
Pricing American options using Bjerksund-Stensland approximation for call-put parity.
## `KUC-168`
**Source**: `notebooks/products/equity/EQUITY_AMERICANOPTION_ComparisonWithQLExample.ipynb`
Validating American option pricing against QuantLib reference implementation.
## `KUC-169`
**Source**: `notebooks/products/equity/EQUITY_ASIAN_OPTIONS.ipynb`
Pricing Asian (average rate) options using geometric and arithmetic averaging methods.
## `KUC-170`
**Source**: `notebooks/products/equity/EQUITY_BARRIER_OPTIONS.ipynb`
Pricing barrier options (up-and-out, down-and-in, etc.) with Greeks calculation.
## `KUC-171`
**Source**: `notebooks/products/equity/EQUITY_BASKET_OPTIONS.ipynb`
Pricing basket options on multiple underlying assets using moment matching.
## `KUC-172`
**Source**: `notebooks/products/equity/EQUITY_CHOOSER_OPTION.ipynb`
Pricing chooser options that allow selection of call or put at a future date.
## `KUC-173`
**Source**: `notebooks/products/equity/EQUITY_CLIQUET_OPTION.ipynb`
Pricing cliquet (reset) options with periodic coupon-like payoffs based on performance.
## `KUC-174`
**Source**: `notebooks/products/equity/EQUITY_COMPOUND_OPTION_CompareWithML.ipynb`
Pricing compound options (option on option) and comparing with machine learning approaches.
## `KUC-175`
**Source**: `notebooks/products/equity/EQUITY_DIGITALOPTION_BasicValuation.ipynb`
Pricing digital options with asset-or-nothing payoff and calculating Greeks.
## `KUC-176`
**Source**: `notebooks/products/equity/EQUITY_DIGITAL_CASH_OR_NOTHING_OPTION.ipynb`
Pricing cash-or-nothing digital options with fixed payoff upon condition.
## `KUC-177`
**Source**: `notebooks/products/equity/EQUITY_FIXED_LOOKBACK_OPTION.ipynb`
Pricing fixed strike lookback options using Monte Carlo simulation.
## `KUC-178`
**Source**: `notebooks/products/equity/EQUITY_FLOAT_LOOKBACK_OPTION.ipynb`
Pricing floating strike lookback options where strike is determined by extreme price.
## `KUC-179`
**Source**: `notebooks/products/equity/EQUITY_ONE_TOUCH_OPTION.ipynb`
Pricing one-touch (digital) options that pay upon touching a barrier level.
## `KUC-180`
**Source**: `notebooks/products/equity/EQUITY_RAINBOW_OPTION.ipynb`
Pricing rainbow options on multiple assets with various payoff structures.
## `KUC-181`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_AMERICAN_STYLE_OPTION.ipynb`
Pricing American vanilla options using LSMC and finite difference methods.
## `KUC-182`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_MONTE_CARLO_SOBOL.ipynb`
European option pricing using Sobol quasi-random sequences for Monte Carlo.
## `KUC-183`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_MONTE_CARLO_TIMINGS.ipynb`
Performance benchmarking of Monte Carlo implementations with different libraries.
## `KUC-184`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_OPTION HIGH VOL LIMIT.ipynb`
Analyzing European option behavior in high volatility limiting cases.
## `KUC-185`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_OPTION.ipynb`
European option pricing with full Greeks calculation (delta, gamma, theta, vega, rho).
## `KUC-186`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_EUROPEAN_STYLE_OPTION_VECTORISATION.ipynb`
Vectorized European option pricing for multiple strikes, expiries, or option types simultaneously.
## `KUC-187`
**Source**: `notebooks/products/equity/EQUITY_VANILLA_OPTION_IntradayValuationAndGreeks.ipynb`
Intraday option pricing with hourly Greeks updates for trading desks.
## `KUC-188`
**Source**: `notebooks/products/equity/EQUITY_VARIANCESWAP_Basic_Example.ipynb`
Pricing variance swaps and calculating fair strike using realized volatility from option surface.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-DERIVATIVES-PRICING-001` — Strict input validation before financial calculations
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing
Both FinancePy and QuantLib-SWIG enforce strict validation of all input parameters before any financial computation. FinancePy validates day count types, date arguments, tolerance parameters, and max iterations. QuantLib-SWIG validates exercise types and swap direction enums. This pattern prevents corrupted calculations and provides clear error messages. Apply this pattern by validating all inputs at function entry points.
## `CW-DERIVATIVES-PRICING-002` — Bootstrap requires ordered instrument calibration
**From**: FinancePy, QuantLib-SWIG · **Applicable to**: derivatives-pricing
Both FinancePy and QuantLib-SWIG require calibration instruments to be provided in strict maturity order for curve bootstrapping. FinancePy enforces monotonically increasing time points and validates instrument sequencing (deposits before FRAs before swaps). QuantLib-SWIG uses bootstrap helpers (DepositRateHelper, FraRateHelper, SwapRateHelper) that assume ordered inputs. This ensures the bootstrap algorithm solves for discount factors at mathematically correct time points.
## `CW-DERIVATIVES-PRICING-003` — Handle pattern for lazy evaluation chains
**From**: QuantLib-SWIG · **Applicable to**: derivatives-pricing
QuantLib-SWIG requires wrapping market data (quotes, term structures) in Handle objects to enable lazy evaluation and automatic recalculation. QuoteHandle for market quotes and Handle for term structures enable the observer pattern. When market data updates, all dependent instruments automatically recalculate. This pattern is essential for live pricing systems where prices must reflect current market conditions.
## `CW-DERIVATIVES-PRICING-004` — Parameter composition requires fixed ordering and partitioning
**From**: arch · **Applicable to**: derivatives-pricing
arch enforces a strict parameter composition pattern where mean, volatility, and distribution parameters must be concatenated in fixed order with explicit offset partitioning. The offsets array partitions the unified parameter vector into components. This pattern prevents parameter assignment errors that would corrupt model components. Apply this when composing financial models from multiple sub-components.
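The offset idea can be sketched in a few lines. This is not arch's internal code; `partition` is a hypothetical helper that splits one flat parameter vector into components using cumulative offsets, under the assumption that the caller knows each component's parameter count.

```python
from itertools import accumulate
from typing import List, Sequence


def partition(params: Sequence[float], sizes: Sequence[int]) -> List[List[float]]:
    """Split a flat parameter vector into consecutive blocks of the given sizes."""
    # offsets[i] is where block i starts; the final entry is the total length.
    offsets = [0, *accumulate(sizes)]
    if offsets[-1] != len(params):
        raise ValueError(
            f"expected {offsets[-1]} parameters, got {len(params)}"
        )
    return [list(params[a:b]) for a, b in zip(offsets, offsets[1:])]
```

For example, a constant mean (1 parameter), a GARCH(1,1) volatility (3 parameters), and a Student's t distribution (1 parameter) would be split as `partition(vector, [1, 3, 1])`. The length check catches a vector that was assembled with a component missing or duplicated.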
## `CW-DERIVATIVES-PRICING-005` — Strict mathematical constraint enforcement
**From**: arch, py_vollib · **Applicable to**: derivatives-pricing
Both arch and py_vollib enforce strict mathematical constraints: arch enforces volatility model stationarity constraints (A.dot(params) - b >= 0) for SLSQP optimization; py_vollib validates implied volatility is positive and option prices within intrinsic/maximum bounds. Violating these constraints produces mathematically invalid results. Always enforce domain constraints on all financial model parameters.
## `CW-DERIVATIVES-PRICING-006` — Forward price adjustment for dividend yield in BSM
**From**: py_vollib · **Applicable to**: derivatives-pricing
py_vollib demonstrates the correct BSM implementation: compute forward price F = S * exp((r-q)*t) to adjust for continuous dividend yield before passing to the pricing engine. This pattern is essential for all options on dividend-paying assets. Forgetting the dividend adjustment causes systematic mispricing for the entire equity derivatives book.
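The forward adjustment can be illustrated without py_vollib. The sketch below is a standalone Black-Scholes-Merton pricer written on the forward, using only the standard library; the function names and the `"c"`/`"p"` flag convention are choices made for this example, not py_vollib's API.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def forward_price(spot: float, rate: float, dividend_yield: float, t: float) -> float:
    """F = S * exp((r - q) * t), the dividend-adjusted forward."""
    return spot * exp((rate - dividend_yield) * t)


def bsm_price(flag: str, spot: float, strike: float, t: float,
              rate: float, sigma: float, dividend_yield: float) -> float:
    """Price a European call ("c") or put ("p") from the forward."""
    forward = forward_price(spot, rate, dividend_yield, t)
    d1 = (log(forward / strike) + 0.5 * sigma ** 2 * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    cdf = NormalDist().cdf
    discount = exp(-rate * t)
    if flag == "c":
        return discount * (forward * cdf(d1) - strike * cdf(d2))
    if flag == "p":
        return discount * (strike * cdf(-d2) - forward * cdf(-d1))
    raise ValueError("flag must be 'c' or 'p'")
```

Because the dividend yield enters only through the forward, dropping it (passing `dividend_yield=0`) raises every call price and lowers every put price on a dividend-paying asset, which is the systematic mispricing described above.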
## `CW-DERIVATIVES-PRICING-007` — Monotonicity validation for interpolation arrays
**From**: FinancePy · **Applicable to**: derivatives-pricing
FinancePy enforces strictly monotonically increasing time arrays before interpolation operations. This prevents undefined behavior at crossing times and ensures each time point maps to exactly one discount factor. Apply this validation whenever implementing interpolation over financial time series (discount curves, volatility surfaces, forward rates).
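A sketch of this validation in front of a simple interpolator follows. It does not use FinancePy; both function names are hypothetical, and linear interpolation directly on discount factors is chosen only to keep the example short.

```python
from bisect import bisect_left
from typing import Sequence


def require_strictly_increasing(times: Sequence[float]) -> None:
    """Raise if any time point is not greater than the one before it."""
    for i, (earlier, later) in enumerate(zip(times, times[1:])):
        if not later > earlier:  # also rejects NaN
            raise ValueError(
                f"times[{i + 1}]={later} is not greater than times[{i}]={earlier}"
            )


def interpolate_df(times: Sequence[float], dfs: Sequence[float], t: float) -> float:
    """Linearly interpolate a discount factor at time t (illustrative scheme)."""
    require_strictly_increasing(times)
    if len(times) != len(dfs):
        raise ValueError("times and dfs must have the same length")
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the curve's time range")
    i = bisect_left(times, t)
    if times[i] == t:
        return dfs[i]
    t0, t1 = times[i - 1], times[i]
    weight = (t - t0) / (t1 - t0)
    return dfs[i - 1] + weight * (dfs[i] - dfs[i - 1])
```

The up-front check matters because `bisect_left` silently returns a meaningless index on unsorted input, and a repeated time point would make `t1 - t0` zero.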
## `CW-DERIVATIVES-PRICING-008` — Production vs reference implementation selection
**From**: py_vollib · **Applicable to**: derivatives-pricing
py_vollib explicitly distinguishes between ref_python (slow, educational) and production (fast, C-based lets_be_rational) implementations. Using the reference implementation in production causes 10-100x performance degradation. Always select the implementation tier that matches the use case: reference for testing and education, optimized for production trading systems.
FILE:references/components/bond_products.md
# bond_products (6 classes)
## `Bond.dirty_price_from_discount_curve`
`bond_products/bond-dirty-price-from-discount-curve.py:0`
## `Bond.yield_to_maturity`
`bond_products/bond-yield-to-maturity.py:0`
## `BondCallable.value`
`bond_products/bondcallable-value.py:0`
## `BondFRN.discount_margin`
`bond_products/bondfrn-discount-margin.py:0`
## `ytm_convention`
`bond_products/ytm-convention.py:0`
## `yield_basis`
`bond_products/yield-basis.py:0`
FILE:references/components/credit_products.md
# credit_products (6 classes)
## `CDS.value`
`credit_products/cds-value.py:0`
## `CDS.par_spread`
`credit_products/cds-par-spread.py:0`
## `CDSCurve.build`
`credit_products/cdscurve-build.py:0`
## `CDSTranche.value`
`credit_products/cdstranche-value.py:0`
## `pv01_method`
`credit_products/pv01-method.py:0`
## `prot_method`
`credit_products/prot-method.py:0`
FILE:references/components/equity_-_fx_options.md
# equity_&_fx_options (6 classes)
## `EquityVanillaOption.value`
`equity_&_fx_options/equityvanillaoption-value.py:0`
## `EquityVanillaOption.delta`
`equity_&_fx_options/equityvanillaoption-delta.py:0`
## `EquityAmericanOption.value`
`equity_&_fx_options/equityamericanoption-value.py:0`
## `FXVanillaOption.value`
`equity_&_fx_options/fxvanillaoption-value.py:0`
## `pricing_model`
`equity_&_fx_options/pricing-model.py:0`
## `mc_method`
`equity_&_fx_options/mc-method.py:0`
FILE:references/components/interest_rate_products.md
# interest_rate_products (6 classes)
## `IborSwap.value`
`interest_rate_products/iborswap-value.py:0`
## `IborSwap.set_fixed_rate_to_atm`
`interest_rate_products/iborswap-set-fixed-rate-to-atm.py:0`
## `IborSwaption.value`
`interest_rate_products/iborswaption-value.py:0`
## `IborCapFloor.value`
`interest_rate_products/iborcapfloor-value.py:0`
## `swaption_model`
`interest_rate_products/swaption-model.py:0`
## `swap_rate_interpolation`
`interest_rate_products/swap-rate-interpolation.py:0`
FILE:references/components/market_curves.md
# market_curves (6 classes)
## `DiscountCurve.df`
`market_curves/discountcurve-df.py:0`
## `DiscountCurve.fwd`
`market_curves/discountcurve-fwd.py:0`
## `IborSingleCurve.build`
`market_curves/iborsinglecurve-build.py:0`
## `IborDualCurve.build`
`market_curves/ibordualcurve-build.py:0`
## `bootstrap_method`
`market_curves/bootstrap-method.py:0`
## `interpolator_type`
`market_curves/interpolator-type.py:0`
FILE:references/components/market_volatility.md
# market_volatility (5 classes)
## `FXVolSurface.volatility`
`market_volatility/fxvolsurface-volatility.py:0`
## `FXVolSurfacePlus.calibrate`
`market_volatility/fxvolsurfaceplus-calibrate.py:0`
## `SwaptionVolSurface.value`
`market_volatility/swaptionvolsurface-value.py:0`
## `vol_function_type`
`market_volatility/vol-function-type.py:0`
## `atm_method`
`market_volatility/atm-method.py:0`
FILE:references/components/pricing_models.md
# pricing_models (6 classes)
## `Model.price`
`pricing_models/model-price.py:0`
## `BlackScholes.price`
`pricing_models/blackscholes-price.py:0`
## `SABR.black_vol`
`pricing_models/sabr-black-vol.py:0`
## `Heston.value_lewis`
`pricing_models/heston-value-lewis.py:0`
## `model_implementation`
`pricing_models/model-implementation.py:0`
## `process_type`
`pricing_models/process-type.py:0`
FILE:references/components/utilities.md
# utilities (4 classes)
## `Date.add_days`
`utilities/date-add-days.py:0`
## `Schedule.generate`
`utilities/schedule-generate.py:0`
## `interpolation_scheme`
`utilities/interpolation-scheme.py:0`
## `day_count_convention`
`utilities/day-count-convention.py:0`
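Train dynamic knowledge graph embedding models that learn temporal entity and relation representations, supporting link prediction and time prediction tasks.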
---
name: finance-kg-embedding
description: |-
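Train dynamic knowledge graph embedding models that learn temporal entity and relation representations, supporting link prediction and time prediction tasks.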
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-080"
compiled_at: "2026-04-22T13:00:31.071227+00:00"
capability_markets: "global"
capability_activities: "macro-data"
sop_version: "crystal-compilation-v6.1"
---
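# Financial Knowledge Graph Embedding (finance-kg-embedding)
> Train dynamic knowledge graph embedding models that learn temporal entity and relation representations, supporting link prediction and time prediction tasks.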
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (5 total)
### KGTransformer Model Training Pipeline (`UC-101`)
Training a knowledge graph-based transformer model for temporal/dynamic knowledge graph embedding tasks to learn entity and relation representations o
**Triggers**: training, knowledge graph, KGTransformer
### Dynamic Knowledge Graph Model Training (`UC-102`)
Training dynamic knowledge graph models to learn temporal entity and relation embeddings for link prediction and event time prediction tasks
**Triggers**: knowledge graph, dynamic graph, temporal modeling
### Early Stopping Training Utility (`UC-103`)
Preventing overfitting during model training by automatically stopping training when validation performance stops improving, with checkpoint management
**Triggers**: early stopping, overfitting prevention, model training
For all **5** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-080. Evidence verify ratio = 19.0% and audit fail total = 15. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
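| [references/seed.yaml](references/seed.yaml) | Complete V6+ authoritative data (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain and fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |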
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-080` blueprint at 2026-04-22T13:00:31.071227+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Early Stopping Training Utility
  - Dynamic Knowledge Graph Model Training
  - KGTransformer Model Training Pipeline
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-074--FinRobot (1)
### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>
When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10 requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits and missing User-Agent headers compound this by causing silent request failures.
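A rate-limiting decorator of the kind described might look like the sketch below. It is not FinRobot's implementation; `rate_limited` and its `clock` and `sleep` parameters are hypothetical names, with the clock and sleep functions made injectable so the behavior can be tested without waiting. The sketch is not thread-safe.

```python
import time
from functools import wraps
from typing import Any, Callable


def rate_limited(max_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> Callable:
    """Space out calls so at most max_per_second are started each second."""
    min_interval = 1.0 / max_per_second

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        last_call = None  # time the previous call was allowed to start

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            nonlocal last_call
            now = clock()
            if last_call is not None:
                wait = min_interval - (now - last_call)
                if wait > 0:
                    sleep(wait)
                    now = clock()
            last_call = now
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

A fetch function decorated with `@rate_limited(10)` would stay at or below the 10 requests per second limit mentioned above. The request itself would still need to send a `User-Agent` header, since the text notes that missing headers cause silent failures.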
## finance-bp-077--Open_Source_Economic_Model (2)
### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>
When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
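The convergence point rule quoted above is a one-line calculation. In this sketch `u` is assumed to be the last liquid point in years, which is how the `U+40` term is read here; the function name is hypothetical.

```python
def convergence_point(u: float) -> float:
    """Return max(U + 40, 60) years, the rule quoted in the text above."""
    return max(u + 40, 60)
```

With `u = 20` the floor of 60 years applies, while `u = 50` gives 90 years. Hard-coding either branch alone would be wrong for some currencies.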
### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>
When importing CSV portfolio files with special characters without using 'utf-8-sig' encoding to handle BOM markers, CSV files with UTF-8 BOM markers fail to parse correctly. This causes KeyError exceptions when reading row fields, preventing portfolio data from loading entirely. The BOM marker silently corrupts the first column name read by pandas.
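The failure can be reproduced with Python's standard `csv` module on an in-memory file. The column names and values below are made up for illustration.

```python
import csv
import io
from typing import Dict, List

# A small CSV payload that starts with a UTF-8 byte order mark (BOM).
raw = "\ufeffticker,weight\nAAPL,0.6\nMSFT,0.4\n".encode("utf-8")


def read_rows(data: bytes, encoding: str) -> List[Dict[str, str]]:
    """Decode the bytes with the given encoding and parse them as CSV rows."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


plain = read_rows(raw, "utf-8")      # BOM stays glued to the first header
clean = read_rows(raw, "utf-8-sig")  # BOM is stripped while decoding

print(list(plain[0].keys()))  # first key is "\ufeffticker", not "ticker"
print(clean[0]["ticker"])     # AAPL
```

With plain `utf-8`, `plain[0]["ticker"]` raises `KeyError` because the actual key carries the invisible BOM character. `utf-8-sig` also reads files without a BOM correctly, so it is a safe default for imported CSVs.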
## finance-bp-080--FinDKG (3)
### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>
When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes the model to see future events during training. If train_edges do not all occur before val_edges and test_edges in time, the resulting metrics are inflated and do not reflect real-world performance. This produces overfit models that fail catastrophically when deployed for actual temporal prediction tasks.
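A minimal sketch of a chronological split, assuming edges are `(head, relation, tail, timestamp)` tuples. The function name and the 80/10/10 fractions are assumptions for illustration, not FinDKG's actual loader.

```python
def temporal_split(edges, train_frac=0.8, val_frac=0.1):
    """Split edges chronologically into train, validation, and test sets."""
    ordered = sorted(edges, key=lambda e: e[3])  # sort by timestamp
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def is_temporally_ordered(train, val, test):
    """Check that no earlier split contains a later timestamp."""
    def last(split):
        return max(e[3] for e in split)

    def first(split):
        return min(e[3] for e in split)

    return last(train) <= first(val) and last(val) <= first(test)
```

Edges that share a timestamp can still straddle a boundary here; a stricter variant would cut at timestamp values instead of row counts.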
### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>
When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.
### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>
When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
## finance-bp-083--Economic-Dashboard (3)
### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>
When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
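The prior-bar comparison can be sketched without pandas. In the code below `prev` plays the role of `shift(1)`; the window lengths and function names are assumptions chosen to keep the example short.

```python
def sma(values, window):
    """Simple moving average; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def golden_crosses(closes, fast=2, slow=3):
    """Return bar indices where the fast SMA moves from not-above to above."""
    fast_sma, slow_sma = sma(closes, fast), sma(closes, slow)
    above = [
        None if f is None or s is None else f > s
        for f, s in zip(fast_sma, slow_sma)
    ]
    crosses = []
    for i in range(1, len(above)):
        prev, cur = above[i - 1], above[i]  # prev is the shift(1) value
        if prev is False and cur is True:
            crosses.append(i)
    return crosses
```

A cross confirmed at bar `i` only uses data up to and including bar `i`, so a backtest would act on it from bar `i + 1` onward.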
### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>
Calculating technical indicators from OHLCV data without first verifying the required columns (open, high, low, close, volume) raises ValueError for any ticker with a missing column and prevents indicator calculation for that ticker. This blocks downstream regime classification and pattern detection for all tickers with incomplete data.
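A minimal sketch of an up-front column check, so the failure names the ticker and the missing columns. The helper names are hypothetical and the case-insensitive match is an assumption.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV columns that are absent (case-insensitive)."""
    present = {str(c).lower() for c in columns}
    return [c for c in REQUIRED_OHLCV if c not in present]


def validate_ohlcv(ticker, columns):
    """Raise ValueError with a precise message if any column is missing."""
    missing = missing_ohlcv_columns(columns)
    if missing:
        raise ValueError(f"{ticker}: missing OHLCV columns {missing}")
```

With pandas, `columns` would typically be `df.columns`; tickers that fail the check can be skipped and reported rather than aborting the whole run.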
### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>
Storing derived features in DuckDB under a primary key schema that differs from the technical_features table prevents JOIN operations between the tables. This breaks the regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and preserve data integrity.
## finance-bp-105--open-climate-investing (5)
### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>
When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.
### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>
When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2, meaning 5.2%) by 100 before regression mis-scales the coefficients by a factor of 100. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
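The two preceding anti-patterns can be illustrated together with a one-factor regression written in plain Python. All numbers are made up for illustration, and `ols_beta` is a hypothetical helper, not code from open-climate-investing.

```python
def ols_beta(y, x):
    """Slope of a one-factor OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var


# Stock returns are decimals; the factor file is percentage-formatted.
stock_returns = [0.021, -0.009, 0.031, 0.001]
mkt_rf_pct = [1.0, -0.5, 1.5, 0.0]  # market excess return, in percent
rf_pct = [0.1, 0.1, 0.1, 0.1]       # risk-free rate, in percent

# Convert percent to decimal, then form excess returns (Return - Rf).
mkt_rf = [v / 100 for v in mkt_rf_pct]
rf = [v / 100 for v in rf_pct]
excess = [r - f for r, f in zip(stock_returns, rf)]

beta = ols_beta(excess, mkt_rf)               # correctly scaled
beta_unscaled = ols_beta(excess, mkt_rf_pct)  # forgot the division by 100
```

Here `beta` comes out at about 2.0, while `beta_unscaled` is about 0.02: the same data, off by a factor of 100.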
### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>
When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.
### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>
When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.
### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>
Using the fork start method to parallelize regression execution while psycopg2 database connections are open lets child processes inherit the parent's connection state. The children then hit 'connection already closed' errors or operate on corrupted connection state, resulting in failed database writes and incomplete factor regression calculations.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-080--FinDKG
**Scan date**: 2026-04-22
**Stats**: 9 files, 43 classes, 0 functions, 9 stages
## Modules (9)
- [data_loading_&_temporal_graph_construction](components/data_loading_-_temporal_graph_construction.md): 4 classes
- [data_collocation](components/data_collocation.md): 2 classes
- [dynamic_embedding_updater](components/dynamic_embedding_updater.md): 8 classes
- [graph_neural_network_convolution](components/graph_neural_network_convolution.md): 4 classes
- [static-dynamic_embedding_combination](components/static-dynamic_embedding_combination.md): 5 classes
- [temporal_link_prediction](components/temporal_link_prediction.md): 3 classes
- [inter-event_time_prediction_(tpp)](components/inter-event_time_prediction_-tpp.md): 5 classes
- [training_pipeline](components/training_pipeline.md): 6 classes
- [evaluation_&_metrics](components/evaluation_-_metrics.md): 6 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
  business_decisions_count: 143
  fatal_constraints_count: 70
  non_fatal_constraints_count: 164
  use_cases_count: 5
  semantic_locks_count: 12
  preconditions_count: 4
  evidence_quality_rules_count: 2
  traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
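Two of the locks above, `SL-05` and `SL-06`, are easy to check mechanically. The sketch below is a hypothetical stand-in for illustration only: `SignalSketch` and `action_for` are not ZVT classes or functions, and only the field names come from the lock text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSketch:
    """Hypothetical trading signal that enforces SL-05 on construction."""
    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        sizing = [self.position_pct, self.order_money, self.order_amount]
        given = sum(v is not None for v in sizing)
        if given != 1:
            raise ValueError(f"expected exactly one sizing field, got {given}")


def action_for(filter_result):
    """Map a filter_result cell to an action per SL-06."""
    # NaN is the only value that is not equal to itself.
    if filter_result is None or filter_result != filter_result:
        return "NO ACTION"
    return "BUY" if filter_result else "SELL"
```

Failing at construction time keeps an ambiguous signal from reaching the order-sizing step.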
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **5**
## `KUC-101`
**Source**: `train_DKG_run.py`
Training a knowledge graph-based transformer model for temporal/dynamic knowledge graph embedding tasks to learn entity and relation representations over time.
## `KUC-102`
**Source**: `DKG/train.py`
Training dynamic knowledge graph models to learn temporal entity and relation embeddings for link prediction and event time prediction tasks.
## `KUC-103`
**Source**: `DKG/utils/train_utils.py`
Preventing overfitting during model training by automatically stopping training when validation performance stops improving, with checkpoint management.
## `KUC-104`
**Source**: `DKG/eval.py`
Evaluating trained knowledge graph models on link prediction and time prediction tasks to measure model performance using various metrics.
## `KUC-105`
**Source**: `DKG/utils/eval_utils.py`
Computing standard ranking metrics (MRR, recall) and regression metrics (MAE, MSE, RMSE) for evaluating machine learning model performance.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.
## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.
## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.
## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.
## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data
When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.
## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.
## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.
## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.
FILE:references/components/data_collocation.md
# data_collocation (2 classes)
## `collate_fn`
`data_collocation/collate-fn.py:0`
## `batch_time_window`
`data_collocation/batch-time-window.py:0`
FILE:references/components/data_loading_-_temporal_graph_construction.md
# data_loading_&_temporal_graph_construction (4 classes)
## `load_temporal_knowledge_graph`
`data_loading_&_temporal_graph_construction/load-temporal-knowledge-graph.py:0`
## `load_data_table`
`data_loading_&_temporal_graph_construction/load-data-table.py:0`
## `get_edge_mask`
`data_loading_&_temporal_graph_construction/get-edge-mask.py:0`
## `data_format`
`data_loading_&_temporal_graph_construction/data-format.py:0`
FILE:references/components/dynamic_embedding_updater.md
# dynamic_embedding_updater (8 classes)
## `EmbeddingUpdater.forward`
`dynamic_embedding_updater/embeddingupdater-forward.py:0`
## `GraphStructuralRNNConv.forward`
`dynamic_embedding_updater/graphstructuralrnnconv-forward.py:0`
## `GraphTemporalRNNConv.forward`
`dynamic_embedding_updater/graphtemporalrnnconv-forward.py:0`
## `RelationRNN.update`
`dynamic_embedding_updater/relationrnn-update.py:0`
## `EventTimeHelper.compute_inter_event_times`
`dynamic_embedding_updater/eventtimehelper-compute-inter-event-time.py:0`
## `gnn_architecture`
`dynamic_embedding_updater/gnn-architecture.py:0`
## `rnn_cell_type`
`dynamic_embedding_updater/rnn-cell-type.py:0`
## `inter_event_time_mode`
`dynamic_embedding_updater/inter-event-time-mode.py:0`
FILE:references/components/evaluation_-_metrics.md
# evaluation_&_metrics (6 classes)
## `evaluate`
`evaluation_&_metrics/evaluate.py:0`
## `eval_link_prediction`
`evaluation_&_metrics/eval-link-prediction.py:0`
## `EdgeEvaluator.evaluate_edges`
`evaluation_&_metrics/edgeevaluator-evaluate-edges.py:0`
## `RankingMetric.compute`
`evaluation_&_metrics/rankingmetric-compute.py:0`
## `RegressionMetric.compute`
`evaluation_&_metrics/regressionmetric-compute.py:0`
## `evaluation_mode`
`evaluation_&_metrics/evaluation-mode.py:0`
FILE:references/components/graph_neural_network_convolution.md
# graph_neural_network_convolution (4 classes)
## `RGCN.forward`
`graph_neural_network_convolution/rgcn-forward.py:0`
## `KGTransformer.forward`
`graph_neural_network_convolution/kgtransformer-forward.py:0`
## `GraphTransformer.layer`
`graph_neural_network_convolution/graphtransformer-layer.py:0`
## `gnn_layer`
`graph_neural_network_convolution/gnn-layer.py:0`
FILE:references/components/inter-event_time_prediction_-tpp.md
# inter-event_time_prediction_(tpp) (5 classes)
## `InterEventTimeModel.forward`
`inter-event_time_prediction_(tpp)/intereventtimemodel-forward.py:0`
## `LogNormMixTPP.forward`
`inter-event_time_prediction_(tpp)/lognormmixtpp-forward.py:0`
## `LogNormalMixtureDistribution.sample/log_prob`
`inter-event_time_prediction_(tpp)/lognormalmixturedistribution-sample-log-.py:0`
## `tpp_distribution_family`
`inter-event_time_prediction_(tpp)/tpp-distribution-family.py:0`
## `inter_event_time_mode`
`inter-event_time_prediction_(tpp)/inter-event-time-mode.py:0`
FILE:references/components/static-dynamic_embedding_combination.md
# static-dynamic_embedding_combination (5 classes)
## `Combiner.forward`
`static-dynamic_embedding_combination/combiner-forward.py:0`
## `StaticDynamicCombiner.combine`
`static-dynamic_embedding_combination/staticdynamiccombiner-combine.py:0`
## `GraphReadout.readout`
`static-dynamic_embedding_combination/graphreadout-readout.py:0`
## `combination_mode`
`static-dynamic_embedding_combination/combination-mode.py:0`
## `graph_readout_operation`
`static-dynamic_embedding_combination/graph-readout-operation.py:0`
FILE:references/components/temporal_link_prediction.md
# temporal_link_prediction (3 classes)
## `EdgeModel.forward`
`temporal_link_prediction/edgemodel-forward.py:0`
## `StaticEdgeModel.forward`
`temporal_link_prediction/staticedgemodel-forward.py:0`
## `edge_model_type`
`temporal_link_prediction/edge-model-type.py:0`
FILE:references/components/training_pipeline.md
# training_pipeline (6 classes)
## `main`
`training_pipeline/main.py:0`
## `compute_loss`
`training_pipeline/compute-loss.py:0`
## `forward_graphs`
`training_pipeline/forward-graphs.py:0`
## `EarlyStopping.check`
`training_pipeline/earlystopping-check.py:0`
## `optimize_target`
`training_pipeline/optimize-target.py:0`
## `early_stop_criterion`
`training_pipeline/early-stop-criterion.py:0`
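Provides portfolio management built on Fava/Beancount, with support for tax-loss harvesting optimization, asset allocation analysis, and identification of equivalent-security groups, helping users work out an optimal selling strategy.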
---
name: fava-beancount-viewer
description: |-
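  Provides portfolio management built on Fava/Beancount, with support for tax-loss harvesting optimization, asset allocation analysis, and identification of equivalent-security groups, helping users work out an optimal selling strategy.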
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
  version: "v6.1"
  blueprint_id: "finance-bp-078"
  compiled_at: "2026-04-22T13:00:29.702985+00:00"
  capability_markets: "global"
  capability_activities: "accounting"
  sop_version: "crystal-compilation-v6.1"
---
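# Fava Ledger Viewer (fava-beancount-viewer)
> Provides portfolio management built on Fava/Beancount, with support for tax-loss harvesting optimization, asset allocation analysis, and identification of equivalent-security groups, helping users work out an optimal selling strategy.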
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (5 total)
### Portfolio Management CLI Entry Point (`UC-101`)
Provides a unified command-line interface for portfolio management operations including tax loss harvesting, asset allocation analysis, cash drag dete
**Triggers**: portfolio management, CLI, command line
### Tax-Optimized Selling Strategy (`UC-103`)
Determines optimal sell order for securities to minimize realized capital gains by analyzing cost basis and holding periods across multiple lots
**Triggers**: minimize gains, tax-efficient selling, capital gains optimization
### Tax Loss Harvesting Opportunity Detection (`UC-105`)
Identifies securities with unrealized losses that can be sold to harvest tax losses, typically looking back 30 days to find positions eligible for was
**Triggers**: tax loss harvesting, loss identification, wash sale
For all **5** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (15 total)
- **`AP-ACCOUNTING-001`**: Using floating-point arithmetic for monetary amounts
- **`AP-ACCOUNTING-002`**: Skipping initialization calls before VM/script execution
- **`AP-ACCOUNTING-003`**: Mixing different asset types in monetary operations
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-078. Evidence verify ratio = 21.6% and audit fail total = 14. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative record (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-078` blueprint at 2026-04-22T13:00:29.702985+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: Tax-Optimized Selling Strategy; Related Ticker Grouping Utility; Portfolio Management CLI Entry Point; A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney; End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader; Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout; Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## finance-bp-073--ledger (7)
### `AP-ACCOUNTING-002` — Skipping initialization calls before VM/script execution <sub>(high)</sub>
Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking financial operations entirely.
### `AP-ACCOUNTING-003` — Mixing different asset types in monetary operations <sub>(high)</sub>
Performing addition, subtraction, or take operations on amounts with different asset types produces invalid financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot be combined, leading to corrupted account balances and failed reconciliations.
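One way to make this mistake impossible is to attach the asset to every amount and refuse mixed arithmetic. The `Money` class below is a hypothetical Python illustration of that idea, not the ledger project's own implementation.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """An amount tagged with its asset; mixed-asset arithmetic is rejected."""
    asset: str
    amount: Decimal

    def _check(self, other: "Money") -> None:
        if self.asset != other.asset:
            raise ValueError(f"asset mismatch: {self.asset} vs {other.asset}")

    def __add__(self, other: "Money") -> "Money":
        self._check(other)
        return Money(self.asset, self.amount + other.amount)

    def __sub__(self, other: "Money") -> "Money":
        self._check(other)
        return Money(self.asset, self.amount - other.amount)
```

Combining `Money("USD", ...)` with `Money("EUR", ...)` then fails loudly instead of producing a meaningless balance; conversion has to be an explicit, separate step.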
### `AP-ACCOUNTING-004` — Missing insufficient funds validation <sub>(high)</sub>
Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues in global markets.
### `AP-ACCOUNTING-005` — Non-atomic transaction commit/rollback <sub>(high)</sub>
Processing database operations without atomic commit/rollback leaves partial state when failures occur. This corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable for global regulatory compliance.
### `AP-ACCOUNTING-006` — On-demand posting generation causing double-spending <sub>(high)</sub>
Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics and can result in significant financial losses.
### `AP-ACCOUNTING-007` — Log insertion after transaction commit breaking event sourcing <sub>(high)</sub>
Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for global financial compliance.
### `AP-ACCOUNTING-008` — Incomplete transaction log hash chaining <sub>(high)</sub>
Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
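A minimal sketch of hash chaining, assuming SHA-256 over the previous hash concatenated with a canonical JSON encoding of the entry. The function names and the encoding are assumptions for illustration; the ledger project's real log format may differ.

```python
import hashlib
import json


def log_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Build a list of log entries, each linked to its predecessor."""
    chain, prev = [], ""
    for payload in payloads:
        prev = log_hash(prev, payload)
        chain.append({"payload": payload, "hash": prev})
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = ""
    for entry in chain:
        if entry["hash"] != log_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers its predecessor, rewriting one historical entry forces every later hash to change, which is what makes tampering detectable.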
## finance-bp-073--ledger, finance-bp-129--beancount (1)
### `AP-ACCOUNTING-001` — Using floating-point arithmetic for monetary amounts <sub>(high)</sub>
Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential financial losses. This violates the fundamental requirement that monetary calculations must be exact.
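The precision loss is visible in a single addition. The sketch below contrasts `float` with Python's `decimal.Decimal`; the `to_cents` helper and its rounding mode are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

float_total = 0.1 + 0.2
exact_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)              # False
print(exact_total == Decimal("0.30"))  # True


def to_cents(amount: Decimal) -> Decimal:
    """Round to two decimal places using banker's rounding."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
```

Construct `Decimal` values from strings, not floats: `Decimal(0.1)` carries the binary approximation into the exact type.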
## finance-bp-078--fava_investor (4)
### `AP-ACCOUNTING-009` — Incorrect row data access patterns on query results <sub>(high)</sub>
Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError, because namedtuples support attribute and integer-index access but not string keys. This breaks all module queries, which expect attribute-style access, causing asset allocation, tax loss harvesting, and other critical financial computations to fail.
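The access patterns can be compared on a small namedtuple. The `Row` type and its field names below are illustrative, not the actual query result schema.

```python
from collections import namedtuple

Row = namedtuple("Row", ["account", "market_value"])
row = Row(account="Assets:Brokerage:VTI", market_value=1250)

by_attribute = row.market_value            # attribute access works
by_position = row[1]                       # positional access works
by_key = row._asdict()["market_value"]     # convert first if a dict is needed

try:
    row["market_value"]                    # string key on a tuple
except TypeError as exc:
    error = str(exc)
```

`_asdict()` is the supported route when code downstream really needs key-based lookup.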
### `AP-ACCOUNTING-010` — Missing bidirectional inference for fund relationship declarations <sub>(medium)</sub>
When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax savings for investors.
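A minimal sketch of the bidirectional inference, assuming relationships are declared as a mapping from a ticker to its listed partners. The function name and the tickers in the usage note are illustrative, not taken from fava_investor.

```python
def symmetric_partners(declared):
    """Expand one-way declarations {ticker: [partners]} into a two-way map."""
    partners = {}
    for ticker, others in declared.items():
        partners.setdefault(ticker, set())
        for other in others:
            partners[ticker].add(other)
            # Infer the reverse direction that was not declared.
            partners.setdefault(other, set()).add(ticker)
    return {t: sorted(p) for t, p in partners.items()}
```

For example, `symmetric_partners({"VTI": ["ITOT"]})` lists `VTI` as a partner of `ITOT` as well. This sketch only mirrors declared pairs; it does not compute a transitive closure.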
### `AP-ACCOUNTING-011` — Wash sale comparison within substantially identical groups <sub>(high)</sub>
Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings. This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax losses and offset capital gains.
### `AP-ACCOUNTING-012` — Missing substantially identical tickers in wash sale queries <sub>(high)</sub>
Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales of the original position.
## finance-bp-129--beancount (3)
### `AP-ACCOUNTING-013` — Using parsed entries with MISSING sentinel values for calculations <sub>(high)</sub>
Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures, compromising financial reporting accuracy.
### `AP-ACCOUNTING-014` — Underspecified interpolation with multiple missing values per currency <sub>(high)</sub>
Having more than one missing value per currency group creates an underdetermined system with no unique solution during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected accounts.
### `AP-ACCOUNTING-015` — Violating accounting identity in opening balance transactions <sub>(high)</sub>
Creating opening balance transactions where the total balance of summarized entries does not equal exactly zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be fundamentally incorrect with non-zero total assets and liabilities.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-078--fava_investor
**Scan date**: 2026-04-22
**Stats**: 8 files, 35 classes, 0 functions, 8 stages
## Modules (8)
- [api_abstraction_layer](components/api_abstraction_layer.md): 5 classes
- [ticker_relationship_analyzer](components/ticker_relationship_analyzer.md): 4 classes
- [asset_allocation_by_class](components/asset_allocation_by_class.md): 4 classes
- [asset_allocation_by_account](components/asset_allocation_by_account.md): 5 classes
- [cash_drag_detector](components/cash_drag_detector.md): 3 classes
- [tax_loss_harvester](components/tax_loss_harvester.md): 7 classes
- [gains_minimizer](components/gains_minimizer.md): 3 classes
- [metadata_summarizer](components/metadata_summarizer.md): 4 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 128
fatal_constraints_count: 54
non_fatal_constraints_count: 168
use_cases_count: 5
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
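A minimal sketch of next-bar execution, assuming a signal is decided on the close of bar `i` and filled at the open of bar `i + 1`. The helper `next_bar_fills` and the choice of the open price as the fill price are assumptions for illustration.

```python
def next_bar_fills(signals, opens):
    """signals[i] is decided once bar i has closed; it may only fill on bar i+1.
    Returns (fill_bar_index, signal, fill_price) tuples."""
    if len(signals) != len(opens):
        raise ValueError("signals and opens must have the same length")
    fills = []
    # The final bar has no successor, so its signal cannot be executed yet.
    for i, signal in enumerate(signals[:-1]):
        if signal is not None:
            fills.append((i + 1, signal, opens[i + 1]))
    return fills

fills = next_bar_fills([None, "buy", None, "sell"], [10.0, 11.0, 12.0, 13.0])
```

The "buy" decided on bar 1 fills at bar 2's open, and the "sell" on the last bar stays pending because no later bar exists.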
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
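A small validation sketch for this format. The helper name `parse_entity_id` is hypothetical; it splits on the first two underscores only, so a code that itself contains underscores stays intact.

```python
def parse_entity_id(entity_id):
    """Split 'entity_type_exchange_code' into its three parts."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    entity_type, exchange, code = parts
    return entity_type, exchange, code

parsed = parse_entity_id("stock_sh_600000")
```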
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
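The exactly-one rule can be enforced at construction time. The sketch below uses a hypothetical `SignalSizing` dataclass that carries only the three sizing fields named in the lock; it is a stand-in, not the framework's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SignalSizing:
    """Hypothetical container for the three mutually exclusive sizing fields."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        names = ("position_pct", "order_money", "order_amount")
        provided = [name for name in names if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {names} must be set, got {provided or 'none'}"
            )

sizing = SignalSizing(position_pct=0.5)
```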
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
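Because the column is tri-state, a plain truthiness test would misread `None` as a sell. A minimal sketch of a safe mapping follows; the helper name `action_for` and the string labels are illustrative assumptions.

```python
import math

def action_for(value):
    """Map a filter_result cell to an action label."""
    # None and NaN both mean "no action"; check them before any truthiness test.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "NO_ACTION"
    return "BUY" if bool(value) else "SELL"

actions = [action_for(v) for v in (True, False, None, float("nan"))]
```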
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
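For reference, a pure-Python sketch of MACD with the locked 12/26/9 parameters as defaults. The EMA convention used here (smoothing factor `2 / (span + 1)`, seeded with the first value) is an assumption; the framework's own implementation may seed or adjust differently.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value (assumed convention)."""
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

def macd(closes, fast=12, slow=26, signal=9):
    """Return (diff, dea, histogram) lists, each the same length as closes."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    diff = [f - s for f, s in zip(fast_ema, slow_ema)]
    dea = ema(diff, signal)
    histogram = [d - e for d, e in zip(diff, dea)]
    return diff, dea, histogram

rising = [float(i) for i in range(1, 61)]
diff, dea, histogram = macd(rising)
```

On a steadily rising series the fast EMA sits above the slow one, so the last `diff` value is positive; on a flat series all three outputs stay at zero up to floating-point noise.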
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run `python3 -m pip install zvt`, then run `python3 -m zvt.init_dirs` to initialize data directories.
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run the recorder first: `python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000` (replace with your target entity IDs).
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run `python3 -m zvt.init_dirs`.
- **`PC-04`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions with `chmod u+w ~/.zvt`, or set the `ZVT_HOME` environment variable to a writable location.
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **5**
## `KUC-101`
**Source**: `fava_investor/cli/investor.py`
Provides a unified command-line interface for portfolio management operations including tax loss harvesting, asset allocation analysis, cash drag detection, and tax gain minimization.
## `KUC-102`
**Source**: `fava_investor/util/test_relatetickers.py`
Identifies and groups equivalent or substitutable securities (e.g., VTI, VTSAX, VTSMX) based on metadata annotations to support tax lot management and wash sale detection.
## `KUC-103`
**Source**: `fava_investor/modules/minimizegains/test_minimizegains.py`
Determines optimal sell order for securities to minimize realized capital gains by analyzing cost basis and holding periods across multiple lots.
## `KUC-104`
**Source**: `fava_investor/modules/assetalloc_class/test_asset_allocation.py`
Calculates and reports portfolio allocation breakdown by asset type (stocks, bonds, cash, etc.) with percentage distributions from investment account holdings.
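The percentage breakdown can be sketched as a group-and-divide over holdings. The helper `allocation_percentages` and the `(asset_class, market_value)` input shape are assumptions for illustration, not the module's actual interface.

```python
from collections import defaultdict
from decimal import Decimal

def allocation_percentages(holdings):
    """holdings: iterable of (asset_class, market_value) pairs.
    Returns {asset_class: percentage rounded to two decimals}."""
    totals = defaultdict(lambda: Decimal("0"))
    for asset_class, value in holdings:
        totals[asset_class] += value
    grand_total = sum(totals.values(), Decimal("0"))
    if grand_total == 0:
        raise ValueError("portfolio has no value to allocate")
    return {
        asset_class: (value / grand_total * 100).quantize(Decimal("0.01"))
        for asset_class, value in totals.items()
    }

breakdown = allocation_percentages([
    ("stocks", Decimal("4000")),
    ("stocks", Decimal("2000")),
    ("bonds", Decimal("3000")),
    ("cash", Decimal("1000")),
])
```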
## `KUC-105`
**Source**: `fava_investor/modules/tlh/test_libtlh.py`
Identifies securities with unrealized losses that can be sold to harvest tax losses, typically looking back 30 days for recent purchases that would make the sale subject to the wash sale rule.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-ACCOUNTING-001` — Use exact-precision integer types for monetary representation
**From**: finance-bp-073--ledger, finance-bp-129--beancount · **Applicable to**: accounting
Both the Numscript ledger and the Beancount parser mandate exact numeric types, Decimal (beancount) or MonetaryInt based on big.Int (ledger), instead of floating-point. This pattern ensures no rounding errors accumulate in financial calculations, which is critical for audit compliance in global markets.
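A short illustration of the drift this pattern avoids, using only Python's standard `decimal` module. The variable names are illustrative.

```python
from decimal import Decimal

# Ten payments of 0.10 each, summed two ways.
float_total = sum([0.1] * 10)                               # binary floats drift
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))   # exact decimal sum

float_is_exact = float_total == 1.0
decimal_is_exact = decimal_total == Decimal("1.00")
```

Constructing `Decimal` from strings matters: `Decimal(0.1)` would inherit the float's binary error.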
## `CW-ACCOUNTING-002` — Mandatory initialization sequence before execution
**From**: finance-bp-073--ledger · **Applicable to**: accounting
The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must both be called before Execute(). Skipping either step causes panics. The broader lesson is that VM/script execution requires careful state setup, so always verify prerequisites before running financial logic.
## `CW-ACCOUNTING-003` — Dual idempotency key strategy
**From**: finance-bp-073--ledger · **Applicable to**: accounting
Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey lets an exact retry be recognized and deduplicated, while IdempotencyHash catches a reused key sent with different input parameters, which would otherwise incorrectly succeed. Single-key approaches leave gaps in financial transaction safety.
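A minimal in-memory sketch of the two-part check, assuming the hash is a SHA-256 over a canonical JSON form of the request payload. The class names `IdempotencyStore` and `IdempotencyConflict` are hypothetical and do not reflect the ledger's Go implementation.

```python
import hashlib
import json

class IdempotencyConflict(Exception):
    """The same key was reused with a different payload."""

class IdempotencyStore:
    def __init__(self):
        self._seen = {}  # key -> (payload_hash, result)

    @staticmethod
    def _hash(payload):
        # Canonical JSON so logically equal payloads hash identically.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, key, payload, handler):
        payload_hash = self._hash(payload)
        if key in self._seen:
            stored_hash, stored_result = self._seen[key]
            if stored_hash != payload_hash:
                raise IdempotencyConflict(f"key {key!r} reused with different input")
            return stored_result  # exact retry: replay the stored result
        result = handler(payload)
        self._seen[key] = (payload_hash, result)
        return result

calls = []
def transfer(payload):
    calls.append(payload)
    return {"status": "ok", "amount": payload["amount"]}

store = IdempotencyStore()
first = store.run("req-1", {"amount": 100, "to": "bob"}, transfer)
retry = store.run("req-1", {"to": "bob", "amount": 100}, transfer)
```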
## `CW-ACCOUNTING-004` — Log-before-commit event sourcing pattern
**From**: finance-bp-073--ledger · **Applicable to**: accounting
In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical for regulatory compliance in global accounting.
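The ordering can be sketched with Python's standard `sqlite3` module: the log row is written first, inside the same transaction as the balance change, so a failure rolls back both. The table layout, the `post` helper, and the use of integer minor units are assumptions for illustration, not the ledger's schema.

```python
import sqlite3

def open_ledger():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, account TEXT, delta INTEGER)")
    conn.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, "
        "amount INTEGER CHECK (amount >= 0))"
    )
    conn.execute("INSERT INTO balances VALUES ('alice', 100)")
    conn.commit()
    return conn

def post(conn, account, delta):
    # `with conn` commits on success and rolls back if any statement raises.
    with conn:
        # 1. Insert the log entry first.
        conn.execute("INSERT INTO logs (account, delta) VALUES (?, ?)", (account, delta))
        # 2. Apply the balance change; the commit happens only after both succeed.
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (delta, account),
        )

conn = open_ledger()
post(conn, "alice", -30)
```

An overdraft attempt such as `post(conn, "alice", -500)` violates the `CHECK` constraint, and the rollback removes its log row together with the failed update.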
## `CW-ACCOUNTING-005` — Read Committed isolation with FOR UPDATE locks
**From**: finance-bp-073--ledger · **Applicable to**: accounting
When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks. This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail due to insufficient funds), ensuring data integrity under concurrent load.
## `CW-ACCOUNTING-006` — Transitive closure for equivalence relationships
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting
When building commodity groups or substantially identical fund relationships, apply transitive closure to infer complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection and TLH calculations are complete and accurate across all declared relationships.
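Transitive closure over declared pairs is commonly computed with a union-find structure. The sketch below takes that approach as an assumption; the helper `equivalence_groups` is hypothetical and is not how fava_investor's `RelateTickers` is necessarily implemented.

```python
def equivalence_groups(declared_pairs):
    """Merge declared (a, b) equivalences into complete groups via union-find."""
    parent = {}

    def find(ticker):
        parent.setdefault(ticker, ticker)
        while parent[ticker] != ticker:
            parent[ticker] = parent[parent[ticker]]  # path halving
            ticker = parent[ticker]
        return ticker

    for a, b in declared_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for ticker in list(parent):
        groups.setdefault(find(ticker), set()).add(ticker)
    # Sorted output keeps the result deterministic.
    return sorted(sorted(group) for group in groups.values())

groups = equivalence_groups([
    ("VTI", "VTSAX"),
    ("VTSAX", "VTSMX"),
    ("VXUS", "VTIAX"),
])
```

VTI and VTSMX were never declared together, yet they land in one group because both are linked to VTSAX.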
## `CW-ACCOUNTING-007` — Canonical representative selection for relationship groups
**From**: finance-bp-078--fava_investor · **Applicable to**: accounting
When selecting a representative for a substantially identical fund group, always return the same representative ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where the same ticker gets different partners depending on which group member is queried.
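A deterministic representative can be chosen with any stable rule applied to the whole group. The sketch below assumes the lexicographically smallest ticker as that rule; both the rule and the helper `build_representatives` are illustrative choices, not the project's documented behavior.

```python
def build_representatives(groups):
    """Map every ticker to one canonical representative of its group."""
    representative = {}
    for group in groups:
        canonical = min(group)  # assumption: lexicographically smallest ticker
        for ticker in group:
            representative[ticker] = canonical
    return representative

reps = build_representatives([{"VTSMX", "VTI", "VTSAX"}, {"VXUS", "VTIAX"}])
```

Whichever member is queried, the answer is the same, so downstream partner lookups do not depend on the query order.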
## `CW-ACCOUNTING-008` — Immutable monetary objects with `__slots__`
**From**: finance-bp-129--beancount · **Applicable to**: accounting
Constructing Amount or Position objects from immutable Decimal values with the `__slots__ = ()` pattern prevents accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent throughout transaction processing and audit trails.
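A minimal sketch of the pattern using a `NamedTuple` base with an empty `__slots__` subclass. The `Money` class and its `plus` method are hypothetical stand-ins, not Beancount's `Amount`.

```python
from decimal import Decimal
from typing import NamedTuple

class _Money(NamedTuple):
    number: Decimal
    currency: str

class Money(_Money):
    # Empty __slots__ means no per-instance __dict__, so no attributes can be added.
    __slots__ = ()

    def plus(self, other):
        """Return a new Money; amounts in different currencies are rejected."""
        if self.currency != other.currency:
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.number + other.number, self.currency)

price = Money(Decimal("10.00"), "USD")
total = price.plus(Money(Decimal("2.50"), "USD"))
```

Arithmetic returns a new object instead of changing `price`, and attempts to assign to `price.number` or attach a new attribute raise `AttributeError`.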
## `CW-ACCOUNTING-009` — Eliminate all MISSING values before presenting parsed data as complete
**From**: finance-bp-129--beancount · **Applicable to**: accounting
Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance calculations or realized/unrealized gains computation.
## `CW-ACCOUNTING-010` — Strict schema compatibility across class hierarchies
**From**: finance-bp-078--fava_investor, finance-bp-129--beancount · **Applicable to**: accounting
When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared for the base class, breaking wash sale detection and TLH recommendations.
FILE:references/components/api_abstraction_layer.md
# api_abstraction_layer (5 classes)
## `FavaInvestorAPI.query_func`
`api_abstraction_layer/favainvestorapi-query-func.py:0`
## `AccAPI.build_price_map`
`api_abstraction_layer/accapi-build-price-map.py:0`
## `AccAPI.realize`
`api_abstraction_layer/accapi-realize.py:0`
## `AccAPI.get_operating_currencies`
`api_abstraction_layer/accapi-get-operating-currencies.py:0`
## `api_implementation`
`api_abstraction_layer/api-implementation.py:0`
FILE:references/components/asset_allocation_by_account.md
# asset_allocation_by_account (5 classes)
## `portfolio_accounts`
`asset_allocation_by_account/portfolio-accounts.py:0`
## `by_account_name`
`asset_allocation_by_account/by-account-name.py:0`
## `by_account_open_metadata`
`asset_allocation_by_account/by-account-open-metadata.py:0`
## `asset_allocation`
`asset_allocation_by_account/asset-allocation.py:0`
## `selection_strategy`
`asset_allocation_by_account/selection-strategy.py:0`
FILE:references/components/asset_allocation_by_class.md
# asset_allocation_by_class (4 classes)
## `treeify`
`asset_allocation_by_class/treeify.py:0`
## `bucketize`
`asset_allocation_by_class/bucketize.py:0`
## `AssetClassNode.serialise`
`asset_allocation_by_class/assetclassnode-serialise.py:0`
## `bucketize_strategy`
`asset_allocation_by_class/bucketize-strategy.py:0`
FILE:references/components/cash_drag_detector.md
# cash_drag_detector (3 classes)
## `find_loose_cash`
`cash_drag_detector/find-loose-cash.py:0`
## `find_cash_commodities`
`cash_drag_detector/find-cash-commodities.py:0`
## `cash_definition`
`cash_drag_detector/cash-definition.py:0`
FILE:references/components/gains_minimizer.md
# gains_minimizer (3 classes)
## `find_minimized_gains`
`gains_minimizer/find-minimized-gains.py:0`
## `find_tax_burden`
`gains_minimizer/find-tax-burden.py:0`
## `lot_selection_algorithm`
`gains_minimizer/lot-selection-algorithm.py:0`
FILE:references/components/metadata_summarizer.md
# metadata_summarizer (4 classes)
## `build_tables`
`metadata_summarizer/build-tables.py:0`
## `active_accounts_metadata`
`metadata_summarizer/active-accounts-metadata.py:0`
## `commodities_metadata`
`metadata_summarizer/commodities-metadata.py:0`
## `directive_type`
`metadata_summarizer/directive-type.py:0`
FILE:references/components/tax_loss_harvester.md
# tax_loss_harvester (7 classes)
## `find_harvestable_lots`
`tax_loss_harvester/find-harvestable-lots.py:0`
## `gain_term`
`tax_loss_harvester/gain-term.py:0`
## `query_recently_bought`
`tax_loss_harvester/query-recently-bought.py:0`
## `recently_sold_at_loss`
`tax_loss_harvester/recently-sold-at-loss.py:0`
## `harvestable_by_commodity`
`tax_loss_harvester/harvestable-by-commodity.py:0`
## `wash_window`
`tax_loss_harvester/wash-window.py:0`
## `loss_threshold`
`tax_loss_harvester/loss-threshold.py:0`
FILE:references/components/ticker_relationship_analyzer.md
# ticker_relationship_analyzer (4 classes)
## `RelateTickers.substidenticals`
`ticker_relationship_analyzer/relatetickers-substidenticals.py:0`
## `RelateTickers.representative`
`ticker_relationship_analyzer/relatetickers-representative.py:0`
## `RelateTickers.compute_tlh_groups`
`ticker_relationship_analyzer/relatetickers-compute-tlh-groups.py:0`
## `relationship_source`
`ticker_relationship_analyzer/relationship-source.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-078-v5.3
version: v6.1
blueprint_id: finance-bp-078
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:29.702985+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- accounting
upgraded_from: finance-bp-078-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:17.397484+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-078--fava_investor/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-078--fava_investor/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-ACCOUNTING-001
title: Using floating-point arithmetic for monetary amounts
description: Representing currency values with float64 or similar floating-point types causes precision loss during arithmetic
operations. Rounding errors accumulate over multiple transactions, leading to incorrect balance calculations and potential
financial losses. This violates the fundamental requirement that monetary calculations must be exact.
project_source: finance-bp-073--ledger, finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-002
title: Skipping initialization calls before VM/script execution
description: Executing Numscript VM without first calling ResolveResources() and ResolveBalances() causes panics with ErrResourcesNotInitialized
or ErrBalancesNotInitialized. This prevents any script execution and leaves transactions in an unrunnable state, blocking
financial operations entirely.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-003
title: Mixing different asset types in monetary operations
description: Performing addition, subtraction, or take operations on amounts with different asset types produces invalid
financial calculations. This violates the fundamental accounting principle that amounts in different currencies cannot
be combined, leading to corrupted account balances and failed reconciliations.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-004
title: Missing insufficient funds validation
description: Failing to detect when account balance cannot cover a requested withdrawal or transfer allows overdrafts beyond
permitted limits. This causes real monetary losses, account balance violations, and potential regulatory compliance issues
in global markets.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-005
title: Non-atomic transaction commit/rollback
description: Processing database operations without atomic commit/rollback leaves partial state when failures occur. This
corrupts account balances and volumes, violating double-entry bookkeeping integrity and making audit trails unreliable
for global regulatory compliance.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-006
title: On-demand posting generation causing double-spending
description: Computing postings on-demand rather than accumulating them during transaction execution fails to track already-spent
funds within the same transaction. This creates double-spending vulnerabilities that violate atomic transaction semantics
and can result in significant financial losses.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-007
title: Log insertion after transaction commit breaking event sourcing
description: Committing the transaction before inserting the audit log breaks the event sourcing pattern fundamental to
accounting integrity. This makes it impossible to rebuild state from logs and violates audit requirements necessary for
global financial compliance.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-008
title: Incomplete transaction log hash chaining
description: Computing log hashes without including the previous log hash breaks the immutable audit trail chain. This allows
undetected tampering with historical transaction records, compromising financial integrity and regulatory audit compliance.
project_source: finance-bp-073--ledger
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-009
title: Incorrect row data access patterns on query results
description: Using dictionary notation (row['column_name']) on namedtuple query results raises TypeError since namedtuples
only support attribute access. This breaks all module queries expecting attribute-style access, causing asset allocation,
tax loss harvesting, and other critical financial computations to fail.
project_source: finance-bp-078--fava_investor
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-010
title: Missing bidirectional inference for fund relationship declarations
description: When relationship A→B is declared but B→A is not inferred, the TLH partner list becomes incomplete. This leads
to suboptimal tax-loss harvesting decisions where only some funds show all valid swap options, reducing potential tax
savings for investors.
project_source: finance-bp-078--fava_investor
severity: medium
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-011
title: Wash sale comparison within substantially identical groups
description: Comparing a ticker to itself in its own substantially identical group falsely triggers wash sale warnings.
This incorrectly blocks valid tax-loss harvesting transactions, causing investors to miss opportunities to realize tax
losses and offset capital gains.
project_source: finance-bp-078--fava_investor
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-012
title: Missing substantially identical tickers in wash sale queries
description: Omitting substantially identical fund tickers from the wash sale comparison set allows purchases of similar
funds within the 30-day window. This triggers unintended wash sales that disallow tax loss claims on subsequent sales
of the original position.
project_source: finance-bp-078--fava_investor
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-013
title: Using parsed entries with MISSING sentinel values for calculations
description: Using parsed entries directly that contain MISSING sentinel values for balance or cost computations causes
runtime errors or silent zero-value calculations. This results in incorrect portfolio valuations and reconciliation failures,
compromising financial reporting accuracy.
project_source: finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-014
title: Underspecified interpolation with multiple missing values per currency
description: Having more than one missing value per currency group creates an underdetermined system with no unique solution
during interpolation. This causes InterpolationError and transaction failure, blocking balance calculations for affected
accounts.
project_source: finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
- id: AP-ACCOUNTING-015
title: Violating accounting identity in opening balance transactions
description: Creating opening balance transactions where the total balance of summarized entries does not equal exactly
zero violates the fundamental accounting identity (Assets = Liabilities + Equity). This causes the balance sheet to be
fundamentally incorrect with non-zero total assets and liabilities.
project_source: finance-bp-129--beancount
severity: high
applicable_to_tags:
markets:
- global
activities:
- accounting
_source_file: anti-patterns/accounting.yaml
cross_project_wisdom:
- wisdom_id: CW-ACCOUNTING-001
source_project: finance-bp-073--ledger, finance-bp-129--beancount
pattern_name: Use exact-precision integer types for monetary representation
description: Both the Numscript ledger and the Beancount parser mandate using Decimal (beancount) or MonetaryInt based on big.Int
(ledger) instead of floating-point. This pattern ensures no rounding errors accumulate in financial calculations, critical
for audit compliance in global markets.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-002
source_project: finance-bp-073--ledger
pattern_name: Mandatory initialization sequence before execution
description: 'The Numscript VM requires a strict initialization sequence: ResolveResources() then ResolveBalances() must
both be called before Execute(). Skipping any step causes panics. This teaches that VM/script execution requires careful
state setup—always verify prerequisites before running financial logic.'
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-003
source_project: finance-bp-073--ledger
pattern_name: Dual idempotency key strategy
description: 'Using both IdempotencyKey and IdempotencyHash together ensures robust duplicate detection: IdempotencyKey
prevents exact retries while IdempotencyHash catches retries with different input parameters that would otherwise incorrectly
succeed. Single-key approaches leave gaps in financial transaction safety.'
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-004
source_project: finance-bp-073--ledger
pattern_name: Log-before-commit event sourcing pattern
description: In the transaction processing pipeline, the log must be inserted before committing the transaction to maintain
event sourcing integrity. This ensures the audit trail can always reconstruct state and supports rollback scenarios, critical
for regulatory compliance in global accounting.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-005
source_project: finance-bp-073--ledger
pattern_name: Read Committed isolation with FOR UPDATE locks
description: When implementing balance operations, use Read Committed isolation level combined with FOR UPDATE row locks.
This prevents concurrent transactions from creating inconsistent balances (e.g., both succeeding when they should fail
due to insufficient funds), ensuring data integrity under concurrent load.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-006
source_project: finance-bp-078--fava_investor
pattern_name: Transitive closure for equivalence relationships
description: When building commodity groups or substantially identical fund relationships, apply transitive closure to infer
complete equivalence. If A equals B and B equals C, then A, B, and C form one group. This ensures wash sale detection
and TLH calculations are complete and accurate across all declared relationships.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-007
source_project: finance-bp-078--fava_investor
pattern_name: Canonical representative selection for relationship groups
description: When selecting a representative for a substantially identical fund group, always return the same representative
ticker for any member of that group. Inconsistent representative selection causes non-deterministic calculations where
the same ticker gets different partners depending on which group member is queried.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-008
source_project: finance-bp-129--beancount
pattern_name: Immutable monetary objects with __slots__
description: Constructing Amount or Position objects using immutable Decimal values with __slots__ = () pattern prevents
accidental mutation of monetary values after creation. This immutability ensures financial calculations remain consistent
throughout transaction processing and audit trails.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-009
source_project: finance-bp-129--beancount
pattern_name: Eliminate all MISSING values before presenting parsed data as complete
description: Parsed entries with MISSING sentinel values are incomplete and cannot be used for financial reporting. All
MISSING values must be resolved through booking and interpolation before claiming parsed entries are ready for balance
calculations or realized/unrealized gains computation.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
- wisdom_id: CW-ACCOUNTING-010
source_project: finance-bp-078--fava_investor, finance-bp-129--beancount
pattern_name: Strict schema compatibility across class hierarchies
description: When extending base classes with additional functionality (like ScaledNAV extending RelateTickers), maintain
compatibility with existing metadata schemas. Schema divergence causes extended classes to miss relationships declared
for the base class, breaking wash sale detection and TLH recommendations.
applicable_to_activity: accounting
_source_file: cross-project-wisdom/accounting.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: fava_investor/cli/investor.py
business_problem: Provides a unified command-line interface for portfolio management operations including tax loss harvesting,
asset allocation analysis, cash drag detection, and tax gain minimization.
intent_keywords:
- portfolio management
- CLI
- command line
- tax optimization
- investment analysis
stage: data_collection
data_domain: holding_data
type: live_trading
- kuc_id: KUC-102
source_file: fava_investor/util/test_relatetickers.py
business_problem: Identifies and groups equivalent or substitutable securities (e.g., VTI, VTSAX, VTSMX) based on metadata
annotations to support tax lot management and wash sale detection.
intent_keywords:
- equivalent tickers
- related securities
- commodity grouping
- ticker equivalence
- substitutable assets
stage: data_collection
data_domain: holding_data
type: data_pipeline
- kuc_id: KUC-103
source_file: fava_investor/modules/minimizegains/test_minimizegains.py
business_problem: Determines optimal sell order for securities to minimize realized capital gains by analyzing cost basis
and holding periods across multiple lots.
intent_keywords:
- minimize gains
- tax-efficient selling
- capital gains optimization
- lot selection
- cost basis optimization
stage: factor_computation
data_domain: holding_data
type: live_trading
- kuc_id: KUC-104
source_file: fava_investor/modules/assetalloc_class/test_asset_allocation.py
business_problem: Calculates and reports portfolio allocation breakdown by asset type (stocks, bonds, cash, etc.) with percentage
distributions from investment account holdings.
intent_keywords:
- asset allocation
- portfolio breakdown
- asset class distribution
- allocation report
- portfolio composition
stage: factor_computation
data_domain: holding_data
type: reporting
- kuc_id: KUC-105
source_file: fava_investor/modules/tlh/test_libtlh.py
business_problem: Identifies securities with unrealized losses that can be sold to harvest tax losses, typically looking
back 30 days for recent purchases that would make the sale subject to the wash sale rule.
intent_keywords:
- tax loss harvesting
- loss identification
- wash sale
- TLH opportunities
- tax loss selling
stage: factor_computation
data_domain: holding_data
type: live_trading
component_capability_map:
project: finance-bp-078--fava_investor
scan_date: '2026-04-22'
stats:
total_files: 8
total_classes: 35
total_functions: 0
total_stages: 8
modules:
api_abstraction_layer:
class_count: 5
stage_id: api_abstraction
stage_order: 1
responsibility: 'Provides unified interface for accessing Beancount ledger data from both Fava web UI and standalone
CLI. WHY: Enables code reuse while accommodating different runtime contexts - web plugin vs command-line tool share
the same data access logic.'
classes:
- name: FavaInvestorAPI.query_func
file: api_abstraction_layer/favainvestorapi-query-func.py
line: 0
kind: required_method
signature: ''
- name: AccAPI.build_price_map
file: api_abstraction_layer/accapi-build-price-map.py
line: 0
kind: required_method
signature: ''
- name: AccAPI.realize
file: api_abstraction_layer/accapi-realize.py
line: 0
kind: required_method
signature: ''
- name: AccAPI.get_operating_currencies
file: api_abstraction_layer/accapi-get-operating-currencies.py
line: 0
kind: required_method
signature: ''
- name: api_implementation
file: api_abstraction_layer/api-implementation.py
line: 0
kind: replaceable_point
design_decision_count: 3
ticker_relationship_analyzer:
class_count: 4
stage_id: ticker_relationships
stage_order: 2
responsibility: 'Infers relationships between investment tickers from incomplete metadata declarations. WHY: Tax loss
harvesting requires knowing which funds are ''substantially identical'' (trigger wash sales) vs ''substantially different''
(safe for TLH swaps), and this information should be declarable once and inferr'
classes:
- name: RelateTickers.substidenticals
file: ticker_relationship_analyzer/relatetickers-substidenticals.py
line: 0
kind: required_method
signature: ''
- name: RelateTickers.representative
file: ticker_relationship_analyzer/relatetickers-representative.py
line: 0
kind: required_method
signature: ''
- name: RelateTickers.compute_tlh_groups
file: ticker_relationship_analyzer/relatetickers-compute-tlh-groups.py
line: 0
kind: required_method
signature: ''
- name: relationship_source
file: ticker_relationship_analyzer/relationship-source.py
line: 0
kind: replaceable_point
design_decision_count: 4
asset_allocation_by_class:
class_count: 4
stage_id: asset_allocation_by_class
stage_order: 3
responsibility: 'Computes portfolio allocation percentages based on commodity metadata classifications (asset_allocation_*).
WHY: Enables investors to see if their portfolio matches target allocations without manual spreadsheet work, visualizing
how their actual holdings compare to target allocations.'
classes:
- name: treeify
file: asset_allocation_by_class/treeify.py
line: 0
kind: required_method
signature: ''
- name: bucketize
file: asset_allocation_by_class/bucketize.py
line: 0
kind: required_method
signature: ''
- name: AssetClassNode.serialise
file: asset_allocation_by_class/assetclassnode-serialise.py
line: 0
kind: required_method
signature: ''
- name: bucketize_strategy
file: asset_allocation_by_class/bucketize-strategy.py
line: 0
kind: replaceable_point
design_decision_count: 4
asset_allocation_by_account:
class_count: 5
stage_id: asset_allocation_by_account
stage_order: 4
responsibility: 'Groups account balances into portfolios based on regex patterns or metadata. WHY: Lets investors define
custom portfolio groupings without changing their account naming conventions, enabling flexible portfolio organization
independent of chart of accounts structure.'
classes:
- name: portfolio_accounts
file: asset_allocation_by_account/portfolio-accounts.py
line: 0
kind: required_method
signature: ''
- name: by_account_name
file: asset_allocation_by_account/by-account-name.py
line: 0
kind: required_method
signature: ''
- name: by_account_open_metadata
file: asset_allocation_by_account/by-account-open-metadata.py
line: 0
kind: required_method
signature: ''
- name: asset_allocation
file: asset_allocation_by_account/asset-allocation.py
line: 0
kind: required_method
signature: ''
- name: selection_strategy
file: asset_allocation_by_account/selection-strategy.py
line: 0
kind: replaceable_point
design_decision_count: 2
cash_drag_detector:
class_count: 3
stage_id: cash_drag
stage_order: 5
responsibility: 'Identifies uninvested cash sitting in brokerage accounts that could be deployed for investment. WHY:
Idle cash loses purchasing power to inflation over time; detecting it enables investors to take action and rebalance
their portfolios efficiently.'
classes:
- name: find_loose_cash
file: cash_drag_detector/find-loose-cash.py
line: 0
kind: required_method
signature: ''
- name: find_cash_commodities
file: cash_drag_detector/find-cash-commodities.py
line: 0
kind: required_method
signature: ''
- name: cash_definition
file: cash_drag_detector/cash-definition.py
line: 0
kind: replaceable_point
design_decision_count: 2
tax_loss_harvester:
class_count: 7
stage_id: tax_loss_harvesting
stage_order: 6
responsibility: 'Finds investment lots with unrealized losses suitable for tax loss harvesting (TLH). WHY: TLH turns
paper losses into actual tax deductions, reducing the current tax burden under the US SpecID method. Allows investors to
systematically identify harvest opportunities.'
classes:
- name: find_harvestable_lots
file: tax_loss_harvester/find-harvestable-lots.py
line: 0
kind: required_method
signature: ''
- name: gain_term
file: tax_loss_harvester/gain-term.py
line: 0
kind: required_method
signature: ''
- name: query_recently_bought
file: tax_loss_harvester/query-recently-bought.py
line: 0
kind: required_method
signature: ''
- name: recently_sold_at_loss
file: tax_loss_harvester/recently-sold-at-loss.py
line: 0
kind: required_method
signature: ''
- name: harvestable_by_commodity
file: tax_loss_harvester/harvestable-by-commodity.py
line: 0
kind: required_method
signature: ''
- name: wash_window
file: tax_loss_harvester/wash-window.py
line: 0
kind: replaceable_point
- name: loss_threshold
file: tax_loss_harvester/loss-threshold.py
line: 0
kind: replaceable_point
design_decision_count: 5
gains_minimizer:
class_count: 3
stage_id: minimize_gains
stage_order: 7
responsibility: 'Determines optimal lot selection to minimize capital gains when selling. WHY: Under the US SpecID method,
choosing which lots to sell directly impacts tax liability. This module helps investors minimize their tax burden
by prioritizing highest-loss lots.'
classes:
- name: find_minimized_gains
file: gains_minimizer/find-minimized-gains.py
line: 0
kind: required_method
signature: ''
- name: find_tax_burden
file: gains_minimizer/find-tax-burden.py
line: 0
kind: required_method
signature: ''
- name: lot_selection_algorithm
file: gains_minimizer/lot-selection-algorithm.py
line: 0
kind: replaceable_point
design_decision_count: 4
metadata_summarizer:
class_count: 4
stage_id: metadata_summarizer
stage_order: 8
responsibility: 'Extracts and displays metadata from account/commodity Open directives as formatted tables. WHY: Allows
investors to store and view reference info (phone numbers, account numbers, contact details) alongside their ledger,
making metadata accessible without manual inspection.'
classes:
- name: build_tables
file: metadata_summarizer/build-tables.py
line: 0
kind: required_method
signature: ''
- name: active_accounts_metadata
file: metadata_summarizer/active-accounts-metadata.py
line: 0
kind: required_method
signature: ''
- name: commodities_metadata
file: metadata_summarizer/commodities-metadata.py
line: 0
kind: required_method
signature: ''
- name: directive_type
file: metadata_summarizer/directive-type.py
line: 0
kind: replaceable_point
design_decision_count: 3
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.21551724137931033
evidence_invalid: 91
evidence_verified: 25
evidence_auto_fixed: 0
audit_coverage: 33/33 (100%)
audit_pass_rate: 4/33 (12%)
audit_fail_total: 14
audit_finance_universal:
pass: 2
warn: 11
fail: 7
audit_subdomain_totals:
pass: 2
warn: 4
fail: 7
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-078. Evidence verify ratio
= 21.6% and audit fail total = 14. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-078-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc:
- UC-103
- UC-104
- UC-105
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Portfolio Management CLI Entry Point
positive_terms:
- portfolio management
- CLI
- command line
- tax optimization
- investment analysis
data_domain: holding_data
negative_terms:
- screening
- trading strategy
- ML prediction
- factor computation
ambiguity_question: Are you looking to use the command-line interface directly, or do you need to integrate one of these
modules into your own code?
- uc_id: UC-102
name: Related Ticker Grouping Utility
positive_terms:
- equivalent tickers
- related securities
- commodity grouping
- ticker equivalence
- substitutable assets
data_domain: holding_data
negative_terms:
- price prediction
- trading signals
- factor analysis
ambiguity_question: Do you need to group equivalent securities for tax purposes, or are you looking for price/volume analysis
of individual tickers?
- uc_id: UC-103
name: Tax-Optimized Selling Strategy
positive_terms:
- minimize gains
- tax-efficient selling
- capital gains optimization
- lot selection
- cost basis optimization
data_domain: holding_data
negative_terms:
- buy signals
- screening
- portfolio rebalancing
- ML prediction
ambiguity_question: Are you deciding which lots to sell to minimize taxes, or are you looking for new investment opportunities
to buy?
- uc_id: UC-104
name: Asset Allocation Analysis
positive_terms:
- asset allocation
- portfolio breakdown
- asset class distribution
- allocation report
- portfolio composition
data_domain: holding_data
negative_terms:
- tax loss harvesting
- screening
- trading signals
- ML prediction
ambiguity_question: Do you need a report showing how your portfolio is allocated across asset types, or are you looking
for specific tax optimization strategies?
- uc_id: UC-105
name: Tax Loss Harvesting Opportunity Detection
positive_terms:
- tax loss harvesting
- loss identification
- wash sale
- TLH opportunities
- tax loss selling
data_domain: holding_data
negative_terms:
- buy signals
- portfolio allocation
- screening
- ML prediction
ambiguity_question: Are you looking for securities to sell at a loss for tax benefits, or do you need to identify securities
to buy or allocate differently?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 128
fatal_constraints_count: 54
non_fatal_constraints_count: 168
use_cases_count: 5
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (fatal violation finance-C-001).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 26 source groups: Common Library(3),
Scaled NAV(3), api_abstraction(17), asset_allocation_by_account(3), asset_allocation_by_class(10), cachedtickerinfo(1),
and 20 more.'
key_decisions: 128 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-056
type: B
summary: val() function returns 0 for empty Inventory, None for other cases
- id: BD-057
type: B/RC
summary: Decimal type used for all financial calculations
- id: BD-058
type: B
summary: Table footer sums Inventory columns via reduce with currency conversion
- id: BD-050
type: B/DK
summary: Use last 10 days of price data for MF/ETF ratio calculation
- id: BD-051
type: B/BA
summary: Use median ratio instead of mean to avoid extreme values
- id: BD-052
type: B/RC
summary: Warn but don't fail when ETF prices unavailable for MF estimation
- id: BD-019
type: M
summary: Duck-typed API classes, no abstract base
- id: BD-020
type: B
summary: Fava version compatibility via version.parse
- id: BD-021
type: B
summary: Config extracted from Custom directives
- id: BD-053
type: B
summary: CLI mode returns end_date=None (no date filtering)
- id: BD-054
type: B
summary: Config extracted from fava-extension custom directives in beancount file
- id: BD-055
type: B
summary: Version-specific query_func for Fava 1.22+ vs 1.30+ compatibility
- id: BD-GAP-001
type: M
summary: 'Missing: Explicit convergence criteria'
- id: BD-GAP-002
type: M
summary: 'Missing: Matrix ill-conditioning'
- id: BD-GAP-003
type: B
summary: 'Missing: Return frequency and annualization factor'
- id: BD-GAP-004
type: B
summary: 'Missing: Volatility model family and distribution choice'
- id: BD-GAP-005
type: B
summary: 'Missing: Factor IC demeaning and group alignment'
- id: BD-GAP-006
type: RC
summary: 'Missing: Implement immutable append-only semantics for all data writes with timestamp + hash chaining'
- id: BD-GAP-007
type: RC
summary: 'Missing: Add timezone-aware datetime handling throughout - prefer UTC normalization for all timestamp
operations'
- id: BD-GAP-008
type: M
summary: 'Missing: Covariance matrix PSD repair strategy'
- id: BD-GAP-009
type: B
summary: 'Missing: Covariance estimator selection and shrinkage'
- id: BD-GAP-010
type: B
summary: 'Missing: VaR/CVaR confidence level and window'
- id: BD-GAP-011
type: B
summary: 'Missing: Volatility model family and distribution choice'
- id: BD-004
type: B
summary: Strategy pattern via pattern_type string lookup
- id: BD-065
type: B
summary: Account allocation uses optional include_children flag for balance rollup
- id: BD-066
type: B/BA
summary: Dynamic dispatch to pattern_type function (by_account_name, by_account_open_metadata)
- id: BD-001
type: B
summary: Metadata-driven bucketing via asset_allocation_* prefix
- id: BD-002
type: B
summary: Single base currency stored in root node
- id: BD-003
type: BA
summary: Unallocated amounts fall through to 'unknown' bucket
- id: BD-031
type: B/BA
summary: Tax adjustment enabled by default in asset allocation
- id: BD-032
type: B
summary: Use first operating currency as base currency for all conversions
- id: BD-033
type: B
summary: Bucket unallocated percentages into 'unknown' bucket when metadata < 100%
- id: BD-034
type: B/BA
summary: Skip negative balances (liabilities) in asset allocation
- id: BD-035
type: B
summary: Remove empty accounts and zero-balance ancestor accounts
- id: BD-036
type: B/BA
summary: Use 'asset_allocation_' metadata prefix for bucket definitions
- id: BD-037
type: B
summary: Convert currencies via operating currencies when direct conversion unavailable
- id: BD-082
type: B/BA
summary: 'Expense ratio conversion: ER * 100 for percentage display'
- id: BD-005
type: B/DK
summary: Cash commodities include operating currencies + metadata-tagged ones
- id: BD-006
type: BA
summary: Empty inventory rows filtered after query
- id: BD-038
type: B
summary: Cash commodities detected via asset_allocation_Bond_Cash metadata = 100
- id: BD-039
type: B/BA
summary: Include operating currencies as cash by default
- id: BD-040
type: B/BA
summary: Default accounts pattern '^Assets' for cash drag detection
- id: BD-097
type: B/BA
summary: loss_threshold defaults to 1 in TLH but 0 in example config (tlh.py:96 vs tlh.py:25)
- id: BD-103
type: B/BA
summary: Asset allocation tree root.currency hardcoded to first operating currency with no fallback
- id: BD-107
type: B/BA
summary: 'INTERACTION: BD-092 × BD-103 → Systemic currency failure when operating_currencies is empty or first currency
is inappropriate'
- id: BD-108
type: RC
summary: 'INTERACTION: BD-007 × BD-022 × BD-027 × BD-073 → Duplicated 30-day wash sale window hardcoding creates maintenance
hazard and compliance risk'
- id: BD-109
type: BA
summary: 'INTERACTION: BD-102 × BD-023 × BD-008 → minimizegains has hidden dependency on libtlh leap-year handling for
tax term classification'
- id: BD-110
type: RC
summary: 'INTERACTION: BD-094 × BD-095 → Hardcoded account filter in 3 modules with single source-of-truth creates concentrated
failure point'
- id: BD-111
type: B/BA
summary: 'INTERACTION: BD-097 (Contradiction) → loss_threshold defaults to 1 in code but 0 in example config creates
context-dependent harvesting behavior'
- id: BD-112
type: BA
summary: 'INTERACTION: BD-029 × BD-030 × BD-017 → Cross-account wash sale detection depends on substantially identical
fund grouping accuracy'
- id: BD-113
type: BA
summary: 'INTERACTION: BD-101 × BD-093 → AccAPI imports Fava internals despite being designed as standalone Beancount
API'
- id: BD-114
type: T
summary: 'INTERACTION: BD-057 × BD-092 → Decimal precision guarantee depends on single-base-currency assumption holding'
- id: BD-115
type: BA/DK
summary: 'INTERACTION: BD-020 × BD-055 → Fava version compatibility logic duplicated across FavaInvestorAPI with different
version thresholds'
- id: BD-116
type: RC
summary: 'RISK CASCADE: BD-092 → BD-002 → BD-103 → BD-037 → BD-058 → BD-069 → BD-090 → BD-091 → BD-076 → BD-077'
- id: BD-117
type: RC
summary: 'RISK CASCADE: BD-095 → BD-102 → BD-023 → BD-008 → BD-096 → BD-024'
- id: BD-098
type: BA
summary: ScaledNAV extends RelateTickers inheriting build_commodity_groups() for identicals - both use same metadata
- id: BD-106
type: BA
summary: Each Investor method creates NEW FavaInvestorAPI instance instead of reusing
- id: BD-092
type: RC
summary: All modules use operating_currencies[0] as single base currency for ALL financial calculations
- id: BD-094
type: B/RC
summary: Account filter 'account_sortkey(account) ~ "^[01]"' hardcoded across TLH, minimizegains, and summarizer
- id: BD-099
type: RC
summary: RelateTickers.substidenticals() combines both 'a__equivalents' AND 'a__substidenticals' by default
- id: BD-101
type: DK/B
summary: AccAPI.root_tree() imports fava/core.Tree despite being in beancountinvestorapi.py
- id: BD-104
type: T
summary: All modules use 'a__' prefix for auto-generated metadata vs 'asset_allocation_' for user config
- id: BD-079
type: B
summary: 'Portfolio allocation percentage: (balance / total) * 100 rounded to 1 decimal'
- id: BD-067
type: B
summary: 'Asset allocation percentage calculation: (balance / total) * 100'
- id: BD-068
type: B/RC
summary: Recursive subtree balance computation for hierarchical percentages
- id: BD-069
type: B/BA
summary: 'Tax-adjusted position scaling: position * (tax_adj / 100)'
- id: BD-088
type: B/RC
summary: 'Bucket allocation distribution: amount * (meta_value / 100)'
- id: BD-080
type: B/BA
summary: 'Cash drag threshold filter: position >= min_threshold'
- id: BD-081
type: B/RC
summary: Inventory sum via sequential accumulation into single Inventory
- id: BD-075
type: B/BA
summary: Tax burden interpolation between proceeds bracket boundaries
- id: BD-076
type: B/RC
summary: 'Average tax rate calculation: (cumulative_taxes / cumulative_proceeds) * 100'
- id: BD-077
type: B/RC
summary: 'Marginal tax rate: (Δ_taxes / Δ_proceeds) * 100'
- id: BD-078
type: B/RC
summary: Lot selection ordering by estimated tax percentage (ascending)
- id: BD-089
type: B/RC
summary: Decimal rounding for proceeds (0 decimals), tax rates (1-2 decimals)
- id: BD-091
type: B/RC
summary: 'Estimated tax calculation: gain * tax_rate (per term)'
- id: BD-086
type: B
summary: Table sorting with configurable column and direction
- id: BD-072
type: B/BA
summary: Gain term classification using relativedelta for precise date arithmetic
- id: BD-073
type: B/RC
summary: 30-day wash sale lookback period for recent purchases
- id: BD-074
type: B/BA
summary: 'Loss threshold filtering: losses < -loss_threshold'
- id: BD-090
type: B/DK
summary: Market value to basis difference for loss calculation
- id: BD-014
type: B
summary: Metadata prefix filtering for flexible column selection
- id: BD-015
type: BA
summary: Commodity leaf accounts excluded if parent is open
- id: BD-059
type: B
summary: Commodity_leaf accounts only included if parent account has no Open directive
- id: BD-060
type: B/RC
summary: Option to filter summarizer to active commodities only (has positions)
- id: BD-061
type: B
summary: Empty string used for missing column values in summarizer
- id: BD-011
type: B/RC
summary: Lots sorted by estimated tax percentage (ascending)
- id: BD-012
type: BA
summary: Cumulative columns added after sorting
- id: BD-013
type: B/BA
summary: Short-term and long-term tax rates from config
- id: BD-041
type: B/BA
summary: Default short-term and long-term tax rates of 1%
- id: BD-042
type: B/RC
summary: Sort lots by estimated tax percentage ascending
- id: BD-043
type: B/BA
summary: 'Add cumulative columns: cu_proceeds, cu_taxes, tax_avg, tax_marg'
- id: BD-044
type: B/RC
summary: Estimate tax = gain × tax_rate for each lot
- id: BD-095
type: BA
summary: libtlh.get_account_field(options) is the ONLY source of truth for account field extraction, shared by TLH and
minimizegains
- id: BD-096
type: RC
summary: 'libtlh.get_tables() pipeline order: find_harvestable_lots → harvestable_by_commodity → summarize_tlh → build_recents'
- id: BD-102
type: RC
summary: minimizegains relies on libtlh.gain_term() for short/long term classification
- id: BD-093
type: BA
summary: 'Dual-API pattern: FavaInvestorAPI (Fava context) vs AccAPI (CLI context) implement identical interfaces'
- id: BD-100
type: B/BA
summary: Node tree pattern with underscore-separated naming (e.g., 'equity_domestic') for asset allocation hierarchy
- id: BD-105
type: B/DK
summary: Wash sale detection uses 30-day lookback hardcoded in SQL DATE_ADD(TODAY(), -30)
- id: BD-083
type: B/RC
summary: Union-Find algorithm for building commodity equivalence groups
- id: BD-084
type: B
summary: 'TLH partner inference using symmetric rule: if A→(B,C) then B→(A,C), C→(A,B)'
- id: BD-085
type: B/RC
summary: Representative ticker selection for identical group
- id: BD-070
type: B
summary: MF NAV estimation using median ratio from historical MF/ETF pairs
- id: BD-071
type: B/DK
summary: NAV scaling ratio based only on the 10 most recent historical ratios
- id: BD-087
type: B
summary: 'Price ratio calculation: MF_price / ETF_price across matching dates'
- id: BD-007
type: B/DK
summary: 30-day wash sale window hardcoded in query
- id: BD-008
type: B/BA
summary: relativedelta for gain term to handle leap years
- id: BD-009
type: B/BA
summary: Substantially identical tickers read from commodity metadata
- id: BD-010
type: B/RC
summary: Summary aggregates currency values via Decimal sum
- id: BD-022
type: B/DK
summary: 30-day wash sale window for both recent purchases and recent sales
- id: BD-023
type: B/BA
summary: 'Long-term gain threshold: >1 year using relativedelta accounting for leap years'
- id: BD-024
type: B/BA
summary: Default loss_threshold of 1 dollar
- id: BD-025
type: B/RC
summary: Uses a__substidenticals metadata to identify substantially identical securities
- id: BD-026
type: B/BA
summary: Filter accounts via account_sortkey matching ^[01] pattern
- id: BD-027
type: B/DK
summary: Earliest safe sale date = acquisition_date + 31 days
- id: BD-028
type: B/RC
summary: Sort harvestable table by highest to lowest losses
- id: BD-029
type: B/RC
summary: Separate wash_pattern to distinguish taxable accounts from wash-sale accounts
- id: BD-030
type: B/DK
summary: Deduplicate recent purchases by ticker across substantially identical funds
- id: BD-016
type: B
summary: Graph-based inference of TLH partners
- id: BD-017
type: B/DK
summary: Equivalents vs Substidenticals distinction
- id: BD-018
type: BA
summary: Archived tickers filtered from TLH groups
- id: BD-045
type: B/RC
summary: Separate 'a__equivalents' and 'a__substidenticals' metadata fields
- id: BD-046
type: B/RC
summary: Tickers with 'archive' metadata are excluded from TLH calculations
- id: BD-047
type: B/RC
summary: 'TLH partners are made symmetric: if A→B is declared, then B→A is inferred'
- id: BD-048
type: B/RC
summary: Option to filter TLH partners by fund type (ETF vs MUTUALFUND)
- id: BD-049
type: B/RC
summary: Representative ticker chosen from preferred set (idents_preferred)
- id: BD-062
type: B
summary: Yahoo info cache stored as pickle file in BEAN_ROOT directory
- id: BD-063
type: B/BA
summary: Expense ratio converted from decimal to percentage (×100) on cache write
- id: BD-064
type: B/DK
summary: Remove '-' ISIN values from cached ticker info
resources:
packages:
- name: beancount >= 2.3.2
version_pin: latest
- name: fava >= 1.26
version_pin: latest
- name: beanquery
version_pin: latest
- name: Click >= 7.0
version_pin: latest
- name: click_aliases >= 1.0.1
version_pin: latest
- name: tabulate >= 0.8.9
version_pin: latest
- name: packaging >= 20.3
version_pin: latest
- name: python_dateutil >= 2.8.1
version_pin: latest
- name: yfinance >= 0.1.70
version_pin: latest
- name: importlib_metadata >= 1.5.0
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install "beancount>=2.3.2"
- python3 -m pip install "fava>=1.26"
- python3 -m pip install beanquery
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When implementing AccAPI or FavaInvestorAPI query_func
action: Convert query results to namedtuple format with field names from query types
severity: fatal
kind: domain_rule
modality: must
consequence: Module code expecting attribute access via row.column_name will fail with AttributeError, breaking asset
allocation, tax loss harvesting, and other financial computations that rely on namedtuple row iteration
stage_ids:
- api_abstraction
- id: finance-C-002
when: When implementing price loading in AccAPI
action: Build beancount price map using prices.build_price_map(entries) from all ledger entries
severity: fatal
kind: domain_rule
modality: must
consequence: Multi-currency portfolios will fail to convert positions to base currency, causing asset allocation calculations
to crash or produce incorrect results when commodities have different denominations
stage_ids:
- api_abstraction
- id: finance-C-003
when: When parsing Custom directive configurations
action: Filter entries for Custom type with fava-extension and evaluate value strings using ast.literal_eval
severity: fatal
kind: domain_rule
modality: must
consequence: Configuration parsing will silently return empty dicts for all modules, causing all investor reports to use
default parameters instead of user-specified configurations, leading to incorrect financial analysis
stage_ids:
- api_abstraction
- id: finance-C-004
when: When running FavaInvestorAPI with different Fava versions
action: Use version.parse from packaging library to compare fava_version against 1.22 and 1.30 thresholds
severity: fatal
kind: domain_rule
modality: must
consequence: Query execution will fail with wrong number of arguments error on incompatible Fava versions, breaking the
Fava web interface entirely
stage_ids:
- api_abstraction
- id: finance-C-006
when: When running CLI with AccAPI
action: Pass a valid beancount file path to AccAPI constructor for ledger loading
severity: fatal
kind: resource_boundary
modality: must
consequence: Ledger loading will fail with FileNotFoundError, preventing any CLI commands from executing; all modules
(tlh, assetalloc, cashdrag, summarizer) depend on this
stage_ids:
- api_abstraction
- id: finance-C-009
when: When creating new AccAPI or FavaInvestorAPI implementations
action: 'Implement all required methods: query_func, build_price_map, get_custom_config, get_commodity_directives, get_operating_currencies,
realize, root_tree'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Modules calling missing methods will raise AttributeError at runtime, breaking financial calculations that
depend on those APIs
stage_ids:
- api_abstraction
- id: finance-C-017
when: When accessing row data from query_func results
action: Access row fields using attribute notation (row.column_name) not dictionary notation (row['column_name'])
severity: fatal
kind: domain_rule
modality: must
consequence: namedtuple rows do not support item access; code using row['column'] will raise TypeError, breaking all module
queries that expect attribute-style access
stage_ids:
- api_abstraction
- id: finance-C-019
when: When building commodity groups from incomplete declarations
action: apply transitive closure to infer complete equivalence/substantially identical groups
severity: fatal
kind: domain_rule
modality: must
consequence: Incomplete relationship inference will cause some substantially identical funds to not be grouped together,
leading to incorrect wash sale detection and missing TLH partner recommendations
stage_ids:
- ticker_relationships
- id: finance-C-020
when: When implementing the representative() method
action: consistently return the same representative ticker for any member of a substantially identical group
severity: fatal
kind: domain_rule
modality: must
consequence: Inconsistent representative selection will cause non-deterministic TLH group calculations, where the same
ticker gets different partners depending on which group member is used as the key
stage_ids:
- ticker_relationships
- id: finance-C-021
when: When computing TLH groups from unidirectional declarations
action: infer bidirectional relationships so if A→B is declared, B→A is also included
severity: fatal
kind: domain_rule
modality: must
consequence: Missing bidirectional inference will cause incomplete TLH partner lists, where only some funds show all their
valid swap options, leading to suboptimal TLH decisions
stage_ids:
- ticker_relationships
- id: finance-C-025
when: When relationship_source is extended to read from sources other than commodity directives
action: maintain compatibility with the existing a__equivalents/a__substidenticals/a__tlh_partners metadata schema
severity: fatal
kind: resource_boundary
modality: must
consequence: Schema changes will break compatibility with ScaledNAV and other classes extending RelateTickers, causing
inconsistent ticker relationship data across the system
stage_ids:
- ticker_relationships
- id: finance-C-026
when: When extending RelateTickers with additional functionality
action: share the same metadata schema for equivalents and substidenticals declarations
severity: fatal
kind: resource_boundary
modality: must
consequence: Schema divergence will cause extended classes to miss ticker relationships declared for the base class, resulting
in incomplete wash sale detection and TLH recommendations
stage_ids:
- ticker_relationships
- id: finance-C-027
when: When selecting tickers for wash sale comparison
action: use the representatives of substantially identical groups to prevent within-group comparisons
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Comparing a ticker to itself in its own substantially identical group will falsely trigger wash sale warnings,
blocking valid TLH transactions
stage_ids:
- ticker_relationships
- id: finance-C-030
when: When building the wash sale prevention query
action: use both the current ticker and all of its substantially identical partners in the comparison set
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing substantially identical tickers from the wash sale query will allow purchases of similar funds within
30 days, triggering unintended wash sales that disallow the tax loss
stage_ids:
- ticker_relationships
- id: finance-C-031
when: When implementing substidenticals() for a ticker with no substantially identical partners
action: return an empty list, not None or a list containing the input ticker
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Returning the input ticker will cause false wash sale warnings when comparing a ticker to itself, incorrectly
blocking valid TLH transactions
stage_ids:
- ticker_relationships
- id: finance-C-035
when: When implementing classes that extend RelateTickers
action: call the parent __init__ or replicate its file loading and database initialization
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing database initialization will cause AttributeError when accessing self.db or self.idents, preventing
the class from functioning
stage_ids:
- ticker_relationships
- id: finance-C-037
when: When computing asset class percentages for multiple currencies
action: Convert all positions to a single base currency before aggregating into asset buckets
severity: fatal
kind: domain_rule
modality: must
consequence: Asset allocation percentages will be incorrect if positions in different currencies are summed directly,
leading to meaningless results like '500 + 750 = 1250 USD' without currency conversion
stage_ids:
- asset_allocation_by_class
- id: finance-C-039
when: When scaling positions for tax adjustment
action: Verify that positions have a cost spec present after realization before scaling
severity: fatal
kind: domain_rule
modality: must
consequence: Tax-adjusted allocation will be incorrect if positions lose cost information during realization, causing
wrong basis for tax-adjusted percentage calculations
stage_ids:
- asset_allocation_by_class
- id: finance-C-040
when: When handling multi-currency portfolios with cost currencies different from operating currencies
action: Include the cost currency in operating_currencies list to enable transitive conversion
severity: fatal
kind: resource_boundary
modality: must
consequence: 'Currency conversion will fail with ''Error: unable to convert X to base currency Y (Missing price directive?)''
if cost currency is not available as operating currency or via price chain'
stage_ids:
- asset_allocation_by_class
- id: finance-C-052
when: When writing the asset_allocation function in libaaacc.py
action: guard against division by zero when portfolio_total is zero
severity: fatal
kind: domain_rule
modality: must
consequence: Division by portfolio_total at line 79 causes ZeroDivisionError when all account balances in selected accounts
are zero, resulting in complete failure to render any asset allocation report
stage_ids:
- asset_allocation_by_account
- id: finance-C-055
when: When using accapi.cost_or_value in asset allocation calculations
action: call accapi.cost_or_value with a CLI-based AccAPI instance without first implementing the method
severity: fatal
kind: resource_boundary
modality: must_not
consequence: AccAPI.cost_or_value is commented out and returns None/not implemented, causing libaaacc.py:71 to fail with
AttributeError when running via CLI, preventing any asset allocation by account computation
stage_ids:
- asset_allocation_by_account
- id: finance-C-060
when: When implementing asset allocation by account in Fava extension
action: obtain portfolio data via FavaInvestorAPI which provides cost_or_value functionality
severity: fatal
kind: architecture_guardrail
modality: must
consequence: FavaInvestorAPI provides the cost_or_value method required for balance calculations (libaaacc.py:71), while
AccAPI for CLI lacks this; using wrong API class causes AttributeError on cost_or_value call
stage_ids:
- asset_allocation_by_account
- id: finance-C-069
when: When determining which commodities are considered cash
action: always include operating currencies in the cash commodities list
severity: fatal
kind: domain_rule
modality: must
consequence: Operating currencies like USD will not appear in cash drag output, missing potential cash drag opportunities
that should be flagged
stage_ids:
- cash_drag
- id: finance-C-079
when: When implementing loss calculations for tax loss harvesting
action: Use beancount.core.number.Decimal for all monetary calculations
severity: fatal
kind: domain_rule
modality: must
consequence: Floating-point arithmetic may cause incorrect loss calculations, leading to harvesting the wrong lot quantities
and potential tax compliance issues
stage_ids:
- tax_loss_harvesting
- id: finance-C-080
when: When determining long-term vs short-term gain classification
action: Use dateutil.relativedelta to calculate gain term duration
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect leap year handling can misclassify gains as long-term when they should be short-term (or vice versa),
resulting in wrong tax rates applied
stage_ids:
- tax_loss_harvesting
- id: finance-C-081
when: When implementing wash sale detection logic
action: Use a 30-day lookback window and 31-day earliest sale date for wash sale compliance
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect wash sale window violates IRS wash sale rule, disallowing the loss deduction and potentially triggering
IRS penalties
stage_ids:
- tax_loss_harvesting
- id: finance-C-083
when: When implementing tax loss harvesting features
action: Read substantially identical tickers from commodity metadata using a__substidenticals label
severity: fatal
kind: domain_rule
modality: must
consequence: Selling a security and buying a substantially identical one within wash sale window converts harvest into
a disallowed wash sale
stage_ids:
- tax_loss_harvesting
- id: finance-C-094
when: When implementing tax calculations in gains minimization
action: Use Decimal type from beancount.core.number for all monetary calculations
severity: fatal
kind: domain_rule
modality: must
consequence: Floating-point arithmetic introduces rounding errors in tax calculations, potentially causing incorrect
lot selection and misreported tax liabilities
stage_ids:
- minimize_gains
- id: finance-C-095
when: When implementing lot sorting in gains minimization
action: Sort lots by est_tax_percent in ascending order to prioritize highest-loss lots first
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect sorting order causes suboptimal lot selection, resulting in higher capital gains tax liability
than necessary
stage_ids:
- minimize_gains
- id: finance-C-096
when: When classifying tax term (short-term vs long-term) in gains minimization
action: Use libtlh.gain_term() function with relativedelta for IRS-compliant year boundary calculation
severity: fatal
kind: domain_rule
modality: must
consequence: Using simple year difference instead of relativedelta causes incorrect long/short term classification near
year boundaries due to leap year handling; IRS defines 'more than 1 year' as requiring 1 year plus at least 1 day
stage_ids:
- minimize_gains
- id: finance-C-101
when: When applying short-term and long-term tax rates in gains minimization
action: Separate lots by holding period and apply corresponding st_tax_rate or lt_tax_rate from config
severity: fatal
kind: domain_rule
modality: must
consequence: Wrong tax rate application causes incorrect estimated tax calculations, leading to suboptimal lot selection
decisions
stage_ids:
- minimize_gains
- id: finance-C-104
when: When depending on gain_term classification in gains minimization
action: Replace or modify libtlh.gain_term() with a custom implementation that changes term classification logic
severity: fatal
kind: architecture_guardrail
modality: must_not
consequence: Custom term classification may violate IRS rules for long/short term gains, causing incorrect tax estimates
and potential IRS compliance issues
stage_ids:
- minimize_gains
- id: finance-C-128
when: When implementing any financial calculation across the entire system
action: Use Decimal from beancount.core.number for all monetary values to preserve exact decimal representation without
floating-point errors
severity: fatal
kind: domain_rule
modality: must
consequence: Floating-point calculations introduce rounding errors that corrupt PnL, tax estimates, and portfolio value
calculations, leading to incorrect financial decisions
- id: finance-C-129
when: When building any table output consumed by Fava templates or CLI presenters
action: Return a standardized 4-tuple format (rtypes, rrows, extra, footer) where rrows is a namedtuple with column access
via row.column_name
severity: fatal
kind: domain_rule
modality: must
consequence: Fava templates and CLI presenters expect namedtuple row objects; breaking this contract causes AttributeError
in all consumers
- id: finance-C-130
when: When implementing the AccAPI interface for data access abstraction
action: 'Implement all required methods: query_func, realize, root_tree, build_price_map, get_commodity_directives, get_operating_currencies'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing interface methods cause AttributeError when modules attempt data access, breaking both Fava extension
and CLI
- id: finance-C-131
when: When performing any currency conversion or financial aggregation
action: Use operating_currencies[0] as the single base currency for all financial calculations to collapse multi-currency
portfolios
severity: fatal
kind: domain_rule
modality: must
consequence: Using currencies inconsistently produces incorrect portfolio totals and asset allocation percentages across
different currencies
- id: finance-C-148
when: When executing the TLH analysis pipeline or refactoring its implementation
action: 'Maintain strict sequential execution order: find_harvestable_lots → harvestable_by_commodity → summarize_tlh
→ build_recents; do not reorder, parallelize, or cache intermediate outputs between stages as subsequent stages expect
specifically formatted inputs from predecessors'
severity: fatal
kind: domain_rule
modality: must
consequence: Reordering pipeline stages or using cached outputs from non-sequential execution causes subsequent stages
to receive incorrectly formatted inputs, producing invalid tax loss harvesting recommendations
derived_from_bd_id: BD-096
- id: finance-C-151
when: When calculating gain term for tax classification using relativedelta
action: Use relativedelta(years=1) for >1 year threshold calculation to properly handle leap years and varying month lengths;
must NOT substitute with timedelta(days=365) or simple year subtraction
severity: fatal
kind: domain_rule
modality: must
consequence: Using 365-day timedelta misclassifies assets held exactly 365 days as long-term when they should be short-term,
and vice versa near year boundaries with leap years, causing incorrect tax rate application
derived_from_bd_id: BD-023
- id: finance-C-152
when: When determining wash sale applicability for security transactions
action: Distinguish between equivalents (freely interchangeable, no wash sale) and substidenticals (trigger wash sale
if bought/sold within the 61-day window); do NOT treat all ticker relationships as substantially identical
severity: fatal
kind: domain_rule
modality: must
consequence: Treating equivalents as substantially identical over-flags wash sales, incorrectly preventing legitimate
fund switches and disallowing valid loss deductions; treating substidenticals as equivalents misses wash sales, causing
IRS non-compliance
derived_from_bd_id: BD-017
- id: finance-C-153
when: When implementing wash sale detection logic in tax loss harvesting
action: Enforce the 30-day wash sale window symmetrically on BOTH sides of any transaction — check for overlapping purchases
within 30 days after sales AND overlapping sales within 30 days before purchases
severity: fatal
kind: domain_rule
modality: must
consequence: Asymmetric wash sale enforcement misses overlapping scenarios; selling at a loss and repurchasing within
30 days would not trigger wash sale, allowing disallowed loss deductions to pass through undetected
derived_from_bd_id: BD-022
- id: finance-C-154
when: When identifying substantially identical securities for wash sale tracking
action: Read and respect the a__substidenticals metadata field on securities to identify user-defined substantially identical
relationships — must NOT rely solely on generic fund matching algorithms
severity: fatal
kind: domain_rule
modality: must
consequence: Skipping the a__substidenticals metadata check misses user-defined substantially identical relationships,
causing false negative wash sale detection where disallowed losses are not properly flagged for IRS compliance
derived_from_bd_id: BD-025
- id: finance-C-156
when: When classifying gains and losses as short-term or long-term across tax optimization modules
action: Use libtlh.gain_term() consistently as the authoritative source for short/long term classification in all modules
including minimizegains — must NOT implement independent classification logic in any module
severity: fatal
kind: domain_rule
modality: must
consequence: Inconsistent classification between TLH and minimizegains creates contradictory tax optimization decisions;
some modules may harvest losses while others classify the same gains differently, potentially creating IRS compliance
issues
derived_from_bd_id: BD-102
- id: finance-C-157
when: When implementing wash sale detection logic that references the 30-day window
action: Centralize the 30-day wash sale constant in a single constants module and import it across all modules (BD-007,
BD-022, BD-027, BD-073) — must NOT hardcode the 30-day value in multiple locations
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Hardcoding 30-day constant in multiple locations creates maintenance hazard; if IRS rules change or non-US
jurisdictions need adaptation, missing one location causes inconsistent wash sale detection leading to false positives
or missed violations
derived_from_bd_id: BD-108
- id: finance-C-161
when: When implementing wash sale detection or determining earliest safe sale dates for tax loss harvesting
action: Calculate earliest safe sale date as acquisition_date plus exactly 31 days — this boundary is one day beyond the
30-day IRS wash sale window, ensuring sales occur outside the prohibited period
severity: fatal
kind: domain_rule
modality: must
consequence: Using 30 days instead of 31 days places sales inside the wash sale window, causing the IRS to disallow loss
deductions and recharacterize gains — this creates tax liability that the backtest does not account for
derived_from_bd_id: BD-027
- id: finance-C-163
when: When configuring wash sale detection for accounts that could trigger wash sale rules
action: Use separate wash_pattern configurations for taxable accounts versus accounts that could trigger wash sales —
the wash_pattern for taxable accounts must differ from the wash_pattern for accounts where repurchase would be disallowed
severity: fatal
kind: domain_rule
modality: must
consequence: Using the same wash_pattern for all accounts causes harvesting in one account to incorrectly trigger wash
sale restrictions in another, disallowing legitimate tax losses across related accounts
derived_from_bd_id: BD-029
- id: finance-C-166
when: When implementing or modifying account filtering logic across TLH, minimizegains, and summarizer modules
action: Centralize account filter pattern '^[01]' and account field extraction logic in a single shared function; all
three modules (libtlh.py, libminimizegains.py, libsummarizer.py) must import this function rather than duplicating the
pattern
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Hardcoded account filter patterns in three separate modules create concentrated failure points — if the pattern
needs updating, a missed update in any module causes inconsistent account filtering and contradictory tax optimization
recommendations
derived_from_bd_id: BD-110
- id: finance-C-175
when: When implementing position scaling with tax adjustment percentage
action: Validate that tax_adj parameter is non-negative (>= 0) before computing position * (tax_adj / 100); reject or
clamp negative values to 0
severity: fatal
kind: domain_rule
modality: must
consequence: A negative tax_adj value produces invalid negative positions, causing strategies to take short positions
in assets they should only hold long, resulting in completely inverted position logic and potentially unlimited loss
exposure in margin accounts
derived_from_bd_id: BD-069
- id: finance-C-177
when: When implementing tax-loss harvesting logic that excludes recent purchases
action: Exclude positions purchased within the last 30 days (including day-of-purchase) from TLH harvesting pool; positions
with purchase_date >= current_date - 30 days are ineligible
severity: fatal
kind: domain_rule
modality: must
consequence: Failing to exclude recently purchased positions from TLH harvesting causes the system to harvest losses that
the IRS will disallow under wash sale rules, resulting in unexpected tax liabilities when the disallowed loss is added
back to cost basis in future years
derived_from_bd_id: BD-073
- id: finance-C-192
when: When implementing any financial calculation (NAV, portfolio valuations, tax lot computations) in any module
action: Convert all monetary values to operating_currencies[0] before performing calculations; the system enforces using
the first operating currency as the single canonical base unit; there is no multi-currency fallback
severity: fatal
kind: domain_rule
modality: must
consequence: Mixing currencies without conversion produces mathematically invalid results; adding USD and CNY values without
conversion creates meaningless numbers that violate accounting consistency and cause incorrect tax calculations
derived_from_bd_id: BD-092
- id: finance-C-199
when: When implementing or modifying libtlh.gain_term() or libtlh.get_account_field() functions
action: Verify backward compatibility in function signatures and behavior - changes will cascade through both TLH harvestable
lot identification and minimizegains gain minimization recommendations simultaneously
severity: fatal
kind: domain_rule
modality: must
consequence: A bug in gain_term leap-year handling or account field parsing silently corrupts both the TLH module's harvestable
lot identification AND the minimizegains module's gain minimization recommendations, causing suboptimal tax strategy
recommendations without raising errors
derived_from_bd_id: BD-095
- id: finance-C-200
when: When processing portfolio valuation for accounts with multiple operating_currencies or when operating_currencies
may be empty
action: Validate that operating_currencies contains at least one currency before any downstream calculations - if operating_currencies
is empty, the entire calculation pipeline (BD-103 asset allocation, BD-037 currency conversion, BD-069 position scaling,
BD-090/091 tax lot calculations, BD-076/077 tax rates) will produce values in an invalid currency without raising errors
severity: fatal
kind: domain_rule
modality: must
consequence: Empty operating_currencies causes the entire financial calculation pipeline to produce values in an invalid
currency, corrupting tax optimization recommendations and asset allocation analysis for multi-currency portfolios -
the system will show numbers but they represent no valid currency
derived_from_bd_id: BD-116
- id: finance-C-201
when: When implementing minimizegains module or modifying loss_threshold default value
action: Verify loss_threshold default of $1 is intentional for the tax optimization strategy - changes to this default
affect both the harvestable lot identification stage and the gain minimization stage of the tax loss harvesting pipeline
severity: fatal
kind: domain_rule
modality: must
consequence: Changing loss_threshold default affects both pipeline stages, potentially causing lots with small losses
to be excluded from harvesting or different optimization strategies to be recommended, reducing tax savings effectiveness
derived_from_bd_id: BD-117
- id: finance-C-202
when: When implementing minimizegains module or modifying libtlh gain_term() tax term classification
action: Verify leap year handling in relativedelta calculations for gain_term() - BD-023 and BD-008 verify correct handling;
changes would cascade through tax lot classification affecting both harvestable lot identification and gain minimization
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect leap year handling in gain_term() causes misclassification of short-term vs long-term gains, leading
to incorrect tax optimization recommendations and potential non-compliance with tax holding period requirements
derived_from_bd_id: BD-117
- id: finance-C-207
when: When implementing wash sale detection logic
action: Modify the 30-day wash sale lookback period — DATE_ADD(TODAY(), -30) implements IRS IRC Section 1091 regulatory
requirement; changing this value violates tax compliance
severity: fatal
kind: domain_rule
modality: must_not
consequence: Modifying the hardcoded 30-day wash sale lookback causes the backtesting system to violate IRS wash sale
rule IRC Section 1091, potentially allowing loss claims that would be disallowed in actual tax filings and triggering
IRS penalties
derived_from_bd_id: BD-105
regular:
- id: finance-C-005
when: When implementing AccAPI root_tree
action: Import and use fava.core.Tree class with entries to build tree structure
severity: high
kind: resource_boundary
modality: must
consequence: CLI commands that display account tree structures will fail to import fava.core.Tree, preventing tree-based
visualizations from working in standalone mode
stage_ids:
- api_abstraction
- id: finance-C-007
when: When deploying FavaInvestorAPI in web context
action: Verify Fava version is at least 1.22 to support required query API compatibility
severity: high
kind: resource_boundary
modality: must
consequence: Fava web interface will fail to execute queries, displaying error messages to users and making all investor
reports inaccessible via the web UI
stage_ids:
- api_abstraction
- id: finance-C-008
when: When parsing config with ast.literal_eval
action: Use Python dict literal syntax in beancount Custom directive values (not JSON)
severity: high
kind: resource_boundary
modality: must
consequence: ast.literal_eval will raise SyntaxError when parsing JSON format, causing all module configurations to be
ignored and defaults to be used instead
stage_ids:
- api_abstraction
- id: finance-C-010
when: When accessing operating currencies in module code
action: Call accapi.get_operating_currencies() which returns a list, and access first element for single currency operations
severity: high
kind: architecture_guardrail
modality: must
consequence: Currency conversion operations will fail with an index error or use the wrong currency when a single base currency
is expected but a list of operating currencies is received
stage_ids:
- api_abstraction
- id: finance-C-011
when: When instantiating AccAPI in Fava web context
action: Instantiate AccAPI directly in Fava extension code that runs in web context
severity: high
kind: architecture_guardrail
modality: must_not
consequence: AccAPI requires a beancount file path which is not available in Fava web context; using AccAPI instead of
FavaInvestorAPI will cause loader.load_file to fail with FileNotFoundError
stage_ids:
- api_abstraction
- id: finance-C-012
when: When configuring fava-extension Custom directive
action: Use 'fava_investor' as part of the config key name for fava_investor module configurations
severity: high
kind: architecture_guardrail
modality: must
consequence: get_custom_config will fail to find module configurations, returning empty dicts and causing all modules
to use default parameters instead of user-specified settings
stage_ids:
- api_abstraction
- id: finance-C-013
when: When claiming API functionality
action: Claim real-time price updates or live trading execution capability
severity: high
kind: claim_boundary
modality: must_not
consequence: API only reads historical beancount ledger entries; no mechanism exists for real-time data or trade execution,
misleading users about system capabilities
stage_ids:
- api_abstraction
- id: finance-C-014
when: When presenting query results or calculations
action: Present backtested calculations as guaranteed future investment outcomes
severity: high
kind: claim_boundary
modality: must_not
consequence: Historical ledger analysis does not predict future returns; presenting tax loss harvesting opportunities
or asset allocation suggestions as guaranteed profits violates financial advisory regulations
stage_ids:
- api_abstraction
- id: finance-C-015
when: When considering duck-typing as reason to skip interface validation
action: Skip verifying that new API implementations have every required method signature
severity: high
kind: rationalization_guard
modality: must_not
consequence: Missing methods will cause AttributeError at runtime when modules call query_func, build_price_map, or other
APIs; duck-typing only works when interface contract is fulfilled
stage_ids:
- api_abstraction
- id: finance-C-016
when: When simplifying API by removing version compatibility branches
action: Remove Fava version 1.22 compatibility branch assuming no users have older versions
severity: medium
kind: rationalization_guard
modality: must_not
consequence: Users running Fava 1.22-1.29 will get TypeError when executing queries, breaking the web interface for a
significant portion of the user base
stage_ids:
- api_abstraction
- id: finance-C-018
when: When declaring ticker relationships in Beancount commodity directives
action: use 'a__equivalents' for same-share-class relationships that don't trigger wash sales
severity: high
kind: domain_rule
modality: must
consequence: Using 'a__substidenticals' for same-share-class funds instead of 'a__equivalents' will incorrectly flag them
as wash sale risks, leading to missed TLH opportunities and potential false wash sale warnings
stage_ids:
- ticker_relationships
- id: finance-C-022
when: When declaring archived tickers in commodity directives
action: include 'archive' in the commodity metadata to mark it as no longer held
severity: high
kind: domain_rule
modality: must
consequence: Without proper archive marking, archived tickers will continue appearing in active TLH recommendations, suggesting
swaps to funds that are no longer part of the portfolio
stage_ids:
- ticker_relationships
- id: finance-C-023
when: When declaring TLH partners in commodity directives
action: use 'a__tlh_partners' metadata field (not 'tlh_partners') with comma-separated ticker values
severity: high
kind: domain_rule
modality: must
consequence: Using the old 'tlh_partners' field or incorrect format will result in zero TLH partners being read, eliminating
all inferred TLH swap recommendations
stage_ids:
- ticker_relationships
- id: finance-C-024
when: When using RelateTickers with a commodities file path
action: provide an existing file path or None; exit gracefully if file doesn't exist
severity: high
kind: resource_boundary
modality: must
consequence: Missing file handling will cause abrupt program termination without clear error message, making debugging
difficult for users
stage_ids:
- ticker_relationships
- id: finance-C-028
when: When computing TLH groups in Step 3
action: apply the bidirectional inference rule exactly once without iteration or convergence
severity: high
kind: architecture_guardrail
modality: must
consequence: Iterating to convergence will cause unintended transitive effects where A→B→C creates A→C directly, which
may not be appropriate for all fund relationships
stage_ids:
- ticker_relationships
- id: finance-C-029
when: When filtering archived tickers from TLH groups
action: remove archived tickers from both keys and values in Step 5
severity: high
kind: architecture_guardrail
modality: must
consequence: Archived tickers appearing as keys will cause downstream errors; archived tickers in values will recommend
funds that are no longer held in the portfolio
stage_ids:
- ticker_relationships
- id: finance-C-032
when: When comparing funds in same_type_funds_only mode
action: compare quote types using the 'a__quoteType' metadata field with values like 'ETF' or 'MUTUALFUND'
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without type filtering, mixing ETFs and mutual funds in TLH recommendations may create unsuitable swap suggestions
that are not truly equivalent investment vehicles
stage_ids:
- ticker_relationships
- id: finance-C-033
when: When TLH analysis results are presented to users
action: present the results as tax or financial advice
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting computational results as advice may lead users to make financial decisions without consulting
qualified tax professionals, potentially resulting in IRS penalties or missed tax optimization opportunities
stage_ids:
- ticker_relationships
- id: finance-C-034
when: When declaring a__tlh_partners metadata
action: declare only funds that are NOT substantially identical to the current fund
severity: high
kind: domain_rule
modality: must
consequence: Including substantially identical funds in TLH partners will produce misleading recommendations since swapping
between substantially identical funds triggers wash sales rather than achieving tax loss harvesting
stage_ids:
- ticker_relationships
- id: finance-C-036
when: When using the pretty_sort() function for display
action: document that the sort produces valid but potentially different results across different runs
severity: low
kind: operational_lesson
modality: should
consequence: Users expecting deterministic ordering may be confused when the same TLH groups appear in different order
on different runs, potentially leading to misread reports
stage_ids:
- ticker_relationships
- id: finance-C-038
when: When bucketing commodities by asset_allocation_* metadata
action: Verify bucket names use underscores consistently for hierarchical tree construction
severity: high
kind: architecture_guardrail
modality: must
consequence: Tree hierarchy will be incorrectly constructed if bucket names contain hyphens or other separators, causing
'equity-domestic' to be treated as one node instead of nested 'equity' and 'domestic' nodes
stage_ids:
- asset_allocation_by_class
- id: finance-C-041
when: When calculating asset allocation percentages
action: Use Decimal type for percentage calculations to avoid floating-point rounding errors
severity: high
kind: domain_rule
modality: must
consequence: Percentage calculations will have rounding errors when using float, potentially causing percentages to not
sum to 100.00% exactly
stage_ids:
- asset_allocation_by_class
- asset_allocation_by_account
- id: finance-C-042
when: When validating asset_allocation_* metadata percentages
action: Pad remaining percentage to 'unknown' bucket if metadata does not sum to 100%
severity: medium
kind: domain_rule
modality: must
consequence: Portfolio allocation will not sum to 100% if metadata percentages don't add up, silently misrepresenting
the actual portfolio allocation
stage_ids:
- asset_allocation_by_class
- id: finance-C-043
when: When building asset allocation tree from buckets
action: Include positions with negative balances in asset allocation calculations
severity: medium
kind: domain_rule
modality: must_not
consequence: Liabilities will be incorrectly included as positive asset allocation, distorting portfolio composition and
percentages
stage_ids:
- asset_allocation_by_class
- id: finance-C-044
when: When specifying commodities for asset allocation classification
action: Prefix every asset class metadata key with 'asset_allocation_'
severity: high
kind: architecture_guardrail
modality: must
consequence: Commodities without the asset_allocation_ prefix will not be included in any bucket, causing incomplete portfolio
representation
stage_ids:
- asset_allocation_by_class
- id: finance-C-045
when: When calculating asset allocation in multi-currency portfolios
action: Add price entries for every commodity to enable accurate market value conversion
severity: high
kind: operational_lesson
modality: must
consequence: Portfolio allocation will be incorrect if price entries are missing or outdated, causing inaccurate market
valuations of held commodities
stage_ids:
- asset_allocation_by_class
- id: finance-C-047
when: When excluding ancestor accounts from asset allocation
action: Set excluded ancestor account balances to empty Inventory instead of removing them
severity: high
kind: architecture_guardrail
modality: must
consequence: Ancestor accounts with explicit transactions will inflate asset allocation if balances are not zeroed, causing
double-counting of holdings
stage_ids:
- asset_allocation_by_class
- id: finance-C-048
when: When reporting asset allocation results
action: Claim percentages are accurate for live trading without considering price timing
severity: medium
kind: claim_boundary
modality: must_not
consequence: Asset allocation percentages depend on price entries at calculation time; presenting these as precise allocations
ignores that prices change throughout the trading day
stage_ids:
- asset_allocation_by_class
- id: finance-C-049
when: When filtering accounts for asset allocation
action: Include accounts with zero balance in the allocation calculation
severity: low
kind: operational_lesson
modality: must_not
consequence: Empty accounts will create unnecessary tree nodes and potentially confuse the hierarchical structure without
providing meaningful allocation data
stage_ids:
- asset_allocation_by_class
- id: finance-C-050
when: When implementing tax-adjusted positions with multiple currency conversions
action: Convert every position to the base currency before applying tax adjustment scaling
severity: high
kind: domain_rule
modality: must
consequence: Tax-adjusted position values will be incorrect if cost currencies are not converted to base currency before
scaling, causing wrong percentage allocations
stage_ids:
- asset_allocation_by_class
- id: finance-C-051
when: When configuring the asset allocation module
action: Set skip_tax_adjustment to True only when not using tax-adjusted accounts
severity: high
kind: operational_lesson
modality: must
consequence: Tax-deferred accounts like retirement accounts will show incorrect allocations if tax adjustment is skipped
when it should be applied
stage_ids:
- asset_allocation_by_class
- id: finance-C-054
when: When calculating asset allocation percentages
action: verify percentages sum to exactly 100% within each portfolio group
severity: high
kind: domain_rule
modality: must
consequence: Without rounding or normalization logic, individual allocation percentages may not sum to 100%, violating
the fundamental invariant that total allocation must equal 100% and breaking acceptance criteria
stage_ids:
- asset_allocation_by_account
- id: finance-C-056
when: When extending asset allocation selection strategies
action: implement new selection strategies as by_* functions in libaaacc.py module scope
severity: high
kind: architecture_guardrail
modality: must
consequence: The pattern_type lookup uses globals()['by_' + pattern_type] to dynamically locate selection strategy functions;
functions not defined at module level or with incorrect naming will raise KeyError, breaking portfolio grouping
stage_ids:
- asset_allocation_by_account
- id: finance-C-057
when: When configuring asset allocation by account patterns
action: verify regex patterns match against existing account names in the root tree
severity: high
kind: operational_lesson
modality: must
consequence: Non-matching regex patterns result in empty selected_accounts list, causing portfolio_total to be zero and
triggering division by zero error, producing no meaningful allocation output
stage_ids:
- asset_allocation_by_account
- id: finance-C-058
when: When running asset allocation by account via CLI
action: attempt to execute the assetalloc_account CLI, which exits with an error message
severity: high
kind: resource_boundary
modality: must_not
consequence: 'assetalloc_account.py:59 contains sys.exit(''Error: CLI not yet implemented''), causing immediate termination
of CLI execution with no asset allocation output, blocking automated workflows'
stage_ids:
- asset_allocation_by_account
- id: finance-C-059
when: When converting account balances to operating currency
action: verify that at least one account balance exists in the operating currency before computing percentages
severity: high
kind: domain_rule
modality: must
consequence: When no accounts have balances in the configured operating currency, rrows stays empty, leading to portfolio_total=0
and division by zero, producing no allocation output despite valid configuration
stage_ids:
- asset_allocation_by_account
- id: finance-C-061
when: When returning table data for Fava template rendering
action: format output as (title, table_data) tuple with correct rtypes structure
severity: high
kind: architecture_guardrail
modality: must
consequence: Fava template expects (title, table_data) tuple structure from table_list_renderer macro; incorrect tuple
format causes Jinja2 template rendering failure and empty output
stage_ids:
- asset_allocation_by_account
- id: finance-C-062
when: When using include_children configuration option
action: call cost_or_value with include_children parameter correctly forwarded to accapi
severity: medium
kind: operational_lesson
modality: must
consequence: include_children defaults to False in config.get() but is passed to cost_or_value; incorrect handling causes
child account balances to be excluded when user expects them included, producing incorrect portfolio totals
stage_ids:
- asset_allocation_by_account
- id: finance-C-063
when: When adding new selection strategy types via pattern_type
action: use unvalidated user input as pattern_type without verifying the corresponding function exists
severity: high
kind: domain_rule
modality: must_not
consequence: Dynamic function lookup via globals()['by_' + pattern_type] with unvalidated config input allows attacker
or misconfigured user to trigger KeyError exceptions, potentially exposing internal module structure or causing denial
of service
stage_ids:
- asset_allocation_by_account
- id: finance-C-064
when: When presenting asset allocation results
action: claim the percentages represent actual market timing or real-time trading accuracy
severity: medium
kind: claim_boundary
modality: must_not
consequence: Asset allocation percentages are computed from historical cost basis and may not reflect current market values,
especially for assets with significant unrealized gains; presenting these as real-time allocation overstates precision
of actual portfolio composition
stage_ids:
- asset_allocation_by_account
- id: finance-C-065
when: When running asset allocation by account via Beancount CLI
action: expect the same functionality as the Fava web UI implementation
severity: high
kind: claim_boundary
modality: must_not
consequence: CLI implementation explicitly exits with error and cost_or_value is not implemented for AccAPI; users attempting
CLI usage will receive immediate termination, not a degraded but functional experience
stage_ids:
- asset_allocation_by_account
- id: finance-C-066
when: When the pattern matches accounts but they have zero balance
action: handle the empty result set gracefully with informative output
severity: medium
kind: operational_lesson
modality: must
consequence: When pattern matches accounts but all have zero balance in operating currency, rrows becomes empty, portfolio_total=0,
and percentage calculation fails silently, producing no output to indicate why allocation is empty
stage_ids:
- asset_allocation_by_account
- id: finance-C-067
when: When using regex patterns with account name matching
action: compile regex patterns with error handling for invalid regex syntax
severity: high
kind: operational_lesson
modality: must
consequence: re.compile(pattern) at libaaacc.py:29 and :48 raises re.error for invalid regex patterns, crashing the entire
asset allocation report and preventing viewing of any portfolio data
stage_ids:
- asset_allocation_by_account
- id: finance-C-068
when: When building the cash drag analysis table
action: filter out rows where position equals empty Inventory before displaying
severity: high
kind: domain_rule
modality: must
consequence: Empty zero-value rows will appear in the table output, cluttering the display with meaningless rows showing
no cash balances
stage_ids:
- cash_drag
- id: finance-C-070
when: When filtering cash positions by minimum threshold
action: apply min_threshold only after filtering empty positions and using converted position value
severity: high
kind: architecture_guardrail
modality: must
consequence: Threshold filtering may fail with AttributeError when position inventory contains no positions, or may compare
wrong currency values
stage_ids:
- cash_drag
- id: finance-C-071
when: When configuring the cash drag module
action: set accounts_exclude_pattern to exclude wallet cash and zero-sum accounts from cash drag analysis
severity: medium
kind: operational_lesson
modality: must
consequence: Physical wallet cash (like Cash-In-Wallet) and zero-sum reconciliation accounts will incorrectly appear as
uninvested cash drag
stage_ids:
- cash_drag
- id: finance-C-072
when: When running cash drag in command-line mode vs Fava
action: use AccAPI for CLI and FavaInvestorAPI for Fava extension, both providing consistent query_func interface
severity: high
kind: architecture_guardrail
modality: must
consequence: Cash drag analysis will fail entirely when running from command line, only working within Fava web interface
stage_ids:
- cash_drag
- id: finance-C-073
when: When using get_only_position() on Inventory objects
action: verify Inventory contains exactly one position before calling get_only_position()
severity: high
kind: domain_rule
modality: must
consequence: ValueError exception will be raised when get_only_position() is called on multi-position Inventory, crashing
the cash drag analysis
stage_ids:
- cash_drag
- id: finance-C-074
when: When presenting cash drag analysis results
action: display balances converted to the primary operating currency for consistent comparison
severity: medium
kind: architecture_guardrail
modality: must
consequence: Multi-currency cash holdings cannot be compared or summed meaningfully, making cash drag analysis unreliable
across different currencies
stage_ids:
- cash_drag
- id: finance-C-075
when: When identifying cash commodities via metadata
action: check for metadata_label_cash set to value 100 on commodity declarations
severity: medium
kind: domain_rule
modality: must
consequence: Money market funds and short-term bonds tagged with asset_allocation_Bond_Cash metadata will not be recognized
as cash equivalents
stage_ids:
- cash_drag
- id: finance-C-076
when: When claiming cash drag detection capabilities
action: claim real-time brokerage balance synchronization or live trading integration
severity: high
kind: claim_boundary
modality: must_not
consequence: Users will expect live cash position updates and automated investment capabilities that the system cannot
provide, leading to unmet expectations
stage_ids:
- cash_drag
- id: finance-C-077
when: When configuring accounts_pattern for cash drag analysis
action: use regex pattern anchored to asset account hierarchy (e.g., '^Assets:.*') to avoid false matches
severity: high
kind: operational_lesson
modality: must
consequence: Income, expense, or liability accounts matching the pattern will incorrectly contribute to cash drag analysis
with spurious amounts
stage_ids:
- cash_drag
- id: finance-C-078
when: When accepting user configurations for cash drag module
action: accept an empty string as accounts_exclude_pattern without handling it as the no-exclusion case
severity: medium
kind: resource_boundary
modality: must_not
consequence: Empty exclude pattern will cause SQL query to fail or produce incorrect results due to malformed regex in
WHERE clause
stage_ids:
- cash_drag
- id: finance-C-082
when: When implementing tax loss harvesting for a client's accounts
action: Verify that Beancount booking method is set to STRICT (Specific Identification of Shares)
severity: high
kind: domain_rule
modality: must
consequence: Using average cost method or FIFO/LIFO while claiming SpecID-based TLH violates tax regulations, causing
disallowed losses and potential audits
stage_ids:
- tax_loss_harvesting
- id: finance-C-084
when: When implementing TLH for users with multiple substantially identical securities
action: Read substantially identical relationships from both a__substidenticals and a__equivalents commodity metadata
severity: high
kind: domain_rule
modality: must
consequence: Missing equivalent fund relationships (like VOO/VFINX/VFIAX) causes wash sales when users switch between
share classes of the same fund
stage_ids:
- tax_loss_harvesting
- id: finance-C-085
when: When deploying tax loss harvesting for a jurisdiction
action: Claim the tool works for non-US tax jurisdictions without explicit adaptation
severity: high
kind: claim_boundary
modality: must_not
consequence: US-specific rules (SpecID, wash sale 30-day window, >365-day long-term) do not apply to other countries,
leading to incorrect tax advice
stage_ids:
- tax_loss_harvesting
- id: finance-C-086
when: When presenting tax loss harvesting results to users
action: Present harvest recommendations without explicit financial/tax advice disclaimer
severity: high
kind: claim_boundary
modality: must_not
consequence: Without proper disclaimer, users may treat automated suggestions as professional tax advice, violating regulatory
requirements and causing financial harm
stage_ids:
- tax_loss_harvesting
- id: finance-C-087
when: When integrating tax loss harvesting into Fava
action: Apply Fava GUI time filters to the TLH module
severity: high
kind: resource_boundary
modality: must_not
consequence: Fava time filters cause unpredictable results in TLH since the module uses TODAY() for wash sale calculation
and needs current market prices
stage_ids:
- tax_loss_harvesting
- id: finance-C-088
when: When handling partial wash sales in tax loss harvesting
action: Display complex wash sale scenarios that require sophisticated IRS matching rules
severity: medium
kind: resource_boundary
modality: must_not
consequence: Displaying partial wash sales with ambiguous purchase/sale matching confuses users and may lead to incorrect
tax filings
stage_ids:
- tax_loss_harvesting
- id: finance-C-089
when: When implementing TLH wash sale detection across accounts
action: Configure dividend reinvestment to be OFF for every ticker across every account
severity: high
kind: operational_lesson
modality: must
consequence: Dividend reinvestment creates new purchases within the wash sale window, silently invalidating harvest recommendations
and causing wash sales
stage_ids:
- tax_loss_harvesting
- id: finance-C-090
when: When calculating harvestable losses in TLH summary
action: Include wash-sale-affected losses in the total harvestable loss summary
severity: high
kind: operational_lesson
modality: must_not
consequence: Summary includes losses that will be disallowed due to wash sales, overstating actual harvestable tax benefit
stage_ids:
- tax_loss_harvesting
- id: finance-C-092
when: When implementing TLH for non-standard Beancount ledgers
action: Verify account numbering follows Beancount convention with account_sortkey starting with 0 or 1
severity: high
kind: architecture_guardrail
modality: must
consequence: Hardcoded account_sortkey pattern '^[01]' causes non-standard ledgers to fail silently without finding any
harvestable lots
stage_ids:
- tax_loss_harvesting
- id: finance-C-093
when: When implementing tax loss harvesting queries
action: Use TODAY() function in SQL queries instead of hardcoded dates
severity: high
kind: architecture_guardrail
modality: must
consequence: Hardcoded dates cause look-ahead bias where future information influences current recommendations, and queries
become stale over time
stage_ids:
- tax_loss_harvesting
- id: finance-C-097
when: When querying lot data for gains minimization
action: Verify market_value inventory contains exactly one position using get_only_position()
severity: high
kind: domain_rule
modality: must
consequence: Inventory with multiple positions causes ambiguous gain calculation and incorrect lot pricing, leading to
wrong tax estimates
stage_ids:
- minimize_gains
- id: finance-C-098
when: When calculating cumulative tax columns in gains minimization
action: Calculate cumulative proceeds, taxes, and gains by iterating through sorted lots in order
severity: high
kind: domain_rule
modality: must
consequence: Cumulative columns show incorrect incremental tax burden if calculated out of order, breaking the progressive
selling guidance feature
stage_ids:
- minimize_gains
- id: finance-C-099
when: When implementing lot selection algorithm in gains minimization
action: Design lot_selection_algorithm as a replaceable/pluggable component for customization
severity: medium
kind: resource_boundary
modality: must
consequence: Hardcoded lot selection logic prevents users from implementing jurisdiction-specific or personalized selling
strategies
stage_ids:
- minimize_gains
- id: finance-C-100
when: When using the gains minimization results for tax planning
action: Claim that results account for asset allocation constraints or tax-advantaged account positioning
severity: high
kind: resource_boundary
modality: must_not
consequence: Results misrepresent tax optimization by ignoring portfolio rebalancing impacts and tax-advantaged account
considerations, potentially leading to unintended portfolio drift
stage_ids:
- minimize_gains
- id: finance-C-102
when: When replacing the lot selection algorithm in gains minimization
action: Maintain the est_tax_percent output column and ascending sort for compatibility with downstream cumulative calculations
severity: high
kind: architecture_guardrail
modality: must
consequence: Breaking the est_tax_percent interface breaks cumulative column calculations and interpolation functions
that rely on sorted lot order
stage_ids:
- minimize_gains
- id: finance-C-103
when: When accessing configuration in gains minimization
action: Retrieve minimizegains-specific config via accapi.get_custom_config('minimizegains')
severity: high
kind: architecture_guardrail
modality: must
consequence: Accessing config directly bypasses the abstraction layer, causing CLI and Fava plugin interfaces to fail
stage_ids:
- minimize_gains
- id: finance-C-105
when: When presenting tax burden estimates from gains minimization
action: Claim results as guaranteed tax savings or accurate tax liability predictions
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting estimates as guarantees misleads users about actual tax outcomes, which depend on individual circumstances,
tax year changes, and jurisdiction-specific rules
stage_ids:
- minimize_gains
- id: finance-C-106
when: When using gains minimization for tax planning decisions
action: Claim that the tool provides wash sale avoidance analysis or considers 30-day rebalancing windows
severity: high
kind: claim_boundary
modality: must_not
consequence: Unlike tax_loss_harvesting module, minimizegains does not check for wash sale implications, leading users
to believe they have comprehensive tax planning when critical rules are missing
stage_ids:
- minimize_gains
- id: finance-C-107
when: When building lot tables with single positions in gains minimization
action: Group by cost_date, currency, cost_currency, cost_number so that each lot is analyzed separately
severity: high
kind: domain_rule
modality: must
consequence: Incorrect grouping causes mixed lots to be analyzed together, breaking gain calculation per-lot and preventing
granular tax optimization
stage_ids:
- minimize_gains
- id: finance-C-108
when: When interpolating tax burden for a specific liquidation amount
action: Use linear interpolation between cumulative rows when amount falls between two cu_proceeds values
severity: high
kind: domain_rule
modality: must
consequence: Non-linear interpolation causes incorrect tax burden estimates for specific liquidation amounts, leading
to suboptimal selling decisions
stage_ids:
- minimize_gains
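# Worked example for finance-C-108 (illustrative numbers, not from the source): if two adjacent
# cumulative rows have cu_proceeds 1000 and 2000 with cumulative tax 50 and 150, a liquidation
# amount of 1500 interpolates to 50 + (1500 - 1000) / (2000 - 1000) * (150 - 50) = 100.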
- id: finance-C-109
when: When implementing a metadata summarizer config for Beancount
action: specify directive_type values other than 'accounts' or 'commodities'
severity: high
kind: domain_rule
modality: must_not
consequence: The build_table function only handles 'accounts' and 'commodities' directive types, and any other value causes
silent failure with empty table output
stage_ids:
- metadata_summarizer
- id: finance-C-110
when: When processing commodity leaf accounts in the summarizer
action: exclude commodity leaf accounts that have an open parent account from output
severity: high
kind: domain_rule
modality: must
consequence: Commodity accounts (uppercase names) duplicate their parent's metadata in summaries, causing confusing redundant
rows in the output table
stage_ids:
- metadata_summarizer
- id: finance-C-111
when: When building table rows from Open directive metadata
action: fill missing column values with empty strings rather than omitting the row or column
severity: medium
kind: domain_rule
modality: must
consequence: Tables become misaligned with missing cells or columns omitted entirely, making output unreadable and breaking
sort operations on the table
stage_ids:
- metadata_summarizer
- id: finance-C-112
when: When using meta_prefix filtering with specified columns
action: construct column names by concatenating meta_prefix with each specified column name
severity: high
kind: domain_rule
modality: must
consequence: Column matching fails and tables display empty values even when matching metadata exists, because the prefix
is not properly prepended
stage_ids:
- metadata_summarizer
- id: finance-C-113
when: When using col_labels to rename metadata columns
action: preserve the order of columns as specified in the columns config array
severity: high
kind: domain_rule
modality: must
consequence: Column order becomes inconsistent between the header and row data, causing ValueError exceptions when creating
namedtuples or misaligned table output
stage_ids:
- metadata_summarizer
- id: finance-C-114
when: When defining column labels in the summarizer config
action: include spaces in col_labels values
severity: high
kind: domain_rule
modality: must_not
consequence: Namedtuple field names cannot contain spaces, causing ValueError exceptions when building table output and
crashing the summarizer
stage_ids:
- metadata_summarizer
- id: finance-C-115
when: When defining namedtuple field names from config column labels
action: use column names that start with digits or contain invalid Python identifier characters
severity: high
kind: domain_rule
modality: must_not
consequence: Namedtuple requires valid Python identifiers as field names, causing ValueError exceptions when column labels
start with numbers or contain special characters
stage_ids:
- metadata_summarizer
- id: finance-C-116
when: When executing the active commodities SQL query
action: use the Beancount convention 'account_sortkey(account) ~ "^[01]"' for filtering investment accounts
severity: medium
kind: domain_rule
modality: must
consequence: Incorrect account numbering filter causes commodity holdings to be missed from 'active_only' summaries, making
market_value calculations incomplete or zero
stage_ids:
- metadata_summarizer
- id: finance-C-117
when: When retrieving the operating currency for balance calculations
action: access only index [0] of the operating currencies list, assuming at least one currency is defined
severity: high
kind: resource_boundary
modality: must
consequence: IndexError exception crashes the summarizer when the Beancount file does not define any operating_currency
option
stage_ids:
- metadata_summarizer
- id: finance-C-118
when: When compiling the acc_pattern regex in account filtering
action: provide a valid Python regex pattern string for account matching
severity: high
kind: resource_boundary
modality: must
consequence: Invalid regex syntax causes re.compile to raise re.error, crashing the summarizer before any table
generation occurs
stage_ids:
- metadata_summarizer
- id: finance-C-119
when: When running the summarizer without a fava-extension config
action: return an empty dictionary from get_custom_config when no config is found
severity: medium
kind: resource_boundary
modality: must
consequence: Without proper empty dict handling, build_tables iterates over an empty list and produces no output tables,
silently failing
stage_ids:
- metadata_summarizer
- id: finance-C-120
when: When implementing metadata prefix filtering with special metadata columns
action: use meta_prefix either with specified_cols or without, never both modes simultaneously
severity: medium
kind: operational_lesson
modality: must
consequence: Conflicting configuration causes incorrect column selection logic, either returning too many or too few columns
in the output
stage_ids:
- metadata_summarizer
- id: finance-C-121
when: When processing closed accounts in metadata summarization
action: exclude accounts that have a Close directive from the output
severity: medium
kind: architecture_guardrail
modality: must
consequence: Closed accounts continue appearing in metadata tables, showing stale contact information for accounts no
longer in use
stage_ids:
- metadata_summarizer
- id: finance-C-122
when: When adding special 'account' and 'balance' columns to account metadata
action: conditionally add these columns based on whether they appear in the columns config
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing special column handling causes KeyError when accessing row keys, or duplicate columns when 'account'/'balance'
exist in both metadata and as special additions
stage_ids:
- metadata_summarizer
- id: finance-C-123
when: When implementing the summarizer module
action: access data exclusively through the AccAPI/FavaInvestorAPI abstraction, not directly from Beancount internals
severity: high
kind: architecture_guardrail
modality: must
consequence: Direct Beancount API access bypasses the abstraction layer, making the code incompatible with Fava CLI mode
and reducing portability
stage_ids:
- metadata_summarizer
- id: finance-C-124
when: When requesting the commodity market_value column
action: enable active_only mode to populate market_value data from current holdings
severity: medium
kind: operational_lesson
modality: must
consequence: Without active_only, market_value column shows empty values for all commodities, making the table misleading
as it omits actual holding values
stage_ids:
- metadata_summarizer
- id: finance-C-125
when: When summarizing metadata from Beancount directives
action: claim support for metadata on transaction postings or other directive types beyond Open directives
severity: high
kind: claim_boundary
modality: must_not
consequence: Documentation explicitly states only Open directive metadata is supported; claiming broader support violates
documented functionality and misleads users
stage_ids:
- metadata_summarizer
- id: finance-C-126
when: When comparing metadata summarizer output between backtest and live mode
action: expect identical metadata values if the Beancount ledger entries differ between periods
severity: low
kind: claim_boundary
modality: must_not
consequence: Metadata comes from Open directive entries; if accounts are opened/closed or metadata values change, the
summarizer output will differ, making it unsuitable for direct performance comparisons
stage_ids:
- metadata_summarizer
- id: finance-C-132
when: When generating metadata for auto-computed ticker attributes
action: Prefix auto-generated metadata labels with 'a__' to distinguish from user-configured 'asset_allocation_' namespace
severity: high
kind: architecture_guardrail
modality: must
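consequence: Namespace collision causes user-configured asset allocation to be overwritten by system-generated attributes, or vice versa, producing incorrect asset classification reports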
- id: finance-C-133
when: When presenting or reporting the system's tax optimization capabilities to users
action: Claim to offer tax advice, financial advice, or investment recommendations; the system is informational only
severity: high
kind: claim_boundary
modality: must_not
consequence: Users may make investment decisions based on unverified tax calculations, leading to unexpected tax liability
and potential IRS penalties
- id: finance-C-134
when: When presenting wash sale analysis to users outside the United States
action: Claim wash sale detection accuracy — the 30-day window is hardcoded for US IRS wash sale rules only
severity: high
kind: claim_boundary
modality: must_not
consequence: International users receive incorrect wash sale warnings based on US-specific rules, causing them to either
miss legitimate harvesting opportunities or avoid legitimate rebalancing trades
- id: finance-C-135
when: When promoting the system to non-Beancount users
action: Claim compatibility with non-Beancount accounting systems — the entire architecture depends on Beancount's data
model
severity: high
kind: claim_boundary
modality: must_not
consequence: Users attempt to use the system with incompatible ledgers, resulting in complete failure to generate any
reports
- id: finance-C-136
when: When promoting or describing the system's capabilities
action: Claim real-time trading execution capability — this is a read-only analysis and reporting system with no execution
interface
severity: high
kind: claim_boundary
modality: must_not
consequence: Users expect automated trade execution that does not exist, leading to missed investment opportunities and
broken workflows
- id: finance-C-137
when: When describing tax optimization capabilities to users
action: Claim tax optimization accuracy for users without US SpecID/STRICT lot booking — the system assumes specific identification
of shares
severity: high
kind: claim_boundary
modality: must_not
consequence: Users on average cost basis accounting receive incorrect harvestable loss calculations, leading to tax filing
errors or missed deductions
- id: finance-C-138
when: When using Fava date filters with the Tax Loss Harvester module
action: Expect accurate TLH results — Fava date filter selection leads to unpredictable results with the TLH module
severity: high
kind: resource_boundary
modality: must_not
consequence: Selecting time filters in Fava causes TLH to show incorrect lots, wash sale detection, or summary values
- id: finance-C-139
when: When displaying wash sale analysis to users
action: Claim comprehensive wash sale coverage — partial wash sales and complex matching scenarios are not displayed
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users rely on incomplete wash sale information and trigger wash sales inadvertently, resulting in disallowed
loss deductions and IRS adjustments
- id: finance-C-140
when: When running gains minimization analysis
action: Claim asset allocation preservation — the minimizegains algorithm does not account for asset allocation shifts
caused by selling
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users following the minimization order inadvertently shift their portfolio allocation, leading to unintended
risk profile changes
- id: finance-C-141
when: When configuring tax rates for gains minimization
action: Set st_tax_rate and lt_tax_rate to actual applicable tax rates — default value of 1.0 (100%) produces wildly incorrect
tax estimates
severity: high
kind: resource_boundary
modality: must
consequence: Default tax rates cause est_tax_percent to be grossly incorrect, leading users to sell wrong lots and overpay
taxes by up to 100%
- id: finance-C-142
when: When implementing asset aggregation or reporting logic that combines values across multiple positions
action: Verify that all values are normalized to the single base currency stored in the root node before aggregation; if base
currency is missing or values have mixed currencies, fail with an explicit error rather than attempting silent aggregation
severity: high
kind: domain_rule
modality: must
consequence: Silent mixed-currency aggregation produces nonsensical aggregate percentages and misleading portfolio reports,
causing investors to make decisions based on fundamentally invalid data
derived_from_bd_id: BD-002
- id: finance-C-143
when: When adding or modifying account selection strategies in the asset allocation by account stage
action: Implement new strategies as by_* functions that match the pattern_type string from configuration; any new strategy
requires a corresponding by_* function with the exact string match expected by the config parser
severity: high
kind: architecture_guardrail
modality: must
consequence: Strategies without matching by_* functions are silently skipped, causing only the default strategies to run
and producing incomplete or incorrect account selection results
derived_from_bd_id: BD-004
- id: finance-C-144
when: When implementing wash sale detection logic for tax loss harvesting
action: Read substantially identical ticker groups from beancount commodity metadata (Custom directives); wash sale applies
only within user-defined groups marked as substantially identical; cross-group transactions must be treated as distinct
securities
severity: high
kind: domain_rule
modality: must
consequence: Incorrectly treating cross-group transactions as substantially identical causes false wash sale disallowances,
unnecessarily reducing realized losses and increasing tax liability
derived_from_bd_id: BD-009
- id: finance-C-145
when: When using the framework's default tax rate configuration for US 2024 (ST=37%, LT=20%)
action: Verify that short_term_rate and long_term_rate from configuration match the actual jurisdiction tax brackets;
if operating outside US 2024, explicitly configure rates via beancount Custom directives before running tax analysis
severity: medium
kind: operational_lesson
modality: should
consequence: Default US tax rates applied to non-US portfolios systematically produce incorrect tax calculations, causing
either overpayment (if rates too high) or underpayment (if rates too low) with audit exposure
derived_from_bd_id: BD-013
- id: finance-C-146
when: When implementing or modifying wash sale detection queries
action: Verify the 30-day wash sale window uses 61 total days (30 before + 1 sale day + 30 after); if modifying the window,
change all related date calculations consistently to maintain the full 61-day look-back/look-forward span
severity: high
kind: domain_rule
modality: must
consequence: Using inconsistent wash sale windows (e.g., only 30 days total) fails to capture the full IRS 30-day rule,
potentially reporting wash sale adjustments that are incomplete or missing, risking non-compliance
derived_from_bd_id: BD-007
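# Worked example for finance-C-146 (illustrative date, not from the source): for a sale on
# 2024-03-15 the window runs from 2024-02-14 through 2024-04-14 inclusive, which is
# 30 + 1 + 30 = 61 calendar days.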
- id: finance-C-147
when: When implementing tax lot liquidation order in minimize_gains analysis
action: 'Sort lots by estimated tax percentage in ascending order: negative gains (losses) sorted with most negative first
for harvesting, positive gains sorted by lowest tax rate first; use this ordering to determine which lots are selected
for liquidation to minimize immediate tax burden'
severity: high
kind: domain_rule
modality: must
consequence: Using FIFO or LIFO instead of tax-optimized sorting produces sub-optimal tax outcomes, potentially causing
investors to pay more taxes than necessary on realized gains
derived_from_bd_id: BD-011
- id: finance-C-149
when: When implementing security substitution logic for NAV calculations or tax lot tracking
action: Verify both 'a__equivalents' and 'a__substidenticals' metadata fields are combined by default in RelateTickers.substidenticals();
if using equivalents_only=True, explicitly document that only equivalents are being used and acknowledge reduced substitution
coverage
severity: high
kind: domain_rule
modality: must
consequence: Missing either field from the combination causes misclassified securities in NAV calculations and tax lot
tracking, leading to incorrect portfolio valuations and potential tax reporting errors
derived_from_bd_id: BD-099
- id: finance-C-150
when: When implementing metadata column filtering logic in the summarizer
action: Preserve prefix-based wildcard matching (e.g., 'contact_' matches 'contact_phone', 'contact_email') for flexible
metadata column selection
severity: high
kind: domain_rule
modality: must
consequence: Changing prefix matching to exact matching breaks user-defined flexible schemas; applications relying on
wildcard metadata lookups will fail to find expected columns
derived_from_bd_id: BD-014
- id: finance-C-155
when: When calculating cumulative tax impact for progressive sell simulations
action: Compute cumulative columns (running totals of selling amounts and associated taxes) AFTER sorting the lot list,
not before — ensures cumulative columns reflect the correct progressive tax situation as lots are added in sorted order
severity: medium
kind: operational_lesson
modality: must
consequence: Computing cumulative columns before sorting produces incorrect running totals; users simulating different
sell quantities see wrong tax impact at each level and may make suboptimal tax decisions
derived_from_bd_id: BD-012
- id: finance-C-158
when: When implementing or refactoring ticker relationship logic for tax loss harvesting
action: Apply graph-based transitive closure when inferring TLH partner relationships — if ticker A has a TLH partner
B, and ticker B has a TLH partner C, then tickers A, B, and C must be treated as a single TLH group
severity: high
kind: domain_rule
modality: must
consequence: Without transitive closure, TLH recommendations may violate wash sale rules when the same underlying position
is held across multiple tickers in the same group
derived_from_bd_id: BD-016
- id: finance-C-159
when: When configuring or modifying the loss_threshold parameter for tax loss harvesting
action: Verify that loss_threshold=1 dollar matches the actual transaction cost structure for the account — if per-trade
costs exceed $1, adjust threshold upward to avoid harvesting trivial losses that generate net negative tax benefit
severity: medium
kind: operational_lesson
modality: should
consequence: Default loss_threshold of $1 may trigger harvests for positions with unrealized losses below transaction
costs, creating net tax loss after accounting for brokerage fees and spread
derived_from_bd_id: BD-024
- id: finance-C-160
when: When configuring account filters for tax loss harvesting or gain minimization modules
action: Verify that account_sortkey values for all accounts intended for tax optimization follow the ^[01] regex pattern;
accounts not matching this pattern are silently excluded from TLH and minimizegains processing
severity: medium
kind: operational_lesson
modality: should
consequence: Misconfigured account sortkey values silently exclude accounts from tax optimization, causing missed harvesting
opportunities or suboptimal tax recommendations
derived_from_bd_id: BD-026
- id: finance-C-162
when: When processing recent purchases for wash sale detection across substantially identical funds
action: Deduplicate recent purchases by ticker symbol before wash sale window check — when multiple substantially identical
funds hold the same underlying position, only one ticker entry per underlying asset should appear in wash sale analysis
severity: high
kind: domain_rule
modality: must
consequence: Without ticker-level deduplication, wash sale detection produces false positives triggering multiple overlapping
wash sale windows for the same underlying position across different fund wrappers
derived_from_bd_id: BD-030
- id: finance-C-164
when: When implementing lot selection logic for gain minimization or tax-aware rebalancing
action: Sort candidate lots by estimated tax percentage in ascending order — lots with lowest tax impact must be prioritized
for sale before lots with higher tax impact to minimize tax liability
severity: high
kind: domain_rule
modality: must
consequence: Incorrect lot sort order causes suboptimal tax outcomes where higher-tax-impact lots are sold before lower-tax-impact
lots, increasing actual tax liability beyond the minimum achievable
derived_from_bd_id: BD-042
- id: finance-C-165
when: When filtering ticker lists for active TLH group analysis and recommendations
action: Exclude archived tickers from TLH group analysis — archived tickers represent positions no longer held and generating
sales recommendations for them produces recommendations users cannot act upon
severity: medium
kind: operational_lesson
modality: must
consequence: Including archived tickers in TLH recommendations causes the system to suggest selling positions that no
longer exist in the portfolio, creating confusion and wasted analysis effort
derived_from_bd_id: BD-018
- id: finance-C-167
when: When persisting data to cache files (e.g., pickle files) in production
action: Assume immutable append-only semantics with timestamp and hash chaining exist in the framework — the framework
lacks these capabilities and pickle cache files can be arbitrarily modified
severity: high
kind: claim_boundary
modality: must_not
consequence: Without immutable append-only semantics, cache files can be arbitrarily modified which breaks the audit trail
and compromises data integrity verification in production environments
derived_from_bd_id: BD-GAP-006
- id: finance-C-168
when: When implementing data persistence operations that require audit trail
action: Implement immutable append-only semantics for each data write operation, including timestamp and hash chaining
to create verifiable audit trail — use append-only logging with SHA-256 hash of previous entry
severity: high
kind: domain_rule
modality: must
consequence: Without timestamp and hash chaining, cache modifications cannot be detected or audited, making it impossible
to verify data integrity or trace unauthorized changes in production
derived_from_bd_id: BD-GAP-006
- id: finance-C-169
when: When processing datetime values throughout the system
action: Assume consistent timezone handling across all operations; the framework has mixed timezone handling, with some
timestamps having tzinfo and others without
severity: high
kind: claim_boundary
modality: must_not
consequence: Mixed timezone handling causes subtle datetime comparison bugs where timestamps with and without timezone
info are compared incorrectly, leading to incorrect scheduling, reporting, or calculation errors
derived_from_bd_id: BD-GAP-007
- id: finance-C-170
when: When handling timestamps throughout the system
action: Normalize all datetime operations to UTC and use timezone-aware datetime objects consistently; apply UTC normalization
at data ingestion and verify that all stored timestamps have explicit timezone information
severity: high
kind: domain_rule
modality: must
consequence: Without UTC normalization, datetime comparisons across different system components produce incorrect results,
causing wrong order execution times, misplaced transaction records, and incorrect NAV calculations
derived_from_bd_id: BD-GAP-007
- id: finance-C-171
when: When implementing asset allocation calculation logic
action: Bucket unallocated percentages (when metadata sums to less than 100%) into an 'unknown' category rather than silently
scaling or rejecting the configuration
severity: high
kind: domain_rule
modality: must
consequence: Without unknown bucket handling, unallocated percentages are silently misallocated or cause configuration
rejection, leading to incorrect portfolio allocation calculations and misleading performance attribution
derived_from_bd_id: BD-033
- id: finance-C-172
when: When calculating NAV scaling ratios
action: Limit the NAV scaling calculation to the most recent 10 historical ratio observations — use sliding window approach
where window shrinks to available count when fewer than 10 ratios exist; use fallback behavior when zero ratios are
available
severity: high
kind: domain_rule
modality: must
consequence: Using more than 10 historical ratios introduces stale data drift into NAV calculations, causing scaled NAV
values to diverge from current market conditions and leading to incorrect performance measurement
derived_from_bd_id: BD-071
- id: finance-C-173
when: When modifying ScaledNAV or RelateTickers class hierarchy
action: Verify that any changes to build_commodity_groups() method or metadata format (a__equivalents, a__substidenticals)
account for impact on both ScaledNAV and RelateTickers classes — use composition over inheritance or explicit interface
contracts to decouple if changes are frequent
severity: medium
kind: operational_lesson
modality: should
consequence: Tight coupling through inheritance means changes to build_commodity_groups() or metadata format silently
affect both classes, potentially causing unexpected behavior in scaled NAV calculations or ticker relationship analysis
derived_from_bd_id: BD-098
- id: finance-C-174
when: When using the framework's expense ratio conversion logic for backtesting or display calculations
action: Verify that expense ratio values are provided in decimal format before the system multiplies by 100; if source
data is already in percentage format, divide by 100 before caching to avoid inflated values
severity: medium
kind: operational_lesson
modality: should
consequence: If expense ratios are sourced in percentage format (e.g., 0.5 for 0.5%) but the cache conversion multiplies
by 100, the stored value becomes 50%, causing strategy cost calculations to overstate expenses by 100x and making
expense-adjusted returns appear artificially low
derived_from_bd_id: BD-063
- id: finance-C-176
when: When implementing hierarchical balance aggregation with recursive subtree computation
action: Validate account hierarchy for circular parent references before running balance aggregation; implement cycle
detection or use iterative depth-first traversal with visited tracking
severity: high
kind: architecture_guardrail
modality: must
consequence: Circular parent references in the account hierarchy cause infinite recursion during subtree balance computation,
leading to stack overflow crashes and complete backtesting failure with no partial results returned
derived_from_bd_id: BD-068
- id: finance-C-178
when: When implementing or refactoring gain term classification logic for short-term vs long-term capital gains
action: Use python-dateutil relativedelta for holding period date arithmetic to ensure an accurate IRS 1-year-and-a-day boundary
calculation; do NOT use simple day-counting (365 days), which is off by one day when the holding period spans a leap day (Feb 29)
severity: high
kind: operational_lesson
modality: must
consequence: Using simple 365-day arithmetic incorrectly classifies gains held across a leap day, causing wrong
short-term vs long-term gain determination that leads to tax miscalculation and potential IRS compliance issues
derived_from_bd_id: BD-072
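# Worked example for finance-C-178 (illustrative dates, not from the source): a lot acquired on
# 2023-03-15 reaches its calendar anniversary on 2024-03-15, but 2023-03-15 plus 365 days is
# 2024-03-14 because 2024-02-29 falls in between; day counting crosses the boundary a day early.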
- id: finance-C-179
when: When configuring or validating the TLH loss threshold parameter
action: Set loss_threshold to a negative value — the threshold must be non-negative; negative values would incorrectly
include profits or zero-change positions as harvestable losses
severity: high
kind: operational_lesson
modality: must_not
consequence: Negative loss_threshold allows harvesting of positions without actual losses, generating wash sale complications
and transaction costs without tax benefit
derived_from_bd_id: BD-074
- id: finance-C-180
when: When implementing or refactoring TLH lot filtering logic
action: Filter candidate lots using 'losses < -loss_threshold' comparison — not 'losses <= -loss_threshold' or 'losses
< loss_threshold' which alter the qualifying boundary
severity: high
kind: operational_lesson
modality: must
consequence: Using <= instead of < includes zero-change lots; using positive comparison flips the logic entirely, harvesting
profits instead of losses — both cause financial harm
derived_from_bd_id: BD-074
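# Worked example for finance-C-180 (illustrative values, not from the source): with
# loss_threshold = 1, a lot with losses = -1.01 passes 'losses < -loss_threshold', while lots
# with losses = -1.00 or -0.50 do not.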
- id: finance-C-181
when: When implementing or refactoring tax burden interpolation between bracket boundaries
action: 'Handle edge cases for proceeds outside bracket range: use floor ratio for proceeds below lowest bracket, use
ceiling ratio for proceeds above highest bracket, and return zero tax for zero proceeds'
severity: medium
kind: operational_lesson
modality: must
consequence: Omitting boundary handling causes undefined interpolation results for out-of-range proceeds values, producing
incorrect tax estimates that could lead to underpayment or overpayment
derived_from_bd_id: BD-075
- id: finance-C-182
when: When configuring or validating the cash drag minimum threshold parameter
action: Set min_threshold to a negative value — the threshold must be non-negative; negative values would incorrectly
flag profits or unchanged positions as cash drag
severity: high
kind: operational_lesson
modality: must_not
consequence: Negative threshold causes incorrect flagging of profitable or neutral positions for liquidation, generating
unnecessary transaction costs and potentially realizing gains that create tax liability
derived_from_bd_id: BD-080
- id: finance-C-183
when: When implementing or refactoring cash drag position filtering logic
action: Filter positions using 'position >= min_threshold' comparison — positions exactly equal to min_threshold are included,
not excluded
severity: medium
kind: operational_lesson
modality: must
consequence: Using > instead of >= excludes borderline positions at exactly the threshold, reducing the pool of harvestable
lots and potentially missing tax optimization opportunities
derived_from_bd_id: BD-080
- id: finance-C-184
when: When implementing or refactoring average tax rate calculation
action: Guard against division by zero when cumulative_proceeds equals zero — return zero average rate or raise explicit
exception; this edge case cannot occur in valid tax scenarios but must be defensively handled
severity: high
kind: domain_rule
modality: must
consequence: Division by zero crashes the tax calculation module, causing backtest pipeline failure and preventing portfolio
tax efficiency analysis
derived_from_bd_id: BD-076
- id: finance-C-185
when: When implementing or refactoring marginal tax rate calculation between tax brackets
action: Detect and handle a zero Δ_proceeds between brackets, which yields an undefined (infinite) marginal rate; return a sentinel
value or skip to next valid bracket range
severity: high
kind: domain_rule
modality: must
consequence: Division by zero from zero Δ_proceeds crashes the tax calculation module, causing backtest pipeline failure
and preventing lot-selection optimization
derived_from_bd_id: BD-077
- id: finance-C-186
when: When implementing or refactoring TLH lot selection ordering logic
action: Sort TLH candidate lots in ascending order of estimated tax percentage to prioritize harvesting highest-tax-impact
losses first — verify stable sort for ties preserving original list order
severity: high
kind: domain_rule
modality: must
consequence: Sorting in descending order harvests lowest-tax-impact losses first, wasting limited harvestable slots on
small losses while missing opportunities to harvest larger losses that provide greater tax savings
derived_from_bd_id: BD-078
- id: finance-C-187
when: When implementing risk calculation or option pricing modules in backtesting
action: Assume the framework handles volatility model family selection and distribution assumption — the framework does
not implement volatility model families (e.g., GARCH, EWMA, historical volatility) or distribution selection (normal,
t-distribution, skew-normal)
severity: high
kind: claim_boundary
modality: must_not
consequence: Without explicit volatility model and distribution selection, the framework may use inappropriate assumptions
causing systematic mispricing of risk measures by 10-30% for strategies with option positions or volatility-dependent
signals
derived_from_bd_id: BD-GAP-004
- id: finance-C-188
when: When calculating factor IC (Information Coefficient) in cross-sectional backtesting
action: Assume the framework handles factor IC demeaning and group alignment — the framework does not implement cross-sectional
IC demeaning (subtracting cross-sectional mean) or proper group alignment before IC computation
severity: high
kind: claim_boundary
modality: must_not
consequence: Without IC demeaning, cross-sectional mean returns contaminate factor IC calculations, causing 5-15% systematic
bias in IC estimates and leading to incorrect factor selection decisions
derived_from_bd_id: BD-GAP-005
- id: finance-C-189
when: When implementing a new subclass of FavaInvestorAPI or extending the API layer
action: 'Implement every required method signature: get_commodity_value, get_cost_basis, get_open_amounts - duck typing
provides no compile-time enforcement; missing methods will not be detected until runtime'
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing or misspelled method signatures cause runtime AttributeError when the framework invokes expected
interface methods, breaking financial calculations and potentially corrupting portfolio valuations
derived_from_bd_id: BD-019
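Because duck typing defers the failure in finance-C-189 to the first call, one option is an explicit check at construction or registration time. The sketch below assumes nothing about the real FavaInvestorAPI beyond the three method names listed in the constraint; `missing_methods`, `require_complete_api`, and the two example classes are hypothetical.

```python
REQUIRED_METHODS = ("get_commodity_value", "get_cost_basis", "get_open_amounts")


def missing_methods(api: object) -> list:
    """List required method names that the object does not provide as callables."""
    return [name for name in REQUIRED_METHODS if not callable(getattr(api, name, None))]


def require_complete_api(api: object) -> None:
    """Fail fast with a readable error instead of a late AttributeError."""
    missing = missing_methods(api)
    if missing:
        raise TypeError(f"{type(api).__name__} is missing required methods: {', '.join(missing)}")


class CompleteAPI:  # hypothetical subclass stand-in; real signatures are not shown here
    def get_commodity_value(self, *args): ...
    def get_cost_basis(self, *args): ...
    def get_open_amounts(self, *args): ...


class MisspelledAPI:  # 'get_cost_bases' is a deliberate typo
    def get_commodity_value(self, *args): ...
    def get_cost_bases(self, *args): ...
    def get_open_amounts(self, *args): ...


require_complete_api(CompleteAPI())      # passes silently
print(missing_methods(MisspelledAPI()))  # ['get_cost_basis']
```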
- id: finance-C-190
when: When implementing iterative financial calculations or optimization algorithms
action: Assume the framework handles convergence automatically without explicit criteria — missing convergence criteria
means no defined stopping point for iterative calculations
severity: high
kind: claim_boundary
modality: must_not
consequence: Without explicit convergence criteria, iterative calculations may not terminate reliably, causing either
infinite loops consuming CPU or premature termination with inaccurate results that silently propagate through financial
computations
derived_from_bd_id: BD-GAP-001
- id: finance-C-191
when: When implementing any iterative calculation or convergence-dependent algorithm in financial modules
action: 'Define explicit convergence parameters: max_iterations (required), tolerance (required), and a convergence_check
callable — document these as class attributes or constructor parameters'
severity: high
kind: domain_rule
modality: must
consequence: Unbounded iterative calculations may run indefinitely in edge cases, causing system hangs or producing inaccurate
financial results that accumulate without obvious warning
derived_from_bd_id: BD-GAP-001
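The convergence parameters required by finance-C-191 can be made explicit as constructor arguments. The following is a sketch under assumptions: the class name `FixedPointSolver`, the `step` callable, and the default absolute-difference check are illustrative and not part of the framework.

```python
from typing import Callable, Optional


class FixedPointSolver:
    """Iterates x -> step(x) with an explicit, documented stopping rule."""

    def __init__(
        self,
        max_iterations: int,
        tolerance: float,
        convergence_check: Optional[Callable[[float, float, float], bool]] = None,
    ) -> None:
        if max_iterations <= 0:
            raise ValueError("max_iterations must be positive")
        if tolerance <= 0:
            raise ValueError("tolerance must be positive")
        self.max_iterations = max_iterations
        self.tolerance = tolerance
        # Default check: absolute change between successive iterates.
        self.convergence_check = convergence_check or (
            lambda previous, current, tol: abs(current - previous) <= tol
        )

    def solve(self, step: Callable[[float], float], x0: float) -> float:
        x = x0
        for _ in range(self.max_iterations):
            nxt = step(x)
            if self.convergence_check(x, nxt, self.tolerance):
                return nxt
            x = nxt
        # Never return a silently inaccurate value.
        raise RuntimeError(f"no convergence after {self.max_iterations} iterations")


solver = FixedPointSolver(max_iterations=50, tolerance=1e-12)
root = solver.solve(lambda x: 0.5 * (x + 2.0 / x), 1.0)  # Babylonian iteration for sqrt(2)
```

Raising on non-convergence, rather than returning the last iterate, keeps an inaccurate result from propagating through later financial computations.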
- id: finance-C-193
when: When classifying assets into portfolio buckets for asset allocation analysis
action: Only recognize commodities with metadata prefix 'asset_allocation_*' for bucketing — do not hardcode commodity
names, use implicit patterns, or implement alternative classification logic
severity: high
kind: domain_rule
modality: must
consequence: Incorrect asset classification changes portfolio allocation percentages, causing misaligned rebalancing decisions
that over-weight or under-weight asset classes relative to investment policy targets
derived_from_bd_id: BD-001
- id: finance-C-194
when: When calculating gain holding period for tax-loss harvesting or capital gains classification
action: Use dateutil.relativedelta for date arithmetic in gain_term calculation — relativedelta correctly handles leap
year edge cases (Feb 29 + 1 year = Feb 28), matching IRS interpretation of 'greater than 1 year'
severity: high
kind: operational_lesson
modality: must
consequence: Using simple 365-day arithmetic misclassifies gains near leap years (e.g., Feb 29 purchase date), applying
incorrect tax rates and causing unexpected tax liability when short-term gains are incorrectly taxed at long-term rates
or vice versa
derived_from_bd_id: BD-008
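To see why finance-C-194 rules out 365-day arithmetic, the sketch below compares it with calendar-year arithmetic. To stay dependency-free it uses a stdlib stand-in, `add_one_year`, that mimics the Feb 29 clamping of `relativedelta(years=1)`; production code should use `dateutil.relativedelta` as the constraint requires. `is_long_term` is a hypothetical helper, not the actual gain_term implementation.

```python
from datetime import date, timedelta


def add_one_year(d: date) -> date:
    """Stdlib stand-in for d + relativedelta(years=1): Feb 29 clamps to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # Feb 29 and the following year is not a leap year
        return d.replace(year=d.year + 1, day=28)


def is_long_term(acquired: date, sold: date) -> bool:
    """Long-term means held for more than one calendar year."""
    return sold > add_one_year(acquired)


acquired, sold = date(2023, 6, 1), date(2024, 6, 1)  # holding spans Feb 29, 2024
naive_long_term = (sold - acquired) > timedelta(days=365)
print(naive_long_term)               # True: 366 days elapsed
print(is_long_term(acquired, sold))  # False: sold exactly on the anniversary
print(add_one_year(date(2020, 2, 29)))  # 2021-02-28
```

The two methods disagree on the one-year anniversary whenever the holding period contains a leap day, which is exactly the boundary where the short-term versus long-term classification is decided.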
- id: finance-C-195
when: When implementing or maintaining version-specific Fava compatibility logic
action: Duplicate version detection logic across different modules (FavaInvestorAPI and other workers) — use a unified
version detection utility with a single source of truth for version thresholds
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Duplicated version thresholds create silent inconsistencies where query results differ based on which module's
version logic is used, causing backtest results to diverge from live trading behavior when Fava releases new versions
derived_from_bd_id: BD-115
- id: finance-C-196
when: When using AccAPI.root_tree() or designing new AccAPI functionality
action: Import fava.core.Tree in beancountinvestorapi.py — the standalone Beancount API should not depend on Fava internals
to maintain portability across different Beancount deployments
severity: medium
kind: architecture_guardrail
modality: must_not
consequence: Fava dependency leak means AccAPI fails when Fava is not installed, preventing use of the 'pure' Beancount
API in environments without Fava and indicating incomplete separation of concerns that could cause API failures
derived_from_bd_id: BD-101
- id: finance-C-197
when: When implementing or modifying API methods (build_price_map, realize, query_func, get_commodity_directives) in FavaInvestorAPI
or AccAPI
action: Verify method signatures and return values remain identical between FavaInvestorAPI and AccAPI implementations
- any divergence breaks cross-context compatibility and causes consumers using the wrong context to receive incorrect
data or silent failures
severity: high
kind: domain_rule
modality: must
consequence: If FavaInvestorAPI and AccAPI implementations diverge on any method (e.g., different parameter handling,
missing fields, or changed return types), code switching between Fava and CLI contexts will receive inconsistent results
without explicit errors
derived_from_bd_id: BD-093
- id: finance-C-198
when: When implementing or modifying account field extraction logic in TLH or minimizegains modules
action: Use libtlh.get_account_field(options) as the sole extraction function - do not duplicate logic or extract similar
functionality inline; changes to this function affect both TLH and minimizegains simultaneously
severity: high
kind: architecture_guardrail
modality: must
consequence: Duplicating account field extraction logic in either TLH or minimizegains creates divergence where one module
may use stale parsing rules while the other uses updated logic, corrupting tax optimization recommendations
derived_from_bd_id: BD-095
- id: finance-C-203
when: When implementing asset allocation calculations in the asset_allocation_by_class stage
action: Remove empty accounts (accounts with no holdings) and zero-balance ancestor accounts from asset allocation calculations
— only include accounts with actual positive balances
severity: high
kind: domain_rule
modality: must
consequence: Including empty accounts in asset allocation percentages distorts the allocation results, causing portfolio
displays to show incorrect positions and potentially leading to poor investment decisions based on misleading allocation
data
derived_from_bd_id: BD-035
- id: finance-C-204
when: When implementing currency conversion logic for multi-currency portfolios in asset_allocation_by_class
action: Convert currencies via operating currencies as an intermediate step when direct conversion is unavailable — implement
fallback routing through common currencies (USD, EUR) so that conversions complete even when direct currency pairs
lack pricing data
severity: high
kind: domain_rule
modality: must
consequence: Without currency conversion fallback, multi-currency portfolios with unavailable direct conversion pairs
will fail to calculate, causing portfolio valuation errors and preventing asset allocation from completing for valid
international portfolios
derived_from_bd_id: BD-037
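The fallback routing described in finance-C-204 might look like the sketch below. It assumes a plain dictionary of pair rates; the function name `convert`, the `rates` layout, and the example rates are illustrative values, not real market data or the framework's price-map API.

```python
from decimal import Decimal


def convert(amount, src, dst, rates, operating_currencies):
    """Convert amount from src to dst, routing via an operating currency if needed.

    rates maps (from_currency, to_currency) -> Decimal rate (hypothetical layout).
    """
    if src == dst:
        return amount
    if (src, dst) in rates:
        return amount * rates[(src, dst)]
    # Fallback: try each operating currency as an intermediate hop.
    for via in operating_currencies:
        if (src, via) in rates and (via, dst) in rates:
            return amount * rates[(src, via)] * rates[(via, dst)]
    raise LookupError(f"no conversion path from {src} to {dst}")


rates = {("GBP", "USD"): Decimal("1.25"), ("USD", "JPY"): Decimal("150")}  # made-up rates
print(convert(Decimal("10"), "GBP", "JPY", rates, ["USD", "EUR"]))  # 1875.00
```

Raising `LookupError` when no path exists keeps the failure explicit instead of silently dropping the position from the allocation.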
- id: finance-C-205
when: When using the framework's default cash detection for cash drag calculations in the cash_drag stage
action: Verify that operating currencies are included as cash holdings by default — check that base currencies held in
accounts are captured in cash calculations unless explicitly excluded in configuration
severity: medium
kind: operational_lesson
modality: should
consequence: Default inclusion of operating currencies as cash affects cash drag detection accuracy; strategies relying
on accurate cash position data may miscalculate idle cash and miss optimization opportunities
derived_from_bd_id: BD-039
- id: finance-C-206
when: When configuring cash drag detection in the cash_drag stage
action: Verify that the default regex pattern '^Assets' matches the portfolio's actual account naming convention — customize
the accounts_pattern parameter if the portfolio uses non-standard account naming structures
severity: medium
kind: operational_lesson
modality: should
consequence: Using the default '^Assets' pattern without verification causes cash drag detection to scan incorrect accounts,
potentially missing cash positions in non-standard account structures and producing incomplete cash drag analysis
derived_from_bd_id: BD-040
- id: finance-C-208
when: When using minimizegains strategy for tax optimization
action: Preserve the leap-year-aware date arithmetic in libtlh.gain_term() using relativedelta — verify that any refactoring
maintains the same relativedelta-based date calculations for tax term classification (short-term vs long-term)
severity: high
kind: operational_lesson
modality: must
consequence: Incorrect leap-year handling in gain_term() corrupts short/long-term tax classification, causing minimizegains
to make suboptimal liquidation decisions based on wrong holding period calculations and users to pay higher taxes than
backtested
derived_from_bd_id: BD-109
- id: finance-C-209
when: When configuring cross-account wash sale detection
action: Verify substantially identical fund classification accuracy (BD-017) before relying on cross-account wash sale
detection — verify each fund pair is correctly marked as equivalent or substantially identical before using BD-030 deduplication
severity: high
kind: operational_lesson
modality: must
consequence: Incorrectly marking substantially identical funds as equivalent causes cross-account wash sale detection
to either miss real violations (false negatives) or trigger false blocks (false positives), leading to either IRS penalties
or unnecessarily restricted trading
derived_from_bd_id: BD-112
- id: finance-C-210
when: When implementing cash commodity detection logic in asset allocation
action: Check asset_allocation_Bond_Cash metadata equals exactly 100 to classify an instrument as cash commodity
severity: high
kind: domain_rule
modality: must
consequence: Using a different threshold or metadata field for cash classification causes incorrect asset allocation,
potentially leading to inappropriate portfolio rebalancing decisions or misrepresentation of actual cash positions
derived_from_bd_id: BD-038
- id: finance-C-211
when: When implementing API calls for CLI mode batch operations
action: Set end_date to None in CLI mode to disable date filtering — must not default to current date like GUI mode
severity: high
kind: domain_rule
modality: must_not
consequence: Defaulting end_date to current date in CLI mode breaks batch operations that need to process all historical
data, causing incomplete analysis when users run scripts against full historical datasets
derived_from_bd_id: BD-053
- id: finance-C-212
when: When implementing configuration extraction for the investor API
action: Extract configuration from fava-extension custom directives in the beancount file, not from separate config files
severity: high
kind: architecture_guardrail
modality: must
consequence: Using separate config files instead of beancount directives breaks configuration version control and portability,
causing configuration drift between environments and loss of audit trail
derived_from_bd_id: BD-054
- id: finance-C-213
when: When using the framework's default tax rate parameters for capital gains calculations
action: Verify that default short_term_tax_rate=1% and long_term_tax_rate=1% match the user's actual tax bracket, adjusting
if the user falls in higher brackets where actual rates may be 15%, 20%, or 37%
severity: high
kind: operational_lesson
modality: must
consequence: Default 1% tax rates significantly underestimate tax liability for most investors, causing backtested portfolio
values to appear materially higher than actual results after tax settlement
derived_from_bd_id: BD-041
- id: finance-C-214
when: When implementing inventory aggregation logic for tax lot tracking
action: Use sequential accumulation into a single Inventory object, not vectorized operations, to preserve lot-level detail
for tax calculations
severity: high
kind: domain_rule
modality: must
consequence: Using vectorized or parallel accumulation loses lot-level granularity, making tax lot tracking incomplete
and causing incorrect tax reporting for accounts with multiple position lots
derived_from_bd_id: BD-081
- id: finance-C-215
when: When implementing inventory aggregation for large portfolios
action: Handle numeric type overflow for cumulative inventory sums exceeding standard integer/float bounds - use Decimal
type or explicit overflow detection
severity: high
kind: domain_rule
modality: must
consequence: Unchecked overflow in inventory accumulation silently wraps or truncates values, causing portfolio totals
to become incorrect and triggering wrong tax lot calculations for large positions
derived_from_bd_id: BD-081
- id: finance-C-216
when: When implementing security clustering for commodity-equivalence groups
action: Use Union-Find data structure with path compression and union by rank for near-constant-time group operations;
do not replace with dict-based or list-based grouping
severity: high
kind: architecture_guardrail
modality: must
consequence: Dict- or list-based grouping can need O(n) work per lookup or merge, causing quadratic slowdown when processing portfolios with thousands
of securities; Union-Find provides near-constant-time operations essential for tax-lot matching
derived_from_bd_id: BD-083
- id: finance-C-217
when: When implementing Union-Find for commodity grouping
action: 'Handle edge cases: self-unions (union(A,A)) must be idempotent, isolated securities form singleton groups, circular
dependencies must resolve correctly via algorithm'
severity: high
kind: domain_rule
modality: must
consequence: Missing edge case handling causes incorrect group membership, leading to wrong commodity-equivalence classification
and broken tax-lot matching across related securities
derived_from_bd_id: BD-083
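A compact Union-Find with path compression and union by rank, covering the edge cases named in finance-C-216 and finance-C-217, can be sketched as below. The class and method names are assumptions, and `BND` is added only as an example of an isolated security.

```python
class UnionFind:
    """Disjoint-set forest keyed by ticker symbol."""

    def __init__(self) -> None:
        self.parent = {}
        self.rank = {}

    def add(self, x: str) -> None:
        # Isolated securities become singleton groups.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x: str) -> str:
        self.add(x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:  # self-unions and cycles are no-ops
            return
        if self.rank[ra] < self.rank[rb]:  # union by rank
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

    def groups(self) -> list:
        out = {}
        for x in self.parent:
            out.setdefault(self.find(x), set()).add(x)
        return sorted(sorted(g) for g in out.values())


uf = UnionFind()
uf.union("VTI", "VTSAX")
uf.union("VTSAX", "VTSMX")
uf.union("VTI", "VTI")    # self-union: idempotent
uf.union("VTSMX", "VTI")  # closes a cycle: no change
uf.add("BND")             # singleton group
print(uf.groups())  # [['BND'], ['VTI', 'VTSAX', 'VTSMX']]
```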
- id: finance-C-218
when: When selecting representative tickers from commodity-equivalence groups
action: Implement deterministic tie-breaking logic for groups with multiple valid candidates; empty groups must produce
no representative, single-element groups must trivially select that element
severity: high
kind: domain_rule
modality: must
consequence: Non-deterministic or missing tie-breaking causes inconsistent reporting across runs, making audit trails
unreliable and potentially causing incorrect tax reporting when different representatives are selected on each execution
derived_from_bd_id: BD-085
- id: finance-C-219
when: When distributing position amounts into allocation buckets
action: Validate that sum of bucket meta_value weights equals 100 for complete coverage; meta_value of 0 distributes nothing,
and meta_value > 100 would overallocate and should be rejected
severity: high
kind: domain_rule
modality: must
consequence: Unvalidated bucket weights causing overallocation or underallocation distorts asset categorization, leading
to incorrect risk reporting and potentially wrong portfolio allocation decisions
derived_from_bd_id: BD-088
- id: finance-C-220
when: When implementing bucket allocation distribution formula
action: 'Use the formula: amount * (meta_value / 100); do not reorder or use alternative distribution methods that could
cause floating-point precision errors with small meta_value percentages'
severity: medium
kind: domain_rule
modality: must
consequence: Alternative formulas or incorrect operator precedence cause miscalculated bucket distributions that silently
miscategorize assets, corrupting portfolio analysis reports
derived_from_bd_id: BD-088
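The validation in finance-C-219 and the formula in finance-C-220 combine naturally in one helper. This is a sketch under assumptions: `distribute` is a hypothetical name, and `asset_allocation_Equity` is an invented bucket key following the `asset_allocation_*` prefix convention.

```python
from decimal import Decimal


def distribute(amount: Decimal, buckets: dict) -> dict:
    """Split amount across buckets using amount * (meta_value / 100)."""
    for name, meta_value in buckets.items():
        if meta_value < 0 or meta_value > 100:
            raise ValueError(f"{name}: meta_value {meta_value} is outside 0..100")
    total = sum(buckets.values(), Decimal(0))
    if total != 100:
        raise ValueError(f"bucket weights sum to {total}, expected 100")
    return {name: amount * (meta_value / Decimal(100)) for name, meta_value in buckets.items()}


weights = {
    "asset_allocation_Equity": Decimal(60),     # hypothetical bucket name
    "asset_allocation_Bond_Cash": Decimal(40),
}
shares = distribute(Decimal("1000"), weights)
print(shares["asset_allocation_Equity"])     # 600.0
print(shares["asset_allocation_Bond_Cash"])  # 400.0
```

Using `Decimal` rather than `float` is a choice made here to sidestep binary rounding on small percentages; the source constraint only fixes the formula itself.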
- id: finance-C-221
when: When calculating asset allocation percentages
action: Pre-validate that portfolio total is non-zero before invoking (balance / total) * 100; callers must check total
> 0 or handle division by zero explicitly
severity: high
kind: domain_rule
modality: must
consequence: Division by zero on empty portfolios produces invalid results that propagate to reporting dashboards, causing
misleading percentage displays and potentially triggering automated rebalancing on zero balances
derived_from_bd_id: BD-067
- id: finance-C-222
when: When implementing TLH partner inference logic in the tax loss harvesting system
action: 'Apply symmetric closure iteratively until fixed point: if A→(B,C) then B→(A,C) and C→(A,B) must be true; empty
partner sets must remain empty; large groups may create many inferred edges and require iteration limits'
severity: high
kind: domain_rule
modality: must
consequence: Incorrect symmetric closure causes wash sale violations to go undetected in complex multi-fund scenarios
where funds share common holdings; the IRS wash sale rule disallows loss deductions when substantially identical securities
are purchased within 30 days before or after the sale
derived_from_bd_id: BD-084
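The fixed-point iteration in finance-C-222 can be sketched as below. The function name `symmetric_closure`, the dictionary-of-sets input, and the `max_rounds` limit are assumptions made for illustration; the real TLH partner data structure is not shown in this document.

```python
def symmetric_closure(partners: dict, max_rounds: int = 100) -> dict:
    """Expand partner sets until every member of a group lists all the others."""
    closed = {fund: set(others) for fund, others in partners.items()}
    for _ in range(max_rounds):
        changed = False
        for fund, others in list(closed.items()):
            group = {fund} | others
            for member in group:
                wanted = group - {member}
                current = closed.setdefault(member, set())
                if not wanted <= current:
                    current |= wanted
                    changed = True
        if not changed:  # fixed point reached
            return closed
    raise RuntimeError(f"partner closure did not stabilise within {max_rounds} rounds")


print(symmetric_closure({"A": {"B", "C"}}))
print(symmetric_closure({"X": set()}))  # {'X': set()}
```

The explicit round limit addresses the note that large groups can create many inferred edges: the function either reaches a fixed point or fails loudly.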
- id: finance-C-223
when: When implementing price ratio calculation (MF_price / ETF_price) in scaled NAVs processing
action: Validate ETF_price is non-zero before division; handle missing ETF price on a date by excluding that observation;
flag or investigate very high ratios as potential data issues
severity: high
kind: domain_rule
modality: must
consequence: Division by zero crashes the calculation pipeline; unhandled missing ETF prices create data gaps in NAV estimation;
very high ratios indicate stale prices or data corruption causing incorrect valuation
derived_from_bd_id: BD-087
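The three checks in finance-C-223 (non-zero divisor, missing observation, suspicious ratio) fit in one pass over the price series. The sketch assumes prices keyed by date string; `price_ratios` and the `max_ratio` threshold of 100 are hypothetical, and the prices are made-up values.

```python
def price_ratios(mf_prices: dict, etf_prices: dict, max_ratio: float = 100.0):
    """Return (ratios by date, dates whose ratio looks suspiciously high)."""
    ratios = {}
    flagged = []
    for day, mf_price in mf_prices.items():
        etf_price = etf_prices.get(day)
        if etf_price is None or etf_price == 0:
            continue  # exclude the observation rather than divide by zero or guess
        ratio = mf_price / etf_price
        ratios[day] = ratio
        if ratio > max_ratio:
            flagged.append(day)  # possible stale price or data corruption
    return ratios, flagged


mf = {"2024-01-02": 100.0, "2024-01-03": 101.0, "2024-01-04": 102.0, "2024-01-05": 5000.0}
etf = {"2024-01-02": 50.0, "2024-01-03": 0.0, "2024-01-05": 10.0}
ratios, flagged = price_ratios(mf, etf)
print(ratios)   # {'2024-01-02': 2.0, '2024-01-05': 500.0}
print(flagged)  # ['2024-01-05']
```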
- id: finance-C-224
when: When constructing asset allocation tree hierarchies in libassetalloc.py
action: Verify that the first operating currency used as root node denomination is valid and available; implement fallback
logic to select an alternative currency (e.g., portfolio base currency, USD) when the first operating currency is empty,
null, or invalid
severity: medium
kind: operational_lesson
modality: should
consequence: Hardcoded first operating currency with no fallback causes silent failures or crashes when the currency list
is empty or the first entry is invalid, leading to missing allocation reports and inability to track portfolio performance
derived_from_bd_id: BD-103
- id: finance-C-225
when: When setting up or configuring operating_currencies for portfolio analysis
action: Validate that operating_currencies is non-empty and first currency is valid before any portfolio analysis; implement
explicit error handling or default fallback (e.g., USD) if empty
severity: high
kind: operational_lesson
modality: must
consequence: Empty operating_currencies list causes silent cascading failures across all downstream calculations (TLH,
cash drag, allocation, minimizegains) that depend on the base currency invariant, with no graceful degradation and no
user-facing error message
derived_from_bd_id: BD-107
- id: finance-C-226
when: When initializing the TLH module with default loss_threshold parameter
action: Verify loss_threshold default value (1 in code vs 0 in example config) matches expected behavior; explicitly set
the value to ensure consistent harvesting behavior across initialization paths
severity: medium
kind: operational_lesson
modality: should
consequence: Inconsistent loss_threshold defaults cause context-dependent TLH behavior — code default=1 filters out penny-level
losses while config default=0 harvests all losses, leading to different recommendations depending on initialization
path
derived_from_bd_id: BD-111
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
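The plausibility predicates OV-03, OV-04, and OV-05 share one shape: a missing field passes, and a present field must stay within an absolute bound. A sketch of how they could be evaluated together is shown below; `plausibility_failures` and the `PLAUSIBILITY_LIMITS` table are hypothetical and do not reflect the contents of validate.py, which is not shown here.

```python
PLAUSIBILITY_LIMITS = {
    "annual_return": 5.0,       # OV-03: |annual_return| <= 500%
    "holding_change_pct": 1.0,  # OV-04: |holding_change_pct| <= 100%
    "max_drawdown": 1.0,        # OV-05: |max_drawdown| <= 100%
}


def plausibility_failures(result: dict) -> list:
    """Return the field names whose absolute value exceeds its limit."""
    failures = []
    for field, limit in PLAUSIBILITY_LIMITS.items():
        value = result.get(field)
        if value is not None and abs(float(value)) > limit:
            failures.append(field)
    return failures


print(plausibility_failures({"annual_return": 0.42, "max_drawdown": -0.18}))  # []
print(plausibility_failures({"annual_return": 7.5, "max_drawdown": -1.2}))
# ['annual_return', 'max_drawdown']
```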
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-078 / Portfolio Management CLI Entry Point
version: v5.3
intent_keywords:
- portfolio management
- CLI
- command line
- tax optimization
- investment analysis
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: auto-grouped by UC.type (3 distinct values, balanced distribution)
groups:
- group_id: live_trading
name: Live Trading
description: ''
emoji: 📦
uc_count: 3
ucs:
- uc_id: UC-101
name: Portfolio Management CLI Entry Point
short_description: Provides a unified command-line interface for portfolio management operations including tax loss
harvesting, asset allocation analysis, cash drag dete
sample_triggers:
- portfolio management
- CLI
- command line
- uc_id: UC-103
name: Tax-Optimized Selling Strategy
short_description: Determines optimal sell order for securities to minimize realized capital gains by analyzing
cost basis and holding periods across multiple lots
sample_triggers:
- minimize gains
- tax-efficient selling
- capital gains optimization
- uc_id: UC-105
name: Tax Loss Harvesting Opportunity Detection
short_description: Identifies securities with unrealized losses that can be sold to harvest tax losses, typically
looking back 30 days to find positions eligible for was
sample_triggers:
- tax loss harvesting
- loss identification
- wash sale
- group_id: data_pipeline
name: Data Pipeline
description: ''
emoji: 📊
uc_count: 1
ucs:
- uc_id: UC-102
name: Related Ticker Grouping Utility
short_description: Identifies and groups equivalent or substitutable securities (e.g., VTI, VTSAX, VTSMX) based
on metadata annotations to support tax lot management and
sample_triggers:
- equivalent tickers
- related securities
- commodity grouping
- group_id: reporting
name: Reporting
description: ''
emoji: 📋
uc_count: 1
ucs:
- uc_id: UC-104
name: Asset Allocation Analysis
short_description: 'Calculates and reports portfolio allocation breakdown by asset type (stocks, bonds, cash, etc.)
with percentage distributions from investment account '
sample_triggers:
- asset allocation
- portfolio breakdown
- asset class distribution
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try portfolio management CLI entry point
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try related ticker grouping utility
auto_selected: true
- uc_id: UC-103
beginner_prompt: Try tax-optimized selling strategy
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 5 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- Tax-Optimized Selling Strategy
- Related Ticker Grouping Utility
- Portfolio Management CLI Entry Point
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
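Computes portfolio risk metrics, including annualized return, Sharpe ratio, Sortino ratio, maximum drawdown, and Calmar ratio, with support for rolling-window statistics and NaN handling; suitable for multi-market data.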
---
name: empyrical-risk-metrics
description: |-
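Computes portfolio risk metrics, including annualized return, Sharpe ratio, Sortino ratio, maximum drawdown, and Calmar ratio, with support for rolling-window statistics and NaN handling; suitable for multi-market data.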
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-107"
compiled_at: "2026-04-22T13:00:51.147425+00:00"
capability_markets: "multi-market"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
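# Investment Risk Metrics (empyrical-risk-metrics)
> Computes portfolio risk metrics, including annualized return, Sharpe ratio, Sortino ratio, maximum drawdown, and Calmar ratio, with support for rolling-window statistics and NaN handling; suitable for multi-market data.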
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (3 total)
### Sphinx Documentation Build Configuration (`UC-101`)
Configuring Sphinx to automatically generate API documentation from docstrings and source code comments for the empyrical library
**Triggers**: sphinx configuration, documentation build, autodoc setup
### Documentation Deployment Automation (`UC-102`)
Automating the process of cleaning, building, and deploying Sphinx documentation to a hosting platform for the empyrical project
**Triggers**: documentation deployment, automated deployment, CI/CD documentation
### Advanced Sphinx Documentation Source Setup (`UC-103`)
Configuring advanced Sphinx extensions including autodoc filtering, numpydoc integration, and markdown support for comprehensive documentation generation
**Triggers**: sphinx extensions, numpydoc, autodoc filtering
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (25 total)
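- **`AP-ZVT-183`**: Ex-rights factors of inf/NaN enter the multiplication directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions are swallowed after a third-party data API hits its quota, leaving data silently missing
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table causes factor drift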
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-107. Evidence verify ratio = 45.3% and audit fail total = 21. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
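| [references/seed.yaml](references/seed.yaml) | Complete authoritative V6+ definition (source of truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain and fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | Full set of KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |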
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-107` blueprint at 2026-04-22T13:00:51.147425+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Advanced Sphinx Documentation Source Setup
  - Documentation Deployment Automation
  - Sphinx Documentation Build Configuration
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
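### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>
When several Qlib models reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model actually uses the same prediction signal. The symptom: whether you switch to LightGBM, XGBoost, or a DNN, the backtest net-value curves are identical. This is the most dangerous kind of anti-pattern, where the experiment appears to run but every conclusion is invalid.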
Source: https://github.com/microsoft/qlib/issues/1930
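### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range the normalizer is fitted on) and segments.train (which sets the range the model trains on). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.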
Source: https://github.com/microsoft/qlib/issues/2090
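A minimal sketch of a pre-flight check for this pitfall, assuming the fit window and the train segment are available as ISO date strings. `check_fit_window` is a hypothetical helper, not part of Qlib; it only compares the two dates before any handler is built.

```python
from datetime import date


def check_fit_window(fit_end_time: str, train_segment: tuple[str, str]) -> None:
    """Raise ValueError if the normalizer's fit window runs past the train segment.

    Hypothetical helper: both arguments are plain ISO date strings (YYYY-MM-DD).
    """
    fit_end = date.fromisoformat(fit_end_time)
    train_end = date.fromisoformat(train_segment[1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is later than train end {train_end}: "
            "normalization statistics would include valid/test data"
        )


# Passes silently: the normalizer is fitted only on training data.
check_fit_window("2014-12-31", ("2008-01-01", "2014-12-31"))
```

Running the check right after the config is loaded turns a silent leak into an immediate, readable error.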
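### `AP-QLIB-2036` — Documentation error in the MACD factor formula: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines the MACD DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs markedly from the correct formula after cross-sectional standardization, and its IC drops. Formula errors at the documentation level like this one get copied directly into production factor libraries by many users.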
Source: https://github.com/microsoft/qlib/issues/2036
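### `AP-QLIB-2184` — Custom A-share data imported without filling suspension days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that the open/close/high/low/volume/factor fields should all be NaN on suspension days so that the framework can recognize and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price-momentum factors show "false signals" during the suspension (the price is unchanged but the factor is nonzero). Qlib does not validate this convention, so the error flows silently into the training data.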
Source: https://github.com/microsoft/qlib/issues/2184
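### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API and retrieves an incomplete A-share universe <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to obtain the Shanghai and Shenzhen stock list. That function depends on the Eastmoney API, often returns only part of the list rather than the full 5000+ stocks, and raises ValueError directly when the result is incomplete. Users who follow the documented steps end up with a financial dataset that covers only some stocks, and backtests based on PIT financial factors carry severe survivorship bias (stocks that were never collected are implicitly excluded).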
Source: https://github.com/microsoft/qlib/issues/1892
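### `AP-QLIB-2097` — Whole-market instrument="all" runs out of memory on a 32GB machine while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use with instrument="all" (5000+ stocks) is about 16 times that with instrument="csi300" (300 stocks). A 32GB machine running the whole market goes OOM in the init_instance_by_config stage, and the error message does not indicate a memory problem. Users easily mistake it for a configuration error, when what is actually needed is batched loading or streaming feature computation.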
Source: https://github.com/microsoft/qlib/issues/2097
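### `AP-QLIB-1984` — The label-dimension check in the LightGBM model never fires, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multiple labels, but the ndim of a Series taken from a DataFrame is always 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises a semantically unclear error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.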
Source: https://github.com/microsoft/qlib/issues/1984
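### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (for example 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's convention at dump_bin time, the DataHandler processors trigger Length mismatch during align/reindex, whereas D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.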
Source: https://github.com/microsoft/qlib/issues/1915
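### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib's ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization when it accesses the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer has not been updated. The D.features call raises a multi-level nested exception, and users cannot tell from the stack trace whether the problem lies in the parallel backend or in the data.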
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
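### `AP-VNPY-3691` — The bar generator's first bar has a misaligned timestamp, producing a wrong signal for the first period <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is timestamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not be pushed until 10:04). If the strategy filters in on_bar directly with datetime.minute % 5, that first bar happens to pass even though it contains less than one full period of data, and using it for signal computation produces a wrong entry signal.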
Source: https://github.com/vnpy/vnpy/issues/3691
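One way to guard against the incomplete first bar is to work out which window is the first one the generator could have observed in full. The sketch below is generic Python, not vnpy API: `window_floor`, `first_complete_window_start`, and `is_complete_bar` are hypothetical helpers, and the window length is assumed to divide 60.

```python
from datetime import datetime, timedelta


def window_floor(dt: datetime, window: int) -> datetime:
    """Round dt down to the start of its N-minute window (window must divide 60)."""
    return dt.replace(minute=dt.minute - dt.minute % window, second=0, microsecond=0)


def first_complete_window_start(first_tick: datetime, window: int) -> datetime:
    """Start of the first window whose ticks were all seen by the generator."""
    floor = window_floor(first_tick, window)
    if first_tick == floor:
        return floor  # the first tick landed exactly on a boundary
    return floor + timedelta(minutes=window)


def is_complete_bar(bar_window_start: datetime, first_tick: datetime, window: int) -> bool:
    """True if the bar's window began at or after the first complete window."""
    return bar_window_start >= first_complete_window_start(first_tick, window)


first_tick = datetime(2024, 1, 2, 9, 59, 30)
# The 09:55 window is partial because ticks only started at 09:59:30.
print(is_complete_bar(datetime(2024, 1, 2, 9, 55), first_tick, 5))
print(is_complete_bar(datetime(2024, 1, 2, 10, 0), first_tick, 5))
```

A strategy could call such a check in its bar callback and skip signal computation until it returns True.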
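### `AP-VNPY-3669` — In the Alpha module, incompatible schemas between new and old DataFrames raise SchemaError during incremental history saves <sub>(medium)</sub>
When saving bar data to Parquet files, vnpy's Alpha module concatenates newly downloaded data (which may contain Float64 columns) with the stored file (historical Int64 columns) directly via polars.concat. Polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that field types differ across data sources and versions (for example, volume is an integer in some feeds and a float in others) and there is no schema-alignment step before the concat. This affects every historical-data build workflow that uses vnpy alpha for backtesting.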
Source: https://github.com/vnpy/vnpy/issues/3669
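### `AP-VNPY-3685` — run_backtesting() in the spread trading module fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() of the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.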
Source: https://github.com/vnpy/vnpy/issues/3685
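### `AP-VNPY-3700` — The install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system or conda base environment and forces numpy down to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch releases). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of global environment pollution.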
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
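### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted prices (raw prices) rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by roughly 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: the price jump around an ex-rights date forms a huge "signal" in unadjusted data and leads technical indicators to emit false breakout signals on that date.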
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
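### `AP-ZIPLINE-235` — Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>
After a signal fires on a bar, Zipline's default slippage model fills at the close of that same bar (current bar close fill). In live trading the signal can only be filled near the open of the next bar (T+1 order execution). For A-share daily bars, backtesting with the close overstates daily return by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.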
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
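The difference between the two fill assumptions can be sketched without any backtesting framework. The code below is illustrative only: `fill_price` and the bar dictionaries are hypothetical, and it models neither Zipline's slippage classes nor volume limits.

```python
def fill_price(bars, signal_index, mode="next_open"):
    """Return the assumed fill price for a signal raised on bars[signal_index].

    mode="same_close": optimistic fill at the signal bar's own close.
    mode="next_open":  fill at the next bar's open; None if there is no next bar.
    """
    if mode == "same_close":
        return bars[signal_index]["close"]
    if signal_index + 1 >= len(bars):
        return None  # the signal fired on the last bar, so nothing can fill
    return bars[signal_index + 1]["open"]


# Made-up daily bars: the stock gaps up overnight after the signal.
bars = [
    {"open": 10.0, "close": 10.5},
    {"open": 10.8, "close": 11.0},
]
optimistic = fill_price(bars, 0, mode="same_close")  # 10.5
realistic = fill_price(bars, 0, mode="next_open")    # 10.8
print(optimistic, realistic)
```

In this made-up case the same-bar fill buys 0.3 cheaper than a next-bar fill could, which is the kind of per-trade gap the paragraph above describes.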
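### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (such as New Year's Day, 1998-01-01), Zipline's calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The error message shows only the first date of the trading calendar and does not suggest changing to the first trading day. In the A-share case, with the SSE/SZSE calendars, a start_date that falls on the day after the last session before Spring Festival (a holiday) triggers the same kind of error, and debugging it is very costly.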
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
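A small sketch of snapping a requested start date to the first session on or after it, assuming the calendar's sessions are available as a sorted list of dates. `first_session_on_or_after` is a hypothetical helper and the two sample sessions are illustrative; a real calendar library would supply the full list.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions, start):
    """Return the earliest session >= start from a sorted list of session dates."""
    i = bisect_left(sessions, start)
    if i == len(sessions):
        raise ValueError(f"no session on or after {start}")
    return sessions[i]


# Illustrative sessions only; 1998-01-01 was a holiday.
sessions = [date(1998, 1, 2), date(1998, 1, 5)]
print(first_session_on_or_after(sessions, date(1998, 1, 1)))
```

Normalizing the start date this way before it reaches the framework avoids the opaque out-of-bounds error.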
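### `AP-ZIPLINE-181` — An outdated asset db makes Pipeline report "no assets traded", sending users off to check the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records the start/end trading dates of every stock. With an old Quandl or self-built bundle that has not been re-ingested, backtesting a newer date range makes Pipeline raise "Failed to find any assets with country_code 'US' that traded between [dates]". In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated; Pipeline filtering then quietly excludes those stocks and produces survivorship bias.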
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
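### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function depend on the trading calendar's logic for determining the first and last session of a week. Under non-US calendars (such as ASX or SSE), that logic is incompatible with the offset computation of the NYSE calendar, so the schedule either never fires or fires on the wrong date. In the A-share case with the SSE calendar, in weeks that contain a long run of holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and users cannot discover the unfired schedule from the logs.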
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
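### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline internally requires every timestamp to be a UTC-aware datetime. When a user passes a naive datetime (no timezone information, such as pd.Timestamp('2020-01-01')), the error is not raised at the entry point; instead, AssertionError: Algorithm should have a utc datetime is triggered deep inside algorithm execution, with a stack too deep to locate easily. A-share developers importing data in local CST time fall into this trap very easily and need to call tz_localize('UTC') explicitly when registering the bundle.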
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
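A standard-library sketch of normalizing timestamps at the entry point, assuming naive inputs are wall-clock times in China Standard Time (fixed UTC+8). `to_utc` is a hypothetical helper; note that it converts the instant to UTC, which differs from merely labeling a value as UTC with tz_localize('UTC'), so which one is appropriate depends on what the timestamps represent.

```python
from datetime import datetime, timedelta, timezone

# Assumption: naive inputs are China Standard Time, a fixed UTC+8 offset.
CST = timezone(timedelta(hours=8), "CST")


def to_utc(dt: datetime, assume_tz: timezone = CST) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted in assume_tz."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=assume_tz)
    return dt.astimezone(timezone.utc)


# 09:30 in Shanghai is 01:30 UTC on the same day.
print(to_utc(datetime(2020, 1, 1, 9, 30)))
```

Applying one such function where data enters the pipeline keeps naive datetimes from reaching code that asserts on UTC.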
## zvt (6)
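### `AP-ZVT-183` — Ex-rights factors of inf/NaN enter the multiplication directly, so price adjustment fails silently <sub>(high)</sub>
When computing forward-adjustment factors, ZVT derives qfq_factor from the ratio of new to old prices. When old == 0 (a new stock's first day, or missing data), the factor is inf; when kdata.open itself is None (a suspension day that was not filled), the multiplication raises TypeError. The result: adjustment for the whole entity is interrupted and all subsequent bars are lost, but the main flow only logs an ERROR and does not stop, so users often do not know that data for a large number of stocks is corrupted.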
Source: https://github.com/zvtvz/zvt/issues/183
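A defensive version of the factor computation might look like the sketch below. It is plain Python rather than ZVT code: `safe_adjust_factor` and `adjust` are hypothetical helpers that return None for any undefined case so the caller can decide explicitly how to handle the bar.

```python
import math


def safe_adjust_factor(new_price, old_price):
    """Return new_price / old_price, or None when the ratio is undefined or non-finite."""
    if new_price is None or old_price is None:
        return None
    if old_price == 0:
        return None  # would be inf under numpy, ZeroDivisionError in plain Python
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None


def adjust(price, factor):
    """Apply an adjustment factor; propagate None instead of raising TypeError."""
    if price is None or factor is None:
        return None
    return price * factor


print(safe_adjust_factor(10.0, 5.0))  # 2.0
print(safe_adjust_factor(10.0, 0))    # None
print(adjust(None, 2.0))              # None
```

Returning None (and counting how often it happens) makes damaged entities visible instead of dropping their later bars silently.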
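### `AP-ZVT-179` — Exceptions are swallowed after a third-party data API hits its quota, leaving data silently missing <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch bars for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so once the limit is hit, that day's data for every remaining stock is silently missing. A backtest that uses this incomplete database produces systematically biased factor results, with no warning.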
Source: https://github.com/zvtvz/zvt/issues/179
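One way to avoid the silent gap is to treat a quota error differently from a per-entity error and to return an explicit failure report. The sketch below is framework-agnostic: `QuotaExceeded`, `fetch_all`, and the `fetch_one` callback are hypothetical names, not part of ZVT or jqdatasdk.

```python
class QuotaExceeded(Exception):
    """Hypothetical error a data client raises when the daily quota is used up."""


def fetch_all(entity_ids, fetch_one):
    """Fetch every entity; return (fetched, failed) so gaps are never silent."""
    fetched, failed = {}, {}
    for i, entity_id in enumerate(entity_ids):
        try:
            fetched[entity_id] = fetch_one(entity_id)
        except QuotaExceeded as exc:
            # Every later call would fail the same way: record the rest and stop.
            for remaining in entity_ids[i:]:
                failed[remaining] = str(exc)
            break
        except Exception as exc:
            # An isolated failure: record it and continue with the next entity.
            failed[entity_id] = repr(exc)
    return fetched, failed


def fake_fetch(entity_id):
    """Stand-in for a real client: the quota runs out on the third entity."""
    if entity_id == "stock_sh_600002":
        raise QuotaExceeded("daily query limit reached")
    return [1.0, 2.0]


ids = ["stock_sh_600000", "stock_sh_600001", "stock_sh_600002", "stock_sh_600003"]
fetched, failed = fetch_all(ids, fake_fetch)
print(sorted(failed))
```

A caller can then refuse to run a backtest while `failed` is non-empty, or retry those entities the next day.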
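### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT concatenates every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once exceeds SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999. Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on the number of variables. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.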
Source: https://github.com/zvtvz/zvt/issues/161
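The batching fix can be sketched with the standard-library sqlite3 module. The `kdata` table, its columns, and both function names are hypothetical stand-ins for illustration; the batch size of 500 is an assumption chosen to stay below the 999-variable limit mentioned above.

```python
import sqlite3


def make_demo_db(n):
    """Build an in-memory table with n made-up entities (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
    conn.executemany(
        "INSERT INTO kdata VALUES (?, ?)",
        [(f"stock_sh_{600000 + i}", 10.0 + i) for i in range(n)],
    )
    return conn


def query_in_batches(conn, entity_ids, batch_size=500):
    """Run the IN query in chunks so no statement binds more than batch_size variables."""
    rows = []
    for start in range(0, len(entity_ids), batch_size):
        batch = entity_ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


conn = make_demo_db(1200)
ids = [f"stock_sh_{600000 + i}" for i in range(1200)]
print(len(query_in_batches(conn, ids)))
```

Only the placeholders are interpolated into the SQL string; the entity IDs themselves are still passed as bound parameters.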
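### `AP-ZVT-129` — Wildcard imports hide API changes between versions; enums such as AdjustType disappear unexpectedly <sub>(medium)</sub>
ZVT's documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade refactors the enums (for example, moving AdjustType into a submodule), the wildcard import no longer includes the symbol and an AttributeError is raised. Users assume an installation problem, when it is in fact a breaking API change between versions that was not noted in the CHANGELOG, and the wildcard import obscures where the symbol came from. Import enum classes explicitly.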
Source: https://github.com/zvtvz/zvt/issues/129
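### `AP-ZVT-187` — The backtest engine does not stop early on an empty data-layer result, leading to a cascading null-pointer crash <sub>(medium)</sub>
When ZVT's Trader finds the data empty after load_data completes, it does not exit early but passes the empty DataFrame into the selector computation, triggering a chain of crashes from subsequent NoneType operations. The stack trace is deep and the root cause hard to locate, so users assume a strategy-logic problem. The root cause is a misconfigured data time window (start/end outside the range covered by the database) with no effective validation.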
Source: https://github.com/zvtvz/zvt/issues/187
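### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) kdata table causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users computing price-momentum or moving-average factors mix two tables (for example, unadjusted prices for the moving average and backward-adjusted prices for returns), which makes factor values jump around ex-rights dates. ZVT performs no cross-table check for adjustment-type consistency, so the mix-up passes silently.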
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-107--empyrical-reloaded
**Scan date**: 2026-04-22
**Stats**: total_files: 6, total_classes: 35, total_functions: 0, total_stages: 6
## Modules (6)
- [data_ingestion_&_utilities](components/data_ingestion_-_utilities.md): 8 classes
- [return_computation](components/return_computation.md): 5 classes
- [risk_metrics](components/risk_metrics.md): 6 classes
- [performance_metrics](components/performance_metrics.md): 6 classes
- [factor_analysis](components/factor_analysis.md): 7 classes
- [performance_attribution](components/performance_attribution.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 114
fatal_constraints_count: 30
non_fatal_constraints_count: 163
use_cases_count: 3
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
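- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing a signal from the close and filling at that same day's close; (2) marking an indicator computed after the close of day T on that same bar; (3) using the day's high or low as the assumed fill price. Signal computation and fill time must be aligned: compute the signal after the close of day T and fill at the open of day T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions should be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series data for ML/DL models must be split in chronological order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (for example, a momentum strategy that "takes profit at the high") is plain lookahead, because the intraday high and low are only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: pandas operations such as rolling and shift easily introduce subtle one-period offset errors. Record the "observation time" of each series explicitly in code and verify key time-alignment relationships with assert.
- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests, even when the true Sharpe ratio is zero. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.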
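- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the current market constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates the strategy's historical returns. The backtest universe must use point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample; out-of-sample data is used only for final validation. Do not "look" at the out-of-sample data repeatedly and then keep tuning (that turns it into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended or missing data: do not simply forward-fill suspension-day prices with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation during the backtest. Filter missing trading days explicitly at the factor-computation layer and do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry errors). Feeding it into factor computation without cleaning produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.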
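- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example, no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to the cost assumption, and every 10bps change in cost may flip the profit-or-loss conclusion, so a cost sensitivity analysis is mandatory.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect the capital constraint: the backtest should simulate the actual (rounded) number of shares held under a fixed amount of capital rather than assume fractional shares. For small-cap stocks, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.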
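- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone consistency: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert every timestamp to a single timezone at the pipeline entry (UTC for storage and the market's local timezone for display are recommended); do not mix timezones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental-update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after that date. Re-downloading existing data and appending it produces rows with duplicate timestamps and corrupts the backtest's time order. Verify before and after the update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the alpha/beta calculation. Choose a passive benchmark the strategy can actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes reinvested dividends and so understates the benchmark's holding return.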
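- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Maximum drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Summing log returns misstates the drawdown depth, because on a decline the log return ln(1 + r) is more negative than the simple return r, so the summed series is not in simple-return units.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems, so confirm the annualization factor before comparing across systems; otherwise the Sharpe ratios are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar and Sortino ratios are better risk-adjusted return measures than the Sharpe ratio: Sharpe assumes normally distributed returns, but return distributions in A-share and crypto markets are markedly left-skewed (fat-tailed), so it understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / maximum drawdown) and should not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should decompose returns into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot distinguish "a good strategy" from "a tailwind market where the beta happened to be right".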
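- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). An absolute IC > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-term factor and does not suit a monthly rebalancing strategy, and vice versa. Using the wrong holding period seriously hurts live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Factor validity requires: t-stat > 3.0 (rather than the traditional 1.96); or independent replication in different periods or markets; or a clear economic rationale. A factor that meets none of these conditions is very likely a product of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have negative net IC after trading costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: Factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.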
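- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry-rotation returns, and it is hard to tell whether the factor itself or the industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and the return comes from size exposure rather than the factor itself. Neutralize for both industry and market cap (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (winsorize/MAD): raw factor values usually contain extreme values, which distort quantile analysis (such as Q1/Q10 deciles). Winsorize the raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors (Gram-Schmidt) or apply PCA before combining, to remove multicollinearity among them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaN values in factor computation (suspensions, new listings, data gaps) with a mean that spans future dates (such as a full-sample mean) introduces lookahead bias, while dropping the rows entirely produces survivorship bias. The correct approach is to use the same-day cross-sectional median (the median over all stocks that day, which does not depend on the future) or to exclude the stock for that day.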
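- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom spread) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, rather than the long-only return. The "monotonicity" test of long Q1, short Q5 is core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across different regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one particular market state is not suited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (49th-51st percentile), a small change in rank triggers a rebalance and generates a lot of wasted trading cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a factor's quantile return spread (Q1-Q5 spread) may be due to chance even when it is large on historical data, so its significance must be tested with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.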
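- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market factor portability: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; cross-market generality cannot be assumed. Anomalies specific to A-shares (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually fail because of market learning and arbitrage (McLean & Pontiff 2016 show that factors decay by 58% on average after publication). Re-evaluate factor IC regularly (quarterly or yearly) and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macroeconomic environment: the interest-rate cycle, the business cycle, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound markets. Before deploying a factor, assess how well the current macro environment matches the environment in which the factor performs best.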
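The performance constraints in the list above (SHARED-BT-PERF-001 to SHARED-BT-PERF-003) can be illustrated with a small pure-Python sketch. It is not empyrical's implementation: the function names mirror common metric names but the bodies are simplified assumptions (per-period risk-free rate, population downside deviation, geometric annualization), and they divide by zero on degenerate inputs such as constant returns or a series with no drawdown.

```python
import math
from statistics import mean, stdev


def max_drawdown(net_values):
    """Most negative peak-to-trough decline of a net-value series, as a fraction."""
    peak = net_values[0]
    worst = 0.0
    for value in net_values:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def annualized_sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Per-period Sharpe scaled by sqrt(periods_per_year); use 365 for crypto."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, periods_per_year=252, target=0.0):
    """Like Sharpe, but the denominator only counts returns below the target."""
    downside = [min(r - target, 0.0) ** 2 for r in returns]
    downside_dev = math.sqrt(mean(downside))
    return (mean(returns) - target) / downside_dev * math.sqrt(periods_per_year)


def calmar_ratio(net_values, periods_per_year=252):
    """Annualized return divided by the absolute maximum drawdown."""
    n_periods = len(net_values) - 1
    growth = net_values[-1] / net_values[0]
    annual_return = growth ** (periods_per_year / n_periods) - 1.0
    return annual_return / abs(max_drawdown(net_values))


net_values = [1.0, 1.2, 0.9, 1.1]  # made-up portfolio values
print(max_drawdown(net_values))    # about -0.25 (from 1.2 down to 0.9)
```

Passing `periods_per_year` explicitly everywhere is the design point: it keeps the equity (252) versus crypto (365) convention visible at each call site instead of hidden in a default.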
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
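As an illustration of SL-05, the "exactly one of" rule can be enforced at construction time. The sketch below uses a hypothetical `SignalSizing` dataclass; it is not ZVT's TradingSignal class, and only the three field names are taken from the lock text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical container for the three mutually exclusive sizing fields."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        names = ("position_pct", "order_money", "order_amount")
        provided = [name for name in names if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {names} must be set, got {provided or 'none'}"
            )


ok = SignalSizing(position_pct=0.5)  # valid: a single sizing field is set
print(ok)
```

Checking against None rather than truthiness means a legitimate value of 0 still counts as "set".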
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **3**
## `KUC-101`
**Source**: `docs/conf.py`
Configuring Sphinx to automatically generate API documentation from docstrings and source code comments for the empyrical library
## `KUC-102`
**Source**: `docs/deploy.py`
Automating the process of cleaning, building, and deploying Sphinx documentation to a hosting platform for the empyrical project
## `KUC-103`
**Source**: `docs/source/conf.py`
Configuring advanced Sphinx extensions including autodoc filtering, numpydoc integration, and markdown support for comprehensive documentation generation
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
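## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as the single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() can run multiple strategies over multiple data sources. zvt's StockTrader currently binds only one set of factors per instantiation and lacks a unified orchestration layer for multi-strategy combinations; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for comparison.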
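## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with cerebro.addanalyzer(SharpeRatio).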
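## `CW-BT-003` — Sizer: position sizing separated from signals
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.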
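## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market-price fills and lack simulation of limit orders and combination orders; for high-frequency or live-trading integration scenarios, fuller order types would make backtests considerably more realistic.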
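## `CW-BT-005` — Data resampling and replaying
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) in real time and drive the strategy in sync, or replay tick by tick to simulate how an OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording bars at each level and lacks dynamic resampling at run time; borrowing this pattern would support backtests over any combination of time granularities without recording the data again.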
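## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a graphical interface that directly shows the strategy's net-value curve, maximum drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt currently visualizes backtest results by calling Plotly through the draw_result method but has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.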
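## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy.alpha.lab in vnpy 4.0 provides an integrated workflow for data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has only one entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.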
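## `CW-QL-001` — Point-in-Time database (prevents future data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at a given time t returns only data that was truly knowable at t (report publication delays and revision history are both handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and lacks a "publication date" dimension, so stock selection may be biased by future financial-report data; introducing the PIT pattern would greatly improve backtest credibility.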
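## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of every model training run and supports cross-experiment comparison and model version management. zvt currently lacks an ML experiment-tracking mechanism, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment and support quick reproduction and version comparison.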
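## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run together, enabling optimal intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution-algorithm modeling; borrowing the nested framework would let zvt separate two questions: "which stocks to hold and when" and "how to build the position with minimal market impact".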
FILE:references/components/data_ingestion_-_utilities.md
# data_ingestion_&_utilities (8 classes)
## `roll`
`data_ingestion_&_utilities/roll.py:0`
## `_create_unary_vectorized_roll_function`
`data_ingestion_&_utilities/create-unary-vectorized-roll-function.py:0`
## `_create_binary_vectorized_roll_function`
`data_ingestion_&_utilities/create-binary-vectorized-roll-function.py:0`
## `_aligned_series`
`data_ingestion_&_utilities/aligned-series.py:0`
## `get_fama_french`
`data_ingestion_&_utilities/get-fama-french.py:0`
## `up`
`data_ingestion_&_utilities/up.py:0`
## `down`
`data_ingestion_&_utilities/down.py:0`
## `nan_aggregation`
`data_ingestion_&_utilities/nan-aggregation.py:0`
FILE:references/components/factor_analysis.md
# factor_analysis (7 classes)
## `alpha_beta`
`factor_analysis/alpha-beta.py:0`
## `alpha_aligned`
`factor_analysis/alpha-aligned.py:0`
## `beta_aligned`
`factor_analysis/beta-aligned.py:0`
## `capture`
`factor_analysis/capture.py:0`
## `up_capture`
`factor_analysis/up-capture.py:0`
## `down_capture`
`factor_analysis/down-capture.py:0`
## `beta_fragility_heuristic`
`factor_analysis/beta-fragility-heuristic.py:0`
FILE:references/components/performance_attribution.md
# performance_attribution (3 classes)
## `perf_attrib`
`performance_attribution/perf-attrib.py:0`
## `compute_exposures`
`performance_attribution/compute-exposures.py:0`
## `attribution_model`
`performance_attribution/attribution-model.py:0`
FILE:references/components/performance_metrics.md
# performance_metrics (6 classes)
## `sharpe_ratio`
`performance_metrics/sharpe-ratio.py:0`
## `sortino_ratio`
`performance_metrics/sortino-ratio.py:0`
## `omega_ratio`
`performance_metrics/omega-ratio.py:0`
## `calmar_ratio`
`performance_metrics/calmar-ratio.py:0`
## `annual_return`
`performance_metrics/annual-return.py:0`
## `risk_free_rate`
`performance_metrics/risk-free-rate.py:0`
FILE:references/components/return_computation.md
# return_computation (5 classes)
## `simple_returns`
`return_computation/simple-returns.py:0`
## `cum_returns`
`return_computation/cum-returns.py:0`
## `cum_returns_final`
`return_computation/cum-returns-final.py:0`
## `aggregate_returns`
`return_computation/aggregate-returns.py:0`
## `annualization_factor`
`return_computation/annualization-factor.py:0`
FILE:references/components/risk_metrics.md
# risk_metrics (6 classes)
## `max_drawdown`
`risk_metrics/max-drawdown.py:0`
## `drawdown_series`
`risk_metrics/drawdown-series.py:0`
## `annual_volatility`
`risk_metrics/annual-volatility.py:0`
## `downside_risk`
`risk_metrics/downside-risk.py:0`
## `tail_ratio`
`risk_metrics/tail-ratio.py:0`
## `tail_distribution_model`
`risk_metrics/tail-distribution-model.py:0`
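Bulk-download annual (10-K) and quarterly (10-Q) filings of listed companies from SEC EDGAR, with quarterly incremental updates and local caching; suited to US-equity fundamental analysis and data acquisition for quantitative research.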
---
name: edgar-crawler
description: |-
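Bulk-download annual (10-K) and quarterly (10-Q) filings of listed companies from SEC EDGAR, with quarterly incremental updates and local caching; suited to US-equity fundamental analysis and data acquisition for quantitative research.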
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-114"
compiled_at: "2026-04-22T13:00:54.950360+00:00"
capability_markets: "multi-market"
capability_activities: "data-sourcing"
sop_version: "crystal-compilation-v6.1"
---
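# EDGAR Filing Crawler (edgar-crawler)
> Bulk-download annual (10-K) and quarterly (10-Q) filings of listed companies from SEC EDGAR, with quarterly incremental updates and local caching; suited to US-equity fundamental analysis and data acquisition for quantitative research.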
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (1 total)
### SEC EDGAR Filing Extraction (`UC-101`)
Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from compressed ZIP archives for downstream financial analysis
**Triggers**: EDGAR, SEC filings, 10-K extraction
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-114. Evidence verify ratio = 32.9% and audit fail total = 29. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project wisdom highlights | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-114` blueprint at 2026-04-22T13:00:54.950360+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['SEC EDGAR Filing Extraction', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-070--edgartools (2)
### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>
Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.
### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>
SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.
## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>
Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.
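A minimal sketch of client-side throttling, assuming a budget of 10 requests per second. `MinIntervalLimiter` is a hypothetical helper and not part of any project listed here; the injectable `clock` and `sleep` parameters exist only so the behaviour can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Hypothetical helper: enforce a minimum gap between consecutive calls."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # scheduled time of the previous call; None before the first

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `limiter.wait()` before each request keeps a single-threaded collector under the assumed budget; it does not coordinate multiple processes sharing one IP address.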
## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>
SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.
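As an illustration, the sketch below refuses to build a request unless the User-Agent carries something that looks like a contact email. `build_sec_request`, the identity string, and the email pattern are assumptions made for this example; the URL is only an illustrative index path, and nothing is sent over the network.

```python
import re
import urllib.request

# Assumption: the identity string looks like "Name contact@example.com".
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def build_sec_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a Request whose User-Agent includes a contact email address."""
    if not _EMAIL.search(user_agent):
        raise ValueError("User-Agent must include a contact email address")
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Build (but do not send) a request with an explicit identity.
req = build_sec_request(
    "https://www.sec.gov/Archives/edgar/full-index/2024/QTR1/master.idx",
    "Example Research admin@example.com",
)
```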
## finance-bp-079--akshare (4)
### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>
HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.
### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>
Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
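One way to make both failure modes explicit is sketched below. `parse_payload` and `EmptyResponseError` are hypothetical names introduced for this example, not akshare APIs.

```python
import json


class EmptyResponseError(ValueError):
    """Hypothetical error: the payload parsed but contains no data."""


def parse_payload(text: str):
    """Parse JSON text; reject malformed and empty payloads explicitly."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc.msg}") from exc
    # null, {} and [] are valid JSON but carry no records.
    if data is None or data == {} or data == []:
        raise EmptyResponseError("empty JSON payload")
    return data
```

Callers can then decide whether an empty payload means "no data for this symbol" or a failed request, instead of silently building an empty DataFrame.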
### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>
Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.
### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>
Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
## finance-bp-103--ArcticDB (3)
### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>
ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.
### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>
Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.
### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>
Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.
## finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>
8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
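A minimal sketch of selecting the numbering scheme from the filing date. `item_scheme` is a hypothetical helper, and treating filings dated exactly 2004-08-23 as using the new scheme is an assumption; the text above only says "before" and "after".

```python
from datetime import date

# Cutover date taken from the description above.
NEW_SCHEME_START = date(2004, 8, 23)


def item_scheme(filing_date: date) -> str:
    """Return "legacy" (items 1-12) or "new" (items 1.01-9.01) for an 8-K."""
    # Assumption: the cutover date itself belongs to the new scheme.
    return "new" if filing_date >= NEW_SCHEME_START else "legacy"
```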
## finance-bp-128--yfinance (2)
### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>
Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.
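The effect can be seen with the standard-library `zoneinfo` module alone. The dates and the 16:00 close below are illustrative assumptions; the sketch shows two local closes on either side of the 2024-03-10 US DST transition mapping to different UTC hours, which a naive index would hide.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# Two 16:00 local closes on either side of the 2024-03-10 DST switch.
before = datetime(2024, 3, 8, 16, 0, tzinfo=NY)
after = datetime(2024, 3, 11, 16, 0, tzinfo=NY)

# Convert to UTC for storage; the UTC hour moves from 21:00 to 20:00.
before_utc = before.astimezone(timezone.utc)
after_utc = after.astimezone(timezone.utc)

# Elapsed time is 71 hours, not the 72 a naive subtraction would report.
elapsed = after_utc - before_utc
```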
### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>
Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-114--edgar-crawler
**Scan date**: 2026-04-22
**Stats**: {'total_files': 4, 'total_classes': 16, 'total_functions': 0, 'total_stages': 4}
## Modules (4)
- [index_download_stage](components/index_download_stage.md): 4 classes
- [crawl_and_download_stage](components/crawl_and_download_stage.md): 3 classes
- [document_parsing_stage](components/document_parsing_stage.md): 8 classes
- [logging_infrastructure](components/logging_infrastructure.md): 1 class
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 92
fatal_constraints_count: 31
non_fatal_constraints_count: 139
use_cases_count: 1
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (16)
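- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting with exponential-backoff retry: every external data API call must apply rate-limit control and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern: it adds load on the server and can trigger an IP ban. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must cap concurrency (max_workers); unbounded parallelism is not allowed. Free APIs (akshare, the tushare free tier) typically allow 1-3 concurrent requests; paid APIs also have concurrency ceilings (tushare is points-based, and different point levels map to different concurrency limits). Exceeding the limit triggers 429 responses or an IP ban. Prefer explicit control through asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token / credential security: data-source API keys (for example a tushare token; akshare needs no token, but other commercial data sources do) must not be hard-coded; read them from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should be separated by a minimum interval (some akshare endpoints require ≥ 0.5 s; the tushare free tier allows 200 calls per minute). A bare sleep in code is less precise than a token-bucket algorithm; a mature library such as ratelimit or slowapi is recommended.
- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspension days: a suspended stock has no trade data during the suspension, so date gaps appear in the database. Do not forward-fill the missing dates (that fabricates trading volume); instead mark the rows with is_suspended=True, set volume and turnover to 0, and carry the previous day's close as the price. Factor computation must filter out rows where is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: Historical-data boundary for newly listed stocks: a new stock appears in the database from its first trading day and has no history before listing. If a factor's lookback window is longer than the number of days since listing, every factor value will be NaN. Record each stock's listing date (list_date) during collection and start collection from that date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream data sources (akshare/tushare) still serve the pre-delisting history of delisted stocks, but there is no data after the delisting date. Historical stock universes must include delisted stocks (otherwise survivorship bias is introduced), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data (for example the closing price) fetched from different sources (akshare/tushare/baostock) may differ slightly (different price-adjustment methods, holiday handling, or ex-dividend adjustment timing). The pipeline should run cross-source reconciliation checks and, when a difference exceeds a threshold (for example 0.1%), log an alert for manual confirmation.
- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; enforce conversion at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time versus calendar time: the "date" of daily-bar data usually refers to a trading day (day T), whereas the "time" of news or announcement data is calendar time. When merging the two, map calendar time to the next available trading day; otherwise a lookahead problem arises in which an announcement published on day T is treated as already available during day T's session.
- **`SHARED-DS-TIME-003`** <sub>(medium)</sub>: Daylight saving time (DST) handling: when collecting US or European market data, the DST switch days (March/November) make the same HH:MM local time correspond to a different UTC time; if unhandled, that day's time series drifts by one hour. Always store in UTC and convert to the market's local time zone for display.
- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data-update scripts must be idempotent (multiple runs give the same result). If a script fails midway because of a network interruption, rerunning it must not produce duplicate rows or data gaps. Implementation: write to a temporary table, validate, then UPSERT into the main table; do not INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums / row counts): after every update, check key fields for integrity: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A data pipeline without automated checks is the root of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: pipeline outputs should be version-managed. When a data source revises historical data (for example restated financials), keep the old version traceable rather than silently overwriting it, so that versions can be diffed and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that every stock/asset's data coverage is consistent with the trading calendar. Each stock should have one row per trading day (flagged as suspended, not missing). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching strategy: frequently read static or slowly changing data (stock info, industry classification, index constituents) should be cached locally to avoid repeated API calls on each run. The cache must have an expiry time (TTL) to prevent use of stale industry classifications or invalid constituent lists.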
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
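A minimal sketch of checking this format, assuming the three parts are separated by underscores and only the code part may itself contain further underscores. `parse_entity_id` is a hypothetical helper, not a ZVT function.

```python
def parse_entity_id(entity_id: str) -> tuple[str, str, str]:
    """Split an ID into (entity_type, exchange, code) or raise ValueError."""
    # Assumption: split on the first two underscores only.
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return parts[0], parts[1], parts[2]
```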
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
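The "exactly one" rule can be enforced at construction time, as in the sketch below. `SignalSizing` is a hypothetical stand-in that models only the three sizing fields; it is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical stand-in for the sizing fields of a trading signal."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        # Count the fields that were actually provided.
        given = [
            v
            for v in (self.position_pct, self.order_money, self.order_amount)
            if v is not None
        ]
        if len(given) != 1:
            raise ValueError(
                "exactly one of position_pct, order_money, order_amount must be set"
            )
```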
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **1**
## `KUC-101`
**Source**: `tests/test_extract_items.py`
Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from compressed ZIP archives for downstream financial analysis and document processing workflows.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block situation. Separating retry logic for transient network errors (TimeoutError, ConnectionError) from permanent errors (ValueError, KeyError) avoids wasted retries and keeps underlying bugs from being masked.
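A minimal sketch of this separation, using full-jitter exponential backoff. `RateLimitError`, `backoff_delay`, and `call_with_retry` are hypothetical names; the defaults (4 retries, 1-second base, 60-second cap) are taken from the ranges in `SHARED-DS-RL-001`, and the caller's `fetch` function is assumed to raise `RateLimitError` on HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's fetch function on HTTP 429."""


# Only these error types are retried; everything else propagates at once.
TRANSIENT = (TimeoutError, ConnectionError, RateLimitError)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(fetch, max_retries: int = 4, sleep=time.sleep):
    """Call fetch(); retry transient errors, let permanent errors propagate."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except TRANSIENT:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last transient error
            sleep(backoff_delay(attempt))
```

A `ValueError` or `KeyError` raised inside `fetch` is never caught here, so parsing bugs surface on the first call instead of being retried.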
## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
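For illustration, a small normaliser that accepts the two formats named above and rejects everything else. `to_iso_date` is a hypothetical helper; the calendar checks (leap years, month ends) are delegated to `datetime.strptime`.

```python
import re
from datetime import datetime

_ISO = re.compile(r"\d{4}-\d{2}-\d{2}")
_COMPACT = re.compile(r"\d{8}")


def to_iso_date(value: str) -> str:
    """Accept YYYY-MM-DD or YYYYMMDD; return YYYY-MM-DD or raise ValueError."""
    if _ISO.fullmatch(value):
        fmt = "%Y-%m-%d"
    elif _COMPACT.fullmatch(value):
        fmt = "%Y%m%d"
    else:
        raise ValueError(f"unsupported date format: {value!r}")
    # strptime rejects impossible dates such as 2023-02-29.
    return datetime.strptime(value, fmt).date().isoformat()
```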
## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.
## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing
Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.
## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.
## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing
Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.
## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
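The check can sit in front of any parser, as in this sketch. `check_status`, `HttpStatusError`, and this `RateLimitError` class are hypothetical definitions for the example; they take a plain status code and body so the logic does not depend on a particular HTTP client.

```python
class RateLimitError(Exception):
    """Hypothetical error for HTTP 429 responses."""


class HttpStatusError(Exception):
    """Hypothetical error for any other non-2xx response."""


def check_status(status: int, body: str) -> str:
    """Return the body only when the status code indicates success."""
    if status == 429:
        raise RateLimitError("rate limited (HTTP 429)")
    if not 200 <= status < 300:
        # The body may be an HTML error page; never hand it to a JSON parser.
        raise HttpStatusError(f"unexpected HTTP status {status}")
    return body
```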
## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.
FILE:references/components/crawl_and_download_stage.md
# crawl_and_download_stage (3 classes)
## `crawl`
`crawl_and_download_stage/crawl.py:0`
## `download`
`crawl_and_download_stage/download.py:0`
## `iXBRL URL handling`
`crawl_and_download_stage/ixbrl-url-handling.py:0`
FILE:references/components/document_parsing_stage.md
# document_parsing_stage (8 classes)
## `ExtractItems.extract`
`document_parsing_stage/extractitems-extract.py:0`
## `HtmlStripper.feed`
`document_parsing_stage/htmlstripper-feed.py:0`
## `determine_items_to_extract`
`document_parsing_stage/determine-items-to-extract.py:0`
## `parse_item`
`document_parsing_stage/parse-item.py:0`
## `get_10q_parts`
`document_parsing_stage/get-10q-parts.py:0`
## `remove_tables`
`document_parsing_stage/remove-tables.py:0`
## `items_to_extract`
`document_parsing_stage/items-to-extract.py:0`
## `skip_extracted_filings`
`document_parsing_stage/skip-extracted-filings.py:0`
FILE:references/components/index_download_stage.md
# index_download_stage (4 classes)
## `download_indices`
`index_download_stage/download-indices.py:0`
## `get_specific_indices`
`index_download_stage/get-specific-indices.py:0`
## `requests_retry_session`
`index_download_stage/requests-retry-session.py:0`
## `user_agent`
`index_download_stage/user-agent.py:0`
FILE:references/components/logging_infrastructure.md
# logging_infrastructure (1 class)
## `Logger.__init__`
`logging_infrastructure/logger-init.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-114-v5.3
version: v6.1
blueprint_id: finance-bp-114
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:54.950360+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- multi-market
activities:
- data-sourcing
upgraded_from: finance-bp-114-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:30.751233+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-114--edgar-crawler/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-114--edgar-crawler/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-DATA-SOURCING-001
title: Missing or invalid User-Agent headers for SEC API requests
description: SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are
rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this
constraint as fundamental to any data retrieval operation.
project_source: finance-bp-070--edgartools, finance-bp-114--edgar-crawler
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-002
title: Ignoring external API rate limits causing IP blocking
description: Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec,
120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability.
Immediate retry attempts during blocks extend the block duration significantly.
project_source: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-003
title: No HTTP timeout configuration causing indefinite hangs
description: HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely
on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating
cascading failures across the system.
project_source: finance-bp-079--akshare
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-004
title: Invalidating XBRL period types for balance sheet analysis
description: Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration
periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting
financial calculations that depend on accurate period associations.
project_source: finance-bp-070--edgartools
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-005
title: Malformed or empty JSON responses causing silent failures
description: Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream
processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures
producing empty DataFrames or misleading results in financial analysis.
project_source: finance-bp-079--akshare
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-006
title: Source-specific symbol mapping errors causing data corruption
description: Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect
symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records
or entirely incorrect tickers being stored.
project_source: finance-bp-079--akshare
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-007
title: Using unsupported DataFrame types with time-series storage
description: ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting
to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data
loss if not properly handled before storage operations.
project_source: finance-bp-103--ArcticDB
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-008
title: Non-atomic storage writes causing concurrent access corruption
description: Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer
access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data,
breaking version chain integrity.
project_source: finance-bp-103--ArcticDB
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-009
title: Missing timezone-aware DatetimeIndex causing DST offset errors
description: Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation
when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions,
corrupting historical price calculations.
project_source: finance-bp-128--yfinance
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-010
title: 8-K filing item numbering scheme mismatch for historical filings
description: 8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using
the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction
failure for pre-2004 data.
project_source: finance-bp-114--edgar-crawler
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-011
title: Yahoo Finance missing crumb authentication causing 401/403 errors
description: Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management,
API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial
data processing.
project_source: finance-bp-128--yfinance
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-012
title: Large document parsing without streaming causing OOM errors
description: SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that
crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme
memory usage.
project_source: finance-bp-070--edgartools
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-013
title: Column mapping length mismatch causing DataFrame errors
description: Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions
during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact
column count alignment.
project_source: finance-bp-079--akshare
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
- id: AP-DATA-SOURCING-014
title: Pruning snapshot-protected versions breaking point-in-time recovery
description: Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots
provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt
to access data from specific snapshots.
project_source: finance-bp-103--ArcticDB
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- data-sourcing
_source_file: anti-patterns/data-sourcing.yaml
cross_project_wisdom:
- wisdom_id: CW-DATA-SOURCING-001
source_project: finance-bp-079--akshare, finance-bp-114--edgar-crawler
pattern_name: Exponential backoff retry with rate limit detection
description: Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately
on rate limit errors prolongs the block. Keeping retry logic for transient network errors (TimeoutError, ConnectionError)
separate from permanent errors (ValueError, KeyError) prevents wasted resources and avoids masking underlying bugs.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-002
source_project: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney
pattern_name: Strict date format validation and standardization
description: Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL
or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt
downstream financial calculations.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-003
source_project: finance-bp-070--edgartools, finance-bp-114--edgar-crawler
pattern_name: XBRL fact attribute completeness enforcement
description: Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing
attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration)
must be correctly distinguished for accurate balance sheet rendering.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-004
source_project: finance-bp-070--edgartools, finance-bp-128--yfinance
pattern_name: Streaming parser threshold for large documents
description: Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents
OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data
to prevent DST offset corruption.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-005
source_project: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB
pattern_name: Data accuracy disclaimer requirements
description: Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays.
Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can
lead to user financial losses from reliance on delayed or incorrect data.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-006
source_project: finance-bp-103--ArcticDB
pattern_name: Atomic write ordering for versioned storage
description: Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF).
Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing
incomplete data in multi-writer scenarios.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-007
source_project: finance-bp-079--akshare, finance-bp-097--OpenBB
pattern_name: HTTP status code validation before data processing
description: Always validate HTTP response status codes before processing response data. Error responses (404, 500) may
contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError
for proper handling by callers.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
- wisdom_id: CW-DATA-SOURCING-008
source_project: finance-bp-084--eastmoney
pattern_name: Quality gates for financial recommendations
description: Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial
recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses.
Separate on-demand computation from scheduled pre-computation to handle API rate limits.
applicable_to_activity: data-sourcing
_source_file: cross-project-wisdom/data-sourcing.yaml
domain_constraints_injected:
- id: SHARED-DS-RL-001
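statement: 'Rate limiting with exponential-backoff retry: every external data API call must apply rate-limit control
and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern that increases
server load and can trigger an IP ban. Use at most 3-5 retries, a base delay of 1-2 seconds, and a maximum delay of 60 seconds.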
'
severity: fatal
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: all external API calls must implement exponential backoff retry with jitter
evidence_refs:
- type: community_validated
ref: AWS retry behavior best practices; akshare documentation on rate limits; tushare documentation on request frequency limits
url: https://docs.aws.amazon.com/general/latest/gr/api-retries.html
reference_code:
bad_example: "# BAD: retries almost immediately with no backoff, making 429s worse\nfor attempt in range(5):\n    try:\n\
\        data = api.get(symbol)\n        break\n\
\    except RateLimitError:\n        time.sleep(0.1)  # 100 ms is effectively an immediate retry\n"
good_example: "# GOOD: exponential backoff with jitter\nimport random\nimport time\n\n\
def fetch_with_retry(func, *args, max_retries=5, base_delay=1.0):\n\
\    # RateLimitError is assumed to come from the API client in use\n    for attempt in range(max_retries):\n\
\        try:\n            return func(*args)\n        except (RateLimitError, TimeoutError):\n\
\            if attempt == max_retries - 1:\n                raise\n\
\            delay = min(base_delay * (2 ** attempt), 60)\n\
\            delay += random.uniform(0, delay * 0.1)  # up to +10% jitter\n            time.sleep(delay)\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-002
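statement: 'Batch API calls must bound their concurrency (max_workers); unbounded parallelism is not allowed. Free APIs
(akshare, tushare free tier) typically allow 1-3 concurrent requests; paid APIs also cap concurrency (tushare uses a
points system in which the limit depends on the points held). Exceeding the limit triggers 429 responses or an IP ban.
Control concurrency explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.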
'
severity: high
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: concurrent API calls must be bounded by explicit max_workers/semaphore
evidence_refs:
- type: community_validated
ref: tushare documentation on points and frequency limits; akshare API documentation; MiniMax concurrency pitfall notes (Doramagic internal memory)
reference_code:
bad_example: "# BAD: no concurrency limit, which triggers 429\nwith ThreadPoolExecutor() as executor:\n\
\    results = list(executor.map(fetch_stock, stock_list))\n\
\    # the default max_workers can spawn dozens of threads and trigger 429 at once\n"
good_example: "# GOOD: explicit concurrency limit (max_workers=2 suggested for the akshare free tier)\n\
from concurrent.futures import ThreadPoolExecutor\nMAX_WORKERS = 2  # adjust to the API documentation\n\n\
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:\n\
\    results = list(executor.map(fetch_stock, stock_list))\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-003
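statement: 'API token / credential security: data-source API keys (a tushare token, for example; akshare needs no token,
but other commercial sources do) must not be hardcoded; read them from environment variables or a configuration file.
A hardcoded token committed to Git leaks the token and can incur charges.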
'
severity: high
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: API tokens must be loaded from environment variables, not hardcoded
evidence_refs:
- type: community_validated
ref: tushare documentation on token management; GitHub Secret Scanning best practices
url: https://tushare.pro/document/2
reference_code:
bad_example: '# BAD: hardcoded token, leaked once committed to Git
ts.set_token(''abc123def456your_token_here'')
pro = ts.pro_api()
'
good_example: "# GOOD: read the token from an environment variable\nimport os\nimport tushare as ts\n\n\
token = os.environ.get('TUSHARE_TOKEN')\nif not token:\n\
\    raise ValueError(\"TUSHARE_TOKEN environment variable not set\")\nts.set_token(token)\npro = ts.pro_api()\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-RL-004
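statement: 'Request throttling: batch requests to the same API should keep a minimum interval between calls (some akshare
endpoints require at least 0.5 s; the tushare free tier allows 200 calls per minute). A hand-written sleep is less precise
than a token-bucket algorithm; prefer an established library such as ratelimit or slowapi.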
'
severity: medium
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: per-request minimum interval must be enforced between API calls
evidence_refs:
- type: community_validated
ref: akshare official API documentation; Zhihu article 《量化数据采集:如何优雅处理限速》
url: https://akshare.akfamily.xyz/
reference_code:
bad_example: "# BAD: a fixed sleep is imprecise and breaks down under concurrency\nfor code in stock_list:\n\
\    data = ak.stock_zh_a_hist(symbol=code)\n\
\    time.sleep(0.1)  # may be too short, or needlessly conservative\n"
good_example: "# GOOD: enforce the limit with the ratelimit decorators\nimport tushare as ts\n\
from ratelimit import limits, sleep_and_retry\n\n@sleep_and_retry\n\
@limits(calls=200, period=60)  # tushare free tier: 200 calls per minute\ndef fetch_daily(code, start, end):\n\
\    return ts.pro_bar(ts_code=code, start_date=start, end_date=end)\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-001
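statement: 'Missing data on suspension days: a suspended stock has no trades during the suspension, so the database shows
date gaps. Do not forward-fill the missing dates wholesale (that fabricates volume); mark them with is_suspended=True,
set volume and turnover to 0, and carry the previous close forward as the price. Factor computation must filter out
rows where is_suspended=True.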
'
severity: high
capability_tags:
activities:
- data-sourcing
- backtesting
applicable_conditions:
blueprint_has_stage:
- data_collection
- data_filtering
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: suspended trading days must be explicitly marked with is_suspended=True, not silently forward-filled
evidence_refs:
- type: community_validated
ref: tushare documentation on the suspension flag of the daily endpoint; qlib documentation on suspended stock handling
url: https://tushare.pro/document/2?doc_id=28
reference_code:
bad_example: '# BAD: forward-fills suspension days, so volume keeps the non-zero value of the previous day
df = df.reindex(all_trading_days).ffill()
# volume is filled with a non-zero value, so a suspension looks like normal trading
'
good_example: "# GOOD: mark suspension days explicitly\nfull_index = pd.MultiIndex.from_product(\n\
\    [all_stocks, all_trading_days], names=['stock', 'date'])\ndf_full = df.reindex(full_index)\n\
df_full['is_suspended'] = df_full['volume'].isna()\ndf_full['volume'] = df_full['volume'].fillna(0)\n\
df_full['amount'] = df_full['amount'].fillna(0)\n# forward-fill the price within each stock so values never leak across stocks\n\
df_full['close'] = df_full.groupby(level='stock')['close'].ffill()\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-002
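statement: 'History boundary for newly listed stocks: a new stock appears in the database from its first trading day and
has no history before listing. If a factor lookback window exceeds the number of days since listing, every factor value
is NaN. Record the listing date (list_date) of each stock during collection and start collection from that date rather
than from a fixed start date.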
'
severity: medium
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: data collection start date must be bounded by stock listing date, not a fixed start date
evidence_refs:
- type: community_validated
ref: tushare stock_basic endpoint, list_date field; akshare stock_info_a_code_name endpoint
url: https://tushare.pro/document/2?doc_id=25
reference_code:
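bad_example: "# BAD: always starts at 2010-01-01, leaving new listings with long runs of NaN\nfor code in stock_list:\n\
\    df = fetch(code, start='2010-01-01', end=today)\n"
good_example: "# GOOD: start collection at the listing date of each stock\n\
# pro is assumed to be a tushare pro_api() client; stock_basic returns list_date\n\
stock_info = pro.stock_basic(exchange='', list_status='L', fields='ts_code,list_date').set_index('ts_code')\n\
for code in stock_list:\n    list_date = stock_info.loc[code, 'list_date']\n\
\    df = fetch(code, start=list_date, end=today)\n"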
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-003
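statement: 'Data completeness for delisted stocks: mainstream data sources (akshare, tushare) still serve the pre-delisting
history of delisted stocks, but nothing after the delisting date. A historical universe must include delisted stocks
(otherwise it suffers survivorship bias), and collection must handle the delisting date as an explicit end boundary.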
'
severity: high
capability_tags:
activities:
- data-sourcing
- backtesting
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: delisted stocks must be included in historical universe; delist_date must be recorded
evidence_refs:
- type: community_validated
ref: tushare stock_basic endpoint, delist_date field; qlib documentation, Delisted Stock Handling
url: https://tushare.pro/document/2?doc_id=25
reference_code:
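bad_example: '# BAD: collects only currently listed stocks and misses delisted ones
stock_list = ts.get_stock_basics() # only currently listed stocks
'
good_example: "# GOOD: collect the full universe, including delisted stocks\n\
FIELDS = 'ts_code,name,list_date,delist_date'  # keep delist_date so the end boundary is recorded\n\
listed = pro.stock_basic(\n    exchange='', list_status='L', fields=FIELDS,  # listed\n)\n\
delisted = pro.stock_basic(\n    exchange='', list_status='D', fields=FIELDS,  # delisted\n)\n\
full_universe = pd.concat([listed, delisted])\n"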
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-MISS-004
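statement: 'Cross-source reconciliation: the same datum (a closing price, for example) can differ slightly between sources
(akshare, tushare, baostock) because of different price-adjustment methods, holiday handling, or ex-dividend adjustment
timing. The pipeline should run a cross-source reconciliation check and, when the difference exceeds a threshold
(0.1%, for example), log a warning and require manual confirmation.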
'
severity: high
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: when using multiple data sources, cross-source price reconciliation must be performed
evidence_refs:
- type: community_validated
ref: Xueqiu quant community article 《数据质量:多数据源对账实践》; Zhihu article 《量化数据质量保障》
reference_code:
bad_example: '# BAD: switches sources without reconciliation and silently absorbs the differences
df_primary = akshare_fetch(code)
df_backup = baostock_fetch(code)
# if the primary source fails, the backup is used directly with no consistency check
'
good_example: "# GOOD: reconcile two sources and warn when prices differ by more than 0.1%\ntolerance = 0.001\n\
merged = df_primary.join(df_backup, lsuffix='_ak', rsuffix='_bs')\n\
diff = (merged['close_ak'] - merged['close_bs']).abs() / merged['close_ak']\nanomalies = diff[diff > tolerance]\n\
if len(anomalies) > 0:\n    logger.warning(f\"Price discrepancy > {tolerance:.1%}: {len(anomalies)} rows\")\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-001
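statement: 'Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp,
not varchar or int). Mixing string dates (''2024-01-15'') with Timestamp objects is a common source of subtle bugs in
comparisons, indexing, and merges; convert at the pipeline entry point.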
'
severity: high
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: all date/time fields must be normalized to pd.Timestamp at data ingestion boundary
evidence_refs:
- type: community_validated
ref: pandas documentation, to_datetime best practices; SQLAlchemy documentation on the TIMESTAMP type
url: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
reference_code:
bad_example: '# BAD: dates stored as strings, so comparisons go wrong
df[''date''] = ''2024-01-15'' # string
latest = df[df[''date''] == ''2024-01-15''] # string comparison, slow
'
good_example: '# GOOD: convert everything to Timestamp
df[''date''] = pd.to_datetime(df[''date''])
latest = df[df[''date''] == pd.Timestamp(''2024-01-15'')]
'
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-002
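statement: 'Trading time versus calendar time: the "date" of daily bar data normally denotes a trading day (day T), while
the "time" of news and announcement data is calendar time. When merging the two, map the calendar time to the next
available trading day; otherwise a look-ahead problem arises in which an announcement published on day T is treated
as already usable during the day T session.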
'
severity: high
capability_tags:
activities:
- data-sourcing
- backtesting
applicable_conditions:
blueprint_has_stage:
- data_collection
- data_filtering
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: announcement timestamps must be mapped to next trading day open, not announcement date
evidence_refs:
- type: community_validated
ref: Zhihu article 《量化数据时间戳处理:交易日与自然日的转换》; qlib documentation on point-in-time data
url: https://qlib.readthedocs.io/
reference_code:
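bad_example: '# BAD: the announcement is usable for signals on its own date (it may have been published after the close)
signals = df.merge(announcements, on=''date'') # announcement date = trading date
'
good_example: "# GOOD: map after-close announcements to the next trading day\nimport pandas as pd\n\
import exchange_calendars as xcals\ncal = xcals.get_calendar('XSHG')\n\n\
def announcement_to_trade_date(ann_dt, market_close_hour=15):\n\
\    ts = pd.Timestamp(ann_dt)  # assumed tz-naive exchange-local time\n    day = ts.normalize()\n\
\    if ts.hour >= market_close_hour:\n        # after-close announcement -> effective from the following day\n\
\        day += pd.Timedelta(days=1)\n    # roll forward to a session, which also covers weekends and holidays\n\
\    return cal.date_to_session(day, direction='next').date()\n\n\
announcements['trade_date'] = announcements['ann_datetime'].apply(\n    announcement_to_trade_date)\n"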
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-TIME-003
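statement: 'Daylight saving time (DST) handling: when collecting US or European equity data, DST transitions (March and
November in the US, March and October in Europe) make the same local HH:MM map to a different UTC time; if unhandled,
the time series for that day drifts by one hour. Always store timestamps in UTC and convert to the local timezone of
the market for display.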
'
severity: medium
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags:
markets:
- cn-astock
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: DST transitions must be handled when collecting US/EU market data; store as UTC
evidence_refs:
- type: community_validated
ref: pytz documentation on DST handling; exchange_calendars documentation
url: https://pytz.sourceforge.net/
reference_code:
bad_example: '# BAD: naive datetimes drift on DST transition days
df[''datetime''] = pd.to_datetime(df[''time_str'']) # no timezone
'
good_example: "# GOOD: store in UTC and convert to the local timezone for display\nimport pytz\n\
eastern = pytz.timezone('America/New_York')\n# ambiguous or nonexistent local times become NaT instead of raising\n\
df['datetime_utc'] = (\n    pd.to_datetime(df['time_str'])\n\
\    .dt.tz_localize(eastern, ambiguous='NaT', nonexistent='NaT')\n    .dt.tz_convert('UTC')\n)\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-001
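statement: 'Idempotent incremental updates: data update scripts must be idempotent (repeated runs give the same result).
If a script fails midway because of a network interruption, rerunning it must not create duplicate rows or data gaps.
Implementation: write to a staging table first, validate, then UPSERT into the main table; do not INSERT/APPEND directly.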
'
severity: high
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: 'data update scripts must be idempotent: use UPSERT, not INSERT/APPEND'
evidence_refs:
- type: community_validated
ref: SQLite UPSERT documentation (INSERT OR REPLACE); Zhihu article 《量化数据库设计:幂等更新》
url: https://www.sqlite.org/lang_upsert.html
reference_code:
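bad_example: '# BAD: plain APPEND, so a rerun creates duplicate rows
df_new.to_sql(''daily_prices'', con=engine, if_exists=''append'', index=False)
'
good_example: "# GOOD: UPSERT (replace the row on a primary-key conflict; SQLite syntax)\n\
cols = ['stock_code', 'date', 'open', 'high', 'low', 'close', 'volume']\n\
rows = list(df_new[cols].itertuples(index=False, name=None))\n\
with engine.begin() as conn:  # one transaction, committed only if every row succeeds\n\
\    conn.exec_driver_sql(\"\"\"\n        INSERT OR REPLACE INTO daily_prices\n\
\        (stock_code, date, open, high, low, close, volume)\n        VALUES (?, ?, ?, ?, ?, ?, ?)\n\
\    \"\"\", rows)\n# with SQLAlchemy Core, use insert(...).on_conflict_do_update instead\n"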
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-002
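statement: 'Data integrity checks (checksums and row counts): after every data update, run integrity checks on key fields:
whether the row count is within the expected range, whether prices are positive, and whether dates are continuous
(no missing trading days). A data pipeline without automated validation is the root cause of "silent rot".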
'
severity: high
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: 'post-update data quality checks must run automatically: row count, price positivity, date continuity'
evidence_refs:
- type: community_validated
ref: Great Expectations documentation; Zhihu article 《量化数据质量治理:如何发现数据腐烂》
url: https://docs.greatexpectations.io/
reference_code:
bad_example: '# BAD: no validation after the update
update_daily_prices(date=today)
print("Update done") # no way to tell whether it succeeded or left gaps
'
good_example: '# GOOD: validate automatically after the update
update_daily_prices(date=today)
# Check 1: row count is plausible (about 5000 A-share stocks)
row_count = db.count("SELECT COUNT(*) FROM daily_prices WHERE date = ?", today)
assert 4000 <= row_count <= 6000, f"Unexpected row count: {row_count}"
# Check 2: no zero or negative prices
invalid = db.count("SELECT COUNT(*) FROM daily_prices WHERE close <= 0")
assert invalid == 0, f"Found {invalid} invalid prices"
# Check 3: no date gaps (continuity of the last 5 trading days)
check_no_date_gaps(db, last_n_trading_days=5)
'
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-003
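statement: 'Data versioning: the output of a data pipeline should be version-managed. When a source revises historical
data (restated financials, for example), keep the old version traceable instead of silently overwriting it, so that
versions can be compared and past backtests reproduced.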
'
severity: medium
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: historical data revisions must be versioned; silent overwrites are prohibited
evidence_refs:
- type: community_validated
ref: ArcticDB documentation on data versioning; DVC (Data Version Control) documentation
url: https://arcticdb.io/
reference_code:
bad_example: '# BAD: overwrites in place, losing the previous version
df_revised.to_csv(''financial_data.csv'', index=False) # overwrites the old version
'
good_example: '# GOOD: timestamped versioned storage (ArcticDB or plain versioned files)
from datetime import datetime
version = datetime.now().strftime(''%Y%m%d_%H%M%S'')
df_revised.to_parquet(f''data/financial_data_v{version}.parquet'')
# point a symlink at the latest version
# ln -sf financial_data_v{version}.parquet financial_data_latest.parquet
# or use ArcticDB, which versions every write:
import arcticdb as adb
lib = adb.Arctic(''lmdb:///data/arctic_store'').get_library(''finance'', create_if_missing=True)
lib.write(''financial_data'', df_revised) # creates a new version
'
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-004
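statement: 'Alignment to trading-calendar boundaries: after collection, verify that data coverage for every stock or asset
is complete and consistent with the trading calendar. Every stock should have one row per trading day (flagged as
suspended rather than missing). Checking the NaN ratio through pivot_table is an effective quick diagnostic.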
'
severity: medium
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
- data_filtering
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: data completeness vs trading calendar must be verified after each ingestion
evidence_refs:
- type: community_validated
ref: qlib documentation on data quality inspection; tushare documentation on completeness of the daily endpoint
url: https://qlib.readthedocs.io/
reference_code:
bad_example: '# BAD: no completeness check, missing data silently ignored
df = load_all_stocks(start_date, end_date)
run_backtest(df)
'
good_example: "# GOOD: check coverage with a pivot matrix\nprice_matrix = df.pivot_table(\n\
\    index='date', columns='stock_code', values='close')\ncoverage = 1 - price_matrix.isna().mean().mean()\n\
print(f\"Data coverage: {coverage:.1%}\")\nif coverage < 0.95:\n\
\    logger.warning(f\"Low coverage: {coverage:.1%}, check for missing stocks\")\n\
missing_stocks = price_matrix.isna().mean()  # share of missing days per stock\n\
bad_stocks = missing_stocks[missing_stocks > 0.05].index.tolist()\nif bad_stocks:\n\
\    logger.warning(f\"Stocks with >5% missing days: {bad_stocks}\")\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
- id: SHARED-DS-INCR-005
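statement: 'Caching: frequently read static or slowly changing data (stock metadata, industry classifications, index
constituents) should be cached locally to avoid repeating API calls on every run. The cache must have an expiry time
(TTL) so that stale industry classifications or outdated constituent lists are not used.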
'
severity: medium
capability_tags:
activities:
- data-sourcing
applicable_conditions:
blueprint_has_stage:
- data_collection
incompatible_with_tags: {}
stage_id_remap_hints:
- from_stage: data_collection
constraint_context: static/low-frequency data must be cached locally with TTL to avoid unnecessary API calls
evidence_refs:
- type: community_validated
ref: akshare documentation recommending local caching; functools.lru_cache documentation; joblib.Memory documentation
url: https://akshare.akfamily.xyz/
reference_code:
bad_example: "# BAD: refetches the industry classification on every run (slow and burns quota)\ndef get_industry(stock):\n\
\    return ak.stock_board_industry_name_em()  # calls the API every time\n"
good_example: "# GOOD: cache the industry classification and refresh it once a day\nfrom datetime import date\n\
from joblib import Memory\n\ncache_dir = './data_cache'\nmemory = Memory(cache_dir, verbose=0)\n\n@memory.cache\n\
def get_industry_cached(cache_date: str):  # cache_date is part of the cache key\n\
\    return ak.stock_board_industry_name_em()\n\n\
# daily refresh: a new date gives a new key, so the entry from the previous day is no longer used\n\
industry_df = get_industry_cached(str(date.today()))\n"
provenance:
source: community_validated
_source_file: data-sourcing/constraints.yaml
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: tests/test_extract_items.py
business_problem: Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from compressed
ZIP archives for downstream financial analysis and document processing workflows.
intent_keywords:
- EDGAR
- SEC filings
- 10-K extraction
- annual report parsing
- document extraction
stage: data_collection
data_domain: financial_data
type: data_pipeline
component_capability_map:
project: finance-bp-114--edgar-crawler
scan_date: '2026-04-22'
stats:
total_files: 4
total_classes: 16
total_functions: 0
total_stages: 4
modules:
index_download_stage:
class_count: 4
stage_id: index_download
stage_order: 1
responsibility: Downloads SEC EDGAR index files (TSV) for specified years/quarters. Provides the master list of available
filings for downstream filtering. This stage exists because SEC EDGAR provides quarterly indices that must be fetched
incrementally to avoid redundant network calls and enable efficient updates.
classes:
- name: download_indices
file: index_download_stage/download-indices.py
line: 0
kind: required_method
signature: ''
- name: get_specific_indices
file: index_download_stage/get-specific-indices.py
line: 0
kind: required_method
signature: ''
- name: requests_retry_session
file: index_download_stage/requests-retry-session.py
line: 0
kind: required_method
signature: ''
- name: user_agent
file: index_download_stage/user-agent.py
line: 0
kind: replaceable_point
design_decision_count: 3
crawl_and_download_stage:
class_count: 3
stage_id: crawl_and_download
stage_order: 2
responsibility: Parses HTML index pages from SEC EDGAR, extracts filing metadata (SIC, state, fiscal year), and downloads
actual filing documents to local storage. This stage bridges index information to raw document files for downstream
parsing.
classes:
- name: crawl
file: crawl_and_download_stage/crawl.py
line: 0
kind: required_method
signature: ''
- name: download
file: crawl_and_download_stage/download.py
line: 0
kind: required_method
signature: ''
- name: iXBRL URL handling
file: crawl_and_download_stage/ixbrl-url-handling.py
line: 0
kind: replaceable_point
design_decision_count: 3
document_parsing_stage:
class_count: 8
stage_id: document_parsing
stage_order: 3
responsibility: Extracts structured items from raw HTML/text filings using regex pattern matching. Handles tables, spans,
and filing-specific item structures for 10-K, 10-Q, and 8-K filings. This is the core NLP extraction engine that transforms
unstructured documents into machine-readable JSON.
classes:
- name: ExtractItems.extract
file: document_parsing_stage/extractitems-extract.py
line: 0
kind: required_method
signature: ''
- name: HtmlStripper.feed
file: document_parsing_stage/htmlstripper-feed.py
line: 0
kind: required_method
signature: ''
- name: determine_items_to_extract
file: document_parsing_stage/determine-items-to-extract.py
line: 0
kind: required_method
signature: ''
- name: parse_item
file: document_parsing_stage/parse-item.py
line: 0
kind: required_method
signature: ''
- name: get_10q_parts
file: document_parsing_stage/get-10q-parts.py
line: 0
kind: required_method
signature: ''
- name: remove_tables
file: document_parsing_stage/remove-tables.py
line: 0
kind: replaceable_point
- name: items_to_extract
file: document_parsing_stage/items-to-extract.py
line: 0
kind: replaceable_point
- name: skip_extracted_filings
file: document_parsing_stage/skip-extracted-filings.py
line: 0
kind: replaceable_point
design_decision_count: 8
logging_infrastructure:
class_count: 1
stage_id: logging
stage_order: 4
responsibility: Centralized logging infrastructure providing timestamped log files and console output filtering. Enables
debugging of specific execution windows and post-run forensics.
classes:
- name: Logger.__init__
file: logging_infrastructure/logger-init.py
line: 0
kind: required_method
signature: ''
design_decision_count: 2
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.32926829268292684
evidence_invalid: 55
evidence_verified: 27
evidence_auto_fixed: 0
audit_coverage: 45/45 (100%)
audit_pass_rate: 1/45 (2%)
audit_fail_total: 29
audit_finance_universal:
pass: 0
warn: 0
fail: 0
audit_subdomain_totals:
pass: 1
warn: 15
fail: 29
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-114. Evidence verify ratio
= 32.9% and audit fail total = 29. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-114-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc: []
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: SEC EDGAR Filing Extraction
positive_terms:
- EDGAR
- SEC filings
- 10-K extraction
- annual report parsing
- document extraction
data_domain: financial_data
negative_terms:
- trading strategy
- backtesting
- stock screening
- live trading
- factor computation
- machine learning prediction
ambiguity_question: Are you looking to extract raw SEC EDGAR filings (10-K, 10-Q, 8-K) from compressed archives for document
processing? Or do you need a different financial data pipeline task?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 92
fatal_constraints_count: 31
non_fatal_constraints_count: 139
use_cases_count: 1
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (fatal violation finance-C-001).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 36 source groups: 10-Q Bug
Detection(1), 10-Q Processing(1), 8-K Processing(1), Caching Strategy(1), Directory Setup(3), Error Handling(1), and
30 more.'
key_decisions: 92 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-057
type: B/BA
summary: 10-Q part separation bug detected when PART I is only mentioned in ToC and PART II is much longer
- id: BD-038
type: B/RC
summary: '10-Q documents parsed in two parts: Part I (Items 1-4) and Part II (Items 1-6)'
- id: BD-039
type: B/RC
summary: 8-K item format uses decimal notation (1.01, 2.01, 5.01) not simple numbers
- id: BD-045
type: B/RC
summary: Company info cached in JSON file (companies_info.json) to avoid redundant API calls
- id: BD-017
type: B/BA
summary: Dataset directory (DATASET_DIR) is created alongside __init__.py in a 'datasets' subfolder rather than allowing
user specification
- id: BD-018
type: B
summary: Logging directory (LOGGING_DIR) is created alongside __init__.py in a 'logs' subfolder rather than allowing
user specification
- id: BD-019
type: B
summary: Directories are created at import time in __init__.py rather than lazily or on-demand
- id: BD-051
type: B/DK
summary: If all items are null after extraction, log a warning and return None to skip the filing
- id: BD-046
type: B/DK
summary: 'Downloaded filename format: {CIK}_{FILING_TYPE}_{YEAR}_{ACCESSION_NUM}.{EXT}'
- id: BD-056
type: B/RC
summary: File reading uses errors='backslashreplace' to handle encoding issues gracefully
- id: BD-048
type: B
summary: CSV metadata written to temporary file first, then moved to final location to prevent data loss
- id: BD-023
type: B/RC
summary: 8-K item naming change from simple numbers (1, 2, 3) to decimal format (1.01, 2.01, 5.01) occurred on August
23, 2004
- id: BD-026
type: B
summary: HTML closing tags (div, tr, p, li) replaced with two newline characters during stripping
- id: BD-027
type: B
summary: <br> tags replaced with two newline characters during HTML stripping
- id: BD-028
type: B
summary: TH/TD closing tags replaced with spaces rather than newlines during HTML stripping
- id: BD-034
type: B/RC
summary: Item patterns adjusted to insert optional whitespace before trailing letters (A, B, C) for flexible matching
- id: BD-035
type: B/BA
summary: 'SIGNATURE section allows variations: SIGNATURE, SIGNATURES, or Signature(s)'
- id: BD-064
type: B/BA
summary: Item index pattern includes word boundary characters ([.*~-:\s\(]) after item number
- id: BD-052
type: B/BA
summary: If no items_to_extract is specified, all items for the filing type are extracted
- id: BD-043
type: B/DK
summary: Retry mechanism uses 5 retries with exponential backoff factor of 0.2 for network requests
- id: BD-062
type: B
summary: Exponential backoff status codes include 400, 401, 403, 500, 502, 503, 504, 505
- id: BD-050
type: B/BA
summary: Process pool uses 1 worker process for parallel extraction
- id: BD-065
type: B/BA
summary: Whitespace (but not newlines) matched as [^\S\r\n] in patterns to preserve line breaks
- id: BD-044
type: B/RC
summary: SEC rate limit response detected by checking for 'will be managed until action is taken' text
- id: BD-036
type: B/BA
summary: Item section extraction selects longest matching section between item markers
- id: BD-037
type: B/RC
summary: SIGNATURE extraction uses last occurrence in document rather than first
- id: BD-063
type: B/RC
summary: Case-sensitive search attempted first before falling back to case-insensitive for item matching
- id: BD-053
type: B/BA
summary: SIGNATURE section excluded by default; enabled via include_signature config flag
- id: BD-033
type: B
summary: Horizontal span margins replaced with single space, vertical margins with single newline
- id: BD-054
type: B/BA
summary: Tables removed by default during extraction; disabled via remove_tables config flag
- id: BD-031
type: B/RC
summary: 'Non-blank background colors (not white, transparent, none, or #fff) trigger table removal'
- id: BD-032
type: B
summary: Tables containing item index headers (Item 1, Item 1A, etc.) are preserved even if they have background colors
- id: BD-029
type: B/RC
summary: Multiple consecutive newlines and spaces normalized to single newline, then multiple spaces to single space
- id: BD-030
type: B/RC
summary: Special Unicode characters (smart quotes, em-dashes, various Unicode dashes) normalized to ASCII equivalents
- id: BD-060
type: B/RC
summary: Page numbers and headers removed during text cleanup using regex patterns
- id: BD-061
type: B/RC
summary: Table of Contents, Index to Financial Statements, Back to Contents, Quicklinks headers removed
- id: BD-066
type: B
summary: Whitespace normalization function preserves structure while removing excessive spacing
- id: BD-022
type: B/BA
summary: Regex flags set to IGNORECASE | DOTALL | MULTILINE for all item pattern matching
- id: BD-041
type: B/BA
summary: Index URLs created by prepending 'https://www.sec.gov/Archives/' to relative paths
- id: BD-004
type: B/RC
summary: Parse Document Format Files table for .htm/.html links; fall back to complete submission text file
- id: BD-005
type: M/BA
summary: Store company metadata in companies_info.json to reduce per-filing lookups
- id: BD-006
type: BA/DK
summary: 'Filename convention: {CIK}_{Type}_{Year}_{accession}.{ext}'
- id: BD-047
type: B/BA
summary: 'Incremental download: existing files are skipped but new filings are downloaded'
- id: BD-074
type: BA
summary: HtmlStripper sets convert_charrefs=True and strict=False - affects HTML parsing
- id: BD-007
type: B/RC
summary: Detect HTML vs plain text by checking for <td> and <tr> elements
- id: BD-008
type: M/BA
summary: Remove numerical tables but preserve text-containing tables via background-color detection
- id: BD-009
type: BA
summary: Handle 10-Q two-part structure by splitting text before item extraction
- id: BD-010
type: B/BA
summary: Adjust regex patterns for Roman numerals to capture both I,II and 1,2 formats
- id: BD-011
type: M/BA
summary: Select longest matching section when multiple candidates exist (handles TOC interference)
- id: BD-012
type: M/BA
summary: Process filings in parallel via ProcessPool
- id: BD-013
type: B/RC
summary: '8-K items renamed after August 23, 2004 (old: 1-12, new: 1.01-9.01)'
- id: BD-014
type: M/BA
summary: Set recursion limit to 30000 to handle deeply nested HTML
- id: BD-020
type: B/BA
summary: Python recursion limit increased from default 1000 to 30000 to handle deeply nested HTML structures
- id: BD-024
type: B/RC
summary: Roman numeral mapping (1-20) used for converting numeric parts to Roman numerals for 10-Q parsing
- id: BD-025
type: B/RC
summary: HTML document detected by presence of both <td> AND <tr> elements (not just one)
- id: BD-055
type: B/RC
summary: Embedded PDF sections (<PDF>...</PDF>) stripped from HTML documents
- id: BD-058
type: B
summary: HTMLParser used for HTML stripping with custom data handler that accumulates text
- id: BD-067
type: B/BA
summary: Date threshold for 8-K form version detection
- id: BD-068
type: B/BA
summary: Background color filtering for table removal decision
- id: BD-069
type: B/RC
summary: Special character Unicode normalization
- id: BD-070
type: B/BA
summary: Ignore-matches counter for ToC filtering
- id: BD-083
type: BA
summary: 'INTERACTION: BD-076 (global recursion limit 30000) × BD-074 (HtmlStripper HTMLParser settings) × BD-014 (recursion
limit declaration) → StackOverflow risk cascade in deeply nested documents'
- id: BD-084
type: BA
summary: 'INTERACTION: BD-001 (incremental download) × BD-047 (skip existing files) × BD-077 (CSV format contract) →
Amplified efficiency gains with silent failure risk'
- id: BD-085
type: B/BA
summary: 'INTERACTION: BD-072 (8-K cutoff invariant) × BD-067 (date threshold) × BD-013 (8-K item naming) → Critical
invariant with contradictory implementation risk'
- id: BD-086
type: B/RC
summary: 'INTERACTION: BD-003 (exponential backoff) × BD-044 (rate limit text detection) × BD-062 (status_forcelist)
→ Redundant error handling with partial coverage'
- id: BD-087
type: B
summary: 'INTERACTION: BD-009 (10-Q two-part structure) × BD-038 (10-Q item naming) × BD-075 (part-item delimiter) ×
BD-079 (Roman numeral map) → Cascading dependency on parsing sequence'
- id: BD-088
type: B
summary: 'INTERACTION: BD-017 (DATASET_DIR fixed) × BD-018 (LOGGING_DIR fixed) × BD-019 (eager directory creation) →
Deployment rigidity causing permission errors in restricted environments'
- id: BD-089
type: BA
summary: 'INTERACTION: BD-045 (company info cache) × BD-002 (CIK lookup cache) → Duplicate caching mechanisms with stale
data amplification risk'
- id: BD-090
type: B
summary: 'INTERACTION: BD-007 (HTML detection) × BD-025 (td+tr detection) × BD-058 (HtmlStripper) → Detection failure
cascades to extraction failure on edge-case documents'
- id: BD-001
type: BA
summary: Download indices per quarter to enable incremental updates without re-fetching the entire history
- id: BD-002
type: M/BA
summary: Use separate company_info.json cache to avoid redundant CIK lookups
- id: BD-003
type: BA
summary: Exponential backoff with 5 retries on all HTTP requests
- id: BD-040
type: B/RC
summary: 'Quarterly indices stored as TSV files with pipe delimiter, columns: CIK, Company, Type, Date, links, etc.'
- id: BD-042
type: B/DK
summary: EDGAR indices downloaded by year and quarter (e.g., 2023_QTR1.tsv, 2023_QTR2.tsv)
- id: BD-059
type: B/DK
summary: Skip future quarters when downloading indices (based on current date)
- id: BD-GAP-001
type: DK
summary: 'Missing: Stale data detection and expiry policy'
- id: BD-GAP-002
type: DK
summary: 'Missing: Random seed full coverage'
- id: BD-080
type: DK
summary: HtmlStripper inherits from HTMLParser - users may not realize this dependency
- id: BD-072
type: RC
summary: 8-K obsolete cutoff date 2004-08-23 must match between code and tests
- id: BD-075
type: RC
summary: 10-Q item naming convention uses '__' delimiter to encode part-item relationship
- id: BD-077
type: RC
summary: FILINGS_METADATA.csv format is implicit contract between download and extract
- id: BD-079
type: RC
summary: roman_numeral_map keys (1-20) must match part numbers in item_list_10q
- id: BD-015
type: M/DK
summary: Timestamp log filenames for run-level isolation
- id: BD-016
type: M
summary: Console shows INFO+, file captures DEBUG+
- id: BD-021
type: B/RC
summary: CSS utils logging is suppressed at CRITICAL level to avoid noise from the library
- id: BD-049
type: B/BA
summary: Console logging set to INFO level (not DEBUG) to reduce noise during execution
- id: BD-071
type: B
summary: process_filing MUST call determine_items_to_extract BEFORE extract_items
- id: BD-073
type: B/RC
summary: 10-Q extraction requires parts parsed before items - get_10q_parts before item loop
- id: BD-076
type: B/BA
summary: Global recursion limit 30000 set at module load - affects all imports
- id: BD-081
type: BA
summary: Logger instantiated at module level before config.json loaded
- id: BD-078
type: BA/DK
summary: 10-Q bug recovery modifies self.items_list state then restores it
- id: BD-082
type: BA
summary: 10-Q length_difference threshold 5000 chars drives retry loop
resources:
packages:
- name: beautifulsoup4==4.8.2
version_pin: latest
- name: lxml==4.9.1
version_pin: latest
- name: requests==2.31.0
version_pin: latest
- name: pandas==1.5.3
version_pin: latest
- name: click==7.0
version_pin: latest
- name: tqdm==4.42.1
version_pin: latest
- name: numpy==1.24.4
version_pin: latest
- name: cssutils==1.0.2
version_pin: latest
- name: pathos==0.2.9
version_pin: latest
- name: urllib3==1.26.7
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install beautifulsoup4==4.8.2
- python3 -m pip install lxml==4.9.1
- python3 -m pip install requests==2.31.0
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-006
when: When requesting data from SEC EDGAR
action: include a valid User-Agent header identifying the requester with contact information
severity: fatal
kind: resource_boundary
modality: must
consequence: SEC EDGAR will reject requests without valid User-Agent identification with 403 Forbidden errors, preventing
any data downloads
stage_ids:
- index_download
- id: finance-C-017
when: When constructing SEC EDGAR index URLs
action: 'use the official SEC EDGAR full-index URL pattern: https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/master.zip'
severity: fatal
kind: resource_boundary
modality: must
consequence: Using incorrect URL patterns will result in 404 errors and complete download failure
stage_ids:
- index_download
- id: finance-C-021
when: When downloading SEC EDGAR filings via HTTP requests
action: declare a valid User-Agent header containing contact information (name and email)
severity: fatal
kind: domain_rule
modality: must
consequence: SEC EDGAR will block or throttle requests without a valid User-Agent header, causing downloads to fail with
HTTP 403 errors
stage_ids:
- crawl_and_download
- id: finance-C-028
when: When downloading filings from SEC EDGAR API endpoints
action: implement retry logic with exponential backoff to handle rate limiting responses (HTTP 429) and transient errors
severity: fatal
kind: operational_lesson
modality: must
consequence: SEC EDGAR enforces rate limits; without retry-backoff, repeated requests will trigger temporary IP blocks,
halting all subsequent downloads
stage_ids:
- crawl_and_download
- id: finance-C-030
when: When generating filenames for downloaded filing documents
action: use the convention {CIK}_{FilingTypeName}_{Year}_{accession}.{ext} to ensure uniqueness per filing
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Non-unique filenames cause subsequent downloads to overwrite existing files, resulting in data loss and incorrect
filing-to-metadata associations
stage_ids:
- crawl_and_download
- id: finance-C-032
when: When requesting SEC EDGAR index files
action: respect SEC EDGAR's rate limit of 10 requests per second to avoid triggering automated IP blocks
severity: fatal
kind: resource_boundary
modality: must
consequence: Exceeding rate limits causes SEC EDGAR to temporarily block the IP address, preventing all subsequent downloads
until the block expires (typically 15-60 minutes)
stage_ids:
- crawl_and_download
- id: finance-C-037
when: When creating the FILINGS_METADATA.csv output file
action: 'include all required columns: cik, company, filing_type, filing_date, period_of_report, sic, state_of_inc, htm_filing_link,
filename'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing columns in the metadata CSV breaks downstream parsing stages that expect specific field names, causing
KeyError exceptions in extract_items.py
stage_ids:
- crawl_and_download
- id: finance-C-041
when: When extracting items from SEC filings
action: Detect HTML vs plain text by checking for the presence of <td> and <tr> table elements
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect format detection causes HTML tags to appear in extracted text or structured data to be lost, corrupting
the extracted JSON output with malformed content
stage_ids:
- document_parsing
- id: finance-C-042
when: When removing HTML tables from filings
action: Preserve unstyled tables that may contain item listings while removing styled financial tables
severity: fatal
kind: domain_rule
modality: must
consequence: Removing all tables indiscriminately causes item section headers and listing tables to be deleted, resulting
in incomplete extraction of filing content
stage_ids:
- document_parsing
- id: finance-C-043
when: When processing 10-Q filings
action: Separate document text into Part I and Part II before extracting items to prevent cross-contamination
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Without part separation, identical item names in different parts (e.g., Item 1 in Part I vs Item 1 in Part
II) cause content to be mixed or incorrectly attributed, corrupting the extracted data
stage_ids:
- document_parsing
- id: finance-C-046
when: When processing 8-K filings
action: Use obsolete item numbering (1-12) for filings before 2004-08-23 and new numbering (1.01-9.01) for later filings
severity: fatal
kind: domain_rule
modality: must
consequence: Using wrong item numbering scheme causes no matches to be found for historical filings, resulting in empty
item sections and complete extraction failure
stage_ids:
- document_parsing
- id: finance-C-047
when: When parsing deeply nested HTML documents
action: Set Python recursion limit to 30000 to handle SEC filings with deeply nested tables and div elements
severity: fatal
kind: resource_boundary
modality: must
consequence: The default recursion limit of 1000 raises RecursionError on malformed or deeply nested HTML documents,
preventing extraction from completing
stage_ids:
- document_parsing
- id: finance-C-051
when: When generating JSON output from filings
action: Name 10-K/8-K items as item_1, item_1A, item_2 and 10-Q items as part_1_item_1, part_2_item_1A per filing type
convention
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Inconsistent item naming prevents downstream NLP applications from reliably locating specific sections, causing
feature extraction failures
stage_ids:
- document_parsing
- id: finance-C-052
when: When writing JSON output files
action: Create filing type subdirectories (10-K, 10-Q, 8-K) before writing extracted JSON files
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing directories cause FileNotFoundError during JSON write operations, preventing extracted data from
being persisted to disk
stage_ids:
- document_parsing
- id: finance-C-062
when: When configuring the logging infrastructure
action: Set the file logging level above DEBUG (e.g., INFO or higher)
severity: fatal
kind: domain_rule
modality: must_not
consequence: Setting the file level above DEBUG will exclude DEBUG messages, including request details needed for post-run
forensics, violating the acceptance criterion that log files must contain DEBUG-level messages
stage_ids:
- logging
- id: finance-C-063
when: When configuring console logging output
action: Set console handler level to INFO or higher
severity: fatal
kind: domain_rule
modality: must
consequence: Setting console level below INFO will cause DEBUG-level spam in stdout, violating the acceptance criterion
that console output shows INFO-level messages only
stage_ids:
- logging
- id: finance-C-064
when: When generating log filenames
action: Include timestamp in format YYYY_MM_DD_HH_MM_SS for run-level isolation
severity: fatal
kind: domain_rule
modality: must
consequence: Without timestamp in the filename, multiple runs will overwrite each other's log files, preventing debugging
of specific execution windows
stage_ids:
- logging
- id: finance-C-076
when: When configuring the SEC EDGAR API connection
action: Set user_agent to a valid contact string containing name and email (e.g., 'John Doe john.doe@example.com')
severity: fatal
kind: domain_rule
modality: must
consequence: SEC EDGAR will block requests without a proper User-Agent header, causing all index downloads and crawls
to fail with traffic management messages
- id: finance-C-078
when: When processing TSV index files into DataFrames
action: Treat all CSV/TSV fields as strings using dtype=str to prevent numeric coercion of CIK and other numeric identifiers
severity: fatal
kind: domain_rule
modality: must
consequence: CIK values like '0000320193' get coerced to the integer 320193, causing file path mismatches and missing filing
metadata lookups downstream
- id: finance-C-079
when: When transferring DataFrame between index_download and crawl_and_download stages
action: 'Include all required columns: CIK, Company, Type, Date, complete_text_file_link, html_index, Filing Date, Period
of Report, SIC, htm_file_link, State of Inc, State location, Fiscal Year End, filename'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: document_parsing stage expects specific columns to build JSON structure; missing columns cause KeyError exceptions
during extraction
- id: finance-C-083
when: When SEC EDGAR returns a 200 response with traffic management HTML
action: Treat such responses as successful downloads without first validating the content for the expected HTML structure
severity: fatal
kind: resource_boundary
modality: must_not
consequence: Traffic management pages get saved as raw filings, corrupting the dataset with invalid HTML that causes extraction
failures in document_parsing stage
- id: finance-C-087
when: When reading raw filing documents from disk
action: Construct file path as {raw_filings_folder}/{Type}/{filename} matching the directory structure created during
download
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect file path causes FileNotFoundError, preventing document_parsing stage from processing downloaded
filings
- id: finance-C-095
when: When reading or writing FILINGS_METADATA.csv between download and extract stages
action: 'Verify CSV column names match exactly: CIK, Company, Type, Date, complete_text_file_link, html_index, Filing
Date, Period of Report, SIC, htm_file_link, State of Inc, State location, Fiscal Year End, filename'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Extract stage fails with KeyError when accessing metadata columns that have mismatched names, causing the
entire extraction pipeline to crash
- id: finance-C-111
when: When implementing 8-K filing extraction logic in extract_items.py
action: Verify the 8-K obsolete cutoff date 2004-08-23 is consistent across both production code and test assertions to
correctly identify which 8-K filings are obsolete under SEC filing rules
severity: fatal
kind: domain_rule
modality: must
consequence: Inconsistent cutoff date between code and tests creates false confidence scenarios where test validation
passes but production fails, violating SEC regulatory requirements for 8-K filing extraction
derived_from_bd_id: BD-072
- id: finance-C-114
when: When implementing or modifying document type detection logic for SEC filings
action: Detect HTML vs plain text by checking for <td> and <tr> elements — this specific check distinguishes structured
HTML from plain text with embedded tags
severity: fatal
kind: domain_rule
modality: must
consequence: Using extension-based detection or other heuristics causes incorrect parsing of .txt files containing HTML
content, corrupting extracted SEC filing data
derived_from_bd_id: BD-007
- id: finance-C-118
when: When implementing 10-Q item extraction parsing logic
action: Use '__' as the delimiter when encoding part-item relationships in SEC filing section names — the parsing logic
at line 927 depends on split('__') to correctly separate section numbers from item numbers
severity: fatal
kind: domain_rule
modality: must
consequence: Using a different delimiter breaks the hierarchical mapping between SEC filing sections and extracted items,
causing structural corruption of the parsed 10-Q document tree
derived_from_bd_id: BD-075
- id: finance-C-119
when: When modifying the FILINGS_METADATA.csv production or consumption logic in extract_items.py
action: Maintain exact column names, ordering, and data types as specified in the implicit contract between download module
(lines 424-439) and extract module (line 1199) — any change requires coordinated updates to both modules
severity: fatal
kind: domain_rule
modality: must
consequence: If download changes CSV column names, ordering, or data types without coordinating with extract, downstream
extraction will fail silently or produce incorrect filing metadata, corrupting all subsequent SEC filing analysis
derived_from_bd_id: BD-077
- id: finance-C-120
when: When implementing part extraction logic for SEC 10-Q filings in extract_items.py
action: Verify roman_numeral_map keys (I-XX, defined at lines 32-53) exactly match the PART regex pattern (line 540) —
any mismatch causes part extraction to silently skip or misidentify SEC section boundaries
severity: fatal
kind: domain_rule
modality: must
consequence: If roman_numeral_map keys do not align with PART regex pattern, part extraction will silently skip or misidentify
SEC section boundaries, causing item content to be attributed to wrong sections in 10-Q filings
derived_from_bd_id: BD-079
- id: finance-C-121
when: When implementing 8-K item extraction logic in extract_items.py
action: 'Apply date-based pattern switching for 8-K item numbering: use decimal notation (1.01-9.01) for filings on or
after August 23, 2004, and sequential integers (1-12) for filings before that date'
severity: fatal
kind: domain_rule
modality: must
consequence: Using only one pattern causes complete parsing failure for pre-2004 8-K filings; the SEC changed the 8-K item
numbering format on August 23, 2004, and historical filings must be parsed with the old format
derived_from_bd_id: BD-013
- id: finance-C-141
when: When implementing or testing 8-K filing format detection logic
action: Centralize the 8-K cutoff date 2004-08-23 as a single shared constant with import-time validation — both BD-067
and BD-013 implementations must reference the same constant to prevent timezone-related or rounding discrepancies
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Without centralized date handling, tests and production code may use slightly different date representations
for the 2004-08-23 8-K cutoff, causing silent incorrect extraction of pre-2004 8-K filings without any error indication
derived_from_bd_id: BD-085
- id: finance-C-168
when: When implementing SEC EDGAR data retrieval with rate limit handling
action: 'Verify rate limit detection operates as OR logic across all three mechanisms: (1) BD-044 HTML text detection
(''will be managed until action is taken''), (2) BD-062 HTTP status codes (403/429), and (3) BD-003 exponential backoff
retries; each of the three paths must independently trigger the rate limit response'
severity: fatal
kind: domain_rule
modality: must
consequence: Incomplete rate limit detection causes SEC EDGAR requests to fail silently or return partial data. This violates
regulatory data access reliability requirements and may result in gaps in mandatory financial disclosures used for trading
decisions
derived_from_bd_id: BD-086
regular:
- id: finance-C-001
when: When parsing SEC EDGAR master.idx file content
action: decode content using latin-1 encoding to preserve original byte values
severity: high
kind: domain_rule
modality: must
consequence: Using incorrect encoding (e.g., utf-8) will corrupt company names and paths containing non-ASCII characters,
resulting in missing or malformed filing records in the index
stage_ids:
- index_download
- id: finance-C-002
when: When processing SEC EDGAR master.idx file header
action: skip the first 10 lines containing header/metadata before parsing data rows
severity: high
kind: domain_rule
modality: must
consequence: Including header lines in the parsed data will cause downstream processing to fail when attempting to parse
header text as filing records
stage_ids:
- index_download
- id: finance-C-003
when: When validating quarter parameters for SEC EDGAR index download
action: pass invalid quarter values other than 1, 2, 3, or 4 to the download function
severity: high
kind: domain_rule
modality: must_not
consequence: Invalid quarter values will cause the download to fail with an exception, preventing any index files from
being retrieved
stage_ids:
- index_download
- id: finance-C-004
when: When downloading indices for the current calendar year
action: skip quarters that have not yet occurred based on current month calculation
severity: medium
kind: domain_rule
modality: must
consequence: Attempting to download future quarters will result in 404 errors and failed index downloads, wasting network
bandwidth and causing incorrect failure tracking
stage_ids:
- index_download
- id: finance-C-005
when: When naming the downloaded SEC EDGAR index TSV files
action: use the naming convention {year}_QTR{quarter}.tsv as required by downstream processing
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect file naming will cause downstream stages to fail when searching for index files, breaking the entire
crawling pipeline
stage_ids:
- index_download
- id: finance-C-007
when: When making HTTP requests to SEC EDGAR
action: implement retry logic with exponential backoff for handling rate limits and transient failures
severity: high
kind: resource_boundary
modality: must
consequence: Without retry logic, rate-limited requests (403 errors) will cause immediate download failures, preventing
successful index retrieval
stage_ids:
- index_download
- id: finance-C-008
when: When retrying failed SEC EDGAR requests
action: include HTTP 403 in the list of status codes that trigger automatic retry
severity: high
kind: resource_boundary
modality: must
consequence: Excluding 403 from retry status codes will cause rate-limit errors to fail immediately instead of being retried,
breaking downloads
stage_ids:
- index_download
- id: finance-C-009
when: When processing already-downloaded SEC EDGAR indices
action: enable skip_present_indices option to avoid redundant network calls and API rate limit consumption
severity: medium
kind: operational_lesson
modality: should
consequence: Re-downloading existing indices wastes bandwidth, consumes SEC EDGAR API rate limits, and extends execution
time unnecessarily
stage_ids:
- index_download
- id: finance-C-010
when: When downloading SEC EDGAR master.zip archives
action: extract and process the master.idx file from within the downloaded zip archive
severity: high
kind: architecture_guardrail
modality: must
consequence: Failing to extract from the zip archive will cause the download to fail when trying to read the raw zip bytes
as text
stage_ids:
- index_download
- id: finance-C-011
when: When processing SEC EDGAR index file paths
action: convert .txt file references to -index.html references for proper HTML index access
severity: high
kind: architecture_guardrail
modality: must
consequence: Using .txt references instead of -index.html will cause downstream document downloads to fail, as SEC EDGAR
HTML indices are the standard access method
stage_ids:
- index_download
- id: finance-C-012
when: When saving processed index files to disk
action: use pipe-delimited format preserving the CIK|Company|Form|Date|Path|HTML_Index structure
severity: high
kind: domain_rule
modality: must
consequence: Incorrect delimiter or missing fields will cause downstream parsing to fail when expecting the standard SEC
EDGAR index format
stage_ids:
- index_download
- id: finance-C-013
when: When making claims about SEC EDGAR data coverage
action: claim that downloaded indices represent complete real-time data without regulatory delays
severity: medium
kind: claim_boundary
modality: must_not
consequence: SEC EDGAR data has inherent delays and filing deadlines; presenting the data as real-time would mislead users
about data freshness
stage_ids:
- index_download
- id: finance-C-014
when: When handling failed SEC EDGAR index downloads
action: track failed indices separately and prompt user for retry decision instead of silently continuing
severity: high
kind: architecture_guardrail
modality: must
consequence: Silently continuing after download failures will result in incomplete index coverage, causing downstream
processing to miss filings from failed periods
stage_ids:
- index_download
- id: finance-C-015
when: When setting SEC EDGAR API request backoff parameters
action: use backoff_factor of 0.2 or higher to avoid overwhelming SEC EDGAR rate limits
severity: high
kind: resource_boundary
modality: must
consequence: Retrying too aggressively (a backoff_factor below 0.2, or no backoff at all) will cause repeated 403 rate-limit
errors, potentially resulting in temporary or permanent IP blocking by SEC EDGAR
stage_ids:
- index_download
- id: finance-C-016
when: When verifying downloaded index file existence
action: check file existence using os.path.exists before deciding to skip or download
severity: high
kind: architecture_guardrail
modality: must
consequence: Skipping the existence check will cause incorrect behavior when skip_present_indices is True but files are
missing
stage_ids:
- index_download
- id: finance-C-018
when: When configuring the index download stage
action: set start_year greater than end_year as this creates an empty download range
severity: high
kind: domain_rule
modality: must_not
consequence: Invalid year range will cause the download loop to execute zero iterations, producing no index files and
silent failure
stage_ids:
- index_download
- id: finance-C-019
when: When using this tool for financial analysis or regulatory compliance
action: claim this tool provides official SEC filings or guaranteed regulatory compliance verification
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting scraped EDGAR data as official or compliant could lead to legal liability and incorrect financial
decisions based on potentially outdated or incomplete data
stage_ids:
- index_download
- id: finance-C-020
when: When considering skipping the retry mechanism
action: skip the exponential backoff retry logic even when encountering transient network errors
severity: high
kind: rationalization_guard
modality: must_not
consequence: Skipping retries will cause single transient failures to become complete download failures, wasting previous
successful requests in the batch
stage_ids:
- index_download
- id: finance-C-022
when: When crawling HTML index pages from SEC EDGAR
action: extract the Period of Report field from the filing page; return None if it cannot be found
severity: high
kind: domain_rule
modality: must
consequence: Filings without a Period of Report cannot be properly categorized by year, causing incorrect temporal ordering
and potential duplication of financial data
stage_ids:
- crawl_and_download
- id: finance-C-023
when: When processing EDGAR master.idx index files
action: decode index file content using latin-1 encoding before processing
severity: high
kind: domain_rule
modality: must
consequence: Using incorrect encoding (e.g., UTF-8) will cause character decoding errors for non-ASCII company names,
resulting in corrupted or truncated metadata entries
stage_ids:
- crawl_and_download
- id: finance-C-024
when: When downloading filing documents from SEC EDGAR
action: prefer HTML (.htm/.html) document links over complete submission text files as primary download target
severity: high
kind: architecture_guardrail
modality: must
consequence: Falling back directly to complete submission text files without attempting HTML parsing produces unstructured
data that downstream parsers cannot process correctly
stage_ids:
- crawl_and_download
- id: finance-C-025
when: When encountering iXBRL document links (ix?doc= prefix) during filing download
action: strip the ix?doc=/ prefix from URLs before downloading to obtain valid document URLs
severity: high
kind: resource_boundary
modality: must
consequence: Downloading with ix?doc=/ prefixed URLs will result in 404 errors or invalid content, causing the filing
document to be missing from the dataset
stage_ids:
- crawl_and_download
- id: finance-C-026
when: When downloading indices for the current year
action: skip quarters that have not yet elapsed to avoid requesting non-existent data
severity: medium
kind: domain_rule
modality: must
consequence: Requesting future quarters will return empty or 404 responses, wasting network bandwidth and potentially
corrupting index state
stage_ids:
- crawl_and_download
- id: finance-C-027
when: When storing filing metadata and downloaded files
action: write CSV metadata to a temporary file first, then atomically move to final location
severity: high
kind: operational_lesson
modality: must
consequence: Writing directly to the metadata CSV risks data loss if the process is interrupted (e.g., Ctrl+C), leaving
an incomplete or corrupted metadata file
stage_ids:
- crawl_and_download
- id: finance-C-029
when: When organizing downloaded raw filing documents
action: store files in subdirectories named after the filing type (e.g., RAW_FILINGS/10-K/)
severity: high
kind: architecture_guardrail
modality: must
consequence: Storing all filing types in a single directory causes file name collisions and makes downstream parsing select
the wrong document for each filing type
stage_ids:
- crawl_and_download
- id: finance-C-031
when: When fetching company metadata (SIC, state, fiscal year) for multiple filings
action: cache company metadata in companies_info.json to avoid redundant HTTP requests per filing
severity: medium
kind: resource_boundary
modality: must
consequence: Fetching company metadata for each filing causes N redundant HTTP requests per company, multiplying API load
and slowing down bulk downloads significantly
stage_ids:
- crawl_and_download
- id: finance-C-033
when: When processing Document Format Files table in EDGAR HTML indexes
action: validate that tr.contents[7] exists and matches target filing types before extracting document links
severity: high
kind: domain_rule
modality: must
consequence: Accessing index 7 without bounds checking causes IndexError exceptions that crash the crawl process, leaving
subsequent filings unprocessed
stage_ids:
- crawl_and_download
- id: finance-C-034
when: When validating quarter values in configuration
action: reject quarter values outside the range [1, 2, 3, 4] with a descriptive error
severity: high
kind: domain_rule
modality: must
consequence: Invalid quarter values cause unpredictable behavior in index filtering, potentially downloading wrong quarter
data or returning empty result sets
stage_ids:
- crawl_and_download
- id: finance-C-035
when: When extracting company metadata from SEC EDGAR company pages
action: handle missing HTML elements gracefully using try-except blocks and fall back to cached values
severity: medium
kind: operational_lesson
modality: must
consequence: Parsing failures for SIC/state/fiscal year without fallback cause NaN values in metadata CSV, breaking downstream
financial analysis that requires SIC codes for industry filtering
stage_ids:
- crawl_and_download
- id: finance-C-036
when: When downloading documents via HTTP requests
action: check for SEC EDGAR rate-limit error messages in response text before proceeding
severity: high
kind: resource_boundary
modality: must
consequence: Ignoring rate-limit responses allows the script to continue requesting blocked endpoints, extending the IP
block duration significantly
stage_ids:
- crawl_and_download
- id: finance-C-038
when: When downloading filings for multiple years and quarters
action: skip the check for whether filings already exist locally before initiating new downloads
severity: medium
kind: operational_lesson
modality: must_not
consequence: Redownloading existing filings wastes bandwidth and API quota, and risks overwriting files that may have
been manually curated or have different content
stage_ids:
- crawl_and_download
- id: finance-C-039
when: When providing filing types to the download module
action: specify at least one valid filing type; reject empty filing type lists
severity: high
kind: domain_rule
modality: must
consequence: An empty filing type list causes the script to exit silently without downloading anything, wasting time on
index downloads that serve no purpose
stage_ids:
- crawl_and_download
- id: finance-C-040
when: When using SEC EDGAR as a data source for financial analysis
action: claim that downloaded filings represent real-time or current data
severity: high
kind: claim_boundary
modality: must_not
consequence: SEC EDGAR has inherent processing delays of 1-5 business days between filing submission and availability;
presenting data as current misleads financial analysts about data freshness
stage_ids:
- crawl_and_download
- id: finance-C-044
when: When matching item patterns in filing text
action: Match both Roman numerals (I, II, III) and Arabic numerals (1, 2, 3) for item numbering
severity: high
kind: domain_rule
modality: must
consequence: Single-format matching causes extraction failures for filings using alternative numbering conventions, resulting
in missing or empty item sections in the output JSON
stage_ids:
- document_parsing
- id: finance-C-045
when: When selecting section boundaries between items
action: Select the longest matching section when multiple candidates exist to prefer actual content over TOC entries
severity: high
kind: architecture_guardrail
modality: must
consequence: Selecting shorter TOC entries over actual section content causes only table of contents text to be extracted,
leaving item sections empty or incomplete
stage_ids:
- document_parsing
- id: finance-C-048
when: When processing CPU-bound text parsing operations
action: Use ProcessPool (process-based parallelism) instead of thread-based parallelism to bypass the Global Interpreter
Lock
severity: medium
kind: architecture_guardrail
modality: must
consequence: Thread-based parallelism suffers from GIL contention on CPU-bound parsing, causing severe performance degradation
and extended processing times
stage_ids:
- document_parsing
- id: finance-C-049
when: When extracting from 10-Q reports
action: Apply heuristics to detect and correct part separation errors in malformed filings
severity: high
kind: operational_lesson
modality: must
consequence: 10-Q filings with formatting bugs (missing PART I markers, PART I containing only ToC) cause incorrect part
attribution, mixing financial data with narrative content
stage_ids:
- document_parsing
- id: finance-C-050
when: When handling embedded content in old filings
action: Remove embedded PDF sections and handle legacy .txt format without <DOCUMENT> tags
severity: high
kind: operational_lesson
modality: must
consequence: Unprocessed PDF tags and missing <DOCUMENT> wrappers cause corrupted output or complete extraction failure
for historical filings predating standardized EDGAR formatting
stage_ids:
- document_parsing
- id: finance-C-053
when: When handling edge cases in item extraction
action: Return empty string for missing items rather than omitting keys from JSON output
severity: high
kind: domain_rule
modality: must
consequence: Missing keys in JSON output cause KeyError exceptions in downstream consumers expecting consistent schema
across all filings
stage_ids:
- document_parsing
- id: finance-C-054
when: When logging extraction status
action: Log warnings when 10-Q part separation encounters known formatting issues
severity: medium
kind: operational_lesson
modality: must
consequence: Silent failures in part extraction produce corrupted data without user awareness, causing downstream analysis
to use incomplete or misattributed content
stage_ids:
- document_parsing
- id: finance-C-055
when: When verifying extracted filing data
action: Validate that at least one item section was successfully extracted before returning JSON
severity: high
kind: domain_rule
modality: must
consequence: Returning JSON with all empty item sections provides no usable data while appearing successful, causing silent
failures in data pipelines
stage_ids:
- document_parsing
- id: finance-C-056
when: When configuring extraction for production use
action: Enable skip_extracted_filings option to support incremental and resumable extraction
severity: low
kind: operational_lesson
modality: should
consequence: Re-extracting already processed filings wastes CPU cycles on redundant parsing operations, increasing processing
time proportionally to already-completed work
stage_ids:
- document_parsing
- id: finance-C-057
when: When processing filings for financial NLP research
action: Remove financial/numerical tables from extracted text to facilitate text-only analysis workflows
severity: medium
kind: domain_rule
modality: must
consequence: Including numerical tables in text extraction corrupts NLP training data with tabular noise, degrading model
performance on narrative financial text analysis
stage_ids:
- document_parsing
- id: finance-C-058
when: When validating input filing metadata
action: Reject unsupported filing types with an exception listing available types
severity: high
kind: architecture_guardrail
modality: must
consequence: Processing unsupported filing types produces no useful output while consuming resources, and leaves the user
with cryptic failures and no indication of why extraction isn't working
stage_ids:
- document_parsing
- id: finance-C-059
when: When cleaning extracted text
action: Normalize Unicode special characters and fix broken section headers caused by OCR or transmission errors
severity: medium
kind: domain_rule
modality: must
consequence: Non-normalized special characters (smart quotes, em-dashes, non-breaking spaces) cause encoding issues and
text matching failures in downstream NLP processing
stage_ids:
- document_parsing
- id: finance-C-060
when: When using extracted filing data for analysis
action: Claim that extracted content is complete or free of parsing errors
severity: medium
kind: claim_boundary
modality: must_not
consequence: SEC filings frequently contain formatting bugs, inconsistent numbering, and encoding issues that cause extraction
to fail for some items; presenting results as complete misleads users about data quality
stage_ids:
- document_parsing
- id: finance-C-061
when: When instantiating the Logger class
action: Pass a name parameter to identify the logging context
severity: high
kind: domain_rule
modality: must
consequence: Without a name parameter, the logger lacks proper identification in log entries, making it difficult to trace
which component generated log messages during debugging
stage_ids:
- logging
- id: finance-C-065
when: When creating log directories
action: Ensure the logs/ directory exists before writing log files
severity: high
kind: domain_rule
modality: must
consequence: Without ensuring the logs directory exists, log file writes will fail causing the logging system to malfunction
and lose critical debugging information
stage_ids:
- logging
- id: finance-C-066
when: When suppressing third-party library logs
action: Set urllib3 and cssutils log levels to CRITICAL to reduce noise
severity: medium
kind: resource_boundary
modality: must
consequence: Without suppressing third-party library noise, log files become polluted with irrelevant HTTP and CSS parsing
messages, obscuring important application-level logging
stage_ids:
- logging
- id: finance-C-067
when: When selecting timestamp timezone
action: Use gmtime() for UTC-based timestamps to ensure cross-timezone consistency
severity: medium
kind: domain_rule
modality: should
consequence: Using localtime instead of gmtime will cause timestamp confusion when debugging logs across different timezones,
making it difficult to correlate events from distributed runs
stage_ids:
- logging
- id: finance-C-068
when: When defining the LOGGING_DIR constant
action: Place the logs directory relative to the package root
severity: high
kind: architecture_guardrail
modality: must
consequence: Using an absolute path or wrong directory location will cause log file writes to fail or place logs in unexpected
locations
stage_ids:
- logging
- id: finance-C-069
when: When configuring the file handler format
action: Include asctime, name, levelname, and message in the log format for debugging
severity: high
kind: domain_rule
modality: must
consequence: Without comprehensive log format fields, post-run forensics becomes difficult as log entries lack context
about timing, source component, and severity
stage_ids:
- logging
- id: finance-C-070
when: When configuring the console handler format
action: Use simplified message-only format for console output
severity: low
kind: resource_boundary
modality: must
consequence: Including verbose format fields in console output clutters the terminal with redundant information during
real-time monitoring
stage_ids:
- logging
- id: finance-C-071
when: When storing log files in version control
action: Commit log files to version control
severity: high
kind: claim_boundary
modality: must_not
consequence: Committing log files to version control causes repository bloat and exposes potentially sensitive information
about system internals
stage_ids:
- logging
- id: finance-C-072
when: When instantiating Logger for a new module
action: Pass a descriptive name based on the module or operation context
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without a descriptive name parameter, log entries become ambiguous about which module or operation generated
them, hampering debugging
stage_ids:
- logging
- id: finance-C-073
when: When using the filemode parameter
action: Use append mode ('a') to preserve log history across runs
severity: medium
kind: domain_rule
modality: must
consequence: Using write mode ('w') would overwrite existing logs, losing valuable historical debugging information from
previous runs
stage_ids:
- logging
- id: finance-C-074
when: When adding a console handler to the root logger
action: Add the console handler to the root logger to capture logs from all modules
severity: high
kind: architecture_guardrail
modality: must
consequence: Adding console handler to a non-root logger will cause duplicate output or miss logs from other modules that
don't explicitly use the same logger name
stage_ids:
- logging
- id: finance-C-075
when: When documenting the logging infrastructure
action: Claim the logging system provides real-time streaming or live monitoring
severity: medium
kind: claim_boundary
modality: must_not
consequence: The logging system uses polling-based StreamHandler for console output and does not provide true real-time
streaming capabilities, so such claims would be misleading
stage_ids:
- logging
- id: finance-C-077
when: When downloading EDGAR indices for future quarters
action: Request indices for quarters beyond the current calendar quarter
severity: high
kind: domain_rule
modality: must_not
consequence: SEC EDGAR returns 404 errors for future quarter indices, causing download_indices() to fail repeatedly and
waste API quota
- id: finance-C-080
when: When SEC EDGAR blocks requests due to rate limiting
action: Wait and retry with exponential backoff (up to 5 retries with 0.2 backoff factor) before failing
severity: high
kind: resource_boundary
modality: must
consequence: Without retry logic, rate-limited requests fail immediately, causing incomplete index downloads and missing
filing data
- id: finance-C-081
when: When appending new filings to FILINGS_METADATA.csv
action: Write to a temporary file first (.tmp), then atomically move it to the final location using shutil.move
severity: high
kind: operational_lesson
modality: must
consequence: Direct writes can corrupt the CSV if interrupted (e.g., Ctrl+C), leaving metadata in an inconsistent state
and causing duplicate downloads on retry
- id: finance-C-082
when: When missing company metadata is encountered during crawl
action: Fill missing values (SIC, State of Inc, State location, Fiscal Year End) from companies_info.json cache keyed
by CIK
severity: medium
kind: architecture_guardrail
modality: must
consequence: Incomplete metadata causes downstream document_parsing to produce JSON with empty or null fields, reducing
data utility for NLP research
- id: finance-C-084
when: When processing 8-K filings dated before August 23, 2004
action: Use obsolete 8-K item naming convention (items 1-12) instead of modern dot-notation (items 1.01-9.01)
severity: high
kind: operational_lesson
modality: must
consequence: Using the wrong item pattern causes zero items to be extracted from 8-K filings dated before August 23, 2004, resulting in incomplete
NLP datasets
- id: finance-C-085
when: When extracting filing content from raw HTML/text documents
action: Detect HTML structure via <td> and <tr> tags to determine whether to use BeautifulSoup parsing or plain text regex
extraction
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect parsing mode causes garbled text extraction, breaking NLP tokenization and analysis downstream
- id: finance-C-086
when: When replacing NaN values in DataFrames read from CSV
action: Convert np.nan to Python None for consistent null value handling across all downstream JSON serialization
severity: high
kind: domain_rule
modality: must
consequence: np.nan values serialize as the bare token NaN, which is not valid JSON, breaking schema validation and causing downstream parsing
errors
- id: finance-C-088
when: When SEC EDGAR's bulk index files use .zip format
action: Extract master.zip archive and parse master.idx file starting from line 11 (skipping EDGAR header lines)
severity: medium
kind: resource_boundary
modality: must
consequence: Parsing from line 1 includes EDGAR header metadata, corrupting the index DataFrame with invalid filing records
- id: finance-C-089
when: When handling iXBRL documents in SEC filings
action: Strip ix?doc=/ prefix from URLs before downloading to get valid .htm file links
severity: high
kind: operational_lesson
modality: must
consequence: Invalid iXBRL URLs cause HTTP 404 errors, leaving raw filings missing and metadata pointing to non-existent
files
- id: finance-C-090
when: When writing extracted filing JSON output
action: Store JSON with UTF-8 encoding (ensure_ascii=False) to preserve special characters in financial text
severity: medium
kind: architecture_guardrail
modality: must
consequence: ASCII encoding mangles non-ASCII characters (e.g., trademark symbols, em-dashes, currency symbols), corrupting
financial text for NLP training
- id: finance-C-091
when: When presenting EDGAR-CRAWLER as a data source for financial analysis
action: Claim the extracted JSON structure is semantically equivalent to the original SEC filing documents
severity: medium
kind: claim_boundary
modality: must_not
consequence: HTML parsing can miss or incorrectly extract content; tables are optionally removed; the tool is designed
for NLP research, not regulatory compliance
- id: finance-C-092
when: When using crawled SEC filing data for trading or investment decisions
action: Treat EDGAR-CRAWLER output as real-time or authoritative financial data suitable for live trading
severity: high
kind: claim_boundary
modality: must_not
consequence: EDGAR has inherent reporting delays (8-K within 4 business days); crawled data reflects historical filings,
not current market conditions
- id: finance-C-093
when: When encountering extraction failures for individual items
action: Skip investigation and assume the source filing lacks that section content
severity: medium
kind: rationalization_guard
modality: must_not
consequence: Many 10-Q filings have formatting bugs (missing PART headers, ToC interference); skipping investigation leads
to systematically incomplete NLP datasets
- id: finance-C-094
when: When implementing file paths across download and extract stages
action: Use {DATASET_DIR} as the root directory for all file paths, as defined in __init__.py:2
severity: high
kind: architecture_guardrail
modality: must
consequence: File operations write to unintended directories, causing data loss or retrieval failures because files are
not in the expected canonical location
- id: finance-C-096
when: When reading filings metadata CSV from any stage
action: Use dtype=str in pd.read_csv to prevent pandas type coercion on numeric fields like CIK, and replace np.nan with
None
severity: high
kind: domain_rule
modality: must
consequence: CIK values lose leading zeros (e.g., 0000320193 becomes 320193), causing mismatches between downloaded file
names and metadata references
- id: finance-C-097
when: When processing 8-K filings with dates around the historical transition point
action: Use cutoff date '2004-08-23' consistently between extract_items.py and test_extract_items.py to determine whether
to use item_list_8k or item_list_8k_obsolete
severity: high
kind: architecture_guardrail
modality: must
consequence: Pre-2004-08-23 8-K filings use wrong item pattern matching (modern item names instead of obsolete), causing
all items to extract as empty strings
- id: finance-C-098
when: When extracting items from 10-Q filings
action: Use roman_numeral_map keys (1-20) that match part numbers in item_list_10q to enable dual-format matching (Roman
and Arabic numerals) for PART detection
severity: high
kind: domain_rule
modality: must
consequence: PART I and PART 1 sections fail to match correctly, causing entire 10-Q parts to be missed during extraction
- id: finance-C-099
when: When presenting or reporting this system's extracted financial data to users
action: Claim that extracted filing data equals real-time trading signals, calculated financial metrics, or live market
data
severity: high
kind: claim_boundary
modality: must_not
consequence: Users build automated trading systems based on stale EDGAR filings (8-K/10-Q/10-K are delayed disclosures),
leading to trades on outdated information and potential regulatory violations
- id: finance-C-100
when: When building financial analysis systems using this toolkit's output
action: Claim that parsed 10-K/10-Q/8-K item text provides calculated financial metrics such as P/E ratios, EPS, or ROI
severity: high
kind: claim_boundary
modality: must_not
consequence: Users make investment decisions based on uncalculated text strings, leading to incorrect financial analysis
and potential financial losses
- id: finance-C-101
when: When deploying this toolkit in enterprise document processing pipelines
action: Claim that extracted JSON output includes schema validation, data quality guarantees, or completeness verification
for production-grade compliance systems
severity: high
kind: claim_boundary
modality: must_not
consequence: Compliance systems accept unvalidated JSON with empty item fields as complete, leading to regulatory reporting
gaps and audit failures
- id: finance-C-102
when: When processing non-SEC financial documents
action: Claim support for extracting structured data from non-SEC financial data sources such as company press releases,
earnings call transcripts, or international regulatory filings
severity: high
kind: claim_boundary
modality: must_not
consequence: Users attempt to parse non-SEC documents with SEC-specific item pattern matching, producing malformed JSON
with missing or incorrect field mappings
- id: finance-C-103
when: When downloading filings from SEC EDGAR
action: Declare a valid User-Agent string in HTTP requests to SEC EDGAR to comply with their access policy and avoid IP
blocking
severity: high
kind: resource_boundary
modality: must
consequence: SEC EDGAR blocks requests without proper User-Agent identification, causing downloads to fail with traffic
management messages
stage_ids:
- index_download
- id: finance-C-104
when: When naming extracted JSON keys for 10-Q items
action: 'Use ''__'' delimiter to encode part-item relationship in JSON keys: {part}_item_{number} format (e.g., part_1_item_1,
part_2_item_1A)'
severity: high
kind: architecture_guardrail
modality: must
consequence: Downstream NLP systems expecting standard part_item_N format receive mismatched key names, causing schema
validation failures and data ingestion errors
- id: finance-C-105
when: When naming extracted JSON keys for 10-K and 8-K items
action: 'Use ''item_'' prefix for all item keys: item_{number} format (e.g., item_1, item_1A, item_2.01, item_9A)'
severity: high
kind: architecture_guardrail
modality: must
consequence: Downstream systems expecting standard item_N key format receive malformed key names, breaking data pipeline
integration
- id: finance-C-106
when: When implementing or modifying SEC filing item extraction regex patterns in extract_items.py
action: Maintain regex patterns that capture both Roman numeral (I, II, III) and Arabic numeral (1, 2, 3) formats for
item numbering to ensure comprehensive extraction from historical SEC filings with heterogeneous numbering conventions
severity: high
kind: domain_rule
modality: must
consequence: Single-format matching causes extraction failures for historical filings with non-standard numbering conventions,
leading to incomplete data extraction and missing critical disclosure items from the SEC corpus
derived_from_bd_id: BD-010
- id: finance-C-107
when: When implementing SEC filing document parsing logic in download_filings.py
action: Parse HTML document format tables for .htm/.html links first, then fall back to complete submission TXT files
for older filings to ensure full coverage from 1994 to present
severity: high
kind: domain_rule
modality: must
consequence: HTML-only parsing misses older TXT-based SEC submissions, creating gaps in filing history and incomplete
coverage of historical regulatory filings prior to SEC standardization
derived_from_bd_id: BD-004
- id: finance-C-108
when: When configuring logging levels in logger.py
action: Set console output to INFO+ level for clean operational indicators without DEBUG noise, and file logging to DEBUG+
level to capture complete diagnostic information for post-run forensics
severity: high
kind: architecture_guardrail
modality: must
consequence: Single log level either clutters console output with DEBUG noise during batch operations or loses critical
diagnostic information needed for failure investigation and performance debugging
derived_from_bd_id: BD-016
- id: finance-C-109
when: When implementing CIK lookup and company metadata retrieval logic
action: Maintain a persistent companies_info.json cache file for company metadata to avoid redundant SEC EDGAR API calls
during bulk operations, reducing API overhead by approximately 80%
severity: high
kind: architecture_guardrail
modality: must
consequence: Per-filing CIK lookups trigger excessive API requests, increasing rate-limit risk and causing significant
throughput degradation in bulk download scenarios with repeated CIK access patterns
derived_from_bd_id: BD-002
- id: finance-C-110
when: When configuring log file naming in the logging setup
action: Use timestamped log filenames to ensure unique log files per execution run, preventing overwrites and enabling
post-hoc debugging of specific execution windows
severity: medium
kind: operational_lesson
modality: must
consequence: Static log filenames cause overwrites between execution runs, making it impossible to diagnose issues in
long-running bulk operations and losing critical forensic evidence
derived_from_bd_id: BD-015
- id: finance-C-112
when: When implementing or refactoring directory initialization logic
action: Change the eager directory creation pattern in __init__.py to lazy/on-demand creation — directories must be created
at import time, not on first file write
severity: high
kind: domain_rule
modality: must_not
consequence: Lazy directory creation introduces FileNotFoundError during file operations when imports occur without triggering
creation, breaking SEC filing downloads in production environments
derived_from_bd_id: BD-019
- id: finance-C-113
when: When configuring dataset directory paths for SEC filing extraction
action: Verify that DATASET_DIR path matches deployment requirements — if a custom location is needed, modify the hardcoded
'datasets' subfolder path in __init__.py before running extraction workflows
severity: medium
kind: operational_lesson
modality: should
consequence: Hardcoded DATASET_DIR causes extraction failures when the 'datasets' subfolder location doesn't match user
expectations or deployment environment paths
derived_from_bd_id: BD-017
- id: finance-C-115
when: When implementing HTTP request retry logic for SEC EDGAR downloads
action: Use exponential backoff with 5 retries for HTTP requests — SEC EDGAR enforces strict rate limits and returns 403
errors when exceeded
severity: high
kind: domain_rule
modality: must
consequence: Without sufficient retry logic, bulk downloads fail prematurely on rate-limited requests, requiring manual
restart and failing to complete large SEC filing batches
derived_from_bd_id: BD-003
- id: finance-C-116
when: When modifying 10-Q extraction logic that uses state modification and restoration
action: Preserve the state modification/restoration pattern for bug recovery — if refactoring, use a context manager or
equivalent atomic pattern to ensure self.items_list is always restored after temporary assignment
severity: high
kind: domain_rule
modality: must
consequence: Removing the state restoration pattern causes self.items_list to retain incorrect intermediate state after
extraction failures, corrupting subsequent filing data in the batch
derived_from_bd_id: BD-078
- id: finance-C-117
when: When implementing company metadata caching for SEC EDGAR downloads
action: Cache company metadata (SIC codes, state of incorporation, fiscal year) in companies_info.json — caching eliminates
redundant HTTP requests and prevents rate-limit pressure during bulk operations
severity: high
kind: domain_rule
modality: must
consequence: Without metadata caching, each filing triggers redundant HTTP requests for constant company information,
causing approximately 50x increase in API calls and potential rate-limit failures
derived_from_bd_id: BD-005
- id: finance-C-122
when: When extracting items from SEC 10-Q filings in extract_items.py
action: Split document text into Part I (Financial Information) and Part II (Other Information) before item extraction
to prevent identically numbered items in the two parts from contaminating each other
severity: high
kind: operational_lesson
modality: must
consequence: Without part-level separation, Item 1 in Part I (Financial Statements) mixes with Item 1 in Part II (Legal Proceedings),
corrupting downstream analysis by mixing financial statement content with legal disclosures
derived_from_bd_id: BD-009
- id: finance-C-123
when: When implementing table filtering logic in extract_items.py
action: Remove only tables with background-color or background-image attributes; preserve all other tables regardless
of their visual appearance, and do not assume all tables are data tables
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Removing all tables destroys item listings and narrative content that appear in unstyled HTML tables, losing
critical information from SEC filing management discussions and risk disclosures
derived_from_bd_id: BD-008
- id: finance-C-124
when: When implementing section boundary detection in extract_items.py
action: Select the longest matching section when multiple candidates share identical headers — this disambiguates Table
of Contents entries (shorter) from actual item content (longer)
severity: high
kind: architecture_guardrail
modality: must
consequence: Without longest-match selection, Table of Contents entries match first and cause premature section termination,
truncating actual item content and losing 2-5% of critical SEC filing disclosure text per document
derived_from_bd_id: BD-011
- id: finance-C-125
when: When implementing any randomized behavior in the SEC EDGAR download pipeline
action: Assume the framework handles random seed configuration for reproducible downloads — the framework does not implement
random seed management, leading to non-deterministic download sequences across runs
severity: high
kind: claim_boundary
modality: must_not
consequence: Without random seed management, download sequences vary between runs causing inconsistent file ordering,
potential duplicate downloads, and non-reproducible audit trails that fail regulatory compliance requirements
derived_from_bd_id: BD-GAP-002
- id: finance-C-126
when: When implementing reproducibility requirements in the SEC EDGAR download pipeline
action: Implement random seed configuration by setting numpy.random.seed() and random.seed() before any randomized operations
in index_download, and document the seed value used for each download session in logs
severity: high
kind: domain_rule
modality: must
consequence: Without explicit random seed handling, retry logic and shuffling operations produce different results each
run, preventing audit reproducibility and making it impossible to reproduce exact download sequences for regulatory
verification
derived_from_bd_id: BD-GAP-002
- id: finance-C-127
when: When using HtmlStripper for HTML parsing in SEC document extraction
action: Verify that convert_charrefs=True and strict=False are documented in system configuration; if implementing custom
HTML parsing, verify equivalent entity conversion and malformed HTML tolerance behavior to maintain consistency with
extraction pipeline
severity: medium
kind: operational_lesson
modality: should
consequence: HtmlStripper with convert_charrefs=True automatically converts HTML character references like &amp; to Unicode
characters, potentially creating inconsistencies if downstream processing expects raw entities; strict=False silently
tolerates malformed HTML which could mask parser errors
derived_from_bd_id: BD-074
- id: finance-C-128
when: When implementing or configuring parallel filing processing in SEC document extraction
action: Use ProcessPool for parallel filing processing due to Python GIL limitations on CPU-bound parallelism; ThreadPool
is insufficient for text parsing workloads; ensure processes >= 2 for actual parallelism benefit
severity: high
kind: architecture_guardrail
modality: must
consequence: Using ThreadPool for CPU-bound HTML parsing and regex matching provides no parallelism benefit due to Python
GIL; single-process execution becomes a bottleneck when processing large batches of SEC filings, causing linear scaling
degradation
derived_from_bd_id: BD-012
- id: finance-C-129
when: When implementing Roman numeral conversion for SEC 10-Q document parsing
action: Verify roman_numeral_map covers values 1-20 for bidirectional conversion between numeric and Roman numeral Part/Item
identifiers in 10-Q filings; values exceeding 20 will return '?' placeholder and cause section identification to fail
severity: high
kind: domain_rule
modality: must
consequence: SEC 10-Q filings use Roman numerals for Parts and Items (I, II, III, IV, V, VI, VII, VIII, IX, X, etc.).
When a 10-Q contains a Part or Item numbered above 20 (XX), roman_numeral_map returns the '?' placeholder, causing downstream section matching
to fail silently and miss critical financial disclosures
derived_from_bd_id: BD-024
- id: finance-C-130
when: When extracting SEC filing content from HTML documents
action: Require presence of both <td> AND <tr> HTML elements to classify a document as HTML; documents containing only
one element type should not be classified as full HTML documents
severity: high
kind: domain_rule
modality: must
consequence: Some SEC documents contain embedded HTML snippets that don't represent full document structure. Misclassifying
a document as HTML when it only has partial table elements causes incorrect parsing logic to be applied, resulting in
garbled or incomplete extraction of filing content
derived_from_bd_id: BD-025
- id: finance-C-131
when: When implementing table extraction from SEC filing documents
action: Preserve tables containing item index patterns (Item 1, Item 1A, Item 2, etc.) regardless of background color
styling; do not filter or remove tables based solely on visual CSS attributes
severity: high
kind: domain_rule
modality: must
consequence: Item index tables are critical for document structure and navigation. Removing tables with colored backgrounds
during document cleaning causes critical SEC filing section headers to be lost, breaking downstream content extraction
and document structure analysis
derived_from_bd_id: BD-032
- id: finance-C-132
when: When processing span elements in extracted SEC document content
action: Replace horizontal span margins (CSS margin-left/margin-right) with single space character, and vertical span
margins (CSS margin-top/margin-bottom) with single newline character; this rule applies to margin CSS properties only,
not padding or other spacing
severity: medium
kind: architecture_guardrail
modality: must
consequence: Span margin replacement preserves intended word separation and line breaks in SEC documents. Without proper
spacing rules, merged words lose boundaries horizontally and paragraph structure is lost vertically, causing content
to become unreadable or misinterpreted
derived_from_bd_id: BD-033
- id: finance-C-133
when: When constructing absolute URLs from SEC EDGAR index relative paths for filing downloads
action: Prepend 'https://www.sec.gov/Archives/' to relative file paths to construct valid absolute URLs; validate or handle
broken paths before URL construction to prevent 404 errors on downloads
severity: medium
kind: operational_lesson
modality: should
consequence: SEC EDGAR indices contain relative file paths that require base URL prepending. Broken or malformed relative
paths result in 404 errors causing complete download failures with no indication of which filings were missed in batch
processing
derived_from_bd_id: BD-041
- id: finance-C-134
when: When implementing or refactoring logging initialization in SEC document extraction modules
action: Instantiate logger after config.json is loaded and its logging configuration is available; do not create module-level
LOGGER objects before configuration is loaded as this prevents custom logging settings from being applied
severity: medium
kind: operational_lesson
modality: should_not
consequence: A logger instantiated at module level before config.json loads creates a temporal ordering dependency. The logger
operates with default configuration throughout the module load phase, logging at incorrect levels or to wrong handlers
until configuration is eventually applied, causing debugging visibility gaps
derived_from_bd_id: BD-081
- id: finance-C-135
when: When configuring or adjusting 10-Q extraction retry loop parameters
action: Verify length_difference threshold of 5000 chars matches actual document size expectations before using; setting
threshold too high risks accepting incomplete extractions, while too low may cause unnecessary retries or valid partial
extractions to be rejected
severity: medium
kind: operational_lesson
modality: should
consequence: The 5000-character length_difference threshold determines when the 10-Q extraction retry mechanism continues
or terminates. Wrong threshold causes either incomplete content acceptance (high threshold) or valid extraction rejections
(low threshold), both leading to unreliable backtest data quality
derived_from_bd_id: BD-082
- id: finance-C-136
when: When processing deeply nested SEC EDGAR HTML documents with HtmlStripper and BeautifulSoup
action: Investigate how recursion limit (30000), HTMLParser settings (convert_charrefs=True, strict=False), and BeautifulSoup
tree traversal interact; implement graceful fallback mechanism for documents that may trigger RecursionError before
hitting configured recursion limit
severity: high
kind: operational_lesson
modality: must
consequence: Deeply nested malformed HTML combined with lenient HTMLParser settings and recursive BeautifulSoup traversal
creates a risk cascade where RecursionError occurs before hitting the configured 30000 limit, causing complete extraction
failure instead of graceful degradation on pathological documents
derived_from_bd_id: BD-083
- id: finance-C-137
when: When processing SEC EDGAR filings with deeply nested HTML tables and divs
action: Set sys.setrecursionlimit to 30000 at module initialization before BeautifulSoup tree traversal; the recursion
limit provides headroom for pathological nesting depth while bounding maximum stack depth to prevent resource exhaustion
on extremely malformed documents
severity: high
kind: architecture_guardrail
modality: must
consequence: SEC EDGAR filings can contain deeply nested tables, divs, and spans that exceed Python's default recursion
limit of 1000. Without an elevated recursion limit, BeautifulSoup tree traversal raises RecursionError on malformed documents,
causing complete extraction failure
derived_from_bd_id: BD-014
- id: finance-C-138
when: When implementing table extraction logic for SEC financial documents
action: Apply background color filtering threshold before table removal decisions — do not remove tables that have colored
backgrounds (RGB-based threshold) as these typically represent financial data tables with visual hierarchy
severity: high
kind: domain_rule
modality: must
consequence: Without color-based filtering, financial tables rendered with colored backgrounds (for visual hierarchy in
SEC filings) will be incorrectly discarded, causing loss of critical numerical data like balance sheets and income statements
derived_from_bd_id: BD-068
- id: finance-C-139
when: When implementing Table of Contents matching logic in SEC document extraction
action: Maintain the ignore-matches counter threshold for ToC filtering to prevent infinite loops on malformed documents
— the counter MUST stop ToC-based matching and fall back to content extraction after reaching the threshold
severity: high
kind: domain_rule
modality: must
consequence: Without the ignore counter, SEC documents with malformed ToC entries will cause unbounded iteration,
leading to extraction process hangs or denial-of-service on crafted inputs
derived_from_bd_id: BD-070
- id: finance-C-140
when: When importing the extract_items module in SEC filing extraction
action: Set sys.setrecursionlimit(30000) as a global process-wide change at module load time — instead, use a context
manager, specific function scope, or subprocess isolation to localize recursion limit changes
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Global recursion limit changes at module import permanently alter process behavior, masking stack overflow
bugs in unrelated code running in the same interpreter and causing unexpected truncation of legitimate deep recursion
derived_from_bd_id: BD-076
- id: finance-C-142
when: When implementing page number and header removal in SEC document text cleanup
action: Use comprehensive regex patterns covering all page number format variations (standalone numbers, with 'Page',
with dashes, Roman numerals) and validate removal effectiveness with post-processing checks
severity: medium
kind: operational_lesson
modality: must
consequence: Incomplete regex patterns for page number removal will leave page artifacts in extracted text, degrading
downstream analysis quality and potentially confusing content identification algorithms
derived_from_bd_id: BD-060
- id: finance-C-143
when: When implementing section header matching in SEC filing extraction
action: Apply case-sensitive matching before case-insensitive as priority order — when case-sensitive match exists anywhere
in document, use it regardless of position, not just as tiebreaker
severity: high
kind: domain_rule
modality: must
consequence: Without correct case-sensitive-first priority, SEC filings with non-canonical section header casing will
match case-insensitive variants first, potentially extracting wrong sections and corrupting document structure analysis
derived_from_bd_id: BD-063
- id: finance-C-144
when: When processing SEC filings that contain embedded PDF content within HTML wrappers
action: Strip embedded PDF sections (<PDF>...</PDF>) from HTML documents during extraction — actual PDF content is lost;
do not treat these as extractable text items
severity: high
kind: domain_rule
modality: must
consequence: Embedded PDF content within HTML wrappers is not parseable as text; without stripping, raw PDF bytes contaminate
text extraction and corrupt downstream analysis with unreadable content
derived_from_bd_id: BD-055
- id: finance-C-145
when: When reading SEC filing files with inconsistent encoding from various sources
action: Use errors='backslashreplace' for file reading to handle encoding issues gracefully — do not use UTF-8 strict
mode which will crash on malformed encodings
severity: high
kind: domain_rule
modality: must
consequence: Without backslashreplace encoding handling, SEC filings with invalid UTF-8 sequences will cause file read
exceptions, preventing extraction from completing on documents that could yield valid content
derived_from_bd_id: BD-056
- id: finance-C-146
when: When implementing text cleanup that removes navigation elements from SEC filings
action: Remove 'Table of Contents', 'Index to Financial Statements', 'Back to Contents', and 'Quicklinks' navigation headers
— but validate these as navigation elements using positional context (appear at document start/mid-section) before removal,
not just phrase matching alone
severity: medium
kind: operational_lesson
modality: must
consequence: Phrase-only matching removes section headers that legitimately contain these phrases in substantive content,
causing silent loss of actual document sections disguised as navigation elements
derived_from_bd_id: BD-061
- id: finance-C-147
when: When implementing Unicode normalization for special character handling in SEC filings
action: Normalize Unicode representations (em-dashes, smart quotes, accented characters) to standard ASCII equivalents
for consistent text matching — but document this normalization for downstream consumers and verify semantic preservation
when normalization is applied
severity: medium
kind: operational_lesson
modality: should
consequence: Without documented normalization, downstream systems may not expect ASCII-converted characters, causing subtle
semantic changes in financial terminology and company names that affect matching accuracy
derived_from_bd_id: BD-069
- id: finance-C-148
when: When implementing file download and persistence logic for SEC filings
action: Write CSV metadata to a temporary file first, then move to the final location using atomic rename — do not write
directly to the target path
severity: high
kind: domain_rule
modality: must
consequence: Direct writes risk leaving partial data if the process is interrupted, corrupting the index file and causing
downstream data retrieval failures
derived_from_bd_id: BD-048
- id: finance-C-149
when: When processing SEC filing downloads with incremental update logic
action: Verify that existing file detection relies on exact naming format matching as implemented in the codebase — do
not assume alternative detection methods (checksums, manifest files) are used unless explicitly configured
severity: medium
kind: operational_lesson
modality: should
consequence: If the naming format convention changes or files are renamed externally, the detection logic may incorrectly
skip existing files or re-download unnecessarily, causing data duplication or gaps
derived_from_bd_id: BD-047
- id: finance-C-150
when: When implementing or modifying the SEC filing download module (BD-077 CSV format contract)
action: Implement explicit schema validation at the CSV format contract boundary between download and extract modules
to detect any format changes before they cause silent failures or corrupted metadata
severity: high
kind: operational_lesson
modality: must
consequence: Without schema validation, a modified CSV format causes the extract module to silently fail or produce corrupted
company metadata, leading to incorrect financial data in backtesting results
derived_from_bd_id: BD-084
- id: finance-C-151
when: When implementing CIK lookup or company info caching with incremental download (BD-002) and skip-existing (BD-047)
action: Implement unified caching with TTL-based invalidation to verify company data reflects recent changes (e.g., new
SIC codes, post-merger name changes), and validate that cached CIK lookups are consistent with current company_info
records
severity: medium
kind: operational_lesson
modality: should
consequence: Duplicate cache mechanisms (companies_info.json vs company_info.json) with stale data cause CIK lookups to
reference outdated company information, resulting in extraction of wrong company filings or missing updated company
data in backtest
derived_from_bd_id: BD-089
- id: finance-C-152
when: When implementing table extraction logic from SEC documents
action: 'Check for non-blank background colors (any color that is not white, transparent, none, or #fff) and remove tables
with such backgrounds — colored backgrounds often indicate navigation elements, disclaimers, or other non-content tables'
severity: high
kind: domain_rule
modality: must
consequence: Preserving tables with colored backgrounds causes extraction of non-content elements like navigation menus
and disclaimers, contaminating the extracted data and reducing analysis quality
derived_from_bd_id: BD-031
- id: finance-C-153
when: When implementing SEC item section pattern matching logic
action: Insert optional whitespace (zero or more spaces) before trailing letters (A, B, C) in item patterns to match variations
like Item 1A and Item 1 B in SEC documents
severity: high
kind: domain_rule
modality: must
consequence: Strict whitespace requirements in item patterns cause missed matches for sub-sections with extra whitespace,
resulting in incomplete document extraction and missing risk factors
derived_from_bd_id: BD-034
- id: finance-C-154
when: When implementing SIGNATURE section extraction from SEC filings
action: Extract the SIGNATURE block from the last occurrence in the document, not the first — table of contents entries
may appear before the actual signature block
severity: high
kind: domain_rule
modality: must
consequence: Extracting the first SIGNATURE occurrence captures TOC entries instead of the genuine signature block, resulting
in incomplete or incorrect signer information extraction
derived_from_bd_id: BD-037
- id: finance-C-155
when: When implementing 10-Q document parsing and item extraction
action: 'Parse 10-Q documents in two parts: Part I (Items 1-4, financial information) and Part II (Items 1-6, non-financial
information) — Items 5-6 only appear in Part II'
severity: high
kind: domain_rule
modality: must
consequence: Single-section 10-Q extraction misses Items 5-6 that appear only in Part II, resulting in incomplete regulatory
filings and potential compliance failures
derived_from_bd_id: BD-038
- id: finance-C-156
when: When implementing text extraction cleanup from SEC HTML documents
action: Normalize whitespace by removing excessive spaces while preserving paragraph and list structure — excessive HTML
whitespace creates noise in extracted text
severity: high
kind: domain_rule
modality: must
consequence: Without whitespace normalization, excessive spacing in HTML causes corrupted extracted text with irregular
formatting, making downstream analysis unreliable
derived_from_bd_id: BD-066
- id: finance-C-157
when: When implementing the process_filing function or refactoring filing processing logic
action: Call determine_items_to_extract BEFORE calling extract_items so that item selection logic executes before extraction
begins
severity: high
kind: domain_rule
modality: must
consequence: Violating the function call order causes KeyError exceptions when extract_items attempts to access items
that have not been pre-identified by determine_items_to_extract, resulting in runtime failures
derived_from_bd_id: BD-071
- id: finance-C-158
when: When implementing 10-Q parsing logic or refactoring filing extraction components
action: 'Preserve the cascading parsing sequence: (1) BD-009 split 10-Q into Part I and Part II first, (2) BD-038 applies
item mapping within correct part context (Part I=Items 1-4, Part II=Items 1-6), (3) BD-075 uses ''__'' as part-item
delimiter for encoding, (4) BD-079 uses Roman numeral map for part numbering — do not modify any single point without
validating the full cascade'
severity: high
kind: domain_rule
modality: must
consequence: 'Breaking the cascade at any point causes cascading failures: changing BD-075 delimiter breaks split logic,
incomplete BD-079 map fails part identification, BD-009 separation failure causes BD-038 to extract items in wrong context,
all resulting in incorrect filing output'
derived_from_bd_id: BD-087
- id: finance-C-159
when: When using the framework's default item extraction behavior without specifying items_to_extract
action: Verify that extracting all available items aligns with your use case; if targeting specific items for analysis,
explicitly specify the items_to_extract parameter to avoid processing large filings with unnecessary items and potential
performance degradation
severity: medium
kind: operational_lesson
modality: should
consequence: Default behavior extracts all available items, which may cause significant processing time on large filings
and introduce noise in analysis when only specific items are needed for targeted research
derived_from_bd_id: BD-052
- id: finance-C-160
when: When extracting SEC filing content using default configuration without explicit include_signature setting
action: Verify that SIGNATURE sections containing personal signer information are not needed for your analysis; for compliance
or audit use cases, set include_signature=true to capture signer details
severity: medium
kind: operational_lesson
modality: should
consequence: Default exclusion of SIGNATURE sections silently removes relevant signer information that may be required
for compliance verification, audit trails, or forensic analysis use cases
derived_from_bd_id: BD-053
- id: finance-C-161
when: When extracting SEC filing content using default configuration without explicit remove_tables setting
action: Verify that tabular data including numerical content, financial tables, and structured information is not needed
for your analysis; for quantitative research, set remove_tables=false to preserve table content
severity: medium
kind: operational_lesson
modality: should
consequence: Default table removal silently discards legitimate tabular content including financial data, numerical schedules,
and structured information that may be critical for quantitative analysis and backtesting strategies
derived_from_bd_id: BD-054
- id: finance-C-162
when: When implementing or refactoring directory initialization and file path handling logic in deployment scenarios
action: Verify directory paths remain configurable or writable in restricted environments (shared servers, containers,
cloud functions); must not assume the package directory is always writable
severity: high
kind: architecture_guardrail
modality: must
consequence: Hardcoded package-relative directory paths cause immediate import failures in restricted deployment environments
where the package directory lacks write permissions, preventing any trading functionality from loading
derived_from_bd_id: BD-088
- id: finance-C-163
when: When implementing or refactoring HTML detection and parsing logic for document extraction
action: Preserve the multi-criteria HTML detection logic requiring BOTH <td> AND <tr> elements before selecting HtmlStripper
parsing strategy; must not simplify detection to require only <td> or only <tr>
severity: high
kind: domain_rule
modality: must
consequence: Simplifying HTML detection to require only partial table elements causes wrong parsing strategy selection,
leading to extraction failure or corrupted output on edge-case documents that contain partial table structures
derived_from_bd_id: BD-090
- id: finance-C-164
when: When parsing 10-Q SEC documents using section separation logic
action: 'Implement length-based validation to detect parsing anomalies: flag documents where PART I appears only in ToC
without substantive section body, or where PART II is disproportionately longer indicating section boundary detection
failure'
severity: medium
kind: operational_lesson
modality: should
consequence: The section separation heuristic silently fails on 10-Q filings where PART I is listed in ToC but lacks a
separate section, causing parsing to skip or misalign critical financial disclosure content
derived_from_bd_id: BD-057
- id: finance-C-165
when: When implementing or refactoring item number matching patterns in SEC document extraction
action: Preserve the explicit boundary character set [.*~-:\s\(] after item numbers in regex patterns; must not remove
these separator characters or replace with simpler word boundary assertions only
severity: high
kind: domain_rule
modality: must
consequence: Simplifying item number patterns to use only word boundaries causes items followed by unexpected separator
characters to fail matching, silently skipping important SEC disclosure items in extracted content
derived_from_bd_id: BD-064
- id: finance-C-166
when: When implementing or refactoring whitespace handling in SEC document text processing patterns
action: Preserve the explicit whitespace definition [^\S\r\n] (matching whitespace but explicitly excluding newlines and
carriage returns); must not replace with standard \s or broader character classes that include line breaks
severity: high
kind: domain_rule
modality: must
consequence: Replacing the custom whitespace pattern with standard \s causes newlines to be treated as ordinary whitespace,
destroying line-oriented document structure and breaking pattern matching that depends on line boundaries for SEC document
parsing
derived_from_bd_id: BD-065
- id: finance-C-167
when: When implementing 10-Q extraction workflow
action: Call get_10q_parts to populate the parts dictionary with section boundaries before entering the item extraction
loop — verify parts['metadata'], parts['financial_statements'], etc. are available for regex pattern matching
severity: high
kind: domain_rule
modality: must
consequence: Skipping get_10q_parts causes item regex patterns to operate on unparsed raw content, producing malformed
or missing item data that corrupts downstream financial analysis and reporting
derived_from_bd_id: BD-073
- id: finance-C-169
when: When processing HTTP responses from SEC EDGAR during data retrieval
action: Assume rate limit detection is complete based on any single mechanism — BD-044 text detection alone is insufficient
(only catches 'will be managed until action is taken'), BD-062 status codes alone miss 200 responses with embedded rate-limit
content, BD-003 retry logic alone lacks explicit rate limit awareness
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Relying on incomplete rate limit detection causes the framework to miss rate limit errors and retry non-rate-limit
failures, or fail to retry when rate-limited — resulting in corrupted or missing market data that propagates into incorrect
trading signals
derived_from_bd_id: BD-086
- id: finance-C-170
when: When handling HTTP 200 responses with embedded content during SEC EDGAR retrieval
action: Implement explicit content scanning for rate-limit indicators within 200 OK responses — BD-003 retry mechanism
and BD-062 status code detection do not trigger for 200 status, so BD-044 HTML text detection is the only safeguard
against rate-limited pages returned as successful responses
severity: high
kind: domain_rule
modality: must
consequence: Rate-limited pages returned as 200 OK bypass all error handling, causing the framework to treat rate-limited
content as valid data. Trading strategies then execute on empty or placeholder content, leading to incorrect position
sizing and significant financial losses
derived_from_bd_id: BD-086
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-114 / SEC EDGAR Filing Extraction
version: v5.3
intent_keywords:
- EDGAR
- SEC filings
- 10-K extraction
- annual report parsing
- document extraction
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
groups:
- group_id: all
name: All Capabilities
description: ''
emoji: 📦
uc_count: 1
ucs:
- uc_id: UC-101
name: SEC EDGAR Filing Extraction
short_description: 'Extracts and processes SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports) from
compressed ZIP archives for downstream financial analysis '
sample_triggers:
- EDGAR
- SEC filings
- 10-K extraction
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try SEC EDGAR filing extraction
auto_selected: true
- uc_id: UC-100
beginner_prompt: Try capability UC-100
auto_selected: true
- uc_id: UC-101
beginner_prompt: Try capability UC-101
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 1 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- SEC EDGAR Filing Extraction
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
- Institutional fund holdings tracker via joinquant_fund_runner pattern
- Custom Transformer + Accumulator factor with per-entity rolling state
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
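Provides a dashboard view of global macroeconomic data, with local storage of multi-source data, separate hot and cold data tiers, and automated refresh scheduling.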
---
name: economic-dashboard
description: |-
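Provides a dashboard view of global macroeconomic data, with local storage of multi-source data, separate hot and cold data tiers, and automated refresh scheduling.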
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-083"
compiled_at: "2026-04-22T13:00:33.402010+00:00"
capability_markets: "global"
capability_activities: "macro-data"
sop_version: "crystal-compilation-v6.1"
---
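# Macroeconomic Dashboard (economic-dashboard)
> Provides a dashboard view of global macroeconomic data, with local storage of multi-source data, separate hot and cold data tiers, and automated refresh scheduling.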
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (13 total)
### Database Snapshot Optimization (`UC-101`)
Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into appropriate storage formats with ZSTD compression and
**Triggers**: backup, snapshot, parquet
### Database Compaction and Optimization (`UC-102`)
Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records within retention windows while measuring compression s
**Triggers**: vacuum, optimize, database cleanup
### Daily Economic Data Refresh (`UC-104`)
Fetches all economic data from the FRED and Yahoo Finance APIs daily and stores the results in a cache for dashboard consumption
**Triggers**: refresh data, daily update, FRED data
For all **13** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
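The hot/cold partitioning described for `UC-101` can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `partition_hot_cold`, the row shape (a dict with a `timestamp` key), and the choice to treat rows exactly 90 days old as hot are all assumptions. Storage formats and ZSTD compression are out of scope here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 90-day boundary comes from the UC-101 description.
HOT_WINDOW = timedelta(days=90)


def partition_hot_cold(rows, now):
    """Split rows into (hot, cold) by age relative to `now`.

    Hypothetical helper: each row is a dict with a timezone-aware
    'timestamp'. Rows exactly 90 days old are kept hot (an assumption,
    since the source only specifies <90 and >90 days).
    """
    cutoff = now - HOT_WINDOW
    hot = [row for row in rows if row["timestamp"] >= cutoff]
    cold = [row for row in rows if row["timestamp"] < cutoff]
    return hot, cold


now = datetime(2026, 4, 22, tzinfo=timezone.utc)
rows = [
    {"series": "GDP", "timestamp": now - timedelta(days=10)},
    {"series": "CPI", "timestamp": now - timedelta(days=200)},
]
hot, cold = partition_hot_cold(rows, now)
```

Passing `now` explicitly keeps the split reproducible, which helps when a snapshot has to be re-created for the same cutoff date.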
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
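A few of these locks can be checked mechanically before any backtest code runs. The sketch below covers `SL-03`, `SL-05`, and `SL-06` using only the standard library. The helper names are hypothetical, and the entity ID regex is inferred from examples such as `stock_sh_600000` and `stockus_nasdaq_AAPL` rather than taken from the framework, so treat it as an assumption.

```python
import re

# Assumed shape for SL-03: entity_type_exchange_code, e.g. stock_sh_600000.
ENTITY_ID_RE = re.compile(r"[a-z]+_[a-z]+_[A-Za-z0-9.]+")


def check_entity_id(entity_id: str) -> bool:
    """SL-03: the ID must have the three underscore-separated parts."""
    return ENTITY_ID_RE.fullmatch(entity_id) is not None


def check_signal_sizing(signal: dict) -> bool:
    """SL-05: exactly one of the three sizing fields may be set."""
    fields = ("position_pct", "order_money", "order_amount")
    return sum(signal.get(name) is not None for name in fields) == 1


def action_for(filter_result) -> str:
    """SL-06: True=BUY, False=SELL, None/NaN=NO ACTION."""
    # NaN is the only value that is not equal to itself.
    if filter_result is None or filter_result != filter_result:
        return "no_action"
    return "buy" if filter_result else "sell"
```

Checks like these are cheap to run on every signal, so a violation can halt the run before an order is generated.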
## Top Anti-Patterns (14 total)
- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-083. Evidence verify ratio = 28.0% and audit fail total = 33. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed across projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-083` blueprint at 2026-04-22T13:00:33.402010+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - API Key Management Verification
  - Database Compaction and Optimization
  - Database Snapshot Optimization
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-074--FinRobot (1)
### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>
When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10 requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits and missing User-Agent headers compound this by causing silent request failures.
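A minimal sketch of the rate-limiting decorator idea, assuming a single-threaded client. The names `rate_limited`, `fetch_filing`, and the `User-Agent` value are hypothetical and are not taken from FinRobot; the function body stands in for a real HTTP call and makes no network request.

```python
import functools
import time

# Hypothetical header value: replace with your own contact details.
HEADERS = {"User-Agent": "Example Research admin@example.com"}


def rate_limited(max_per_second: float):
    """Space out calls so that at most max_per_second are made (single-threaded)."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [float("-inf")]  # monotonic time of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited(max_per_second=10)
def fetch_filing(accession: str) -> str:
    # Placeholder for an HTTP GET that would send HEADERS.
    return f"fetched {accession}"
```

Putting the limit in a decorator keeps the throttle in one place, so every call site inherits it instead of relying on each caller to sleep.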
## finance-bp-077--Open_Source_Economic_Model (2)
### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>
When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>
When importing CSV portfolio files without using 'utf-8-sig' encoding, files that begin with a UTF-8 BOM marker fail to parse correctly. The BOM marker silently corrupts the first column name read by pandas, so reading row fields raises KeyError exceptions and the portfolio data does not load at all.
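The effect can be reproduced with the standard library alone. This sketch uses an in-memory file and hypothetical column names (`ticker`, `weight`); it only illustrates the difference between the `utf-8` and `utf-8-sig` codecs.

```python
import csv
import io

# A small CSV payload that starts with the UTF-8 BOM (EF BB BF).
RAW = b"\xef\xbb\xbfticker,weight\nAAPL,0.6\nMSFT,0.4\n"


def read_rows(data: bytes, encoding: str):
    """Parse CSV bytes into a list of dicts using the given text encoding."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.DictReader(text))


naive_rows = read_rows(RAW, "utf-8")      # BOM stays glued to the first header
safe_rows = read_rows(RAW, "utf-8-sig")   # BOM is stripped while decoding

# naive_rows[0]["ticker"] would raise KeyError: the key is "\ufeffticker".
first_ticker = safe_rows[0]["ticker"]
```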
## finance-bp-080--FinDKG (3)
### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>
When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes the model to see future events during training. Violating the requirement that train_edges occur before val_edges and test_edges in time inflates metrics, which then do not reflect real-world performance. This produces overfit models that fail catastrophically when deployed for actual temporal prediction tasks.
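A sketch of a time-based split, assuming each edge carries a sortable timestamp. The `Edge` tuple and `temporal_split` function are hypothetical helpers for illustration, not FinDKG code.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    head: str
    relation: str
    tail: str
    timestamp: int  # any sortable time value


def temporal_split(edges, val_start, test_start):
    """Split edges by time: train < val_start <= val < test_start <= test."""
    if val_start >= test_start:
        raise ValueError("val_start must be earlier than test_start")
    ordered = sorted(edges, key=lambda e: e.timestamp)
    train = [e for e in ordered if e.timestamp < val_start]
    val = [e for e in ordered if val_start <= e.timestamp < test_start]
    test = [e for e in ordered if e.timestamp >= test_start]
    return train, val, test


edges = [
    Edge("FED", "raises", "RATES", 5),
    Edge("ECB", "cuts", "RATES", 1),
    Edge("OPEC", "cuts", "OUTPUT", 9),
    Edge("BOJ", "holds", "RATES", 3),
]
train_edges, val_edges, test_edges = temporal_split(edges, val_start=4, test_start=8)
```

Splitting on cutoff times rather than on random indices makes it impossible for a later event to land in an earlier partition.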
### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>
When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.
### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>
When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
## finance-bp-083--Economic-Dashboard (3)
### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>
When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
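A hedged sketch of crossover detection with `shift(1)`, assuming pandas is available. The window lengths (2 and 4) and the price series are made up to keep the example short; the final shift, which moves the entry to the bar after the cross, is one way to apply next-bar execution and is an assumption rather than the project's code.

```python
import pandas as pd

close = pd.Series([10, 9, 8, 9, 11, 13, 12, 10, 8, 7], dtype=float)

fast = close.rolling(2).mean()
slow = close.rolling(4).mean()

# State on the current bar and on the prior bar.
above = fast > slow
prev_above = above.shift(1, fill_value=False)

# Only count bars where both the current and the prior slow SMA exist,
# so the warm-up period cannot produce a fake cross.
valid = slow.notna() & slow.shift(1).notna()

golden_cross = above & ~prev_above & valid
death_cross = ~above & prev_above & valid

# Act on the bar after the cross is observed (next-bar execution).
entry_bar = golden_cross.shift(1, fill_value=False)
```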
### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>
When technical indicators are calculated from OHLCV data without first verifying the required columns (open, high, low, close, volume), any missing column raises a ValueError and prevents indicator calculation for the affected tickers. This blocks downstream regime classification and pattern detection for all tickers with incomplete data.
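A minimal pre-flight check, written against plain column-name lists so it does not depend on any DataFrame library. The helper names are hypothetical, and matching column names case-insensitively is an assumption.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV column names absent from `columns`."""
    present = {str(name).strip().lower() for name in columns}
    return [name for name in REQUIRED_OHLCV if name not in present]


def require_ohlcv(ticker, columns):
    """Fail loudly, naming the ticker and the missing columns."""
    missing = missing_ohlcv_columns(columns)
    if missing:
        raise ValueError(f"{ticker}: missing OHLCV columns {missing}")


require_ohlcv("AAPL", ["Open", "High", "Low", "Close", "Volume"])  # passes
```

Running the check per ticker before indicator calculation lets the pipeline skip or report incomplete tickers individually.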
### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>
When storing derived features in DuckDB with a different primary key schema than technical_features table, inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.
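The sketch below shows two feature tables sharing the composite key (ticker, date). It uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra packages; the `derived_features` table and the `sma_20` and `regime` columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE technical_features (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,
        sma_20 REAL,
        PRIMARY KEY (ticker, date)
    );
    CREATE TABLE derived_features (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,
        regime TEXT,
        PRIMARY KEY (ticker, date)
    );
    """
)
conn.execute("INSERT INTO technical_features VALUES ('AAPL', '2024-01-02', 187.5)")
conn.execute("INSERT INTO derived_features VALUES ('AAPL', '2024-01-02', 'bull')")

# Because both tables use the same key, the join is a plain equality on it.
rows = conn.execute(
    """
    SELECT t.ticker, t.date, t.sma_20, d.regime
    FROM technical_features AS t
    JOIN derived_features AS d
      ON t.ticker = d.ticker AND t.date = d.date
    """
).fetchall()
```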
## finance-bp-105--open-climate-investing (5)
### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>
When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.
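A small worked example of the excess-return step, using a single-factor slope computed by hand. The return series are invented so that the true beta is 1.5; the helper names are hypothetical.

```python
def excess(returns, risk_free):
    """Subtract the risk-free rate from each period's return."""
    if len(returns) != len(risk_free):
        raise ValueError("series must have the same length")
    return [r - rf for r, rf in zip(returns, risk_free)]


def ols_slope(x, y):
    """Slope of y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


risk_free = [0.001, 0.002, 0.001, 0.003]
market_raw = [0.011, -0.018, 0.031, 0.003]
stock_raw = [0.016, -0.028, 0.046, 0.003]  # built as Rf + 1.5 * market excess

beta = ols_slope(excess(market_raw, risk_free), excess(stock_raw, risk_free))
```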
### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>
When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2) by 100 before regression leaves the coefficients mis-scaled by a factor of 100. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
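A sketch of the unit conversion, assuming the factor columns are named `Mkt-RF`, `SMB`, `HML`, and `RF` and are stored as percentages. Both the column names and the sample row are assumptions for illustration.

```python
FACTOR_COLUMNS = ("Mkt-RF", "SMB", "HML", "RF")  # assumed column names


def percent_to_decimal(rows, columns=FACTOR_COLUMNS):
    """Return new rows with the listed columns divided by 100."""
    return [
        {key: (value / 100.0 if key in columns else value) for key, value in row.items()}
        for row in rows
    ]


raw_rows = [{"date": "2024-01", "Mkt-RF": 5.2, "SMB": -1.1, "HML": 0.4, "RF": 0.45}]
decimal_rows = percent_to_decimal(raw_rows)
```

Doing the conversion once at import time, and returning new rows, avoids the risk of dividing the same data twice later in the pipeline.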
### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>
When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.
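A guard of this kind can run right after the join and before any regression. In this sketch the threshold of 20 comes from the text above, while the helper names and sample data are hypothetical.

```python
MIN_OBSERVATIONS = 20


def inner_join_by_date(left, right):
    """Keep only the dates present in both series."""
    return {d: (left[d], right[d]) for d in sorted(left.keys() & right.keys())}


def require_observations(joined, minimum=MIN_OBSERVATIONS):
    """Raise before regression if too few rows survived filtering."""
    if len(joined) < minimum:
        raise ValueError(f"only {len(joined)} observations, need at least {minimum}")
    return joined


stock = {f"2024-01-{day:02d}": 0.01 for day in range(1, 31)}    # 30 rows
factor = {f"2024-01-{day:02d}": 0.02 for day in range(16, 31)}  # 15 rows
joined = inner_join_by_date(stock, factor)                      # 15 rows remain
```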
### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>
When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY' or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.
### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>
When parallel regression runs use the fork start method while psycopg2 database connections are open, child processes inherit the parent's connection state, which is not safe to share. This causes 'connection already closed' errors or corrupted connection state in the child processes, resulting in failed database writes and incomplete factor regression calculations.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-083--Economic-Dashboard
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 36, 'total_functions': 0, 'total_stages': 7}
## Modules (7)
- [data_collection](components/data_collection.md): 6 classes
- [feature_engineering](components/feature_engineering.md): 6 classes
- [financial_analysis](components/financial_analysis.md): 6 classes
- [ml_training_&_prediction](components/ml_training_-_prediction.md): 6 classes
- [recession_probability](components/recession_probability.md): 3 classes
- [orchestration_&_automation](components/orchestration_-_automation.md): 3 classes
- [visualization_&_ui](components/visualization_-_ui.md): 6 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 111
fatal_constraints_count: 37
non_fatal_constraints_count: 147
use_cases_count: 13
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
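A minimal parser for the `entity_type_exchange_code` shape, using the IDs that appear elsewhere in this document as examples. `EntityId` and `parse_entity_id` are hypothetical helpers, not part of ZVT.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its three parts."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity_type_exchange_code, got {entity_id!r}")
    return EntityId(*parts)


parsed = parse_entity_id("stock_sh_600000")
```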
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
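The exactly-one rule can be checked at construction time. `SignalSizing` below is a hypothetical stand-in that carries only the three sizing fields; it is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass(frozen=True)
class SignalSizing:
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        provided = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {SIZING_FIELDS} must be set, got {provided or 'none'}"
            )


sizing = SignalSizing(position_pct=0.2)
```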
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
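The three-valued semantics are easy to break with a plain truthiness test, because `None` and `False` are both falsy. A sketch of an explicit mapping, with a hypothetical helper name:

```python
import math


def action_for(filter_result):
    """Map a filter_result cell to 'BUY', 'SELL', or None (no action)."""
    if filter_result is None:
        return None
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return None
    return "BUY" if filter_result else "SELL"


actions = [action_for(value) for value in (True, False, None, float("nan"))]
```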
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
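A pure-Python sketch of MACD with the locked 12/26/9 parameters. Seeding each EMA with the first value is an assumption (conventions differ between libraries), and the function names are hypothetical.

```python
MACD_FAST, MACD_SLOW, MACD_SIGNAL = 12, 26, 9


def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = []
    for value in values:
        out.append(value if not out else alpha * value + (1 - alpha) * out[-1])
    return out


def macd(close):
    """Return (macd_line, signal_line, histogram) for the locked parameters."""
    fast = ema(close, MACD_FAST)
    slow = ema(close, MACD_SLOW)
    macd_line = [f - s for f, s in zip(fast, slow)]
    signal_line = ema(macd_line, MACD_SIGNAL)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


rising = [100.0 + i for i in range(60)]
macd_line, signal_line, histogram = macd(rising)
```

Keeping the three numbers as module-level constants, rather than function arguments, makes it harder to change them by accident.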
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **13**
## `KUC-101`
**Source**: `scripts/create_database_snapshot_optimized.py`
Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into appropriate storage formats with ZSTD compression and incremental exports.
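The hot/cold split can be sketched as a date comparison against a 90-day cutoff. Treating rows that are exactly 90 days old as cold is an assumption, since the description leaves that boundary open; compression and export are not shown, and the helper name is hypothetical.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 90


def partition_hot_cold(rows, today):
    """Split rows into hot (newer than the cutoff) and cold (the rest)."""
    cutoff = today - timedelta(days=HOT_WINDOW_DAYS)
    hot = [row for row in rows if row["date"] > cutoff]
    cold = [row for row in rows if row["date"] <= cutoff]
    return hot, cold


today = date(2024, 6, 30)
rows = [
    {"id": 1, "date": today - timedelta(days=10)},
    {"id": 2, "date": today - timedelta(days=90)},
    {"id": 3, "date": today - timedelta(days=200)},
]
hot_rows, cold_rows = partition_hot_cold(rows, today)
```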
## `KUC-102`
**Source**: `scripts/compact_database.py`
Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records within retention windows while measuring compression savings.
## `KUC-103`
**Source**: `scripts/verify_api_keys.py`
Verifies the API key management feature implementation is working correctly by testing module imports, credential initialization, and key storage/retrieval.
## `KUC-104`
**Source**: `scripts/refresh_data.py`
Fetches economic data from the FRED and Yahoo Finance APIs daily and stores the results in a cache for dashboard consumption.
## `KUC-105`
**Source**: `scripts/cleanup_old_data.py`
Archives data older than retention periods to Parquet files and deletes old records from main tables to reduce database size while maintaining historical access.
## `KUC-106`
**Source**: `scripts/quickstart_api_keys.py`
Provides a quick start guide for initializing and testing API key management, storing and verifying FRED API keys securely.
## `KUC-107`
**Source**: `scripts/setup_credentials.py`
Initializes and stores API credentials (FRED API key) securely in encrypted form for authenticated data access.
## `KUC-108`
**Source**: `scripts/move_fred_data.py`
Organizes FRED-related data files and scripts by moving them into a dedicated directory structure.
## `KUC-109`
**Source**: `scripts/generate_sample_data.py`
Generates sample datasets for offline mode testing, including FRED, Yahoo Finance, and World Bank sample data.
## `KUC-110`
**Source**: `scripts/init_database.py`
Initializes the DuckDB database by creating all required tables and indexes for the Economic Dashboard.
## `KUC-111`
**Source**: `scripts/fetch_sentiment_data.py`
Fetches news articles and sentiment data for specified stock symbols, including Google Trends data for sentiment analysis.
## `KUC-112`
**Source**: `scripts/migrate_pickle_to_duckdb.py`
Migrates existing pickle cache files containing FRED and Yahoo Finance data to the new DuckDB database format.
## `KUC-113`
**Source**: `scripts/refresh_data_smart.py`
Intelligently refreshes economic data based on natural update frequencies and SLAs, respecting rate limits and only fetching data when needed.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.
## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When implementing financial calculations subject to regulatory oversight (EIOPA Solvency II, CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120, factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.
## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Financial data pipelines require strict schema validation at ingestion points. OHLCV requires specific columns, CSV imports require exact column names matching field access, INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.
## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.
## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data
When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.
## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.
## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.
## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.
FILE:references/components/data_collection.md
# data_collection (6 classes)
## `CredentialsManager.set_api_key`
`data_collection/credentialsmanager-set-api-key.py:0`
## `CredentialsManager.get_api_key`
`data_collection/credentialsmanager-get-api-key.py:0`
## `load_fred_data`
`data_collection/load-fred-data.py:0`
## `load_yfinance_data`
`data_collection/load-yfinance-data.py:0`
## `data_source_adapter`
`data_collection/data-source-adapter.py:0`
## `cache_backend`
`data_collection/cache-backend.py:0`
FILE:references/components/feature_engineering.md
# feature_engineering (6 classes)
## `TechnicalIndicatorCalculator.calculate_all`
`feature_engineering/technicalindicatorcalculator-calculate-a.py:0`
## `OptionsMetricsCalculator.calculate`
`feature_engineering/optionsmetricscalculator-calculate.py:0`
## `DerivedFeaturesCalculator.compute`
`feature_engineering/derivedfeaturescalculator-compute.py:0`
## `FeaturePipeline.run_full_pipeline`
`feature_engineering/featurepipeline-run-full-pipeline.py:0`
## `indicator_library`
`feature_engineering/indicator-library.py:0`
## `feature_interactions`
`feature_engineering/feature-interactions.py:0`
FILE:references/components/financial_analysis.md
# financial_analysis (6 classes)
## `MarginCallRiskCalculator.calculate`
`financial_analysis/margincallriskcalculator-calculate.py:0`
## `LeverageMetricsCalculator.compute`
`financial_analysis/leveragemetricscalculator-compute.py:0`
## `InsiderTradingTracker.analyze`
`financial_analysis/insidertradingtracker-analyze.py:0`
## `FinancialHealthScorer.score`
`financial_analysis/financialhealthscorer-score.py:0`
## `risk_weights`
`financial_analysis/risk-weights.py:0`
## `insider_sentiment_formula`
`financial_analysis/insider-sentiment-formula.py:0`
FILE:references/components/ml_training_-_prediction.md
# ml_training_&_prediction (6 classes)
## `ModelTrainer.train`
`ml_training_&_prediction/modeltrainer-train.py:0`
## `EnsembleModel.fit`
`ml_training_&_prediction/ensemblemodel-fit.py:0`
## `PredictionEngine.predict`
`ml_training_&_prediction/predictionengine-predict.py:0`
## `BaseModel.save`
`ml_training_&_prediction/basemodel-save.py:0`
## `base_models`
`ml_training_&_prediction/base-models.py:0`
## `prediction_horizon`
`ml_training_&_prediction/prediction-horizon.py:0`
FILE:references/components/orchestration_-_automation.md
# orchestration_&_automation (3 classes)
## `market_data_refresh_dag`
`orchestration_&_automation/market-data-refresh-dag.py:0`
## `economic_data_refresh_dag`
`orchestration_&_automation/economic-data-refresh-dag.py:0`
## `alert_channel`
`orchestration_&_automation/alert-channel.py:0`
FILE:references/components/recession_probability.md
# recession_probability (3 classes)
## `RecessionProbabilityModel.calculate`
`recession_probability/recessionprobabilitymodel-calculate.py:0`
## `RecessionProbabilityModel.get_probability`
`recession_probability/recessionprobabilitymodel-get-probabilit.py:0`
## `indicator_weights`
`recession_probability/indicator-weights.py:0`
FILE:references/components/visualization_-_ui.md
# visualization_&_ui (6 classes)
## `app.py (landing page)`
`visualization_&_ui/app-py-landing-page.py:0`
## `10_Margin_Call_Risk_Monitor.render`
`visualization_&_ui/10-margin-call-risk-monitor-render.py:0`
## `11_Recession_Probability.render`
`visualization_&_ui/11-recession-probability-render.py:0`
## `13_Insider_Trading_Tracker.render`
`visualization_&_ui/13-insider-trading-tracker-render.py:0`
## `theme/styling`
`visualization_&_ui/theme-styling.py:0`
## `chart_library`
`visualization_&_ui/chart-library.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-083-v5.3
version: v6.1
blueprint_id: finance-bp-083
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:33.402010+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- macro-data
upgraded_from: finance-bp-083-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:19.259947+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-083--Economic-Dashboard/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-083--Economic-Dashboard/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-MACRO-DATA-001
title: SEC EDGAR Rate Limit Violation
description: When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10
requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial
filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits
and missing User-Agent headers compound this by causing silent request failures.
project_source: finance-bp-074--FinRobot
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-002
title: Temporal Knowledge Graph Look-Ahead Bias
description: When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes
the model to see future events during training. Violating the requirement that train_edges occur before val_edges and
test_edges in time inflates metrics, which then do not reflect real-world performance. This produces overfit models that
fail catastrophically when deployed for actual temporal prediction tasks.
project_source: finance-bp-080--FinDKG
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-003
title: Technical Indicator Look-Ahead Bias via Missing Shift
description: When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar
state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire
at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this
with 'we need the current bar signal immediately' leads to future information leaking into current signals.
project_source: finance-bp-083--Economic-Dashboard
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-004
title: EIOPA Non-Compliant Curve Extrapolation
description: When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant
formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use
max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability
calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
project_source: finance-bp-077--Open_Source_Economic_Model
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-005
title: Factor Regression Using Raw Returns Instead of Excess Returns
description: When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting
the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns
(Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure.
This leads to fundamentally flawed risk attribution and portfolio construction decisions.
project_source: finance-bp-105--open-climate-investing
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-006
title: Percentage vs Decimal Unit Mismatch in Factor Data
description: When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2)
by 100 before regression leaves the coefficients mis-scaled by a factor of 100. This produces statistically invalid inference
and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
project_source: finance-bp-105--open-climate-investing
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-007
title: Insufficient Regression Observations for Statistical Validity
description: When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join,
winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations
produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise.
This commonly occurs when combining multiple data sources with missing values.
project_source: finance-bp-105--open-climate-investing
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-008
title: DGL Graph Attribute Propagation Failure in Temporal Batching
description: When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations,
num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing
attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs,
causing training to fail with AttributeError.
project_source: finance-bp-080--FinDKG
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-009
title: CSV BOM Encoding Corruption in Data Import
description: When importing CSV portfolio files without using 'utf-8-sig' encoding, files that begin with a UTF-8 BOM marker
fail to parse correctly. The BOM marker silently corrupts the first column name read by pandas, so reading row fields
raises KeyError exceptions and the portfolio data does not load at all.
project_source: finance-bp-077--Open_Source_Economic_Model
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-010
title: OHLCV Data Quality Validation Failure
description: When technical indicators are calculated from OHLCV data without first verifying the required columns (open, high,
low, close, volume), any missing column raises a ValueError and prevents indicator calculation for the affected tickers.
This blocks downstream regime classification and pattern detection for all tickers with incomplete data.
project_source: finance-bp-083--Economic-Dashboard
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-011
title: Inconsistent Primary Key Schema Causing JOIN Failures
description: When storing derived features in DuckDB with a different primary key schema than technical_features table,
inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection
pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying
and data integrity.
project_source: finance-bp-083--Economic-Dashboard
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-012
title: Frequency Column Enforcement Missing in Time Series Schema
description: When creating PostgreSQL schema for time series tables without explicit frequency column enforcement of 'MONTHLY'
or 'DAILY' text values, mixed frequency data corrupts regression calculations. Combining incompatible data frequencies
produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data
corruption.
project_source: finance-bp-105--open-climate-investing
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-013
title: PostgreSQL Fork in Multiprocessing Context
description: When implementing multiprocessing for parallel regression execution using the fork start method with psycopg2
database connections, child processes inherit the parent's open connections. This causes 'connection already closed' errors
or corrupted connection state in the child processes, resulting in failed database writes and incomplete factor regression
calculations.
project_source: finance-bp-105--open-climate-investing
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-014
title: Temporal DataLoader Shuffling Breaking Graph Ordering
description: When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering
required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking
the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
project_source: finance-bp-080--FinDKG
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
cross_project_wisdom:
- wisdom_id: CW-MACRO-DATA-001
source_project: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
pattern_name: Temporal Ordering Enforcement
description: Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test
splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test
edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline
to prevent look-ahead bias that inflates evaluation metrics.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-002
source_project: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing
pattern_name: Regulatory Formula Compliance
description: When implementing financial calculations defined by regulatory or canonical specifications (EIOPA Solvency II,
CAPM, Fama-French), use exact formula specifications from authoritative sources. The Smith-Wilson convergence point must
follow EIOPA paragraph 120, and factor regressions must use excess returns with properly scaled inputs. Apply this pattern
when calculations will be used for regulatory reporting or investment decision-making.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-003
source_project: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model
pattern_name: Strict Data Schema Enforcement
description: Financial data pipelines require strict schema validation at ingestion points. OHLCV data requires specific columns,
CSV imports require exact column names matching field access, and INI files require specific sections. Missing or malformed
schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch
errors early before downstream calculations use bad data.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-004
source_project: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
pattern_name: Composite Primary Key Uniqueness
description: Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable
efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply
this pattern when designing any financial database schema involving time-series measurements with multiple entities.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-005
source_project: finance-bp-074--FinRobot
pattern_name: External API Rate Limiting
description: When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented
before deployment. SEC EDGAR enforces 10 requests per second with IP blocking consequences. Use decorators and proper
User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption
that blocks critical data access.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-006
source_project: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing
pattern_name: Graph Attribute Propagation in Batching
description: When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes,
num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these
attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks
to prevent training failures.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-007
source_project: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard
pattern_name: Statistical Validity Thresholds
description: Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable
inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient
data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful
rather than spurious.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-008
source_project: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model
pattern_name: Data Type Strictness for ML Operations
description: Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for
node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time
interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline
to catch dtype issues early.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: scripts/create_database_snapshot_optimized.py
business_problem: Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into appropriate
storage formats with ZSTD compression and incremental exports.
intent_keywords:
- backup
- snapshot
- parquet
- database backup
- compress data
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-102
source_file: scripts/compact_database.py
business_problem: Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records within
retention windows while measuring compression savings.
intent_keywords:
- vacuum
- optimize
- database cleanup
- reclaim space
- index rebuild
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-103
source_file: scripts/verify_api_keys.py
business_problem: Verifies the API key management feature implementation is working correctly by testing module imports,
credential initialization, and key storage/retrieval.
intent_keywords:
- verify API keys
- test credentials
- API setup verification
- credentials validation
stage: data_collection
data_domain: mixed
type: monitoring
- kuc_id: KUC-104
source_file: scripts/refresh_data.py
business_problem: Fetches all economic data from the FRED and Yahoo Finance APIs daily and stores the results in cache for dashboard
consumption.
intent_keywords:
- refresh data
- daily update
- FRED data
- yfinance
- economic data fetch
stage: data_collection
data_domain: financial_data
type: data_pipeline
- kuc_id: KUC-105
source_file: scripts/cleanup_old_data.py
business_problem: Archives data older than retention periods to Parquet files and deletes old records from main tables to
reduce database size while maintaining historical access.
intent_keywords:
- data retention
- cleanup old data
- archive historical
- delete old records
- retention policy
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-106
source_file: scripts/quickstart_api_keys.py
business_problem: Provides a quick start guide for initializing and testing API key management, storing and verifying FRED
API keys securely.
intent_keywords:
- setup API keys
- quick start
- initialize credentials
- API key setup
stage: data_collection
data_domain: mixed
type: monitoring
- kuc_id: KUC-107
source_file: scripts/setup_credentials.py
business_problem: Initializes and stores API credentials (FRED API key) securely in encrypted form for authenticated data
access.
intent_keywords:
- setup credentials
- API key initialization
- secure storage
- FRED API
stage: data_collection
data_domain: mixed
type: monitoring
- kuc_id: KUC-108
source_file: scripts/move_fred_data.py
business_problem: Organizes FRED-related data files and scripts by moving them into a dedicated directory structure.
intent_keywords:
- organize files
- move FRED data
- file management
- directory structure
stage: data_collection
data_domain: financial_data
type: data_pipeline
- kuc_id: KUC-109
source_file: scripts/generate_sample_data.py
business_problem: Generates sample datasets for offline mode testing, including FRED, Yahoo Finance, and World Bank sample
data.
intent_keywords:
- generate sample data
- offline mode
- test data
- sample datasets
- offline testing
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-110
source_file: scripts/init_database.py
business_problem: Initializes the DuckDB database by creating all required tables and indexes for the Economic Dashboard.
intent_keywords:
- init database
- create tables
- database setup
- DuckDB initialization
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-111
source_file: scripts/fetch_sentiment_data.py
business_problem: Fetches news articles and sentiment data for specified stock symbols, including Google Trends data for
sentiment analysis.
intent_keywords:
- news sentiment
- fetch news
- sentiment analysis
- stock news
- Google Trends
stage: data_collection
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-112
source_file: scripts/migrate_pickle_to_duckdb.py
business_problem: Migrates existing pickle cache files containing FRED and Yahoo Finance data to the new DuckDB database
format.
intent_keywords:
- migrate pickle
- convert cache
- DuckDB migration
- pickle to database
- data migration
stage: data_collection
data_domain: financial_data
type: data_pipeline
- kuc_id: KUC-113
source_file: scripts/refresh_data_smart.py
business_problem: Intelligently refreshes economic data based on natural update frequencies and SLAs, respecting rate limits
and only fetching data when needed.
intent_keywords:
- smart refresh
- SLA aware
- rate limit
- incremental refresh
- update frequency
stage: data_collection
data_domain: financial_data
type: data_pipeline
component_capability_map:
project: finance-bp-083--Economic-Dashboard
scan_date: '2026-04-22'
stats:
total_files: 7
total_classes: 36
total_functions: 0
total_stages: 7
modules:
data_collection:
class_count: 6
stage_id: data_collection
stage_order: 1
responsibility: Fetch economic data from FRED, Yahoo Finance, SEC, and CBOE APIs with offline fallback. Manages caching
and rate limiting to ensure reliable data access. This stage exists because financial analysis requires consistent,
fresh data from multiple authoritative sources, and the system must remain funct
classes:
- name: CredentialsManager.set_api_key
file: data_collection/credentialsmanager-set-api-key.py
line: 0
kind: required_method
signature: ''
- name: CredentialsManager.get_api_key
file: data_collection/credentialsmanager-get-api-key.py
line: 0
kind: required_method
signature: ''
- name: load_fred_data
file: data_collection/load-fred-data.py
line: 0
kind: required_method
signature: ''
- name: load_yfinance_data
file: data_collection/load-yfinance-data.py
line: 0
kind: required_method
signature: ''
- name: data_source_adapter
file: data_collection/data-source-adapter.py
line: 0
kind: replaceable_point
- name: cache_backend
file: data_collection/cache-backend.py
line: 0
kind: replaceable_point
design_decision_count: 4
feature_engineering:
class_count: 6
stage_id: feature_engineering
stage_order: 2
responsibility: Calculate technical indicators, options metrics, and derived features from raw price/volume data. Transforms
market data into ML-ready feature vectors. This stage exists because raw market data must be transformed into meaningful
signals before any predictive modeling or analysis can occur.
classes:
- name: TechnicalIndicatorCalculator.calculate_all
file: feature_engineering/technicalindicatorcalculator-calculate-a.py
line: 0
kind: required_method
signature: ''
- name: OptionsMetricsCalculator.calculate
file: feature_engineering/optionsmetricscalculator-calculate.py
line: 0
kind: required_method
signature: ''
- name: DerivedFeaturesCalculator.compute
file: feature_engineering/derivedfeaturescalculator-compute.py
line: 0
kind: required_method
signature: ''
- name: FeaturePipeline.run_full_pipeline
file: feature_engineering/featurepipeline-run-full-pipeline.py
line: 0
kind: required_method
signature: ''
- name: indicator_library
file: feature_engineering/indicator-library.py
line: 0
kind: replaceable_point
- name: feature_interactions
file: feature_engineering/feature-interactions.py
line: 0
kind: replaceable_point
design_decision_count: 4
financial_analysis:
class_count: 6
stage_id: financial_analysis
stage_order: 3
responsibility: Calculate margin risk, leverage exposure, insider trading signals, and financial health scores. Provides
risk metrics and market stress indicators for portfolio risk management. This stage exists because raw market data
needs risk-contextualization to support trading and investment decisions.
classes:
- name: MarginCallRiskCalculator.calculate
file: financial_analysis/margincallriskcalculator-calculate.py
line: 0
kind: required_method
signature: ''
- name: LeverageMetricsCalculator.compute
file: financial_analysis/leveragemetricscalculator-compute.py
line: 0
kind: required_method
signature: ''
- name: InsiderTradingTracker.analyze
file: financial_analysis/insidertradingtracker-analyze.py
line: 0
kind: required_method
signature: ''
- name: FinancialHealthScorer.score
file: financial_analysis/financialhealthscorer-score.py
line: 0
kind: required_method
signature: ''
- name: risk_weights
file: financial_analysis/risk-weights.py
line: 0
kind: replaceable_point
- name: insider_sentiment_formula
file: financial_analysis/insider-sentiment-formula.py
line: 0
kind: replaceable_point
design_decision_count: 4
ml_training_&_prediction:
class_count: 6
stage_id: ml_training_prediction
stage_order: 4
responsibility: Train XGBoost/LightGBM ensemble models for stock direction prediction. Uses walk-forward validation
to prevent lookahead bias in time series. This stage exists because statistical and ML models can identify patterns
in feature data that support forward-looking market predictions.
classes:
- name: ModelTrainer.train
file: ml_training_&_prediction/modeltrainer-train.py
line: 0
kind: required_method
signature: ''
- name: EnsembleModel.fit
file: ml_training_&_prediction/ensemblemodel-fit.py
line: 0
kind: required_method
signature: ''
- name: PredictionEngine.predict
file: ml_training_&_prediction/predictionengine-predict.py
line: 0
kind: required_method
signature: ''
- name: BaseModel.save
file: ml_training_&_prediction/basemodel-save.py
line: 0
kind: required_method
signature: ''
- name: base_models
file: ml_training_&_prediction/base-models.py
line: 0
kind: replaceable_point
- name: prediction_horizon
file: ml_training_&_prediction/prediction-horizon.py
line: 0
kind: replaceable_point
design_decision_count: 4
recession_probability:
class_count: 3
stage_id: recession_indicator
stage_order: 5
responsibility: Calculate recession probability using 7 weighted economic indicators (yield curve, labor, financial
stress, etc.). Provides forward-looking economic risk assessment for macro risk management. This stage exists because
single indicators are unreliable predictors; combining multiple signals reduces fa
classes:
- name: RecessionProbabilityModel.calculate
file: recession_probability/recessionprobabilitymodel-calculate.py
line: 0
kind: required_method
signature: ''
- name: RecessionProbabilityModel.get_probability
file: recession_probability/recessionprobabilitymodel-get-probabilit.py
line: 0
kind: required_method
signature: ''
- name: indicator_weights
file: recession_probability/indicator-weights.py
line: 0
kind: replaceable_point
design_decision_count: 3
orchestration_&_automation:
class_count: 3
stage_id: orchestration_automation
stage_order: 6
responsibility: Schedule data refresh via Airflow DAGs. Coordinates ETL tasks, validates data quality, and sends alerts
on failures. This stage exists because financial analysis requires fresh data on predictable schedules without manual
intervention.
classes:
- name: market_data_refresh_dag
file: orchestration_&_automation/market-data-refresh-dag.py
line: 0
kind: required_method
signature: ''
- name: economic_data_refresh_dag
file: orchestration_&_automation/economic-data-refresh-dag.py
line: 0
kind: required_method
signature: ''
- name: alert_channel
file: orchestration_&_automation/alert-channel.py
line: 0
kind: replaceable_point
design_decision_count: 3
visualization_&_ui:
class_count: 6
stage_id: visualization
stage_order: 7
responsibility: Streamlit-based dashboard pages for economic indicators, technical analysis, margin risk, insider trading,
and ML predictions. This stage exists because analytical outputs need intuitive presentation to support decision-making
by financial professionals.
classes:
- name: app.py (landing page)
file: visualization_&_ui/app-py-landing-page.py
line: 0
kind: required_method
signature: ''
- name: 10_Margin_Call_Risk_Monitor.render
file: visualization_&_ui/10-margin-call-risk-monitor-render.py
line: 0
kind: required_method
signature: ''
- name: 11_Recession_Probability.render
file: visualization_&_ui/11-recession-probability-render.py
line: 0
kind: required_method
signature: ''
- name: 13_Insider_Trading_Tracker.render
file: visualization_&_ui/13-insider-trading-tracker-render.py
line: 0
kind: required_method
signature: ''
- name: theme/styling
file: visualization_&_ui/theme-styling.py
line: 0
kind: replaceable_point
- name: chart_library
file: visualization_&_ui/chart-library.py
line: 0
kind: replaceable_point
design_decision_count: 3
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.28
evidence_invalid: 54
evidence_verified: 21
evidence_auto_fixed: 0
audit_coverage: 60/60 (100%)
audit_pass_rate: 3/60 (5%)
audit_fail_total: 33
audit_finance_universal:
pass: 2
warn: 9
fail: 9
audit_subdomain_totals:
pass: 1
warn: 15
fail: 24
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-083. Evidence verify ratio
= 28.0% and audit fail total = 33. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-083-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc: []
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Database Snapshot Optimization
positive_terms:
- backup
- snapshot
- parquet
- database backup
- compress data
data_domain: mixed
negative_terms:
- restore
- live trading
- backtest
- ML prediction
ambiguity_question: Do you need a one-time backup or scheduled recurring snapshots?
- uc_id: UC-102
name: Database Compaction and Optimization
positive_terms:
- vacuum
- optimize
- database cleanup
- reclaim space
- index rebuild
data_domain: mixed
negative_terms:
- restore
- backup
- trading strategy
- screening
ambiguity_question: Are you experiencing slow query performance or trying to reduce storage size?
- uc_id: UC-103
name: API Key Management Verification
positive_terms:
- verify API keys
- test credentials
- API setup verification
- credentials validation
data_domain: mixed
negative_terms:
- data refresh
- backtest
- screening
- live trading
ambiguity_question: Are you setting up credentials for the first time or troubleshooting existing credentials?
- uc_id: UC-104
name: Daily Economic Data Refresh
positive_terms:
- refresh data
- daily update
- FRED data
- yfinance
- economic data fetch
data_domain: financial_data
negative_terms:
- backtest
- screening
- ML prediction
- parquet export
ambiguity_question: Do you need to refresh all data or just specific data sources?
- uc_id: UC-105
name: Data Retention Policy Cleanup
positive_terms:
- data retention
- cleanup old data
- archive historical
- delete old records
- retention policy
data_domain: mixed
negative_terms:
- live trading
- backtest
- ML prediction
- verify API keys
ambiguity_question: Do you want to archive data to Parquet before deletion, or just delete old records?
- uc_id: UC-106
name: API Key Management Quickstart
positive_terms:
- setup API keys
- quick start
- initialize credentials
- API key setup
data_domain: mixed
negative_terms:
- data refresh
- backtest
- screening
- database snapshot
ambiguity_question: Are you setting up API keys for the first time or updating existing keys?
- uc_id: UC-107
name: Credentials Initialization
positive_terms:
- setup credentials
- API key initialization
- secure storage
- FRED API
data_domain: mixed
negative_terms:
- data refresh
- backtest
- screening
- database compaction
ambiguity_question: Do you need to add new API keys or update existing ones?
- uc_id: UC-108
name: FRED Data File Organization
positive_terms:
- organize files
- move FRED data
- file management
- directory structure
data_domain: financial_data
negative_terms:
- refresh data
- backtest
- ML prediction
- API keys
ambiguity_question: Is this a one-time file organization task or part of a larger migration?
- uc_id: UC-109
name: Offline Sample Data Generation
positive_terms:
- generate sample data
- offline mode
- test data
- sample datasets
- offline testing
data_domain: mixed
negative_terms:
- live trading
- backtest
- API keys
- production data
ambiguity_question: Do you need sample data for a specific source or for all sources?
- uc_id: UC-110
name: DuckDB Database Initialization
positive_terms:
- init database
- create tables
- database setup
- DuckDB initialization
data_domain: mixed
negative_terms:
- data refresh
- backtest
- screening
- API keys
ambiguity_question: Are you setting up a new database or resetting an existing one?
- uc_id: UC-111
name: News and Sentiment Data Fetching
positive_terms:
- news sentiment
- fetch news
- sentiment analysis
- stock news
- Google Trends
data_domain: financial_data
negative_terms:
- live trading
- backtest
- database snapshot
- API key verification
ambiguity_question: Do you need sentiment data for specific symbols or a broad market scan?
- uc_id: UC-112
name: Pickle Cache to DuckDB Migration
positive_terms:
- migrate pickle
- convert cache
- DuckDB migration
- pickle to database
- data migration
data_domain: financial_data
negative_terms:
- live trading
- backtest
- ML prediction
- API keys
ambiguity_question: Are you migrating historical data only or also maintaining pickle as cache?
- uc_id: UC-113
name: Smart Data Refresh with SLA Awareness
positive_terms:
- smart refresh
- SLA aware
- rate limit
- incremental refresh
- update frequency
data_domain: financial_data
negative_terms:
- full refresh
- backup
- API key setup
- ML prediction
ambiguity_question: Do you need a forced full refresh or smart incremental updates based on SLAs?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The buy/sell ordering check (SL-01) runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
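# Illustrative note on SL-02. The timestamps below are assumptions for illustration, not values from the source.
# For a daily level, assuming level.to_second() returns 86400, a signal observed at
#   happen_timestamp = 2024-01-02 00:00:00
# would carry
#   due_timestamp    = 2024-01-03 00:00:00
# so the order is filled on the bar after the one that produced the signal.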
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 111
fatal_constraints_count: 37
non_fatal_constraints_count: 147
use_cases_count: 13
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
otherwise initialization fails with an assertion error (finance-C-001 fatal violation).'
business_decisions:
- id: BD-001
type: B
summary: DuckDB singleton connection
- id: BD-002
type: BA
summary: Offline mode with sample data fallback
- id: BD-003
type: BA
summary: SLA-based refresh scheduling (Daily=6h, Weekly=1d, Monthly=7d, Quarterly=30d)
- id: BD-004
type: B/RC
summary: Encrypted credentials storage with Fernet symmetric encryption
- id: BD-GAP-001
type: DK
summary: 'Missing: Trading calendar vs natural calendar'
- id: BD-GAP-002
type: DK
summary: 'Missing: Timezone explicit annotation'
- id: BD-GAP-003
type: RC
summary: 'Missing: float vs Decimal for currency'
- id: BD-GAP-004
type: M
summary: 'Missing: Matrix ill-conditioning and stability'
- id: BD-GAP-005
type: B
summary: 'Missing: PnL conservation'
- id: BD-GAP-006
type: DK
summary: 'Missing: Model and data version snapshot binding'
- id: BD-GAP-007
type: RC
summary: 'Missing: Settlement and delivery time convention'
- id: BD-GAP-008
type: RC
summary: 'Missing: Price and quantity precision (tick/lot size)'
- id: BD-GAP-009
type: B
summary: 'Missing: Cost Model Completeness'
- id: BD-GAP-010
type: B
summary: 'Missing: Carry/Funding Cost Modeling'
- id: BD-GAP-011
type: B
summary: 'Missing: Arbitrage-Free Constraints'
- id: BD-GAP-012
type: B
summary: 'Missing: Optimization Constraint Completeness'
- id: BD-GAP-013
type: DK
summary: 'Missing: Rebalancing Trigger Mechanism'
- id: BD-GAP-014
type: RC
summary: 'Missing: Implement timezone-aware datetime handling across all data operations. Use UTC as the canonical timezone
and localize on display. Replace naive datetime.now() with timezone-aware alternatives.'
- id: BD-GAP-015
type: DK
summary: 'Missing: Trading calendar vs natural calendar'
- id: BD-GAP-016
type: RC
summary: 'Missing: float vs Decimal for currency'
- id: BD-GAP-017
type: M
summary: 'Missing: Matrix ill-conditioning and stability'
- id: BD-GAP-018
type: B
summary: 'Missing: PnL conservation'
- id: BD-GAP-019
type: B
summary: 'Missing: Greeks Calculation'
- id: BD-GAP-020
type: B
summary: 'Missing: Finite Difference Grid Stability'
- id: BD-GAP-021
type: B
summary: 'Missing: Covariance Estimator Selection'
- id: BD-GAP-022
type: B
summary: 'Missing: VaR/CVaR Confidence and Window'
- id: BD-GAP-023
type: B
summary: 'Missing: Default Definition and IFRS 9 Staging'
- id: BD-GAP-024
type: B
summary: 'Missing: PD/LGD/EAD Estimation Methods'
- id: BD-GAP-025
type: B
summary: 'Missing: FTP (Funds Transfer Pricing) Method'
- id: BD-GAP-026
type: B
summary: 'Missing: Cash Pool Legal Structure'
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions:
- id: BD-022
type: BA
summary: Offline-first with live fallback (green=live, orange=offline badge)
- id: BD-023
type: BA
summary: Environment-based cache expiry (Dev=1h, Prod=24h)
- id: BD-024
type: BA
summary: 5-year chart lookback default for long-term trends
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 23 source groups: default_value(8),
feature_engineering(3), financial_analysis(15), global(10), inheritance(1), invariant(2), and 17 more.'
key_decisions: 78 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-061
type: B/BA
summary: DatabaseConnection singleton hardcodes memory limits affecting all queries
- id: BD-063
type: B/BA
summary: MarginCallRiskCalculator hardcodes component weights encoding business assumptions
- id: BD-064
type: BA/DK
summary: RecessionProbabilityModel encodes empirical indicator weights from financial research
- id: BD-067
type: B/BA
summary: InsiderTradingTracker classifies transactions using hardcoded bullish/bearish codes
- id: BD-068
type: BA
summary: Data refresh SLA contract defines cache expiry based on publication frequency
- id: BD-071
type: BA/DK
summary: 'VIX regime classification thresholds hardcoded: <15 Low, <20 Normal, <30 Elevated, else Crisis'
- id: BD-072
type: B/BA
summary: 'Default environment config cache_expiry differs: development=1h vs production=24h'
- id: BD-075
type: B/BA
summary: CredentialsManager stores encryption key with os.chmod 0o600 permissions
- id: BD-005
type: M
summary: Shift-based crossover detection for SMA50/SMA200
- id: BD-006
type: BA
summary: Regime classification thresholds (Bullish=RSI>60 AND MACD>0 AND Price>SMA50)
- id: BD-007
type: BA
summary: Feature pipeline validation (>10% null RSI or duplicate dates = failure)
- id: BD-008
type: BA
summary: Composite risk scoring with fixed weights (30/25/25/20)
- id: BD-009
type: BA
summary: Short interest thresholds (>30%=100 score, >20%=75 score)
- id: BD-010
type: BA
summary: Bullish/bearish transaction codes (P=Purchase, M=Exercise=bullish; S=Sale=bearish)
- id: BD-011
type: BA
summary: Sector classification (Offensive=XLY/XLK, Defensive=XLU/XLP, Cyclical=remainder)
- id: BD-026
type: B/DK
summary: Use SMA (Simple Moving Average) for trend identification
- id: BD-027
type: B
summary: Use EMA (Exponential Moving Average) for momentum indicators
- id: BD-028
type: B/BA
summary: Use RSI(14) as primary momentum oscillator
- id: BD-029
type: B/BA
summary: Use MACD with standard parameters for trend/momentum
- id: BD-030
type: B/BA
summary: Use Bollinger Bands with 2 standard deviations
- id: BD-031
type: B/DK
summary: Use ATR(14) to measure realized volatility
- id: BD-032
type: B
summary: Use Stochastic Oscillator %K=14, %D=3
- id: BD-033
type: B/DK
summary: Use Fibonacci ratios for price projection
- id: BD-034
type: B/DK
summary: Use Elliott Wave theory for pattern recognition
- id: BD-035
type: B/BA
summary: Classify trend strength using |avg_return|/volatility ratio
- id: BD-055
type: B/BA
summary: Calculate volume trend using linear regression slope
- id: BD-076
type: BA/M
summary: 'INTERACTION: BD-036 (Z-scores for feature normalization) × BD-052 (StandardScaler in ML) → Double-scaling
risk corrupts ensemble probability calibration'
- id: BD-077
type: BA
summary: 'INTERACTION: BD-021 (Weekday-only DAG scheduling) × BD-003 (SLA-based refresh intervals) → Weekend skips extend
effective SLA windows violating data freshness guarantees'
- id: BD-078
type: B/BA
summary: 'INTERACTION: BD-007 (Pipeline validation thresholds) × BD-062 (5-step execution order) → Validation contract
silently broken if pipeline order changes'
- id: BD-079
type: BA
summary: 'INTERACTION: BD-025 (Volatility threshold +1.0) × BD-037 (Percentile rank volatility) × BD-046 (Composite
margin risk) → Triple-counting volatility amplifies defensive positioning triggering'
- id: BD-080
type: BA/M
summary: 'INTERACTION: BD-061 (DB singleton memory limits) × BD-065 (EnsembleModel stacking) × BD-046 (Composite margin
risk) → Memory ceiling creates risk cascade for multi-component risk calculations'
- id: BD-081
type: B
summary: 'INTERACTION: BD-022 (Offline-first with badge) × BD-002 (Sample data fallback) × BD-023 (Env-based cache)
→ Conflicting data freshness signals confuse users about what''s live vs stale'
- id: BD-082
type: BA/M
summary: 'INTERACTION: BD-012 (Walk-forward validation) × BD-070 (Binary target definition) → Validation quality degraded
by target definition that discards magnitude'
- id: BD-083
type: BA/M
summary: 'INTERACTION: BD-016 (Yield curve dominant 25%) × BD-017 (12-18m lookback) × BD-044 (Recession model weights)
→ Yield curve dominates recession probability creating single-point-of-failure in recession'
- id: BD-084
type: B/RC
summary: 'INTERACTION: BD-004 (Fernet encryption) × BD-075 (0o600 key permissions) → Encryption provides false sense
of security against privileged escalation'
- id: BD-085
type: BA/DK
summary: 'INTERACTION: BD-024 (5-year default lookback) × BD-006 (Regime classification thresholds) × BD-038 (Momentum
regime classification) → Historical thresholds calibrated on different market structure may'
- id: BD-069
type: B/BA
summary: BaseModel applies StandardScaler to all features during fit; the fitted scaler is serialized on model save
- id: BD-070
type: BA
summary: 'ML training uses binary target: future_close > close determines UP(1)/DOWN(0)'
- id: BD-074
type: B/BA
summary: Database schema enforces composite primary keys (ticker, date) across feature tables
- id: BD-012
type: BA/M
summary: Walk-forward validation using TimeSeriesSplit
- id: BD-013
type: BA/DK
summary: 5-day prediction horizon aligned with weekly rebalancing
- id: BD-014
type: M/DK
summary: Ensemble with LogisticRegression meta-learner stacking
- id: BD-015
type: BA/M
summary: StandardScaler on features before meta-learner
- id: BD-036
type: B/BA
summary: Calculate Z-scores for feature normalization
- id: BD-037
type: B/DK
summary: Classify volatility regime using percentile rank (75th/25th)
- id: BD-038
type: B/BA
summary: Classify momentum regime using RSI/MACD/price position
- id: BD-056
type: B
summary: Classify insider sentiment using weighted buy/sell value
- id: BD-057
type: B/BA
summary: Use title-based insider weighting (CEO=3.0, CFO=2.5)
- id: BD-058
type: B/BA
summary: Use 2x spike threshold for unusual activity detection
- id: BD-045
type: B/BA
summary: Calculate VIX stress score from VIX and VVIX
- id: BD-046
type: B/DK
summary: Use composite margin risk from 4 components (leverage/volatility/options/liquidity)
- id: BD-039
type: B/BA
summary: Calculate IV Rank as position in historical IV range
- id: BD-040
type: B
summary: Calculate IV Percentile as percentage of days below current IV
- id: BD-041
type: B/BA
summary: Use Herfindahl Index to measure sector concentration
- id: BD-042
type: B/RC
summary: Calculate relative strength as excess return vs SPY
- id: BD-059
type: B/RC
summary: Calculate sector correlation matrix using Pearson correlation
- id: BD-060
type: B/RC
summary: Use dual momentum (10-day/50-day) for sector rotation
- id: BD-054
type: B/DK
summary: Calculate historical volatility as annualized std of returns
- id: BD-053
type: B
summary: Calculate Sharpe ratio for strategy evaluation
- id: BD-047
type: B/DK
summary: Use XGBoost for stock direction prediction
- id: BD-048
type: B
summary: Use LightGBM for faster gradient boosting
- id: BD-049
type: B
summary: Use ensemble with LogisticRegression meta-learner
- id: BD-052
type: B/BA
summary: Use StandardScaler for feature normalization in ML
- id: BD-043
type: B
summary: Use Sahm Rule (3-month average unemployment rate rising 0.5 percentage points above its 12-month low)
- id: BD-044
type: B
summary: Use weighted recession probability with yield curve dominant
- id: BD-050
type: B/DK
summary: Use walk-forward validation for time series
- id: BD-051
type: B/DK
summary: Use 5-day prediction horizon (1 trading week)
- id: BD-019
type: BA
summary: Parallel ETL tasks (ICI and VIX fetches run concurrently)
- id: BD-020
type: BA
summary: 3 retries with 5-minute delay for API failures
- id: BD-021
type: BA/DK
summary: Weekday-only DAG scheduling (0 7 * * 1-5 = 7 AM UTC, Mon-Fri)
- id: BD-062
type: B/RC
summary: 'FeaturePipeline mandates 5-step execution order: tech→options→derived→margin_risk→quality'
- id: BD-066
type: B/DK
summary: Airflow DAG enforces init→[refresh tasks]→validate→notify dependency chain
- id: BD-025
type: B/BA
summary: Volatility regime threshold +1.0 for high volatility
- id: BD-065
type: BA/M
summary: 'EnsembleModel uses 2-level stacking: XGBoost+LightGBM base → LogisticRegression meta-learner'
- id: BD-073
type: B/BA
summary: DerivedFeaturesCalculator uses shift(1) to detect golden/death cross transitions
- id: BD-016
type: BA/M
summary: Yield curve has highest weight (0.25) in recession model
- id: BD-017
type: BA/M
summary: 12-18 month inversion lookback (365*1.5 days)
- id: BD-018
type: BA/DK
summary: 7-indicator weighted scoring for recession probability
resources:
packages:
- name: streamlit
version_pin: latest
- name: pandas
version_pin: latest
- name: plotly
version_pin: latest
- name: yfinance
version_pin: latest
- name: pandas-datareader
version_pin: latest
- name: numpy
version_pin: latest
- name: duckdb
version_pin: latest
- name: xgboost
version_pin: latest
- name: lightgbm
version_pin: latest
- name: scikit-learn
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install streamlit
- python3 -m pip install pandas
- python3 -m pip install plotly
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When storing API credentials on filesystem
action: set file permissions to 0o600 to restrict access to owner only
severity: fatal
kind: domain_rule
modality: must
consequence: World-readable credentials files expose API keys to unauthorized users, enabling data theft or quota abuse
on external services
stage_ids:
- data_collection
- id: finance-C-017
when: When calculating technical indicators from OHLCV data
action: Verify OHLCV DataFrame contains required columns (open, high, low, close, volume)
severity: fatal
kind: domain_rule
modality: must
consequence: Missing required OHLCV columns causes ValueError and prevents technical indicator calculation for affected
tickers
stage_ids:
- feature_engineering
- id: finance-C-018
when: When implementing SMA crossover detection (golden/death cross)
action: Use shift(1) to compare current bar state with prior bar state for transition detection
severity: fatal
kind: domain_rule
modality: must
consequence: Without shift(1), crossover detection uses current bar data causing look-ahead bias where signals appear
to fire at the same bar as the cross occurs
stage_ids:
- feature_engineering
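# Illustrative trace for finance-C-018. The four sample bars are hypothetical.
#   bar:                      t0     t1     t2     t3
#   above = SMA50 > SMA200:   False  False  True   True
#   above.shift(1):           NaN    False  False  True
#   golden cross (above now, not above on the prior bar): fires only at t2
# A plain "above" test without shift(1) would also be true at t3, reporting a state rather than a transition.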
- id: finance-C-020
when: When validating feature data quality after calculation
action: Fail pipeline validation if more than 10% of RSI values are null
severity: fatal
kind: domain_rule
modality: must
consequence: High null rate in RSI indicates insufficient historical data or data quality issues, leading to unreliable
regime classifications and incorrect trading signals
stage_ids:
- feature_engineering
- id: finance-C-021
when: When validating feature data quality after calculation
action: Fail pipeline validation if duplicate dates exist in technical features table
severity: fatal
kind: domain_rule
modality: must
consequence: Duplicate dates violate PRIMARY KEY constraint and cause incorrect feature associations, leading to wrong
price patterns and trading signals
stage_ids:
- feature_engineering
- id: finance-C-028
when: When storing technical features in DuckDB
action: Use composite primary key (ticker, date) to enforce uniqueness and enable efficient querying
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Duplicate (ticker, date) pairs cause incorrect feature retrieval and violate data integrity constraints in
downstream ML training
stage_ids:
- feature_engineering
- id: finance-C-029
when: When storing derived features in DuckDB
action: Use composite primary key (ticker, date) consistent with technical_features table schema
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Inconsistent primary keys prevent JOIN operations between technical and derived features, breaking regime
classification and pattern detection
stage_ids:
- feature_engineering
- id: finance-C-036
when: When implementing crossover detection patterns
action: Skip shift(1) with reasoning that 'the data looks clean' or 'we need the current bar signal immediately'
severity: fatal
kind: rationalization_guard
modality: must_not
consequence: Rationalizing look-ahead bias introduction causes future information to leak into current signals, producing
unrealistic backtest results that fail in live trading
stage_ids:
- feature_engineering
- id: finance-C-037
when: When encountering high null rates in technical indicators
action: Skip validation with assumption that 'null values are acceptable for less common indicators'
severity: fatal
kind: rationalization_guard
modality: must_not
consequence: Ignoring elevated null rates leads to incomplete feature sets that cause ML model training failures or silent
prediction errors
stage_ids:
- feature_engineering
- id: finance-C-038
when: When implementing composite margin risk scoring
action: use exactly 30/25/25/20 weights for leverage/volatility/options/liquidity components that sum to 100%
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect weight allocation will distort risk assessment, causing underestimation of leverage exposure and
leading to inappropriate trading decisions
stage_ids:
- financial_analysis
- id: finance-C-039
when: When classifying short interest for squeeze risk
action: 'apply threshold breakpoints: >30% SI gets score 100, >20% gets 75, >10% gets 50, >5% gets 25, else 0'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect short interest scoring thresholds will fail to identify high-risk short squeeze candidates, missing
critical leverage risk signals
stage_ids:
- financial_analysis
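# Illustrative reading of the finance-C-039 breakpoints. The sample inputs are hypothetical.
# Every comparison is strict, so a value sitting exactly on a breakpoint falls into the lower band:
#   short_interest 35% -> score 100
#   short_interest 30% -> score 75
#   short_interest 12% -> score 50
#   short_interest  5% -> score 0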
- id: finance-C-040
when: When classifying VIX regime for volatility assessment
action: 'map VIX levels to regimes: <15=Low, <20=Normal, <30=Elevated, >=30=Crisis'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect VIX regime mapping will misclassify market stress levels, causing improper margin risk assessment
during crisis periods
stage_ids:
- financial_analysis
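# Illustrative boundary cases for finance-C-040. The sample VIX values are hypothetical.
#   VIX 14.99 -> Low
#   VIX 15.00 -> Normal
#   VIX 20.00 -> Elevated
#   VIX 30.00 -> Crisis
# Each upper bound is exclusive, so a reading equal to a threshold belongs to the next regime up.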
- id: finance-C-041
when: When parsing SEC Form 4 insider transactions
action: classify transaction codes P (Purchase) and M (Exercise) as bullish; S (Sale) as bearish; A/D/F/G/E as neutral
severity: fatal
kind: domain_rule
modality: must
consequence: Misclassifying insider transaction codes will invert sentiment signals, causing contrarian trading decisions
opposite to actual insider activity
stage_ids:
- financial_analysis
- id: finance-C-043
when: When calculating Altman Z-Score for bankruptcy prediction
action: 'apply Z-Score interpretation: >2.99=Safe Zone, 1.81-2.99=Grey Zone, <1.81=Distress Zone'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect Z-Score threshold boundaries will misclassify bankruptcy risk, leading to investment in financially
distressed companies
stage_ids:
- financial_analysis
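# Illustrative boundary cases for finance-C-043. The sample Z-Scores are hypothetical.
#   Z = 3.10 -> Safe Zone
#   Z = 2.99 -> Grey Zone
#   Z = 1.81 -> Grey Zone
#   Z = 1.50 -> Distress Zone
# Both Grey Zone endpoints are treated as inclusive here, following the rule as written (>2.99 and <1.81 are strict).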
- id: finance-C-044
when: When calculating sector relative strength
action: compute relative strength as sector return minus SPY return (benchmark)
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect benchmark calculation will misrepresent sector outperformance, causing wrong risk-on/off rotation
signals
stage_ids:
- financial_analysis
- id: finance-C-046
when: When classifying sectors for rotation analysis
action: 'use predefined classification: OFFENSIVE=Technology/Consumer Discretionary/Communication/Financials, DEFENSIVE=Utilities/Consumer
Staples/Healthcare, CYCLICAL=Energy/Materials/Industrials/Real Estate'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect sector classification will invert risk-on/off signals, causing portfolio to take opposite positions
during regime changes
stage_ids:
- financial_analysis
- id: finance-C-050
when: When computing composite margin call risk score
action: 'apply formula: composite = leverage_score*0.30 + volatility_score*0.25 + options_score*0.25 + liquidity_score*0.20'
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect composite formula will misrepresent total margin risk, leading to inadequate position sizing during
high-stress periods
stage_ids:
- financial_analysis
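# Worked example of the finance-C-050 formula. The four component scores are hypothetical inputs.
#   leverage_score   = 80 -> 80 * 0.30 = 24.0
#   volatility_score = 60 -> 60 * 0.25 = 15.0
#   options_score    = 40 -> 40 * 0.25 = 10.0
#   liquidity_score  = 20 -> 20 * 0.20 =  4.0
#   composite        = 24.0 + 15.0 + 10.0 + 4.0 = 53.0
# The weights sum to 0.30 + 0.25 + 0.25 + 0.20 = 1.00, so the composite stays on the same scale as its inputs.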
- id: finance-C-058
when: When implementing walk-forward validation for stock price prediction
action: Use sklearn.model_selection.TimeSeriesSplit instead of random cross-validation
severity: fatal
kind: domain_rule
modality: must
consequence: Random cross-validation on time series causes look-ahead bias where future data leaks into training, making
backtest results unrealistically optimistic and live trading performance far worse
stage_ids:
- ml_training_prediction
- id: finance-C-059
when: When computing binary target variable for directional prediction
action: Calculate target as 1 if future_close > close, else 0 using LEAD window function
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect target calculation causes models to learn wrong patterns, producing systematic prediction errors
that cannot be corrected by hyperparameter tuning
stage_ids:
- ml_training_prediction
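# Illustrative labelling for finance-C-059, assuming a prediction horizon of 1 bar and hypothetical closes.
#   close:         10.0   11.0   10.5   10.5
#   future_close:  11.0   10.5   10.5   (none)
#   target:        1      0      0      undefined
# The comparison is strict, so an unchanged price is labelled 0. The last row has no future close.
# Dropping such rows rather than labelling them 0 is an assumption, not something the source states.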
- id: finance-C-067
when: When executing walk-forward validation
action: 'Maintain strict temporal ordering: all training indices precede validation indices in every fold'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Any shuffle or random sampling within TimeSeriesSplit introduces look-ahead bias, inflating validation metrics
and producing misleading live trading expectations
stage_ids:
- ml_training_prediction
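# Illustrative fold layout for finance-C-067, assuming scikit-learn's TimeSeriesSplit with n_splits=3,
# default settings, and 6 time-ordered samples (indices 0..5):
#   fold 1: train [0, 1, 2]        validate [3]
#   fold 2: train [0, 1, 2, 3]     validate [4]
#   fold 3: train [0, 1, 2, 3, 4]  validate [5]
# In every fold the largest training index is smaller than the smallest validation index.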
- id: finance-C-070
when: When presenting ML model predictions to users
action: Claim that backtest returns equal expected live trading returns
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Backtest results exclude transaction costs, slippage, and market impact that materially reduce live returns;
presenting backtest as live-equivalent constitutes misleading financial disclosure
stage_ids:
- ml_training_prediction
- id: finance-C-074
when: When implementing the recession probability calculation
action: Verify indicator weights sum to exactly 1.0 for proper probability normalization
severity: fatal
kind: domain_rule
modality: must
consequence: If weights do not sum to 1.0, the recession probability will be incorrectly scaled, leading to systematic
over/underestimation of recession risk and poor investment allocation decisions
stage_ids:
- recession_indicator
- id: finance-C-079
when: When configuring indicator weights
action: 'Define weights for all 7 signal categories: yield_curve, labor_market, financial_stress, economic_activity,
consumer, housing, market'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing weights for any signal category will cause KeyError during probability calculation, breaking the
entire recession probability model
stage_ids:
- recession_indicator
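# Hypothetical weight table satisfying finance-C-074 and finance-C-079. Only yield_curve: 0.25 comes from
# the source (BD-016); the other six values are placeholders chosen so that the total is exactly 1.00.
# indicator_weights:
#   yield_curve: 0.25
#   labor_market: 0.20
#   financial_stress: 0.15
#   economic_activity: 0.15
#   consumer: 0.10
#   housing: 0.10
#   market: 0.05
# Sum: 0.25 + 0.20 + 0.15 + 0.15 + 0.10 + 0.10 + 0.05 = 1.00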
- id: finance-C-081
when: When displaying recession probability to users
action: Claim predictive accuracy or guarantee future recession timing
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Overstating prediction accuracy will mislead investors into making risky allocation decisions based on false
confidence in recession forecasting, potentially causing significant financial losses
stage_ids:
- recession_indicator
- id: finance-C-085
when: When calculating the historical probability time series
action: Apply proper rolling window calculation using only data available up to each historical date point
severity: fatal
kind: domain_rule
modality: must
consequence: Using future data in historical calculations introduces look-ahead bias, causing historical probabilities
to appear more accurate than they actually were and invalidating backtested performance metrics
stage_ids:
- recession_indicator
- id: finance-C-088
when: When implementing the weighted probability calculation
action: Calculate weighted probability as the sum of (signal * weight) across all 7 signals
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect weighted sum calculation will produce wrong recession probabilities, either double-counting certain
indicators or ignoring others entirely, corrupting the model's core output
stage_ids:
- recession_indicator
- id: finance-C-089
when: When implementing data refresh tasks in Airflow DAGs
action: Raise exception when no data records are fetched during refresh
severity: fatal
kind: domain_rule
modality: must
consequence: Dashboard displays empty or outdated data without any error indication, leading to incorrect economic analysis
and trading decisions based on missing data
stage_ids:
- orchestration_automation
- id: finance-C-090
when: When validating refreshed market data quality
action: Raise exception and flag data as stale when latest date exceeds threshold (ICI >14 days, VIX >7 days)
severity: fatal
kind: domain_rule
modality: must
consequence: Dashboard presents stale economic indicators as current data, potentially causing analysts to make decisions
based on outdated market conditions
stage_ids:
- orchestration_automation
- id: finance-C-099
when: When defining Airflow DAG task dependencies
action: Ensure the init_schema task completes before refresh tasks, and validation completes before notification
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Refresh tasks run before schema initialization, causing table-not-found database errors and failed data
loads into tables that do not yet exist
stage_ids:
- orchestration_automation
- id: finance-C-128
when: When preparing training data for ML models from joined feature tables
action: Use LEAD() window function with prediction_horizon offset for future_close to prevent look-ahead bias
severity: fatal
kind: domain_rule
modality: must
consequence: ML model trained with look-ahead bias produces inflated backtest performance that does not generalize to
live trading
- id: finance-C-129
when: When generating predictions using features with shifted close prices
action: Apply shift(1) to close price before calculating returns to prevent using current candle data for current prediction
severity: fatal
kind: domain_rule
modality: must
consequence: Technical analysis uses intraday data that is not yet finalized, causing prediction errors and potential
legal liability for front-running
- id: finance-C-138
when: When accessing API credentials from Airflow DAGs
action: Log or display actual credential values in Airflow task logs or UI output
severity: fatal
kind: architecture_guardrail
modality: must_not
consequence: API keys exposed in logs violate security best practices and may allow unauthorized access to external data
services
- id: finance-C-145
when: When storing encrypted API credentials
action: Store encryption key files and credentials files with 0o600 permissions (owner read/write only)
severity: fatal
kind: architecture_guardrail
modality: must
consequence: World-readable credential files expose API keys, allowing unauthorized access to FRED, Yahoo Finance, and
other data sources
- id: finance-C-154
when: When a user considers using this system for live trading
action: Claim or imply this system supports real-time trading execution — it is an analytical dashboard only
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Users who believe this dashboard executes trades will experience significant financial losses when they attempt
to live trade
- id: finance-C-159
when: When running ML training on time-series stock data
action: Use TimeSeriesSplit cross-validation — never shuffle time-series data to avoid look-ahead bias
severity: fatal
kind: domain_rule
modality: must
consequence: Random shuffle splits cause future data to leak into training, producing unrealistic backtest performance
that will not generalize to live trading
- id: finance-C-194
when: When implementing credential storage for API keys and third-party tokens
action: Use Fernet symmetric encryption to encrypt credentials at rest; store the encryption key separately with 0o600
file permissions so that credentials remain unreadable if the encrypted file is compromised
severity: fatal
kind: domain_rule
modality: must
consequence: Storing credentials in plaintext or with improperly protected keys exposes API tokens to theft, enabling
unauthorized access to trading platforms and potential financial loss
derived_from_bd_id: BD-004
- id: finance-C-197
when: When configuring cross-validation strategy for financial time series model training
action: Use TimeSeriesSplit for walk-forward validation to prevent look-ahead bias; do not use random split, standard
k-fold, or any shuffle-based splitting method for financial time series
severity: fatal
kind: domain_rule
modality: must
consequence: Random train/test split leaks future information into training data, inflating model performance metrics
by 15-30% and causing live trading returns to fall far below backtested results
derived_from_bd_id: BD-012
regular:
- id: finance-C-002
when: When implementing cached data retrieval
action: include timestamp metadata with cached data to enable expiry verification
severity: high
kind: domain_rule
modality: must
consequence: Without timestamp tracking, cache files are treated as valid regardless of age, causing stale data to be
served as current data
stage_ids:
- data_collection
- id: finance-C-003
when: When fetching Yahoo Finance data in batch operations
action: enforce rate limiting delay between individual ticker requests
severity: high
kind: domain_rule
modality: must
consequence: Unthrottled Yahoo Finance API calls trigger HTTP 429 Too Many Requests errors, causing data collection failures
and blacklisting
stage_ids:
- data_collection
- id: finance-C-004
when: When loading FRED economic series data
action: check offline mode status before attempting any API call
severity: high
kind: architecture_guardrail
modality: must
consequence: API calls made when offline mode is enabled waste resources and produce connection errors that block dashboard
functionality
stage_ids:
- data_collection
- id: finance-C-005
when: When initializing database connections in Streamlit multi-page apps
action: use singleton pattern to ensure a single shared connection instance
severity: high
kind: architecture_guardrail
modality: must
consequence: Multiple DuckDB connections cause resource exhaustion and Connection object already initialized errors in
multi-page Streamlit deployments
stage_ids:
- data_collection
- id: finance-C-006
when: When determining cache refresh schedules
action: 'align cache expiry SLAs with FRED publication frequency: daily=6h, weekly=1d, monthly=7d, quarterly=30d'
severity: medium
kind: domain_rule
modality: must
consequence: Cache expiry mismatched to publication frequency causes either stale data delivery or unnecessary API calls
that waste rate limits
stage_ids:
- data_collection
- id: finance-C-007
when: When processing Yahoo Finance batch requests
action: process tickers in batches of at most 5 to respect API rate limits
severity: high
kind: resource_boundary
modality: must
consequence: Fetching unlimited tickers simultaneously triggers Yahoo Finance rate limiting, resulting in HTTP 429 errors
and data collection failures
stage_ids:
- data_collection
- id: finance-C-008
when: When handling Yahoo Finance 429 rate limit errors
action: fall back to expired cache (up to 168 hours) as last resort instead of failing silently
severity: medium
kind: domain_rule
modality: must
consequence: Silent failure on rate limit returns empty data, breaking dashboard displays with no indication of staleness
stage_ids:
- data_collection
- id: finance-C-009
when: When fetching SEC EDGAR data
action: enforce 0.1 second delay between requests and implement exponential backoff on 429 responses
severity: high
kind: resource_boundary
modality: must
consequence: SEC EDGAR enforces ~10 req/sec limit; violations cause temporary IP bans and failed data collection
stage_ids:
- data_collection
- id: finance-C-010
when: When loading data with DuckDB available
action: query DuckDB first before falling back to pickle cache or API
severity: medium
kind: architecture_guardrail
modality: must
consequence: Skipping DuckDB and querying API directly bypasses cached data, increasing latency and exhausting API rate
limits unnecessarily
stage_ids:
- data_collection
- id: finance-C-011
when: When API keys are not configured
action: load sample offline data from data/sample_FRED_data.csv and data/sample_*_data.csv files
severity: high
kind: resource_boundary
modality: must
consequence: Dashboard fails to load without fallback data, preventing demonstrations and offline development
stage_ids:
- data_collection
- id: finance-C-012
when: When claiming data freshness capabilities
action: label Yahoo Finance data as real-time when it has inherent market delay
severity: high
kind: claim_boundary
modality: must_not
consequence: Misrepresenting delayed market data as real-time creates false expectations for trading decisions based on
stale prices
stage_ids:
- data_collection
- id: finance-C-013
when: When serving sample offline data
action: present static historical sample data as current market conditions
severity: high
kind: claim_boundary
modality: must_not
consequence: Displaying outdated sample data as current market data misleads users about economic conditions and asset
valuations
stage_ids:
- data_collection
- id: finance-C-014
when: When accessing FRED without API key
action: use unauthenticated pandas_datareader access which has lower rate limits
severity: medium
kind: resource_boundary
modality: must
consequence: FRED enforces strict unauthenticated rate limits; without API key, data fetching fails frequently during
batch operations
stage_ids:
- data_collection
- id: finance-C-015
when: When caching Yahoo Finance data
action: use 24-hour minimum cache expiry to reduce API calls and avoid rate limits
severity: medium
kind: domain_rule
modality: must
consequence: Short cache expiry causes repeated API calls that exhaust rate limits, especially for batch ticker fetching
stage_ids:
- data_collection
- id: finance-C-016
when: When deciding cache priority order
action: check centralized cache before individual cache files
severity: low
kind: architecture_guardrail
modality: must
consequence: Individual cache lookup misses centralized data, causing redundant API fetches that waste rate limits and
increase latency
stage_ids:
- data_collection
- id: finance-C-019
when: When calculating rolling Z-scores for feature normalization
action: Handle division by zero in rolling_std by replacing zero values with NaN
severity: high
kind: domain_rule
modality: must
consequence: Division by zero produces NaN or infinite Z-scores, corrupting ML training data and causing model training
failures or invalid predictions
stage_ids:
- feature_engineering
- id: finance-C-022
when: When classifying momentum regime using RSI, MACD, and SMA50
action: 'Use regime thresholds: Bullish=RSI>60 AND MACD histogram>0 AND Price/SMA50>1.0, Bearish=RSI<40 AND MACD histogram<0
AND Price/SMA50<1.0'
severity: high
kind: domain_rule
modality: must
consequence: Incorrect threshold values cause misclassification of market regime, leading to wrong trading strategy selection
and financial losses
stage_ids:
- feature_engineering
- id: finance-C-023
when: When calculating technical indicators that depend on historical data
action: Verify input OHLCV data has at least 200 rows to compute SMA200 without null values
severity: high
kind: resource_boundary
modality: must
consequence: Insufficient historical data causes SMA200 to be null, corrupting price_to_sma200 ratio and golden/death
cross detection logic
stage_ids:
- feature_engineering
- id: finance-C-024
when: When calculating historical volatility from returns
action: Require at least 20 trading days of data for 20-day rolling standard deviation
severity: medium
kind: resource_boundary
modality: must
consequence: Shorter lookback windows produce unreliable volatility estimates, causing incorrect risk assessments and
position sizing errors
stage_ids:
- feature_engineering
- id: finance-C-025
when: When calculating IV Rank for options analysis
action: Require at least 10 historical IV records in database for meaningful IV Rank calculation
severity: medium
kind: resource_boundary
modality: must
consequence: Insufficient IV history produces meaningless IV Rank values near 50%, causing incorrect options strategy
selection
stage_ids:
- feature_engineering
- id: finance-C-026
when: When calculating Z-scores for feature normalization
action: Use default rolling window of 20 days for mean and standard deviation calculation
severity: low
kind: resource_boundary
modality: must
consequence: Non-standard window sizes produce inconsistent Z-score distributions across features, reducing ML model interpretability
stage_ids:
- feature_engineering
- id: finance-C-027
when: When running the feature pipeline for a ticker
action: 'Execute pipeline stages in order: technical indicators → options metrics → derived features → data quality validation'
severity: high
kind: architecture_guardrail
modality: must
consequence: Out-of-order execution causes derived features to lack required technical indicators, producing null outputs
and failing ML training
stage_ids:
- feature_engineering
- id: finance-C-030
when: When calculating feature interactions with options data
action: Check options_data availability before attempting merge to prevent full-NaN interaction features
severity: medium
kind: architecture_guardrail
modality: must
consequence: Missing options data without proper guard causes NaN-filled interaction features, corrupting ML training
inputs
stage_ids:
- feature_engineering
- id: finance-C-031
when: When handling options data fetch failures in pipeline
action: Log warning and continue pipeline rather than failing entire batch when options data is unavailable
severity: medium
kind: operational_lesson
modality: should
consequence: Options data unavailability for one ticker causes entire batch pipeline failure, blocking feature generation
for all other tickers
stage_ids:
- feature_engineering
- id: finance-C-032
when: When validating date range coverage for feature data
action: Allow for trading calendar gaps by accepting at least 50% of calendar days as valid date coverage
severity: medium
kind: operational_lesson
modality: must
consequence: Strict calendar-day matching fails for valid trading data that excludes weekends and market holidays, causing
false validation failures
stage_ids:
- feature_engineering
- id: finance-C-033
when: When calculating technical indicators for backtesting
action: Claim real-time trading capability - this stage only produces historical indicators from past data
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting historical technical indicators as real-time trading signals misleads users about system capabilities,
causing inappropriate trading decisions
stage_ids:
- feature_engineering
- id: finance-C-034
when: When presenting golden/death cross detection results
action: Claim that historical cross detection accurately predicts future market direction
severity: high
kind: claim_boundary
modality: must_not
consequence: Past golden/death cross occurrences do not guarantee future signal accuracy, leading to overconfident trading
strategies and potential financial losses
stage_ids:
- feature_engineering
- id: finance-C-035
when: When calculating regime classifications from technical indicators
action: Claim that momentum_regime values directly translate to profitable trading signals
severity: medium
kind: claim_boundary
modality: must_not
consequence: Regime classifications describe historical market states, not future conditions. Using them as trading signals
without proper risk management causes financial losses
stage_ids:
- feature_engineering
- id: finance-C-042
when: When calculating insider sentiment scores
action: include neutral transaction codes (A/D/F/G/E) in buy/sell value calculations
severity: high
kind: domain_rule
modality: must_not
consequence: Including grants, gifts, and tax withholding in sentiment calculation will contaminate signals with non-investment
transactions
stage_ids:
- financial_analysis
- id: finance-C-045
when: When estimating VVIX when actual data is unavailable
action: use formula vvix = vix * 1.2 + 20 as fallback estimation
severity: high
kind: domain_rule
modality: must
consequence: Using incorrect VVIX estimation formula will produce unreliable volatility stress scores, compromising margin
call risk accuracy
stage_ids:
- financial_analysis
- id: finance-C-047
when: When fetching Yahoo Finance data for financial analysis
action: claim real-time data availability; yfinance data carries an inherent 15+ minute market delay
severity: high
kind: resource_boundary
modality: must_not
consequence: Presenting delayed data as real-time will mislead users about current market conditions, causing execution
at stale prices
stage_ids:
- financial_analysis
- id: finance-C-048
when: When parsing SEC Form 4 XML filings
action: implement fallback parsing logic since SEC EDGAR XML structure varies across filings and time periods
severity: high
kind: resource_boundary
modality: must
consequence: Without fallback parsing, transaction extraction will fail for filings with non-standard XML structures,
causing incomplete insider data
stage_ids:
- financial_analysis
- id: finance-C-049
when: When calculating Piotroski F-Score
action: require at least 2 years of financial data for year-over-year comparison
severity: high
kind: domain_rule
modality: must
consequence: Calculating F-Score with insufficient historical data will produce meaningless comparisons, masking fundamental
deterioration signals
stage_ids:
- financial_analysis
- id: finance-C-051
when: When detecting market rotation patterns
action: classify risk-on when offensive RS > 1 and defensive RS < -1; risk-off when defensive RS > 1 and offensive RS
< -1
severity: high
kind: domain_rule
modality: must
consequence: Incorrect rotation classification thresholds will trigger wrong portfolio rebalancing, missing regime change
opportunities
stage_ids:
- financial_analysis
- id: finance-C-052
when: When storing margin risk scores in database
action: use INSERT OR REPLACE so that the latest scores overwrite stale data for the same ticker/date
severity: medium
kind: architecture_guardrail
modality: must
consequence: Duplicate risk scores will cause confusion during analysis, potentially using outdated margin requirements
stage_ids:
- financial_analysis
- id: finance-C-053
when: When retrieving technical features for risk calculation
action: handle None values gracefully since volume_ratio and bid_ask_spread may not exist for all tickers
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing optional fields without defaults will cause KeyError exceptions, breaking risk calculations for tickers
without complete data
stage_ids:
- financial_analysis
- id: finance-C-054
when: When presenting backtest results for insider trading signals
action: claim backtest returns represent expected live trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting backtest results as predictive will mislead investors, as historical performance does not account
for slippage, liquidity constraints, or market impact
stage_ids:
- financial_analysis
- id: finance-C-055
when: When displaying financial health scores derived from SEC XBRL
action: present scores as real-time financial condition; SEC filings have an inherent reporting lag (quarterly/annual)
severity: medium
kind: claim_boundary
modality: must_not
consequence: Presenting stale financial data as current will mislead investors about company health, especially during
earnings blackout periods
stage_ids:
- financial_analysis
- id: finance-C-056
when: When parsing Form 4 XML data
action: skip validation on the assumption that XML is well-formed; SEC EDGAR contains malformed filings and encoding variations
severity: high
kind: rationalization_guard
modality: must_not
consequence: Parsing failures will silently drop transactions, creating survivorship bias where only clean filings contribute
to sentiment
stage_ids:
- financial_analysis
- id: finance-C-057
when: When fetching VIX data from multiple sources
action: skip error handling on the assumption that the first data source succeeds; network failures and API changes are common
severity: high
kind: rationalization_guard
modality: must_not
consequence: Single-source dependency will cause complete VIX unavailability during source outages, breaking margin risk
calculations
stage_ids:
- financial_analysis
- id: finance-C-060
when: When training ensemble model with LogisticRegression meta-learner
action: Apply StandardScaler to features before training the LogisticRegression meta-learner
severity: high
kind: domain_rule
modality: must
consequence: Unscaled meta-features cause LogisticRegression to converge poorly or produce unstable coefficients, degrading
ensemble prediction quality and confidence calibration
stage_ids:
- ml_training_prediction
- id: finance-C-061
when: When storing ML predictions in DuckDB database
action: Include confidence_score as the maximum probability from predict_proba
severity: high
kind: domain_rule
modality: must
consequence: Missing confidence scores prevent downstream risk management from assessing prediction reliability, leading
to uniform position sizing that ignores model uncertainty
stage_ids:
- ml_training_prediction
- id: finance-C-062
when: When evaluating trained models
action: Track accuracy, precision, recall, and AUC-ROC in model_performance table
severity: high
kind: domain_rule
modality: must
consequence: Without comprehensive metric tracking, degraded model quality goes undetected, leading to poor trading decisions
based on deteriorating predictions
stage_ids:
- ml_training_prediction
- id: finance-C-063
when: When configuring prediction horizon for stock directional prediction
action: Set prediction_horizon to align with rebalancing frequency (default 5 trading days)
severity: medium
kind: resource_boundary
modality: must
consequence: Mismatched prediction horizon causes signals to arrive at wrong times relative to rebalancing, forcing either
early exits or holding positions past intended horizons
stage_ids:
- ml_training_prediction
- id: finance-C-064
when: When selecting base models for ensemble
action: Use XGBoost and LightGBM as base models (both scale-invariant tree methods)
severity: medium
kind: resource_boundary
modality: must
consequence: Including scale-variant models in ensemble requires careful feature scaling management, increasing implementation
complexity and potential for scaling bugs
stage_ids:
- ml_training_prediction
- id: finance-C-065
when: When preparing training data for time series models
action: Verify minimum sample count for TimeSeriesSplit (n_splits * min_samples_per_split)
severity: high
kind: domain_rule
modality: must
consequence: Insufficient samples cause TimeSeriesSplit to produce empty folds, resulting in training failures or metrics
computed on unreliably small validation sets
stage_ids:
- ml_training_prediction
- id: finance-C-066
when: When training gradient boosting models with validation data
action: Provide eval_set parameter to enable early stopping
severity: medium
kind: operational_lesson
modality: must
consequence: Without early stopping, models train for fixed n_estimators regardless of convergence, causing overfitting
to training data and poor generalization
stage_ids:
- ml_training_prediction
- id: finance-C-068
when: When combining base model predictions in ensemble
action: Generate meta-features from base model predict_proba outputs, not raw predictions
severity: high
kind: architecture_guardrail
modality: must
consequence: Using discrete predictions (0/1) as meta-features loses probability calibration information, reducing meta-learner
effectiveness and ensemble accuracy
stage_ids:
- ml_training_prediction
- id: finance-C-069
when: When storing predictions in DuckDB
action: Define PRIMARY KEY on (ticker, date) to prevent duplicate predictions
severity: high
kind: architecture_guardrail
modality: must
consequence: Without primary key constraint, duplicate predictions for same ticker/date cause fan-trap queries and incorrect
performance metrics in downstream analysis
stage_ids:
- ml_training_prediction
- id: finance-C-071
when: When using ML predictions for investment decisions
action: Use model as sole basis for investment decisions
severity: high
kind: claim_boundary
modality: must_not
consequence: Relying exclusively on binary directional predictions without fundamental analysis, risk management, or portfolio-level
position sizing exposes portfolios to concentrated losses
stage_ids:
- ml_training_prediction
- id: finance-C-072
when: When implementing ML training pipeline
action: Skip TimeSeriesSplit even when data appears stationary or simple
severity: high
kind: rationalization_guard
modality: must_not
consequence: Stationary-appearing data may contain regime changes or structural breaks; skipping temporal validation produces
overfitted models that fail catastrophically during market regime shifts
stage_ids:
- ml_training_prediction
- id: finance-C-073
when: When data appears clean and validation metrics look good
action: Skip model retraining without first checking for concept drift
severity: high
kind: rationalization_guard
modality: must_not
consequence: Market relationships evolve; models trained on historical data degrade over time. Stale models produce systematically
biased predictions that accumulate losses
stage_ids:
- ml_training_prediction
- id: finance-C-075
when: When implementing individual signal calculations
action: Normalize each signal to 0-1 range using min(signal, 1.0)
severity: high
kind: domain_rule
modality: must
consequence: Without proper signal normalization, individual signals exceeding 1.0 will distort the weighted probability
calculation, potentially showing recession probabilities exceeding 100% or returning invalid NaN values
stage_ids:
- recession_indicator
- id: finance-C-076
when: When calculating the recession probability output
action: Verify final probability is constrained between 0 and 1
severity: high
kind: domain_rule
modality: must
consequence: Probability values outside 0-1 range will cause incorrect risk level assignment (LOW/MODERATE/ELEVATED/HIGH),
potentially misleading investment decisions and causing financial losses
stage_ids:
- recession_indicator
- id: finance-C-077
when: When loading data for recession probability calculation
action: Load FRED indicator data via load_indicators_from_data before calling calculate_recession_probability
severity: high
kind: architecture_guardrail
modality: must
consequence: Calling calculate_recession_probability without loading data first will raise ValueError and crash the application,
preventing users from viewing recession risk assessment
stage_ids:
- recession_indicator
- id: finance-C-078
when: When calculating historical recession probabilities
action: Require at least 12 periods of historical data as minimum lookback window
severity: high
kind: domain_rule
modality: must
consequence: Insufficient historical data will cause index errors or use incomplete unemployment/GDP statistics, producing
unreliable recession probabilities that do not reflect actual economic conditions
stage_ids:
- recession_indicator
- id: finance-C-080
when: When implementing yield curve inversion detection
action: Use 18-month (365*1.5 days) lookback window for detecting recent inversions
severity: high
kind: domain_rule
modality: must
consequence: Incorrect lookback period will fail to capture inversions that historically predict recessions within 12-18
months, reducing predictive accuracy of the most important indicator
stage_ids:
- recession_indicator
- id: finance-C-082
when: When using the recession probability model for investment decisions
action: Use the model as the sole basis for investment allocation decisions
severity: high
kind: claim_boundary
modality: must_not
consequence: Relying exclusively on this single-indicator model will ignore other critical factors such as company fundamentals,
geopolitical risks, and market sentiment, leading to suboptimal or loss-making portfolio decisions
stage_ids:
- recession_indicator
- id: finance-C-083
when: When fetching FRED economic data for the model
action: 'Obtain all 7 required FRED series: yield spreads, unemployment, claims, industrial production, GDP growth, consumer
sentiment, corporate spreads, Fed funds rate'
severity: high
kind: resource_boundary
modality: must
consequence: Missing required FRED series will cause signal calculations to return 0.0 for affected indicators, systematically
underestimating recession probability and providing false reassurance
stage_ids:
- recession_indicator
- id: finance-C-084
when: When assigning recession risk levels
action: 'Use correct probability thresholds: LOW (<20%), MODERATE (20-40%), ELEVATED (40-70%), HIGH (>=70%)'
severity: high
kind: domain_rule
modality: must
consequence: Incorrect risk level thresholds will misclassify recession risk, potentially causing investors to either
panic unnecessarily or remain inadequately hedged during genuine economic downturns
stage_ids:
- recession_indicator
- id: finance-C-086
when: When generating indicator explanations for users
action: Provide an explanation for each of the 7 indicator categories describing its contribution to the recession probability
severity: medium
kind: architecture_guardrail
modality: must
consequence: Missing indicator explanations will leave users without context for why the model assigns specific recession
probabilities, reducing transparency and trust in the model's output
stage_ids:
- recession_indicator
- id: finance-C-087
when: When loading FRED data for recession indicators
action: Use cached data older than the cache expiry threshold without validation
severity: medium
kind: resource_boundary
modality: must_not
consequence: Using stale FRED data will produce outdated recession probabilities that do not reflect current economic
conditions, potentially leading to incorrect investment decisions based on obsolete information
stage_ids:
- recession_indicator
- id: finance-C-091
when: When implementing economic data quality validation
action: Raise exception when FRED data has fewer than 100 rows or 30 columns, or when Yahoo Finance has fewer than 3 tickers
severity: high
kind: domain_rule
modality: must
consequence: Dashboard renders with incomplete economic indicators, missing key series needed for recession analysis and
financial forecasting
stage_ids:
- orchestration_automation
- id: finance-C-092
when: When configuring FRED API data refresh
action: Insert 1-second sleep every 20 FRED requests to stay under 120 calls/minute rate limit
severity: high
kind: resource_boundary
modality: must
consequence: Automated refresh triggers FRED rate limiting, causing API failures and incomplete data fetching for all
subsequent dashboard users
stage_ids:
- orchestration_automation
- id: finance-C-093
when: When configuring Yahoo Finance data refresh
action: Implement exponential backoff (10s, 15s) for failed Yahoo Finance requests, up to 3 retries
severity: high
kind: resource_boundary
modality: must
consequence: Persistent Yahoo Finance rate limit errors cause complete market data refresh failure, leaving dashboard
without current market indicators
stage_ids:
- orchestration_automation
- id: finance-C-094
when: When configuring Airflow DAG task execution
action: Set execution_timeout to 30 minutes maximum per task to prevent indefinite hanging
severity: high
kind: resource_boundary
modality: must
consequence: Stuck DAG tasks block subsequent runs, causing cascading failures and missing data updates for multiple days
stage_ids:
- orchestration_automation
- id: finance-C-095
when: When configuring Airflow DAG retry policy
action: Set retries=3 with retry_delay=5 minutes and email_on_retry=False to handle transient failures gracefully
severity: high
kind: resource_boundary
modality: must
consequence: Without proper retry configuration, transient API failures immediately cause DAG failures, generating excessive
alert emails and wasting investigation time
stage_ids:
- orchestration_automation
- id: finance-C-096
when: When scheduling ICI ETF data refresh DAG
action: Schedule ICI weekly ETF flows refresh for Wednesday (day 3) to capture the weekly publication
severity: high
kind: operational_lesson
modality: must
consequence: ICI data refresh scheduled on wrong day misses the weekly publication window, dashboard displays stale ETF
flow data for entire week
stage_ids:
- orchestration_automation
- id: finance-C-097
when: When scheduling market data refresh DAG
action: Schedule VIX and market data refresh for weekdays only (1-5) to align with trading calendar
severity: high
kind: architecture_guardrail
modality: must
consequence: Weekend refresh attempts fetch stale or no market data, wasting compute resources and creating confusing
empty data states in dashboard
stage_ids:
- orchestration_automation
- id: finance-C-098
when: When configuring AIRFLOW_ALERT_EMAIL for DAG failure notifications
action: Set the AIRFLOW_ALERT_EMAIL environment variable to enable failure email alerts when DAG tasks exhaust all retries
severity: high
kind: operational_lesson
modality: must
consequence: Silent DAG failures go unnoticed for extended periods, dashboard serves stale data without operators realizing
refresh has stopped
stage_ids:
- orchestration_automation
- id: finance-C-100
when: When implementing parallel ETL tasks in DAG
action: Execute ICI and VIX fetches concurrently using Airflow task list syntax to reduce total refresh time
severity: medium
kind: architecture_guardrail
modality: must
consequence: Sequential execution doubles refresh duration, risking timeout and delaying dashboard availability for morning
market analysis
stage_ids:
- orchestration_automation
- id: finance-C-101
when: When configuring cache backup retention policy
action: Delete CSV backups older than 30 days to prevent disk space exhaustion
severity: high
kind: resource_boundary
modality: must
consequence: Unbounded backup growth fills disk, causing DAG failures and potential data loss when system cannot write
new backups or cache files
stage_ids:
- orchestration_automation
- id: finance-C-102
when: When implementing data refresh automation
action: Claim guaranteed real-time data availability when refresh relies on polling-based external APIs
severity: medium
kind: claim_boundary
modality: must_not
consequence: Marketing claims of real-time data mislead users; actual data has inherent delays from FRED/Yahoo Finance
polling, causing incorrect assumptions about data freshness
stage_ids:
- orchestration_automation
- id: finance-C-103
when: When presenting automated refresh results
action: Present automated refresh metrics as evidence of system reliability without acknowledging external API dependency
severity: medium
kind: claim_boundary
modality: must_not
consequence: Dashboard claims strong reliability metrics while overlooking that failures stem from external API unavailability,
misrepresenting operational success
stage_ids:
- orchestration_automation
- id: finance-C-104
when: When setting cache expiry thresholds
action: Use 24-hour cache expiry to balance API load reduction with data freshness requirements
severity: medium
kind: resource_boundary
modality: must
consequence: Overly aggressive caching serves stale data; overly aggressive refresh exhausts API rate limits, both degrading
dashboard utility
stage_ids:
- orchestration_automation
- id: finance-C-105
when: When configuring email alerts for DAG notifications
action: Send alert emails on retry attempts (send only on final failure, after all retries are exhausted)
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Excessive alert emails during retry phases cause alert fatigue, leading operators to ignore or disable critical
failure notifications
stage_ids:
- orchestration_automation
- id: finance-C-125
when: When fetching Yahoo Finance OHLCV data for technical indicators
action: Cache data with 24-hour expiry to respect rate limits (YFINANCE_CACHE_HOURS=24, YFINANCE_RATE_LIMIT_DELAY=0.5)
severity: high
kind: resource_boundary
modality: must
consequence: API rate limiting causes data fetch failures, resulting in missing technical indicators and NaN values propagating
to downstream ML models
- id: finance-C-126
when: When loading OHLCV data from DuckDB for technical indicator calculation
action: Verify date column is parsed as DatetimeIndex and OHLCV columns (open, high, low, close, volume) exist with numeric
dtypes
severity: high
kind: domain_rule
modality: must
consequence: Technical indicators produce NaN or incorrect values because column names or types don't match the ta library
expectations
- id: finance-C-127
when: When storing technical features in DuckDB from feature engineering
action: Use PRIMARY KEY (ticker, date) to prevent duplicate records and verify schema matches technical_features table
definition
severity: high
kind: architecture_guardrail
modality: must
consequence: Duplicate feature records cause JOIN failures in ML training, producing inconsistent model inputs and invalid
predictions
- id: finance-C-130
when: When loading FRED economic series for recession indicator calculation
action: 'Verify series columns match the expected names: yield_spread_10y2y, unemployment_rate, consumer_sentiment, corporate_spread'
severity: high
kind: architecture_guardrail
modality: must
consequence: Recession probability model receives wrong indicator values or NaN, producing incorrect recession probability
scores
- id: finance-C-131
when: When passing margin risk scores to visualization dashboard
action: Pass composite_risk_score as float in range 0-100 with risk_level classification string (Critical/High/Moderate/Low/Minimal)
severity: medium
kind: architecture_guardrail
modality: must
consequence: Dashboard fails to render risk gauges correctly, showing NaN or incorrect colors for risk indicators
- id: finance-C-132
when: When storing trained model files as .pkl for persistence
action: Include ticker, model_type, and timestamp in filename for version tracking and retrieval (e.g., {ticker}_{model_type}_{timestamp}.pkl)
severity: high
kind: domain_rule
modality: must
consequence: Prediction engine cannot locate the correct model version, causing prediction failures or using outdated
models
- id: finance-C-133
when: When displaying ML predictions in the dashboard
action: Display confidence_score as probability percentage alongside the binary prediction (UP/DOWN) to avoid misrepresenting
prediction certainty
severity: medium
kind: claim_boundary
modality: must
consequence: Users misinterpret high confidence scores as guaranteed outcomes, leading to overconfident trading decisions
and potential financial losses
- id: finance-C-134
when: When presenting recession probability scores to users
action: Display recession probability with explicit confidence intervals and historical accuracy statistics to prevent
over-interpretation
severity: high
kind: claim_boundary
modality: must
consequence: Users treat recession probability as precise prediction rather than probabilistic estimate, causing inappropriate
risk hedging or asset allocation decisions
- id: finance-C-135
when: When refreshing market data via Airflow DAGs
action: 'Validate data freshness: ICI ETF weekly data must be within 14 days, VIX data within 7 days, otherwise raise
exception'
severity: high
kind: operational_lesson
modality: must
consequence: Stale market data propagates to all downstream analyses, causing incorrect sector rotation signals and margin
risk scores
- id: finance-C-136
when: When fetching options data from Yahoo Finance for IV metrics
action: Handle division by zero for put_call_ratio when call_volume is 0, returning None instead of inf or NaN
severity: medium
kind: resource_boundary
modality: must
consequence: IV metrics with inf/NaN values cause feature engineering pipeline to fail or produce invalid margin risk
scores
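# Illustrative sketch for finance-C-136, written as comments so the YAML stays valid.
# Assumption: put_call_ratio below is a hypothetical helper, not the system's actual function.
#   def put_call_ratio(put_volume, call_volume):
#       # Return None instead of inf/NaN when there is no call volume.
#       if call_volume == 0:
#           return None
#       return put_volume / call_volume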
- id: finance-C-139
when: When pivoting FRED data from long to wide format
action: Convert date column to DatetimeIndex and rename columns from series_id back to descriptive names after pivot operation
severity: high
kind: domain_rule
modality: must
consequence: Downstream modules expecting descriptive column names receive FRED series IDs, causing feature name mismatches
and KeyErrors
- id: finance-C-140
when: When establishing database connections in the application
action: Use the singleton DatabaseConnection via get_db_connection() — only one connection instance per process is allowed
severity: high
kind: architecture_guardrail
modality: must
consequence: Multiple DuckDB connection instances can cause file locking conflicts and data inconsistency in concurrent
access scenarios
- id: finance-C-141
when: When creating time-series tables in the DuckDB database
action: Use composite primary key (entity_id, date) for all time-series tables to ensure uniqueness and proper indexing
severity: high
kind: architecture_guardrail
modality: must
consequence: Duplicate primary keys or missing date-based partitioning causes data insertion failures and incorrect time-series
queries
- id: finance-C-142
when: When training ML models on the feature data
action: Preserve feature column names from the database exactly as-is through the ML training pipeline
severity: medium
kind: architecture_guardrail
modality: must
consequence: Feature name changes break model prediction consistency, causing incorrect feature mapping between training
and inference
- id: finance-C-143
when: When defining the ML binary classification target
action: Create binary target as 1 if future_close > close, else 0 (5-day prediction horizon default)
severity: high
kind: domain_rule
modality: must
consequence: Incorrect target definition invalidates all ML model training results and prediction accuracy metrics
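# Illustrative sketch for finance-C-143, written as comments so the YAML stays valid.
# Assumptions: df is a pandas DataFrame sorted by date with a close column; HORIZON is a hypothetical name.
#   HORIZON = 5
#   future_close = df["close"].shift(-HORIZON)
#   df["target"] = (future_close > df["close"]).astype(int)
#   # The last HORIZON rows have no future close (the NaN comparison yields 0), so drop them before training.
#   df = df.iloc[:-HORIZON]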
- id: finance-C-144
when: When configuring cache expiry times
action: Set cache_expiry_hours to 1 hour in development and 24 hours in production environments
severity: medium
kind: operational_lesson
modality: must
consequence: Production cache too short causes excessive API calls and slow dashboard loads; dev cache too long delays
visibility of data changes
- id: finance-C-146
when: When implementing regime crossover detection signals
action: Apply shift(1) to compare bar t with bar t-1 for golden cross and death cross detection to avoid look-ahead bias
severity: high
kind: domain_rule
modality: must
consequence: Without shift(1), the signal uses today's moving average values to generate today's signal, causing look-ahead
bias in backtests
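# Illustrative sketch for finance-C-146, written as comments so the YAML stays valid.
# Assumptions: fast and slow are pandas Series holding the two moving averages on the same index.
#   above = fast > slow
#   prev_above = above.shift(1, fill_value=False)   # state on bar t-1
#   golden_cross = above & ~prev_above              # crossed up between bar t-1 and bar t
#   death_cross = ~above & prev_above               # crossed down between bar t-1 and bar t
# Warm-up bars where either average is NaN should be masked out before the signals are used.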
- id: finance-C-147
when: When classifying VIX regime levels
action: 'Apply thresholds: <15 Low, <20 Normal, <30 Elevated, >=30 Crisis'
severity: medium
kind: domain_rule
modality: must
consequence: Incorrect threshold boundaries cause wrong risk regime classification, leading to inappropriate margin risk
calculations and trading recommendations
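# Illustrative sketch for finance-C-147, written as comments so the YAML stays valid.
# Assumption: vix_regime is a hypothetical helper name; the thresholds are the ones stated in the rule.
#   def vix_regime(vix):
#       if vix < 15:
#           return "Low"
#       if vix < 20:
#           return "Normal"
#       if vix < 30:
#           return "Elevated"
#       return "Crisis"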
- id: finance-C-148
when: When implementing SLA-based cache refresh policies
action: 'Enforce cache expiry based on data frequency: Daily=6h, Weekly=1d, Monthly=7d, Quarterly=30d'
severity: medium
kind: operational_lesson
modality: must
consequence: Stale data displayed to users when cache not refreshed; excessive API calls when over-refreshing data outside
SLA windows
- id: finance-C-149
when: When fetching data from external sources (FRED, Yahoo Finance)
action: Use UTC timestamps consistently across all scheduled jobs and data refresh workflows
severity: high
kind: domain_rule
modality: must
consequence: Mixed timezone handling causes data timestamp misalignment, leading to incorrect time-series joins and stale
data served as fresh
- id: finance-C-150
when: When presenting or reporting this system's ML prediction results to users
action: Claim that ML prediction accuracy or backtested returns equal expected live trading returns
severity: high
kind: claim_boundary
modality: must_not
consequence: Users make live capital allocation decisions based on inflated backtest returns, leading to severe underperformance
in live trading and potential financial loss
- id: finance-C-151
when: When displaying recession probability model results
action: Display the disclaimer that past indicator performance does not guarantee future predictive accuracy
severity: high
kind: claim_boundary
modality: must
consequence: Without proper disclaimer, users may rely on recession forecasts as definitive predictions, leading to poor
investment timing decisions
- id: finance-C-152
when: When presenting technical analysis results
action: Include the disclaimer that analysis is for educational purposes only and should not be considered financial advice
severity: high
kind: claim_boundary
modality: must
consequence: Without proper disclaimer, users may treat educational technical analysis as actionable trading signals,
leading to financial losses
- id: finance-C-153
when: When presenting news sentiment analysis results
action: Include the disclaimer that analysis is for informational purposes only and should not be considered financial
advice
severity: high
kind: claim_boundary
modality: must
consequence: Without proper disclaimer, users may act on sentiment signals as reliable trading indicators, leading to
poor investment decisions
- id: finance-C-155
when: When a user without Python environment setup considers using this system
action: Claim or imply the system works out-of-the-box without Python 3.10+ and pip dependencies
severity: high
kind: claim_boundary
modality: must_not
consequence: Users without Python environment setup will be unable to run the dashboard or scripts, leading to frustration
and wasted setup time
- id: finance-C-156
when: When presenting news sentiment analysis capabilities
action: Claim NLP sentiment analysis provides accurate or reliable sentiment predictions
severity: high
kind: claim_boundary
modality: must_not
consequence: NLP sentiment analysis has inherent limitations in understanding context, sarcasm, and financial jargon,
leading to misleading sentiment signals
- id: finance-C-157
when: When accessing FRED and Yahoo Finance data
action: Accept that external API data has inherent delays; FRED daily data is published around 4 PM ET, and Yahoo Finance has
a delay of about 15 minutes
severity: medium
kind: resource_boundary
modality: must
consequence: Users expecting real-time economic indicators will see stale data, potentially causing decisions based on
outdated information
- id: finance-C-158
when: When operating without API keys
action: Accept that the system operates in sample data mode with demonstration-quality results only
severity: high
kind: resource_boundary
modality: must
consequence: Users presenting sample/demo data as representative live data will draw incorrect conclusions about market
conditions
- id: finance-C-160
when: When implementing momentum indicators in technical analysis module
action: Use EMA (Exponential Moving Average) with 12-period fast and 26-period slow windows, aligned with standard MACD
parameters; do not replace with DEMA or SMA without re-evaluating signal interpretation
severity: high
kind: domain_rule
modality: must
consequence: Switching to SMA would slow signal response, causing delayed entry/exit points; using DEMA introduces complexity
without standardized parameters, breaking consistency with the trading system's momentum signal generation
derived_from_bd_id: BD-027
- id: finance-C-161
when: When implementing RSI-based momentum oscillator
action: Use RSI(14) with 30/70 oversold/overbought thresholds validated against the 5-day prediction horizon; verify that
the RSI period matches the strategy timeframe by testing signal frequency against noise
severity: medium
kind: domain_rule
modality: should
consequence: Using RSI(7) produces excessive signals with higher noise, while RSI(21) may miss short-term momentum reversals;
mismatched RSI period causes either over-trading or delayed signals relative to the 5-day prediction window
derived_from_bd_id: BD-028
- id: finance-C-162
when: When implementing volatility measurement for position sizing
action: Use ATR(14) as the volatility metric for risk parity calculations; ATR period must be 14 to capture meaningful
average true range across different price levels and market conditions
severity: high
kind: domain_rule
modality: must
consequence: Using shorter ATR periods increases sensitivity to noise, while longer periods lag market volatility; mismatched
ATR parameters cause incorrect position sizing, leading to either excessive risk or underallocation
derived_from_bd_id: BD-031
- id: finance-C-163
when: When implementing sector rotation detector for relative strength calculations
action: Calculate relative strength as excess return versus SPY benchmark over rolling 20/60/120-day periods; use the sign of the
excess return (positive or negative) to determine sector rotation direction
severity: high
kind: domain_rule
modality: must
consequence: Using sector SPDR ETFs as benchmark instead of SPY narrows market context, causing sector rotation signals
to miss broad market regime changes and potentially allocating to sectors underperforming the broader market
derived_from_bd_id: BD-042
- id: finance-C-164
when: When configuring data refresh intervals in data_series_config
action: 'Align refresh schedules with FRED publication cycles: Daily series = 6h, Weekly series = 1d, Monthly series =
7d (to capture NFP/CPI), Quarterly series = 30d (to capture GDP revisions)'
severity: medium
kind: domain_rule
modality: should
consequence: Using fixed 6-hour refresh wastes API quota on rarely-changing monthly data; conversely, refreshing monthly
data only weekly may miss timely economic releases, causing stale indicators in the ML model
derived_from_bd_id: BD-003
- id: finance-C-165
when: When implementing recession probability model
action: Use 7-indicator weighted scoring approach including yield curve, labor (unemployment claims), financial (credit
spreads), activity (PMI), consumer (consumer confidence), housing (starts/permits), and market (equity drawdown); do
not rely on single-indicator models
severity: high
kind: domain_rule
modality: must
consequence: Single-indicator recession models (e.g., yield curve only) have high false positive rates during normal volatility
cycles, causing premature portfolio de-risking and missed opportunities during extended bull markets
derived_from_bd_id: BD-018
- id: finance-C-166
when: When preparing features for meta-learner training and prediction
action: Apply StandardScaler to features before feeding into LogisticRegression meta-learner; XGBoost/LightGBM inputs
should NOT be scaled since tree-based models are scale-invariant
severity: high
kind: architecture_guardrail
modality: must
consequence: Without scaling, probability outputs from different base models receive unequal treatment due to feature
scale differences, causing the meta-learner to overweight models with larger numerical ranges and underweight those
with smaller ranges
derived_from_bd_id: BD-015
- id: finance-C-167
when: When implementing or refactoring Stochastic Oscillator calculations in technical analysis
action: Use %K period of 14 and %D smoothing period of 3 for the fast stochastic configuration; maintain these exact
parameters to ensure consistent momentum confirmation signals
severity: high
kind: domain_rule
modality: must
consequence: Changing %K or %D parameters alters signal timing and reduces momentum confirmation reliability; strategies
calibrated with fast stochastic (%K=14, %D=3) may produce false signals with different parameter values
derived_from_bd_id: BD-032
- id: finance-C-168
when: When implementing MACD calculations in technical analysis modules
action: 'Use standard MACD parameters: 12-period fast EMA, 26-period slow EMA, and 9-period signal line; in out-of-sample
testing, alternative values performed worse during the 2020 volatility'
severity: high
kind: domain_rule
modality: must
consequence: Modified MACD parameters (e.g., 6,13,4) demonstrated inferior performance during market volatility events;
using non-standard parameters may cause backtest results that cannot be replicated in live trading
derived_from_bd_id: BD-029
- id: finance-C-169
when: When implementing sector correlation matrix calculations for portfolio optimization
action: Use Pearson correlation to measure linear relationship strength between sector returns — this quantifies portfolio
diversification and informs sector rotation timing aligned with Markowitz optimization framework
severity: high
kind: architecture_guardrail
modality: must
consequence: Using non-Pearson correlation methods (e.g., Spearman rank or DCC-GARCH) changes risk quantification; Markowitz
optimization requires Pearson correlation inputs, and alternative methods produce inconsistent diversification metrics
derived_from_bd_id: BD-059
- id: finance-C-170
when: When implementing regime classification logic using technical indicators
action: 'Apply exact thresholds: Bullish requires RSI>60 AND MACD>0 AND Price>SMA50; Bearish requires RSI<40 OR (MACD<0
AND Price<SMA50) — these three-factor agreement thresholds filter noise while catching genuine regime shifts'
severity: high
kind: domain_rule
modality: must
consequence: Using different threshold values (e.g., RSI>50 alone) produces significantly different regime signals during
consolidation periods; single-factor triggers generate excessive false signals causing wrong strategy execution
derived_from_bd_id: BD-006
- id: finance-C-171
when: When implementing recession prediction model
action: Assign yield curve indicator a weight of 0.25 in the recession model — this weight reflects the 12-18 month lead
time empirically validated in academic literature (Estrella & Mishkin 1998, Rudebusch & Williams 2009)
severity: high
kind: domain_rule
modality: must
consequence: Equal weights or reduced yield curve weight deviates from empirically validated parameters; academic literature
consistently shows yield curve inversion precedes recession by 3-18 months, and reducing its weight diminishes predictive
accuracy
derived_from_bd_id: BD-016
- id: finance-C-172
when: When implementing or refactoring options metrics calculations in modules.features.options_metrics
action: Calculate IV Percentile as the percentage of historical trading days with IV lower than the current value — this
provides time-based regime classification distinct from IV Rank
severity: high
kind: domain_rule
modality: must
consequence: Replacing IV Percentile with simpler metrics loses time-based regime classification; strategies relying on
IV Percentile for options pricing decisions will use wrong volatility regime, causing systematic mispricing
derived_from_bd_id: BD-040
- id: finance-C-173
when: When implementing or refactoring recession detection logic in modules.ml.recession_model
action: Apply the Sahm Rule with the 0.5 percentage point threshold above the 12-month unemployment minimum; do not modify
this threshold to other values without comprehensive backtesting
severity: high
kind: domain_rule
modality: must
consequence: Changing the 0.5 percentage point recession trigger threshold alters recession signal timing; an incorrect threshold causes
either missed recession warnings or premature defensive positioning, both causing significant portfolio losses
derived_from_bd_id: BD-043
- id: finance-C-174
when: When configuring Bollinger Bands parameters in financial_analysis technical analysis module
action: Verify that Bollinger Bands standard deviation parameter is set to 2.0 — this captures approximately 95% of price
action under normal distribution assumptions
severity: medium
kind: domain_rule
modality: should
consequence: Using non-2.0 standard deviation parameters alters breakout signal sensitivity; narrower bands increase false
signals while wider bands miss genuine volatility expansions, causing poor mean-reversion entry timing
derived_from_bd_id: BD-030
- id: finance-C-175
when: When implementing or modifying sector rotation signal generation in modules.features.sector_rotation
action: Use dual momentum with 10-day and 50-day moving averages where 10-day > 50-day confirms upward momentum; these
specific periods combine short-term timing with medium-term noise filtering
severity: high
kind: domain_rule
modality: must
consequence: Changing momentum periods alters sector rotation signal timing and turnover; different period combinations
were backtested and showed inferior risk-adjusted returns, causing suboptimal sector allocation decisions
derived_from_bd_id: BD-060
- id: finance-C-176
when: When executing the feature engineering pipeline in modules.features.feature_pipeline
action: Fail the pipeline when RSI null values exceed 10% or duplicate dates are detected — do not convert this to a warning;
hard failure ensures visibility into data quality issues
severity: high
kind: domain_rule
modality: must
consequence: Converting this to a warning allows ML models to train silently on corrupted data, producing unreliable predictions
that lead to poor trading decisions and financial losses
derived_from_bd_id: BD-007
- id: finance-C-177
when: When applying RecessionProbabilityModel to non-US markets or post-2020 periods with structural economic breaks
action: Recalibrate indicator weights (yield_curve 25%, labor 20%, financial 15%) based on current market empirical data
— these weights are calibrated for US markets pre-2020 and may not reflect post-pandemic indicator relationships
severity: medium
kind: domain_rule
modality: should
consequence: Using pre-2020 calibrated weights on post-pandemic or emerging market data produces unreliable recession
probabilities; indicator relationships changed significantly during COVID, leading to systematic misprediction
derived_from_bd_id: BD-064
- id: finance-C-178
when: When configuring yield curve inversion lookback period in recession_indicator stage
action: Use a 12-18 month lookback period (upper bound 365*1.5, about 547 days) to capture historical inversions that predict recessions
within 12-18 months; a past inversion still adds signal even if the spread has since normalized
severity: high
kind: domain_rule
modality: must
consequence: Using shorter lookback (e.g., 6 months) misses recession signals from inversions that normalized but still
predict near-term recession, causing delayed or missed defensive portfolio positioning
derived_from_bd_id: BD-017
- id: finance-C-180
when: When implementing or modifying the dashboard's data freshness indicators
action: Implement a unified data freshness signal that reconciles connection status, last update timestamp, and data source
badge into a single coherent indicator; do not show green connection status while displaying stale sample data
severity: high
kind: domain_rule
modality: must
consequence: Conflicting data freshness signals cause users to trust green connection status while acting on stale sample
data, leading to incorrect trading decisions during critical market moments when data accuracy is essential
derived_from_bd_id: BD-081
- id: finance-C-181
when: When implementing or refactoring insider trading classification logic
action: Verify SEC Form 4 code classification against current filing conventions; add context-aware disambiguation for
code P (distinguish private placements from standard purchases), and implement validation against company event calendars
severity: medium
kind: domain_rule
modality: should
consequence: Misclassifying private placement transactions as bullish (code P) causes false positive bullish signals,
potentially leading to strategy entries based on insider purchases that were actually compensatory awards rather than
directional bets
derived_from_bd_id: BD-067
- id: finance-C-182
when: When training machine learning models using BaseModel with StandardScaler
action: Verify feature distributions are approximately Gaussian within training windows; apply Shapiro-Wilk test (p>0.05)
or examine kurtosis; if heavy tails detected, switch to RobustScaler or rank-based transformation
severity: medium
kind: domain_rule
modality: should
consequence: StandardScaler assumes Gaussian-distributed features; heavy-tailed features get scaled values misrepresenting
true relative importance, causing models to overweight outlier-prone indicators and underweight stable ones during inference
derived_from_bd_id: BD-069
- id: finance-C-183
when: When implementing or refactoring feature calculation logic for golden/death cross detection
action: 'Implement frequency-aware shift calculation: for minute-level data use shift(1) referencing prior minute, for
hourly use shift(1) referencing prior hour; validate that shift window matches input data granularity before computing
SMA crossovers'
severity: high
kind: architecture_guardrail
modality: must
consequence: Hardcoded shift(1) assumes daily closing prices; when applied to intraday minute-level data, shift(1) references
the prior minute instead of prior day, causing completely wrong crossover detection that silently produces false signals
derived_from_bd_id: BD-073
- id: finance-C-184
when: When designing or modifying database schemas for feature tables
action: Normalize timestamps to UTC before storing; include exchange_timezone field in feature metadata; when querying
across multiple exchanges, filter by exchange-specific trading day boundaries rather than assuming UTC date equivalence
severity: medium
kind: architecture_guardrail
modality: should
consequence: Composite primary key (ticker, date) assumes single timezone per date; when the same ticker trades on exchanges
across different time zones, identical ticker/date combinations create conflicting entries with different trading day
boundaries
derived_from_bd_id: BD-074
- id: finance-C-185
when: When implementing or modifying feature pipeline execution order
action: 'Maintain the validated 5-step execution order: tech→options→derived→margin_risk→quality; implement dependency
graph validation that checks prerequisite stages complete before dependent stages run, ensuring null RSI failures are
attributed to ordering violations not data quality issues'
severity: high
kind: domain_rule
modality: must
consequence: Pipeline validation thresholds (>10% null RSI) assume the fixed 5-step order; if derived features run before
tech indicators, null RSI from missing upstream dependencies gets misattributed to data quality, causing the quality
gate to silently accept corrupted inputs that corrupt downstream margin risk calculations
derived_from_bd_id: BD-078
- id: finance-C-186
when: When configuring DAG scheduling with weekday-only runs and SLA-based refresh intervals
action: Convert SLA definitions from natural days to business days (using pandas.bdate_range or similar); for critical
indicators with <24h SLAs, add weekend catch-up runs at Saturday 7 AM UTC and Sunday 7 AM UTC; document actual data
staleness bounds in dashboard metadata
severity: high
kind: architecture_guardrail
modality: must
consequence: Weekday-only scheduling (Mon-Fri 7 AM UTC) combined with natural-day SLA definitions causes Friday afternoon
releases to experience 63 hours of staleness for 6-hour SLA indicators; monthly indicators miss entire weekends, making
dashboard freshness claims systematically optimistic by 2-3x for end-of-week releases
derived_from_bd_id: BD-077
- id: finance-C-187
when: When implementing or modifying volatility regime detection logic in position sizing or margin calculations
action: 'Implement de-duplication logic: when multiple volatility triggers fire simultaneously (z-score threshold + percentile
rank + margin composite), use only one trigger and log the others as suppressed; or unify into a single volatility regime
score with documented weight contribution'
severity: high
kind: architecture_guardrail
modality: must
consequence: Three independent volatility triggers (z-score +1.0, percentile >75th, margin composite 25% weight) fire
simultaneously during high-VIX periods, creating multiplicative defensive positioning that causes 40-60% larger position
reductions than any single trigger would justify, leading to over-hedging followed by failure to re-enter quickly
derived_from_bd_id: BD-079
- id: finance-C-188
when: When implementing crossover detection for SMA-based technical signals
action: Use pandas shift(1) to compare current vs prior bar states for SMA50/SMA200 crossover detection — detect golden
cross (bullish) and death cross (bearish) by comparing previous and current bar states
severity: high
kind: domain_rule
modality: must
consequence: Without shift(1), current bar would incorrectly signal crossovers that haven't occurred yet, causing repainting
issues where signals appear and disappear as price moves within the same bar
derived_from_bd_id: BD-005
- id: finance-C-189
when: When implementing ensemble prediction for multi-model strategies
action: Use LogisticRegression meta-learner stacking on base model probabilities — pass base model predictions as features
to a second-level model that learns optimal weighting; do NOT use simple averaging, which treats all models equally
severity: high
kind: domain_rule
modality: must
consequence: Simple averaging gives equal weight to all models regardless of their current predictive power, causing suboptimal
predictions during market regime changes when some models outperform others
derived_from_bd_id: BD-014
- id: finance-C-190
when: When processing monetary values in backtesting calculations
action: Use Python float type for currency calculations — float introduces rounding errors due to binary floating-point
representation
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Float rounding errors accumulate over many transactions, causing P&L discrepancies that may appear as profits
or losses not present in actual trading
derived_from_bd_id: BD-GAP-003
- id: finance-C-191
when: When implementing monetary calculations in the backtesting engine
action: Use Python Decimal type for all currency calculations (from decimal import Decimal); initialize with string
or Decimal('X.XX') to avoid float conversion; apply Decimal throughout trade cost, P&L, and portfolio value calculations
severity: high
kind: domain_rule
modality: must
consequence: Without Decimal, float-based currency calculations cause silent rounding errors that accumulate in high-frequency
or long-running backtests, leading to incorrect strategy performance metrics
derived_from_bd_id: BD-GAP-003
- id: finance-C-192
when: When implementing database connection management in Streamlit multi-page applications
action: Use singleton pattern for DuckDB connection — implement __new__ method to return single shared instance; do NOT
create new connection instances per page or per query
severity: high
kind: architecture_guardrail
modality: must
consequence: Multiple DuckDB connection instances cause 'database is locked' errors when Streamlit pages access data simultaneously,
breaking multi-page app functionality
derived_from_bd_id: BD-001
- id: finance-C-193
when: When implementing trend identification logic in the financial analysis module
action: Use SMA (Simple Moving Average) with 20/50/200-period boundaries for trend identification; do not substitute with
WMA or EMA without re-evaluating signal stability
severity: high
kind: domain_rule
modality: must
consequence: Switching from SMA to WMA introduces higher sensitivity to outliers and unstable historical baselines, causing
trend signals to flip incorrectly during volatile periods and generating false trading signals
derived_from_bd_id: BD-026
- id: finance-C-195
when: When using the framework in environments without FRED API credentials
action: Verify offline mode status before performing analysis; verify users can distinguish between live FRED data and
fallback sample_FRED_data.csv to avoid treating sample data as current market information
severity: medium
kind: operational_lesson
modality: should
consequence: Running analysis on sample data without awareness produces misleading results; trading decisions based on
outdated sample data will not reflect current market conditions
derived_from_bd_id: BD-002
- id: finance-C-196
when: When configuring prediction horizon and rebalancing frequency for ML-based trading strategies
action: Verify prediction horizon (5 trading days) aligns with rebalancing frequency (weekly); verify these parameters
match the intended swing trading strategy and not a different timeframe
severity: medium
kind: domain_rule
modality: should
consequence: Mismatched prediction horizon and rebalancing frequency causes the model to optimize for a different trading
cycle, leading to signals that are irrelevant to actual weekly rebalancing decisions
derived_from_bd_id: BD-013
- id: finance-C-198
when: When implementing credential storage and access control in production trading systems
action: Do not rely on file encryption (Fernet) and file permissions (0o600) as the sole defense against privilege escalation
attacks — encryption keys must never be accessible to processes running under the same user account that owns encrypted
credentials; use Hardware Security Module (HSM) or cloud KMS integration where encryption key never touches application
memory
severity: high
kind: architecture_guardrail
modality: must
consequence: If an attacker gains owner account access via phishing, password reuse, or insider threat, encryption keys
become immediately accessible and all credentials can be decrypted, compromising production trading credentials and
enabling unauthorized market access
derived_from_bd_id: BD-084
- id: finance-C-199
when: When classifying market volatility regimes using VIX thresholds for strategy risk adjustment
action: Verify VIX regime thresholds (Low<15, Normal<20, Elevated<30, Crisis>=30) against current market structure; these
thresholds were established during pre-2008 markets, and the post-financial-crisis 'Normal' range has shifted upward with
structurally higher VIX levels
severity: medium
kind: domain_rule
modality: should
consequence: Pre-2008 VIX thresholds cause over-sensitive Crisis signals in post-2008 markets with structurally elevated
volatility, leading strategies to rotate away from risk assets prematurely and systematically underperform during extended
periods of elevated but non-crisis volatility
derived_from_bd_id: BD-071
- id: finance-C-200
when: When deploying the EnsembleModel for prediction tasks
action: Monitor base learner prediction correlation; if XGBoost and LightGBM base predictions converge to correlation
>0.95, the stacking architecture provides minimal benefit over a single model and base learners must be diversified
with additional model types
severity: high
kind: domain_rule
modality: must
consequence: When base learners produce highly correlated predictions (>0.95), the 2-level stacking architecture provides
no ensemble benefit, effectively acting as a single model with unnecessary computational overhead, causing degraded
prediction accuracy compared to a properly diversified ensemble
derived_from_bd_id: BD-065
- id: finance-C-201
when: When calculating recession probability for macroeconomic regime-aware strategy selection
action: Verify yield curve (10Y-2Y spread) receives dominant weight of at least 40% in recession probability calculation;
do not refactor to equal weights or alternative dominant indicators without re-validation against historical recession
data
severity: high
kind: domain_rule
modality: must
consequence: Changing yield curve dominance in recession probability calculation alters recession signal timing and accuracy,
causing recession-aware strategies to incorrectly rotate between risk assets and defensive positions, leading to significant
performance degradation during economic transitions
derived_from_bd_id: BD-044
- id: finance-C-202
when: When implementing cross-validation logic for time series forecasting
action: Use expanding or rolling window splits that preserve temporal ordering; verify training data chronologically precedes
validation data in every fold
severity: high
kind: architecture_guardrail
modality: must
consequence: Using random train/test splits or k-fold cross-validation on time series data introduces look-ahead bias,
causing backtest results to appear significantly better than live performance
derived_from_bd_id: BD-050
- id: finance-C-203
when: When configuring the prediction horizon parameter for the ML model
action: Set prediction_horizon to 5 (days) for consistency with backtesting; verify the horizon matches the strategy's
signal-to-noise optimization for medium-term directional prediction
severity: high
kind: domain_rule
modality: must
consequence: Using a different prediction horizon than validated in backtesting causes backtest-live inconsistency; strategies
optimized for 1-day horizon may have excessive transaction costs when applied with 5-day horizon
derived_from_bd_id: BD-051
- id: finance-C-204
when: When configuring sector-based macro regime detection
action: Verify the hardcoded sector classification (XLY/XLK=offensive, XLU/XLP=defensive, remainder=cyclical) matches
the actual sector composition of the tradable universe; update OFFENSIVE_SECTORS and DEFENSIVE_SECTORS lists when universe
changes
severity: high
kind: domain_rule
modality: must
consequence: Using incorrect sector classification causes wrong regime signals; misclassifying utilities as cyclical instead
of defensive leads to incorrect risk-on/off detection and poor timing decisions
derived_from_bd_id: BD-011
- id: finance-C-205
when: When validating ML model performance for live deployment
action: Use magnitude-weighted targets, asymmetric loss functions, or supplementary PnL-based validation metrics (Sharpe
ratio, actual returns) alongside AUC/accuracy; do not rely solely on binary classification metrics
severity: high
kind: operational_lesson
modality: must
consequence: Walk-forward validation with binary targets produces misleading metrics; models that correctly predict tiny
0.1% moves score equally with those predicting 10% moves, causing deployment of models that maximize accuracy but minimize
profitability
derived_from_bd_id: BD-082
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-083 / Database Snapshot Optimization
version: v5.3
intent_keywords:
- backup
- snapshot
- parquet
- database backup
- compress data
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: auto-grouped by UC.type (3 distinct values, balanced distribution)
groups:
- group_id: data_pipeline
name: Data Pipeline
description: ''
emoji: 📊
uc_count: 9
ucs:
- uc_id: UC-101
name: Database Snapshot Optimization
short_description: 'Creates optimized database backups by partitioning hot (<90 days) and cold (>90 days) data into
appropriate storage formats with ZSTD compression and '
sample_triggers:
- backup
- snapshot
- parquet
- uc_id: UC-102
name: Database Compaction and Optimization
short_description: Optimizes database performance by running VACUUM, rebuilding indexes, and deduplicating records
within retention windows while measuring compression s
sample_triggers:
- vacuum
- optimize
- database cleanup
- uc_id: UC-104
name: Daily Economic Data Refresh
short_description: Fetches all economic data from FRED and Yahoo Finance APIs daily and stores results in cache
for dashboard consumption
sample_triggers:
- refresh data
- daily update
- FRED data
- uc_id: UC-105
name: Data Retention Policy Cleanup
short_description: Archives data older than retention periods to Parquet files and deletes old records from main
tables to reduce database size while maintaining histori
sample_triggers:
- data retention
- cleanup old data
- archive historical
- uc_id: UC-108
name: FRED Data File Organization
short_description: Organizes FRED-related data files and scripts by moving them into a dedicated directory structure
sample_triggers:
- organize files
- move FRED data
- file management
- uc_id: UC-109
name: Offline Sample Data Generation
short_description: Generates sample datasets for offline mode testing, including FRED, Yahoo Finance, and World
Bank sample data
sample_triggers:
- generate sample data
- offline mode
- test data
- uc_id: UC-110
name: DuckDB Database Initialization
short_description: Initializes the DuckDB database by creating all required tables and indexes for the Economic
Dashboard
sample_triggers:
- init database
- create tables
- database setup
- uc_id: UC-112
name: Pickle Cache to DuckDB Migration
short_description: Migrates existing pickle cache files containing FRED and Yahoo Finance data to the new DuckDB
database format
sample_triggers:
- migrate pickle
- convert cache
- DuckDB migration
- uc_id: UC-113
name: Smart Data Refresh with SLA Awareness
short_description: Intelligently refreshes economic data based on natural update frequencies and SLAs, respecting
rate limits and only fetching data when needed
sample_triggers:
- smart refresh
- SLA aware
- rate limit
- group_id: monitoring
name: Monitoring
description: ''
emoji: 📦
uc_count: 3
ucs:
- uc_id: UC-103
name: API Key Management Verification
short_description: Verifies the API key management feature implementation is working correctly by testing module
imports, credential initialization, and key storage/retr
sample_triggers:
- verify API keys
- test credentials
- API setup verification
- uc_id: UC-106
name: API Key Management Quickstart
short_description: Provides a quick start guide for initializing and testing API key management, storing and verifying
FRED API keys securely
sample_triggers:
- setup API keys
- quick start
- initialize credentials
- uc_id: UC-107
name: Credentials Initialization
short_description: Initializes and stores API credentials (FRED API key) securely in encrypted form for authenticated
data access
sample_triggers:
- setup credentials
- API key initialization
- secure storage
- group_id: research_analysis
name: Research Analysis
description: ''
emoji: 📦
uc_count: 1
ucs:
- uc_id: UC-111
name: News and Sentiment Data Fetching
short_description: Fetches news articles and sentiment data for specified stock symbols, including Google Trends
data for sentiment analysis
sample_triggers:
- news sentiment
- fetch news
- sentiment analysis
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try database snapshot optimization
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try database compaction and optimization
auto_selected: true
- uc_id: UC-103
beginner_prompt: Try API key management verification
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 13 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- API Key Management Verification
- Database Compaction and Optimization
- Database Snapshot Optimization
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
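Provides automated trading through A-share broker clients, wrapping login and trading operations for multiple brokers such as XueQiu (雪球) and 芸享, and covering core functions including account balance queries, position management, order placement, and portfolio following.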
---
name: easytrader-cn-broker
description: |-
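Provides automated trading through A-share broker clients, wrapping login and trading operations for multiple brokers such as XueQiu (雪球) and 芸享, and covering core functions including account balance queries, position management, order placement, and portfolio following.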
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-094"
compiled_at: "2026-04-22T13:00:40.820921+00:00"
capability_markets: "cn-astock"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
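# A-Share Broker Automated Trading (easytrader-cn-broker)
> Provides automated trading through A-share broker clients, wrapping login and trading operations for multiple brokers such as XueQiu (雪球) and 芸享, and covering core functions including account balance queries, position management, order placement, and portfolio following.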
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (4 total)
### Broker API Server for Trading Operations (`UC-101`)
Provides HTTP REST API endpoints for broker authentication and retrieving account balance information programmatically, enabling integration with exte
**Triggers**: server, api, http
### XueQiu Trader Account Preparation Validation Test (`UC-102`)
Unit test that validates XueQiuTrader correctly handles account preparation with required parameters (cookies) and properly stores portfolio configura
**Triggers**: xueqiu, trader, account preparation
### YunHui Client Trader Integration Tests (`UC-103`)
Integration tests for YunHui (yh_client) broker trading operations including balance queries, today's trades/entrusts, and entrust cancellation functi
**Triggers**: yh_client, balance, entrust
For all **4** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
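
Two of the locks above are simple enough to check mechanically before any backtest code runs. The following is a minimal sketch, not the validator shipped with this skill: the regular expression is an assumed reading of the `entity_type_exchange_code` format in `SL-03`, and the helper names `check_entity_id` and `check_signal_sizing` are hypothetical.

```python
import re

# Assumed reading of SL-03: three underscore-separated parts, e.g. stock_sh_600000.
ENTITY_ID_RE = re.compile(r"^[a-z]+_[a-z]+_[A-Za-z0-9]+$")

# The three mutually exclusive sizing fields named in SL-05.
SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


def check_entity_id(entity_id: str) -> bool:
    """SL-03: the entity ID follows entity_type_exchange_code."""
    return ENTITY_ID_RE.match(entity_id) is not None


def check_signal_sizing(signal: dict) -> bool:
    """SL-05: exactly one of the sizing fields is set (not None)."""
    return sum(signal.get(field) is not None for field in SIZING_FIELDS) == 1
```

A signal dict such as `{"position_pct": 0.2}` passes the `SL-05` check, while a dict that sets both `position_pct` and `order_money`, or none of the three fields, does not.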
## Top Anti-Patterns (25 total)
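- **`AP-ZVT-183`**: Ex-rights adjustment factors that are inf/NaN are multiplied in directly, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API exceeds its rate limit are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong K-line table, HFQ (backward-adjusted) versus QFQ (forward-adjusted), causes factor computation drift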
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-094. Evidence verify ratio = 62.7% and audit fail total = 8. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source of truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When full examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-094` blueprint at 2026-04-22T13:00:40.820921+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['YunHui Client Trader Integration Tests', 'XueQiu Trader Account Preparation Validation Test', 'Broker API Server for Trading Operations', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
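### `AP-QLIB-1930` — Backtest results independent of the model: a shared dataset object causes predictions to be overwritten by the first model <sub>(high)</sub>
In Qlib, when multiple models reuse the same already-fitted DatasetH instance, the dataset's internal normalization parameters (the normalization statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset causes all models to effectively use the same set of prediction signals. The symptom is that the backtest net-value curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous anti-pattern of the "experiments appear to run, but every conclusion is invalid" kind.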
Source: https://github.com/microsoft/qlib/issues/1930
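### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which sets the range the normalizer is fitted on) and segments.train (which sets the model training range). A common mistake is to let fit_end_time cover the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state explicitly that fit_end_time must be <= train_end.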
Source: https://github.com/microsoft/qlib/issues/2090
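
The coupling described in this entry can be caught with a one-line date comparison before any dataset is built. This is a minimal sketch under stated assumptions: the `config` dict only mimics the layout the entry describes and is not a real Qlib configuration, and `fit_range_leaks` is a hypothetical helper.

```python
from datetime import date


def fit_range_leaks(fit_end_time: str, train_end: str) -> bool:
    """Return True when the normalizer's fit window extends past the train segment."""
    return date.fromisoformat(fit_end_time) > date.fromisoformat(train_end)


# Hypothetical values: the normalizer is fitted through 2016, training stops in 2014.
config = {
    "handler": {"fit_start_time": "2008-01-01", "fit_end_time": "2016-12-31"},
    "segments": {"train": ("2008-01-01", "2014-12-31")},
}

leaks = fit_range_leaks(
    config["handler"]["fit_end_time"],
    config["segments"]["train"][1],
)
```

Here `leaks` is `True`, because the fit window reaches two years past the end of the train segment.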
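### `AP-QLIB-2036` — MACD factor formula error in the documentation: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs significantly from the correct formula after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied directly into production factor libraries by many users.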
Source: https://github.com/microsoft/qlib/issues/2036
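
The unit problem can be seen by quoting the same price series at ten times the level: a dimensionless factor should not change, while a factor with units of 1/price shrinks by ten. This is a minimal pandas sketch, not Qlib's expression engine. The 12/26/9 spans are the standard MACD values (assumed here, matching `SL-08`), and the function names are hypothetical.

```python
import pandas as pd


def ema(series: pd.Series, span: int) -> pd.Series:
    """Exponential moving average with the usual recursive definition."""
    return series.ewm(span=span, adjust=False).mean()


def dif(close: pd.Series) -> pd.Series:
    """DIF normalized by price, so it is dimensionless."""
    return (ema(close, 12) - ema(close, 26)) / close


def dea_correct(close: pd.Series) -> pd.Series:
    """DEA as the EMA of the already dimensionless DIF."""
    return ema(dif(close), 9)


def dea_documented(close: pd.Series) -> pd.Series:
    """The variant described above: divides by CLOSE a second time."""
    return ema(dif(close), 9) / close


close = pd.Series([10.0, 10.5, 10.2, 11.0, 11.4, 11.1, 12.0, 12.3])
scaled = close * 10  # same series quoted at 10x the price level
```

Comparing `dea_correct(close)` with `dea_correct(scaled)` shows no change beyond floating-point noise, whereas `dea_documented(scaled)` is one tenth of `dea_documented(close)`.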
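### `AP-QLIB-2184` — Custom A-share data imported without filling suspension days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that the open/close/high/low/volume/factor fields should all be NaN on suspension days so the framework can recognize and skip them during factor computation. If users building their own A-share dataset keep the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price momentum factors show "false signals" during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into the training data.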
Source: https://github.com/microsoft/qlib/issues/2184
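
A pre-import cleaning step along these lines could apply the convention before the data reaches Qlib. This is a minimal sketch: it assumes a suspension day can be recognized by `volume == 0`, which is a common heuristic and not something the entry above states, and `mask_suspended_days` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Fields that should be NaN on suspension days, as listed in the entry above.
PRICE_FIELDS = ["open", "close", "high", "low", "volume", "factor"]


def mask_suspended_days(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with all price fields set to NaN on suspended rows.

    Assumption: a row is a suspension day when its volume is 0.
    """
    out = df.astype({col: float for col in PRICE_FIELDS})
    suspended = df["volume"] == 0
    out.loc[suspended, PRICE_FIELDS] = np.nan
    return out


# Hypothetical export where two suspension days carry the previous close forward.
raw = pd.DataFrame(
    {
        "open": [10.0, 10.2, 10.2, 10.4],
        "close": [10.2, 10.2, 10.2, 10.5],
        "high": [10.3, 10.2, 10.2, 10.6],
        "low": [9.9, 10.2, 10.2, 10.3],
        "volume": [1000, 0, 0, 1200],
        "factor": [1.0, 1.0, 1.0, 1.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
clean = mask_suspended_days(raw)
```

The two middle rows of `clean` are NaN in every price field, while `raw` is left unmodified.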
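### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list instead of the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. If users follow the documented steps, the financial dataset covers only part of the stocks, and backtests based on PIT financial factors have severe survivorship bias (stocks that were not collected are implicitly excluded).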
Source: https://github.com/microsoft/qlib/issues/1892
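### `AP-QLIB-2097` — Full-market instrument="all" runs out of memory (OOM) on a 32GB machine, while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory usage with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by about 16x. A 32GB machine running the full market hits OOM during the init_instance_by_config stage, and the error message does not point to a memory problem. Users easily mistake it for a configuration error; in practice they need to load in batches or use streaming feature computation.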
Source: https://github.com/microsoft/qlib/issues/2097
### `AP-QLIB-1984` — LightGBM 模型标签维度校验逻辑永远不触发导致多标签训练静默失败 <sub>(medium)</sub>
Qlib gbdt.py 中用 y.values.ndim == 2 判断是否为多标签,但从 DataFrame 取出的 Series 的 ndim 永远为 1,条件永远为 False,因此多标签训练不会走 squeeze 分支,而是直接进入 LightGBM 训练并在更深处抛出语义不明的错误。 用户尝试自定义多标签任务时无法从错误信息定位到此根因。
Source: https://github.com/microsoft/qlib/issues/1984
### `AP-QLIB-1915` — 自定义 CSV 数据 dump_bin 后 DataHandler 报 Length mismatch,D.features 却正常 <sub>(high)</sub>
Qlib 存在两套数据访问路径:D.features(直接读 binary)和 DataHandler/DataHandlerLP (带 processor pipeline)。自定义 A 股 CSV 数据在 dump_bin 时若字段顺序 或 symbol 格式(如 600000.SH vs SH600000)与 Qlib 约定不符,DataHandler 的 processor 在 align/reindex 时触发 Length mismatch,而 D.features 因不 经过 processor 而成功。这一"两套路径行为不一致"让用户误以为数据已正确导入。
Source: https://github.com/microsoft/qlib/issues/1915
### `AP-QLIB-1949` — Colab/Linux 多进程后端与 Qlib ParallelExt 冲突导致 DataHandler 完全不可用 <sub>(medium)</sub>
Qlib 在非 fork 环境(Windows 或 Google Colab)中,DataHandler 使用 joblib 并行加载特征时,ParallelExt 初始化时访问 _backend_args 属性失败(AttributeError)。 根因是 joblib 1.5+ 移除了该内部属性,Qlib 的兼容层未更新。表现为 D.features 调用抛出多层嵌套异常,用户无法从错误栈判断是并行后端问题还是数据问题。
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
### `AP-VNPY-3691` — K 线生成器首根 K 线时间戳不对齐,导致第一个周期信号错误 <sub>(high)</sub>
vnpy BarGenerator 在合成 N 分钟 K 线时,第一根推送的 K 线时间戳为"当前 tick 所在分钟"而非"完整 N 分钟周期结束时间"。具体表现:09:59 的 tick 会 触发一根不完整的 5 分钟 K 线推送(本应等到 10:04 才推送)。策略若在 on_bar 中直接用 datetime.minute % 5 过滤,第一根 K 线恰好通过,但包含的 数据不足一个完整周期,用于信号计算会产生错误的开仓信号。
Source: https://github.com/vnpy/vnpy/issues/3691
### `AP-VNPY-3669` — Alpha 模块历史数据增量保存时新旧 DataFrame schema 不兼容导致 SchemaError <sub>(medium)</sub>
vnpy Alpha 模块在保存 K 线数据到 Parquet 文件时,将新下载数据(可能含 Float64 列)与已存文件(历史 Int64 列)直接 polars.concat。polars 强类型 不允许隐式类型提升,抛出 SchemaError。根因是不同数据源/版本返回的字段类型 不一致(如 volume 在部分行情源为整数,在另一些为浮点),且 concat 前无 schema 对齐步骤。影响所有使用 vnpy alpha 进行回测的历史数据构建流程。
Source: https://github.com/vnpy/vnpy/issues/3669
### `AP-VNPY-3685` — 价差交易模块 run_backtesting() 在 Jupyter 环境下静默报错,结果不可信 <sub>(high)</sub>
vnpy 4.10 价差交易(SpreadTrading)模块的 run_backtesting() 在 Jupyter 环境下存在事件循环冲突(asyncio already running),导致回测引擎部分逻辑 不执行但不抛异常,返回看似正常的回测统计数据。同样代码在命令行 Python 中无此问题。vnpy 4.x 将部分 IO 改为 async 但 Jupyter 的事件循环与之不兼容, 是"回测结果看起来正确但实际不完整"的隐蔽陷阱。
Source: https://github.com/vnpy/vnpy/issues/3685
### `AP-VNPY-3700` — 安装脚本不使用 venv 导致全局 numpy 版本被降级破坏其他依赖 <sub>(medium)</sub>
vnpy install.bat 直接在系统/conda base 环境安装,会强制降级 numpy 到 <2.0 以满足 vnpy 依赖,破坏依赖 numpy 2.x 的其他量化工具(如 scipy、pytorch 新版)。 没有 requirements.txt,依赖边界不透明。在多工具共存的量化研究环境中, vnpy 的安装脚本是"全局环境污染"的常见根源。
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
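### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices on the chart differ from actual market prices by roughly 4x (consistent with Apple's 4-for-1 split in August 2020; the cumulative factor across all of Apple's splits is much larger), and users mistake the 4x price change for strategy returns. The A-share case is worse: the price jump around an ex-rights date forms a huge "signal" in unadjusted data and lures technical indicators into false breakout signals on that date.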
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
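### `AP-ZIPLINE-235` — Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>
By default, Zipline's slippage model fills an order at the close of the same bar that triggered the signal (current-bar close fill). In live trading, a signal can only be filled near the open of the next bar (next-bar execution). Taking A-share daily bars as an example, backtesting at the close instead of the next day's open overstates daily returns by about 0.1-0.3% on average, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.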
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
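The difference between same-bar and next-bar fills comes down to a one-index shift. The sketch below is framework-independent and does not use Zipline's API; `next_open_fills` is a hypothetical helper that assumes `signals[t]` is computed after the close of bar `t`.

```python
def next_open_fills(signals, opens):
    """Map each signal on bar t to a fill at the open of bar t + 1.

    Signals on the last bar are dropped because no next bar exists to fill them.
    Returns a list of (fill_bar_index, fill_price) tuples.
    """
    fills = []
    for t, signal in enumerate(signals):
        if signal and t + 1 < len(opens):
            fills.append((t + 1, opens[t + 1]))
    return fills


opens = [10.0, 10.4, 10.1, 10.8]
signals = [False, True, False, True]  # decided after the close of each bar
fills = next_open_fills(signals, opens)
print(fills)  # [(2, 10.1)]
```

The signal on the final bar produces no fill, which is the conservative choice for a backtest that ends on that bar.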
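### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day raises DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if start_session falls on a non-trading day (for example 1998-01-01, New Year's Day), Zipline's calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The message shows only the first date of the trading calendar and does not suggest switching to the first trading day. A-share case: with the SSE/SZSE calendars, a start_date that falls on the holiday immediately after the last trading day before Spring Festival triggers the same error, and debugging it is costly.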
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
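One way to avoid the error is to snap the requested date to the first session on or after it before passing it on. The sketch below is independent of Zipline: `first_session_on_or_after` is a hypothetical helper, and the `sessions` list is a hand-written stand-in for whatever sorted session list the trading calendar exposes.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions, requested):
    """Return the first trading session >= requested from a sorted list of dates."""
    i = bisect_left(sessions, requested)
    if i == len(sessions):
        raise ValueError(f"{requested} is after the last known session {sessions[-1]}")
    return sessions[i]


# Illustrative sessions only; a real list would come from the exchange calendar.
sessions = [date(1997, 12, 30), date(1997, 12, 31), date(1998, 1, 2), date(1998, 1, 5)]
start = first_session_on_or_after(sessions, date(1998, 1, 1))
print(start)  # 1998-01-02
```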
### `AP-ZIPLINE-181` — asset db 过期后 Pipeline 报"no assets traded",误导用户排查数据范围 <sub>(high)</sub>
Zipline 的 asset database(SQLite)记录每只股票的 start/end 交易日期。若 使用了旧版 Quandl/自建 bundle 且未重新 ingest,在回测新日期范围时 Pipeline 抛出 "Failed to find any assets with country_code 'US' that traded between [dates]"。A 股场景:重新下载行情后若只更新价格数据而未重建 asset db,退市/ 新上市股票的日期范围不更新,Pipeline 过滤会悄悄排除这些股票,产生生存者偏差。
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
### `AP-ZIPLINE-285` — week_start()/week_end() 在自定义日历(非美股)下静默失效 <sub>(medium)</sub>
Zipline schedule_function 的 date_rules.week_start() 和 date_rules.week_end() 依赖交易日历的周首/周末判断逻辑,但在非美股日历(如 ASX、SSE)中,该逻辑 与 NYSE 日历的偏移计算不兼容,导致 schedule 永远不触发或在错误的日期触发。 A 股场景:使用 SSE 日历时,含春节等连续长假的周,week_start 可能跳过整个 假期周而不调仓,但用户无法从日志发现未触发的调度。
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
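### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline requires every timestamp to be a UTC-aware datetime internally. When a user passes a naive datetime (no timezone information, such as pd.Timestamp('2020-01-01')), Zipline does not fail at the entry point; it raises AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the stack depth makes the cause hard to locate. A-share developers importing data stamped in local CST time hit this trap easily and need to call tz_localize('UTC') explicitly when registering the bundle.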
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
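For intraday timestamps recorded in China Standard Time, a normalization step at the pipeline entry might look like the sketch below. It uses only the standard library; `to_utc` is a hypothetical helper, and the assumption that naive values are CST (UTC+8) must match how the data was actually recorded. Daily session labels are a separate case, where the date is typically just labeled as UTC rather than shifted.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time as a fixed UTC+8 offset (China observes no DST).
CST = timezone(timedelta(hours=8), "CST")


def to_utc(ts: datetime, assume_tz: timezone = CST) -> datetime:
    """Return a UTC-aware datetime; naive inputs are interpreted as assume_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


opening = to_utc(datetime(2020, 1, 2, 9, 30))  # naive local market open
print(opening.isoformat())  # 2020-01-02T01:30:00+00:00
```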
## zvt (6)
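### `AP-ZVT-183` — An inf/NaN adjustment factor is multiplied in directly, so price adjustment fails silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old == 0 (a stock's first trading day, or missing data) the factor is inf; when kdata.open itself is None (a suspension day left unfilled) the multiplication raises TypeError. The result is that adjustment for the whole entity is aborted and all subsequent bars are lost, but the main flow only logs ERROR and carries on, so users often do not know that data for many stocks is corrupted.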
Source: https://github.com/zvtvz/zvt/issues/183
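A defensive version of the ratio and the multiplication could be sketched as follows. These helpers are hypothetical and are not ZVT functions; returning `None` for an unusable factor or price is one possible policy, chosen here so the caller can skip a single bar instead of aborting the whole entity.

```python
import math


def safe_adjust_factor(new_price, old_price):
    """Return new/old as a finite float, or None when the ratio is undefined."""
    if new_price is None or old_price is None or old_price == 0:
        return None
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None  # rejects inf and NaN


def adjust_price(price, factor):
    """Apply the factor, leaving the bar unadjusted (None) if either input is unusable."""
    if price is None or factor is None:
        return None
    return price * factor


print(safe_adjust_factor(9.0, 10.0))            # 0.9
print(safe_adjust_factor(9.0, 0))               # None
print(adjust_price(None, 0.9))                  # None
```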
### `AP-ZVT-179` — 第三方数据接口超限后异常被吞噬,数据静默缺失 <sub>(high)</sub>
ZVT 使用聚宽 jqdatasdk 批量拉取全市场 K 线时(4000+ 股票),触发聚宽每日 最大查询条数限制(错误:已超过每日最大查询数量)。ZVT 捕获异常后继续执行下一 entity,导致超限后所有股票的当日数据均静默缺失。回测若使用该残缺数据库,因 子计算结果将产生系统性偏差,且无告警。
Source: https://github.com/zvtvz/zvt/issues/179
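### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once exceeds SQLite's SQLITE_MAX_VARIABLE_NUMBER limit, which defaults to 999 in SQLite versions before 3.32.0 (32766 from 3.32.0 onward). Raising max_allowed_packet has no effect because that is a MySQL parameter; the root cause is SQLite's cap on bound variables. The correct fix is to query in batches, which early ZVT versions did not do.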
Source: https://github.com/zvtvz/zvt/issues/161
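Batching the IN clause is straightforward with the standard `sqlite3` module. The sketch below is not ZVT code: the `kdata` table, its columns, and the `query_in_batches` helper are hypothetical, and the batch size of 500 is simply a value that stays under the older 999-variable default.

```python
import sqlite3


def query_in_batches(conn, entity_ids, batch_size=500):
    """Fetch rows for many entity_ids without exceeding SQLite's bound-variable limit."""
    rows = []
    for start in range(0, len(entity_ids), batch_size):
        batch = entity_ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))  # one "?" per id in this batch
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


# Build a throwaway in-memory table with 6000 entities to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT PRIMARY KEY, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(6000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", ((eid, 10.0) for eid in ids))
result = query_in_batches(conn, ids)
print(len(result))  # 6000
```

Only the placeholders are interpolated into the SQL string; the ids themselves are still passed as bound parameters.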
### `AP-ZVT-129` — 使用通配符导入隐藏 API 版本变更,AdjustType 等枚举莫名消失 <sub>(medium)</sub>
ZVT 文档示例使用 `from zvt import *` 导入所有符号。当 ZVT 版本升级重构 枚举(如将 AdjustType 移入子模块)后,通配符导入不再包含该符号,触发 AttributeError。使用者误以为是安装问题,实际是版本间 API breaking change 未在 CHANGELOG 中标注,且通配符导入掩盖了具体来源。应显式 import 枚举类。
Source: https://github.com/zvtvz/zvt/issues/129
### `AP-ZVT-187` — 回测引擎未在数据层空结果时提前终止,导致空指针级联崩溃 <sub>(medium)</sub>
ZVT Trader 在 load_data 完成后检查数据为空时,不提前退出,而是将空 DataFrame 传入 selector 计算,触发后续 NoneType 操作链式崩溃。错误栈深且难以定位根因, 用户误以为是策略逻辑问题。根因是数据时间窗口配置错误(start/end 不在数据 库覆盖范围内)但无有效校验。
Source: https://github.com/zvtvz/zvt/issues/187
### `AP-ZVT-183B` — HFQ(后复权)与 QFQ(前复权)K 线表使用错误导致因子计算漂移 <sub>(high)</sub>
ZVT 提供 Stock1dKdata(不复权)、Stock1dHfqKdata(后复权)、Stock1dQfqKdata (前复权)三张独立表。用户在计算价格动量/均线因子时混用两张表(如用不复权 做均线,用后复权做收益率),导致除权日前后因子值产生跳变。ZVT 不做跨表 复权类型一致性校验,混用静默通过。
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-094--easytrader
**Scan date**: 2026-04-22
**Stats**: 5 files, 28 classes, 0 functions, 5 stages
## Modules (5)
- [authentication_&_connection](components/authentication_-_connection.md): 5 classes
- [account_query](components/account_query.md): 6 classes
- [order_execution](components/order_execution.md): 6 classes
- [trade_following](components/trade_following.md): 6 classes
- [remote_service_layer](components/remote_service_layer.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 146
fatal_constraints_count: 46
non_fatal_constraints_count: 140
use_cases_count: 4
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (71)
- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A 股股票实行 T+1 交收制度:T 日买入的股票最早 T+1 日方可卖出。 T 日卖出所得资金可当日再用于买入。回测框架若未施加 T+1 持仓锁定, 将高估换手率与策略胜率,尤其损害日内反转类策略的真实性。
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: 沪深主板股票日涨跌幅上限为 ±10%(ST/SST 股票 ±5%)。 涨停封板时买方消失、跌停封板时卖方消失;回测若假设当日可以任意价格 成交,会系统性高估可执行性。封板检测应在成交模拟层强制实施。
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: 科创板和创业板(2020年8月改革后)正常交易日涨跌幅为 ±20%; 北交所 ±30%;新股上市后前5个交易日不设涨跌幅限制。 回测若对所有股票统一套用 ±10% 过滤逻辑,会错误剔除或错误包含这些板块的成交。
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST 股票日涨跌幅限制为 ±5%,流动性极差,成交假设不可与正常股票混用。 包含历史 ST 股票(最终退市)但不纳入回测会产生幸存者偏差; 纳入回测但不区分 ST 涨跌幅会错误模拟成交。
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: A 股开盘集合竞价(9:15-9:25)和收盘集合竞价(14:57-15:00)期间, 成交价由"最大成交量原则"确定,非即时撮合。回测以开盘价或收盘价假设 即时全量成交会低估实际滑点风险,大单策略尤为明显。
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: 停牌制度:A 股长期停牌(2018年前可长达数月)期间,持仓资金被锁定, 无法再平衡,机会成本在回测中普遍被忽略。应在因子计算前过滤停牌日 (volume == 0 或 is_suspended == True),停牌期间不发出信号。
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: 新股上市后前5个交易日无涨跌幅限制(首日涨幅可超300%), 且无完整历史数据(均线/波动率/换手率因子无法计算)。 应在因子计算前过滤上市不足 N 个交易日(通常 60-252 日)的股票。
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: A 股程序化交易监管新规(2025年7月7日施行):单账户每秒申报/撤单 ≥ 300 笔, 或单日申报/撤单 ≥ 20000 笔,被认定为高频交易,须向交易所报备。 AI 生成的量化策略若频率超标则无法合规运行,应在策略设计期提示。
- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: 除权除息日股价跳空是账面调整而非真实亏损。复权选择: 不复权会虚增策略亏损;前复权会将历史价格内嵌未来分红信息(lookahead bias); 后复权以上市首日为基准累积,是量化回测的最优选择。
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A 股上市公司财务报告披露有法定延迟:年报在次年4月30日前、 半年报在8月31日前、季报分别在4月30日(一季)/10月31日(三季)前披露。 回测中使用财务数据时,必须以实际披露日期(announcement_date)而非 会计期间结束日作为数据可用时间点,否则引入 point-in-time lookahead bias。
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: 分红送股转增和配股会导致除权除息日后股本增加,历史持股数量不变但股价等比 缩水,若回测系统未同步调整持仓股数,会在除权日产生虚假亏损或盈利。
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: 大宗交易与竞价交易价差:大宗交易成交价可比市价折价最多 10%(主板), 但此价格不影响次日竞价开盘。大宗交易数据出现在收盘后,若将其混入 日内 OHLCV 数据,会污染收盘价和成交量的正常计算。
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: 融资融券(两融)做空限制:A 股散户无法直接卖空,融券标的池有限(主要为 大盘蓝筹,中小盘融券极度稀缺),融券利率远高于融资利率。 回测若直接假设可做空任意股票,会产生不可执行的策略,实盘与回测严重背离。
- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: 通过沪深港通(北向)买入股票,境外投资者合计持股上限 30%,预警线 28%。 当外资持股比例达 28% 时,联交所暂停该股新增买盘,直到降至 26% 才恢复。 策略若重仓外资偏好股(消费/医药龙头),需监控外资持股比例。
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% 举牌规则:单一投资者持有上市公司已发行股份超过 5%,须在3日内向证监会 和交易所报告并公告;在此期间及公告后2日内不得再买卖。 量化选股系统若不考虑此规则,重仓股超过 5% 阈值后将面临强制停止买入。
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: 公募基金"双十原则":单基金持有单只股票不超过净资产 10%, 同一基金管理人旗下所有基金合计不超过该公司已发行股份 10%。 量化选股组合若部署于公募基金,需在优化约束中强制加入合规上限。
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: 内幕交易边界:AI 辅助量化系统的所有输入数据必须来自公开已披露信息。 通过非公开渠道(私有数据服务/内部消息/重组前预知)触发的自动化交易 构成内幕交易,适用《证券法》第80-83条及《内幕交易行为认定指引》。
- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: 幸存者偏差:使用当前 A 股成分股(如当前沪深300)作为历史回测股票池, 会遗漏曾被纳入指数但因业绩差被调出或退市的股票。2020-2024年 A 股 退市数量加速(41家/年创纪录),此偏差日趋严重。必须使用历史时点快照。
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: 指数成分股调整效应:沪深300/中证500等每半年调整一次(6月/12月), 被纳入股票通常在公告日至生效日之间显著上涨(被动资金被动买入), 被剔除股票则相反。回测股票池应使用历史成分股快照,并标注调整窗口期。
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: 策略拥挤(Strategy Crowding):大量量化私募使用相似因子模型时, 持仓高度重叠,遇市场冲击时集体卖出形成踩踏。2024年2月 A 股量化危机 是典型案例(小盘股指数单日跌幅超 10%)。需监控因子多头持仓与 主流量化基金的重叠率。
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A 股量化对冲策略常用 IF/IC/IM 股指期货做多/空对冲系统性风险。 但 A 股股指期货长期处于贴水(远期价格 < 现货),IC 年化贴水可达 10-20%。 回测若仅考虑价格收益而忽略期货贴水/升水,会严重高估对冲策略净收益。
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: A 股月度动量因子在方向上与美股相反:近1个月表现最好的股票, 下1个月大概率反转(反转效应而非动量)。机构研究(华泰/东吴证券) 与学术论文均验证:直接套用美股月度动量因子在 A 股会产生系统性亏损。
- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: 处置效应(Shefrin & Statman 1985)在 A 股散户中尤为显著: 投资者倾向于过早卖出盈利股票、过长持有亏损股票。上交所实证研究证实 超过 90% 的个人账户存在此效应,AI 辅助工具不应迁就"持有亏损等解套" 的直觉,而应基于量化信号提供纪律性止损止盈建议。
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A 股以散户为主(个人账户交易量占比超 80%),羊群效应显著:散户倾向于 跟风操作,导致价格非理性波动(如 2015年杠杆牛熊)。量化策略应避免 使用成交量排行/热度排行等可能强化羊群信号的指标作为主要因子。
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: 过度自信效应(Barber & Odean 2000)在 A 股散户中更严重:散户年均换手率 超 500%,机构长期收益显著优于散户。高换手率策略经交易成本后净收益往往 更低。AI 不应鼓励"频繁操作",而应推荐低频高质信号驱动交易。
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A 股日历效应:春节效应(节前5日和节后1-3日倾向上涨)、月初效应 (月初第1-5个交易日表现优于月中/月末)已有学术实证(南京财经大学等)。 策略应在日历特殊窗口降低信号置信度,或单独评估日历驱动收益的贡献。
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: 策略容量(Capacity)限制:A 股小盘/微盘股日均成交额仅数百万, 大资金买入/卖出会造成严重价格冲击,策略实际容量可能仅几千万元。 回测结果不可外推至亿级资金,应在回测中加入成交量比例上限约束。
- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: A 股完整交易成本结构(2023年8月调整后):印花税卖出单向 0.05%; 佣金双向约 0.01%(最低5元);过户费(沪市)0.001%; 滑点/冲击成本小盘股 0.1%-0.5%/次。忽略成本的回测策略年化收益率 具有欺骗性,高频/高换手策略尤甚。
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: 市场冲击成本(Market Impact)在回测中通常完全缺失,但在实盘中可能是 最大成本来源。A 股小盘股 100 万元买入可能推高 1% 以上。冲击成本与 成交规模呈幂律而非线性关系,应使用 Almgren-Chriss 模型或简化版估算。
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: 大股东/董监高减持新规(证监会第224号令,2024年5月):持股5%以上大股东 通过集中竞价减持须提前15个交易日披露减持计划,3个月内不超过股份总数1%。 解禁股减持压力是 A 股特有的系统性风险因子,回测中忽略解禁日历会低估 相关股票的持股风险。
- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: A 股交易日历与自然日历不一致:存在法定节假日调休导致的"补班日"(周六上班), 以及临时停市(2015年7月8日至7月10日因股灾紧急停市)。 使用通用工作日历(weekdays)推算 A 股交易日会产生偏差, 必须使用 A 股专用交易日历(如 exchange_calendars 或 tushare 的交易日接口)。
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: A 股退市后股票代码可能被新股重用(极少见但存在)。使用纯代码(如 '000001') 作为历史数据主键而不包含交易所后缀('.SZ')或上市日期范围,可能导致 历史数据与当前股票的错误混淆,长周期回测中需特别注意。
- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: 未来函数(Lookahead Bias):在模拟历史时间点 t 的交易决策时, 不得使用 t 时刻之后才能知道的信息。最常见形式: (1) 使用收盘价计算信号并同日以收盘价成交; (2) 将 T 日收盘后计算的指标标记在同一根 K 线; (3) 使用当日最高/最低价作为成交假设。 信号计算与成交时间必须对齐:T 日收盘后计算信号,T+1 日开盘成交。
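- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: a rolling-window indicator with an N-bar window is NaN for the first N-1 bars, and those bars must not take part in signal computation or position decisions. Require each indicator's warmup_period to equal its longest lookback, and keep positions at zero during warmup.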
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: ML/DL 模型时序数据划分必须按时间顺序:TRAIN < VALID < TEST, 不可使用随机 k-fold 分折(会将未来数据混入训练集)。 应使用 TimeSeriesSplit 或 Walk-Forward 验证。
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: 开盘价/最高价/最低价成交假设:日线回测中假设每日可以最高价卖出或 最低价买入(如动量策略"最高价止盈"),这是明显的 lookahead, 因为日内最高/最低价只有收盘后才能确认。成交价只能用开盘价或 前一日收盘价(带滑点)。
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: 数据对齐偏移(Off-by-one):pandas rolling/shift 等操作容易引入细微的 1 期偏移错误。应在代码中明确记录每个序列的"观测时间点", 并通过 assert 验证关键时间对齐关系。
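- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials increases with the number of backtests tried, even when no configuration has real skill, so the best observed Sharpe is biased upward. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.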
- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: 幸存者偏差(Survivorship Bias):使用当前市场成分股作为历史回测股票池, 会遗漏曾经存在但后来退市、摘牌或被合并的股票,系统性高估策略历史收益率。 回测股票池必须使用历史时点快照(point-in-time universe)。
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-Sample / Out-of-Sample 划分:策略开发、参数选择必须在样本内完成, 样本外数据仅用于最终验证,不可多次"看"样本外数据后继续调优 (会将样本外变为新的样本内,重蹈过拟合)。
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: 停牌/缺失数据的填充策略:停牌日价格不可简单用前一日收盘价 forward-fill, 因为这会在复盘时造成"零成交量"日参与了因子计算和信号生成。 应在因子计算层显式过滤缺失交易日,不填充。
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: 异常值(Extreme Value)污染:原始市场数据可能含有数据源错误(如除权未 及时调整、手工录入错误导致的极端价格),不清洗直接进入因子计算会产生 极端信号,污染整个横截面。应在 pipeline 入口处过滤 3-sigma 异常值。
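- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured explicitly at backtest initialization; zero-cost defaults are not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies, where round-trip costs often consume 50%+ of gross returns.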
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: 滑点(Slippage)建模:回测若无滑点,假设每笔订单以理想价格成交, 高频策略在实盘中会因成交价劣化而产生严重亏损。至少应配置固定点差 或比例滑点;大单应使用成交量比例模型(如不超过日成交量 5%)。
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: 换手率(Turnover)必须在回测绩效报告中展示并与成本关联分析。 月换手率超过 50%(年化 600%+)时,策略净收益对成本假设极度敏感, 每 10bps 成本变化可能改变策略盈亏结论,必须做成本敏感性分析。
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: 仓位规模化(Position Sizing)必须纳入资金量约束:回测应模拟固定资金量 下的实际持仓股数(取整),而非假设可以持有小数股。 对小盘股,最小交易单位(A股:100股/手)会导致实际可持仓量与目标权重 产生偏差,应在回测中模拟取整效应。
- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: 时间戳时区统一:多数据源合并时,UTC vs 本地时间混用是常见数据腐败源。 所有时间戳必须在 pipeline 入口处统一转换为同一时区(推荐 UTC 存储, 市场本地时区展示),不可在 pipeline 中途混用不同时区。
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: 交易日历对齐:合并不同市场或不同频率数据时(如日线价格 + 周频因子), 必须使用明确的交易日历进行 reindex/merge,不可使用 outer join 后 fillna, 否则会在非交易日(节假日)创建虚假数据行。
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: 增量更新边界校验:历史数据增量更新时,必须从数据库查询已存最新日期, 仅下载该日期之后的数据。若重新下载已有数据并追加,会产生时间戳重复行, 导致回测时序错误。更新前后必须校验无重复 (index.duplicated().any() == False)。
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: 回测绩效归因失真:基准(Benchmark)选择不当会使 Alpha/Beta 计算失真。 应选用策略实际可投资的被动基准(如 HS300 ETF),而非不可直接投资的 价格指数(如 HS300 指数)。价格指数不含股息再投资,会低估持仓基准收益。
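- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Differencing a cumulative sum of log returns yields ln(V_t / V_peak), not the percentage drawdown V_t / V_peak - 1; because ln(x) < x - 1 for 0 < x < 1, the log figure misstates the depth (it is larger in magnitude) unless it is converted back with exp(...) - 1.
- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems, so confirm the annualization factor before comparing across systems; otherwise the Sharpe values are not comparable.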
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio 优于 Sharpe Ratio 作为风险调整收益指标: Sharpe 假设收益正态分布,A 股/加密市场的收益分布显著左偏(肥尾), 会低估下行风险。量化评估应同时报告 Sortino(仅下行波动)和 Calmar(年化收益/最大回撤),不应单一依赖 Sharpe。
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: 回测绩效归因应拆解为:alpha(主动收益)、beta(市场收益)、 因子暴露收益(style/sector)和特异性收益(stock selection)。 不做归因的回测无法区分"策略优秀"与"顺风行情恰好 beta 对了"。
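- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns; ICIR = mean(IC) / std(IC) over the IC time series. |IC| > 0.05 is taken as preliminary evidence of predictive power and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing evidence for factor validity.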
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC 衰减(IC Decay)分析:因子预测能力通常随持仓期增长而衰减。 应计算 1/5/10/20 日 IC 序列,识别因子的最优持仓期。 IC 在1日高但20日迅速衰减的因子是短期因子,不适合月度换仓策略; 反之亦然。使用错误的持仓期会严重损害因子实盘表现。
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) 警告:学术界已发现 300+ 个"显著"因子, 其中大量是多重检验下的误发现(False Discovery)。因子有效性要求: t-stat > 3.0(而非传统的 1.96);或在不同时段/市场独立复现; 或有清晰的经济学逻辑。不满足上述条件的因子极可能是数据挖掘产物。
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: 因子换手率(Factor Turnover)控制:高 IC 但高换手率的因子,在扣除 交易成本后净 IC 可能为负。应计算换手率调整后的有效 IC: net_IC = IC - turnover × cost_per_turn。目标换手率 ≤ 50%(月频)。
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: 因子衰减期(Half-life)是因子信号强度的核心参数,直接决定最优再平衡频率。 半衰期 < 5 日:日频或周频换仓;5-20 日:周频或双周;> 20 日:月频换仓。 错误地对短期因子使用月频换仓,会导致大量 alpha 在持仓期内消散。
- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: 行业中性化(Industry Neutralization):因子值若不对行业均值中性化, 因子收益中会混入行业轮动收益,难以判断是因子本身还是行业暴露驱动了收益。 行业中性化操作:factor_neutral = factor - industry_mean(factor)。
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: 市值中性化(Market Cap Neutralization):小盘股效应(小盘跑赢大盘) 是金融史上最持久的 anomaly 之一,会污染几乎所有未中性化的因子。 若因子与市值高度相关,选股会系统性偏向小盘,收益来自市值暴露而非因子本身。 需同时进行行业和市值中性化(Fama-MacBeth 回归或残差法)。
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: 异常值处理(Winsorize/MAD):因子原始值通常含有极端值,极端值会扭曲 分组分析(如 Q1/Q10 十分位)。应对原始因子值做 Winsorize(截尾至 [1%, 99%] 或 3-sigma)或 MAD(中位数绝对偏差)缩尾,然后再排名/中性化。
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: 因子正交化(Factor Orthogonalization):当多个因子共同用于合成打分时, 高相关因子的合成等效于对单一因子过度权重,稀释信号多样性。 应在合成前对因子做施密特正交化或 PCA,消除因子间的多重共线性。
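- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill strategy: filling NaNs in factor computation (suspension, new listing, data gaps) with a mean computed over the full sample period introduces look-ahead bias, because that mean contains future information; dropping the affected stocks entirely introduces survivorship bias. The correct approach is to fill with the same-day cross-sectional median (computed over all stocks on that day, so it does not depend on the future) or to exclude the stock for that day.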
- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: 分层分析(Quantile Analysis):因子评估应使用 Q1/Q5(五分位)或 Q1/Q10(十分位)分组的多空收益差(top minus bottom spread)作为 主要评估指标,而非简单的多头收益。Q1 多 Q5 空的"单调性"检验是 因子有效性的核心证据:单调递增/递减 > 非单调 >> 仅多头有效。
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha 衰减测试(Alpha Decay Test):因子的月度 IC 在不同时段(牛市/熊市/ 震荡市)的稳定性是因子鲁棒性的重要证据。IC 仅在某个特定市场状态下有效 的因子不适合全天候部署;应分段(rolling 12M)展示 IC 时序, 识别因子失效期。
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: 换仓成本感知(Turnover-Aware Selection):因子排名靠近中间地带(49-51 分位) 的股票,排名小幅波动就会触发换仓,产生大量无效交易成本。 应在选股时设置换仓缓冲区(buffer zone):只在排名变化超过阈值时才换仓。
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: 分组收益的统计显著性(Bootstrap 检验):因子分层收益差(Q1-Q5 spread) 即使在历史数据上很大,也可能是偶然,需要 bootstrap 或 t-test 检验 显著性(p-value < 0.05)。小样本回测期(< 3年)的分层收益尤其不可靠。
- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: 因子跨市场可移植性验证:在一个市场有效的因子,不必然在另一个市场有效。 将美股因子直接套用 A 股、或将股票因子套用期货/加密货币,需要独立 IC 验证, 不可假设跨市场通用性。A 股特有异象(如反转效应、ST 价格异常)不存在于美股。
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: 因子有效性时间稳定性:曾经有效的因子会因市场学习和套利行为逐渐失效 (McLean & Pontiff 2016 证明因子发表后平均衰减 58%)。 应定期(每季度/年)重新评估因子 IC,失效因子应及时替换或降权。
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: 因子与宏观经济环境的交互:利率周期/经济周期/市场情绪对因子有效性影响显著。 价值因子(低 P/B)在利率上升期更有效;动量因子在趋势市更有效,震荡市失效。 部署因子前应评估当前宏观环境与因子最优生存环境的匹配度。
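
The IC and ICIR definitions in `SHARED-FR-IC-001` above can be sketched in pure Python as follows. This is an illustrative implementation under stated assumptions: ties receive average ranks, ICIR uses the sample standard deviation, and the function names are not taken from any of the libraries discussed here.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_ic(factor_values, next_period_returns):
    """Cross-sectional IC for one date: Spearman rank correlation."""
    return pearson(average_ranks(factor_values), average_ranks(next_period_returns))


def icir(ic_series):
    """Mean of the IC time series divided by its sample standard deviation."""
    return mean(ic_series) / stdev(ic_series)


ic_today = spearman_ic([0.3, 1.2, -0.5, 0.8], [0.01, 0.04, -0.02, 0.03])
```

In practice `spearman_ic` would be evaluated once per rebalance date and the resulting series passed to `icir`.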
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
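
Locks `SL-03` and `SL-05` lend themselves to mechanical checks. The sketch below is a stand-in, not ZVT's own classes: `SignalSketch`, `parse_entity_id`, and the regular expression are hypothetical, and the pattern assumes lowercase entity type and exchange with an alphanumeric code, which matches examples such as stock_sh_600000 and stockus_nasdaq_AAPL.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape of entity_type_exchange_code; adjust if real ids use other characters.
ENTITY_ID_RE = re.compile(r"^([a-z0-9]+)_([a-z0-9]+)_([A-Za-z0-9.]+)$")


def parse_entity_id(entity_id: str):
    """Split an entity id into (entity_type, exchange, code) or raise ValueError (SL-03)."""
    match = ENTITY_ID_RE.match(entity_id)
    if match is None:
        raise ValueError(f"entity id {entity_id!r} is not entity_type_exchange_code")
    return match.groups()


@dataclass
class SignalSketch:
    """Hypothetical signal record enforcing exactly one sizing field (SL-05)."""

    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        parse_entity_id(self.entity_id)
        sizing = (self.position_pct, self.order_money, self.order_amount)
        provided = sum(value is not None for value in sizing)
        if provided != 1:
            raise ValueError(
                f"exactly one of position_pct/order_money/order_amount required, got {provided}"
            )


signal = SignalSketch("stock_sh_600000", position_pct=0.2)
```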
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **4**
## `KUC-101`
**Source**: `easytrader/server.py`
Provides HTTP REST API endpoints for broker authentication and retrieving account balance information programmatically, enabling integration with external trading systems.
## `KUC-102`
**Source**: `tests/test_xqtrader.py`
Unit test that validates XueQiuTrader correctly handles account preparation with required parameters (cookies) and properly stores portfolio configuration.
## `KUC-103`
**Source**: `tests/test_easytrader.py`
Integration tests for YunHui (yh_client) broker trading operations including balance queries, today's trades/entrusts, and entrust cancellation functionality.
## `KUC-104`
**Source**: `tests/test_xq_follower.py`
Unit tests for XueQiuFollower that verify transaction projection and sell amount adjustment logic for portfolio mirroring operations.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-BT-001` — Cerebro 统一编排引擎
**From**: backtrader · **Applicable to**: backtesting
backtrader 用 Cerebro 作为单一入口,统一管理 data feeds、strategies、analyzers、 observers 的生命周期,支持一次 cerebro.run() 跑多策略+多数据源。 zvt 的 StockTrader 目前每次实例化只绑定一套因子,缺乏统一的多策略组合编排层; 借鉴 Cerebro 模式可让用户把多个 Trader 实例组合到一个 runner 中对比评估。
## `CW-BT-002` — Analyzer 插件化绩效评估
**From**: backtrader · **Applicable to**: backtesting
backtrader 提供 SharpeRatio、DrawDown、TimeReturn、TradeAnalyzer 等即插即用 的 Analyzer,可在不修改策略代码的情况下附加任意绩效指标。 zvt 当前绩效评估能力较弱,没有标准化的 Analyzer 接口; 借鉴此模式可让用户 cerebro.addanalyzer(SharpeRatio) 即得风险调整收益报告。
## `CW-BT-003` — Sizer 仓位管理分离
**From**: backtrader · **Applicable to**: backtesting
backtrader 将仓位管理(每次开仓买多少股/多大比例)单独抽象为 Sizer, 与信号逻辑完全解耦;内置 FixedSize、PercentSizer 等,用户可自定义。 zvt 目前没有显式的 Sizer 概念,仓位控制逻辑散落在 Trader.on_profit_control 等钩子中; 引入 Sizer 接口可使策略信号与资金管理规则独立演化和组合复用。
## `CW-BT-004` — Order 类型全集(Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader 支持 Market、Limit、Stop、StopLimit、OCO(二选一)、 Bracket(止盈止损一对订单)等丰富订单类型,并模拟成交滑点和手续费方案。 zvt 回测目前主要支持市价成交,缺乏限价委托和组合订单模拟; 对于高频或实盘对接场景,完善订单类型将大幅提升回测真实性。
## `CW-BT-005` — 数据重采样与重播(Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting
backtrader 可将低级别数据(如 1 min)实时 resample 为高级别(如 1 day)并同步驱动策略, 或 replay 逐 tick 模拟 OHLC 形成过程,实现日内精细回测。 zvt 目前多时间框架通过预录入不同级别 K 线实现,缺少运行时动态重采样; 借鉴此模式可在不重复录入数据的前提下支持任意时间粒度组合回测。
## `CW-VN-003` — CTA 回测引擎内置可视化
**From**: vnpy · **Applicable to**: backtesting
vnpy 的 cta_backtester 提供图形界面直接展示策略净值曲线、最大回撤、 每日盈亏、成交明细,无需 Jupyter Notebook。 zvt 目前回测结果可视化依赖 draw_result 方法调用 Plotly,但无统一的回测报告页面; 借鉴此模式可打包一个开箱即用的策略绩效仪表盘。
## `CW-VN-004` — vnpy.alpha ML 因子研究实验室(Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy 4.0 的 vnpy.alpha.lab 提供数据管理、模型训练、信号生成、策略回测一体化工作流, 支持 Lasso/LightGBM/MLP 等算法的标准化训练接口和可视化对比。 zvt 的 ML 能力目前仅有 MaStockMLMachine 一个入口,缺乏规范化 Lab 框架; 借鉴 Lab 模式可建立"特征工程→训练→信号→回测"的标准流水线,降低 ML 实验门槛。
## `CW-QL-001` — Point-in-Time 数据库(防未来数据泄漏)
**From**: qlib · **Applicable to**: backtesting
qlib 的 Point-in-Time Provider 保证在给定时间点 t 的查询只返回 t 时刻 真实可知的数据(财报发布延迟、修订历史均被正确处理), 彻底消除回测中的 look-ahead bias。 zvt 目前财务数据以报告期为 timestamp,缺少"发布日"维度, 存在用未来财报数据做选股的潜在偏差;引入 PIT 模式可大幅提升回测可信度。
## `CW-QL-002` — Recorder + Experiment 实验管理(MLflow 风格)
**From**: qlib · **Applicable to**: factor-research
qlib 的 workflow 模块提供 Experiment/Recorder,自动记录每次模型训练的 超参数、特征、指标、预测结果,支持跨实验比较和模型版本管理。 zvt 目前缺乏 ML 实验追踪机制,每次重跑结果会覆盖前次; 借鉴 Recorder 模式可将每次因子实验的参数和结果持久化,支持快速复现和版本对比。
## `CW-QL-003` — Nested Decision Framework(多层嵌套决策执行)
**From**: qlib · **Applicable to**: backtesting
qlib 支持将高频执行层(分钟级委托拆单)嵌套在低频决策层(日级组合调仓)中, 两层独立优化且可组合运行,实现日内最优执行算法(如 TWAP、VWAP 调仓)。 zvt 目前回测仅有日线级别的成交假设,缺乏执行算法建模; 借鉴嵌套框架可让 zvt 区分"何时持有哪些股"与"如何以最小冲击成本建仓"两个问题。
FILE:references/components/account_query.md
# account_query (6 classes)
## `ClientTrader.balance`
`account_query/clienttrader-balance.py:0`
## `ClientTrader.position`
`account_query/clienttrader-position.py:0`
## `ClientTrader.entrust`
`account_query/clienttrader-entrust.py:0`
## `XueQiuTrader.balance`
`account_query/xueqiutrader-balance.py:0`
## `refresh_strategy`
`account_query/refresh-strategy.py:0`
## `grid_strategy`
`account_query/grid-strategy.py:0`
FILE:references/components/authentication_-_connection.md
# authentication_&_connection (5 classes)
## `BaseLoginClientTrader.prepare`
`authentication_&_connection/baseloginclienttrader-prepare.py:0`
## `BaseLoginClientTrader.connect`
`authentication_&_connection/baseloginclienttrader-connect.py:0`
## `WebTrader.check_login`
`authentication_&_connection/webtrader-check-login.py:0`
## `MiniqmtTrader.connect`
`authentication_&_connection/miniqmttrader-connect.py:0`
## `login_implementation`
`authentication_&_connection/login-implementation.py:0`
FILE:references/components/order_execution.md
# order_execution (6 classes)
## `ClientTrader.buy`
`order_execution/clienttrader-buy.py:0`
## `ClientTrader.sell`
`order_execution/clienttrader-sell.py:0`
## `TradePopDialogHandler.handle`
`order_execution/tradepopdialoghandler-handle.py:0`
## `XueQiuTrader.rebalance`
`order_execution/xueqiutrader-rebalance.py:0`
## `entrust_prop`
`order_execution/entrust-prop.py:0`
## `adjust_sell`
`order_execution/adjust-sell.py:0`
FILE:references/components/remote_service_layer.md
# remote_service_layer (5 classes)
## `server.run`
`remote_service_layer/server-run.py:0`
## `RemoteClient.buy`
`remote_service_layer/remoteclient-buy.py:0`
## `RemoteClient.sell`
`remote_service_layer/remoteclient-sell.py:0`
## `RemoteClient.balance`
`remote_service_layer/remoteclient-balance.py:0`
## `ssl`
`remote_service_layer/ssl.py:0`
FILE:references/components/trade_following.md
# trade_following (6 classes)
## `BaseFollower.follow`
`trade_following/basefollower-follow.py:0`
## `JoinQuantFollower.login`
`trade_following/joinquantfollower-login.py:0`
## `RiceQuantFollower.login`
`trade_following/ricequantfollower-login.py:0`
## `XueQiuFollower.follow`
`trade_following/xueqiufollower-follow.py:0`
## `cmd_cache`
`trade_following/cmd-cache.py:0`
## `platform`
`trade_following/platform.py:0`
为 VAlpha 量化终端用户提供 A 股市场数据获取、多数据源自动切换与熔断保护,支持 Tushare/Akshare 链路 fallback,并根据积分额度自动配置请求频率限制。
---
name: eastmoney-api
description: |-
为 VAlpha 量化终端用户提供 A 股市场数据获取、多数据源自动切换与熔断保护,支持 Tushare/Akshare 链路 fallback,并根据积分额度自动配置请求频率限制。
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-084"
compiled_at: "2026-04-22T13:00:34.071788+00:00"
capability_markets: "cn-astock"
capability_activities: "data-sourcing"
sop_version: "crystal-compilation-v6.1"
---
# 东方财富接口 (eastmoney-api)
> 为 VAlpha 量化终端用户提供 A 股市场数据获取、多数据源自动切换与熔断保护,支持 Tushare/Akshare 链路 fallback,并根据积分额度自动配置请求频率限制。
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (26 total)
### VAlpha Terminal Entry Point (`UC-101`)
Provides unified entry point for starting FastAPI server or running pre/post-market analysis
**Triggers**: start, server, run
### FastAPI Application Factory (`UC-102`)
Creates and configures FastAPI application instance with CORS, routers, and lifespan management
**Triggers**: application, fastapi, server
### Static File Serving and SPA Routing (`UC-103`)
Serves frontend static files and implements SPA catch-all routing for client-side navigation
**Triggers**: static, frontend, spa
For all **26** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-DATA-SOURCING-001`**: Missing or invalid User-Agent headers for SEC API requests
- **`AP-DATA-SOURCING-002`**: Ignoring external API rate limits causing IP blocking
- **`AP-DATA-SOURCING-003`**: No HTTP timeout configuration causing indefinite hangs
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-084. Evidence verify ratio = 36.8% and audit fail total = 26. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | Full V6+ source of truth | Must read when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-084` blueprint at 2026-04-22T13:00:34.071788+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **use_cases**:
  - Static File Serving and SPA Routing
  - FastAPI Application Factory
  - VAlpha Terminal Entry Point
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-070--edgartools (2)
### `AP-DATA-SOURCING-004` — Invalidating XBRL period types for balance sheet analysis <sub>(high)</sub>
Balance sheets represent point-in-time snapshots (instant periods), not ranges (duration periods). Using duration periods for balance sheet statements causes stockholder equity and other line items to show nonsensical date ranges, corrupting financial calculations that depend on accurate period associations.
### `AP-DATA-SOURCING-012` — Large document parsing without streaming causing OOM errors <sub>(high)</sub>
SEC filings can exceed 160MB, and parsing large documents in memory without streaming causes OOM errors that crash the entire service for all users. Documents exceeding 10MB require switching to streaming parsers to prevent extreme memory usage.
## finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-002` — Ignoring external API rate limits causing IP blocking <sub>(high)</sub>
Multiple financial data sources (SEC EDGAR, Sina, Eastmoney, TuShare) enforce strict rate limits (10 req/sec, 120 calls/minute). Exceeding these triggers temporary IP blocks lasting 10-60 minutes, causing complete data unavailability. Immediate retry attempts during blocks extend the block duration significantly.
## finance-bp-070--edgartools, finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-001` — Missing or invalid User-Agent headers for SEC API requests <sub>(high)</sub>
SEC EDGAR requires valid User-Agent identity with contact information in headers. Without this, requests are rejected with 403 Forbidden errors, completely blocking all filing access. Both edgartools and edgar-crawler enforce this constraint as fundamental to any data retrieval operation.
## finance-bp-079--akshare (4)
### `AP-DATA-SOURCING-003` — No HTTP timeout configuration causing indefinite hangs <sub>(high)</sub>
HTTP requests to external financial data sources (Yahoo, Sina, Eastmoney) without timeout values can hang indefinitely on blocked connections. This freezes the entire application and prevents data collection from all other sources, creating cascading failures across the system.
### `AP-DATA-SOURCING-005` — Malformed or empty JSON responses causing silent failures <sub>(medium)</sub>
Financial API responses containing malformed JSON raise unhandled ValueError exceptions, crashing downstream processing. Similarly, empty JSON responses (empty dict, list, null) masquerading as valid data cause silent failures producing empty DataFrames or misleading results in financial analysis.
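A minimal sketch of guarding against both failure modes described above, using only the standard library. The function name `parse_payload` and the `EmptyPayloadError` class are hypothetical helpers introduced for illustration, not part of akshare.

```python
import json


class EmptyPayloadError(ValueError):
    """Raised when a response parses as JSON but carries no data (hypothetical helper)."""


def parse_payload(raw_text: str):
    """Parse a JSON response body and reject malformed or empty payloads."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError subclasses ValueError; re-raise with context
        # instead of letting it crash downstream processing unannounced.
        raise ValueError(f"malformed JSON response: {exc}") from exc
    # null, {}, [] and "" are valid JSON but contain nothing usable.
    if payload is None or payload == {} or payload == [] or payload == "":
        raise EmptyPayloadError("response parsed but contains no data")
    return payload
```

Raising on an empty payload is a deliberate choice in this sketch: it keeps an empty DataFrame from being built silently further down the pipeline.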
### `AP-DATA-SOURCING-006` — Source-specific symbol mapping errors causing data corruption <sub>(high)</sub>
Stock symbols require source-specific formatting (sh/sz prefixes for Sina, numeric codes for THS, etc.). Incorrect symbol mapping causes API calls to return empty results or wrong data, corrupting financial datasets with missing records or entirely incorrect tickers being stored.
### `AP-DATA-SOURCING-013` — Column mapping length mismatch causing DataFrame errors <sub>(medium)</sub>
Column mapping constants with length mismatch against actual API response columns cause ValueError exceptions during DataFrame construction. Raw field names (f1, f2, f12) must be mapped to meaningful names (最新价, 涨跌幅) with exact column count alignment.
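One way to fail fast on a mapping mismatch is to compare the raw field names with the mapping keys before renaming. The sketch below uses plain dictionaries rather than a DataFrame; `rename_columns` and the contents of `COLUMN_MAP` are illustrative assumptions, not the real akshare constants.

```python
# Illustrative mapping only; the real field list depends on the endpoint.
COLUMN_MAP = {"f2": "last_price", "f3": "pct_change", "f12": "code"}


def rename_columns(rows, column_map):
    """Rename raw API fields, raising if any row's fields differ from the mapping."""
    expected = set(column_map)
    for i, row in enumerate(rows):
        missing = expected - set(row)
        unexpected = set(row) - expected
        if missing or unexpected:
            raise ValueError(
                f"row {i}: missing={sorted(missing)} unexpected={sorted(unexpected)}"
            )
    return [{column_map[key]: value for key, value in row.items()} for row in rows]
```

Checking names rather than only counts also catches the case where the API swaps one field for another while the column count stays the same.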
## finance-bp-103--ArcticDB (3)
### `AP-DATA-SOURCING-007` — Using unsupported DataFrame types with time-series storage <sub>(high)</sub>
ArcticDB does not support MultiIndex columns, PyArrow-backed pandas DataFrames, or timedelta64 columns. Attempting to write these DataFrame types raises ArcticDbNotYetImplemented exceptions, causing write failures and permanent data loss if not properly handled before storage operations.
### `AP-DATA-SOURCING-008` — Non-atomic storage writes causing concurrent access corruption <sub>(high)</sub>
Storage backends without atomic write_if_none operations can cause data corruption under concurrent multi-writer access. Similarly, updating reference keys before atom keys complete allows readers to access incomplete or missing data, breaking version chain integrity.
### `AP-DATA-SOURCING-014` — Pruning snapshot-protected versions breaking point-in-time recovery <sub>(high)</sub>
Deleting or pruning versions that are referenced by existing snapshots breaks historical data access. Snapshots provide point-in-time recovery capabilities, and removing their referenced versions causes read failures when users attempt to access data from specific snapshots.
## finance-bp-114--edgar-crawler (1)
### `AP-DATA-SOURCING-010` — 8-K filing item numbering scheme mismatch for historical filings <sub>(medium)</sub>
8-K filings use obsolete item numbering (1-12) before 2004-08-23 and new numbering (1.01-9.01) after. Using the wrong numbering scheme causes no matches for historical filings, resulting in empty item sections and complete extraction failure for pre-2004 data.
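The date-based switch can be sketched as below. The regular expressions and the `find_items` helper are simplified assumptions for illustration and are not edgar-crawler's actual patterns. Treating the cutover date itself as belonging to the new scheme is also an assumption.

```python
import re
from datetime import date

# Cutover date quoted in the text above.
NEW_SCHEME_START = date(2004, 8, 23)

# Legacy items are plain integers (1-12); the lookahead avoids matching "2" in "2.02".
LEGACY_ITEM = re.compile(r"Item\s+(\d{1,2})\b(?!\.\d)", re.IGNORECASE)
# New items look like 1.01 ... 9.01.
NEW_ITEM = re.compile(r"Item\s+(\d\.\d{2})\b", re.IGNORECASE)


def find_items(text: str, filing_date: date):
    """Return item numbers found in text, using the scheme valid on filing_date."""
    pattern = NEW_ITEM if filing_date >= NEW_SCHEME_START else LEGACY_ITEM
    return pattern.findall(text)
```

Applying the legacy pattern to a new-style heading returns an empty list, which is the "no matches" symptom the anti-pattern describes.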
## finance-bp-128--yfinance (2)
### `AP-DATA-SOURCING-009` — Missing timezone-aware DatetimeIndex causing DST offset errors <sub>(high)</sub>
Price history DataFrames returned without timezone-aware DatetimeIndex cause incorrect timestamp interpretation when combined with other timezone-aware data. This leads to 23-25 hour offset errors during daylight saving time transitions, corrupting historical price calculations.
### `AP-DATA-SOURCING-011` — Yahoo Finance missing crumb authentication causing 401/403 errors <sub>(high)</sub>
Yahoo Finance API requires crumb and cookie authentication with every request. Without proper crumb management, API calls return 401 Unauthorized or HTML error pages instead of JSON data, breaking all downstream price and financial data processing.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-084--eastmoney
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 37, 'total_functions': 0, 'total_stages': 7}
## Modules (7)
- [data_collection](components/data_collection.md): 5 classes
- [factor_computation](components/factor_computation.md): 5 classes
- [recommendation_engine](components/recommendation_engine.md): 7 classes
- [analysis_&_reporting](components/analysis_-_reporting.md): 5 classes
- [portfolio_management](components/portfolio_management.md): 5 classes
- [scheduled_tasks](components/scheduled_tasks.md): 4 classes
- [llm_services](components/llm_services.md): 6 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 229
fatal_constraints_count: 29
non_fatal_constraints_count: 254
use_cases_count: 26
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (47)
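- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-shares settle T+1: shares bought on day T can be sold no earlier than T+1, while cash from a sale on day T can be reused for purchases the same day. A backtest framework that does not lock T+1 positions overstates turnover and win rate, and is especially damaging to the realism of intraday reversal strategies.
- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: Main-board stocks in Shanghai and Shenzhen have a daily price limit of ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up, buy orders effectively cannot fill; when it is locked at limit-down, sell orders cannot fill. A backtest that assumes a fill at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill-simulation layer.
- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: The STAR Market and ChiNext (after the August 2020 reform) have a ±20% daily limit on normal trading days; the Beijing Stock Exchange has ±30%; newly listed stocks have no price limit for their first 5 trading days. A backtest that applies a uniform ±10% filter to all stocks will wrongly exclude or wrongly include fills on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily limit and very poor liquidity, so their fill assumptions must not be shared with normal stocks. Excluding historical ST stocks (those eventually delisted) from the backtest creates survivorship bias; including them without applying the ST price limit simulates fills incorrectly.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the trade price is set by the maximum-volume principle rather than by continuous matching. A backtest that assumes immediate full fills at the open or close price underestimates real slippage risk, most visibly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Trading suspensions: during long A-share suspensions (which could last months before 2018), the capital in the position is locked and cannot be rebalanced, an opportunity cost that backtests commonly ignore. Filter out suspended days (volume == 0 or is_suspended == True) before factor computation and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: Newly listed stocks have no price limit for their first 5 trading days (first-day gains can exceed 300%) and lack a full history, so moving-average, volatility, and turnover factors cannot be computed. Filter out stocks listed for fewer than N trading days (typically 60-252) before factor computation.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share program-trading rules (effective July 7, 2025): an account that submits or cancels ≥ 300 orders per second, or ≥ 20,000 orders per day, is classified as high-frequency trading and must report to the exchange. An AI-generated quant strategy that exceeds these frequencies cannot run compliantly; flag this at strategy design time.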
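- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on an ex-rights/ex-dividend date is a bookkeeping adjustment, not a real loss. Choice of price adjustment: unadjusted prices inflate strategy losses; forward adjustment (qfq) embeds future dividend information in historical prices (lookahead bias); backward adjustment (hfq), which accumulates from the first listing day, is the best choice for quantitative backtesting.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: Financial reports of A-share listed companies are disclosed with a statutory delay: annual reports by April 30 of the following year, semi-annual reports by August 31, and quarterly reports by April 30 (Q1) and October 31 (Q3). When a backtest uses financial data, the actual disclosure date (announcement_date), not the end of the accounting period, must be the point at which the data becomes available; otherwise point-in-time lookahead bias is introduced.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Bonus shares, capitalization issues, and rights issues increase share capital after the ex-date: the historical share count is unchanged while the price shrinks proportionally. If the backtest system does not adjust the held share count in step, it records a spurious loss or gain on the ex-date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Price gap between block trades and auction trading: block trades can execute at up to a 10% discount to the market price (main board), but that price does not affect the next day's auction open. Block-trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal calculation of closing price and volume.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly, the pool of lendable securities is limited (mainly large-cap blue chips; lendable small and mid caps are extremely scarce), and securities-lending rates are far higher than margin-financing rates. A backtest that assumes any stock can be shorted produces a non-executable strategy whose live results diverge badly from the backtest.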
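- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), aggregate foreign ownership is capped at 30%, with a warning line at 28%. When foreign ownership reaches 28%, the Stock Exchange of Hong Kong suspends new buy orders for that stock until the ratio falls back to 26%. Strategies concentrated in stocks favored by foreign investors (consumer and pharmaceutical leaders) need to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% disclosure rule: a single investor holding more than 5% of a listed company's issued shares must report to the CSRC and the exchange and make an announcement within 3 days, and may not trade the stock during that period or within 2 days after the announcement. A quant stock-selection system that ignores this rule will be forced to stop buying once a concentrated position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: The "double ten" rule for public mutual funds: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same manager together may hold no more than 10% of a company's issued shares. A quant portfolio deployed in a public fund must enforce these compliance caps as optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: all input data for an AI-assisted quant system must come from publicly disclosed information. Automated trading triggered through non-public channels (private data services, inside tips, advance knowledge of a restructuring) constitutes insider trading under Articles 80-83 of the Securities Law and the Guidelines for the Determination of Insider Trading.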
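- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using current A-share index constituents (such as today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated in 2020-2024 (a record 41 per year), making this bias increasingly severe. Point-in-time historical snapshots are mandatory.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index rebalancing effect: the CSI 300, CSI 500, and similar indices rebalance every six months (June and December). Stocks being added typically rise markedly between the announcement date and the effective date (forced buying by passive funds), while removed stocks do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing window.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, holdings overlap heavily and a market shock triggers collective selling and a stampede. The February 2024 A-share quant crisis is the typical case (small-cap indices fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share quant hedging strategies commonly use IF/IC/IM stock index futures to go long or short against systematic risk. However, A-share index futures trade at a persistent discount (futures price < spot), and the annualized IC discount can reach 10-20%. A backtest that considers only price returns and ignores the futures discount or premium seriously overstates the net return of a hedged strategy.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor in A-shares points in the opposite direction from US equities: the best performers over the past month are likely to reverse in the following month (a reversal effect rather than momentum). Institutional research (Huatai Securities, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.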
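- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors, who tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange confirms that more than 90% of individual accounts show this effect. An AI-assisted tool should not indulge the "hold the loser until it breaks even" intuition; it should give disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are retail-dominated (individual accounts make up more than 80% of trading volume) and herding is pronounced: retail investors tend to follow the crowd, causing irrational price swings (such as the leveraged bull and bear market of 2015). Quant strategies should avoid using volume rankings, popularity rankings, or similar indicators that may reinforce herding signals as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: The overconfidence effect (Barber & Odean 2000) is more severe among A-share retail investors: their average annual turnover exceeds 500%, and institutions significantly outperform them over the long run. High-turnover strategies often have lower net returns after trading costs. AI should not encourage frequent trading; it should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and the 1-3 days after the holiday) and the turn-of-month effect (the first 1-5 trading days of the month outperform mid-month and month-end) have academic support (Nanjing University of Finance and Economics, among others). Strategies should lower signal confidence in these calendar windows, or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small-cap and micro-cap stocks have average daily turnover of only a few million yuan, so large buys or sells cause severe price impact, and real strategy capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to capital in the hundreds of millions; add a cap on participation as a fraction of traded volume to the backtest.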
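- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share trading cost structure (after the August 2023 adjustment): stamp duty 0.05%, sell side only; commission about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai) 0.001%; slippage and impact cost for small caps 0.1%-0.5% per trade. Annualized returns from a backtest that ignores costs are misleading, especially for high-frequency or high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact cost is usually missing entirely from backtests, yet it can be the largest cost in live trading. A 1 million yuan purchase of an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size rather than a linear relation; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New rules on share reductions by major shareholders, directors, supervisors, and senior management (CSRC Order No. 224, May 2024): shareholders holding 5% or more who sell through centralized auction must disclose the reduction plan 15 trading days in advance and may sell no more than 1% of total shares within 3 months. Selling pressure from unlocked restricted shares is a systematic risk factor specific to A-shares; ignoring the unlock calendar in a backtest understates the holding risk of the affected stocks.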
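- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the civil calendar: there are make-up workdays (working Saturdays) created by statutory holiday shifts, and ad hoc market closures (an emergency closure from July 8 to July 10, 2015 during the market crash). Deriving A-share trading days from a generic weekday calendar produces errors; use a dedicated A-share trading calendar (such as exchange_calendars or the tushare trading-calendar endpoint).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: The code of a delisted A-share stock may be reused by a new listing (very rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or the listing date range, can confuse historical data with the current stock; take particular care in long-horizon backtests.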
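- **`SHARED-DS-RL-001`** <sub>(fatal)</sub>: Rate limiting plus exponential-backoff retry: every external data API call must implement rate limiting and exponential backoff with jitter. Retrying immediately after a 429/503 response is an anti-pattern that increases server load and triggers IP bans. Use at most 3-5 retries, a backoff base of 1-2 seconds, and a maximum backoff of 60 seconds.
- **`SHARED-DS-RL-002`** <sub>(high)</sub>: Batch API calls must bound concurrency (max_workers) and must not parallelize without limit. Free APIs (akshare, tushare free tier) usually allow 1-3 concurrent requests; paid APIs also have concurrency caps (tushare uses a points system, with different concurrency at different point levels). Exceeding the limit triggers 429 responses or IP bans. Control concurrency explicitly with asyncio.Semaphore or the max_workers parameter of ThreadPoolExecutor.
- **`SHARED-DS-RL-003`** <sub>(high)</sub>: API token and credential security: data-source API keys (tushare tokens; akshare needs no token, but other commercial sources do) must not be hard-coded and must be read from environment variables or a configuration file. A hard-coded token committed to Git leaks the token and can incur charges.
- **`SHARED-DS-RL-004`** <sub>(medium)</sub>: Request throttling: batch requests to the same API should leave a minimum interval between requests (some akshare endpoints require ≥ 0.5 s; the tushare free tier allows 200 calls per minute). A bare sleep in code is less precise than a token-bucket algorithm; a mature library such as ratelimit or slowapi is recommended.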
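- **`SHARED-DS-MISS-001`** <sub>(high)</sub>: Missing data on suspended days: a suspended stock has no trades during the suspension, so date gaps appear in the database. Do not forward-fill the missing dates (that creates fake volume); instead mark them with is_suspended=True, fill volume and turnover with 0, and keep the previous close as the price. Factor computation must filter out rows with is_suspended=True.
- **`SHARED-DS-MISS-002`** <sub>(medium)</sub>: History boundary for newly listed stocks: a new stock appears in the database from its first listing day and has no history before that. If a factor's lookback period exceeds the number of days since listing, every factor value is NaN. Record each stock's listing date (list_date) at collection time, and start collection from the listing date rather than from a fixed start date.
- **`SHARED-DS-MISS-003`** <sub>(high)</sub>: Data completeness for delisted stocks: mainstream sources (akshare, tushare) still return the pre-delisting history of delisted stocks, but nothing after the delisting date. The historical universe must include delisted stocks (otherwise survivorship bias results), and collection must handle the delisting-date cutoff explicitly.
- **`SHARED-DS-MISS-004`** <sub>(high)</sub>: Cross-source reconciliation: the same data (such as a closing price) can differ slightly between sources (akshare, tushare, baostock) because of different adjustment methods, holiday handling, or ex-dividend adjustment timing. Run a multi-source reconciliation check in the pipeline, and when the difference exceeds a threshold (such as 0.1%), log an alert for manual confirmation.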
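- **`SHARED-DS-TIME-001`** <sub>(high)</sub>: Timestamp precision and type consistency: timestamps in the database should use one data type (timestamp, not varchar/int). Mixing string dates ('2024-01-15') with Timestamp objects is a common source of subtle bugs in comparisons, indexing, and merges; coerce types at the pipeline entry point.
- **`SHARED-DS-TIME-002`** <sub>(high)</sub>: Trading time versus wall-clock time: the "date" of daily bar data usually refers to the trading day (day T), whereas the "time" of news and announcement data is wall-clock time. When merging the two, map wall-clock time to the next available trading day; otherwise an announcement made on day T is treated as already available during day T's session, which is a lookahead problem.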
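- **`SHARED-DS-INCR-001`** <sub>(high)</sub>: Idempotent incremental updates: data update scripts must be idempotent (repeated runs yield the same result). If a script fails midway because of a network interruption, rerunning it must not create duplicates or gaps. Implementation: write to a temporary table first, validate, then UPSERT into the main table rather than INSERT/APPEND directly.
- **`SHARED-DS-INCR-002`** <sub>(high)</sub>: Data integrity checks (checksums and row counts): after every update, validate key fields: whether the row count is within the expected range, whether prices are positive, and whether dates are continuous (no missing trading days). A pipeline without automated validation is the root of "silent rot".
- **`SHARED-DS-INCR-003`** <sub>(medium)</sub>: Data versioning: pipeline outputs should be versioned. When a source revises historical data (such as restated financials), the old version should remain traceable rather than being silently overwritten, so that versions can be compared and historical backtests reproduced.
- **`SHARED-DS-INCR-004`** <sub>(medium)</sub>: Alignment to trading-calendar boundaries: after collection, verify that every stock or asset's data coverage is complete and consistent with the trading calendar. Every stock should have one row per trading day (a suspension marker, not a gap). Checking the NaN ratio with pivot_table is an effective quick diagnostic.
- **`SHARED-DS-INCR-005`** <sub>(medium)</sub>: Caching: static or slowly changing data that is read often (stock information, industry classification, index constituents) should be cached locally to avoid repeated API calls on every run. The cache must have an expiry (TTL) so that stale industry classifications or outdated constituent lists are not used.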
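
As a worked illustration of `SHARED-CN-ASTOCK-COST-001`, the sketch below computes the explicit cost of a single trade from the rates quoted in that constraint. The function `trade_cost` and its parameter names are hypothetical, and slippage and impact cost are deliberately left out because they depend on liquidity.

```python
def trade_cost(notional, side, exchange="sh",
               stamp_duty=0.0005,       # 0.05%, sell side only
               commission_rate=0.0001,  # about 0.01%, both sides
               min_commission=5.0,      # yuan
               transfer_fee=0.00001):   # 0.001%, Shanghai only per the constraint
    """Return the explicit cost in yuan of one trade (slippage excluded)."""
    if side not in ("buy", "sell"):
        raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")
    commission = max(notional * commission_rate, min_commission)
    duty = notional * stamp_duty if side == "sell" else 0.0
    transfer = notional * transfer_fee if exchange == "sh" else 0.0
    return commission + duty + transfer
```

Under these rates a 100,000 yuan Shanghai round trip costs about 11 yuan on the buy and about 61 yuan on the sell, while a 10,000 yuan buy is dominated by the 5 yuan commission floor.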
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
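A minimal sketch of next-bar execution on plain lists: the signal computed from bar `i` is paired with the price of bar `i + 1`. Filling at the next bar's open is an assumption of this sketch, and `next_bar_fills` is a hypothetical helper rather than a ZVT function.

```python
def next_bar_fills(signals, opens):
    """Pair each non-empty signal on bar i with the open of bar i + 1.

    Returns a list of (fill_bar_index, signal, fill_price) tuples.
    """
    if len(signals) != len(opens):
        raise ValueError("signals and opens must have the same length")
    fills = []
    # The last bar's signal has no following bar yet, so it cannot fill.
    for i, signal in enumerate(signals[:-1]):
        if signal is not None:
            fills.append((i + 1, signal, opens[i + 1]))
    return fills
```

Because a fill never uses the price of the bar that produced the signal, the backtest cannot trade on information that was not yet available.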
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
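The format can be checked before any query is issued. The sketch below splits on the first two underscores so that codes containing further underscores survive; `parse_entity_id` is a hypothetical helper, and the examples come from IDs quoted elsewhere in this bundle.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its three parts or raise ValueError."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"entity id must look like entity_type_exchange_code: {entity_id!r}")
    return EntityId(*parts)
```

For example, `parse_entity_id("stock_sh_600000")` yields the parts `stock`, `sh`, and `600000`, while a bare code such as `600000` is rejected.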
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
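The exactly-one rule can be enforced at construction time. `SizedSignal` below is a hypothetical stand-in that carries only the three sizing fields; it is not ZVT's `TradingSignal` class.

```python
from dataclasses import dataclass
from typing import Optional

_SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SizedSignal:
    """Illustrative stand-in holding only the three mutually exclusive sizing fields."""
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        given = [name for name in _SIZING_FIELDS if getattr(self, name) is not None]
        if len(given) != 1:
            raise ValueError(
                f"exactly one of {_SIZING_FIELDS} must be set, got {given or 'none'}"
            )
```

Comparing against `None` rather than testing truthiness matters here: a value of `0.0` still counts as "set".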
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
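The three-valued semantics are easy to break with a plain truthiness test, because `bool(float("nan"))` is `True` in Python. A minimal sketch of a safe per-cell mapping follows; `action_for` is a hypothetical helper, and it assumes built-in `bool` values (NumPy boolean scalars would need an extra conversion).

```python
import math


def action_for(value):
    """Map one filter_result cell to 'BUY', 'SELL', or None (no action)."""
    # Check for missing values first: NaN is truthy, so `if value:` alone
    # would silently turn a missing signal into a BUY.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    if isinstance(value, bool):
        return "BUY" if value else "SELL"
    raise ValueError(f"unexpected filter_result value: {value!r}")
```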
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
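For reference, a pure-Python sketch of MACD with the locked 12/26/9 parameters. It uses the common recursive EMA with `alpha = 2 / (span + 1)` seeded with the first value; seeding and histogram scaling vary between libraries, so this is an illustration under those assumptions and not necessarily how ZVT computes it.

```python
FAST, SLOW, SIGNAL = 12, 26, 9  # locked by SL-08


def ema(values, span):
    """Recursive exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for value in values[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def macd(closes):
    """Return (diff, dea, hist) lists for a non-empty list of closing prices."""
    fast_ema = ema(closes, FAST)
    slow_ema = ema(closes, SLOW)
    diff = [f - s for f, s in zip(fast_ema, slow_ema)]  # MACD line
    dea = ema(diff, SIGNAL)                             # signal line
    hist = [d - e for d, e in zip(diff, dea)]           # histogram
    return diff, dea, hist
```

Keeping the three spans as module constants rather than arguments makes an accidental override visible in code review.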
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
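A T+1 lock can be sketched by tracking the purchase date of each lot and only releasing lots bought before the current trading day. `sellable_quantity` and the `(buy_date, quantity)` lot layout are assumptions made for this illustration.

```python
from datetime import date


def sellable_quantity(lots, today: date) -> int:
    """Sum the quantity of lots bought strictly before `today`.

    `lots` is an iterable of (buy_date, quantity) pairs. Shares bought on
    `today` stay locked until the next trading day.
    """
    return sum(quantity for buy_date, quantity in lots if buy_date < today)
```

The comparison is on calendar dates, which is sufficient as long as `today` is itself a trading day.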
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **26**
## `KUC-101`
**Source**: `main.py`
Provides unified entry point for starting FastAPI server or running pre/post-market analysis.
## `KUC-102`
**Source**: `app/main.py`
Creates and configures FastAPI application instance with CORS, routers, and lifespan management.
## `KUC-103`
**Source**: `app/static.py`
Serves frontend static files and implements SPA catch-all routing for client-side navigation.
## `KUC-104`
**Source**: `app/routers/alerts.py`
Manages user notifications across portfolios including unread counts, marking as read, and dismissing alerts.
## `KUC-105`
**Source**: `app/routers/auth.py`
Handles user registration, login, and JWT token generation for secure API access.
## `KUC-106`
**Source**: `app/routers/stocks.py`
Manages user's stock watchlist with CRUD operations, real-time quotes, and financial data retrieval.
## `KUC-107`
**Source**: `app/routers/sentiment.py`
Analyzes market sentiment from news and generates AI-powered sentiment reports.
## `KUC-108`
**Source**: `app/routers/generate.py`
Generates pre-market and post-market investment reports for funds or each user's portfolios.
## `KUC-109`
**Source**: `app/routers/market.py`
Provides fund search functionality and real-time market data using Akshare and TuShare APIs.
## `KUC-110`
**Source**: `app/routers/health.py`
Provides basic health check endpoint for system monitoring and load balancer checks.
## `KUC-111`
**Source**: `app/routers/assistant.py`
Provides conversational AI assistant with RAG-enhanced responses for investment queries.
## `KUC-112`
**Source**: `app/routers/recommendations.py`
Generates AI investment recommendations using quantitative factor-based engine for stocks and funds.
## `KUC-113`
**Source**: `app/routers/preferences.py`
Manages user's investment preferences including risk level presets and portfolio settings.
## `KUC-114`
**Source**: `app/routers/widgets.py`
Provides pre-aggregated market data for dashboard widgets including northbound flow and sector performance.
## `KUC-115`
**Source**: `app/routers/dashboard.py`
Provides dashboard overview, system statistics, and customizable layout management.
## `KUC-116`
**Source**: `app/routers/details.py`
Retrieves detailed stock information including spot data, historical prices, and financial indicators.
## `KUC-117`
**Source**: `app/routers/admin.py`
Provides admin endpoints for system testing, LLM connection verification, and web search testing.
## `KUC-118`
**Source**: `app/routers/funds.py`
Manages investment funds with diagnosis, risk metrics, drawdown analysis, and comparison features.
## `KUC-119`
**Source**: `app/routers/commodities.py`
Analyzes gold and silver commodities with price trends and investment insights.
## `KUC-120`
**Source**: `app/routers/reports.py`
Manages generated reports including listing, viewing, and organizing pre/post-market analysis files.
## `KUC-121`
**Source**: `app/routers/settings.py`
Manages application settings including LLM provider configuration and API key management.
## `KUC-122`
**Source**: `app/routers/portfolios.py`
Manages portfolios with unified positions, transactions, DIP plans, AI rebalancing, stress testing, and correlation analysis.
## `KUC-123`
**Source**: `app/routers/compare.py`
Compares multiple stocks side-by-side with metrics including price, PE, PB, market cap, and turnover.
## `KUC-124`
**Source**: `app/routers/news.py`
Aggregates and personalizes financial news feed with categories and bookmarking functionality.
## `KUC-125`
**Source**: `test_scan.py`
Tests raw TuShare API data access for money flow and HSGT northbound data scanning.
## `KUC-126`
**Source**: `test_hsgt_min.py`
Tests high-frequency northbound capital flow minute-level data retrieval.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-DATA-SOURCING-001` — Exponential backoff retry with rate limit detection
**From**: finance-bp-079--akshare, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Implement retry logic with exponential backoff specifically for HTTP 429 rate limit responses. Retrying immediately on rate limit errors worsens the block. Separating the retry path for transient network errors (TimeoutError, ConnectionError) from the handling of permanent errors (ValueError, KeyError) avoids wasted retries and keeps underlying bugs from being masked.
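A minimal sketch of this pattern using "full jitter" delays. `RateLimitError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the defaults (4 retries, 1 s base, 60 s cap) are illustrative values.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for whatever the HTTP layer raises on a 429 (hypothetical)."""


def backoff_delays(max_retries=4, base=1.0, cap=60.0, rng=random.random):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(fn, max_retries=4, base=1.0, cap=60.0, sleep=time.sleep,
                    retryable=(RateLimitError, TimeoutError, ConnectionError)):
    """Call fn(), retrying only transient errors with exponential backoff."""
    delays = backoff_delays(max_retries, base, cap)
    while True:
        try:
            return fn()
        except retryable:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last transient error
            sleep(delay)
        # Anything else (ValueError, KeyError, ...) propagates immediately.
```

Passing `sleep` as a parameter keeps the function testable without real waiting.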
## `CW-DATA-SOURCING-002` — Strict date format validation and standardization
**From**: finance-bp-070--edgartools, finance-bp-079--akshare, finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Validate date formats strictly (YYYY-MM-DD pattern with leap year and month-end checks) before processing XBRL or API data. Convert date strings between formats (YYYYMMDD to YYYY-MM-DD) when storing to databases. Invalid dates corrupt downstream financial calculations.
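A minimal sketch of strict validation and normalization using only the standard library. `normalize_date` is a hypothetical helper; it accepts exactly `YYYY-MM-DD` or `YYYYMMDD` and relies on `datetime.date` to reject impossible calendar dates.

```python
import re
from datetime import date

_PATTERNS = (
    re.compile(r"(\d{4})-(\d{2})-(\d{2})"),  # YYYY-MM-DD
    re.compile(r"(\d{4})(\d{2})(\d{2})"),    # YYYYMMDD
)


def normalize_date(value: str) -> str:
    """Return value as YYYY-MM-DD, or raise ValueError if it is not a real date."""
    for pattern in _PATTERNS:
        match = pattern.fullmatch(value)
        if match:
            year, month, day = map(int, match.groups())
            # date() raises ValueError for e.g. Feb 29 in a non-leap year or Apr 31.
            return date(year, month, day).isoformat()
    raise ValueError(f"unrecognized date format: {value!r}")
```

Matching with `fullmatch` before constructing the date rejects loosely padded inputs such as `2024-2-9`, which `strptime` alone would accept.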
## `CW-DATA-SOURCING-003` — XBRL fact attribute completeness enforcement
**From**: finance-bp-070--edgartools, finance-bp-114--edgar-crawler · **Applicable to**: data-sourcing
Extract and validate all essential XBRL fact attributes (concept, value, period, unit) from every fact. Missing attributes cause financial analysis queries to return incomplete or misleading results. Period type (instant vs duration) must be correctly distinguished for accurate balance sheet rendering.
## `CW-DATA-SOURCING-004` — Streaming parser threshold for large documents
**From**: finance-bp-070--edgartools, finance-bp-128--yfinance · **Applicable to**: data-sourcing
Implement streaming parser activation when documents exceed configurable thresholds (10MB default). This prevents OOM errors on large NPORT-P filings or bulk document downloads. Also require timezone information for time-series data to prevent DST offset corruption.
## `CW-DATA-SOURCING-005` — Data accuracy disclaimer requirements
**From**: finance-bp-079--akshare, finance-bp-128--yfinance, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always present scraped or third-party financial data with proper caveats about accuracy limitations and delays. Claims of guaranteed accuracy, real-time capabilities, or Yahoo/provider affiliation violate terms of service and can lead to user financial losses from reliance on delayed or incorrect data.
## `CW-DATA-SOURCING-006` — Atomic write ordering for versioned storage
**From**: finance-bp-103--ArcticDB · **Applicable to**: data-sourcing
Write atom keys (TABLE_DATA, TABLE_INDEX, VERSION) before updating mutable reference keys (VERSION_REF, SNAPSHOT_REF). Never modify atom keys after writing to preserve content-addressed storage invariants. This prevents readers from accessing incomplete data in multi-writer scenarios.
## `CW-DATA-SOURCING-007` — HTTP status code validation before data processing
**From**: finance-bp-079--akshare, finance-bp-097--OpenBB · **Applicable to**: data-sourcing
Always validate HTTP response status codes before processing response data. Error responses (404, 500) may contain HTML error pages that corrupt downstream JSON parsing. Explicitly check for HTTP 429 and raise RateLimitError for proper handling by callers.
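The check order can be sketched independently of any HTTP client by passing in the status code and body. `decode_response` and this `RateLimitError` class are hypothetical and do not reproduce the akshare or OpenBB API.

```python
import json


class RateLimitError(Exception):
    """Raised on HTTP 429 so callers can back off (hypothetical class)."""


def decode_response(status_code: int, body: str):
    """Validate the status code first, then parse the body as JSON."""
    if status_code == 429:
        raise RateLimitError("rate limited (HTTP 429)")
    if not 200 <= status_code < 300:
        # Error bodies are often HTML; never hand them to the JSON parser.
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    return json.loads(body)
```

Raising a dedicated exception for 429 lets the caller route it to a backoff path while treating other error statuses as failures.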
## `CW-DATA-SOURCING-008` — Quality gates for financial recommendations
**From**: finance-bp-084--eastmoney · **Applicable to**: data-sourcing
Apply fundamental quality filters (ROE thresholds, OCF/Profit ratios, debt ratios) before generating financial recommendations. Without quality gates, low-quality stocks may be recommended for positions, leading to investment losses. Separate on-demand computation from scheduled pre-computation to handle API rate limits.
FILE:references/components/analysis_-_reporting.md
# analysis_&_reporting (5 classes)
## `PreMarketAnalyst.analyze`
`analysis_&_reporting/premarketanalyst-analyze.py:0`
## `PostMarketAnalyst.analyze`
`analysis_&_reporting/postmarketanalyst-analyze.py:0`
## `GoldSilverAnalyst.analyze`
`analysis_&_reporting/goldsilveranalyst-analyze.py:0`
## `BaseAnalyst._build_prompt`
`analysis_&_reporting/baseanalyst-build-prompt.py:0`
## `analysis_mode`
`analysis_&_reporting/analysis-mode.py:0`
FILE:references/components/data_collection.md
# data_collection (5 classes)
## `DataSourceManager.fetch_data`
`data_collection/datasourcemanager-fetch-data.py:0`
## `TuShareClient.get_daily_bars`
`data_collection/tushareclient-get-daily-bars.py:0`
## `RateLimiter.acquire`
`data_collection/ratelimiter-acquire.py:0`
## `CircuitBreaker.call`
`data_collection/circuitbreaker-call.py:0`
## `data_source_provider`
`data_collection/data-source-provider.py:0`
FILE:references/components/factor_computation.md
# factor_computation (5 classes)
## `DailyFactorComputer.compute_all`
`factor_computation/dailyfactorcomputer-compute-all.py:0`
## `TechnicalFactors.compute`
`factor_computation/technicalfactors-compute.py:0`
## `RiskFactors.compute`
`factor_computation/riskfactors-compute.py:0`
## `FactorCache.get`
`factor_computation/factorcache-get.py:0`
## `factor_computation_schedule`
`factor_computation/factor-computation-schedule.py:0`
FILE:references/components/llm_services.md
# llm_services (6 classes)
## `BaseLLMClient.chat`
`llm_services/basellmclient-chat.py:0`
## `GoogleGeminiClient.chat`
`llm_services/googlegeminiclient-chat.py:0`
## `OpenAIClient.chat`
`llm_services/openaiclient-chat.py:0`
## `ToolExecutor.execute`
`llm_services/toolexecutor-execute.py:0`
## `AssistantService.chat`
`llm_services/assistantservice-chat.py:0`
## `llm_provider`
`llm_services/llm-provider.py:0`
FILE:references/components/portfolio_management.md
# portfolio_management (5 classes)
## `SignalGenerator.gen_signal`
`portfolio_management/signalgenerator-gen-signal.py:0`
## `RiskMetricsCalculator.calculate`
`portfolio_management/riskmetricscalculator-calculate.py:0`
## `CorrelationAnalyzer.analyze`
`portfolio_management/correlationanalyzer-analyze.py:0`
## `StressTestEngine.run`
`portfolio_management/stresstestengine-run.py:0`
## `signal_thresholds`
`portfolio_management/signal-thresholds.py:0`
FILE:references/components/recommendation_engine.md
# recommendation_engine (7 classes)
## `RecommendationEngine.get_recommendation`
`recommendation_engine/recommendationengine-get-recommendation.py:0`
## `StockRecommendationEngine.generate`
`recommendation_engine/stockrecommendationengine-generate.py:0`
## `FundRecommendationEngine.generate`
`recommendation_engine/fundrecommendationengine-generate.py:0`
## `ShortTermStrategy.compute_score`
`recommendation_engine/shorttermstrategy-compute-score.py:0`
## `AlphaStrategy.compute_score`
`recommendation_engine/alphastrategy-compute-score.py:0`
## `strategy_weights`
`recommendation_engine/strategy-weights.py:0`
## `min_score_threshold`
`recommendation_engine/min-score-threshold.py:0`
FILE:references/components/scheduled_tasks.md
# scheduled_tasks (4 classes)
## `SchedulerManager.add_factor_computation_job`
`scheduled_tasks/schedulermanager-add-factor-computation-.py:0`
## `TradingCalendar.is_trading_day`
`scheduled_tasks/tradingcalendar-is-trading-day.py:0`
## `SchedulerManager.snapshot_portfolios`
`scheduled_tasks/schedulermanager-snapshot-portfolios.py:0`
## `job_schedule`
`scheduled_tasks/job-schedule.py:0`
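Darts is a lightweight time-series forecasting library that supports deterministic and probabilistic forecasting on multi-market financial data, with covariate integration and hierarchical aggregation.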
---
name: darts-forecasting
description: |-
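Darts is a lightweight time-series forecasting library that supports deterministic and probabilistic forecasting on multi-market financial data, with covariate integration and hierarchical aggregation.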
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-102"
compiled_at: "2026-04-22T13:00:47.497902+00:00"
capability_markets: "multi-market"
capability_activities: "time-series-ml"
sop_version: "crystal-compilation-v6.1"
---
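# Darts Time-Series Forecasting (darts-forecasting)
> Darts is a lightweight time-series forecasting library that supports deterministic and probabilistic forecasting on multi-market financial data, with covariate integration and hierarchical aggregation.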
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (31 total)
### Sphinx Package Title Fixer (`UC-101`)
Automates extraction of descriptive titles and docstrings from Python packages to improve Sphinx API documentation readability
**Triggers**: sphinx documentation, package titles, docstring extraction
### Sphinx Documentation Configuration (`UC-102`)
Configures Sphinx documentation builder with extensions for auto-summary, autodoc, and graphviz visualization
**Triggers**: sphinx config, documentation, autodoc
### Example Utilities Module (`UC-131`)
Provides utility functions for managing Python paths when running Darts examples locally
**Triggers**: utilities, path management, example helpers
For all **31** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (15 total)
- **`AP-TIME-SERIES-ML-001`**: TimeSeries values array dimensionality mismatch
- **`AP-TIME-SERIES-ML-002`**: Non-floating-point dtype in TimeSeries values
- **`AP-TIME-SERIES-ML-003`**: Irregular or non-monotonic time index
All 15 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-102. Evidence verify ratio = 43.8% and audit fail total = 26. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
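| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative record (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 15 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |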
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-102` blueprint at 2026-04-22T13:00:47.497902+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Darts Quickstart Tutorial', 'Sphinx Documentation Configuration', 'Sphinx Package Title Fixer', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **15**
## finance-bp-102--Darts (7)
### `AP-TIME-SERIES-ML-001` — TimeSeries values array dimensionality mismatch <sub>(high)</sub>
When constructing a TimeSeries with a values array that is not expanded to exactly 3 dimensions (time×component×sample), downstream model operations expecting the standard 3D shape will fail with dimension mismatches. This causes all downstream models to receive incorrectly formatted data tensors, leading to complete pipeline failure or silent data corruption.
### `AP-TIME-SERIES-ML-002` — Non-floating-point dtype in TimeSeries values <sub>(high)</sub>
When setting TimeSeries values dtype to integer or non-floating-point types, numerical operations produce incorrect results during financial calculations. Financial forecasts require float64 or float32 precision to handle decimal computations accurately; integer dtypes truncate precision and cause accumulation of rounding errors that compound across time steps.
### `AP-TIME-SERIES-ML-003` — Irregular or non-monotonic time index <sub>(high)</sub>
When TimeSeries time index is not strictly monotonically increasing with a well-defined frequency and no gaps, downstream models produce incorrect forecasts due to temporal misalignment. Gap detection methods fail, and any temporal aggregation or differencing operations will produce meaningless results.
### `AP-TIME-SERIES-ML-004` — Time index and values length mismatch at construction <sub>(high)</sub>
When the time index length does not equal the values array first dimension length, TimeSeries construction fails with ValueError at construction time, preventing any data from being loaded into the system. This typically occurs when importing data from CSV or DataFrame sources where column alignment assumptions are incorrect.
### `AP-TIME-SERIES-ML-005` — Missing abstract method implementations in ForecastingModel subclasses <sub>(high)</sub>
When implementing ForecastingModel subclasses without implementing all required abstract methods (fit, predict, min_train_samples, _target_window_lengths, extreme_lags, supports_multivariate, supports_transferable_series_prediction), Python's ABC abstractmethod enforcement causes TypeError at instantiation time, preventing any model from being created.
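The failure mode is plain Python ABC behavior and can be reproduced without Darts. The sketch below uses a hypothetical two-method base class (`ForecastingModelSketch`), not the real `ForecastingModel`, whose abstract surface is larger as listed above.

```python
from abc import ABC, abstractmethod


class ForecastingModelSketch(ABC):
    """Hypothetical stand-in for an abstract forecasting base class."""

    @abstractmethod
    def fit(self, series):
        ...

    @abstractmethod
    def predict(self, n):
        ...


class IncompleteModel(ForecastingModelSketch):
    # predict() is not implemented, so this class is still abstract.
    def fit(self, series):
        return self


class CompleteModel(ForecastingModelSketch):
    def fit(self, series):
        self._last = series[-1]
        return self

    def predict(self, n):
        return [self._last] * n


def try_instantiate(cls):
    """Return an instance, or the TypeError message if cls is still abstract."""
    try:
        return cls()
    except TypeError as exc:
        return str(exc)
```

Calling `try_instantiate(IncompleteModel)` returns the error text naming the missing method, while `try_instantiate(CompleteModel)` returns a usable instance.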
### `AP-TIME-SERIES-ML-006` — fit() method not returning self for chaining <sub>(medium)</sub>
When fit() method does not return self for method chaining, the fluent interface pattern expected by users breaks at lines 209, 2932, and 3069 where chaining is attempted. Users encounter AttributeError when trying to chain operations like model.fit(series).predict(n_periods).
### `AP-TIME-SERIES-ML-007` — Frequency inference failure with insufficient timesteps <sub>(medium)</sub>
When using fill_missing_dates with fewer than 3 time steps, frequency inference fails with ValueError because at least 3 consecutive timestamps are required to determine a unique constant frequency. Irregular time series cannot be gap-filled without this minimum data.
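To see why three timestamps are the minimum, here is a standard-library sketch of the rule as described above. `infer_step_from_triples` is a hypothetical helper, not the Darts implementation: it accepts a step only when it is confirmed by three consecutive, evenly spaced timestamps.

```python
from datetime import datetime, timedelta


def infer_step_from_triples(timestamps):
    """Infer a constant step from windows of three consecutive timestamps."""
    if len(timestamps) < 3:
        raise ValueError("need at least 3 timestamps to infer a frequency")
    candidates = set()
    for a, b, c in zip(timestamps, timestamps[1:], timestamps[2:]):
        if b - a == c - b:  # evenly spaced triple confirms one candidate step
            candidates.add(b - a)
    if len(candidates) != 1:
        raise ValueError("could not infer a unique constant frequency")
    return candidates.pop()


# Daily data with one missing day (Jan 4 is absent).
start = datetime(2024, 1, 1)
observed = [start + timedelta(days=d) for d in (0, 1, 2, 4, 5)]
step = infer_step_from_triples(observed)
```

With the sample data, only the first three timestamps form an evenly spaced triple, which is enough to fix the step at one day.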
## finance-bp-121--machine-learning-for-trading (8)
### `AP-TIME-SERIES-ML-008` — Look-ahead bias from random train/test splits <sub>(high)</sub>
When implementing cross-validation for financial time series using random K-fold or standard train_test_split without temporal ordering, future information leaks into training data. This look-ahead bias artificially inflates backtest performance metrics and leads to significant live trading losses when the model encounters truly unseen data.
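A minimal sketch of the temporally ordered alternative, assuming rows are `(timestamp, value)` tuples with sortable timestamps. `chronological_split` is a hypothetical helper name, not part of any library mentioned here.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split rows so that no test row is earlier than any train row."""
    ordered = sorted(rows, key=lambda row: row[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# ISO date strings sort chronologically, so they work as timestamps here.
rows = [
    ("2024-01-03", 1.2),
    ("2024-01-01", 1.0),
    ("2024-01-05", 1.4),
    ("2024-01-02", 1.1),
    ("2024-01-04", 1.3),
]
train, test = chronological_split(rows)
```

Unlike a shuffled split, the test set here always consists of the most recent observations.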
### `AP-TIME-SERIES-ML-009` — Missing purge gap contaminating validation results <sub>(high)</sub>
When using a walk-forward split without an embargo gap between train and test periods, outcomes that overlap the split boundary contaminate validation results. Without a purge gap, seemingly good backtest results do not generalize to live performance because information leaks across the boundary.
### `AP-TIME-SERIES-ML-010` — Hardcoded credentials in source code <sub>(high)</sub>
When scraping content from websites requiring authentication by hardcoding credentials in source code files, exposed credentials lead to unauthorized access, potential account termination, and security breaches. Credentials should be loaded from environment variables or secure configuration files, never committed to version control.
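One way to follow this advice, sketched with the standard library only. The variable names `SCRAPER_USERNAME` and `SCRAPER_PASSWORD` are hypothetical; use whatever names your deployment defines.

```python
import os


def load_credentials(env=os.environ):
    """Read scraper credentials from the environment instead of source code."""
    try:
        return env["SCRAPER_USERNAME"], env["SCRAPER_PASSWORD"]
    except KeyError as missing:
        # Fail loudly rather than falling back to a hardcoded default.
        raise RuntimeError(
            f"missing environment variable: {missing.args[0]}"
        ) from None
```

Passing `env` as a parameter keeps the function testable without touching the real process environment.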
### `AP-TIME-SERIES-ML-011` — TA-Lib infinite values causing ML model failures <sub>(high)</sub>
When computing technical indicators using TA-Lib (RSI, MACD, ATR) without handling edge cases, division-by-zero produces infinite values that corrupt the feature DataFrame. Gradient-based ML models (neural networks) cannot process infinite values, causing training to fail or produce NaN gradients.
### `AP-TIME-SERIES-ML-012` — MultiIndex structure lost during feature engineering <sub>(high)</sub>
When flattening or renaming the (ticker, date) MultiIndex during feature engineering for multi-ticker trading, downstream stages (prediction_modeling, backtesting) fail because they expect MultiIndex for proper temporal train/test splits. Data corruption occurs silently when multi-ticker data is treated as single-ticker.
### `AP-TIME-SERIES-ML-013` — Missing TA-Lib C library dependency <sub>(high)</sub>
When installing TA-Lib via pip install ta-lib alone without compiling the underlying C library, import fails because the Python package is merely a wrapper around compiled native code. This causes immediate runtime failure for any code attempting to import talib for technical indicator computation.
### `AP-TIME-SERIES-ML-014` — Trading calendar minutes_per_day mismatch <sub>(high)</sub>
When configuring an extended-hours trading calendar with an incorrect minutes_per_day (e.g., using 390, the regular-session value, when a 4:00 AM to 8:00 PM extended session needs 960), minute bar alignment with the calendar fails. Backtest prices do not correspond to actual trading times, producing meaningless results that don't reflect real market microstructure.
### `AP-TIME-SERIES-ML-015` — Zipline bundle ingest function signature mismatch <sub>(high)</sub>
When implementing Zipline bundle ingest function with incorrect parameter count or order, Zipline fails with TypeError during bundle ingest because the ingestion pipeline expects exactly 9 parameters in a specific order. Backtesting cannot run at all when bundle ingestion fails, blocking all downstream work.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-102--Darts
**Scan date**: 2026-04-22
**Stats**: {'total_files': 15, 'total_classes': 77, 'total_functions': 0, 'total_stages': 15}
## Modules (15)
- [timeseries_data_representation](components/timeseries_data_representation.md): 5 classes
- [forecasting_model_base](components/forecasting_model_base.md): 5 classes
- [pytorch_deep_learning_forecasting](components/pytorch_deep_learning_forecasting.md): 7 classes
- [statistical_&_classical_forecasting](components/statistical_-_classical_forecasting.md): 5 classes
- [scikit-learn_regression_forecasting](components/scikit-learn_regression_forecasting.md): 5 classes
- [ensemble_forecasting](components/ensemble_forecasting.md): 4 classes
- [conformal_prediction](components/conformal_prediction.md): 4 classes
- [data_transformation_pipeline](components/data_transformation_pipeline.md): 6 classes
- [covariate_encoding](components/covariate_encoding.md): 5 classes
- [hierarchical_reconciliation](components/hierarchical_reconciliation.md): 5 classes
- [anomaly_detection](components/anomaly_detection.md): 7 classes
- [time_series_filtering](components/time_series_filtering.md): 4 classes
- [metrics_evaluation](components/metrics_evaluation.md): 6 classes
- [model_explainability](components/model_explainability.md): 4 classes
- [probabilistic_likelihoods](components/probabilistic_likelihoods.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 192
fatal_constraints_count: 102
non_fatal_constraints_count: 282
use_cases_count: 31
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
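A sketch of a format check for this lock. `parse_entity_id` is a hypothetical helper, and splitting on the first two underscores only (so the code part may itself contain underscores) is an assumption, not something the lock states.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id: str) -> EntityId:
    """Split 'entity_type_exchange_code' into its parts or raise ValueError."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"entity id must look like entity_type_exchange_code: {entity_id!r}"
        )
    return EntityId(*parts)
```

For example, `parse_entity_id("stock_sh_600000")` yields the entity type `stock`, the exchange `sh`, and the code `600000`.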
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
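The "exactly one" rule can be enforced at construction time. The class below is a hypothetical sketch, not ZVT's `TradingSignal`; only the three sizing field names come from the lock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TradingSignalSketch:
    """Hypothetical signal container that enforces SL-05 on creation."""

    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        sizing = (self.position_pct, self.order_money, self.order_amount)
        given = sum(value is not None for value in sizing)
        if given != 1:
            raise ValueError(f"expected exactly one sizing field, got {given}")
```

Checking `is not None` rather than truthiness keeps a legitimate `0.0` from being mistaken for a missing field.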
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
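The tri-state semantics are easy to get wrong if a missing value is coerced to `False`. A minimal sketch, assuming plain Python cell values (`action_for` is a hypothetical helper):

```python
import math


def action_for(filter_result):
    """Translate one filter_result cell into an action name."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"  # NaN must not be read as False (SELL)
    return "BUY" if filter_result else "SELL"
```

The missing-value checks run first, so only an explicit `False` produces a sell.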
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **31**
## `KUC-101`
**Source**: `docs/fix_package_titles.py`
Automates extraction of descriptive titles and docstrings from Python packages to improve Sphinx API documentation readability.
## `KUC-102`
**Source**: `docs/source/conf.py`
Configures Sphinx documentation builder with extensions for auto-summary, autodoc, and graphviz visualization.
## `KUC-103`
**Source**: `examples/00-quickstart.ipynb`
Introduces new users to the Darts time series library with basic operations like series creation, loading datasets, and simple transformations.
## `KUC-104`
**Source**: `examples/01-multi-time-series-and-covariates.ipynb`
Demonstrates forecasting multiple related time series simultaneously using covariates and multivariate models like VARIMA and NBEATS.
## `KUC-105`
**Source**: `examples/02-data-processing.ipynb`
Shows how to build reusable data processing pipelines with transformers for scaling, filling missing values, and other transformations.
## `KUC-106`
**Source**: `examples/03-FFT-examples.ipynb`
Uses Fast Fourier Transform for frequency-based time series forecasting, ideal for seasonal patterns.
## `KUC-107`
**Source**: `examples/04-RNN-examples.ipynb`
Demonstrates recurrent neural network models (RNN, LSTM, GRU) for time series forecasting with seasonality detection.
## `KUC-108`
**Source**: `examples/05-TCN-examples.ipynb`
Uses Temporal Convolutional Networks for high-performance time series forecasting with dilated convolutions.
## `KUC-109`
**Source**: `examples/06-Transformer-examples.ipynb`
Applies Transformer architecture with self-attention mechanisms for capturing long-range dependencies in time series.
## `KUC-110`
**Source**: `examples/07-NBEATS-examples.ipynb`
Uses NBEATS (Neural Basis Expansion Analysis) for interpretable deep learning time series forecasting.
## `KUC-111`
**Source**: `examples/08-DeepAR-examples.ipynb`
Implements DeepAR for probabilistic forecasting with uncertainty quantification using Gaussian likelihood.
## `KUC-112`
**Source**: `examples/09-DeepTCN-examples.ipynb`
Combines Deep TCN architecture with probabilistic prediction using quantile regression and Gaussian likelihood.
## `KUC-113`
**Source**: `examples/10-Kalman-filter-examples.ipynb`
Applies Kalman filtering for state estimation and noise reduction in time series with known state-space models.
## `KUC-114`
**Source**: `examples/11-GP-filter-examples.ipynb`
Uses Gaussian Process regression for flexible non-parametric filtering and noise reduction in time series.
## `KUC-115`
**Source**: `examples/12-Dynamic-Time-Warping-example.ipynb`
Computes similarity between time series using Dynamic Time Warping algorithm for pattern matching and comparison.
## `KUC-116`
**Source**: `examples/13-TFT-examples.ipynb`
Uses the Temporal Fusion Transformer (TFT) for interpretable multi-horizon forecasting with attention visualization and quantile predictions.
## `KUC-117`
**Source**: `examples/14-transfer-learning.ipynb`
Demonstrates transferring knowledge from pre-trained models across different time series datasets (M3, M4 competitions).
## `KUC-118`
**Source**: `examples/15-static-covariates.ipynb`
Shows how to incorporate static (time-invariant) covariates into time series models for multivariate forecasting.
## `KUC-119`
**Source**: `examples/16-hierarchical-reconciliation.ipynb`
Demonstrates hierarchical forecasting with MinT reconciliation to ensure consistency across aggregation levels.
## `KUC-120`
**Source**: `examples/17-hyperparameter-optimization.ipynb`
Uses Optuna for automated hyperparameter tuning of forecasting models with early stopping and visualization.
## `KUC-121`
**Source**: `examples/18-TiDE-examples.ipynb`
Implements TiDE (Time-series Dense Encoder) for efficient long-sequence time series forecasting.
## `KUC-122`
**Source**: `examples/19-EnsembleModel-examples.ipynb`
Combines multiple forecasting models using ensemble techniques like naive ensembling and regression ensembling.
## `KUC-123`
**Source**: `examples/20-SKLearnModel-examples.ipynb`
Uses scikit-learn compatible models (Linear Regression, Random Forest, XGBoost, LightGBM) with SHAP explainability.
## `KUC-124`
**Source**: `examples/21-TSMixer-examples.ipynb`
Uses TSMixer for multivariate time series forecasting with feature mixing and quantile regression.
## `KUC-125`
**Source**: `examples/22-anomaly-detection-examples.ipynb`
Detects anomalies in time series using scoring methods like KMeans, Wasserstein distance, and forecasting-based models.
## `KUC-126`
**Source**: `examples/23-Conformal-Prediction-examples.ipynb`
Provides distribution-free uncertainty quantification using conformal prediction with calibration sets.
## `KUC-127`
**Source**: `examples/24-SKLearnClassifierModel-examples.ipynb`
Classifies time series segments into categories using gradient-based features and CatBoost classifier.
## `KUC-128`
**Source**: `examples/25-Chronos-2-examples.ipynb`
Uses Chronos-2, a pre-trained time series foundation model, for zero-shot and fine-tuned forecasting.
## `KUC-129`
**Source**: `examples/26-NeuralForecast-examples.ipynb`
Integrates NeuralForecast models (from Nixtla) with Darts for advanced neural network time series forecasting.
## `KUC-130`
**Source**: `examples/27-Torch-and-Foundation-Model-Fine-Tuning-examples.ipynb`
Fine-tunes pre-trained foundation models like Chronos-2 and TiDE on custom time series data.
## `KUC-131`
**Source**: `examples/utils/utils.py`
Provides utility functions for managing Python paths when running Darts examples locally.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-TIME-SERIES-ML-001` — 3D TimeSeries dimensionality invariant
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml
Always expand TimeSeries values to exactly 3 dimensions (n_timesteps, n_components, n_samples) regardless of input format. This invariant enables uniform downstream processing regardless of whether the data is univariate (1 component), single-sample, or multivariate probabilistic series with multiple samples.
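A NumPy sketch of the expansion, assuming 1D input means a univariate deterministic series and 2D input means `(time, component)`. `to_3d` is a hypothetical helper, not a Darts function. It also casts to `float64`, in line with the floating-point requirement in `AP-TIME-SERIES-ML-002`.

```python
import numpy as np


def to_3d(values):
    """Expand values to shape (n_timesteps, n_components, n_samples)."""
    arr = np.asarray(values, dtype=np.float64)
    if arr.ndim == 1:        # (time,) -> one component, one sample
        arr = arr[:, np.newaxis, np.newaxis]
    elif arr.ndim == 2:      # (time, component) -> one sample
        arr = arr[:, :, np.newaxis]
    elif arr.ndim != 3:
        raise ValueError(f"expected 1 to 3 dimensions, got {arr.ndim}")
    return arr


univariate = to_3d([1, 2, 3])
multivariate = to_3d([[1, 10], [2, 20], [3, 30], [4, 40]])
```

Both results have a trailing sample axis of length 1, which marks them as deterministic series.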
## `CW-TIME-SERIES-ML-002` — Strict time index validation
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml
Validate time index at construction: must be strictly monotonically increasing, have a well-defined frequency, no holes by default, and length must match values first dimension. This prevents silent data corruption in all downstream temporal operations.
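These checks can be sketched with the standard library. `validate_time_index` is a hypothetical helper that takes the expected step explicitly instead of inferring it, which is a simplification of what a real library would do.

```python
from datetime import datetime, timedelta


def validate_time_index(times, values, step):
    """Raise ValueError unless times is aligned, increasing and gap-free."""
    if len(times) != len(values):
        raise ValueError(f"{len(times)} timestamps but {len(values)} values")
    for previous, current in zip(times, times[1:]):
        if current <= previous:
            raise ValueError(f"time index not strictly increasing at {current}")
        if current - previous != step:
            raise ValueError(f"gap or irregular step before {current}")


start = datetime(2024, 1, 1)
daily = [start + timedelta(days=i) for i in range(4)]
validate_time_index(daily, [1.0, 2.0, 3.0, 4.0], timedelta(days=1))  # passes
```

Dropping one timestamp from `daily`, or passing a values list of a different length, makes the same call raise `ValueError`.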
## `CW-TIME-SERIES-ML-003` — MultiIndex preservation in multi-ticker pipelines
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml
Maintain (ticker, date) MultiIndex structure throughout the entire feature engineering and prediction pipeline for multi-ticker trading systems. Downstream stages depend on this structure for proper temporal train/test splits that respect per-ticker time boundaries.
## `CW-TIME-SERIES-ML-004` — Purged walk-forward cross-validation
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml
Use a purged walk-forward split with an embargo gap for financial time series validation. Random splits cause look-ahead bias, while splits without purge gaps contaminate results with overlapping outcomes. The purge gap prevents information leakage across train/test boundaries.
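A minimal index-based sketch of such a split. `purged_walk_forward` is a hypothetical generator. The fixed-size rolling training window and advancing by one test block per fold are design assumptions, not something the source prescribes.

```python
def purged_walk_forward(n_samples, train_size, test_size, purge=0):
    """Yield (train_indices, test_indices) with `purge` samples dropped between."""
    start = 0
    while start + train_size + purge + test_size <= n_samples:
        train_end = start + train_size
        test_start = train_end + purge  # skipped samples form the gap
        yield (
            list(range(start, train_end)),
            list(range(test_start, test_start + test_size)),
        )
        start += test_size


folds = list(purged_walk_forward(n_samples=10, train_size=4, test_size=2, purge=1))
```

In every fold the first test index sits more than `purge` positions after the last training index, so labels that overlap the boundary are excluded from both sides.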
## `CW-TIME-SERIES-ML-005` — TA-Lib edge case sanitization
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml
Always replace infinite values with NaN and call dropna before ML model training when using TA-Lib technical indicators. RSI, MACD, ATR and other indicators produce inf values during division-by-zero edge cases, which corrupt gradient-based model training.
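The same clean-up can be sketched without pandas, assuming each row is a dict of numeric features (`drop_non_finite` is a hypothetical helper). In pandas, a common equivalent is `df.replace([np.inf, -np.inf], np.nan).dropna()`.

```python
import math


def drop_non_finite(rows):
    """Keep only rows whose feature values are all finite (no inf, no NaN)."""
    return [
        row for row in rows
        if all(math.isfinite(value) for value in row.values())
    ]


features = [
    {"rsi": 55.0, "atr": 1.2},
    {"rsi": float("inf"), "atr": 0.9},   # division-by-zero artifact
    {"rsi": 40.0, "atr": float("nan")},  # warm-up period with no value yet
]
clean = drop_non_finite(features)
```

Only the first row survives, because the other two each contain a non-finite value.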
## `CW-TIME-SERIES-ML-006` — Fluent forecasting model interface
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml
Have fit() return self on ForecastingModel subclasses so that it can be chained with predict(). Users expect this fluent interface for idiomatic usage like model.fit(series).predict(n_periods).
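A toy illustration of the pattern. `LastValueModel` is a hypothetical class that does not inherit from the real Darts base class.

```python
class LastValueModel:
    """Hypothetical toy model that repeats the last observed value."""

    def fit(self, series):
        if not series:
            raise ValueError("series must not be empty")
        self._last = series[-1]
        return self  # returning self is what makes chaining possible

    def predict(self, n):
        return [self._last] * n


forecast = LastValueModel().fit([1.0, 2.0, 3.0]).predict(2)
```

If `fit` returned `None` instead, the chained `.predict(2)` call would raise `AttributeError`.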
## `CW-TIME-SERIES-ML-007` — Zipline bundle signature contract
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml
When implementing Zipline bundle ingest functions, the function must accept exactly 9 parameters in the specified order: environ, asset_db_writer, minute_bar_writer, daily_bar_writer, adjustment_writer, calendar, start_session, end_session, cache. This contract is enforced by Zipline's ingestion pipeline.
## `CW-TIME-SERIES-ML-008` — Calendar minutes_per_day alignment
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml
When configuring trading calendars for backtesting, set minutes_per_day to the total number of trading minutes in the session, including extended hours (390 for the regular NYSE session, 960 for extended hours running from 4:00 AM to 8:00 PM). This ensures minute bar alignment with actual trading times in the backtest.
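Rather than hard-coding the value, it can be derived from the session's open and close times. The helper below is a hypothetical sketch. The session boundaries passed to it are assumptions to check against your own calendar definition.

```python
from datetime import date, datetime, time


def session_minutes(open_time, close_time):
    """Number of minutes between a session's open and close on the same day."""
    day = date(2000, 1, 3)  # arbitrary date, only the clock times matter
    delta = datetime.combine(day, close_time) - datetime.combine(day, open_time)
    return int(delta.total_seconds() // 60)


regular = session_minutes(time(9, 30), time(16, 0))   # assumed regular session
extended = session_minutes(time(4, 0), time(20, 0))   # assumed extended session
```

Computing the value this way keeps minutes_per_day consistent with the open and close times registered on the calendar.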
## `CW-TIME-SERIES-ML-009` — Deterministic series detection
**From**: finance-bp-121--machine-learning-for-trading · **Applicable to**: time-series-ml
A TimeSeries is deterministic when n_samples equals 1; otherwise it is probabilistic. This distinction matters for methods such as to_json and gap detection, which behave differently depending on whether the series contains probabilistic predictions or point estimates.
## `CW-TIME-SERIES-ML-010` — Minimum training sample enforcement
**From**: finance-bp-102--Darts · **Applicable to**: time-series-ml
Enforce min_train_series_length at fit time to prevent underfitting with insufficient historical data. Models should raise ValueError with clear messaging when training series length is below the model's minimum requirement, preventing silent poor forecasts.
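A sketch of the check, using a hypothetical `MinLengthModel` with a class-level minimum. The attribute name follows the text above. Everything else is illustrative.

```python
class MinLengthModel:
    """Hypothetical model that refuses to fit on too little history."""

    min_train_series_length = 3

    def fit(self, series):
        if len(series) < self.min_train_series_length:
            raise ValueError(
                f"training series has {len(series)} points, "
                f"at least {self.min_train_series_length} are required"
            )
        self._mean = sum(series) / len(series)
        return self

    def predict(self, n):
        return [self._mean] * n
```

Raising at fit time surfaces the problem where the data is supplied, instead of letting a poorly trained model produce forecasts later.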
FILE:references/components/anomaly_detection.md
# anomaly_detection (7 classes)
## `AnomalyModel.fit`
`anomaly_detection/anomalymodel-fit.py:0`
## `AnomalyModel.score`
`anomaly_detection/anomalymodel-score.py:0`
## `AnomalyModel.detect`
`anomaly_detection/anomalymodel-detect.py:0`
## `Detector.fit_detect`
`anomaly_detection/detector-fit-detect.py:0`
## `Scorer`
`anomaly_detection/scorer.py:0`
## `Detector threshold`
`anomaly_detection/detector-threshold.py:0`
## `Aggregator`
`anomaly_detection/aggregator.py:0`
FILE:references/components/conformal_prediction.md
# conformal_prediction (4 classes)
## `ConformalModel.fit`
`conformal_prediction/conformalmodel-fit.py:0`
## `ConformalModel.predict`
`conformal_prediction/conformalmodel-predict.py:0`
## `ConformalQRModel.fit`
`conformal_prediction/conformalqrmodel-fit.py:0`
## `Conformal method`
`conformal_prediction/conformal-method.py:0`
FILE:references/components/covariate_encoding.md
# covariate_encoding (5 classes)
## `Encoder.encode_train`
`covariate_encoding/encoder-encode-train.py:0`
## `Encoder.encode_inference`
`covariate_encoding/encoder-encode-inference.py:0`
## `SequentialEncoder.fit`
`covariate_encoding/sequentialencoder-fit.py:0`
## `Encoding type`
`covariate_encoding/encoding-type.py:0`
## `Cyclic normalization`
`covariate_encoding/cyclic-normalization.py:0`
FILE:references/components/data_transformation_pipeline.md
# data_transformation_pipeline (6 classes)
## `BaseDataTransformer.transform`
`data_transformation_pipeline/basedatatransformer-transform.py:0`
## `FittableDataTransformer.fit`
`data_transformation_pipeline/fittabledatatransformer-fit.py:0`
## `InvertibleDataTransformer.inverse_transform`
`data_transformation_pipeline/invertibledatatransformer-inverse-transf.py:0`
## `Pipeline.fit_transform`
`data_transformation_pipeline/pipeline-fit-transform.py:0`
## `Scaler backend`
`data_transformation_pipeline/scaler-backend.py:0`
## `Parallelization`
`data_transformation_pipeline/parallelization.py:0`
FILE:references/components/ensemble_forecasting.md
# ensemble_forecasting (4 classes)
## `EnsembleModel.fit`
`ensemble_forecasting/ensemblemodel-fit.py:0`
## `EnsembleModel.predict`
`ensemble_forecasting/ensemblemodel-predict.py:0`
## `RegressionEnsembleModel.fit`
`ensemble_forecasting/regressionensemblemodel-fit.py:0`
## `Ensemble method`
`ensemble_forecasting/ensemble-method.py:0`
FILE:references/components/forecasting_model_base.md
# forecasting_model_base (5 classes)
## `ForecastingModel.fit`
`forecasting_model_base/forecastingmodel-fit.py:0`
## `ForecastingModel.predict`
`forecasting_model_base/forecastingmodel-predict.py:0`
## `ForecastingModel.historical_forecasts`
`forecasting_model_base/forecastingmodel-historical-forecasts.py:0`
## `Encoder system`
`forecasting_model_base/encoder-system.py:0`
## `Likelihood`
`forecasting_model_base/likelihood.py:0`
FILE:references/components/hierarchical_reconciliation.md
# hierarchical_reconciliation (5 classes)
## `BottomUpReconciliator.fit`
`hierarchical_reconciliation/bottomupreconciliator-fit.py:0`
## `TopDownReconciliator.fit`
`hierarchical_reconciliation/topdownreconciliator-fit.py:0`
## `MinTReconciliator.fit`
`hierarchical_reconciliation/mintreconciliator-fit.py:0`
## `Reconciliator.transform`
`hierarchical_reconciliation/reconciliator-transform.py:0`
## `Reconciliation method`
`hierarchical_reconciliation/reconciliation-method.py:0`
FILE:references/components/metrics_evaluation.md
# metrics_evaluation (6 classes)
## `err`
`metrics_evaluation/err.py:0`
## `mae`
`metrics_evaluation/mae.py:0`
## `mape`
`metrics_evaluation/mape.py:0`
## `ql`
`metrics_evaluation/ql.py:0`
## `ic`
`metrics_evaluation/ic.py:0`
## `Reduction`
`metrics_evaluation/reduction.py:0`
FILE:references/components/model_explainability.md
# model_explainability (4 classes)
## `ShapExplainer.explain`
`model_explainability/shapexplainer-explain.py:0`
## `TFTExplainer.explain`
`model_explainability/tftexplainer-explain.py:0`
## `ShapExplainer.plot_explanation`
`model_explainability/shapexplainer-plot-explanation.py:0`
## `Explainer type`
`model_explainability/explainer-type.py:0`
FILE:references/components/probabilistic_likelihoods.md
# probabilistic_likelihoods (5 classes)
## `TorchLikelihood.compute_loss`
`probabilistic_likelihoods/torchlikelihood-compute-loss.py:0`
## `TorchLikelihood.sample`
`probabilistic_likelihoods/torchlikelihood-sample.py:0`
## `GaussianLikelihood.parameters`
`probabilistic_likelihoods/gaussianlikelihood-parameters.py:0`
## `QuantileRegression.sample`
`probabilistic_likelihoods/quantileregression-sample.py:0`
## `Distribution`
`probabilistic_likelihoods/distribution.py:0`
FILE:references/components/pytorch_deep_learning_forecasting.md
# pytorch_deep_learning_forecasting (7 classes)
## `TorchForecastingModel.fit`
`pytorch_deep_learning_forecasting/torchforecastingmodel-fit.py:0`
## `TorchForecastingModel.predict`
`pytorch_deep_learning_forecasting/torchforecastingmodel-predict.py:0`
## `TorchForecastingModel.save`
`pytorch_deep_learning_forecasting/torchforecastingmodel-save.py:0`
## `TorchForecastingModel.load`
`pytorch_deep_learning_forecasting/torchforecastingmodel-load.py:0`
## `Loss function`
`pytorch_deep_learning_forecasting/loss-function.py:0`
## `Optimizer`
`pytorch_deep_learning_forecasting/optimizer.py:0`
## `Training dataset`
`pytorch_deep_learning_forecasting/training-dataset.py:0`
FILE:references/components/scikit-learn_regression_forecasting.md
# scikit-learn_regression_forecasting (5 classes)
## `RegressionModel.fit`
`scikit-learn_regression_forecasting/regressionmodel-fit.py:0`
## `RegressionModel.predict`
`scikit-learn_regression_forecasting/regressionmodel-predict.py:0`
## `SKLearnClassifierModel.fit`
`scikit-learn_regression_forecasting/sklearnclassifiermodel-fit.py:0`
## `Regressor`
`scikit-learn_regression_forecasting/regressor.py:0`
## `Multi-output strategy`
`scikit-learn_regression_forecasting/multi-output-strategy.py:0`
FILE:references/components/statistical_-_classical_forecasting.md
# statistical_&_classical_forecasting (5 classes)
## `ARIMA.fit`
`statistical_&_classical_forecasting/arima-fit.py:0`
## `ARIMA.predict`
`statistical_&_classical_forecasting/arima-predict.py:0`
## `ExponentialSmoothing.fit`
`statistical_&_classical_forecasting/exponentialsmoothing-fit.py:0`
## `NaiveSeasonal.predict`
`statistical_&_classical_forecasting/naiveseasonal-predict.py:0`
## `Underlying statsmodel`
`statistical_&_classical_forecasting/underlying-statsmodel.py:0`
FILE:references/components/time_series_filtering.md
# time_series_filtering (4 classes)
## `FilteringModel.filter`
`time_series_filtering/filteringmodel-filter.py:0`
## `KalmanFilter.fit`
`time_series_filtering/kalmanfilter-fit.py:0`
## `GaussianProcessFilter.fit`
`time_series_filtering/gaussianprocessfilter-fit.py:0`
## `Filter type`
`time_series_filtering/filter-type.py:0`
FILE:references/components/timeseries_data_representation.md
# timeseries_data_representation (5 classes)
## `TimeSeries.from_csv`
`timeseries_data_representation/timeseries-from-csv.py:0`
## `TimeSeries.from_dataframe`
`timeseries_data_representation/timeseries-from-dataframe.py:0`
## `TimeSeries.slice`
`timeseries_data_representation/timeseries-slice.py:0`
## `TimeSeries.concatenate`
`timeseries_data_representation/timeseries-concatenate.py:0`
## `Backend implementation`
`timeseries_data_representation/backend-implementation.py:0`
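CZSC Chan theory (缠论) technical analysis toolkit: K-line generation, bi (stroke) and segment identification, fractal signal extraction, and A-share backtest visualization.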
---
name: czsc-chan-theory
description: |-
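CZSC Chan theory (缠论) technical analysis toolkit: K-line generation, bi (stroke) and segment identification, fractal signal extraction, and A-share backtest visualization.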
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-091"
compiled_at: "2026-04-22T13:00:38.716020+00:00"
capability_markets: "cn-astock"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
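# Chan Theory Technical Analysis (czsc-chan-theory)
> CZSC Chan theory (缠论) technical analysis toolkit: K-line generation, bi (stroke) and segment identification, fractal signal extraction, and A-share backtest visualization.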
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (10 total)
### Sphinx Documentation Configuration (`UC-101`)
Configuring Sphinx documentation builder for the czsc project, ensuring proper Python path setup and Rust version priority
**Triggers**: documentation, sphinx, configuration
### CZSC Performance Benchmarking (`UC-102`)
Benchmarking CZSC analysis performance with varying K-line counts to measure initialization speed and memory usage
**Triggers**: benchmark, performance, speed
### Volatility Classification Signal (`UC-104`)
Classifying market volatility into three tiers (low/middle/high) based on recent K-line price ranges for signal generation
**Triggers**: volatility, classification, signal
For all **10** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (25 total)
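- **`AP-ZVT-183`**: Adjustment factors that are inf/NaN go straight into multiplication, so price adjustment fails silently
- **`AP-ZVT-179`**: Exceptions raised after a third-party data API hits its quota are swallowed, so data goes missing silently
- **`AP-ZVT-183B`**: Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift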
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-091. Evidence verify ratio = 60.4% and audit fail total = 13. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
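| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source of truth) | Required reading when behavior or a decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |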
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-091` blueprint at 2026-04-22T13:00:38.716020+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Trading View K-Line Visualization
  - CZSC Performance Benchmarking
  - Sphinx Documentation Configuration
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
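### `AP-QLIB-1930` — Backtest results do not depend on the model: a shared dataset object lets the first model's predictions override the rest <sub>(high)</sub>
When several models in Qlib reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model in effect uses the same prediction signal. The symptom: whether you use LightGBM, XGBoost, or a DNN, the backtest equity curves are identical. This is the most dangerous kind of anti-pattern: the experiment appears to run, yet every conclusion is invalid.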
Source: https://github.com/microsoft/qlib/issues/1930
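### `AP-QLIB-2090` — Configuring both fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (the range the normalizer is fitted on) and segments.train (the range the model is trained on). A common mistake is letting fit_end_time extend over the valid/test segments, so that the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state that fit_end_time must be <= train_end.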
Source: https://github.com/microsoft/qlib/issues/2090
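A minimal sketch of a pre-flight check for the coupling described in `AP-QLIB-2090`, assuming the handler arguments and segments are available as plain dictionaries with ISO date strings. The function name `check_fit_window` and the sample dates are hypothetical; this is not part of Qlib.

```python
from datetime import date


def check_fit_window(handler_kwargs: dict, segments: dict) -> None:
    """Raise if the normalizer's fit window extends past the train segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include validation/test data"
        )


# Illustrative configuration fragment (dates are made up for the example).
handler_kwargs = {"fit_start_time": "2008-01-01", "fit_end_time": "2014-12-31"}
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}
check_fit_window(handler_kwargs, segments)  # passes silently
```

Running a check like this before the dataset is built turns a silent leak into an immediate, explicit failure.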
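### `AP-QLIB-2036` — MACD factor formula in the docs is wrong: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (it has been divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from the documented formula differs markedly from the correct formula after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied straight into production factor libraries by many users.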
Source: https://github.com/microsoft/qlib/issues/2036
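### `AP-QLIB-2184` — Custom A-share data imported without NaN-filling suspension days as the convention requires, causing downstream factor noise <sub>(high)</sub>
By Qlib convention, the open/close/high/low/volume/factor fields on suspension days should all be NaN so the framework can recognize and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney or Wind), price momentum factors show "false signals" during the suspension (the price is unchanged but the factor is non-zero). Qlib does not validate this convention, so the error flows silently into training data.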
Source: https://github.com/microsoft/qlib/issues/2184
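### `AP-QLIB-1892` — The PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to get the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError directly when the list is incomplete. Users who follow the documented steps end up with a financial dataset that covers only part of the market, and backtests on PIT financial factors carry severe survivorship bias (stocks that were not collected are implicitly excluded).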
Source: https://github.com/microsoft/qlib/issues/1892
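### `AP-QLIB-2097` — Whole-market instrument="all" runs out of memory on a 32GB machine while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use with instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks) differs by about 16x. A 32GB machine running the whole market hits OOM during init_instance_by_config, and the error message does not point to memory. Users easily mistake it for a configuration error; what is actually needed is batched loading or streaming feature computation.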
Source: https://github.com/microsoft/qlib/issues/2097
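### `AP-QLIB-1984` — LightGBM label-dimension check never fires, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label data, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises an unclear error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.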
Source: https://github.com/microsoft/qlib/issues/1984
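### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If the field order or symbol format (e.g. 600000.SH vs SH600000) of custom A-share CSV data does not match Qlib's conventions at dump_bin time, the DataHandler processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.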
Source: https://github.com/microsoft/qlib/issues/1915
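### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler uses joblib to load features in parallel, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed that internal attribute and Qlib's compatibility layer was not updated. The symptom is a D.features call raising multiply nested exceptions, and the user cannot tell from the stack trace whether the problem is the parallel backend or the data.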
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
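### `AP-VNPY-3691` — The K-line generator's first bar has a misaligned timestamp, so the first period's signal is wrong <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute K-lines, the first bar it pushes is timestamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass, but it holds less than a full period of data and yields a wrong entry signal when used for signal computation.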
Source: https://github.com/vnpy/vnpy/issues/3691
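One way to guard against the incomplete first bar in `AP-VNPY-3691` is to check whether the first tick that fed a bar falls in the first minute of its N-minute window. The sketch below uses only the standard library; the helper names are hypothetical and are not part of vnpy's BarGenerator API. It assumes N divides 60.

```python
from datetime import datetime, timedelta


def window_start(ts: datetime, n: int) -> datetime:
    """Floor ts to the start of its n-minute window (assumes n divides 60)."""
    return ts.replace(minute=ts.minute - ts.minute % n, second=0, microsecond=0)


def window_end(ts: datetime, n: int) -> datetime:
    """Exclusive end of the n-minute window that contains ts."""
    return window_start(ts, n) + timedelta(minutes=n)


def covers_full_window(first_tick: datetime, n: int) -> bool:
    """True if the first tick seen falls in the first minute of its window."""
    first_minute = first_tick.replace(second=0, microsecond=0)
    return first_minute == window_start(first_tick, n)


first = datetime(2024, 1, 2, 9, 59, 30)
print(covers_full_window(first, 5))  # False: the 09:55-10:00 window is mostly missing
print(window_end(first, 5))          # 2024-01-02 10:00:00
```

A strategy could discard any bar for which `covers_full_window` is False and start computing signals from the next full window.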
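### `AP-VNPY-3669` — Alpha module: incompatible schemas between new and stored DataFrames raise SchemaError on incremental saves of historical data <sub>(medium)</sub>
When vnpy's Alpha module saves K-line data to a Parquet file, it runs polars.concat directly on the newly downloaded data (which may contain Float64 columns) and the stored file (historical Int64 columns). polars is strictly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that different data sources or versions return inconsistent field types (for example, volume is an integer from some feeds and a float from others), and there is no schema alignment step before the concat. This affects every historical-data build workflow that uses vnpy alpha for backtesting.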
Source: https://github.com/vnpy/vnpy/issues/3669
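### `AP-VNPY-3685` — Spread trading module's run_backtesting() fails silently under Jupyter, so results cannot be trusted <sub>(high)</sub>
In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event-loop conflict under Jupyter (asyncio already running). Part of the backtest engine's logic does not execute, no exception is raised, and the returned backtest statistics look normal. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop. This is a hidden trap: the backtest results look correct but are actually incomplete.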
Source: https://github.com/vnpy/vnpy/issues/3685
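### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system or conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so the dependency boundary is opaque. In a quant research environment where multiple tools coexist, vnpy's install script is a common source of "global environment pollution".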
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
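### `AP-ZIPLINE-138` — Backtest prices are unadjusted, and the tutorial chart misleads users about strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices in the chart differ from actual market prices by roughly 4x (Apple's cumulative split factor), and users mistake "the price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data and lead technical indicators to produce false breakout signals on ex-rights days.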
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
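### `AP-ZIPLINE-235` — Fills default to the current bar's close, understating live slippage and inflating backtest returns <sub>(high)</sub>
After a signal fires on a bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading a signal can only be filled near the next bar's open (T+1 order execution). Taking A-share daily bars as an example, backtesting at the close overestimates daily returns by about 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.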
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
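The next-bar execution rule behind `AP-ZIPLINE-235` (and `SL-02`) can be illustrated with plain lists: a signal raised on bar t is paired with the open of bar t+1, never with bar t's own close. This is a framework-independent sketch; `next_bar_fills` is a hypothetical helper, not a Zipline function, and the prices are made up.

```python
from typing import List, Optional, Sequence, Tuple


def next_bar_fills(signals: Sequence[Optional[bool]],
                   opens: Sequence[float]) -> List[Tuple[int, float]]:
    """Pair each signal raised on bar t with the open price of bar t+1."""
    if len(signals) != len(opens):
        raise ValueError("signals and opens must have one entry per bar")
    fills = []
    # A signal on the last bar has no next bar yet, so it produces no fill.
    for t in range(len(signals) - 1):
        if signals[t]:
            fills.append((t + 1, opens[t + 1]))
    return fills


signals = [False, True, None, True]
opens = [10.0, 10.5, 11.0, 10.8]
print(next_bar_fills(signals, opens))  # [(2, 11.0)]
```

The signal on the final bar is deliberately left unfilled; filling it at that bar's close would be exactly the look-ahead the anti-pattern describes.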
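### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (such as New Year's Day, 1998-01-01), Calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The message shows only the first day of the trading calendar and does not suggest switching to the first trading day. In the A-share case, with the SSE/SZSE calendar, a start_date that falls on the day after the last session before Spring Festival (a holiday) triggers the same kind of error, and debugging is very costly.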
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
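A possible workaround for `AP-ZIPLINE-190` is to snap the requested start date to the first session on or after it before passing it on. The sketch assumes the caller already has a sorted list of session dates from whatever calendar source is in use; the helper name and the short session list are illustrative only and do not come from Zipline.

```python
from bisect import bisect_left
from datetime import date
from typing import Sequence


def first_session_on_or_after(sessions: Sequence[date], day: date) -> date:
    """Return the earliest session that is not before `day` (sessions must be sorted)."""
    i = bisect_left(sessions, day)
    if i == len(sessions):
        raise ValueError(f"{day} is after the last known session {sessions[-1]}")
    return sessions[i]


# Illustrative sessions around a long holiday (not an authoritative calendar).
sessions = [date(2024, 2, 7), date(2024, 2, 8), date(2024, 2, 19), date(2024, 2, 20)]
print(first_session_on_or_after(sessions, date(2024, 2, 10)))  # 2024-02-19
```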
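### `AP-ZIPLINE-181` — After the asset db goes stale, Pipeline reports "no assets traded", sending users to check the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. In the A-share case, if only price data is updated after re-downloading quotes and the asset db is not rebuilt, the date ranges of delisted and newly listed stocks are not updated, Pipeline filtering quietly excludes them, and survivorship bias results.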
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
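### `AP-ZIPLINE-285` — week_start()/week_end() fail silently under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for the first and last day of the week, but under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation of the NYSE calendar, so the schedule never fires or fires on the wrong date. In the A-share case, with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the user cannot discover the unfired schedule from the logs.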
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
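### `AP-ZIPLINE-240` — Backtest dates must be in UTC; passing a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline internally requires all timestamps to be UTC-aware datetimes. When a user passes a naive datetime (no timezone, e.g. pd.Timestamp('2020-01-01')), it does not fail at the entry point; instead it triggers AssertionError: Algorithm should have a utc datetime deep inside algorithm execution, where the stack depth makes it hard to locate. A-share developers importing data in local CST time fall into this trap very easily; call tz_localize('UTC') explicitly when registering the bundle.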
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
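The fail-fast behavior that `AP-ZIPLINE-240` says is missing can be sketched at the pipeline entry with the standard library. The helper `to_utc` is hypothetical. Unlike the `tz_localize('UTC')` call mentioned above, it converts from an explicitly stated source timezone (here UTC+8 for CST), which is an assumption about how the local data was recorded.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CST = timezone(timedelta(hours=8))  # China Standard Time, UTC+8


def to_utc(ts: datetime, assumed_tz: Optional[timezone] = None) -> datetime:
    """Return ts as a UTC-aware datetime; reject naive input unless a zone is given."""
    if ts.tzinfo is None:
        if assumed_tz is None:
            raise ValueError(f"naive datetime {ts!r}: pass assumed_tz explicitly")
        ts = ts.replace(tzinfo=assumed_tz)
    return ts.astimezone(timezone.utc)


print(to_utc(datetime(2020, 1, 1, 9, 30), CST))  # 2020-01-01 01:30:00+00:00
```

Rejecting naive input at the boundary keeps the error close to the line that caused it instead of deep inside algorithm execution.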
## zvt (6)
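### `AP-ZVT-183` — Adjustment factors that are inf/NaN go straight into multiplication, so price adjustment fails silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old==0 (a new stock's first day, or missing data) the factor is inf; when kdata.open itself is None (a suspension day that was not filled) the multiplication raises TypeError. The result: adjustment for the whole entity is interrupted and all subsequent K-lines are lost, but the main flow only logs ERROR and keeps going, so users often do not know that data for many stocks is corrupted.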
Source: https://github.com/zvtvz/zvt/issues/183
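A defensive version of the multiplication in `AP-ZVT-183` might validate its inputs and report what was skipped instead of failing mid-entity. This is a standalone sketch; `adjust_price` and the sample rows are hypothetical and do not reflect ZVT's actual implementation.

```python
import math
from typing import Optional


def adjust_price(price: Optional[float], new_ref: Optional[float],
                 old_ref: Optional[float]) -> Optional[float]:
    """Return price * (new_ref / old_ref), or None when any input is unusable."""
    values = (price, new_ref, old_ref)
    if any(v is None or not math.isfinite(v) for v in values) or old_ref == 0:
        return None
    return price * (new_ref / old_ref)


# (entity_id, price, new reference price, old reference price); values are made up.
rows = [
    ("stock_sh_600000", 10.0, 5.0, 10.0),
    ("stock_sz_000001", 8.0, 4.0, 0.0),    # old == 0 would give an inf factor
    ("stock_sz_000002", None, 4.0, 8.0),   # unfilled suspension day
]
skipped = [eid for eid, p, new, old in rows if adjust_price(p, new, old) is None]
if skipped:
    print(f"adjustment skipped for {len(skipped)} of {len(rows)} rows: {skipped}")
```

Surfacing the skipped count makes the damage visible to the caller, which is the part the anti-pattern says is lost when errors are only logged.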
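### `AP-ZVT-179` — Exceptions raised after a third-party data API hits its quota are swallowed, so data goes missing silently <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch whole-market K-lines (4000+ stocks), it hits JoinQuant's daily maximum query-count limit (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so that day's data for every stock after the limit is silently missing. A backtest that uses the incomplete database produces systematically biased factor values, with no warning.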
Source: https://github.com/zvtvz/zvt/issues/179
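### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers the too many SQL variables error <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT concatenates every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999 (the default before SQLite 3.32.0). Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's variable-count limit. The correct fix is to query in batches, but early versions of ZVT did not handle this boundary.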
Source: https://github.com/zvtvz/zvt/issues/161
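The batched-query fix named in `AP-ZVT-161` can be sketched with the standard `sqlite3` module. The table name `kdata`, its columns, and the batch size of 500 are assumptions for illustration; they are not ZVT's schema.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_by_entity_ids(conn: sqlite3.Connection, entity_ids: Sequence[str],
                        batch_size: int = 500) -> List[Tuple[str, float]]:
    """Query in batches so no single statement exceeds the bound-variable limit."""
    rows: List[Tuple[str, float]] = []
    for batch in chunks(entity_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(batch)).fetchall())
    return rows


# In-memory demo with a hypothetical 5000-stock universe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])
rows = fetch_by_entity_ids(conn, ids)
print(len(rows))  # 5000
```

Keeping the batch size well under 999 makes the query safe on older SQLite builds as well as newer ones.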
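### `AP-ZVT-129` — Wildcard imports hide API changes across versions; enums such as AdjustType vanish inexplicably <sub>(medium)</sub>
ZVT's documentation examples use `from zvt import *` to import every symbol. After a ZVT upgrade refactors enums (for example moving AdjustType into a submodule), the wildcard import no longer includes the symbol and raises AttributeError. Users assume an installation problem; in fact it is an API breaking change between versions that was not flagged in the CHANGELOG, and the wildcard import hid where the symbol came from. Import enum classes explicitly.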
Source: https://github.com/zvtvz/zvt/issues/129
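### `AP-ZVT-187` — The backtest engine does not stop early on an empty data-layer result, causing a cascading null-pointer crash <sub>(medium)</sub>
When ZVT Trader finds the data empty after load_data completes, it does not exit early; it passes the empty DataFrame to selector computation, which triggers a chain of crashes from subsequent NoneType operations. The stack trace is deep and the root cause hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the range the database covers) with no effective validation.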
Source: https://github.com/zvtvz/zvt/issues/187
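### `AP-ZVT-183B` — Using the wrong HFQ (backward-adjusted) or QFQ (forward-adjusted) K-line table causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users mix two tables when computing price momentum or moving-average factors (for example unadjusted for the moving average and backward-adjusted for returns), which makes factor values jump around ex-rights dates. ZVT does not check adjustment-type consistency across tables, so the mix passes silently.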
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-091--czsc
**Scan date**: 2026-04-22
**Stats**: 6 files, 33 classes, 0 functions, 6 stages
## Modules (6)
- [data_collection_layer](components/data_collection_layer.md): 5 classes
- [chan_theory_analysis](components/chan_theory_analysis.md): 5 classes
- [signal_computation](components/signal_computation.md): 5 classes
- [event_&_position_management](components/event_-_position_management.md): 5 classes
- [trading_execution](components/trading_execution.md): 7 classes
- [backtest_&_performance_analysis](components/backtest_-_performance_analysis.md): 6 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 162
fatal_constraints_count: 46
non_fatal_constraints_count: 189
use_cases_count: 10
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (71)
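- **`SHARED-CN-ASTOCK-T1-001`** <sub>(fatal)</sub>: A-share stocks settle T+1: stock bought on day T can be sold on T+1 at the earliest. Cash from selling on day T can be used to buy again the same day. A backtest framework that does not enforce the T+1 position lock overstates turnover and win rate, and especially damages the realism of intraday reversal strategies.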
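- **`SHARED-CN-ASTOCK-T1-002`** <sub>(fatal)</sub>: The daily price limit for Shanghai/Shenzhen main-board stocks is ±10% (±5% for ST/SST stocks). When a stock is locked at limit-up there are no sellers to buy from, and when it is locked at limit-down there are no buyers to sell to; a backtest that assumes fills at any price that day systematically overstates executability. Limit-lock detection should be enforced in the fill simulation layer.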
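- **`SHARED-CN-ASTOCK-T1-003`** <sub>(high)</sub>: On normal trading days the price limit is ±20% for the STAR Market and ChiNext (after the August 2020 reform) and ±30% for the Beijing Stock Exchange; new listings have no price limit for their first 5 trading days. A backtest that applies ±10% filtering uniformly to all stocks wrongly excludes or wrongly includes trades on these boards.
- **`SHARED-CN-ASTOCK-T1-004`** <sub>(high)</sub>: ST/*ST stocks have a ±5% daily price limit and very poor liquidity; fill assumptions must not be shared with normal stocks. Leaving historical ST stocks (eventually delisted) out of the backtest creates survivorship bias; including them without distinguishing the ST price limit simulates fills incorrectly.
- **`SHARED-CN-ASTOCK-T1-005`** <sub>(medium)</sub>: During the A-share opening call auction (9:15-9:25) and closing call auction (14:57-15:00), the fill price is set by the maximum-volume principle, not by continuous matching. A backtest that assumes immediate full fills at the open or close underestimates real slippage risk, most visibly for large-order strategies.
- **`SHARED-CN-ASTOCK-T1-006`** <sub>(high)</sub>: Suspension rules: during long A-share suspensions (which before 2018 could last months), the capital in the position is locked and cannot be rebalanced; this opportunity cost is generally ignored in backtests. Filter suspension days (volume == 0 or is_suspended == True) before factor computation, and emit no signals during a suspension.
- **`SHARED-CN-ASTOCK-T1-007`** <sub>(high)</sub>: New listings have no price limit for their first 5 trading days (first-day gains can exceed 300%) and lack a full history (moving-average, volatility, and turnover factors cannot be computed). Filter out stocks listed for fewer than N trading days (typically 60-252) before factor computation.
- **`SHARED-CN-ASTOCK-T1-008`** <sub>(high)</sub>: New A-share program-trading rules (effective July 7, 2025): a single account placing or cancelling ≥ 300 orders per second, or ≥ 20000 orders per day, is classified as high-frequency trading and must report to the exchange. An AI-generated quant strategy whose frequency exceeds these thresholds cannot run compliantly; flag this at strategy design time.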
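- **`SHARED-CN-ASTOCK-ADJ-001`** <sub>(fatal)</sub>: The price gap on the ex-rights/ex-dividend date is a book adjustment, not a real loss. Choice of adjustment: unadjusted prices inflate strategy losses; forward adjustment embeds future dividend information in historical prices (look-ahead bias); backward adjustment accumulates from the listing date as the base and is the best choice for quant backtesting.
- **`SHARED-CN-ASTOCK-ADJ-002`** <sub>(fatal)</sub>: A-share listed companies disclose financial reports with a statutory delay: annual reports by April 30 of the following year, semi-annual reports by August 31, and quarterly reports by April 30 (Q1) and October 31 (Q3). When a backtest uses financial data, the time the data becomes available must be the actual disclosure date (announcement_date), not the end of the accounting period; otherwise point-in-time look-ahead bias is introduced.
- **`SHARED-CN-ASTOCK-ADJ-003`** <sub>(high)</sub>: Dividends paid as bonus shares, capitalization issues, and rights offerings increase share capital after the ex-rights date: the historical share count stays the same while the price shrinks proportionally. If the backtest system does not adjust the held share count in step, it produces spurious losses or gains on the ex-rights date.
- **`SHARED-CN-ASTOCK-ADJ-004`** <sub>(medium)</sub>: Spread between block trades and auction trades: block trades can be filled at a discount of up to 10% to the market price (main board), but this price does not affect the next day's opening auction. Block trade data appears after the close; mixing it into intraday OHLCV data contaminates the normal calculation of close and volume.
- **`SHARED-CN-ASTOCK-ADJ-005`** <sub>(fatal)</sub>: Short-selling limits under margin trading and securities lending: A-share retail investors cannot short directly, the pool of lendable securities is limited (mainly large-cap blue chips; lendable small and mid caps are extremely scarce), and securities-lending rates are far above margin-financing rates. A backtest that assumes any stock can be shorted produces an unexecutable strategy whose live results diverge badly from the backtest.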
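- **`SHARED-CN-ASTOCK-FX-001`** <sub>(high)</sub>: For stocks bought through Shanghai/Shenzhen-Hong Kong Stock Connect (northbound), total foreign ownership is capped at 30%, with a warning line at 28%. When foreign ownership reaches 28%, SEHK suspends new buy orders in that stock until it falls to 26%. Strategies heavily weighted in foreign-favored stocks (consumer and pharma leaders) need to monitor the foreign ownership ratio.
- **`SHARED-CN-ASTOCK-FX-002`** <sub>(high)</sub>: 5% disclosure rule: a single investor holding more than 5% of a listed company's issued shares must report to the CSRC and the exchange and make an announcement within 3 days, and may not trade the stock during that period or for 2 days after the announcement. A quant stock-selection system that ignores this rule will be forced to stop buying once a heavy position crosses the 5% threshold.
- **`SHARED-CN-ASTOCK-FX-003`** <sub>(high)</sub>: The "double ten" rule for public mutual funds: a single fund may hold no more than 10% of its net assets in one stock, and all funds under the same manager combined may hold no more than 10% of a company's issued shares. A quant portfolio deployed in a public fund must enforce these compliance caps as optimization constraints.
- **`SHARED-CN-ASTOCK-FX-004`** <sub>(fatal)</sub>: Insider-trading boundary: all input data to an AI-assisted quant system must come from publicly disclosed information. Automated trading triggered through non-public channels (private data services, inside information, advance knowledge of a restructuring) constitutes insider trading under Articles 80-83 of the Securities Law and the Guidelines for the Recognition of Insider Trading (《内幕交易行为认定指引》).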
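- **`SHARED-CN-ASTOCK-MKT-001`** <sub>(fatal)</sub>: Survivorship bias: using current A-share constituents (such as today's CSI 300) as the historical backtest universe omits stocks that were once in the index but were removed for poor performance or delisted. A-share delistings accelerated in 2020-2024 (reaching a record 41 in a year), so this bias is getting worse. Use historical point-in-time snapshots.
- **`SHARED-CN-ASTOCK-MKT-002`** <sub>(medium)</sub>: Index reconstitution effect: the CSI 300, CSI 500, and similar indices are rebalanced every six months (June and December). Stocks being added usually rise markedly between the announcement date and the effective date (passive funds are forced to buy), and stocks being removed do the opposite. The backtest universe should use historical constituent snapshots and mark the rebalancing windows.
- **`SHARED-CN-ASTOCK-MKT-003`** <sub>(high)</sub>: Strategy crowding: when many quant private funds use similar factor models, holdings overlap heavily, and a market shock triggers collective selling and a stampede. The A-share quant crisis of February 2024 is the typical case (the small-cap index fell more than 10% in a single day). Monitor the overlap between the factor's long holdings and those of mainstream quant funds.
- **`SHARED-CN-ASTOCK-MKT-004`** <sub>(high)</sub>: A-share quant hedging strategies commonly use IF/IC/IM stock index futures, long or short, to hedge systematic risk. But A-share index futures trade at a persistent discount (futures price < spot), and the annualized discount on IC can reach 10-20%. A backtest that counts only price returns and ignores the futures discount or premium badly overstates the net return of a hedged strategy.
- **`SHARED-CN-ASTOCK-MKT-005`** <sub>(high)</sub>: The monthly momentum factor points the opposite way in A-shares compared with US stocks: the best performers over the past month are likely to reverse over the next month (a reversal effect, not momentum). Institutional research (Huatai Securities, Soochow Securities) and academic papers both confirm that applying the US monthly momentum factor directly to A-shares produces systematic losses.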
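- **`SHARED-CN-ASTOCK-BF-001`** <sub>(medium)</sub>: The disposition effect (Shefrin & Statman 1985) is especially pronounced among A-share retail investors: they tend to sell winners too early and hold losers too long. Empirical research by the Shanghai Stock Exchange confirms the effect in more than 90% of individual accounts. An AI-assisted tool should not cater to the "hold the loser until it breaks even" instinct; it should offer disciplined stop-loss and take-profit advice based on quantitative signals.
- **`SHARED-CN-ASTOCK-BF-002`** <sub>(medium)</sub>: A-shares are dominated by retail investors (individual accounts make up more than 80% of trading volume), and herding is pronounced: retail investors tend to follow the crowd, causing irrational price swings (such as the 2015 leveraged bull and bear market). Quant strategies should avoid using volume rankings, popularity rankings, or other indicators that may amplify herd signals as primary factors.
- **`SHARED-CN-ASTOCK-BF-003`** <sub>(medium)</sub>: Overconfidence (Barber & Odean 2000) is more severe among A-share retail investors: retail annual turnover exceeds 500%, and institutions significantly outperform retail investors over the long run. High-turnover strategies often have lower net returns after trading costs. AI should not encourage frequent trading; it should recommend trading driven by low-frequency, high-quality signals.
- **`SHARED-CN-ASTOCK-BF-004`** <sub>(medium)</sub>: A-share calendar effects: the Spring Festival effect (prices tend to rise in the 5 days before and 1-3 days after the holiday) and the turn-of-month effect (the first 1-5 trading days of a month outperform mid-month and month-end) have academic empirical support (Nanjing University of Finance and Economics, among others). Strategies should lower signal confidence in special calendar windows, or evaluate the contribution of calendar-driven returns separately.
- **`SHARED-CN-ASTOCK-BF-005`** <sub>(high)</sub>: Strategy capacity limits: A-share small-cap and micro-cap stocks average only a few million yuan in daily turnover, so large buys or sells cause severe price impact, and actual strategy capacity may be only tens of millions of yuan. Backtest results cannot be extrapolated to capital in the hundreds of millions; add a volume-participation cap to the backtest.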
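- **`SHARED-CN-ASTOCK-COST-001`** <sub>(fatal)</sub>: Full A-share trading cost structure (after the August 2023 adjustment): stamp duty 0.05%, charged on sells only; commission about 0.01% on both sides (minimum 5 yuan); transfer fee (Shanghai) 0.001%; slippage/impact cost of 0.1%-0.5% per trade for small caps. The annualized return of a backtest that ignores costs is deceptive, especially for high-frequency and high-turnover strategies.
- **`SHARED-CN-ASTOCK-COST-002`** <sub>(high)</sub>: Market impact cost is usually missing entirely from backtests, yet in live trading it may be the largest cost. Buying 1 million yuan of an A-share small cap can push the price up by more than 1%. Impact cost follows a power law in trade size, not a linear relationship; estimate it with the Almgren-Chriss model or a simplified version.
- **`SHARED-CN-ASTOCK-COST-003`** <sub>(medium)</sub>: New share-reduction rules for major shareholders, directors, supervisors, and senior executives (CSRC Order No. 224, May 2024): shareholders holding 5% or more who reduce holdings through centralized auction trading must disclose the reduction plan 15 trading days in advance and may not sell more than 1% of total shares within 3 months. Selling pressure from unlocked shares is a systematic risk factor specific to A-shares; a backtest that ignores the unlock calendar underestimates the holding risk of the affected stocks.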
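- **`SHARED-CN-ASTOCK-DATA-001`** <sub>(high)</sub>: The A-share trading calendar differs from the ordinary calendar: there are make-up workdays created by statutory holiday adjustments (working Saturdays), and ad hoc market closures (an emergency closure from July 8 to July 10, 2015 because of the market crash). Inferring A-share trading days from a generic weekday calendar produces errors; use an A-share-specific trading calendar (such as exchange_calendars or tushare's trading-day API).
- **`SHARED-CN-ASTOCK-DATA-002`** <sub>(medium)</sub>: After delisting, an A-share stock code may be reused by a new stock (rare, but it happens). Using the bare code (such as '000001') as the primary key for historical data, without the exchange suffix ('.SZ') or the listing date range, can confuse historical data with the current stock; take particular care in long-horizon backtests.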
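- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Look-ahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. The most common forms: (1) computing the signal from the close and filling at the close on the same day; (2) marking an indicator computed after day T's close on that same bar; (3) using the day's high/low as the fill assumption. Signal computation and fill time must be aligned: compute the signal after day T's close, fill at the open of T+1.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warm-up period: rolling-window indicators are NaN for the first N bars, and those bars should not take part in signal computation or position decisions. Require each indicator's warmup_period to equal its longest lookback, and keep positions at zero during warm-up.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series data for ML/DL models must be split in chronological order: TRAIN < VALID < TEST. Do not use random k-fold splitting (it mixes future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Fill assumptions at the open/high/low: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (such as a momentum strategy's "take profit at the high") is plain look-ahead, because the intraday high and low are only confirmed after the close. The fill price may only be the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Data alignment offset (off-by-one): pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record each series' "observation time" explicitly in code and verify key time-alignment relationships with assert.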
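- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe Ratio across trials grows with the number of backtests even when the strategy has no real edge. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.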
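- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using the market's current constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use historical point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample; out-of-sample data is for final validation only. Do not keep tuning after "looking at" out-of-sample data multiple times (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Filling strategy for suspended or missing data: do not simply forward-fill suspension-day prices with the previous close, because that lets "zero-volume" days take part in factor computation and signal generation. Filter missing trading days explicitly in the factor computation layer; do not fill them.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data may contain data-source errors (such as ex-rights adjustments not applied in time, or extreme prices from manual entry errors). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.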
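- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Trading costs (commission + stamp duty/transfer tax + transfer fee) must be configured when the backtest is initialized; a zero-cost default is not allowed. Performance metrics from a backtest that ignores costs are deceptive, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices degrade. At minimum configure a fixed spread or proportional slippage; large orders should use a volume-participation model (for example no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to the cost assumption: each 10bps change in cost can flip the strategy's profit-or-loss conclusion, so a cost sensitivity analysis is required.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate the actual (rounded) number of shares held for a fixed amount of capital rather than assume fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight; simulate the rounding effect in the backtest.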
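- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone consistency: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to one timezone at the pipeline entry (UTC storage with market-local display is recommended); do not mix timezones midway through the pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading-calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer-join and then fillna, which creates spurious rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Boundary check for incremental updates: when updating historical data incrementally, query the database for the latest stored date and download only data after that date. Re-downloading existing data and appending it creates rows with duplicate timestamps and corrupts backtest ordering. Verify before and after each update that there are no duplicates (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts Alpha/Beta calculations. Choose a passive benchmark the strategy can actually invest in (such as an HS300 ETF) rather than a price index that cannot be invested in directly (such as the HS300 index). A price index excludes reinvested dividends and understates the return of holding the benchmark.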
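- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net-value series (portfolio value), not from a cumulative-return series. Summing log returns misstates the drawdown depth, because a log return differs from the matching simple return (a 50% loss is a log return of about -0.69).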
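- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Defaults differ between systems; confirm the annualization factor before comparing across systems, otherwise the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio and Sortino Ratio are better risk-adjusted return measures than Sharpe Ratio: Sharpe assumes normally distributed returns, while return distributions in A-share and crypto markets are markedly left-skewed (fat-tailed), so it understates downside risk. A quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Backtest performance attribution should break returns into alpha (active return), beta (market return), factor-exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot tell "a good strategy" from "a tailwind market where the beta happened to be right".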
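- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is preliminary evidence of predictive power, and ICIR > 0.5 indicates stability. Reporting backtest performance without computing IC is a typical case of missing evidence for factor validity.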
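- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute IC series at 1/5/10/20 days to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays fast by 20 days is a short-term factor and does not suit monthly rebalancing, and vice versa. Using the wrong holding period badly hurts a factor's live performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Requirements for factor validity: t-stat > 3.0 (rather than the traditional 1.96); or independent replication in different periods or markets; or a clear economic rationale. Factors that meet none of these are very likely products of data mining.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: a factor with high IC but high turnover may have negative net IC after trading costs. Compute the turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-term factor monthly lets much of the alpha dissipate within the holding period.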
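- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry-rotation returns, and it becomes hard to tell whether the factor itself or industry exposure drove the return. The operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market-cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates almost every un-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and returns come from size exposure rather than the factor. Neutralize for industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes, which distort quantile analysis (such as Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or apply MAD (median absolute deviation) clipping before ranking or neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors is equivalent to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors before combining (Gram-Schmidt or PCA) to remove multicollinearity among them.
- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data filling strategy: filling NaNs in factor computation (suspensions, new listings, data gaps) with the cross-sectional mean introduces look-ahead bias (the mean itself contains future information); dropping them entirely produces survivorship bias. The correct approach is to use the cross-sectional median (the median across all stocks that day, which does not depend on the future) or to exclude the stock for that day.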
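- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: factor evaluation should use the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups as the primary metric, not simply the long-side return. The "monotonicity" test of long Q1, short Q5 is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> effective on the long side only.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull, bear, range-bound) is important evidence of robustness. A factor whose IC holds only in one particular market state does not suit all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods when the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks whose factor rank sits near the middle (49th-51st percentile), small rank changes trigger rebalancing and generate a lot of wasted trading cost. Set a buffer zone in stock selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): even a large factor quantile spread (Q1-Q5 spread) on historical data may be due to chance; it needs a bootstrap or t-test for significance (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.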
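- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market portability of factors: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures or crypto, requires independent IC validation; cross-market generality cannot be assumed. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US stocks.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Stability of factor validity over time: factors that once worked gradually fail through market learning and arbitrage (McLean & Pontiff 2016 show that factor returns decay by 58% on average after publication). Re-evaluate factor IC regularly (quarterly or yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: the interest-rate cycle, business cycle, and market sentiment significantly affect factor validity. The value factor (low P/B) works better when rates are rising; the momentum factor works better in trending markets and fails in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the environment the factor performs best in.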
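
As a worked example of one constraint in the list above, the cost structure in `SHARED-CN-ASTOCK-COST-001` can be expressed as a small function. The rates are the ones quoted in that constraint (stamp duty 0.05% on sells, commission about 0.01% with a 5 yuan minimum, Shanghai transfer fee 0.001%); the function name and parameter names are hypothetical, and slippage/impact is deliberately left out.

```python
def a_share_cost(amount: float, side: str, shanghai: bool = True,
                 commission_rate: float = 0.0001, min_commission: float = 5.0,
                 stamp_duty_rate: float = 0.0005,
                 transfer_fee_rate: float = 0.00001) -> float:
    """Explicit fees in yuan for one order of `amount` yuan; side is 'buy' or 'sell'."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    commission = max(amount * commission_rate, min_commission)    # both sides, with floor
    stamp_duty = amount * stamp_duty_rate if side == "sell" else 0.0  # sells only
    transfer_fee = amount * transfer_fee_rate if shanghai else 0.0    # per the constraint: Shanghai
    return commission + stamp_duty + transfer_fee


buy = a_share_cost(100_000, "buy")
sell = a_share_cost(100_000, "sell")
print(round(buy, 2), round(sell, 2), round(buy + sell, 2))  # 11.0 61.0 72.0
```

On a small order the 5 yuan commission floor dominates: a 1,000 yuan buy costs about 5.01 yuan, or roughly 0.5% of the order, which is why the minimum matters for small accounts.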
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
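
Two of the locks above lend themselves to a mechanical check. The sketch below validates `SL-03` (entity ID shape) and `SL-05` (exactly one sizing field). The function names are hypothetical and not part of ZVT, and the entity ID check only verifies the three-part shape, not that the type or exchange is a known one.

```python
from typing import Optional


def check_entity_id(entity_id: str) -> None:
    """SL-03: entity IDs follow entity_type_exchange_code, e.g. stock_sh_600000."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"SL-03 violated: {entity_id!r} is not entity_type_exchange_code")


def sizing_field(position_pct: Optional[float] = None,
                 order_money: Optional[float] = None,
                 order_amount: Optional[float] = None) -> str:
    """SL-05: return the name of the single sizing field that was provided."""
    candidates = (("position_pct", position_pct),
                  ("order_money", order_money),
                  ("order_amount", order_amount))
    given = [name for name, value in candidates if value is not None]
    if len(given) != 1:
        raise ValueError(f"SL-05 violated: expected exactly one sizing field, got {given}")
    return given[0]


check_entity_id("stock_sh_600000")      # passes silently
print(sizing_field(position_pct=0.2))   # position_pct
```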
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **10**
## `KUC-101`
**Source**: `docs/source/conf.py`
Configuring Sphinx documentation builder for the czsc project, ensuring proper Python path setup and Rust version priority.
## `KUC-102`
**Source**: `examples/develop/czsc_benchmark.py`
Benchmarking CZSC analysis performance with varying K-line counts to measure initialization speed and memory usage.
## `KUC-103`
**Source**: `examples/develop/test_trading_view_kline.py`
Testing and demonstrating K-line visualization using trading_view_kline function with mock data.
## `KUC-104`
**Source**: `examples/signals_dev/bar_volatility_V241013.py`
Classifying market volatility into three tiers (low/middle/high) based on recent K-line price ranges for signal generation.
## `KUC-105`
**Source**: `examples/signals_dev/signal_match.py`
Parsing and analyzing signal definitions from czsc.signals module using SignalsParser for research and configuration purposes.
## `KUC-106`
**Source**: `examples/use_backtest_report.py`
Generating HTML and PDF backtest reports from trading strategy performance data for analysis and presentation.
## `KUC-107`
**Source**: `examples/use_cta_research.py`
Using CTAResearch framework to develop and test CTA trading strategies with mock data through backtesting.
## `KUC-108`
**Source**: `examples/use_html_report_builder.py`
Creating flexible HTML reports with custom headers, performance metrics, charts, and tables using HtmlReportBuilder.
## `KUC-109`
**Source**: `examples/use_optimize.py`
Optimizing entry and exit trading signals by systematically searching candidate signal combinations to find optimal parameters.
## `KUC-110`
**Source**: `examples/事件策略研究工具使用案例.ipynb`
Researching event-based trading strategies using CZSC objects for K-line analysis, bi (笔, stroke) detection, and chart visualization.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
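## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and a single cerebro.run() call can run multiple strategies against multiple data sources. zvt's StockTrader currently binds only one set of factors per instance and has no unified orchestration layer for combining strategies; borrowing the Cerebro pattern would let users combine several Trader instances in one runner and compare them.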
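## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader provides plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without modifying strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users call cerebro.addanalyzer(SharpeRatio) to get a risk-adjusted return report.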
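## `CW-BT-003` — Sizer: separating position sizing
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what proportion to buy on each entry) into a separate Sizer, fully decoupled from signal logic; it ships FixedSize, PercentSizer, and others, and users can define their own. zvt currently has no explicit Sizer concept, and position-control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.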
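## `CW-BT-004` — Full set of Order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss order), and simulates slippage and commission schemes. zvt backtests currently support mainly market-order fills and lack limit orders and combined-order simulation; for high-frequency or live-trading integration, fuller order-type support would make backtests considerably more realistic.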
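## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (e.g. 1 min) into a higher timeframe (e.g. 1 day) at run time and drive the strategy in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks dynamic run-time resampling; borrowing this pattern would support backtests over arbitrary combinations of time granularity without recording the data again.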
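## `CW-VN-003` — Built-in visualization in the CTA backtesting engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a graphical interface that directly shows the strategy equity curve, maximum drawdown, daily P&L, and trade details, with no Jupyter Notebook required. zvt currently visualizes backtest results through the draw_result method, which calls Plotly, but has no unified backtest report page; borrowing this pattern would allow packaging a ready-to-use strategy performance dashboard.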
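## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy 4.0's vnpy.alpha.lab provides an integrated workflow covering data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.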
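## `CW-QL-001` — Point-in-Time database (preventing future-data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at time t returns only data that was actually known at time t (report publication delays and revision history are handled correctly), eliminating look-ahead bias in backtests. zvt currently timestamps financial data by reporting period and has no "publication date" dimension, so stock selection may be biased by financial-report data that was not yet available; introducing the PIT pattern would substantially improve backtest credibility.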
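## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically record the hyperparameters, features, metrics, and predictions of each model training run, and support cross-experiment comparison and model version management. zvt currently lacks an ML experiment-tracking mechanism, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment, enabling fast reproduction and version comparison.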
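## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and can run in combination, enabling intraday optimal-execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently make only daily-bar fill assumptions and do not model execution algorithms; borrowing the nested framework would let zvt separate the question of "which stocks to hold and when" from "how to build the position with minimal market impact".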
FILE:references/components/backtest_-_performance_analysis.md
# backtest_&_performance_analysis (6 classes)
## `WeightBacktest.evaluate`
`backtest_&_performance_analysis/weightbacktest-evaluate.py:0`
## `PairsPerformance.calculate`
`backtest_&_performance_analysis/pairsperformance-calculate.py:0`
## `KlineChart.render`
`backtest_&_performance_analysis/klinechart-render.py:0`
## `evaluate_holds`
`backtest_&_performance_analysis/evaluate-holds.py:0`
## `report_format`
`backtest_&_performance_analysis/report-format.py:0`
## `performance_metrics`
`backtest_&_performance_analysis/performance-metrics.py:0`
FILE:references/components/chan_theory_analysis.md
# chan_theory_analysis (5 classes)
## `CZSC.update`
`chan_theory_analysis/czsc-update.py:0`
## `check_bi`
`chan_theory_analysis/check-bi.py:0`
## `remove_include`
`chan_theory_analysis/remove-include.py:0`
## `bi_recognition_algorithm`
`chan_theory_analysis/bi-recognition-algorithm.py:0`
## `kline_processing`
`chan_theory_analysis/kline-processing.py:0`
FILE:references/components/data_collection_layer.md
# data_collection_layer (5 classes)
## `DataClient.get_bars`
`data_collection_layer/dataclient-get-bars.py:0`
## `BarGenerator.update`
`data_collection_layer/bargenerator-update.py:0`
## `FeishuApiBase.upload_file`
`data_collection_layer/feishuapibase-upload-file.py:0`
## `data_source`
`data_collection_layer/data-source.py:0`
## `cache_backend`
`data_collection_layer/cache-backend.py:0`
FILE:references/components/event_-_position_management.md
# event_&_position_management (5 classes)
## `Event.is_match`
`event_&_position_management/event-is-match.py:0`
## `Position.update`
`event_&_position_management/position-update.py:0`
## `Position._can_close_today`
`event_&_position_management/position-can-close-today.py:0`
## `risk_controls`
`event_&_position_management/risk-controls.py:0`
## `reentry_policy`
`event_&_position_management/reentry-policy.py:0`
FILE:references/components/signal_computation.md
# signal_computation (5 classes)
## `CzscSignals.update_signals`
`signal_computation/czscsignals-update-signals.py:0`
## `SignalsParser.parse`
`signal_computation/signalsparser-parse.py:0`
## `get_signals_by_conf`
`signal_computation/get-signals-by-conf.py:0`
## `signal_library`
`signal_computation/signal-library.py:0`
## `frequency_selection`
`signal_computation/frequency-selection.py:0`
FILE:references/components/trading_execution.md
# trading_execution (7 classes)
## `CzscTrader.update`
`trading_execution/czsctrader-update.py:0`
## `CzscStrategyBase.positions`
`trading_execution/czscstrategybase-positions.py:0`
## `CzscTrader.get_ensemble_pos`
`trading_execution/czsctrader-get-ensemble-pos.py:0`
## `DummyBacktest.on_sig`
`trading_execution/dummybacktest-on-sig.py:0`
## `ensemble_method`
`trading_execution/ensemble-method.py:0`
## `execution_mode`
`trading_execution/execution-mode.py:0`
## `strategy_base`
`trading_execution/strategy-base.py:0`
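A financial-market backtesting framework supporting technical-indicator strategy backtests on FX G10 currency pairs, local and S3 cloud storage of high-frequency tick data with ArcticDB, and market-data fetching and caching from sources such as Quandl.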
---
name: cuemacro-finmarket
description: |-
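A financial-market backtesting framework supporting technical-indicator strategy backtests on FX G10 currency pairs, local and S3 cloud storage of high-frequency tick data with ArcticDB, and market-data fetching and caching from sources such as Quandl.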
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-108"
compiled_at: "2026-04-22T13:00:51.768652+00:00"
capability_markets: "multi-market"
capability_activities: "portfolio-analytics"
sop_version: "crystal-compilation-v6.1"
---
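# Cuemacro Market Tools (cuemacro-finmarket)
> A financial-market backtesting framework supporting technical-indicator strategy backtests on FX G10 currency pairs, local and S3 cloud storage of high-frequency tick data with ArcticDB, and market-data fetching and caching from sources such as Quandl.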
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (4 total)
### ArcticDB Tick Data Storage (`UC-101`)
Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both local LMDB and S3 cloud storage backends for efficient time series data management.
**Triggers**: arcticdb, tick data storage, time series database
### Market Data Fetching from Vendors (`UC-103`)
Fetches economic and financial market data from external vendors like Quandl, demonstrating how to request and cache market data with specific fields and date ranges.
**Triggers**: market data, quandl, fetch data
### S3 Cloud Storage for Tick Data (`UC-104`)
Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet format for efficient compression and retrieval of historical FX tick data.
**Triggers**: s3 storage, aws, parquet
For all **4** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
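
As an illustration of `SL-05`, the "exactly one of" rule can be checked at construction time. This is a minimal sketch only: `SizingFields` is a hypothetical stand-in class, not the framework's own TradingSignal, and only the three field names come from the lock text.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass
class SizingFields:
    """Hypothetical container for the three mutually exclusive sizing fields."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self) -> None:
        # Collect the names of the fields the caller actually set.
        provided = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {SIZING_FIELDS} must be set, got {provided or 'none'}"
            )


# Valid: a single sizing field is set.
ok = SizingFields(position_pct=0.2)
```

Rejecting the object at construction keeps an ambiguous signal (for example both `position_pct` and `order_money`) from ever reaching the execution step.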
## Top Anti-Patterns (14 total)
- **`AP-PORTFOLIO-ANALYTICS-001`**: Division by zero in price ratio calculations corrupts rebalancing
- **`AP-PORTFOLIO-ANALYTICS-002`**: Look-ahead bias from unshifted signal generation and position calculations
- **`AP-PORTFOLIO-ANALYTICS-003`**: Non-positive-semidefinite covariance matrix breaks CVXPY optimization
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-108. Evidence verify ratio = 32.0% and audit fail total = 18. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
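| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative source (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |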
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-108` blueprint at 2026-04-22T13:00:51.768652+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Market Data Fetching from Vendors', 'FX G10 Cross Backtesting', 'ArcticDB Tick Data Storage', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-066--wealthbot (2)
### `AP-PORTFOLIO-ANALYTICS-001` — Division by zero in price ratio calculations corrupts rebalancing <sub>(high)</sub>
When calculating price_diff using current_price divided by old_price without validating old_price is non-zero, the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals zero produces NaN/infinity that propagates to all subsequent trade decisions.
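
The guard described above can be sketched in Python. This is an illustrative sketch, not wealthbot's code: `safe_price_ratio` and the sample prices are hypothetical, and returning `None` for an undefined ratio is one possible convention among several.

```python
import math
from typing import Optional


def safe_price_ratio(current_price: float, old_price: float) -> Optional[float]:
    """Return current_price / old_price, or None when the ratio is undefined."""
    if old_price == 0:
        return None
    if not (math.isfinite(current_price) and math.isfinite(old_price)):
        return None
    return current_price / old_price


# Hypothetical (current, old) prices; BBB has a zero old price.
prices = {"AAA": (11.0, 10.0), "BBB": (5.0, 0.0)}
ratios = {sym: safe_price_ratio(cur, old) for sym, (cur, old) in prices.items()}

# Only securities with a defined ratio move on to rebalancing.
tradable = {sym: ratio for sym, ratio in ratios.items() if ratio is not None}
```

Filtering before the rebalancing step means an invalid price is dropped visibly instead of flowing into buy/sell decisions as NaN or infinity.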
### `AP-PORTFOLIO-ANALYTICS-004` — Incorrect portfolio value tracking destroys time-series integrity <sub>(high)</sub>
Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent ClientPortfolio via proper relationships to avoid orphaned records.
## finance-bp-068--xalpha (1)
### `AP-PORTFOLIO-ANALYTICS-006` — FIFO sell order violation corrupts cost basis and XIRR <sub>(high)</sub>
Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment, leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied, causing direct financial loss.
## finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)
### `AP-PORTFOLIO-ANALYTICS-010` — Missing DataFrame schema validation causes KeyError propagation <sub>(medium)</sub>
Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError, or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue, comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing columns cause backtest calculations to fail with NaN values or KeyError.
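
A boundary check along these lines might look like the sketch below. The four column names are the ones listed above for `fundinfo.price`; the helper `require_columns` is hypothetical and is not part of xalpha, PyPortfolioOpt, or Riskfolio-Lib.

```python
import pandas as pd

REQUIRED_PRICE_COLUMNS = ("date", "netvalue", "totvalue", "comment")


def require_columns(frame, required=REQUIRED_PRICE_COLUMNS):
    """Fail fast with a descriptive error instead of a late KeyError."""
    if not isinstance(frame, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(frame).__name__}")
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return frame


# A frame with all four columns passes through unchanged.
good = pd.DataFrame(
    {"date": ["2024-01-02"], "netvalue": [1.01], "totvalue": [1.51], "comment": [0]}
)
checked = require_columns(good)
```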
## finance-bp-082--stock-screener (1)
### `AP-PORTFOLIO-ANALYTICS-007` — Score validation bypass allows invalid composite calculations <sub>(medium)</sub>
Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable screening results that violate the fundamental score contract. When combined with division-by-zero guards that return 0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations and incorrect Strong Buy/Buy/Watch/Pass ratings.
## finance-bp-093--PyPortfolioOpt (1)
### `AP-PORTFOLIO-ANALYTICS-008` — Convex optimization constraints violate DCP rules <sub>(high)</sub>
Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats (not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum or target_return above maximum achievable return make problems infeasible.
## finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib (1)
### `AP-PORTFOLIO-ANALYTICS-003` — Non-positive-semidefinite covariance matrix breaks CVXPY optimization <sub>(high)</sub>
Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require explicit PSD validation before optimization.
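
One way to perform the explicit PSD validation is an eigenvalue check with clipping, sketched here with NumPy. The function name `clip_to_psd` and the tolerance are assumptions for illustration; the libraries named above ship their own repair routines, which may behave differently.

```python
import numpy as np


def clip_to_psd(cov: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Return a symmetric PSD matrix, clipping negative eigenvalues to zero."""
    sym = (cov + cov.T) / 2.0  # remove tiny asymmetries first
    eigvals, eigvecs = np.linalg.eigh(sym)
    if eigvals.min() >= -tol:
        return sym  # already PSD within tolerance
    clipped = np.clip(eigvals, 0.0, None)
    # Rebuild V * diag(clipped) * V^T.
    return (eigvecs * clipped) @ eigvecs.T


# An inconsistent "correlation" matrix: its determinant is negative.
bad = np.array(
    [
        [1.0, 0.9, -0.9],
        [0.9, 1.0, 0.9],
        [-0.9, 0.9, 1.0],
    ]
)
fixed = clip_to_psd(bad)
```

Clipping changes the matrix, so the diagonal of `fixed` no longer equals the original variances exactly; whether that is acceptable depends on the use case.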
## finance-bp-106--pyfolio-reloaded (2)
### `AP-PORTFOLIO-ANALYTICS-005` — Allocation denominator excludes cash, corrupting portfolio composition <sub>(medium)</sub>
When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to inappropriate risk management decisions.
### `AP-PORTFOLIO-ANALYTICS-009` — Transaction data corruption from missing columns and invalid dates <sub>(medium)</sub>
Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol) causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day trades to be incorrectly split across days.
## finance-bp-107--empyrical-reloaded (1)
### `AP-PORTFOLIO-ANALYTICS-011` — Wrong annualization factors distort cross-frequency metric comparison <sub>(high)</sub>
Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies) produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated capital. The Sharpe ratio formula must use correct annualization with sample standard deviation (ddof=1), otherwise producing misleading risk-adjusted return estimates.
## finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit (1)
### `AP-PORTFOLIO-ANALYTICS-012` — Misaligned time series in alpha/beta calculation produces invalid factor analysis <sub>(high)</sub>
Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels (pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values of approximately 1.0.
## finance-bp-108--finmarketpy (1)
### `AP-PORTFOLIO-ANALYTICS-013` — Forward-filling spot prices creates look-ahead bias in TRI construction <sub>(high)</sub>
Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns, invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index chain.
## finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded (1)
### `AP-PORTFOLIO-ANALYTICS-002` — Look-ahead bias from unshifted signal generation and position calculations <sub>(high)</sub>
Generating trading signals from current-period technical indicators (RSI, moving averages) without proper shift(-1) creates look-ahead bias, causing live trading returns to fall far below backtested results. Similarly, when estimating intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated with end-of-day values, making results unrepresentative of actual trading.
## finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt (1)
### `AP-PORTFOLIO-ANALYTICS-014` — Unsupported solver selection breaks advanced risk calculations <sub>(medium)</sub>
Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt both require careful solver selection.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-108--finmarketpy
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 33, 'total_functions': 0, 'total_stages': 6}
## Modules (6)
- [market_data_collection](components/market_data_collection.md): 4 classes
- [technical_indicator_&_signal_generation](components/technical_indicator_-_signal_generation.md): 5 classes
- [total_return_index_construction](components/total_return_index_construction.md): 6 classes
- [fx_volatility_surface_&_pricing](components/fx_volatility_surface_-_pricing.md): 6 classes
- [strategy_backtesting_engine](components/strategy_backtesting_engine.md): 7 classes
- [trade_analysis_&_reporting](components/trade_analysis_-_reporting.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 82
fatal_constraints_count: 18
non_fatal_constraints_count: 132
use_cases_count: 4
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
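
Several of the locks above (`SL-04`, `SL-06`, `SL-12`) describe the shape of a factor result frame. The following is a plain pandas sketch of those checks; `check_factor_result` and `to_action` are hypothetical helpers written for illustration and are not framework APIs.

```python
import pandas as pd


def check_factor_result(result_df: pd.DataFrame) -> None:
    """Raise ValueError when the frame violates SL-04 or SL-12."""
    # SL-04: index must be a MultiIndex named (entity_id, timestamp).
    if not isinstance(result_df.index, pd.MultiIndex) or list(
        result_df.index.names
    ) != ["entity_id", "timestamp"]:
        raise ValueError("index must be MultiIndex (entity_id, timestamp)")
    # SL-12: at least one of the two result columns must be present.
    if not {"filter_result", "score_result"} & set(result_df.columns):
        raise ValueError("need a 'filter_result' or 'score_result' column")


def to_action(value):
    """SL-06: True means buy, False means sell, None/NaN means no action."""
    if pd.isna(value):
        return None
    return "buy" if bool(value) else "sell"


index = pd.MultiIndex.from_tuples(
    [
        ("stock_sh_600000", pd.Timestamp("2024-01-02")),
        ("stock_sh_600000", pd.Timestamp("2024-01-03")),
        ("stock_sh_600000", pd.Timestamp("2024-01-04")),
    ],
    names=["entity_id", "timestamp"],
)
result_df = pd.DataFrame({"filter_result": [True, False, None]}, index=index)

check_factor_result(result_df)
actions = [to_action(v) for v in result_df["filter_result"]]
```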
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **4**
## `KUC-101`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/arcticdb_example.ipynb`
Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both local LMDB and S3 cloud storage backends for efficient time series data management.
## `KUC-102`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/backtest_example.ipynb`
Enables historical backtesting of FX trading strategies using G10 currency pairs with technical indicator-based signal generation to evaluate strategy performance.
## `KUC-103`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/market_data_example.ipynb`
Fetches economic and financial market data from external vendors like Quandl, demonstrating how to request and cache market data with specific fields and date ranges.
## `KUC-104`
**Source**: `finmarketpy_examples/finmarketpy_notebooks/s3_bucket_example.ipynb`
Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet format for efficient compression and retrieval of historical FX tick data.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
## `CW-PORTFOLIO-ANALYTICS-001` — Defensive zero-division guards with explicit handling
**From**: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt · **Applicable to**: portfolio-analytics
Always guard division operations with explicit zero-value checks before executing. In price ratio calculations, filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream calculations and crashes pipelines.
## `CW-PORTFOLIO-ANALYTICS-002` — Covariance matrix positive-semidefiniteness verification
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics
Always verify covariance matrix is positive-semidefinite before passing to CVXPY optimization. Apply eigenvalue clipping if violated, as non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib enforce this constraint to prevent optimizer from finding mathematically invalid solutions or crashing entirely.
## `CW-PORTFOLIO-ANALYTICS-003` — Geometric compounding for cumulative returns
**From**: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics
Compute cumulative returns using geometric compounding via cumprod(1 + returns), never arithmetic cumulation via cumsum. Arithmetic cumulative sum overstates gains and understates losses, causing cumulative returns to diverge significantly from actual portfolio performance over volatile periods. This principle applies to total return index construction and any cumulative performance calculation.
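
A two-period example makes the difference concrete. This is a small pandas sketch with made-up returns (+50% followed by -50%); the base value of 100 for the index follows the total return index convention mentioned in this bundle.

```python
import pandas as pd

# Made-up returns: up 50%, then down 50%.
returns = pd.Series([0.5, -0.5], index=pd.date_range("2024-01-01", periods=2))

# Geometric compounding: 1.5 * 0.5 - 1 = -25%.
geometric = (1.0 + returns).cumprod() - 1.0

# Arithmetic cumulation reports 0%, hiding the loss.
arithmetic = returns.cumsum()

# Total return index built multiplicatively from a base of 100.
tri = 100.0 * (1.0 + returns).cumprod()
```

The portfolio really is down a quarter after these two periods, which only the `cumprod` version reports.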
## `CW-PORTFOLIO-ANALYTICS-004` — Temporal shift enforcement to prevent look-ahead bias
**From**: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded · **Applicable to**: portfolio-analytics
Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from backtested results.
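
The text above does not say which series each shift is applied to, so the sketch below states its own assumption: in pandas, `shift(1)` on a signal delays it by one bar, and `shift(-1)` on the returns pairs each signal with the next bar's return. Both alignments give the same PnL values; multiplying the unshifted signal by the same-bar return is the biased case. Prices and the toy moving-average rule are made up.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])
returns = prices.pct_change()

# Toy signal computed from the close of bar t (hypothetical rule).
signal = (prices > prices.rolling(2).mean()).astype(float)

# Next-bar execution: the position held over bar t was decided at bar t-1.
pnl_lagged = (signal.shift(1) * returns).dropna()

# Equivalent alignment: pair the signal at t with the return of bar t+1.
pnl_forward = (signal * returns.shift(-1)).dropna()

# Look-ahead bias: the signal at t earns the return it was computed from.
pnl_biased = (signal * returns).dropna()
```

In this toy series the biased version shows a gain while the correctly aligned versions show a loss, which is the kind of gap between backtest and live results the text warns about.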
## `CW-PORTFOLIO-ANALYTICS-005` — DCP-compliant convex optimization construction
**From**: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics
Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility, return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing optimization entirely.
## `CW-PORTFOLIO-ANALYTICS-006` — Correct Sharpe ratio formula with risk-free rate subtraction
**From**: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit · **Applicable to**: portfolio-analytics
Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.
## `CW-PORTFOLIO-ANALYTICS-007` — Immutable FIFO position tracking with chronological ordering
**From**: finance-bp-068--xalpha, finance-bp-066--wealthbot · **Applicable to**: portfolio-analytics
Maintain FIFO position tracking with strictly increasing date order for position entries. Use copy() function to create independent copies before mutating remtable to avoid side effects. Enforce chronological ordering in sell operations to ensure correct cost basis and holding period calculation, particularly important for funds with tiered fees by holding period.
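
A FIFO sell over a list of lots can be sketched as follows. The `(buy_date, shares, unit_cost)` tuple layout and the function `fifo_sell` are hypothetical and do not mirror xalpha's remtable format; the sketch only shows chronological consumption without mutating the caller's data.

```python
from collections import deque
from datetime import date


def fifo_sell(lots, shares_to_sell):
    """Consume the oldest lots first; return (consumed, remaining) lists."""
    # sorted() builds a new list, so the caller's lots are left untouched.
    remaining = deque(sorted(lots, key=lambda lot: lot[0]))
    consumed = []
    while shares_to_sell > 0:
        if not remaining:
            raise ValueError("cannot sell more shares than are held")
        buy_date, shares, unit_cost = remaining.popleft()
        used = min(shares, shares_to_sell)
        consumed.append((buy_date, used, unit_cost))
        if shares > used:
            # Put the unsold part of this lot back at the front.
            remaining.appendleft((buy_date, shares - used, unit_cost))
        shares_to_sell -= used
    return consumed, list(remaining)


# Lots supplied out of order on purpose.
lots = [(date(2024, 3, 1), 50, 1.2), (date(2024, 1, 1), 100, 1.0)]
consumed, rest = fifo_sell(lots, 120)
```

Each consumed entry keeps its own buy date, so a holding-period-based redemption fee can be computed per lot.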
## `CW-PORTFOLIO-ANALYTICS-008` — Validation at system boundaries with descriptive errors
**From**: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib · **Applicable to**: portfolio-analytics
Enforce validation at system boundaries with descriptive error messages. Validate expected returns matches covariance matrix dimensions, score values are within [0, 100], confidence values within [0, 1], and required DataFrame columns are present. Invalid inputs should raise ValueError with descriptive messages listing valid options to prevent silent failures or corrupted calculations.
## `CW-PORTFOLIO-ANALYTICS-009` — Decimal rounding for monetary calculations
**From**: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded · **Applicable to**: portfolio-analytics
Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material over many transactions.
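
A short sketch of Decimal-based rounding with the standard library follows. `round_money` is a hypothetical helper, and the choice of `ROUND_HALF_UP` to two places is an assumption; the `myround` function named above may use a different rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_money(value: Decimal, places: str = "0.01") -> Decimal:
    """Round to a fixed number of decimal places with an explicit rounding mode."""
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)


# Made-up purchase: 1000.00 of cash at a net value of 1.2345 per share.
amount = Decimal("1000.00")
nav = Decimal("1.2345")
shares = round_money(amount / nav)

# Binary floats cannot represent 0.1 or 0.2 exactly; Decimal strings can.
float_exact = (0.1 + 0.2) == 0.3
decimal_exact = (Decimal("0.1") + Decimal("0.2")) == Decimal("0.3")
```

Constructing `Decimal` from strings matters here: `Decimal(0.1)` would inherit the float's representation error.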
## `CW-PORTFOLIO-ANALYTICS-010` — Cash flow sign convention enforcement
**From**: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha · **Applicable to**: portfolio-analytics
Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs cause NPV calculation to invert, producing negative returns for profitable trades and vice versa. Verify sum of round trip PnLs equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.
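
The effect of the sign convention on NPV can be seen in a two-cash-flow example. This is a minimal sketch: the `npv` helper, the 5% rate, and the amounts are made up for illustration and are not taken from xalpha or pyfolio.

```python
def npv(rate, cashflows):
    """Net present value of cash flows at periods 0, 1, 2, ... for a given rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))


# Correct convention: invest 100 (outflow, negative), receive 110 (inflow, positive).
correct = [-100.0, 110.0]
# Flipped signs turn the same profitable trade into an apparent loss.
flipped = [100.0, -110.0]

npv_correct = npv(0.05, correct)
npv_flipped = npv(0.05, flipped)
```

Because NPV is linear in the cash flows, flipping every sign flips the result exactly, so a profitable trade reported as a loss of the same size is a strong hint of a sign convention error.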
FILE:references/components/fx_volatility_surface_-_pricing.md
# fx_volatility_surface_&_pricing (6 classes)
## `FXVolSurface.build_vol_surface`
`fx_volatility_surface_&_pricing/fxvolsurface-build-vol-surface.py:0`
## `FXOptionsPricer.price_option`
`fx_volatility_surface_&_pricing/fxoptionspricer-price-option.py:0`
## `FXForwardsPricer.price_odd_date`
`fx_volatility_surface_&_pricing/fxforwardspricer-price-odd-date.py:0`
## `VolStats.calculate`
`fx_volatility_surface_&_pricing/volstats-calculate.py:0`
## `vol_function_type`
`fx_volatility_surface_&_pricing/vol-function-type.py:0`
## `pricing_engine`
`fx_volatility_surface_&_pricing/pricing-engine.py:0`
FILE:references/components/market_data_collection.md
# market_data_collection (4 classes)
## `MarketDataRequest.__init__`
`market_data_collection/marketdatarequest-init.py:0`
## `Market.fetch_market_data`
`market_data_collection/market-fetch-market-data.py:0`
## `SpeedCache.generate_key`
`market_data_collection/speedcache-generate-key.py:0`
## `data_source`
`market_data_collection/data-source.py:0`
FILE:references/components/strategy_backtesting_engine.md
# strategy_backtesting_engine (7 classes)
## `Backtest.calculate_trading_PnL`
`strategy_backtesting_engine/backtest-calculate-trading-pnl.py:0`
## `TradingModel.construct_strategy`
`strategy_backtesting_engine/tradingmodel-construct-strategy.py:0`
## `PortfolioWeightConstruction.optimize_portfolio_weights`
`strategy_backtesting_engine/portfolioweightconstruction-optimize-por.py:0`
## `RiskEngine.calculate_leverage_factor`
`strategy_backtesting_engine/riskengine-calculate-leverage-factor.py:0`
## `portfolio_combination`
`strategy_backtesting_engine/portfolio-combination.py:0`
## `signal_delay`
`strategy_backtesting_engine/signal-delay.py:0`
## `portfolio_vol_adjust`
`strategy_backtesting_engine/portfolio-vol-adjust.py:0`
FILE:references/components/technical_indicator_-_signal_generation.md
# technical_indicator_&_signal_generation (5 classes)
## `TechIndicator.create_tech_ind`
`technical_indicator_&_signal_generation/techindicator-create-tech-ind.py:0`
## `TechParams.__init__`
`technical_indicator_&_signal_generation/techparams-init.py:0`
## `EventsFactory.create_event_signal`
`technical_indicator_&_signal_generation/eventsfactory-create-event-signal.py:0`
## `indicator_type`
`technical_indicator_&_signal_generation/indicator-type.py:0`
## `signal_direction_filter`
`technical_indicator_&_signal_generation/signal-direction-filter.py:0`
FILE:references/components/total_return_index_construction.md
# total_return_index_construction (6 classes)
## `FXSpotCurve.construct_total_returns_index`
`total_return_index_construction/fxspotcurve-construct-total-returns-inde.py:0`
## `FXForwardsCurve.roll_contracts`
`total_return_index_construction/fxforwardscurve-roll-contracts.py:0`
## `FXOptionsCurve.construct_tri`
`total_return_index_construction/fxoptionscurve-construct-tri.py:0`
## `AbstractCurve.generate_key`
`total_return_index_construction/abstractcurve-generate-key.py:0`
## `roll_event`
`total_return_index_construction/roll-event.py:0`
## `construct_via_currency`
`total_return_index_construction/construct-via-currency.py:0`
FILE:references/components/trade_analysis_-_reporting.md
# trade_analysis_&_reporting (5 classes)
## `TradeAnalysis.analyse_strategy`
`trade_analysis_&_reporting/tradeanalysis-analyse-strategy.py:0`
## `BacktestComparison.compare`
`trade_analysis_&_reporting/backtestcomparison-compare.py:0`
## `Report.generate`
`trade_analysis_&_reporting/report-generate.py:0`
## `Seasonality.detect`
`trade_analysis_&_reporting/seasonality-detect.py:0`
## `analysis_engine`
`trade_analysis_&_reporting/analysis-engine.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-108-v5.3
version: v6.1
blueprint_id: finance-bp-108
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:51.768652+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- multi-market
activities:
- portfolio-analytics
upgraded_from: finance-bp-108-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:29.151803+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-108--finmarketpy/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-108--finmarketpy/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-PORTFOLIO-ANALYTICS-001
title: Division by zero in price ratio calculations corrupts rebalancing
description: When calculating price_diff using current_price divided by old_price without validating old_price is non-zero,
the result is NaN or INF. This corrupts portfolio rebalancing calculations in wealthbot, causing incorrect buy/sell decisions
based on invalid prices_diff values. The same issue appears in getPricesDiff() where divide-by-zero when old_price equals
zero produces NaN/infinity that propagates to all subsequent trade decisions.
project_source: finance-bp-066--wealthbot
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-002
title: Look-ahead bias from unshifted signal generation and position calculations
description: Generating trading signals from current-period technical indicators (RSI, moving averages) without proper shift(-1)
creates look-ahead bias, causing live trading returns to fall far below backtested results. Similarly, when estimating
intraday positions from transactions without applying shift(1) to EOD positions, day-start positions are contaminated
with end-of-day values, making results unrepresentative of actual trading.
project_source: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-003
title: Non-positive-semidefinite covariance matrix breaks CVXPY optimization
description: Passing a non-positive-semidefinite covariance matrix to CVXPY optimization with assume_PSD=True produces incorrect
results because the solver assumes validity without verification. This causes Cholesky decomposition to fail or produce
garbage weights, preventing portfolio optimization from running entirely. Riskfolio-Lib and PyPortfolioOpt both require
explicit PSD validation before optimization.
project_source: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-004
title: Incorrect portfolio value tracking destroys time-series integrity
description: Updating existing ClientPortfolioValue records instead of creating new ones destroys the time-series integrity
needed for billing calculations and historical reconciliation. This creates data corruption where billing calculations
and historical reporting against custodian records will fail to match. Portfolio value records must be linked to parent
ClientPortfolio via proper relationships to avoid orphaned records.
project_source: finance-bp-066--wealthbot
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-005
title: Allocation denominator excludes cash, corrupting portfolio composition
description: When computing allocation percentages excluding cash from the denominator, portfolio allocation percentages
will not sum to 100%, misrepresenting the portfolio's actual composition. Additionally, concentration metrics become artificially
skewed when including cash (a non-position asset), producing misleading diversification assessments that could lead to
inappropriate risk management decisions.
project_source: finance-bp-106--pyfolio-reloaded
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-006
title: FIFO sell order violation corrupts cost basis and XIRR
description: Processing positions out of chronological order in FIFO sell operations causes incorrect cost basis assignment,
leading to inaccurate realized gains/losses and wrong XIRR calculation. Chinese funds have tiered redemption fees based
on holding periods, so FIFO violations result in incorrect holding period calculation and wrong redemption fee being applied,
causing direct financial loss.
project_source: finance-bp-068--xalpha
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-007
title: Score validation bypass allows invalid composite calculations
description: Accepting scores outside the 0-100 range in screener results corrupts ranking and rating logic, causing unpredictable
screening results that violate the fundamental score contract. When combined with division-by-zero guards that return
0.0 for empty screener lists, this creates unpredictable behavior where invalid scores produce wrong composite calculations
and incorrect Strong Buy/Buy/Watch/Pass ratings.
project_source: finance-bp-082--stock-screener
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-008
title: Convex optimization constraints violate DCP rules
description: Using non-convex objectives or DCP-violating expressions in CVXPY optimization causes DCPError, completely
preventing portfolio optimization from running. Similarly, providing non-callable constraints or invalid bounds formats
(not matching n_assets length) causes TypeError. Feasibility violations like setting target_volatility below global minimum
or target_return above maximum achievable return make problems infeasible.
project_source: finance-bp-093--PyPortfolioOpt
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-009
title: Transaction data corruption from missing columns and invalid dates
description: Extracting round trips from transactions DataFrame without validating required columns (amount, price, symbol)
causes KeyError exceptions. When open_dt is not strictly less than close_dt, negative or zero duration values indicate
data corruption causing incorrect holding period statistics. Similarly, non-normalized transaction timestamps cause intra-day
trades to be incorrectly split across days.
project_source: finance-bp-106--pyfolio-reloaded
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-010
title: Missing DataFrame schema validation causes KeyError propagation
description: Passing non-DataFrame objects (numpy arrays, lists) where DataFrame is expected causes NameError, AttributeError,
or TypeError in downstream pandas operations. xalpha's fundinfo.price requires specific columns (date, netvalue, totvalue,
comment), PyPortfolioOpt and Riskfolio-Lib require index alignment between expected returns and covariance matrix. Missing
columns cause backtest calculations to fail with NaN values or KeyError.
project_source: finance-bp-068--xalpha, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-011
title: Wrong annualization factors distort cross-frequency metric comparison
description: Applying incorrect annualization factors (wrong values for daily, weekly, monthly, quarterly, yearly frequencies)
produces non-comparable metrics across different return frequencies, causing invalid strategy comparisons and misallocated
capital. The Sharpe ratio formula must use the correct annualization factor with sample standard deviation (ddof=1); otherwise it produces
misleading risk-adjusted return estimates.
project_source: finance-bp-107--empyrical-reloaded
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-012
title: Misaligned time series in alpha/beta calculation produces invalid factor analysis
description: Passing returns and factor_returns to alpha_beta functions without verifying data alignment on index labels
(pd.Series) or length equality (np.ndarray) produces incorrect alpha/beta values due to correlation computed between mismatched
periods. Including benchmark ticker in the asset ticker list causes circular correlation producing meaningless beta values
of approximately 1.0.
project_source: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-013
title: Forward-filling spot prices creates look-ahead bias in TRI construction
description: Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical returns,
invalidating all TRI-based backtest results. The total return index construction requires multiplicative cumulation using
cumprod (not cumsum) with base value 100, as additive cumulation allows negative cumulative returns to break the index
chain.
project_source: finance-bp-108--finmarketpy
severity: high
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
- id: AP-PORTFOLIO-ANALYTICS-014
title: Unsupported solver selection breaks advanced risk calculations
description: Using solvers that don't support required cone programming (power cone, exponential cone) causes CVXPY to fail
with SolverError, returning None and breaking risk calculations. CLARABEL, SCS, ECOS support power cone for RLVaR/RLDaR
calculations, while CLARABEL/MOSEK/SCS/ECOS support exponential cone for EVaR calculations. Riskfolio-Lib and PyPortfolioOpt
both require careful solver selection.
project_source: finance-bp-117--Riskfolio-Lib, finance-bp-093--PyPortfolioOpt
severity: medium
applicable_to_tags:
markets:
- multi-market
activities:
- portfolio-analytics
_source_file: anti-patterns/portfolio-analytics.yaml
cross_project_wisdom:
- wisdom_id: CW-PORTFOLIO-ANALYTICS-001
source_project: finance-bp-066--wealthbot, finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt
pattern_name: Defensive zero-division guards with explicit handling
description: Always guard division operations with explicit zero-value checks before executing. In price ratio calculations,
filter out securities where old_price is zero before calling getPricesDiff. In composite score calculations, guard against
total_weight of zero and return 0.0 for empty input lists. This prevents NaN/infinity propagation that corrupts downstream
calculations and crashes pipelines.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-002
source_project: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
pattern_name: Covariance matrix positive-semidefiniteness verification
description: Always verify covariance matrix is positive-semidefinite before passing to CVXPY optimization. Apply eigenvalue
clipping if violated, as non-PSD matrices cause Cholesky decomposition failures. Both PyPortfolioOpt and Riskfolio-Lib
enforce this constraint to prevent optimizer from finding mathematically invalid solutions or crashing entirely.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-003
source_project: finance-bp-068--xalpha, finance-bp-106--pyfolio-reloaded, finance-bp-107--empyrical-reloaded
pattern_name: Geometric compounding for cumulative returns
description: Compute cumulative returns using geometric compounding via cumprod(1 + returns), never arithmetic cumulation
via cumsum. Arithmetic cumulative sum overstates gains and understates losses, causing cumulative returns to diverge significantly
from actual portfolio performance over volatile periods. This principle applies to total return index construction and
any cumulative performance calculation.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-004
source_project: finance-bp-108--finmarketpy, finance-bp-106--pyfolio-reloaded
pattern_name: Temporal shift enforcement to prevent look-ahead bias
description: Enforce proper temporal shifting in signal generation and position calculations. Use shift(-1) for exit signals
to prevent look-ahead bias, and shift(1) when estimating intraday positions from EOD data. Forward-fill carry data and
backward-fill only old data gaps, never forward-fill spot prices. Violations cause live trading returns to diverge from
backtested results.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-005
source_project: finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
pattern_name: DCP-compliant convex optimization construction
description: Use only DCP-compliant convex objectives and constraints in CVXPY. Provide constraints as callable functions
accepting weight variables, use valid bounds formats matching n_assets length, and verify target parameters (volatility,
return) are within feasible ranges. Non-convex or infeasible problems fail with DCPError or OptimizationError, preventing
optimization entirely.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-006
source_project: finance-bp-107--empyrical-reloaded, finance-bp-118--FinanceToolkit
pattern_name: Correct Sharpe ratio formula with risk-free rate subtraction
description: Calculate Sharpe ratio using (mean returns - risk_free) / std(returns) * sqrt(annualization) with sample standard
deviation (ddof=1). Subtract risk-free rate from asset returns before dividing by volatility. Incorrect Sharpe ratio calculation
produces misleading risk-adjusted return estimates, causing poor investment decisions based on faulty performance attribution.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-007
source_project: finance-bp-068--xalpha, finance-bp-066--wealthbot
pattern_name: Immutable FIFO position tracking with chronological ordering
description: Maintain FIFO position tracking with position entries in strictly increasing date order. Use copy() to create
an independent copy of remtable before mutating it, so the original is not changed as a side effect. Enforce chronological
ordering in sell operations so that cost basis and holding period are calculated correctly; this matters most for funds whose
fees are tiered by holding period.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-008
source_project: finance-bp-082--stock-screener, finance-bp-093--PyPortfolioOpt, finance-bp-117--Riskfolio-Lib
pattern_name: Validation at system boundaries with descriptive errors
description: Enforce validation at system boundaries with descriptive error messages. Validate that expected returns match
the covariance matrix dimensions, that score values are within [0, 100] and confidence values within [0, 1], and that required
DataFrame columns are present. Invalid inputs should raise ValueError with a descriptive message listing the valid options,
preventing silent failures and corrupted calculations.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-009
source_project: finance-bp-068--xalpha, finance-bp-107--empyrical-reloaded
pattern_name: Decimal rounding for monetary calculations
description: Use Decimal with explicit rounding (myround) for each monetary calculation to avoid floating-point errors that
cause share miscalculation and incorrect cost basis. This prevents rounding errors from propagating to XIRR and portfolio
valuation calculations. Direct floating-point operations in financial calculations accumulate errors that become material
over many transactions.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
- wisdom_id: CW-PORTFOLIO-ANALYTICS-010
source_project: finance-bp-106--pyfolio-reloaded, finance-bp-068--xalpha
pattern_name: Cash flow sign convention enforcement
description: Mark cash outflows as negative and cash inflows as positive in cftable. Incorrect cash flow signs invert the NPV
calculation, producing negative returns for profitable trades and vice versa. Verify that the sum of round-trip PnLs
equals total realized transaction dollars to catch sign convention errors before they corrupt performance attribution.
applicable_to_activity: portfolio-analytics
_source_file: cross-project-wisdom/portfolio-analytics.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: finmarketpy_examples/finmarketpy_notebooks/arcticdb_example.ipynb
business_problem: Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both local
LMDB and S3 cloud storage backends for efficient time series data management.
intent_keywords:
- arcticdb
- tick data storage
- time series database
- lmdb
- market data persistence
stage: data_collection
data_domain: market_data
type: data_pipeline
- kuc_id: KUC-102
source_file: finmarketpy_examples/finmarketpy_notebooks/backtest_example.ipynb
business_problem: Enables historical backtesting of FX trading strategies using G10 currency pairs with technical indicator-based
signal generation to evaluate strategy performance.
intent_keywords:
- backtest
- fx trading
- g10 currency
- technical indicators
- strategy testing
stage: backtesting
data_domain: trading_data
type: trading_strategy
- kuc_id: KUC-103
source_file: finmarketpy_examples/finmarketpy_notebooks/market_data_example.ipynb
business_problem: Fetches economic and financial market data from external vendors like Quandl, demonstrating how to request
and cache market data with specific fields and date ranges.
intent_keywords:
- market data
- quandl
- fetch data
- vendor data
- interest rates
stage: data_collection
data_domain: market_data
type: data_pipeline
- kuc_id: KUC-104
source_file: finmarketpy_examples/finmarketpy_notebooks/s3_bucket_example.ipynb
business_problem: Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet format for
efficient compression and retrieval of historical FX tick data.
intent_keywords:
- s3 storage
- aws
- parquet
- cloud storage
- tick data
stage: data_collection
data_domain: market_data
type: data_pipeline
component_capability_map:
project: finance-bp-108--finmarketpy
scan_date: '2026-04-22'
stats:
total_files: 6
total_classes: 33
total_functions: 0
total_stages: 6
modules:
market_data_collection:
class_count: 4
stage_id: data_collection
stage_order: 1
responsibility: Fetches market data from external vendors (Bloomberg, FRED, Quandl) via findatapy MarketDataRequest
abstraction layer. Provides raw time series for downstream processing, abstracting vendor-specific ticker formats
from strategy logic.
classes:
- name: MarketDataRequest.__init__
file: market_data_collection/marketdatarequest-init.py
line: 0
kind: required_method
signature: ''
- name: Market.fetch_market_data
file: market_data_collection/market-fetch-market-data.py
line: 0
kind: required_method
signature: ''
- name: SpeedCache.generate_key
file: market_data_collection/speedcache-generate-key.py
line: 0
kind: required_method
signature: ''
- name: data_source
file: market_data_collection/data-source.py
line: 0
kind: replaceable_point
design_decision_count: 2
technical_indicator_&_signal_generation:
class_count: 5
stage_id: signal_generation
stage_order: 2
responsibility: Computes technical indicators (SMA, EMA, RSI, Bollinger Bands) and converts them to discrete +1/-1 trading
signals. Acts as the core alpha generation engine, transforming raw price data into actionable directional signals.
classes:
- name: TechIndicator.create_tech_ind
file: technical_indicator_&_signal_generation/techindicator-create-tech-ind.py
line: 0
kind: required_method
signature: ''
- name: TechParams.__init__
file: technical_indicator_&_signal_generation/techparams-init.py
line: 0
kind: required_method
signature: ''
- name: EventsFactory.create_event_signal
file: technical_indicator_&_signal_generation/eventsfactory-create-event-signal.py
line: 0
kind: required_method
signature: ''
- name: indicator_type
file: technical_indicator_&_signal_generation/indicator-type.py
line: 0
kind: replaceable_point
- name: signal_direction_filter
file: technical_indicator_&_signal_generation/signal-direction-filter.py
line: 0
kind: replaceable_point
design_decision_count: 3
total_return_index_construction:
class_count: 6
stage_id: curve_construction
stage_order: 3
responsibility: Builds continuous time series of total return indices for FX spot, forwards, and options by incorporating
carry/roll costs and handling rolling around contract expiry dates. Provides the asset return stream for P&L calculation.
classes:
- name: FXSpotCurve.construct_total_returns_index
file: total_return_index_construction/fxspotcurve-construct-total-returns-inde.py
line: 0
kind: required_method
signature: ''
- name: FXForwardsCurve.roll_contracts
file: total_return_index_construction/fxforwardscurve-roll-contracts.py
line: 0
kind: required_method
signature: ''
- name: FXOptionsCurve.construct_tri
file: total_return_index_construction/fxoptionscurve-construct-tri.py
line: 0
kind: required_method
signature: ''
- name: AbstractCurve.generate_key
file: total_return_index_construction/abstractcurve-generate-key.py
line: 0
kind: required_method
signature: ''
- name: roll_event
file: total_return_index_construction/roll-event.py
line: 0
kind: replaceable_point
- name: construct_via_currency
file: total_return_index_construction/construct-via-currency.py
line: 0
kind: replaceable_point
design_decision_count: 4
fx_volatility_surface_&_pricing:
class_count: 6
stage_id: volatility_pricing
stage_order: 4
responsibility: Builds interpolated FX volatility surface from market quotes (ATM vol, 25d/10d risk reversals/strangles).
Prices vanilla FX options using FinancePy. Computes realized volatility and volatility risk premium for vol-targeting
adjustments.
classes:
- name: FXVolSurface.build_vol_surface
file: fx_volatility_surface_&_pricing/fxvolsurface-build-vol-surface.py
line: 0
kind: required_method
signature: ''
- name: FXOptionsPricer.price_option
file: fx_volatility_surface_&_pricing/fxoptionspricer-price-option.py
line: 0
kind: required_method
signature: ''
- name: FXForwardsPricer.price_odd_date
file: fx_volatility_surface_&_pricing/fxforwardspricer-price-odd-date.py
line: 0
kind: required_method
signature: ''
- name: VolStats.calculate
file: fx_volatility_surface_&_pricing/volstats-calculate.py
line: 0
kind: required_method
signature: ''
- name: vol_function_type
file: fx_volatility_surface_&_pricing/vol-function-type.py
line: 0
kind: replaceable_point
- name: pricing_engine
file: fx_volatility_surface_&_pricing/pricing-engine.py
line: 0
kind: replaceable_point
design_decision_count: 4
strategy_backtesting_engine:
class_count: 7
stage_id: backtesting
stage_order: 5
responsibility: Combines signals with asset returns to compute P&L. Applies volatility targeting, position limits, and
transaction costs. Aggregates into portfolio returns with exposure tracking and leverage management. Core engine for
strategy evaluation.
classes:
- name: Backtest.calculate_trading_PnL
file: strategy_backtesting_engine/backtest-calculate-trading-pnl.py
line: 0
kind: required_method
signature: ''
- name: TradingModel.construct_strategy
file: strategy_backtesting_engine/tradingmodel-construct-strategy.py
line: 0
kind: required_method
signature: ''
- name: PortfolioWeightConstruction.optimize_portfolio_weights
file: strategy_backtesting_engine/portfolioweightconstruction-optimize-por.py
line: 0
kind: required_method
signature: ''
- name: RiskEngine.calculate_leverage_factor
file: strategy_backtesting_engine/riskengine-calculate-leverage-factor.py
line: 0
kind: required_method
signature: ''
- name: portfolio_combination
file: strategy_backtesting_engine/portfolio-combination.py
line: 0
kind: replaceable_point
- name: signal_delay
file: strategy_backtesting_engine/signal-delay.py
line: 0
kind: replaceable_point
- name: portfolio_vol_adjust
file: strategy_backtesting_engine/portfolio-vol-adjust.py
line: 0
kind: replaceable_point
design_decision_count: 7
trade_analysis_&_reporting:
class_count: 5
stage_id: analysis_reporting
stage_order: 6
responsibility: Post-backtest analysis including return statistics, sensitivity analysis to parameters and transaction
costs, day-of-month effects, and comparison across multiple models. Transforms raw P&L into actionable insights.
classes:
- name: TradeAnalysis.analyse_strategy
file: trade_analysis_&_reporting/tradeanalysis-analyse-strategy.py
line: 0
kind: required_method
signature: ''
- name: BacktestComparison.compare
file: trade_analysis_&_reporting/backtestcomparison-compare.py
line: 0
kind: required_method
signature: ''
- name: Report.generate
file: trade_analysis_&_reporting/report-generate.py
line: 0
kind: required_method
signature: ''
- name: Seasonality.detect
file: trade_analysis_&_reporting/seasonality-detect.py
line: 0
kind: required_method
signature: ''
- name: analysis_engine
file: trade_analysis_&_reporting/analysis-engine.py
line: 0
kind: replaceable_point
design_decision_count: 2
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.32
evidence_invalid: 51
evidence_verified: 24
evidence_auto_fixed: 0
audit_coverage: 60/60 (100%)
audit_pass_rate: 8/60 (13%)
audit_fail_total: 18
audit_finance_universal:
pass: 5
warn: 11
fail: 4
audit_subdomain_totals:
pass: 3
warn: 23
fail: 14
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
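# Illustrative check of the EQ-01 trigger, assuming evidence_verify_ratio is computed as
# evidence_verified / (evidence_verified + evidence_invalid). That formula is an inference from the
# declared numbers above, not something this file states.
#   24 / (24 + 51) = 0.32, and 0.32 < 0.5, so EQ-01 would fire for this crystal.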
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-108. Evidence verify ratio
= 32.0% and audit fail total = 18. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-108-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc: []
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: ArcticDB Tick Data Storage
positive_terms:
- arcticdb
- tick data storage
- time series database
- lmdb
- market data persistence
data_domain: market_data
negative_terms:
- backtest strategy
- screening
- live trading
- screening factors
- ml prediction
ambiguity_question: Are you looking to store and retrieve tick-level market data from a time series database, or are you
running a trading strategy or backtest?
- uc_id: UC-102
name: FX G10 Cross Backtesting
positive_terms:
- backtest
- fx trading
- g10 currency
- technical indicators
- strategy testing
data_domain: trading_data
negative_terms:
- arcticdb storage
- s3 bucket
- screening
- live trading
- data collection only
ambiguity_question: Do you want to test a trading strategy on historical FX data (backtesting), or do you need to just
fetch and store market data?
- uc_id: UC-103
name: Market Data Fetching from Vendors
positive_terms:
- market data
- quandl
- fetch data
- vendor data
- interest rates
data_domain: market_data
negative_terms:
- backtest
- strategy
- arcticdb
- s3 storage
- live trading
ambiguity_question: Are you trying to download market data from a vendor for analysis, or are you running a backtest or
trading strategy?
- uc_id: UC-104
name: S3 Cloud Storage for Tick Data
positive_terms:
- s3 storage
- aws
- parquet
- cloud storage
- tick data
data_domain: market_data
negative_terms:
- backtest strategy
- arcticdb
- quandl
- live trading
- screening
ambiguity_question: Do you need to store tick data in AWS S3 cloud storage, or are you running a trading strategy or backtest?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 sell-before-buy ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
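# Illustrative reading of the SL-02 locked_value, assuming level.to_second() returns the bar length
# in seconds (86400 for a daily level). The timestamps below are hypothetical.
#   happen_timestamp = 2024-01-02 00:00:00
#   due_timestamp    = 2024-01-02 00:00:00 + 86400 s = 2024-01-03 00:00:00
# Under that assumption, a signal computed on one bar only becomes executable on the next bar.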
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 82
fatal_constraints_count: 18
non_fatal_constraints_count: 132
use_cases_count: 4
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: Recorder subclass MUST declare both provider and data_schema class attributes,
or initialization fails with an assertion error; finance-C-001 fatal violation.'
business_decisions:
- id: BD-001
type: B/DK
summary: Separate vendor tickers from internal tickers via MarketDataRequest
- id: BD-002
type: BA/DK
summary: Use 'close' as default field across all data sources
- id: BD-031
type: B/DK
summary: Left join with fill-down for asset-signal alignment
- id: BD-032
type: B/BA
summary: FFILL for carry/deposit data, not spot
- id: BD-GAP-001
type: B
summary: 'Missing: Train/test time split integrity'
- id: BD-GAP-002
type: B
summary: 'Missing: Immutable event log'
- id: BD-GAP-003
type: B
summary: 'Missing: Immutable event log'
- id: BD-GAP-004
type: B
summary: 'Missing: Default Definition & IFRS 9 Staging'
- id: BD-GAP-005
type: B
summary: 'Missing: Stress Test Macro Variables'
- id: BD-GAP-006
type: B
summary: 'Missing: Funds Transfer Pricing (FTP)'
- id: BD-GAP-007
type: B
summary: 'Missing: Cash Pooling Legal Structure'
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 31 source groups: analysis_reporting(5),
asset_selection(1), backtesting(7), carry_calculation(1), cost_modeling(1), curve_construction(6), and 25 more.'
key_decisions: 71 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-020
type: BA
summary: Output path defaults to 'output_data/YYYYMMDD' with timestamp
- id: BD-021
type: M
summary: PyFolio integration optional (try/except import)
- id: BD-052
type: B
summary: GraphicalLassoCV for covariance-based network learning
- id: BD-053
type: B/DK
summary: Affinity propagation for clustering assets in network
- id: BD-054
type: B
summary: Locally Linear Embedding for 2D network visualization
- id: BD-029
type: B
summary: G10 USD crosses basket for FX trend following
- id: BD-014
type: B/BA
summary: 10% vol target as default for both signal and portfolio
- id: BD-015
type: BA/DK
summary: 252 annualization factor for daily data
- id: BD-016
type: BA
summary: Max leverage capped at 5x for vol-targeted strategies
- id: BD-017
type: BA
summary: Signal delayed by 0 periods (trade same day as signal)
- id: BD-018
type: B/BA
summary: Position limits applied via element-wise clip adjustment
- id: BD-019
type: BA/DK
summary: Transaction costs in basis points (bp) not decimals
- id: BD-028
type: B/BA
summary: Only allow longs in EURUSD single-currency strategy
- id: BD-039
type: B/BA
summary: ON (overnight) tenor for spot carry calculation
- id: BD-023
type: B/BA
summary: Default spot transaction cost of 2.5 basis points
- id: BD-006
type: B/BA
summary: Depos tenor 'ON' (overnight) as default for spot curve
- id: BD-007
type: BA
summary: Multiplicative cum_index by default ('mult' not 'add')
- id: BD-008
type: BA/DK
summary: FX options roll on 'expiry-date' not month-end
- id: BD-009
type: BA/DK
summary: Construct crosses via 'no' (direct) or via domestic currency
- id: BD-027
type: B
summary: Rebalance frequency of 'BM' (business month end) for vol targeting
- id: BD-050
type: B/BA
summary: Portfolio combination default is None (equal weight)
- id: BD-040
type: B/BA
summary: 'Currencies with 365-day basis: AUD, CAD, GBP, NZD'
- id: BD-074
type: BA/M
summary: Cumulative index default 'mult' vs 'add' changes P&L scaling fundamentally
- id: BD-055
type: B/BA
summary: Event study uses NYC 10am cutoff for economic releases
- id: BD-030
type: B/BA
summary: Signal delay of 0 (same-day execution)
- id: BD-038
type: B/BA
summary: 1M tenor as default for options and forwards trading
- id: BD-071
type: RC
summary: Hardcoded 365-day count currencies affect ALL FX curve calculations
- id: BD-072
type: B
summary: Stop loss/take profit signals MUST be applied BEFORE portfolio weight optimization
- id: BD-075
type: B
summary: Signal delay via signal_delay shift MUST occur BEFORE non-trading day masking
- id: BD-073
type: BA/DK
summary: Transaction costs divided by 2 assumes symmetric round-trip costs (entry + exit)
- id: BD-059
type: B/BA
summary: Parallel backtesting with multiprocessing on Linux (8 threads)
- id: BD-060
type: B/BA
summary: Output calculation fields disabled by default for performance
- id: BD-033
type: B
summary: Numba JIT compilation for total return index calculation
- id: BD-049
type: B/BA
summary: Multiplicative cumulative index starting at 100
- id: BD-024
type: B
summary: Vol target of 10% annualised with 20-day lookback period
- id: BD-025
type: B/BA
summary: Maximum leverage of 5x for vol-adjusted signals
- id: BD-051
type: B/BA
summary: Position clip resample to BM (business month) by default
- id: BD-057
type: B
summary: Use expiry-date roll event for options strategy
- id: BD-058
type: B
summary: Roll 5 days before roll event
- id: BD-003
type: B/BA
summary: Signal uses +1 for buy/above, -1 for sell/below (not 1/0)
- id: BD-004
type: BA/DK
summary: fillna=True by default in TechParams
- id: BD-005
type: BA/DK
summary: Forward-fill signals on non-trading days (hold previous position)
- id: BD-022
type: B/BA
summary: SMA period of 200 for trend following FX strategy
- id: BD-044
type: B/BA
summary: Buy if spot above SMA, sell if below
- id: BD-045
type: B/BA
summary: First n-1 periods set to NaN for rolling indicators
- id: BD-046
type: B
summary: GMMA uses EMA spans [3,5,7,10,12,15] short and [30,35,40,45,50,60] long
- id: BD-047
type: B/BA
summary: RSI period of 14 for momentum calculation
- id: BD-048
type: B/BA
summary: ATR period of 14 for volatility-adjusted signals
- id: BD-062
type: B/BA
summary: Volatility-targeting via rolling realized vol with max leverage cap
- id: BD-065
type: B/BA
summary: Risk stop signals with stop-loss and take-profit levels
- id: BD-066
type: B/BA
summary: Position clipping to enforce net/total exposure limits
- id: BD-068
type: B/BA
summary: Black-Scholes model for FX vanilla option pricing
- id: BD-064
type: B
summary: FX implied vol surface interpolation with polynomial/Clark5 methods
- id: BD-069
type: T
summary: Rolling realized volatility with annualization factor
- id: BD-070
type: T
summary: Volatility risk premium as implied minus realized vol
- id: BD-063
type: B/DK
summary: Guppy Multiple Moving Average with 12 EMA components
- id: BD-067
type: B
summary: Graphical Lasso for sparse covariance estimation in network
- id: BD-026
type: B/BA
summary: Annualisation factor of 252 for daily data
- id: BD-041
type: B/BA
summary: Multiplier of sqrt(3) for Friday ON vol adjustment
- id: BD-056
type: B/BA
summary: Realised vol rolling window of tenor_days for daily data
- id: BD-042
type: B
summary: Weighted median model for implied vol addon estimation
- id: BD-043
type: B/BA
summary: Model window of 20 days for vol addon calculation
- id: BD-010
type: B/BA
summary: CLARK5 interpolation for vol surface by default
- id: BD-011
type: B/BA
summary: Fwd-delta-neutral-premium-adj ATM method
- id: BD-012
type: M/BA
summary: Nelder-Mead-Numba solver (faster but less accurate)
- id: BD-013
type: M
summary: Uses FinancePy externally for pricing engine
- id: BD-034
type: B
summary: CLARK5 interpolation for FX vol surface
- id: BD-035
type: B/BA
summary: Forward delta neutral premium adjusted for ATM method
- id: BD-036
type: B/BA
summary: Spot delta premium adjusted for delta quoting
- id: BD-037
type: B/DK
summary: Nelder-Mead Numba solver for vol surface calibration
- id: BD-061
type: B/BA
summary: Premium output in pct-for (base currency percentage)
resources:
packages:
- name: blosc
version_pin: latest
- name: chartpy
version_pin: latest
- name: findatapy
version_pin: latest
- name: matplotlib
version_pin: latest
- name: numba
version_pin: latest
- name: numpy
version_pin: latest
- name: pandas
version_pin: latest
- name: scikit-learn
version_pin: latest
- name: seasonal
version_pin: latest
- name: financepy
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install blosc
- python3 -m pip install chartpy
- python3 -m pip install findatapy
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-008
when: When accessing Bloomberg Terminal for market data
action: Attempt to use Bloomberg Desktop API on non-Windows platforms
severity: fatal
kind: resource_boundary
modality: must_not
consequence: Bloomberg Terminal/DAPI is Windows-only; attempting to use it on Linux/Mac causes immediate failure with
cryptic errors
stage_ids:
- data_collection
- id: finance-C-011
when: When using finmarketpy for live trading decisions
action: Claim finmarketpy provides real-time trading signals or live execution capability
severity: fatal
kind: claim_boundary
modality: must_not
consequence: finmarketpy is a backtesting library; claiming live trading capability misleads users into making trading
decisions based on unverified backtest-only code
stage_ids:
- data_collection
- id: finance-C-012
when: When presenting backtest results
action: Present backtest returns as guaranteed future trading performance
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Backtest results reflect historical conditions and perfect execution assumptions; presenting them as future
guarantees leads to unexpected live trading losses
stage_ids:
- data_collection
- id: finance-C-016
when: When computing discrete trading signals from technical indicators
action: output exclusively +1 (long), -1 (short), or NaN (flat) values
severity: fatal
kind: domain_rule
modality: must
consequence: Signal values outside {+1, -1, NaN} will cause incorrect position sizing in the backtest engine, as the PnL
calculation at backtestengine.py:201-216 assumes symmetric long/short encoding
stage_ids:
- signal_generation
- id: finance-C-017
when: When implementing indicator-based signal generation
action: introduce inherent 1-period lag through rolling window or shift operations
severity: fatal
kind: domain_rule
modality: must
consequence: Signals generated without lag will exhibit look-ahead bias, causing live trading returns to fall far below
backtested results because the strategy would have traded on information not yet available
stage_ids:
- signal_generation
- id: finance-C-022
when: When constructing signals from RSI indicator
action: use shift(-1) for RSI exit signals to prevent look-ahead bias
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Using current-period RSI values for signal generation creates look-ahead bias, as the signal would fire based
on price movements that haven't occurred yet in the current period
stage_ids:
- signal_generation
- id: finance-C-029
when: When implementing total return index construction with multiplicative cumulation
action: Initialize TRI at base value 100 and compound returns forward using cumprod
severity: fatal
kind: domain_rule
modality: must
consequence: Additive cumulation (cumsum) allows negative cumulative returns to break the index chain, causing TRI values
to become meaningless for P&L calculation and potentially producing misleading backtest results
stage_ids:
- curve_construction
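# Worked example for finance-C-029, using hypothetical period returns of +1%, -2%, +3%:
#   multiplicative (cumprod): 100 * 1.01 * 0.98 * 1.03 = 101.9494
#   additive (cumsum):        100 * (1 + 0.01 - 0.02 + 0.03) = 102.0
# The two conventions already differ after three periods, and the gap grows as returns compound.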
- id: finance-C-035
when: When handling missing deposit rate data at start of TRI series
action: Forward-fill carry data and backward-fill only old data gaps, never forward-fill spot prices
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Forward-filling spot prices creates look-ahead bias where future prices are used to calculate historical
returns, invalidating all TRI-based backtest results
stage_ids:
- curve_construction
- id: finance-C-046
when: When calculating option delta hedging P&L
action: Use previous period's delta for current period's spot return hedging
severity: fatal
kind: domain_rule
modality: must
consequence: Using current delta with current spot return creates circular dependency where hedge ratio is set after price
is known, producing fictitious hedging profits
stage_ids:
- curve_construction
- id: finance-C-048
when: When computing cross-currency returns via intermediate currency
action: Subtract term currency returns from base currency returns (base_rets - terms_rets)
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect sign in cross-currency return calculation inverts the strategy direction, causing long positions
to be treated as shorts and producing completely wrong P&L attribution
stage_ids:
- curve_construction
- id: finance-C-059
when: When pricing options on high-vol event dates
action: Build vol surface before extracting surface or pricing instruments
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Calling extract_vol_surface or price_instrument without prior build_vol_surface results in NoneType errors,
causing complete failure of option pricing pipeline
stage_ids:
- volatility_pricing
- id: finance-C-061
when: When processing vol surface quotes for JPY pairs
action: Apply divisor of 100 to JPY rates to convert from percentage to decimal
severity: fatal
kind: domain_rule
modality: must
consequence: JPY market quotes are in percentage (e.g., 3.46%) while code expects decimal; omitting divisor produces 100x
wrong rates, corrupting discount curves and option prices
stage_ids:
- volatility_pricing
- id: finance-C-085
when: When outputting strategy comparison results
action: validate that all models are TradingModel instances before plotting
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Non-TradingModel input causes AttributeError at runtime when plotting methods are called, blocking comparison
reports
stage_ids:
- analysis_reporting
- id: finance-C-089
when: When presenting backtest results to stakeholders
action: present simulated backtest returns as guaranteed future performance
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Misrepresenting backtest results as predictive leads to inappropriate strategy allocation, potentially causing
significant financial losses when live trading differs from historical simulation
stage_ids:
- analysis_reporting
- id: finance-C-102
when: When implementing signal generation logic for bidirectional trading strategies
action: Use +1 for buy/above-threshold signals and -1 for sell/below-threshold signals — do not use 1/0 binary encoding
which lacks directional information for short positions
severity: fatal
kind: domain_rule
modality: must
consequence: Using 1/0 binary encoding causes undefined behavior at signal boundaries and eliminates the ability to represent
short positions, breaking long/short strategy symmetry and producing incorrect trading signals
derived_from_bd_id: BD-003
- id: finance-C-106
when: When implementing option pricing functionality in FX options trading
action: Verify the FinancePy library is installed and importable; the framework delegates all option pricing calculations
to FinancePy, so pricing functionality fails without it
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Attempting to use option pricing without FinancePy causes import failures, breaking the entire options pricing
pipeline and preventing vol surface calibration and option valuation
derived_from_bd_id: BD-013
- id: finance-C-109
when: When implementing FX curve calculations for AUD, CAD, GBP, or NZD currencies
action: Apply the ACT/365 (Actual/365) day count convention for accruals and discounting on AUD, CAD, GBP, and NZD;
do not use 30/360 or ACT/360
severity: fatal
kind: domain_rule
modality: must
consequence: Applying incorrect day count conventions violates market conventions and regulatory reporting requirements,
causing miscalculated funding costs and incorrect risk valuations that may constitute regulatory violations
derived_from_bd_id: BD-071
- id: finance-C-139
when: When validating Black-Scholes model inputs for FX vanilla option pricing
action: 'Verify that volatility inputs conform to Black-Scholes constant-vol assumption: volatility_std > 0.001 (0.1%),
volatility < 2.0 (200%), and volatility_surface_skew < 0.05 (5%) — inputs outside these bounds indicate smile dynamics
incompatible with the model'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Invalid volatility inputs cause Black-Scholes to produce nonsensical prices (near-zero or infinite values
when vol approaches zero or infinity), making all downstream Greeks and hedge ratios unreliable
derived_from_bd_id: BD-068
regular:
- id: finance-C-001
when: When implementing data collection for backtesting
action: Use 'close' field as the default trading field for price calculations
severity: high
kind: domain_rule
modality: must
consequence: Using non-'close' fields as default may cause incorrect backtest results, since close prices are most reliable
for end-of-day backtesting and strategies are typically designed around close-to-close returns
stage_ids:
- data_collection
- id: finance-C-002
when: When processing market data with non-trading days
action: Produce NaN values for missing data on non-trading days rather than raising errors
severity: high
kind: domain_rule
modality: must
consequence: Throwing errors on non-trading days will halt the entire backtest pipeline; NaN values allow the system to
continue and fill forward from the last valid price
stage_ids:
- data_collection
- id: finance-C-003
when: When creating DataFrame output from market data collection
action: Verify the returned DataFrame has a DatetimeIndex aligned to the requested date range
severity: high
kind: domain_rule
modality: must
consequence: Non-DatetimeIndex or misaligned index causes downstream indicator calculations and signal generation to fail
or produce incorrect results
stage_ids:
- data_collection
- id: finance-C-004
when: When collecting data from external market data vendors
action: Use MarketDataRequest abstraction to separate vendor-specific tickers from internal tickers
severity: high
kind: architecture_guardrail
modality: must
consequence: Mixing vendor tickers with internal logic creates tight coupling; switching vendors requires rewriting strategy
code instead of just updating ticker mappings
stage_ids:
- data_collection
- id: finance-C-005
when: When loading market data via TradingModel
action: Use the load_assets method as the sole data entry point for the TradingModel
severity: high
kind: architecture_guardrail
modality: must
consequence: Direct access to market data bypassing load_assets bypasses data validation and standardization, causing
inconsistent behavior across strategies
stage_ids:
- data_collection
- id: finance-C-006
when: When fetching market data from external data sources
action: Implement fallback mechanism when data fetch returns None
severity: high
kind: operational_lesson
modality: must
consequence: Network failures, API errors, or missing credentials cause fetch_market to return None; without fallback,
the entire backtest fails without generating any results
stage_ids:
- data_collection
- id: finance-C-007
when: When using external market data vendors
action: Assume real-time data availability from vendors that only provide delayed data
severity: high
kind: resource_boundary
modality: must_not
consequence: Most vendors (FRED, Quandl, Yahoo Finance) provide delayed data; assuming real-time causes live trading strategies
to fail or trade on stale prices
stage_ids:
- data_collection
- id: finance-C-009
when: When downloading market data from vendors that require authentication
action: Configure API keys for FRED, Quandl, and other vendors requiring authentication
severity: high
kind: resource_boundary
modality: must
consequence: Missing API keys cause authentication failures; data downloads fail and backtest cannot proceed
stage_ids:
- data_collection
- id: finance-C-010
when: When standardizing column names from different data vendors
action: Verify DataFrame columns match standardized names regardless of vendor source
severity: high
kind: architecture_guardrail
modality: must
consequence: Bloomberg returns 'PX_LAST', FRED returns 'close'; if not normalized, downstream signal generation and P&L
calculations fail with KeyError
stage_ids:
- data_collection
- id: finance-C-013
when: When using parallel data fetching from external vendors
action: Use excessive parallel threads that trigger vendor rate limits
severity: medium
kind: operational_lesson
modality: must_not
consequence: Some data providers limit concurrent requests; excessive threads cause HTTP 429 errors, data fetch failures,
or temporary API key suspension
stage_ids:
- data_collection
- id: finance-C-014
when: When configuring market data collection for backtesting
action: Set signal_delay parameter to prevent look-ahead bias in strategy signals
severity: high
kind: domain_rule
modality: must
consequence: Signals generated at end-of-day using same-day close prices cannot be executed; without signal_delay, backtest
assumes impossible execution timing
stage_ids:
- data_collection
- id: finance-C-015
when: When running multiple parallel backtests with market data fetching
action: Use Redis caching to reduce redundant API calls to data vendors
severity: medium
kind: operational_lesson
modality: should
consequence: Repeated data fetches for same tickers consume API quota, slow down backtesting, and may hit rate limits
stage_ids:
- data_collection
- id: finance-C-018
when: When initializing TechParams for signal generation
action: verify indicator warmup periods produce NaN signals for the initial window
severity: high
kind: domain_rule
modality: must
consequence: Signals generated before the warmup period completes will use incomplete indicator calculations, producing
unreliable trading signals that can cause significant financial losses
stage_ids:
- signal_generation
- id: finance-C-019
when: When processing signals across non-trading days
action: forward-fill the previous valid signal to maintain position continuity
severity: high
kind: architecture_guardrail
modality: must
consequence: Without forward-fill, signals on non-trading days will be NaN, causing unintended position flat states that
break position continuity and distort backtested PnL calculations
stage_ids:
- signal_generation
- id: finance-C-020
when: When configuring signal direction filters
action: simultaneously set both only_allow_longs and only_allow_shorts to True
severity: high
kind: domain_rule
modality: must_not
consequence: Setting both direction filters simultaneously results in all signals being zeroed out, as each filter eliminates
the other's signals, producing a dead strategy with zero returns
stage_ids:
- signal_generation
- id: finance-C-021
when: When creating technical indicators with fillna enabled
action: forward-fill missing prices before computing indicators so that signals stay continuous
severity: high
kind: domain_rule
modality: must
consequence: Computing indicators without forward-filling prices creates NaN gaps that propagate through rolling calculations,
causing discontinuous indicator values and erratic signal generation
stage_ids:
- signal_generation
- id: finance-C-023
when: When backtesting strategies with technical indicators
action: claim that backtest returns equal expected live trading returns
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting backtest results as live trading proof violates the fundamental limitation that backtests cannot
account for slippage, liquidity constraints, and execution delays present in live markets
stage_ids:
- signal_generation
- id: finance-C-024
when: When configuring signal_delay parameter
action: apply signal shift in the backtest engine after signal generation
severity: high
kind: architecture_guardrail
modality: must
consequence: Skipping signal_delay application causes signals to execute on the same bar as indicator calculation, creating
look-ahead bias in the backtest that won't occur in live trading
stage_ids:
- signal_generation
- id: finance-C-025
when: When computing Bollinger Bands signals
action: forward-fill flat signals between band touches to maintain position state
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without forward-fill, Bollinger Band signals become NaN between touch events, causing unintended position
flat states that break the trend-following logic
stage_ids:
- signal_generation
- id: finance-C-026
when: When constructing signal DataFrames
action: preserve the original price DataFrame index (DatetimeIndex) for temporal alignment
severity: high
kind: domain_rule
modality: must
consequence: Using a different index causes misaligned signal-to-price multiplication in the backtest, producing NaN PnL
values because signal and price timestamps don't match
stage_ids:
- signal_generation
- id: finance-C-027
when: When implementing custom technical indicators
action: override create_custom_tech_ind method and call parent implementation for standard indicators
severity: medium
kind: architecture_guardrail
modality: must
consequence: Custom indicators that don't follow the _signal/_techind naming convention will break the TradingModel.construct_strategy
call chain, causing PnL calculations to fail
stage_ids:
- signal_generation
- id: finance-C-028
when: When using volatility-adjusted signals
action: claim real-time signal generation capability when using polling-based data fetching
severity: medium
kind: claim_boundary
modality: must_not
consequence: Polling-based data fetching cannot provide true real-time signals; claiming real-time capability misleads
users about the system's latency characteristics
stage_ids:
- signal_generation
- id: finance-C-030
when: When constructing total return indices for FX spot
action: Use overnight deposit tenor (ON) for carry calculation to represent true daily carry cost
severity: high
kind: domain_rule
modality: must
consequence: Using longer deposit tenors (1M, 3M) misrepresents daily carry cost, causing TRI to over/understate true
overnight FX position returns and leading to incorrect strategy P&L attribution
stage_ids:
- curve_construction
- id: finance-C-031
when: When calculating carry accrual for TRI construction
action: Apply correct day count convention based on currency (365 for AUD/CAD/GBP/NZD, 360 for others)
severity: high
kind: domain_rule
modality: must
consequence: Incorrect day count causes carry returns to be miscalculated by approximately 1.4%, leading to systematic
TRI drift from true values and invalid strategy performance comparison
stage_ids:
- curve_construction
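# Illustrative arithmetic for finance-C-031 (a sketch, not taken from the source code):
# one day of carry on the same annual rate is rate/360 under ACT/360 and rate/365 under ACT/365.
# The ratio is 365/360 = 1.0139, so picking the wrong basis shifts carry by roughly 1.4%.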
- id: finance-C-032
when: When rolling FX forward contracts in TRI construction
action: Use month-end as the roll trigger and roll 1M contracts 5 business days before it
severity: high
kind: domain_rule
modality: must
consequence: Rolling on incorrect dates causes exposure to expiring contracts, resulting in gap risk at delivery and misalignment
with broker execution dates
stage_ids:
- curve_construction
- id: finance-C-033
when: When rolling FX options contracts in TRI construction
action: Use expiry-date as roll trigger to avoid gamma exposure near expiration
severity: high
kind: domain_rule
modality: must
consequence: Rolling at month-end instead of expiry-date creates positions with high gamma near expiration, causing delta
hedging costs to spike and TRI to overstate option strategy returns
stage_ids:
- curve_construction
- id: finance-C-034
when: When joining TRI series with different currency pairs
action: Use outer join to preserve all dates across currency pairs
severity: high
kind: architecture_guardrail
modality: must
consequence: Inner join drops dates where only some pairs have data, creating gaps in multi-currency portfolio TRI and
causing signal generation to skip valid trading days
stage_ids:
- curve_construction
- id: finance-C-036
when: When constructing cross rates via intermediate currency
action: Handle USDUSD special case by returning zero returns and apply correct sign for base/terms currency
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing USDUSD special case causes division by zero or incorrect cross-rate calculation, producing NaN or
wrong TRI values for pairs involving the reference currency
stage_ids:
- curve_construction
- id: finance-C-037
when: When processing FX forward tickers for TRI construction
action: Convert ticker notation to market convention before processing (e.g., USDEUR -> EURUSD)
severity: high
kind: architecture_guardrail
modality: must
consequence: Processing non-standard ticker notation causes incorrect delivery date calculation and wrong roll timing,
leading to positions held past expiration
stage_ids:
- curve_construction
- id: finance-C-038
when: When calculating time differences for daily carry accrual
action: Floor time difference to whole days and set first value to zero
severity: medium
kind: domain_rule
modality: must
consequence: Using sub-daily time differences causes incorrect carry scaling for intraday data, making TRI inconsistent
between daily and intraday backtests
stage_ids:
- curve_construction
- id: finance-C-039
when: When marking FX forwards to market at roll date
action: Use previous contract's interpolated forward price for MTM calculation on roll dates
severity: high
kind: domain_rule
modality: must
consequence: Using current (new) contract price for MTM at roll date causes artificial return spike/discontinuity, inflating
TRI and misrepresenting actual P&L at roll
stage_ids:
- curve_construction
- id: finance-C-040
when: When working with NDF currencies (e.g., BRL, INR, KRW)
action: Forward-fill missing deposit data since NDF fixings may have gaps
severity: high
kind: operational_lesson
modality: must
consequence: Without forward-filling NDF deposit data, TRI construction fails on business days without fixing, creating
NaN carry values and breaking the cumulative index chain
stage_ids:
- curve_construction
- id: finance-C-041
when: When aligning deposit data with spot prices
action: Join carry with spot using an inner join so that carry exists only on dates where spot is available
severity: high
kind: architecture_guardrail
modality: must
consequence: Left join with spot causes carry values to exist on days without spot data, leading to carry accrual without
price movement and TRI overstatement
stage_ids:
- curve_construction
- id: finance-C-042
when: When handling first return in TRI series
action: Set first return to zero (0) rather than leaving as NaN from shift operation
severity: high
kind: domain_rule
modality: must
consequence: NaN first return propagates through cumprod, corrupting entire TRI series and causing downstream signal generation
and P&L calculation failures
stage_ids:
- curve_construction
- id: finance-C-043
when: When implementing FX options TRI with delta hedging
action: Price exiting option using previous day's strike and expiry to avoid look-ahead bias
severity: high
kind: domain_rule
modality: must
consequence: Using current contract parameters for MTM on exit date reveals future information, creating artificial hedging
profits/losses not achievable in live trading
stage_ids:
- curve_construction
- id: finance-C-044
when: When using freeze_implied_vol for options pricing
action: Enable vol freezing (freeze_implied_vol) outside of sensitivity-test scenarios
severity: medium
kind: operational_lesson
modality: should_not
consequence: Freezing implied vol causes TRI to ignore vol surface evolution, misrepresenting realized option P&L and
creating discrepancies with live trading where vol changes affect delta
stage_ids:
- curve_construction
- id: finance-C-045
when: When comparing constructed TRI with external benchmarks (Bloomberg)
action: Accept approximate tracking rather than exact match due to timing and convention differences
severity: medium
kind: claim_boundary
modality: must
consequence: Presenting constructed TRI as identical to Bloomberg indices overstates accuracy; timing differences (NYC
vs. LDN cut) and convention handling create measurable tracking error
stage_ids:
- curve_construction
- id: finance-C-047
when: When handling FX option expiry dates for backtest
action: Adjust expiry to nearest available market data date if expiry falls on non-trading day
severity: high
kind: operational_lesson
modality: must
consequence: Using calendar expiry date when market data doesn't exist causes NaN pricing and TRI breaks at expiry, leading
to missing P&L around roll dates
stage_ids:
- curve_construction
- id: finance-C-049
when: When building an FX volatility surface from market quotes
action: Use CLARK5 interpolation function type for smoother vol surface without arbitrage
severity: high
kind: domain_rule
modality: must
consequence: Using BBG interpolation may produce jagged vol surfaces with potential arbitrage violations, causing incorrect
option pricing and PnL calculation errors
stage_ids:
- volatility_pricing
- id: finance-C-050
when: When quoting ATM volatility for FX options
action: Use fwd-delta-neutral-premium-adj ATM method to account for premium difference between calls/puts
severity: high
kind: domain_rule
modality: must
consequence: Using spot or forward ATM methods without premium adjustment will misalign delta-neutral strikes, causing
systematic pricing errors in FX options strategies
stage_ids:
- volatility_pricing
- id: finance-C-051
when: When interpolating volatility surface across tenors
action: Interpolate linearly in variance space (sigma^2 * T), not in vol space
severity: high
kind: domain_rule
modality: must
consequence: Linear interpolation in vol space creates biased vol surface, causing material pricing errors especially
for long-dated options where variance interpolation is mathematically correct
stage_ids:
- volatility_pricing
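# Illustrative example for finance-C-051 (hypothetical quotes, not from the source):
# sigma = 10% at T = 0.25 and sigma = 12% at T = 0.50, target T = 0.375.
# Total variance: 0.10^2 * 0.25 = 0.0025 and 0.12^2 * 0.50 = 0.0072; the linear midpoint is 0.00485.
# Implied vol: sqrt(0.00485 / 0.375) is about 11.37%, versus 11.00% from interpolating vol directly.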
- id: finance-C-052
when: When calculating volatility risk premium
action: Align implied vol and realized vol periods using BDay offset, not simple shift
severity: high
kind: domain_rule
modality: must
consequence: Using pandas shift() for VRP calculation introduces look-ahead bias, causing VRP estimates to include future
information and misrepresent true vol risk premium
stage_ids:
- volatility_pricing
- id: finance-C-053
when: When building vol surface for unstable market periods
action: Increase solver tolerance to fill sparse vol surface areas during high-vol events
severity: high
kind: domain_rule
modality: must
consequence: During market stress (Brexit, elections), default tolerance 1e-8 causes solver non-convergence, leaving gaps
in vol surface that corrupt downstream option pricing
stage_ids:
- volatility_pricing
- id: finance-C-054
when: When calculating realized volatility
action: Strip time component from datetime index before returning realized vol series
severity: medium
kind: domain_rule
modality: must
consequence: Realized vol series retains timestamp information causing join failures with daily implied vol data, producing
NaN VRP values and broken downstream calculations
stage_ids:
- volatility_pricing
- id: finance-C-055
when: When using FX vol surface for option pricing
action: Install FinancePy as optional dependency separately from finmarketpy
severity: high
kind: resource_boundary
modality: must
consequence: FinancePy has strict version dependencies (llvmlite) that conflict with other libraries; installing via pip
with --no-deps prevents dependency conflicts that break vol surface construction
stage_ids:
- volatility_pricing
- id: finance-C-056
when: When choosing a volatility surface interpolation method
action: Use SABR model for production vol surface fitting
severity: high
kind: resource_boundary
modality: must_not
consequence: SABR volatility function type is not fully implemented in current FinancePy version, causing unpredictable
behavior and potential crashes during build_vol_surface calls
stage_ids:
- volatility_pricing
- id: finance-C-057
when: When calibrating FX vol surface with nelder-mead-numba solver
action: Accept accuracy-speed tradeoff; numba version is faster but less precise
severity: medium
kind: resource_boundary
modality: must
consequence: Using nelder-mead-numba for real-time pricing gives speed but introduces calibration imprecision that accumulates
across vol surface strikes, causing systematic mispricing
stage_ids:
- volatility_pricing
- id: finance-C-058
when: When running FX vol surface code with Numba JIT compilation
action: Delete __pycache__ folders if Numba frontend errors occur
severity: high
kind: operational_lesson
modality: must
consequence: Stale Numba cache causes 'Failed in nopython mode pipeline' errors that prevent vol surface construction,
blocking all option pricing functionality
stage_ids:
- volatility_pricing
- id: finance-C-060
when: When calling calculate_vol_for_strike_expiry
action: Pass either expiry_date or tenor parameter for vol interpolation
severity: high
kind: architecture_guardrail
modality: must
consequence: Passing neither expiry_date nor tenor returns None, causing downstream NaN vol values that corrupt option
pricing calculations
stage_ids:
- volatility_pricing
- id: finance-C-062
when: When working with 10d delta strikes in vol surface
action: Expect full 10d strike interpolation support in price_instrument
severity: medium
kind: resource_boundary
modality: must_not
consequence: 10d OTM strike pricing has incomplete implementation (TODO comment), causing incorrect strikes to be used
for 10d butterfly and strangle constructions
stage_ids:
- volatility_pricing
- id: finance-C-063
when: When comparing implied vol with realized vol
action: Present VRP as guaranteed predictor of future realized vol
severity: high
kind: claim_boundary
modality: must_not
consequence: Vol risk premium is a statistical relationship that frequently breaks down during market regime changes,
and presenting it as predictive causes overconfident risk estimates
stage_ids:
- volatility_pricing
- id: finance-C-064
when: When computing vol surface across multiple dates
action: Use extract_vol_surface_across_dates for batch processing, not individual calls
severity: medium
kind: operational_lesson
modality: must
consequence: Building the vol surface individually for each date without the batch method causes redundant computation and
potentially inconsistent extreme value tracking across the surface
stage_ids:
- volatility_pricing
- id: finance-C-065
when: When implementing volatility targeting with rolling window periods
action: account for NaN values in the first N periods before the rolling window completes
severity: high
kind: domain_rule
modality: must
consequence: Volatility-adjusted leverage will be NaN for the initial periods equal to the rolling window (vol_periods),
causing backtest P&L series to contain NaN values for warmup periods and potentially causing downstream calculation
errors in portfolio aggregation
stage_ids:
- backtesting
- id: finance-C-066
when: When calculating transaction costs from basis points input
action: convert bp to decimal by dividing by (2.0 * 100.0 * 100.0) for spot_tc_bp and by (100.0 * 100.0) for spot_rc_bp
severity: high
kind: domain_rule
modality: must
consequence: Transaction costs will be incorrectly applied (off by factor of 10000), severely overstating or understating
actual trading costs and producing misleading backtest P&L results that do not reflect realistic transaction cost drag
stage_ids:
- backtesting
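# Illustrative arithmetic for finance-C-066 (hypothetical inputs, not from the source):
# spot_tc_bp = 2.0 gives 2.0 / (2.0 * 100.0 * 100.0) = 0.0001 in decimal terms.
# spot_rc_bp = 1.0 gives 1.0 / (100.0 * 100.0) = 0.0001.
# The extra factor of 2.0 is assumed here to split a quoted bid/ask spread into a per-side cost.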
- id: finance-C-067
when: When configuring annualization factor for daily return statistics
action: use 252 as the annualization factor for daily data to match standard trading days in a year
severity: high
kind: domain_rule
modality: must
consequence: Annualized return and volatility statistics will be overstated if using 365 (calendar days) or understated
if using values below 252, leading to incorrect Sharpe ratios and risk metrics that mischaracterize strategy performance
stage_ids:
- backtesting
- id: finance-C-068
when: When aligning signals with asset returns for P&L calculation
action: apply forward-fill (ffill) to asset holidays and signal gaps to carry forward the last valid value
severity: medium
kind: domain_rule
modality: must
consequence: P&L will contain NaN values on asset holidays even though the signal remains valid, causing discontinuities
in cumulative returns and incorrect trade counts when assets resume trading after gaps
stage_ids:
- backtesting
- id: finance-C-069
when: When applying stop loss and take profit risk management
action: apply stop loss/take profit signals BEFORE portfolio weight optimization and volatility targeting
severity: high
kind: architecture_guardrail
modality: must
consequence: Risk stops applied after portfolio optimization will cause incorrect signal cascades where leverage adjustments
trigger unnecessary stops, distorting the true risk-adjusted returns and overstating the frequency of stop-out events
stage_ids:
- backtesting
- id: finance-C-070
when: When implementing signal delay for execution timing
action: apply signal_delay via pandas shift operation to delay signal execution by the configured number of periods
severity: high
kind: architecture_guardrail
modality: must
consequence: Backtest will exhibit look-ahead bias if signal_delay=0 is used with same-day execution assumption, causing
P&L alignment errors when the strategy is deployed live with next-day execution, overstating historical returns
stage_ids:
- backtesting
- id: finance-C-071
when: When calculating trade counts from signal changes
action: compute trade counts as the absolute difference of signal values using shift(1) operation
severity: medium
kind: architecture_guardrail
modality: must
consequence: Trade counts will not match actual signal changes if computed differently, causing discrepancies between
reported turnover and P&L attribution analysis, making it impossible to accurately measure transaction cost drag
stage_ids:
- backtesting
- id: finance-C-072
when: When applying position limits via element-wise clip adjustment
action: scale positions proportionally using the position_clip_adjustment to respect net and total exposure limits
severity: high
kind: architecture_guardrail
modality: must
consequence: Portfolio will exceed configured position limits causing unintended concentrated exposure, potentially leading
to significant losses during adverse market movements when leverage exceeds risk parameters
stage_ids:
- backtesting
- id: finance-C-073
when: When configuring maximum leverage for volatility-targeted strategies
action: set max_leverage to 5.0 as the conservative default to prevent runaway leverage in volatile periods
severity: high
kind: resource_boundary
modality: must
consequence: Unlimited leverage during low-volatility periods will create excessive position sizes that amplify losses
during volatility spikes, potentially causing margin calls and forced liquidation at the worst possible time
stage_ids:
- backtesting
- id: finance-C-074
when: When using vol targeting without sufficient historical data
action: verify the backtest period contains at least vol_periods observations before computing meaningful leverage
severity: medium
kind: operational_lesson
modality: must
consequence: Leverage calculation will produce unreliable or extreme values due to insufficient sample for volatility
estimation, causing unstable position sizing that either under-allocates capital or generates excessive leverage
stage_ids:
- backtesting
- id: finance-C-075
when: When applying portfolio-level volatility targeting
action: apply portfolio leverage AFTER individual signal leverage to correctly scale the combined position
severity: high
kind: architecture_guardrail
modality: must
consequence: Double application or incorrect ordering of leverage will distort portfolio-level risk controls, causing
either excessive or insufficient total exposure compared to the intended vol target
stage_ids:
- backtesting
- id: finance-C-076
when: When interpreting backtest results as indicators of live trading performance
action: present backtest returns as expected live trading returns without stating the relevant caveats
severity: high
kind: claim_boundary
modality: must_not
consequence: Backtest results can contain look-ahead bias, typically ignore slippage, and do not account for execution variability,
leading to unrealistic expectations that can cause poor risk management decisions and significant losses when deploying
strategies live
stage_ids:
- backtesting
- id: finance-C-077
when: When using finmarketpy backtest results for regulatory or compliance reporting
action: represent simulated backtest P&L as proof of actual trading performance without independent verification
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting backtest-only results as verified trading records violates compliance requirements and can constitute
misleading marketing, exposing the practitioner to regulatory action and reputational damage
stage_ids:
- backtesting
- id: finance-C-078
when: When comparing strategies with different trade frequencies
action: account for transaction costs proportionally to trade frequency when evaluating strategy profitability
severity: high
kind: operational_lesson
modality: must
consequence: High-frequency signal changes will incur disproportionate transaction costs that may exceed gross returns,
causing strategies to appear profitable in backtest but lose money in live trading due to cost drag
stage_ids:
- backtesting
- id: finance-C-079
when: When configuring the portfolio combination method
action: verify portfolio_combination method is explicitly set to 'sum', 'mean', or 'weighted' to avoid implicit equal-weighting
severity: medium
kind: operational_lesson
modality: must
consequence: Implicit mean weighting will silently combine signals with potentially unintended equal contribution, distorting
risk-adjusted returns and causing position sizing that doesn't match the intended portfolio construction methodology
stage_ids:
- backtesting
- id: finance-C-080
when: When implementing sensitivity analysis with parameter sweeps
action: reset trading model parameters after completing the sweep
severity: high
kind: architecture_guardrail
modality: must
consequence: Without parameter reset, subsequent analyses use incorrectly overridden parameter values, causing corrupted
P&L reports and invalid Sharpe ratios across model runs
stage_ids:
- analysis_reporting
- id: finance-C-081
when: When running strategy return statistics with finmarketpy engine
action: save and restore SCALE_FACTOR after report generation
severity: medium
kind: domain_rule
modality: must
consequence: SCALE_FACTOR remains at 0.75 for subsequent plots, causing incorrectly scaled charts and potentially misleading
visual representation of performance metrics
stage_ids:
- analysis_reporting
- id: finance-C-082
when: When importing TradeAnalysis for PyFolio-based reporting
action: assume PyFolio is always available in the environment
severity: high
kind: resource_boundary
modality: must_not
consequence: Code crashes with ImportError when PyFolio is not installed, preventing report generation entirely
stage_ids:
- analysis_reporting
- id: finance-C-083
when: When configuring chart rendering engine
action: specify a valid chart engine from the supported set
severity: high
kind: resource_boundary
modality: must
consequence: Invalid engine name causes chart generation failure, blocking all plot outputs including Sharpe ratios and
drawdown visualizations
stage_ids:
- analysis_reporting
- id: finance-C-084
when: When conducting parameter sensitivity analysis
action: verify parameter_list matches pretty_portfolio_names in length
severity: high
kind: architecture_guardrail
modality: must
consequence: Index mismatch causes IndexError during sensitivity loop iteration, causing analysis to fail mid-sweep and
produce incomplete CSV exports
stage_ids:
- analysis_reporting
- id: finance-C-086
when: When calculating Sharpe ratios from strategy returns
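action: annualize the Sharpe ratio by scaling the per-period ratio by the square root of the annualization factor (mean returns scale linearly with the factor, volatility with its square root)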
severity: high
kind: domain_rule
modality: must
consequence: Misaligned annualization produces misleading Sharpe ratios that misrepresent strategy performance, leading
to incorrect investment decisions
stage_ids:
- analysis_reporting
- id: finance-C-087
when: When using parallel processing for sensitivity analysis
action: check platform-specific thread configuration before enabling
severity: medium
kind: resource_boundary
modality: should
consequence: Incorrect thread count for platform causes poor parallel performance or resource exhaustion, slowing down
parameter sweeps significantly
stage_ids:
- analysis_reporting
- id: finance-C-088
when: When generating drawdown metrics for reporting
action: compute drawdowns from cumulative returns series without look-ahead
severity: high
kind: domain_rule
modality: must
consequence: Drawdown calculation using future information produces artificially optimistic risk metrics, misrepresenting
actual historical drawdown exposure
stage_ids:
- analysis_reporting
- id: finance-C-090
when: When resampling time series for annualized statistics
action: align resample_ann_factor with actual data frequency
severity: high
kind: domain_rule
modality: must
consequence: Mismatched annualization factor produces incorrect annualized returns and Sharpe ratios, distorting strategy
comparison across different data frequencies
stage_ids:
- analysis_reporting
- id: finance-C-091
when: When exporting CSV data from sensitivity analysis
action: verify DUMP_PATH directory exists before writing
severity: medium
kind: operational_lesson
modality: must
consequence: Missing output directory causes FileNotFoundError, preventing CSV export of requested statistics and blocking
automated analysis pipelines
stage_ids:
- analysis_reporting
- id: finance-C-092
when: When computing Information Ratio for strategy comparison
action: use returns excess over benchmark, not absolute returns
severity: high
kind: domain_rule
modality: must
consequence: IR calculation using absolute returns instead of excess returns misrepresents manager skill, leading to flawed
strategy selection decisions
stage_ids:
- analysis_reporting
- id: finance-C-093
when: When creating multi-model comparison plots
action: align strategy PnL time series by date before comparison
severity: high
kind: architecture_guardrail
modality: must
consequence: Unaligned time series cause incorrect difference calculations in comparison plots, producing misleading relative
performance charts
stage_ids:
- analysis_reporting
- id: finance-C-094
when: When processing day-of-month seasonality analysis
action: resample to business days before calculating bus_day seasonality
severity: medium
kind: domain_rule
modality: must
consequence: Calendar day resampling includes non-trading days, distorting day-of-month seasonality patterns and leading
to incorrect trading signal timing
stage_ids:
- analysis_reporting
- id: finance-C-095
when: When setting up report output path with timestamp
action: include execution timestamp in output directory name
severity: low
kind: operational_lesson
modality: should
consequence: Without timestamp, repeated runs overwrite previous analysis outputs, losing historical comparison data and
complicating audit trails
stage_ids:
- analysis_reporting
- id: finance-C-096
when: When computing Sharpe ratios over long backtests with NumPy floating-point arithmetic
action: ignore floating-point precision in Sharpe ratio calculations
severity: medium
kind: domain_rule
modality: must_not
consequence: Cumulative floating-point errors in long-running backtests distort final Sharpe ratio calculations, causing
subtle performance misrepresentation
stage_ids:
- analysis_reporting
- id: finance-C-097
when: When filtering return statistics by date range
action: apply plot_start and plot_finish filters after cumulative index calculation
severity: high
kind: architecture_guardrail
modality: must
consequence: Premature filtering before cumulative calculation produces incomplete equity curves with incorrect starting
values, misrepresenting historical performance
stage_ids:
- analysis_reporting
- id: finance-C-098
when: When calculating individual trade P&L from signals and returns
action: use signal changes to identify trade boundaries, not daily returns
severity: medium
kind: domain_rule
modality: must
consequence: Trade identification using incorrect boundary markers splits trades at wrong points, producing inaccurate
individual trade statistics and misleading win/loss ratios
stage_ids:
- analysis_reporting
- id: finance-C-099
when: When claiming the system can analyze any trading strategy
action: overstate analysis capabilities without considering data quality requirements
severity: medium
kind: claim_boundary
modality: must_not
consequence: Analyzing strategies with insufficient historical data or missing price points produces unreliable Sharpe
ratios and drawdown metrics that appear statistically valid but lack sufficient sample size
stage_ids:
- analysis_reporting
- id: finance-C-101
when: When implementing position sizing logic in the vol-targeting system
action: Use vol_target=0.10 (10% annualized) and lookback_period=20 days as specified — these are the standard parameters
for risk normalization across currency pairs
severity: high
kind: domain_rule
modality: must
consequence: Using non-standard vol target values changes position sizes proportionally, causing the strategy to either
under-risk or over-risk across currency pairs with different volatilities, leading to inconsistent risk-adjusted returns
derived_from_bd_id: BD-024
- id: finance-C-103
when: When implementing signal transitions between directional positions
action: Use explicit zero assignment for neutral signals — do not leave signal value undefined or rely on implicit zero-crossing
behavior during transitions
severity: high
kind: domain_rule
modality: must
consequence: Leaving signal transitions undefined creates ambiguity that can introduce silent directional bias, causing
incorrect position assignments in backtesting and misalignment with live trading execution
derived_from_bd_id: BD-003
- id: finance-C-104
when: When constructing equity curve or cumulative return index in backtesting
action: Use multiplicative compounding mode (cum_index='mult') for cumulative index construction — this accurately represents
real trading where profits and losses are reinvested proportionally
severity: high
kind: domain_rule
modality: must
consequence: Using additive cumulation mode can drive the index negative after large cumulative losses and misrepresents
portfolio growth, distorting drawdown severity and return distribution in backtest results
derived_from_bd_id: BD-007
- id: finance-C-105
when: When configuring P&L scaling mode in BacktestRequest
action: Use cum_index='mult' (multiplicative) mode; multiplicative compounding scales P&L geometrically and correctly
represents percentage returns compounded period by period
severity: high
kind: domain_rule
modality: must
consequence: Using additive ('add') mode treats returns as linear increments, misrepresenting drawdown severity and return
distribution, leading to incorrect risk assessment and strategy evaluation
derived_from_bd_id: BD-074
- id: finance-C-107
when: When generating portfolio analytics and tear sheets
action: Be aware that PyFolio integration is optional — the core backtesting engine functions without PyFolio, but advanced
analytics (tear sheets) will be unavailable if PyFolio is not installed
severity: medium
kind: operational_lesson
modality: should
consequence: Attempting to generate PyFolio tear sheets without installation raises errors, but the error is handled gracefully
and does not crash the backtesting engine
derived_from_bd_id: BD-021
- id: finance-C-108
when: When calibrating volatility surfaces for FX options pricing
action: Use Nelder-Mead optimization with Numba JIT compilation as the default solver — the Numba JIT acceleration is
essential for the speed/accuracy tradeoff; alternative solvers without JIT will be significantly slower
severity: high
kind: architecture_guardrail
modality: must
consequence: Using alternative solvers without Numba JIT causes calibration to run significantly slower, making real-time
vol surface updates infeasible for backtesting workflows
derived_from_bd_id: BD-012
- id: finance-C-110
when: When implementing rebalancing frequency logic for volatility targeting in backtesting
action: Use business month end (BM) rebalancing frequency aligned with data frequency — BM provides monthly calibration
for vol calculations without excessive transaction costs; weekly was rejected due to higher costs and quarterly was
rejected as too infrequent
severity: high
kind: domain_rule
modality: must
consequence: Using incorrect rebalancing frequency (e.g., weekly or quarterly) desynchronizes position adjustments from
vol calculation periods, causing inaccurate volatility targeting and mis-sized positions in backtesting
derived_from_bd_id: BD-027
- id: finance-C-111
when: When selecting currency pairs for FX trend following strategies
action: Use G10 USD crosses as the FX universe (7-10 pairs) — G10 represents the most liquid FX universe with sufficient
volatility and data quality for trend following; EM currencies were rejected due to liquidity concerns
severity: high
kind: domain_rule
modality: must
consequence: Using non-G10 currencies (e.g., EM crosses) may introduce liquidity risk with wider spreads and data gaps,
causing execution prices to deviate significantly from backtest assumptions
derived_from_bd_id: BD-029
- id: finance-C-112
when: When constructing spot curves for FX carry calculations
action: Use overnight (ON) deposit tenor as the default starting point for spot curve construction — ON deposits are the
most liquid tenor representing true overnight carry cost for position funding; override only for emerging market currencies
with illiquid ON markets
severity: high
kind: domain_rule
modality: must
consequence: Using 1W or 1M deposit as default adds complexity and less liquidity to carry interpretation, causing carry
calculation errors that misrepresent true funding costs for FX positions
derived_from_bd_id: BD-006
- id: finance-C-113
when: When constructing volatility surfaces for FX options pricing
action: Use CLARK5 interpolation algorithm as the default vol surface interpolation method — CLARK5 provides smoother
interpolation avoiding wing oscillation artifacts compared to cubic spline; override only for exotic surfaces with discontinuities
severity: high
kind: architecture_guardrail
modality: must
consequence: Using cubic spline (BBG) interpolation introduces oscillation artifacts at the wings of the vol surface,
causing incorrect option pricing especially for deep ITM/OTM strikes
derived_from_bd_id: BD-010
- id: finance-C-114
when: When aligning asset price data with trading signals in backtesting
action: Use left join with forward-fill (fill direction='left') for asset-signal alignment — left join preserves asset
observations while forward-fill ensures alignment without introducing look-ahead bias; inner join drops assets without
signals and right join drops signal observations
severity: high
kind: domain_rule
modality: must
consequence: Using inner join drops assets without signals causing survivorship bias; using right join drops signal observations;
incorrect join type introduces data leakage or survivorship bias in backtest results
derived_from_bd_id: BD-031
- id: finance-C-115
when: When calibrating volatility surface parameters for FX options pricing
action: Use Nelder-Mead simplex solver for vol surface calibration — this derivative-free method handles noisy objective
functions from market quotes without requiring gradient computation; L-BFGS-B requires gradients and Levenberg-Marquardt
requires specific problem structure
severity: high
kind: architecture_guardrail
modality: must
consequence: Gradient-based solvers may fail to converge on noisy vol surface calibration data, producing incorrect surface
parameters and systematic option pricing errors
derived_from_bd_id: BD-037
- id: finance-C-116
when: When configuring volatility targeting with maximum leverage limits in backtesting
action: Verify that signal_vol_max_leverage=5x is configured correctly for vol-targeted strategies — this 5x cap prevents
runaway leverage during high-volatility periods and reflects common regulatory limits; only override with explicit documentation
and regulatory compliance review
severity: high
kind: operational_lesson
modality: must
consequence: Without proper leverage caps, high-volatility periods trigger extreme position sizing that can exceed account
capacity and regulatory limits, causing catastrophic losses and compliance violations
derived_from_bd_id: BD-016
- id: finance-C-117
when: When implementing signal delay configuration for execution timing in backtesting
action: Verify that signal_delay=0 (same-bar execution) matches strategy timing assumptions — for close-to-close strategies
requiring end-of-day signal evaluation, set delay=1 to avoid same-bar look-ahead; high-frequency strategies may use
sub-day delays
severity: medium
kind: operational_lesson
modality: should
consequence: Same-bar execution (delay=0) may introduce subtle look-ahead bias if signal generation uses same-bar closing
prices, causing backtest results to appear more favorable than actual trading would achieve
derived_from_bd_id: BD-017
- id: finance-C-118
when: When calculating technical indicators with default fillna behavior
action: Verify that fillna=True (default forward-fill of missing values) matches strategy requirements — for high-frequency
or momentum strategies sensitive to stale data, set fillna=False to prevent carrying stale weekend/holiday prices across
multiple days
severity: medium
kind: operational_lesson
modality: should
consequence: Forward-filling through weekends carries stale prices across multiple days, causing momentum strategies to
hold outdated positions with artificially smoothed indicators and increased risk exposure
derived_from_bd_id: BD-004
- id: finance-C-119
when: When processing signals across non-trading periods in backtesting
action: Verify that forward-fill (ffill) strategy aligns with position management requirements — for momentum strategies
requiring periodic rebalancing or signal decay, implement explicit signal decay logic instead of relying on ffill to
hold positions through thin market days
severity: medium
kind: operational_lesson
modality: should
consequence: Forward-filling signals through non-trading periods maintains stale positions without rebalancing, causing
momentum strategies to underperform during trending markets with periodic volatility spikes
derived_from_bd_id: BD-005
- id: finance-C-120
when: When implementing cumulative return index calculation for backtesting
action: Use Numba JIT compilation for total return index calculation — do not replace with pure Python loops or untested
optimization approaches
severity: high
kind: architecture_guardrail
modality: must
consequence: Pure Python loops are 10-100x slower than Numba JIT; for backtests with >1000 observations, this causes unacceptable
runtime making iterative development impractical
derived_from_bd_id: BD-033
- id: finance-C-121
when: When constructing FX volatility surfaces for option pricing
action: Use CLARK5 interpolation for FX vol surface fitting — do not substitute with alternative methods without validating
accuracy against industry standard
severity: high
kind: domain_rule
modality: must
consequence: CLARK5 provides smooth vol surface fitting across strikes and tenors; alternative methods may introduce pricing
errors in vanilla FX options, causing systematic mispricing
derived_from_bd_id: BD-034
- id: finance-C-122
when: When estimating implied vol addon for option pricing
action: Use weighted median model for vol addon estimation — do not replace with mean or unweighted median as they are
sensitive to outliers
severity: high
kind: domain_rule
modality: must
consequence: Weighted median is robust to outliers in vol addon estimation; using mean causes tail observations to distort
addon values, leading to incorrect option pricing and delta hedging errors
derived_from_bd_id: BD-042
- id: finance-C-123
when: When determining ATM strike for vanilla FX options pricing
action: Use forward delta-neutral with premium adjustment method for ATM strike determination — do not use spot or forward
ATM without premium adjustment
severity: high
kind: domain_rule
modality: must
consequence: Forward delta-neutral with premium adjustment ensures accurate delta hedging across the vol surface; using
spot or forward ATM without premium adjustment causes delta hedging errors in FX options
derived_from_bd_id: BD-011
- id: finance-C-124
when: When configuring volatility targeting for signal or portfolio level
action: Explicitly verify and set vol_target parameter rather than relying on the 10% default — higher-volatility strategies
(e.g., short-vol, carry) must explicitly increase the target
severity: medium
kind: operational_lesson
modality: should
consequence: Default 10% vol target is appropriate for moderate-risk FX trend-following but causes unlimited leverage
risk for high-volatility strategies if not explicitly overridden
derived_from_bd_id: BD-014
- id: finance-C-125
when: When clustering assets in network analysis
action: Use affinity propagation algorithm for asset clustering — do not replace with K-means or hierarchical clustering
that require pre-specifying cluster count
severity: medium
kind: architecture_guardrail
modality: must
consequence: Affinity propagation automatically determines cluster numbers without requiring pre-specified k; using K-means
with arbitrary k would produce incorrect natural groupings in FX markets
derived_from_bd_id: BD-053
- id: finance-C-126
when: When implementing Guppy Multiple Moving Average technical indicator
action: Use exactly 12 EMA components with windows (3,5,7,9,11,14,21,28,35,42,49,56) for Guppy MMA — do not reduce component
count or change EMA windows
severity: high
kind: domain_rule
modality: must
consequence: The fixed 12-component structure distinguishes rapid trend changes from sustained directional moves via short-group
vs long-group crossovers; changing component count alters entry/exit signal boundaries
derived_from_bd_id: BD-063
- id: finance-C-127
when: When configuring output paths for backtest analysis
action: Override the default 'output_data/YYYYMMDD' path for production pipelines requiring consistent output paths —
production systems must use deterministic paths
severity: medium
kind: operational_lesson
modality: should
consequence: Default timestamped paths prevent overwrites in iterative workflows but production pipelines require consistent
paths for downstream processing; failing to override causes missing data in automated workflows
derived_from_bd_id: BD-020
- id: finance-C-128
when: When implementing FX options roll logic for backtesting
action: Trigger FX options rolling at actual expiry dates rather than calendar month-end — use 'expiry-date' roll event,
not month-end
severity: high
kind: domain_rule
modality: must
consequence: Options gamma peaks at expiry; rolling at month-end misaligns with actual option lifecycle, causing unintended
gamma exposure and misaligned variance-targeting in strategies
derived_from_bd_id: BD-008
- id: finance-C-129
when: When constructing cross-currency pairs for return attribution
action: Verify required cross rates are available when using 'no' (direct) mode; if unavailable, fallback to domestic
currency conversion requires complete market data for the domestic currency
severity: high
kind: domain_rule
modality: must
consequence: Direct triangulation ('no' mode) requires all required cross rates; using it with missing rates produces
incorrect USD-denominated P&L attribution across currency exposures
derived_from_bd_id: BD-009
- id: finance-C-130
when: When implementing rolling technical indicators in signal generation
action: Set the first n-1 periods to NaN so the warmup period matches the indicator lookback (n); rolling indicators
must see a full lookback window before generating signals
severity: high
kind: domain_rule
modality: must
consequence: Without NaN warmup periods, rolling indicators generate unstable signals from incomplete history, so
early backtest signals rest on shorter windows than the strategy assumes and do not match live behavior
derived_from_bd_id: BD-045
- id: finance-C-131
when: When implementing RSI momentum calculation for trading signals
action: Use RSI period of 14 (Wilder's original specification) — verify the period meets minimum of 1 and balances signal
stability with responsiveness for momentum signals
severity: medium
kind: operational_lesson
modality: should
consequence: Using non-standard RSI period causes signal characteristics to deviate from historical performance data validated
with 14-period settings, leading to unreliable momentum signals
derived_from_bd_id: BD-047
- id: finance-C-132
when: When implementing volatility-adjusted signals using ATR
action: Use ATR period of 14 (Wilder's original specification) — verify the period meets minimum of 1 and captures approximately
two weeks of daily price action
severity: medium
kind: operational_lesson
modality: should
consequence: Using non-standard ATR period causes volatility-adjusted signals to be either too noisy or too slow, reducing
signal quality and strategy performance for volatility-managed trades
derived_from_bd_id: BD-048
- id: finance-C-133
when: When implementing event study logic for economic releases
action: Set NYC 10am cutoff for economic releases (8:30am ET typical release time) — verify timezone conversion uses America/New_York
local market time to capture same-day market response
severity: medium
kind: operational_lesson
modality: should
consequence: Incorrect cutoff time causes event study to misalign economic releases with market price responses, leading
to incorrect event attribution and distorted strategy performance metrics
derived_from_bd_id: BD-055
- id: finance-C-134
when: When implementing volatility calculation for realized volatility using rolling window
action: Use tenor_days as the rolling window for realized volatility on daily data — verify window scales proportionally
with instrument maturity to maintain consistency with day count conventions
severity: high
kind: domain_rule
modality: must
consequence: Using fixed rolling window instead of tenor_days causes volatility estimates to be inconsistent across instruments
with different maturities, leading to incorrect risk assessment and strategy performance degradation
derived_from_bd_id: BD-056
- id: finance-C-135
when: When implementing volatility-targeting position sizing logic in backtesting
action: Apply max leverage cap (typically 1.0-2.0x) when scaling positions based on realized volatility — the cap prevents
unbounded position growth during low-volatility periods that would otherwise exceed prudent risk limits
severity: high
kind: domain_rule
modality: must
consequence: Without a max leverage cap, volatility-targeting produces oversized positions during low-volatility regimes,
amplifying losses when volatility reverts and potentially exceeding available capital or margin limits
derived_from_bd_id: BD-062
- id: finance-C-136
when: When implementing trade exit logic in backtesting
action: Enforce stop-loss and take-profit levels to bound PnL distributions per trade — positions must be closed when
price reaches the stop-loss level (capping maximum loss) or take-profit level (locking predetermined gains)
severity: high
kind: domain_rule
modality: must
consequence: Without stop-loss and take-profit enforcement, trades remain open indefinitely, producing unbounded loss
potential and making backtest results incompatible with live trading discipline where positions require manual or automated
exit
derived_from_bd_id: BD-065
- id: finance-C-137
when: When implementing position sizing and capital allocation logic in backtesting
action: Apply position clipping to enforce hard limits on net exposure (directional risk) and gross exposure (sum of absolute
positions) — prevent positions from exceeding predetermined risk budgets that bound margin requirements and counterparty
exposure
severity: high
kind: domain_rule
modality: must
consequence: Without position clipping, favorable signals drive unbounded position accumulation that exceeds available
capital, causing margin calls in live trading and making backtest results unreproducible due to capital constraint violations
derived_from_bd_id: BD-066
- id: finance-C-138
when: When pricing FX vanilla options using Black-Scholes model
action: Apply Black-Scholes for instruments with significant volatility smile or skew — the constant-vol assumption under
log-normal dynamics systematically misprices deep OTM options where smile effects are pronounced, causing 5-15% pricing
errors on risk-reversal strategies
severity: high
kind: domain_rule
modality: must_not
consequence: Black-Scholes systematically underprices out-of-the-money puts and overprices out-of-the-money calls when
volatility smile exists, causing hedge ratios and premium estimates to deviate significantly from market-quoted prices
derived_from_bd_id: BD-068
- id: finance-C-140
when: When using the framework's default premium output format for FX option pricing
action: Verify that fx_options_premium_output='pct_for' matches user expectations for G10 pairs — if pct_dom or abs format
is required for specific use cases, explicitly override the default parameter to avoid misinterpretation of premium
quotes
severity: medium
kind: operational_lesson
modality: should
consequence: Default pct-for format expresses premium as percentage of foreign currency notional, which may be misinterpreted
as domestic-currency percentage for pairs with significant cross rates, causing hedge ratio and PnL calculation errors
derived_from_bd_id: BD-061
- id: finance-C-141
when: When implementing options roll management in the framework
action: Use expiry-date roll event for options rolling — roll_event must be one of 'expiry', 'delivery', or 'value'; expiry
is the standard for FX options per the standardized expiration calendar
severity: high
kind: domain_rule
modality: must
consequence: Using non-expiry roll events for FX options causes positions to miss the standardized expiration calendar,
potentially resulting in unwanted physical delivery for cash-settled instruments or incorrect position tracking
derived_from_bd_id: BD-057
- id: finance-C-142
when: When implementing options roll timing logic in the framework
action: Set roll_days parameter to 5 as the pre-roll buffer — roll_days must be >= 1; 5 days provides optimal buffer for
execution without excessive stale positions or last-minute execution risk
severity: high
kind: domain_rule
modality: must
consequence: Using fewer than 5 roll days creates tight execution windows that risk missing the roll event due to execution
delays; using more than 5 days leaves positions stale too early, reducing carry returns unnecessarily
derived_from_bd_id: BD-058
- id: finance-C-143
when: When implementing FX implied volatility surface interpolation for pricing and risk management
action: Use polynomial interpolation or Clark5 model for FX implied vol surface — these methods provide smooth interpolation
between discrete strike-tenor nodes and respect theoretical no-arbitrage structure
severity: high
kind: domain_rule
modality: must
consequence: Using cubic splines or SABR for vol surface interpolation introduces arbitrage violations in the wings, causing
economically invalid prices that lead to systematic mispricing and incorrect risk calculations
derived_from_bd_id: BD-064
- id: finance-C-144
when: When implementing covariance estimation for financial network analysis with high-dimensional asset sets
action: Use Graphical Lasso for sparse covariance estimation — the L1 penalty on precision matrix produces sparse estimates
essential for revealing genuine conditional relationships and reducing overfitting
severity: high
kind: domain_rule
modality: must
consequence: Using sample covariance or Ledoit-Wolf shrinkage produces dense matrices that overfit to historical noise
in high-dimensional settings, causing poor out-of-sample portfolio performance and incorrect risk attribution
derived_from_bd_id: BD-067
- id: finance-C-145
when: When implementing the backtesting workflow that combines signal generation with portfolio optimization
action: Apply stop loss and take profit signals to returns_df BEFORE calling optimize_portfolio_weights — stop loss/take
profit constraints modify the return distribution and must be applied pre-optimization
severity: high
kind: domain_rule
modality: must
consequence: Applying stop loss/take profit after optimization uses unconstrained returns for weight calculation, then
projects constraints onto already-optimized weights, violating optimality conditions and producing suboptimal portfolios
derived_from_bd_id: BD-072
- id: finance-C-146
when: When using the framework's default annualization factor for risk metric calculations
action: Verify that ann_factor=252 matches the trading calendar assumption (~21 trading days/month) for the target market;
adjust to 250 for European markets if needed and keep 252 for US equity/FX
severity: medium
kind: operational_lesson
modality: should
consequence: Using 252 for markets with different trading day conventions (e.g., 250 for some European exchanges) systematically
overstates annualized returns by about 0.8% and annualized Sharpe ratios by about 0.4% (Sharpe scales with the square root of the factor), leading to flawed strategy selection based on misleading performance metrics
derived_from_bd_id: BD-026
- id: finance-C-147
when: When implementing or refactoring execution direction logic for EURUSD single-currency strategies
action: Enforce only_allow_longs=True to restrict execution to long positions only; must reject or filter any short signals
before order generation
severity: high
kind: domain_rule
modality: must
consequence: Removing the longs-only constraint allows short positions which introduce funding costs, overnight borrowing
fees, and counterparty risk not accounted for in the EURUSD baseline strategy design, causing live P&L to diverge significantly
from backtest
derived_from_bd_id: BD-028
- id: finance-C-148
when: When implementing data filling logic for multi-asset backtesting with spot and carry data
action: Apply forward-fill (FFILL) only to carry rates, deposit rates, and yield curve data; spot prices must be left
unfilled where values are missing, not forward-filled across the gap
severity: high
kind: domain_rule
modality: must_not
consequence: Forward-filling spot prices creates look-ahead bias where future prices are used retroactively in historical
backtests; this causes strategy signals to appear earlier than they could in live trading, systematically overstating
returns by 2-5% in volatile periods
derived_from_bd_id: BD-032
- id: finance-C-149
when: When pricing FX options and determining at-the-money strike for delta hedging calculations
action: 'Use forward delta neutral ATM method: calculate ATM strike as forward-adjusted spot (spot * exp(rate_diff * tenor)),
not spot delta neutral; verify fx_options_atm_method is set to ''forward_delta_neutral'' for major currency pairs'
severity: high
kind: domain_rule
modality: must
consequence: Using spot delta neutral ATM instead of forward delta neutral systematically misprices options by ignoring
interest rate differential; for high-yielding currency pairs, this causes 3-8% mispricing in option premiums, leading
to incorrect hedge ratios and P&L attribution errors
derived_from_bd_id: BD-035
- id: finance-C-151
when: When recording trading events and backtesting results for audit or reproduction
action: Assume the framework provides an immutable event log — the framework does not implement audit trail functionality;
events can be modified or deleted after execution
severity: high
kind: claim_boundary
modality: must_not
consequence: Without immutable event logging, backtest results cannot be audited or reproduced; regulatory compliance
requirements for trading systems are violated and forensic analysis becomes impossible
derived_from_bd_id: BD-GAP-002
- id: finance-C-152
when: When implementing event logging for trading and backtesting operations
action: Implement append-only event storage with cryptographic integrity checks; log entries must include timestamp, event_type,
payload hash, and previous_entry_hash to detect tampering
severity: high
kind: domain_rule
modality: must
consequence: Implementing immutable event logging ensures full auditability and reproducibility of backtest results; regulators
and internal auditors can verify strategy execution, and disputes can be resolved with cryptographic evidence
derived_from_bd_id: BD-GAP-002
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-108 / ArcticDB Tick Data Storage
version: v5.3
intent_keywords:
- arcticdb
- tick data storage
- time series database
- lmdb
- market data persistence
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: auto-grouped by UC.type (2 distinct values, balanced distribution)
groups:
- group_id: data_pipeline
name: Data Pipeline
description: ''
emoji: 📊
uc_count: 3
ucs:
- uc_id: UC-101
name: ArcticDB Tick Data Storage
short_description: Provides persistent storage for high-frequency tick market data using ArcticDB, supporting both
local LMDB and S3 cloud storage backends for efficient
sample_triggers:
- arcticdb
- tick data storage
- time series database
- uc_id: UC-103
name: Market Data Fetching from Vendors
short_description: 'Fetches economic and financial market data from external vendors like Quandl, demonstrating
how to request and cache market data with specific fields '
sample_triggers:
- market data
- quandl
- fetch data
- uc_id: UC-104
name: S3 Cloud Storage for Tick Data
short_description: Demonstrates writing and reading tick market data to/from AWS S3 cloud storage using Parquet
format for efficient compression and retrieval of histori
sample_triggers:
- s3 storage
- aws
- parquet
- group_id: trading_strategy
name: Trading Strategy
description: ''
emoji: 📦
uc_count: 1
ucs:
- uc_id: UC-102
name: FX G10 Cross Backtesting
short_description: Enables historical backtesting of FX trading strategies using G10 currency pairs with technical
indicator-based signal generation to evaluate strategy
sample_triggers:
- backtest
- fx trading
- g10 currency
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try arcticdb tick data storage
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try fx g10 cross backtesting
auto_selected: true
- uc_id: UC-103
beginner_prompt: Try market data fetching from vendors
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 4 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- Market Data Fetching from Vendors
- FX G10 Cross Backtesting
- ArcticDB Tick Data Storage
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
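Stream real-time market data from multiple cryptocurrency exchanges, with async callback handling and persistence of trades, tickers, order books, and other data to the ArcticDB time-series database.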
---
name: cryptofeed-ws-feeds
description: |-
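Stream real-time market data from multiple cryptocurrency exchanges, with async callback handling and persistence of trades, tickers, order books, and other data to the ArcticDB time-series database.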
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-110"
compiled_at: "2026-04-22T13:00:52.892309+00:00"
capability_markets: "crypto"
capability_activities: "crypto-trading"
sop_version: "crystal-compilation-v6.1"
---
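# Real-Time Crypto Market Data (cryptofeed-ws-feeds)
> Stream real-time market data from multiple cryptocurrency exchanges, with async callback handling and persistence of trades, tickers, order books, and other data to the ArcticDB time-series database.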
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (40 total)
### General Callback Handler Demo (`UC-101`)
Demonstrates how to define and use async callback handlers for receiving real-time market data updates from cryptocurrency exchanges
**Triggers**: callback handler, ticker callback, async handler
### ArcticDB Data Storage (`UC-102`)
Stores cryptocurrency trade, funding, and ticker data to ArcticDB (Arctic) time-series database for persistence and later analysis
**Triggers**: ArcticDB, arctic storage, time series database
### Bequant/HitBTC Exchange Features (`UC-103`)
Demonstrates each supported feature (ticker, trades, order book, candles) for the Bequant and HitBTC exchanges, which share the same API
**Triggers**: Bequant, HitBTC, Bitcoin.com exchange
For all **40** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (13 total)
- **`AP-CRYPTO-TRADING-001`**: Float Arithmetic for Monetary Values
- **`AP-CRYPTO-TRADING-002`**: Missing Market Initialization Before Access
- **`AP-CRYPTO-TRADING-003`**: Bypassing API Facade Layer
All 13 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-110. Evidence verify ratio = 53.1% and audit fail total = 18. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full source of truth | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 13 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed across projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-110` blueprint at 2026-04-22T13:00:52.892309+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Bequant/HitBTC Exchange Features
  - ArcticDB Data Storage
  - General Callback Handler Demo
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **13**
## ccxt (1)
### `AP-CRYPTO-TRADING-002` — Missing Market Initialization Before Access <sub>(high)</sub>
Attempting to access market data via symbol lookups before load_markets() is called leaves self.markets empty, causing KeyError or BadSymbol exceptions on all trading operations and data retrieval. This breaks the entire trading workflow at the first market interaction.
## cryptofeed (3)
### `AP-CRYPTO-TRADING-009` — Applying Order Book Deltas Before Snapshot <sub>(high)</sub>
Processing order book delta messages before receiving a snapshot for the symbol applies updates to an uninitialized or stale book state. Price levels are incorrectly added/removed, corrupting the local book representation with no way to recover without full reset.
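
The guard described above can be sketched in plain Python. This is a minimal illustration, not cryptofeed's internal implementation: `LocalBook`, `apply_snapshot`, and `apply_delta` are hypothetical names, and dropping early deltas (rather than buffering them) is an assumed policy.

```python
from decimal import Decimal


class LocalBook:
    """Hypothetical local order book that refuses deltas until a snapshot arrives."""

    def __init__(self):
        self.bids = {}
        self.asks = {}
        self.has_snapshot = False

    def apply_snapshot(self, bids, asks):
        # A snapshot replaces the whole book state.
        self.bids = {Decimal(p): Decimal(s) for p, s in bids}
        self.asks = {Decimal(p): Decimal(s) for p, s in asks}
        self.has_snapshot = True

    def apply_delta(self, side, price, size):
        # Returns False (delta ignored) when no snapshot has been seen yet.
        if not self.has_snapshot:
            return False
        levels = self.bids if side == "bid" else self.asks
        price, size = Decimal(price), Decimal(size)
        if size == 0:
            levels.pop(price, None)  # size 0 means the level was removed
        else:
            levels[price] = size
        return True


book = LocalBook()
early = book.apply_delta("bid", "100.0", "1.5")  # arrives before the snapshot
book.apply_snapshot(bids=[("100.0", "2")], asks=[("101.0", "3")])
late = book.apply_delta("bid", "100.0", "0")     # removes the 100.0 bid level
```

Here `early` is rejected because the book has no baseline yet, while `late` is applied against the snapshot state.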
### `AP-CRYPTO-TRADING-010` — Silent HTTP Error Handling <sub>(medium)</sub>
Ignoring non-200 HTTP response status codes without raising exceptions causes silent failures for data requests. Market data is missing or corrupted, failed requests are not retried, and downstream consumers receive incomplete data with no indication of failure.
### `AP-CRYPTO-TRADING-011` — Missing Sequence Number Validation <sub>(medium)</sub>
Not validating that order book sequence numbers increment by exactly 1 allows out-of-order or missing messages to corrupt local book state. Stale or incorrect price levels persist in the book, leading to wrong trading signals and corrupted market depth data.
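
A minimal sketch of the check, assuming each update carries an integer sequence number per symbol. `SequenceTracker` and `SequenceGap` are hypothetical names introduced for illustration; real exchanges differ in how they number messages.

```python
class SequenceGap(Exception):
    """Raised when an update does not directly follow the previous one."""


class SequenceTracker:
    """Hypothetical per-symbol tracker for order book sequence numbers."""

    def __init__(self):
        self.last_seq = {}

    def check(self, symbol, seq):
        prev = self.last_seq.get(symbol)
        if prev is not None and seq != prev + 1:
            # Forget the symbol so the caller is forced to resync from a snapshot.
            del self.last_seq[symbol]
            raise SequenceGap(f"{symbol}: expected {prev + 1}, got {seq}")
        self.last_seq[symbol] = seq


tracker = SequenceTracker()
tracker.check("BTC-USD", 10)
tracker.check("BTC-USD", 11)
try:
    tracker.check("BTC-USD", 13)  # 12 was never received
    gap_detected = False
except SequenceGap:
    gap_detected = True
```

On a gap, the caller would typically discard the local book and request a fresh snapshot before applying further deltas.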
## hummingbot (5)
### `AP-CRYPTO-TRADING-005` — Unvalidated Collateral for Order Execution <sub>(high)</sub>
Submitting orders without checking collateral requirements including order cost, percent fees, and fixed fees against available balance causes orders to exceed margin. This triggers immediate liquidation or forced position closure at unfavorable prices with partial or total loss of collateral.
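
As a rough illustration of the pre-submission check, the sketch below compares available balance against order cost plus percent and fixed fees. The function name and the fee model are assumptions for this example, not hummingbot's actual API.

```python
from decimal import Decimal


def has_enough_collateral(balance, price, amount, percent_fee, fixed_fee):
    """Return True when balance covers order cost plus both fee types.

    All arguments are Decimal. Name and fee model are illustrative assumptions.
    """
    order_cost = price * amount
    required = order_cost + order_cost * percent_fee + fixed_fee
    return balance >= required


ok = has_enough_collateral(
    balance=Decimal("1000"),
    price=Decimal("100"),
    amount=Decimal("9.9"),
    percent_fee=Decimal("0.001"),
    fixed_fee=Decimal("0.5"),
)
rejected = has_enough_collateral(
    balance=Decimal("1000"),
    price=Decimal("100"),
    amount=Decimal("10"),
    percent_fee=Decimal("0.001"),
    fixed_fee=Decimal("0.5"),
)
```

The second order costs exactly the full balance before fees, so the fees push it over the limit and it should not be submitted.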
### `AP-CRYPTO-TRADING-006` — Close Order Placed Before Open Order Fills <sub>(high)</sub>
Placing a close order before verifying the open order is fully filled causes mismatched position sizes. The executor attempts to close a larger or smaller position than actually exists, leading to unintended directional exposure and potential losses exceeding the configured risk parameters.
### `AP-CRYPTO-TRADING-007` — Arbitrage Across Non-Interchangeable Tokens <sub>(high)</sub>
Executing arbitrage trades between tokens that appear similar but are not interchangeable causes permanent loss of funds. The received tokens cannot be used to close the opposing position, stranding capital and creating one-sided exposure with no recovery path.
### `AP-CRYPTO-TRADING-008` — Skipping Triple Barrier Evaluations <sub>(high)</sub>
Omitting control_stop_loss, control_take_profit, or control_time_limit calls in the control_barriers cycle leaves positions unprotected. Losses exceed configured thresholds as barrier checks never trigger, positions remain open beyond risk tolerance, resulting in amplified losses.
### `AP-CRYPTO-TRADING-012` — Wrong Position Key for Perpetual Modes <sub>(medium)</sub>
Using only trading_pair as the position key in HEDGE mode causes different position sides to collide and overwrite each other. Position tracking becomes incorrect, leading to wrong order matching and potential financial loss when the system misidentifies position direction.
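
The collision can be shown with ordinary dictionaries. This is a simplified sketch: `position_key` and the `"LONG"`/`"SHORT"` labels are hypothetical and stand in for whatever position-side type the connector uses.

```python
from decimal import Decimal


def position_key(trading_pair, side, hedge_mode):
    # In hedge mode a pair can hold a long and a short at once,
    # so the side has to be part of the key.
    return (trading_pair, side) if hedge_mode else (trading_pair,)


# Hedge mode with side-aware keys: both positions are tracked separately.
positions = {}
positions[position_key("BTC-USDT", "LONG", hedge_mode=True)] = Decimal("0.5")
positions[position_key("BTC-USDT", "SHORT", hedge_mode=True)] = Decimal("0.2")

# Keying by pair alone lets the second write overwrite the first.
collided = {}
collided["BTC-USDT"] = Decimal("0.5")
collided["BTC-USDT"] = Decimal("0.2")
```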
## rotki (3)
### `AP-CRYPTO-TRADING-003` — Bypassing API Facade Layer <sub>(high)</sub>
Directly accessing internal service methods without routing through the RestAPI facade bypasses authentication, task tracking, and error handling mechanisms. Anonymous requests can execute privileged operations, creating critical security vulnerabilities where unauthorized users access sensitive financial data or execute trades.
### `AP-CRYPTO-TRADING-004` — Non-Checksummed EVM Addresses <sub>(high)</sub>
Passing lowercase or mixed-case Ethereum addresses to RPC nodes causes InvalidAddress exceptions since nodes enforce EIP-55 checksum format. This results in RemoteError failures that halt all blockchain data collection for the affected chain, with no graceful degradation or fallback.
### `AP-CRYPTO-TRADING-013` — Overwriting User-Customized Event Classifications <sub>(medium)</sub>
Re-decoding operations silently replace user-modified events marked as CUSTOMIZED without explicit user action. User edits to event classifications are permanently lost, causing incorrect accounting treatment and potential tax reporting errors that may not be detected until audit.
## rotki, hummingbot, cryptofeed, ccxt (1)
### `AP-CRYPTO-TRADING-001` — Float Arithmetic for Monetary Values <sub>(high)</sub>
Using Python float type instead of Decimal for price, amount, balance, PnL, and other financial calculations causes precision errors due to binary floating-point representation. Rounding errors compound across multiple calculations, leading to incorrect order sizing, wrong profit/loss reporting, and potentially incorrect trading decisions or tax calculations.
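
A short demonstration of the precision issue using only the standard library. The price, amount, and fee values are made up for illustration.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2
float_matches = float_total == 0.3

# Decimal built from strings keeps the exact decimal value.
decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_total == Decimal("0.3")

# Order cost with an exact fee: price * amount * (1 + fee_rate).
price = Decimal("27350.25")
amount = Decimal("0.015")
fee_rate = Decimal("0.001")
cost = price * amount * (1 + fee_rate)
```

Constructing `Decimal` from strings rather than from floats matters: `Decimal(0.1)` would carry the float's representation error into the decimal value.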
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-110--cryptofeed
**Scan date**: 2026-04-22
**Stats**: 8 files, 30 classes, 0 functions, 8 stages
## Modules (8)
- [feed_handler_orchestration](components/feed_handler_orchestration.md): 4 classes
- [connection_management](components/connection_management.md): 4 classes
- [exchange_interface_layer](components/exchange_interface_layer.md): 5 classes
- [data_normalization](components/data_normalization.md): 4 classes
- [order_book_processing](components/order_book_processing.md): 3 classes
- [nbbo_aggregation](components/nbbo_aggregation.md): 2 classes
- [callback_dispatch](components/callback_dispatch.md): 3 classes
- [backend_storage](components/backend_storage.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 103
fatal_constraints_count: 33
non_fatal_constraints_count: 196
use_cases_count: 40
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
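
One way to picture next-bar execution is the plain-Python sketch below. It assumes signals are computed at the close of bar `i` and filled at the open of bar `i+1`; the fill-at-open convention and the `next_bar_fills` helper are assumptions for illustration, not part of the lock definition.

```python
def next_bar_fills(bars, signals):
    """Pair each signal from bar i with the open of bar i+1.

    bars: list of dicts with 'timestamp' and 'open'.
    signals: list of 'buy', 'sell', or None, aligned with bars.
    """
    fills = []
    for i, action in enumerate(signals):
        if action is None:
            continue
        if i + 1 >= len(bars):
            break  # no next bar yet, so the last signal cannot be filled
        nxt = bars[i + 1]
        fills.append((nxt["timestamp"], action, nxt["open"]))
    return fills


bars = [
    {"timestamp": "2024-01-01", "open": 100.0},
    {"timestamp": "2024-01-02", "open": 101.0},
    {"timestamp": "2024-01-03", "open": 99.5},
]
signals = ["buy", None, "sell"]
fills = next_bar_fills(bars, signals)
```

The `sell` signal on the final bar produces no fill, because no later bar exists to execute it on.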
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
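
A minimal validation sketch for this lock. It uses a plain dict as a stand-in for `TradingSignal`; `validate_signal` and `count_sizing_fields` are hypothetical helpers, not functions from the framework.

```python
def count_sizing_fields(signal):
    """Count how many of the three sizing fields are set (not None)."""
    fields = ("position_pct", "order_money", "order_amount")
    return sum(signal.get(name) is not None for name in fields)


def validate_signal(signal):
    # SL-05: exactly one sizing field must be present.
    n = count_sizing_fields(signal)
    if n != 1:
        raise ValueError(f"SL-05 violation: expected exactly 1 sizing field, found {n}")
    return signal


valid = validate_signal({"position_pct": 0.2, "order_money": None, "order_amount": None})
try:
    validate_signal({"position_pct": 0.2, "order_money": 10000})
    rejected = False
except ValueError:
    rejected = True
```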
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **40**
## `KUC-101`
**Source**: `examples/demo.py`
Demonstrates how to define and use async callback handlers for receiving real-time market data updates from cryptocurrency exchanges.
## `KUC-102`
**Source**: `examples/demo_arctic.py`
Stores cryptocurrency trade, funding, and ticker data to ArcticDB (Arctic) time-series database for persistence and later analysis.
## `KUC-103`
**Source**: `examples/demo_bequant_bitcoincom_hitbtc.py`
Demonstrates each supported feature (ticker, trades, order book, candles) for the Bequant and HitBTC exchanges, which share the same API.
## `KUC-104`
**Source**: `examples/demo_binance_authenticated.py`
Demonstrates authenticated access to Binance, Binance Delivery, and Binance Futures for receiving account balances, positions, and order updates in real-time.
## `KUC-105`
**Source**: `examples/demo_binance_delivery.py`
Shows data subscription for Binance Delivery perpetual futures including order book, ticker, and trade data.
## `KUC-106`
**Source**: `examples/demo_binancetr.py`
Demonstrates data subscription for Binance TR exchange including ticker, trades, order book with delta updates, and candle data.
## `KUC-107`
**Source**: `examples/demo_bitfinex_authenticated.py`
Demonstrates synchronous authenticated Bitfinex trading operations including balance queries, order management, and trade execution.
## `KUC-108`
**Source**: `examples/demo_bybit_authenticated.py`
Demonstrates Bybit authenticated feeds for receiving order updates and trade fills in real-time for account monitoring.
## `KUC-109`
**Source**: `examples/demo_check_trade_timestamps.py`
Monitors and compares trade timestamps across multiple exchanges to verify timestamp consistency and identify potential synchronization issues.
## `KUC-110`
**Source**: `examples/demo_concurrent_proxy.py`
Demonstrates using an HTTP proxy to bypass exchange rate limits when subscribing to many symbols, enabling concurrent order book and open interest data collection.
## `KUC-111`
**Source**: `examples/demo_custom_agg.py`
Demonstrates custom aggregation of trade data over time windows, tracking min/max prices for each symbol within the aggregation period.
## `KUC-112`
**Source**: `examples/demo_deribit_authenticated.py`
Demonstrates Deribit authenticated feeds for order info, trade fills, and balance updates for comprehensive account monitoring.
## `KUC-113`
**Source**: `examples/demo_elastic.py`
Stores order book, funding, and trade data to Elasticsearch for search and analytics capabilities.
## `KUC-114`
**Source**: `examples/demo_existing_loop.py`
Demonstrates integrating cryptofeed with an existing asyncio event loop, allowing concurrent execution with other async tasks.
## `KUC-115`
**Source**: `examples/demo_gateiofutures.py`
Demonstrates subscription to Gate.io futures exchange for ticker, trades, order book, funding, and candle data.
## `KUC-116`
**Source**: `examples/demo_gcppubsub.py`
Publishes trade data to Google Cloud Platform Pub/Sub for event-driven architectures and cloud-based processing.
## `KUC-117`
**Source**: `examples/demo_influxdb.py`
Stores funding, order book, trades, ticker, and candles to InfluxDB time-series database for monitoring and analysis.
## `KUC-118`
**Source**: `examples/demo_kafka.py`
Streams order book and trade data to Apache Kafka with custom topic and partition routing for scalable event processing.
## `KUC-119`
**Source**: `examples/demo_liquidations.py`
Monitors and displays liquidations across all exchanges that support this channel, useful for identifying market stress and volatility.
## `KUC-120`
**Source**: `examples/demo_loop.py`
Demonstrates dynamic addition of feeds to a running event loop and scheduled callbacks for adding/removing feeds over time.
## `KUC-121`
**Source**: `examples/demo_mongo.py`
Stores order book, trades, and ticker data to MongoDB document database with flexible schema for JSON storage.
## `KUC-122`
**Source**: `examples/demo_multicb.py`
Demonstrates registering multiple callback handlers for a single data channel, enabling parallel processing of the same data.
## `KUC-123`
**Source**: `examples/demo_nbbo.py`
Calculates National Best Bid and Offer (NBBO) by aggregating best bid/ask prices across Coinbase, Gemini, and Kraken for a given symbol.
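The aggregation idea can be sketched without cryptofeed itself: the best bid is the highest bid across venues and the best offer is the lowest ask. This is a minimal illustration only; the `nbbo` helper and the `quotes` layout are hypothetical and are not the library's API.

```python
from decimal import Decimal


def nbbo(quotes):
    """Return (best_bid, bid_exchange, best_ask, ask_exchange).

    `quotes` maps an exchange name to a (bid, ask) tuple. This layout is
    an assumption made for the sketch, not a cryptofeed data structure.
    """
    bid_exchange = max(quotes, key=lambda ex: quotes[ex][0])  # highest bid wins
    ask_exchange = min(quotes, key=lambda ex: quotes[ex][1])  # lowest ask wins
    return (quotes[bid_exchange][0], bid_exchange,
            quotes[ask_exchange][1], ask_exchange)


# Illustrative quotes only; prices are made up.
quotes = {
    "COINBASE": (Decimal("64999.5"), Decimal("65001.0")),
    "GEMINI": (Decimal("65000.0"), Decimal("65002.5")),
    "KRAKEN": (Decimal("64998.0"), Decimal("65000.5")),
}
best_bid, bid_ex, best_ask, ask_ex = nbbo(quotes)
print(best_bid, bid_ex, best_ask, ask_ex)
```

In a live feed the same comparison would be rerun whenever any venue's top of book changes.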
## `KUC-124`
**Source**: `examples/demo_ohlcv.py`
Aggregates trade data into OHLCV (Open, High, Low, Close, Volume) candles over configurable time windows for charting.
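A rough, library-free sketch of the bucketing step is shown here. It assumes trades arrive as `(timestamp_seconds, price, amount)` tuples, which is a simplification chosen for this example rather than cryptofeed's trade object.

```python
from decimal import Decimal


def aggregate_ohlcv(trades, window):
    """Bucket (timestamp, price, amount) trades into candles of `window` seconds."""
    candles = {}
    for ts, price, amount in sorted(trades, key=lambda t: t[0]):
        start = int(ts // window) * window  # start of the window this trade falls in
        candle = candles.get(start)
        if candle is None:
            # The first trade in a window sets open, high, low, and close.
            candles[start] = {"open": price, "high": price, "low": price,
                              "close": price, "volume": amount}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # the latest trade so far is the close
            candle["volume"] += amount
    return candles


# Illustrative trades only.
trades = [
    (0.5, Decimal("100"), Decimal("1")),
    (10.0, Decimal("105"), Decimal("2")),
    (59.9, Decimal("99"), Decimal("1")),
    (61.0, Decimal("101"), Decimal("3")),
]
candles = aggregate_ohlcv(trades, window=60)
print(candles[0])
print(candles[60])
```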
## `KUC-125`
**Source**: `examples/demo_okx_authenticated.py`
Demonstrates authenticated OKX feeds for receiving real-time order updates for account monitoring.
## `KUC-126`
**Source**: `examples/demo_playback.py`
Plays back historical market data from captured PCAP files through the callback system for backtesting and analysis.
## `KUC-127`
**Source**: `examples/demo_postgres.py`
Stores comprehensive market data (candles, index, ticker, trades, open interest, liquidations, funding, order books) to PostgreSQL with custom column mapping.
## `KUC-128`
**Source**: `examples/demo_quasardb.py`
Stores ticker, trades, candles, open interest, index, and liquidation data to QuasarDB for high-performance time-series analytics.
## `KUC-129`
**Source**: `examples/demo_questdb.py`
Stores order book, candles, funding, ticker, and trade data to QuestDB for high-performance time-series database operations.
## `KUC-130`
**Source**: `examples/demo_rabbitmq_exchange.py`
Publishes order book data to RabbitMQ using topic exchange routing for flexible message filtering and distribution.
## `KUC-131`
**Source**: `examples/demo_rabbitmq_queue.py`
Publishes order book data to RabbitMQ using queue-based delivery for point-to-point message distribution.
## `KUC-132`
**Source**: `examples/demo_raw_data.py`
Collects raw WebSocket data to files for offline analysis, debugging, or historical data preservation.
## `KUC-133`
**Source**: `examples/demo_redis.py`
Stores trades, funding, candles, order books, open interest, and ticker data to Redis with both pub/sub and persistent storage backends.
## `KUC-134`
**Source**: `examples/demo_renko.py`
Transforms trade data into Renko chart bricks based on fixed price movements for trend visualization independent of time.
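The brick construction can be illustrated with a simplified sketch. It assumes a fixed brick size and lets a reversal form after a single brick of movement (many charting conventions require two); the `renko_bricks` helper is hypothetical and does not reproduce the example script.

```python
def renko_bricks(prices, brick_size):
    """Return +1 for each completed up brick and -1 for each down brick.

    Simplification: a reversal needs only one brick of movement.
    """
    bricks = []
    anchor = prices[0]  # close of the most recent brick
    for price in prices[1:]:
        while price >= anchor + brick_size:
            anchor += brick_size
            bricks.append(1)
        while price <= anchor - brick_size:
            anchor -= brick_size
            bricks.append(-1)
    return bricks


# Illustrative integer prices with a brick size of 10.
bricks = renko_bricks([100, 104, 111, 125, 118, 99], brick_size=10)
print(bricks)  # [1, 1, -1, -1]
```

Time never enters the calculation, which is why the resulting chart is independent of how long each move took.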
## `KUC-135`
**Source**: `examples/demo_tcp.py`
Streams trade data over TCP sockets for network-based data distribution to remote systems or applications.
## `KUC-136`
**Source**: `examples/demo_throttle.py`
Limits the rate of order book callbacks to a specified number per time window, useful for managing downstream system load.
## `KUC-137`
**Source**: `examples/demo_udp.py`
Streams order book and trade data over UDP datagrams for low-latency network distribution to remote systems.
## `KUC-138`
**Source**: `examples/demo_uds.py`
Streams ticker and trade data over Unix domain sockets for high-performance inter-process communication on the same host.
## `KUC-139`
**Source**: `examples/demo_victoriametrics.py`
Stores trade, ticker, order book, and candle data to VictoriaMetrics for Prometheus-compatible time-series monitoring and analytics.
## `KUC-140`
**Source**: `examples/demo_zmq.py`
Publishes order book and ticker data over ZeroMQ pub/sub for lightweight message distribution to multiple subscribers.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-CRYPTO-TRADING-001` — Decimal Type for All Monetary Values
**From**: rotki, hummingbot, cryptofeed, ccxt · **Applicable to**: crypto-trading
All four projects mandate Decimal type for price, amount, balance, quantity, and PnL fields. Float arithmetic causes rounding errors that compound across financial calculations, leading to incorrect order sizing and reporting. Always use Decimal for any value representing money in crypto trading systems.
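A small standard-library illustration of the rounding problem follows. It is not taken from any of the four projects; it only shows why binary floats drift and why `Decimal` values should be built from strings.

```python
from decimal import Decimal

# Ten additions of 0.1 as binary floats do not land exactly on 1.0.
float_total = sum([0.1] * 10)

# The same additions with Decimal values built from strings are exact.
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

print(float_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))  # True

# Building a Decimal from a float carries the float's error along,
# so monetary values should be parsed from strings.
print(Decimal(0.1) == Decimal("0.1"))   # False
```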
## `CW-CRYPTO-TRADING-002` — Initialize Data Structures Before Access
**From**: ccxt, cryptofeed, rotki · **Applicable to**: crypto-trading
Projects consistently require explicit initialization before data access: load_markets() before symbol lookups, check symbol population before mapping access, establish RPC connections before queries. Skipping initialization causes KeyError, AttributeError, or silent data corruption that breaks downstream operations.
## `CW-CRYPTO-TRADING-003` — Precise String Arithmetic for Financial Calculations
**From**: ccxt · **Applicable to**: crypto-trading
CCXT mandates Precise.string_* static methods (string_mul, string_div, string_add, string_sub) for monetary calculations to avoid floating-point precision errors. This is especially critical for high-precision exchange data where rounding errors cause incorrect order costs, fees, and balances that may result in financial loss.
## `CW-CRYPTO-TRADING-004` — Respect Exchange Rate Limits
**From**: ccxt · **Applicable to**: crypto-trading
Disabling rate limiting via enableRateLimit=False causes HTTP 429 responses and potential temporary or permanent API key suspension by exchanges. CCXT enforces rate limits per IP/API key pair, and bypassing throttle() gates results in compliance violations that disrupt all trading activity until exchanges lift bans.
## `CW-CRYPTO-TRADING-005` — Inverse Contract Price Adjustment
**From**: ccxt, hummingbot · **Applicable to**: crypto-trading
Perpetual swap cost calculations require applying inverse price adjustment (1/price) before multiplying by contractSize for inverse contracts. Incorrect cost calculation causes wrong position sizing, leading to unexpected liquidation or insufficient margin for perpetual trading positions.
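The adjustment can be sketched as follows. The `contract_cost` helper and its parameter names are hypothetical; the sketch assumes linear contracts are costed as amount × contract size × price and inverse contracts as amount × contract size × (1/price).

```python
from decimal import Decimal


def contract_cost(amount, price, contract_size, inverse):
    """Cost of `amount` contracts.

    Linear contracts: the result is in the quote currency.
    Inverse contracts: the price is inverted first, so the result is in
    the base currency.
    """
    effective_price = Decimal(1) / price if inverse else price
    return amount * contract_size * effective_price


# Linear example: 2 contracts of 0.01 BTC each at 50,000 USDT.
linear_cost = contract_cost(Decimal("2"), Decimal("50000"), Decimal("0.01"), inverse=False)

# Inverse example: 10 contracts of 100 USD each at 50,000 USD per BTC.
inverse_cost = contract_cost(Decimal("10"), Decimal("50000"), Decimal("100"), inverse=True)

print(linear_cost)   # cost in USDT
print(inverse_cost)  # cost in BTC
```

Skipping the inversion in the second case would report a cost of 50,000,000 instead of 0.02 BTC, which shows how far off position sizing can be.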
## `CW-CRYPTO-TRADING-006` — Strict Connection Lifecycle Ordering
**From**: cryptofeed, ccxt · **Applicable to**: crypto-trading
Both projects enforce strict execution order for connection operations: cryptofeed requires authenticate -> subscribe -> message handler sequence, while ccxt mandates connect -> on_connected_callback -> subscriptions -> on_close_callback. Out-of-order operations cause subscription failures and no data flow through connections.
## `CW-CRYPTO-TRADING-007` — Validate Input Data Structure Before Processing
**From**: rotki, cryptofeed · **Applicable to**: crypto-trading
Rotki validates EVM address checksum format before RPC calls; cryptofeed checks Symbols.populated() before symbol mapping access. Validating data structure before processing prevents downstream crashes (KeyError, InvalidAddress) and data corruption that is harder to debug when symptoms appear in unrelated code paths.
## `CW-CRYPTO-TRADING-008` — Validate Order Sizes Against Exchange Minimums
**From**: hummingbot · **Applicable to**: crypto-trading
DCAExecutor amounts must be validated against min_notional_size and amounts_quote/prices against min_order_size before execution. Orders below exchange minimums are rejected, breaking strategy execution and potentially leaving positions partially unfilled at unfavorable prices.
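One way to read this check is sketched here. The function name is hypothetical and the code does not use hummingbot's classes; it assumes each level carries a quote amount and a price, that the quote amount is compared with `min_notional_size`, and that the base amount (quote amount divided by price) is compared with `min_order_size`.

```python
from decimal import Decimal


def undersized_levels(amounts_quote, prices, min_notional_size, min_order_size):
    """Return the indices of levels an exchange would reject as too small."""
    rejected = []
    for i, (quote, price) in enumerate(zip(amounts_quote, prices)):
        base_amount = quote / price  # order size in base currency
        if quote < min_notional_size or base_amount < min_order_size:
            rejected.append(i)
    return rejected


# Illustrative levels only.
rejected = undersized_levels(
    amounts_quote=[Decimal("20"), Decimal("5"), Decimal("50")],
    prices=[Decimal("100"), Decimal("100"), Decimal("1000")],
    min_notional_size=Decimal("10"),
    min_order_size=Decimal("0.1"),
)
print(rejected)  # [1, 2]
```

Running a check like this before the first order is placed avoids a partially built position when a later level is rejected.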
FILE:references/components/backend_storage.md
# backend_storage (5 classes)
## `Backend.write`
`backend_storage/backend-write.py:0`
## `Backend.start`
`backend_storage/backend-start.py:0`
## `BackendQueue.start`
`backend_storage/backendqueue-start.py:0`
## `storage_backend`
`backend_storage/storage-backend.py:0`
## `ipc_mechanism`
`backend_storage/ipc-mechanism.py:0`
FILE:references/components/callback_dispatch.md
# callback_dispatch (3 classes)
## `Callback.__call__`
`callback_dispatch/callback-call.py:0`
## `AggregateCallback.__init__`
`callback_dispatch/aggregatecallback-init.py:0`
## `callback_type`
`callback_dispatch/callback-type.py:0`
FILE:references/components/connection_management.md
# connection_management (4 classes)
## `WebsocketEndpoint.connect`
`connection_management/websocketendpoint-connect.py:0`
## `WebsocketEndpoint.read`
`connection_management/websocketendpoint-read.py:0`
## `WebsocketEndpoint.write`
`connection_management/websocketendpoint-write.py:0`
## `websocket_library`
`connection_management/websocket-library.py:0`
FILE:references/components/data_normalization.md
# data_normalization (4 classes)
## `Trade.__init__`
`data_normalization/trade-init.py:0`
## `Book.__init__`
`data_normalization/book-init.py:0`
## `Callback.__call__`
`data_normalization/callback-call.py:0`
## `type_validation`
`data_normalization/type-validation.py:0`
FILE:references/components/exchange_interface_layer.md
# exchange_interface_layer (5 classes)
## `Binance.subscribe`
`exchange_interface_layer/binance-subscribe.py:0`
## `Coinbase._connect`
`exchange_interface_layer/coinbase-connect.py:0`
## `Exchange.standardize_symbol`
`exchange_interface_layer/exchange-standardize-symbol.py:0`
## `symbol_mapping_backend`
`exchange_interface_layer/symbol-mapping-backend.py:0`
## `rest_rate_limit_handling`
`exchange_interface_layer/rest-rate-limit-handling.py:0`
FILE:references/components/feed_handler_orchestration.md
# feed_handler_orchestration (4 classes)
## `FeedHandler.add_feed`
`feed_handler_orchestration/feedhandler-add-feed.py:0`
## `FeedHandler.add_nbbo`
`feed_handler_orchestration/feedhandler-add-nbbo.py:0`
## `FeedHandler.run`
`feed_handler_orchestration/feedhandler-run.py:0`
## `async_event_loop`
`feed_handler_orchestration/async-event-loop.py:0`
FILE:references/components/nbbo_aggregation.md
# nbbo_aggregation (2 classes)
## `NBBO._update`
`nbbo_aggregation/nbbo-update.py:0`
## `nbbo_source`
`nbbo_aggregation/nbbo-source.py:0`
FILE:references/components/order_book_processing.md
# order_book_processing (3 classes)
## `Book.callback`
`order_book_processing/book-callback.py:0`
## `Book.update`
`order_book_processing/book-update.py:0`
## `book_depth`
`order_book_processing/book-depth.py:0`
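Processes credit rating transition matrices, with support for Not-Rated state redistribution, annual-to-monthly matrix conversion, state space definition, and dataset characterization.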
---
name: credit-transition-matrix
description: |-
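Processes credit rating transition matrices, with support for Not-Rated state redistribution, annual-to-monthly matrix conversion, state space definition, and dataset characterization.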
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-119"
compiled_at: "2026-04-22T13:00:58.228711+00:00"
capability_markets: "global"
capability_activities: "credit-risk"
sop_version: "crystal-compilation-v6.1"
---
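# Credit Transition Matrix (credit-transition-matrix)
> Processes credit rating transition matrices, with support for Not-Rated state redistribution, annual-to-monthly matrix conversion, state space definition, and dataset characterization.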
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (22 total)
### Adjust Not-Rated State in Credit Migration Matrices (`UC-101`)
Credit rating transition matrices often contain 'not-rated' (NR) observations that need to be redistributed to rated states for downstream risk calcul
**Triggers**: not-rated, NR adjustment, credit migration
### Adjust Not-Rated State via Python Script (`UC-104`)
Corporate credit rating migration data contains NR (not-rated) states that must be removed using noninformative redistribution method before calculati
**Triggers**: not-rated, NR removal, credit rating
### Clean and Prepare Transition Data (`UC-108`)
Raw credit rating data requires preprocessing including column renaming, state validation, and absorbing state verification before it can be used for
**Triggers**: data cleaning, preprocessing, validation
For all **22** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-CREDIT-RISK-001`**: Empty DataFrame passed to bucketing pipeline
- **`AP-CREDIT-RISK-002`**: Multi-dimensional target array causing WoE shape mismatch
- **`AP-CREDIT-RISK-003`**: OptimalBucketer receiving high-cardinality numerical features
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-119. Evidence verify ratio = 35.8% and audit fail total = 15. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ complete authoritative record (source of truth) | Required reading when behavior or a decision is disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Key lessons borrowed across projects | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-119` blueprint at 2026-04-22T13:00:58.228711+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Convert Annual to Monthly Transition Matrices via Generator', 'Transition Matrix Operations Demonstration', 'Adjust Not-Rated State in Credit Migration Matrices', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-050--skorecard (5)
### `AP-CREDIT-RISK-001` — Empty DataFrame passed to bucketing pipeline <sub>(high)</sub>
When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
### `AP-CREDIT-RISK-002` — Multi-dimensional target array causing WoE shape mismatch <sub>(high)</sub>
When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation, downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with incorrect credit risk scores that misrepresent default probability estimates.
### `AP-CREDIT-RISK-003` — OptimalBucketer receiving high-cardinality numerical features <sub>(high)</sub>
When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
### `AP-CREDIT-RISK-004` — Special values distorting optimal bin boundaries <sub>(high)</sub>
When implementing fit() for bucketers without filtering special values from X before computing bin boundaries using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence calculations and unreliable credit risk scores that misrepresent borrower default probabilities.
### `AP-CREDIT-RISK-005` — Two-phase bucketing ordering violation causing special value loss <sub>(high)</sub>
When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline, special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials() after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.
## finance-bp-072--lending (3)
### `AP-CREDIT-RISK-006` — Loan amount exceeding product and collateral limits <sub>(high)</sub>
When validating the loan amount for loan applications without enforcing that loan_amount does not exceed the maximum_loan_amount from the loan product or proposed securities, disbursing amounts that exceed product or collateral limits exposes the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized lending.
### `AP-CREDIT-RISK-007` — Disbursement validation failures creating unauthorized exposure <sub>(high)</sub>
When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory compliance violations.
### `AP-CREDIT-RISK-008` — Interest accrual on written-off loans inflating income <sub>(high)</sub>
When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.
## finance-bp-112--openLGD (2)
### `AP-CREDIT-RISK-009` — Loop index errors in federated parameter averaging <sub>(high)</sub>
When implementing federated parameter averaging logic, using the final index n instead of the loop variable k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates across all nodes.
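A corrected loop might look like the sketch below. It assumes each server reports a flat list of parameters plus a scalar weight; the names are illustrative and are not openLGD's actual code.

```python
def average_parameters(server_params, weights):
    """Weighted average of per-server parameter vectors."""
    n = len(server_params)
    total_weight = sum(weights)
    averaged = [0.0] * len(server_params[0])
    for k in range(n):  # start at 0 so the first server is included
        for j, value in enumerate(server_params[k]):
            # Index with the loop variable k, never with the final index n.
            averaged[j] += weights[k] * value / total_weight
    return averaged


# Three hypothetical servers, each reporting [intercept, coefficient].
averaged = average_parameters(
    server_params=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    weights=[1, 1, 2],
)
print(averaged)
```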
### `AP-CREDIT-RISK-010` — API response format inconsistency breaking federated coordination <sub>(high)</sub>
When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and 'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
## finance-bp-119--transitionMatrix (4)
### `AP-CREDIT-RISK-011` — Invalid transition probabilities corrupting Markov matrices <sub>(high)</sub>
When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0, 1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable credit risk estimates.
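A defensive check along these lines can be sketched in plain Python. The `validate_transition_matrix` helper is hypothetical and is not the transitionMatrix package's own validation method.

```python
import math


def validate_transition_matrix(matrix, tol=1e-9):
    """Return a list of problems; an empty list means the matrix is row-stochastic."""
    problems = []
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            problems.append(f"row {i}: expected {n} entries, got {len(row)}")
        for j, p in enumerate(row):
            if not (0.0 <= p <= 1.0):  # this comparison also rejects NaN
                problems.append(f"entry ({i}, {j}) = {p} is outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append(f"row {i}: sums to {sum(row)}, not 1")
    return problems


good = [[0.9, 0.1], [0.0, 1.0]]
bad = [[1.1, -0.1], [0.5, 0.4]]
print(validate_transition_matrix(good))
for problem in validate_transition_matrix(bad):
    print(problem)
```

The `bad` matrix shows why both checks are needed: its first row sums to 1 even though both entries are invalid probabilities.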
### `AP-CREDIT-RISK-012` — Unsorted event data causing incorrect transition matrix estimates <sub>(high)</sub>
When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate the Markov property.
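The required ordering can be enforced with a two-part sort key, as in this sketch. The field names `id`, `time`, and `state` are assumptions chosen for the example, not the package's required schema.

```python
def sort_events(events):
    """Order events by entity first, then by ascending time within each entity."""
    return sorted(events, key=lambda e: (e["id"], e["time"]))


def is_sorted(events):
    """Check the ordering that the estimators are described as expecting."""
    keys = [(e["id"], e["time"]) for e in events]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Illustrative, deliberately shuffled events.
raw = [
    {"id": 2, "time": 0.0, "state": "A"},
    {"id": 1, "time": 3.5, "state": "B"},
    {"id": 1, "time": 0.0, "state": "A"},
    {"id": 2, "time": 1.2, "state": "D"},
]
ordered = sort_events(raw)
print(is_sorted(raw), is_sorted(ordered))
```

Calling `is_sorted` as a guard before estimation turns a silent miscount into an explicit failure.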
### `AP-CREDIT-RISK-013` — Zero-count division causing NaN in transition matrices <sub>(high)</sub>
When normalizing counts to produce transition probabilities without first checking that the source state's population count is greater than zero, division by zero occurs and produces NaN values in the transition matrix. These NaN values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
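A guarded normalization might look like the following sketch. Treating an unobserved source state as an identity row is an assumption made here for illustration; the appropriate fallback depends on the model, and the source does not prescribe one.

```python
def normalize_counts(counts):
    """Turn a square matrix of transition counts into row probabilities.

    Rows with no observations become identity rows (an assumption for this
    sketch) instead of being divided by zero.
    """
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 if j == i else 0.0 for j in range(len(row))])
    return matrix


# The second state was never observed as a source state.
probabilities = normalize_counts([[8, 2, 0], [0, 0, 0], [1, 1, 2]])
for row in probabilities:
    print(row)
```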
### `AP-CREDIT-RISK-014` — Wrong matrix logarithm method producing invalid generator matrices <sub>(medium)</sub>
When the generator() method computes the matrix logarithm with anything other than scipy.linalg.logm, such as the element-wise numpy.log or an ad hoc approximation, the resulting generator matrices are invalid, with row sums not equal to zero. This violates the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
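The difference between the two logarithms can be seen in a short sketch, assuming NumPy and SciPy are installed. The 3x3 matrix is illustrative, and the code calls `scipy.linalg.logm` directly rather than the package's generator() method.

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative annual transition matrix with an absorbing default state.
P = np.array([
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.00, 0.00, 1.00],
])

# Matrix logarithm: the candidate generator.
G = np.real(logm(P))

# Element-wise logarithm: not a matrix logarithm, and log(0) gives -inf.
with np.errstate(divide="ignore"):
    elementwise = np.log(P)

print(np.allclose(G.sum(axis=1), 0.0))            # generator rows sum to zero
print(np.allclose(expm(G), P))                    # exponentiating recovers P
print(np.allclose(elementwise.sum(axis=1), 0.0))  # fails for the element-wise log
```

Zero row sums are necessary but not sufficient: a valid generator also needs non-negative off-diagonal entries, which the matrix logarithm does not guarantee for every input matrix.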
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-119--transitionMatrix
**Scan date**: 2026-04-22
**Stats**: {'total_files': 7, 'total_classes': 32, 'total_functions': 0, 'total_stages': 7}
## Modules (7)
- [state_space_definition](components/state_space_definition.md): 2 classes
- [data_preprocessing](components/data_preprocessing.md): 5 classes
- [synthetic_data_generation](components/synthetic_data_generation.md): 4 classes
- [matrix_estimation](components/matrix_estimation.md): 5 classes
- [matrix_representation](components/matrix_representation.md): 6 classes
- [matrix_operations](components/matrix_operations.md): 7 classes
- [credit_curve_analysis](components/credit_curve_analysis.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 131
fatal_constraints_count: 46
non_fatal_constraints_count: 147
use_cases_count: 22
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **22**
## `KUC-101`
**Source**: `examples/notebooks/Adjust_NotRated_State.ipynb`
Credit rating transition matrices often contain 'not-rated' (NR) observations that need to be redistributed to rated states for downstream risk calculations and regulatory reporting.
## `KUC-102`
**Source**: `examples/notebooks/Matrix_Operations.ipynb`
Users need to understand how to initialize, validate, and work with transition matrices for credit risk modeling.
## `KUC-103`
**Source**: `examples/notebooks/Monthly_from_Annual.ipynb`
Credit risk models require transition matrices at different time horizons (monthly, quarterly, annual) but only annual matrices may be available; matrix exponentiation of generators enables temporal scaling.
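The temporal scaling step can be sketched as follows, assuming NumPy and SciPy are available. The two-state matrix is illustrative, and the code calls `scipy.linalg` directly rather than reproducing the notebook.

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative annual matrix: one performing state and an absorbing default state.
annual = np.array([
    [0.95, 0.05],
    [0.00, 1.00],
])

generator = np.real(logm(annual))   # annual generator, G = log(P)
monthly = expm(generator / 12.0)    # one-month matrix, exp(G / 12)

print(monthly)
# Compounding the monthly matrix twelve times should reproduce the annual one.
print(np.allclose(np.linalg.matrix_power(monthly, 12), annual))
```

For matrices that are not embeddable in a continuous-time chain, the scaled result can contain small negative entries, so it is worth validating before use.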
## `KUC-104`
**Source**: `examples/python/adjust_nr_state.py`
Corporate credit rating migration data contains NR (not-rated) states that must be removed using noninformative redistribution method before calculating regulatory capital requirements.
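One common reading of noninformative redistribution is to drop the NR state and rescale each remaining row so that it sums to 1 again, which spreads the NR mass in proportion to the rated transitions. The sketch below implements that reading in plain Python; the helper name is hypothetical and it is not the package's own routine.

```python
def remove_nr_noninformative(matrix, nr_index):
    """Drop the NR state and rescale each remaining row to sum to 1."""
    adjusted = []
    for i, row in enumerate(matrix):
        if i == nr_index:
            continue  # the NR row itself is removed
        kept = [p for j, p in enumerate(row) if j != nr_index]
        total = sum(kept)
        if total <= 0:
            raise ValueError(f"row {i} has no probability mass outside NR")
        adjusted.append([p / total for p in kept])
    return adjusted


# Illustrative states: A, B, D (default), NR.
matrix = [
    [0.80, 0.10, 0.02, 0.08],
    [0.05, 0.75, 0.10, 0.10],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]
adjusted = remove_nr_noninformative(matrix, nr_index=3)
for row in adjusted:
    print(row)
```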
## `KUC-105`
**Source**: `examples/python/characterize_datasets.py`
Data scientists need to understand the characteristics of credit rating transition datasets before applying estimation methods or building models.
## `KUC-106`
**Source**: `examples/python/compare_estimators.py`
Different transition matrix estimation methods produce different results; researchers need to compare cohort-based vs duration-based (Aalen-Johansen) estimators to choose appropriate methods for their data.
## `KUC-107`
**Source**: `examples/python/credit_curves.py`
Credit risk management requires visualization of how default probabilities and credit quality evolve over time through multi-period transition matrices.
## `KUC-108`
**Source**: `examples/python/data_cleaning_example.py`
Raw credit rating data requires preprocessing including column renaming, state validation, and absorbing state verification before it can be used for matrix estimation.
## `KUC-109`
**Source**: `examples/python/deterministic_paths.py`
Testing and validation of transition matrix estimators requires reproducible deterministic transition paths with known outcomes.
## `KUC-110`
**Source**: `examples/python/empirical_transition_matrix.py`
Credit risk modeling requires empirical transition matrix estimation from continuous-time duration data where observation times vary across entities.
## `KUC-111`
**Source**: `examples/python/estimate_matrix.py`
Complete workflow for estimating credit rating transition matrices from historical data using multiple estimation approaches with generator extraction.
## `KUC-112`
**Source**: `examples/python/fix_multiperiod_matrix.py`
Historical credit migration matrices may have structural issues (non-square, missing states, negative probabilities) that must be corrected before use in risk models.
## `KUC-113`
**Source**: `examples/python/generate_full_multiperiod_set.py`
Risk models require complete transition matrices across all time horizons; sparse historical observations must be expanded using matrix exponentiation.
## `KUC-114`
**Source**: `examples/python/generate_synthetic_data.py`
Development and testing of transition matrix estimators requires synthetic data with known properties for validation and benchmarking.
## `KUC-115`
**Source**: `examples/python/generate_visuals.py`
Stakeholders require visual representations of credit migration patterns including Sankey diagrams, heatmaps, and step plots for reporting and presentations.
## `KUC-116`
**Source**: `examples/python/matrix_from_cohort_data.py`
Credit rating agencies publish migration data in cohort format; estimation from this data format requires cohort-based transition matrix estimation.
## `KUC-117`
**Source**: `examples/python/matrix_from_duration_data.py`
Individual credit observations with varying timestamps require duration-based transition matrix estimation using time-to-event methodology.
## `KUC-118`
**Source**: `examples/python/matrix_lendingclub.py`
Peer-to-peer lending platforms like LendingClub use their own grade states, so estimating transition matrices from their loan performance data requires a specialized approach.
## `KUC-119`
**Source**: `examples/python/matrix_operations.py`
Transition matrices require various mathematical operations including power, validation, printing, and generator extraction for risk calculations.
## `KUC-120`
**Source**: `examples/python/matrix_set_lendingclub.py`
P2P lending risk models require transition matrix sets across multiple periods to capture evolving loan portfolio behavior over time.
## `KUC-121`
**Source**: `examples/python/matrix_set_operations.py`
Multi-period risk models require operations on collections of transition matrices including copying, power-based cumulation, and validation.
## `KUC-122`
**Source**: `examples/python/state_space_operations.py`
Different credit rating agencies use different rating scales; users need to convert between S&P, Moody's, DBRS and other rating systems for portfolio analysis.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-CREDIT-RISK-001` — Strict input DataFrame schema validation
**From**: finance-bp-050--skorecard, finance-bp-112--openLGD · **Applicable to**: credit-risk
Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation stages where downstream modules access columns by name without defensive checking. Always validate column existence before pipeline execution.
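A minimal sketch of such a check, written against plain column names so it does not depend on either library. The helper name is hypothetical; with pandas, `df.columns` could be passed as the first argument.

```python
def validate_columns(actual_columns, expected_columns):
    """Raise ValueError unless the two column sets match exactly."""
    actual, expected = set(actual_columns), set(expected_columns)
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    if missing or unexpected:
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={unexpected}")


# Passes silently when the schema matches.
validate_columns(["X", "Y"], ["X", "Y"])
```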
## `CW-CREDIT-RISK-002` — Explicit random_state for ML model reproducibility
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.
## `CW-CREDIT-RISK-003` — Mandatory data sorting before multi-stage estimation
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce processing order in multi-stage pipelines.
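The ordering requirement for event data can be sketched as a single sort on a composite key. The record layout (`id`, `time`, `state`) is an assumption made for this example, not the library's required schema.

```python
from operator import itemgetter

# Hypothetical event records: one row per observed rating for an entity.
events = [
    {"id": 2, "time": 1.0, "state": "B"},
    {"id": 1, "time": 2.5, "state": "A"},
    {"id": 2, "time": 0.5, "state": "A"},
    {"id": 1, "time": 0.0, "state": "B"},
]

# Sort by entity ID first, then by ascending time within each entity.
sorted_events = sorted(events, key=itemgetter("id", "time"))
```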
## `CW-CREDIT-RISK-004` — Consistent API response key naming across all endpoints
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define a schema contract upfront and enforce key naming consistency across all response types.
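One way to enforce such a contract is a shared check that every endpoint runs before returning. This is a sketch under the assumption that the contract consists of exactly the two keys named above; the helper name is hypothetical.

```python
REQUIRED_KEYS = frozenset({"intercept", "coefficient"})


def check_response(payload):
    """Return payload unchanged, or raise KeyError if a contract key is missing."""
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise KeyError(f"response missing keys: {sorted(missing)}")
    return payload


start_response = check_response({"intercept": 0.1, "coefficient": [0.5, -0.2]})
```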
## `CW-CREDIT-RISK-005` — Cardinality bounds checking before array operations
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations. Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array initialization.
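Both checks can be sketched as small guard functions that run before any array is allocated. The function names are hypothetical, and the errors raised here are plain `ValueError`s rather than the libraries' own exception types.

```python
MAX_UNIQUE = 100  # limit quoted above for skorecard's bucketers


def check_unique_values(values, max_unique=MAX_UNIQUE):
    """Return the number of distinct values, or raise if it exceeds the limit."""
    n_unique = len(set(values))
    if n_unique > max_unique:
        raise ValueError(f"{n_unique} unique values exceed the limit of {max_unique}")
    return n_unique


def check_state_cardinality(states, matrix):
    """Raise unless the matrix is square with one row and column per state."""
    n = len(states)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError(f"matrix shape does not match {n} states")


check_state_cardinality(["A", "B"], [[0.9, 0.1], [0.2, 0.8]])
```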
## `CW-CREDIT-RISK-006` — Financial validation gates before transaction execution
**From**: finance-bp-072--lending · **Applicable to**: credit-risk
Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.
## `CW-CREDIT-RISK-007` — Mathematical constraint validation for probability outputs
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Credit risk models must validate mathematical constraints on outputs: skorecard's WoE requires valid bin assignments, transitionMatrix's transition matrices require each row to sum to 1.0, and generator matrices require each row to sum to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning results.
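The two matrix constraints can be sketched as tolerance-based predicates. The function names and the default tolerance are assumptions for this example; the non-negativity check on off-diagonal generator entries is the standard definition rather than something stated in the text above.

```python
import math


def is_transition_matrix(matrix, tol=1e-9):
    """True if every entry lies in [0, 1] and every row sums to 1."""
    return all(
        all(-tol <= p <= 1 + tol for p in row)
        and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in matrix
    )


def is_generator_matrix(matrix, tol=1e-9):
    """True if off-diagonal rates are non-negative and every row sums to 0."""
    return all(
        all(q >= -tol for j, q in enumerate(row) if j != i)
        and math.isclose(sum(row), 0.0, abs_tol=tol)
        for i, row in enumerate(matrix)
    )
```

A tolerance is used instead of exact equality because row sums computed in floating point rarely land on exactly 1.0 or 0.0.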
## `CW-CREDIT-RISK-008` — Port-to-ID mapping consistency in distributed model serving
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port 5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
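A deterministic mapping can live in one shared helper so every component derives the ID the same way. The base port of 5000 is an assumption inferred from the single example above (5001 maps to 1); the helper name is hypothetical.

```python
BASE_PORT = 5000  # assumption: inferred from the example "port 5001 maps to server ID 1"


def server_id_from_port(port, base_port=BASE_PORT):
    """Derive the server ID from its port, rejecting ports at or below the base."""
    server_id = port - base_port
    if server_id < 1:
        raise ValueError(f"port {port} does not map to a valid server ID")
    return server_id
```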
FILE:references/components/credit_curve_analysis.md
# credit_curve_analysis (3 classes)
## `CreditCurve.default_curves`
`credit_curve_analysis/creditcurve-default-curves.py:0`
## `CreditCurve.validate`
`credit_curve_analysis/creditcurve-validate.py:0`
## `absorbing_state_detection`
`credit_curve_analysis/absorbing-state-detection.py:0`
FILE:references/components/data_preprocessing.md
# data_preprocessing (5 classes)
## `to_canonical`
`data_preprocessing/to-canonical.py:0`
## `to_compact`
`data_preprocessing/to-compact.py:0`
## `bin_timestamps`
`data_preprocessing/bin-timestamps.py:0`
## `generate_cohort_bounds`
`data_preprocessing/generate-cohort-bounds.py:0`
## `cohort_assignment`
`data_preprocessing/cohort-assignment.py:0`
FILE:references/components/matrix_estimation.md
# matrix_estimation (5 classes)
## `BaseEstimator.fit`
`matrix_estimation/baseestimator-fit.py:0`
## `CohortEstimator.fit`
`matrix_estimation/cohortestimator-fit.py:0`
## `AalenJohansenEstimator.fit`
`matrix_estimation/aalenjohansenestimator-fit.py:0`
## `confidence_interval_method`
`matrix_estimation/confidence-interval-method.py:0`
## `estimator_type`
`matrix_estimation/estimator-type.py:0`
FILE:references/components/matrix_operations.md
# matrix_operations (7 classes)
## `TransitionMatrix.generator`
`matrix_operations/transitionmatrix-generator.py:0`
## `TransitionMatrix.power`
`matrix_operations/transitionmatrix-power.py:0`
## `TransitionMatrixSet.cumulate`
`matrix_operations/transitionmatrixset-cumulate.py:0`
## `TransitionMatrixSet.incremental`
`matrix_operations/transitionmatrixset-incremental.py:0`
## `TransitionMatrix.remove`
`matrix_operations/transitionmatrix-remove.py:0`
## `TransitionMatrixSet.default_curves`
`matrix_operations/transitionmatrixset-default-curves.py:0`
## `generator_fix_negative`
`matrix_operations/generator-fix-negative.py:0`
FILE:references/components/matrix_representation.md
# matrix_representation (6 classes)
## `TransitionMatrix.validate`
`matrix_representation/transitionmatrix-validate.py:0`
## `TransitionMatrix.fix_rowsums`
`matrix_representation/transitionmatrix-fix-rowsums.py:0`
## `TransitionMatrix.characterize`
`matrix_representation/transitionmatrix-characterize.py:0`
## `TransitionMatrixSet.to_json`
`matrix_representation/transitionmatrixset-to-json.py:0`
## `validation_accuracy`
`matrix_representation/validation-accuracy.py:0`
## `matrix_set_method`
`matrix_representation/matrix-set-method.py:0`
FILE:references/components/state_space_definition.md
# state_space_definition (2 classes)
## `StateSpace.get_states`
`state_space_definition/statespace-get-states.py:0`
## `state_inference_strategy`
`state_space_definition/state-inference-strategy.py:0`
FILE:references/components/synthetic_data_generation.md
# synthetic_data_generation (4 classes)
## `exponential_transitions`
`synthetic_data_generation/exponential-transitions.py:0`
## `markov_chain`
`synthetic_data_generation/markov-chain.py:0`
## `long_format`
`synthetic_data_generation/long-format.py:0`
## `time_distribution`
`synthetic_data_generation/time-distribution.py:0`
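Automatically generates optimal bin boundaries for scorecard variables using a range of algorithms, including supervised learning, decision trees, and clustering, with support for monotonicity constraints and missing-value handling.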
---
name: credit-scorecard
description: |-
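Automatically generates optimal bin boundaries for scorecard variables using a range of algorithms, including supervised learning, decision trees, and clustering, with support for monotonicity constraints and missing-value handling.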
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-050"
compiled_at: "2026-04-22T13:00:17.518473+00:00"
capability_markets: "global"
capability_activities: "credit-risk"
sop_version: "crystal-compilation-v6.1"
---
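# Credit Scorecard (credit-scorecard)
> Automatically generates optimal bin boundaries for scorecard variables using a range of algorithms, including supervised learning, decision trees, and clustering, with support for monotonicity constraints and missing-value handling.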
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (43 total)
### Optimal Supervised Bucketing (`UC-1`)
Automatically find optimal bucket boundaries that maximize predictive power while respecting monotonicity constraints
**Triggers**: optimal, supervised, monotonic
### Decision Tree Supervised Bucketing (`UC-2`)
Use supervised learning to find bucket boundaries based on target variable correlation
**Triggers**: decision tree, supervised, pre-bin
### Equal Width Unsupervised Bucketing (`UC-3`)
Divide numerical features into N equally spaced intervals regardless of data distribution
**Triggers**: equal width, unsupervised, histogram
For all **43** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-CREDIT-RISK-001`**: Empty DataFrame passed to bucketing pipeline
- **`AP-CREDIT-RISK-002`**: Multi-dimensional target array causing WoE shape mismatch
- **`AP-CREDIT-RISK-003`**: OptimalBucketer receiving high-cardinality numerical features
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-050. Evidence verify ratio = 78.6% and audit fail total = 24. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative record (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons learned | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-050` blueprint at 2026-04-22T13:00:17.518473+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Equal Width Unsupervised Bucketing', 'Decision Tree Supervised Bucketing', 'Optimal Supervised Bucketing', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-050--skorecard (5)
### `AP-CREDIT-RISK-001` — Empty DataFrame passed to bucketing pipeline <sub>(high)</sub>
When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
### `AP-CREDIT-RISK-002` — Multi-dimensional target array causing WoE shape mismatch <sub>(high)</sub>
When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation, downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with incorrect credit risk scores that misrepresent default probability estimates.
### `AP-CREDIT-RISK-003` — OptimalBucketer receiving high-cardinality numerical features <sub>(high)</sub>
When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
### `AP-CREDIT-RISK-004` — Special values distorting optimal bin boundaries <sub>(high)</sub>
When implementing fit() for bucketers without filtering special values from X before computing bin boundaries using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence calculations and unreliable credit risk scores that misrepresent borrower default probabilities.
### `AP-CREDIT-RISK-005` — Two-phase bucketing ordering violation causing special value loss <sub>(high)</sub>
When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline, special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials() after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.
## finance-bp-072--lending (3)
### `AP-CREDIT-RISK-006` — Loan amount exceeding product and collateral limits <sub>(high)</sub>
When loan applications are validated without enforcing that loan_amount does not exceed the maximum_loan_amount set by the loan product or the value of the proposed securities, amounts beyond product or collateral limits can be disbursed, exposing the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized lending.
### `AP-CREDIT-RISK-007` — Disbursement validation failures creating unauthorized exposure <sub>(high)</sub>
When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory compliance violations.
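The four checks listed above can be sketched as a single gate that returns every violated rule. The `CreditLimit` fields and the function name are hypothetical and do not mirror the lending project's actual document schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CreditLimit:
    """Hypothetical container for the limits a disbursement is checked against."""
    loan_limit: float
    security_value: float
    available_limit: float
    valid_from: date
    valid_to: date


def disbursement_errors(amount, posting_date, limit):
    """Return a list of violated rules; an empty list means the disbursement may proceed."""
    errors = []
    if amount > limit.loan_limit:
        errors.append("exceeds loan limit")
    if amount > limit.security_value:
        errors.append("exceeds assigned security value")
    if amount > limit.available_limit:
        errors.append("exceeds available limit amount")
    if not limit.valid_from <= posting_date <= limit.valid_to:
        errors.append("outside limit applicability dates")
    return errors
```

Returning all violations at once, rather than stopping at the first, makes it easier to report the full reason a disbursement was blocked.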
### `AP-CREDIT-RISK-008` — Interest accrual on written-off loans inflating income <sub>(high)</sub>
When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.
## finance-bp-112--openLGD (2)
### `AP-CREDIT-RISK-009` — Loop index errors in federated parameter averaging <sub>(high)</sub>
When implementing federated parameter averaging logic, using the final index n instead of the loop variable k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates across all nodes.
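A sketch of weighted averaging that sidesteps both index bugs by iterating over parameter and weight pairs directly, so no server is skipped and no index is reused. The function name and the scalar parameters are assumptions for illustration; this is not openLGD's implementation.

```python
def weighted_average(params, weights):
    """Weighted mean of per-server parameters; every server contributes."""
    if not params or len(params) != len(weights):
        raise ValueError("params and weights must be non-empty and the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # zip pairs each server's parameter with its own weight, starting at the first server.
    return sum(w * p for p, w in zip(params, weights)) / total_weight
```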
### `AP-CREDIT-RISK-010` — API response format inconsistency breaking federated coordination <sub>(high)</sub>
When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and 'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
## finance-bp-119--transitionMatrix (4)
### `AP-CREDIT-RISK-011` — Invalid transition probabilities corrupting Markov matrices <sub>(high)</sub>
When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0, 1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable credit risk estimates.
### `AP-CREDIT-RISK-012` — Unsorted event data causing incorrect transition matrix estimates <sub>(high)</sub>
When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate the Markov property.
### `AP-CREDIT-RISK-013` — Zero-count division causing NaN in transition matrices <sub>(high)</sub>
When counts are normalized into transition probabilities without first checking that the source state's population count is greater than zero, a division by zero occurs and produces NaN values in the transition matrix. These NaN values corrupt all downstream matrix operations, including generator matrix computation and credit curve generation.
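A minimal sketch of guarded normalization. Treating a state with no observations as absorbing (an identity row) is a modeling assumption chosen for this example, not the library's documented behavior; other fallbacks are possible.

```python
def normalize_counts(counts):
    """Turn a square matrix of transition counts into row-stochastic probabilities."""
    probabilities = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total > 0:
            probabilities.append([c / total for c in row])
        else:
            # Assumption: an unobserved source state is treated as absorbing.
            probabilities.append([1.0 if j == i else 0.0 for j in range(len(row))])
    return probabilities
```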
### `AP-CREDIT-RISK-014` — Wrong matrix logarithm method producing invalid generator matrices <sub>(medium)</sub>
When the generator() method computes the matrix logarithm with anything other than scipy.linalg.logm, for example numpy.log (which takes the logarithm element-wise) or another approximation, it produces invalid generator matrices whose row sums are not zero. This violates the mathematical definition of an infinitesimal generator and leads to incorrect continuous-time Markov chain modeling.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-050--skorecard
**Scan date**: 2026-04-22
**Stats**: {'total_files': 9, 'total_classes': 18, 'total_functions': 0, 'total_stages': 9}
## Modules (9)
- [data_preparation](components/data_preparation.md): 2 classes
- [feature_pre-bucketing_(optional)](components/feature_pre-bucketing_-optional.md): 2 classes
- [feature_bucketing_/_binning](components/feature_bucketing_-_binning.md): 2 classes
- [weight_of_evidence_(woe)_encoding](components/weight_of_evidence_-woe-_encoding.md): 2 classes
- [feature_selection](components/feature_selection.md): 2 classes
- [logistic_regression_model_training](components/logistic_regression_model_training.md): 2 classes
- [scorecard_rescaling](components/scorecard_rescaling.md): 2 classes
- [validation_and_reporting](components/validation_and_reporting.md): 2 classes
- [model_deployment](components/model_deployment.md): 2 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 124
fatal_constraints_count: 64
non_fatal_constraints_count: 194
use_cases_count: 43
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
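The exactly-one rule can be sketched as a stand-alone check. The function below is hypothetical and only borrows the three field names from the lock; it is not ZVT's `TradingSignal` class.

```python
def single_sizing_field(position_pct=None, order_money=None, order_amount=None):
    """Return the name of the one sizing field that was set, or raise ValueError."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = [name for name, value in fields.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(f"expected exactly one sizing field, got {provided}")
    return provided[0]
```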
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **43**
## `KUC-1`
**Source**: `skorecard/bucketers/bucketers.py`
Automatically find optimal bucket boundaries that maximize predictive power while respecting monotonicity constraints
## `KUC-2`
**Source**: `skorecard/bucketers/bucketers.py`
Use supervised learning to find bucket boundaries based on target variable correlation
## `KUC-3`
**Source**: `skorecard/bucketers/bucketers.py`
Divide numerical features into N equally spaced intervals regardless of data distribution
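The idea can be sketched without skorecard: compute the interior boundaries from the observed range, then assign each value by binary search. The function names and the boundary convention (a value equal to a boundary falls into the upper bucket) are assumptions for this example and may differ from the library's bucketer.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior boundaries that split [min, max] evenly."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def assign_bucket(x, edges):
    """Return the zero-based bucket index of x for the given interior boundaries."""
    return bisect_right(edges, x)


edges = equal_width_edges([0, 3, 7, 10], n_bins=5)
buckets = [assign_bucket(x, edges) for x in [0, 3, 7, 10]]
```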
## `KUC-4`
**Source**: `skorecard/bucketers/bucketers.py`
Divide numerical features into N buckets with equal number of observations (quantiles)
## `KUC-5`
**Source**: `skorecard/bucketers/bucketers.py`
Use agglomerative clustering to find natural groupings in numerical data
## `KUC-6`
**Source**: `skorecard/bucketers/bucketers.py`
Convert categorical variables into ordered ordinal numbers based on target rate or frequency
## `KUC-7`
**Source**: `skorecard/bucketers/bucketers.py`
Treat existing unique categories as pre-defined buckets without transformation
## `KUC-8`
**Source**: `skorecard/bucketers/bucketers.py`
Treat existing unique numerical values as bucket boundaries (for pre-bucketed data)
## `KUC-9`
**Source**: `skorecard/bucketers/bucketers.py`
Apply manually defined bucket boundaries from YAML or dictionary to new data
## `KUC-10`
**Source**: `skorecard/pipeline/bucketing_process.py`
First pre-bucket high-cardinality features, then apply final bucketing strategy
## `KUC-11`
**Source**: `skorecard/bucketers/base_bucketer.py`
Handle missing values by assigning them to specific buckets or treating separately
## `KUC-12`
**Source**: `skorecard/bucketers/base_bucketer.py`
Assign specific outlier or important values to their own dedicated buckets
## `KUC-13`
**Source**: `skorecard/bucketers/base_bucketer.py`
Visually explore and manually adjust bucket boundaries using Dash web app
## `KUC-14`
**Source**: `skorecard/preprocessing/_WoEEncoder.py`
Transform bucket IDs into Weight of Evidence values for logistic regression
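As a rough sketch of the transformation, the block below computes one WoE value per bucket from event and non-event counts. It assumes the convention WoE = ln(share of non-events / share of events); some implementations use the opposite sign, and skorecard's encoder may handle empty buckets differently.

```python
import math


def woe_per_bucket(events, non_events):
    """Return one Weight of Evidence value per bucket from raw counts."""
    total_events, total_non_events = sum(events), sum(non_events)
    woe = []
    for e, ne in zip(events, non_events):
        if e == 0 or ne == 0:
            raise ValueError("bucket with zero events or non-events; merge or smooth it first")
        woe.append(math.log((ne / total_non_events) / (e / total_events)))
    return woe


# Hypothetical counts for two buckets.
woe_values = woe_per_bucket(events=[10, 30], non_events=[90, 70])
```

Under this convention a positive value marks a bucket with relatively fewer events than the population as a whole.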
## `KUC-15`
**Source**: `skorecard/metrics/metrics.py`
Measure the predictive power of individual features for credit risk
## `KUC-16`
**Source**: `skorecard/reporting/report.py`
Monitor distribution drift between training and production data
## `KUC-17`
**Source**: `skorecard/linear_model/linear_model.py`
Build logistic regression model with statistical significance for credit scoring
## `KUC-18`
**Source**: `skorecard/skorecard.py`
Build complete credit scoring scorecard in one step
## `KUC-19`
**Source**: `skorecard/rescale/rescale.py`
Convert model probabilities to traditional scorecard scale (e.g., 300-850)
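One common way to do this is points-to-double-odds (PDO) scaling, sketched below. The reference score, reference odds, and PDO defaults are hypothetical, and the function is not skorecard's rescale API; it only illustrates the log-odds mapping.

```python
import math


def probability_to_score(p_default, ref_score=600.0, ref_odds=50.0, pdo=20.0):
    """Map a default probability to a score using points-to-double-odds scaling."""
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must lie strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = ref_score - factor * math.log(ref_odds)
    odds = (1.0 - p_default) / p_default  # good:bad odds
    return offset + factor * math.log(odds)
```

With these defaults, good:bad odds of 50:1 map to a score of 600, and every doubling of the odds adds 20 points.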
## `KUC-20`
**Source**: `skorecard/reporting/report.py`
Generate detailed bucket tables with event rate, WoE, IV for documentation
## `KUC-21`
**Source**: `skorecard/reporting/plotting.py`
Visualize bucket distributions with event rate or WoE trends
## `KUC-22`
**Source**: `skorecard/bucket_mapping.py`
Export bucket mappings to YAML for production deployment
## `KUC-23`
**Source**: `skorecard/pipeline/pipeline.py`
Integrate skorecard bucketers into existing scikit-learn ML pipelines
## `KUC-24`
**Source**: `docs/tutorials/2_feature_selection.ipynb`
Select most predictive and stable features using IV and PSI metrics
## `KUC-25`
**Source**: `skorecard/utils/validation.py`
Detect suppressor effects and multicollinearity between features
## `KUC-26`
**Source**: `docs/tutorials/categoricals.ipynb`
Handle categorical variables with many categories in credit scoring
## `KUC-27`
**Source**: `docs/discussion/benchmark_with_EBM.ipynb`
Compare skorecard performance against Explainable Boosting Machines
## `KUC-101`
**Source**: `docs/discussion/benchmark_stats_feature.ipynb`
Compare performance of different machine learning classifiers on credit card default prediction using AUC metrics.
## `KUC-103`
**Source**: `docs/discussion/benchmarks.ipynb`
Run comprehensive benchmarks comparing multiple classifiers on credit card data with timing analysis.
## `KUC-104`
**Source**: `docs/howto/Optimizations.ipynb`
Find optimal bucketing parameters (max_n_bins, min_bin_size) using grid search with Information Value scoring.
## `KUC-105`
**Source**: `docs/howto/mix_with_other_packages.ipynb`
Combine skorecard bucketing with external packages like category_encoders and sklearn transformers in a pipeline.
## `KUC-106`
**Source**: `docs/howto/psi_and_iv.ipynb`
Calculate Population Stability Index (PSI) and Information Value (IV) to validate feature stability and predictive power.
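Both metrics share the same shape: a sum over buckets of the difference between two shares times the log of their ratio. The sketch below uses the textbook formulas with hypothetical function names; it is not skorecard's implementation, and it rejects empty buckets instead of smoothing them.

```python
import math


def _divergence(p, q):
    """Sum over buckets of (q_i - p_i) * ln(q_i / p_i)."""
    if len(p) != len(q):
        raise ValueError("both distributions need the same number of buckets")
    if any(x <= 0 for x in (*p, *q)):
        raise ValueError("every bucket needs a positive share; merge or smooth empty buckets")
    return sum((b - a) * math.log(b / a) for a, b in zip(p, q))


def psi(expected_share, actual_share):
    """Population Stability Index between a reference and a current distribution."""
    return _divergence(expected_share, actual_share)


def information_value(event_share, non_event_share):
    """Information Value from per-bucket shares of events and non-events."""
    return _divergence(event_share, non_event_share)
```

Both values are zero when the two distributions are identical and grow as they diverge.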
## `KUC-107`
**Source**: `docs/howto/save_buckets_to_file.ipynb`
Persist bucketer configurations to YAML files for reuse and deployment across environments.
## `KUC-108`
**Source**: `docs/howto/using_manually_defined_buckets.ipynb`
Define custom bucket boundaries manually for specific business requirements without automatic binning.
## `KUC-109`
**Source**: `docs/tutorials/1_bucketing.ipynb`
Learn fundamental bucketing concepts for credit card data including categorical and numerical feature handling.
## `KUC-111`
**Source**: `docs/tutorials/3_skorecard_model.ipynb`
Build an end-to-end scorecard model combining bucketing with logistic regression for credit scoring.
## `KUC-113`
**Source**: `docs/tutorials/interactive_bucketing.ipynb`
Learn interactive bucketing approach for manual adjustment of bin boundaries in a pipeline.
## `KUC-114`
**Source**: `docs/tutorials/methods.ipynb`
Explore bucketer methods including summary statistics, bucket tables, plots, and YAML export.
## `KUC-115`
**Source**: `docs/tutorials/missing_values.ipynb`
Handle missing values in bucketing with various treatment strategies like neutral, similar, least_risky.
## `KUC-116`
**Source**: `docs/tutorials/reporting.ipynb`
Generate reports and visualizations for scorecard models including bucket tables and weight plots.
## `KUC-117`
**Source**: `docs/tutorials/specials.ipynb`
Define special values and ranges in bucketing that require separate treatment from regular bins.
## `KUC-118`
**Source**: `docs/tutorials/the_basics.ipynb`
Introduction to basic bucketing operations including DecisionTree and EqualWidth bucketers.
## `KUC-119`
**Source**: `docs/tutorials/using-bucketing-process.ipynb`
Learn the BucketingProcess workflow with pre-bucketing and bucketing stages for complex credit scoring.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-CREDIT-RISK-001` — Strict input DataFrame schema validation
**From**: finance-bp-050--skorecard, finance-bp-112--openLGD · **Applicable to**: credit-risk
Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation stages where downstream modules access columns by name without defensive checking. Always validate column existence before pipeline execution.
## `CW-CREDIT-RISK-002` — Explicit random_state for ML model reproducibility
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.
## `CW-CREDIT-RISK-003` — Mandatory data sorting before multi-stage estimation
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce processing order in multi-stage pipelines.
## `CW-CREDIT-RISK-004` — Consistent API response key naming across all endpoints
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define a schema contract upfront and enforce key naming consistency across all response types.
## `CW-CREDIT-RISK-005` — Cardinality bounds checking before array operations
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations. Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array initialization.
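A cardinality guard of the kind described above might look like the sketch below. The limit of 100 comes from the text; `check_cardinality` is a hypothetical helper and it raises a plain `ValueError` rather than skorecard's own exception type.

```python
MAX_UNIQUE_VALUES = 100  # limit quoted in the text for skorecard's bucketers


def check_cardinality(values, limit=MAX_UNIQUE_VALUES):
    """Return the number of unique values, or raise if it exceeds `limit`."""
    n_unique = len(set(values))
    if n_unique > limit:
        raise ValueError(
            f"{n_unique} unique values found; pre-bucket to at most {limit} first"
        )
    return n_unique


n = check_cardinality([1, 1, 2, 3, 3, 3])
```

Running the check before any array is allocated keeps the failure close to its cause instead of surfacing later as an indexing error.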
## `CW-CREDIT-RISK-006` — Financial validation gates before transaction execution
**From**: finance-bp-072--lending · **Applicable to**: credit-risk
Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.
## `CW-CREDIT-RISK-007` — Mathematical constraint validation for probability outputs
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Credit risk models must validate mathematical constraints on outputs: skorecard's WoE requires valid bin assignments, transitionMatrix's transition matrices must have row sums equal to 1.0, and generator matrices must have row sums equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning results.
## `CW-CREDIT-RISK-008` — Port-to-ID mapping consistency in distributed model serving
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port 5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
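One way to keep the port-to-ID computation consistent is to define it once and import it everywhere. The sketch below assumes a base port of 5000, chosen only because it reproduces the example in the text (port 5001 maps to server ID 1); the function names are hypothetical.

```python
BASE_PORT = 5000  # assumption: chosen so that port 5001 maps to server ID 1


def server_id_from_port(port: int) -> int:
    """Derive the server ID from its port; reject ports at or below the base."""
    server_id = port - BASE_PORT
    if server_id < 1:
        raise ValueError(f"port {port} does not map to a valid server ID")
    return server_id


def port_from_server_id(server_id: int) -> int:
    """Inverse mapping, so both directions share one definition."""
    if server_id < 1:
        raise ValueError(f"invalid server ID {server_id}")
    return BASE_PORT + server_id
```

A deployment check can then assert that `server_id_from_port(port_from_server_id(i)) == i` for every configured server.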
FILE:references/components/data_preparation.md
# data_preparation (2 classes)
## `ensure_dataframe`
`data_preparation/ensure-dataframe.py:0`
## `slot`
`data_preparation/slot.py:0`
FILE:references/components/feature_bucketing_-_binning.md
# feature_bucketing_/_binning (2 classes)
## `BaseBucketer.fit`
`feature_bucketing_/_binning/basebucketer-fit.py:0`
## `slot`
`feature_bucketing_/_binning/slot.py:0`
FILE:references/components/feature_pre-bucketing_-optional.md
# feature_pre-bucketing_(optional) (2 classes)
## `BaseBucketer.fit`
`feature_pre-bucketing_(optional)/basebucketer-fit.py:0`
## `slot`
`feature_pre-bucketing_(optional)/slot.py:0`
FILE:references/components/feature_selection.md
# feature_selection (2 classes)
## `iv`
`feature_selection/iv.py:0`
## `slot`
`feature_selection/slot.py:0`
FILE:references/components/logistic_regression_model_training.md
# logistic_regression_model_training (2 classes)
## `LogisticRegression.fit`
`logistic_regression_model_training/logisticregression-fit.py:0`
## `slot`
`logistic_regression_model_training/slot.py:0`
FILE:references/components/model_deployment.md
# model_deployment (2 classes)
## `FeaturesBucketMapping.save_yml`
`model_deployment/featuresbucketmapping-save-yml.py:0`
## `slot`
`model_deployment/slot.py:0`
FILE:references/components/scorecard_rescaling.md
# scorecard_rescaling (2 classes)
## `ScoreCardPoints.fit`
`scorecard_rescaling/scorecardpoints-fit.py:0`
## `slot`
`scorecard_rescaling/slot.py:0`
FILE:references/components/validation_and_reporting.md
# validation_and_reporting (2 classes)
## `SkorecardPipeline.bucket_table`
`validation_and_reporting/skorecardpipeline-bucket-table.py:0`
## `slot`
`validation_and_reporting/slot.py:0`
FILE:references/components/weight_of_evidence_-woe-_encoding.md
# weight_of_evidence_(woe)_encoding (2 classes)
## `WoeEncoder.fit`
`weight_of_evidence_(woe)_encoding/woeencoder-fit.py:0`
## `slot`
`weight_of_evidence_(woe)_encoding/slot.py:0`
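Build and train LGD (loss given default) machine-learning models for quantitative credit-risk assessment and prediction based on historical default data.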
---
name: credit-lgd-model
description: |-
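Build and train LGD (loss given default) machine-learning models for quantitative credit-risk assessment and prediction based on historical default data.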
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-112"
compiled_at: "2026-04-22T13:00:54.441302+00:00"
capability_markets: "global"
capability_activities: "credit-risk"
sop_version: "crystal-compilation-v6.1"
---
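# Credit Loss Given Default Model (credit-lgd-model)
> Build and train LGD (loss given default) machine-learning models for quantitative credit-risk assessment and prediction based on historical default data.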
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (1 total)
### Sphinx Documentation Configuration (`UC-101`)
This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata, version information, and path configuratio
**Triggers**: documentation, sphinx, configuration
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-CREDIT-RISK-001`**: Empty DataFrame passed to bucketing pipeline
- **`AP-CREDIT-RISK-002`**: Multi-dimensional target array causing WoE shape mismatch
- **`AP-CREDIT-RISK-003`**: OptimalBucketer receiving high-cardinality numerical features
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-112. Evidence verify ratio = 21.0% and audit fail total = 23. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative source (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When a complete example is needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up an API |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-112` blueprint at 2026-04-22T13:00:54.441302+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Sphinx Documentation Configuration', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder', 'Institutional fund holdings tracker via joinquant_fund_runner pattern', 'Custom Transformer + Accumulator factor with per-entity rolling state']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-050--skorecard (5)
### `AP-CREDIT-RISK-001` — Empty DataFrame passed to bucketing pipeline <sub>(high)</sub>
When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
### `AP-CREDIT-RISK-002` — Multi-dimensional target array causing WoE shape mismatch <sub>(high)</sub>
When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation, downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with incorrect credit risk scores that misrepresent default probability estimates.
### `AP-CREDIT-RISK-003` — OptimalBucketer receiving high-cardinality numerical features <sub>(high)</sub>
When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
### `AP-CREDIT-RISK-004` — Special values distorting optimal bin boundaries <sub>(high)</sub>
When implementing fit() for bucketers without filtering special values from X before computing bin boundaries using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence calculations and unreliable credit risk scores that misrepresent borrower default probabilities.
### `AP-CREDIT-RISK-005` — Two-phase bucketing ordering violation causing special value loss <sub>(high)</sub>
When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline, special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials() after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.
## finance-bp-072--lending (3)
### `AP-CREDIT-RISK-006` — Loan amount exceeding product and collateral limits <sub>(high)</sub>
When validating loan amount for loan applications without enforcing loan_amount does not exceed maximum_loan_amount from loan product or proposed securities, disbursing amounts exceeding product or collateral limits exposes the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized lending.
### `AP-CREDIT-RISK-007` — Disbursement validation failures creating unauthorized exposure <sub>(high)</sub>
When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory compliance violations.
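The four checks listed above (loan limit, assigned security value, available limit, and limit applicability dates) can be collected into a single gate. This is a sketch under assumed names: `LoanLimits` and `disbursement_errors` are hypothetical and do not mirror the lending project's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LoanLimits:
    """Hypothetical container for the limits named in AP-CREDIT-RISK-007."""
    loan_limit: float
    security_value: float
    available_limit: float
    limit_start: date
    limit_end: date


def disbursement_errors(amount: float, posting_date: date, limits: LoanLimits) -> list[str]:
    """Return every failed check; an empty list means the disbursement may proceed."""
    errors = []
    if amount <= 0:
        errors.append("amount must be positive")
    if amount > limits.loan_limit:
        errors.append("amount exceeds loan limit")
    if amount > limits.security_value:
        errors.append("amount exceeds assigned security value")
    if amount > limits.available_limit:
        errors.append("amount exceeds available limit")
    if not limits.limit_start <= posting_date <= limits.limit_end:
        errors.append("posting date outside limit applicability period")
    return errors
```

Returning all failures at once, rather than stopping at the first, gives the caller a complete picture before any transaction is posted.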
### `AP-CREDIT-RISK-008` — Interest accrual on written-off loans inflating income <sub>(high)</sub>
When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.
## finance-bp-112--openLGD (2)
### `AP-CREDIT-RISK-009` — Loop index errors in federated parameter averaging <sub>(high)</sub>
When implementing federated parameter averaging logic, using the final index n instead of the loop variable k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates across all nodes.
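Both index mistakes are easy to see next to a correct loop. The sketch below assumes a weighted average of one scalar parameter per server; whether openLGD weights its servers this way is not stated in the text, so treat the weighting scheme and the function name as illustrative.

```python
def average_parameters(params, weights):
    """Weighted average of per-server parameters, covering every server exactly once."""
    if not params or len(params) != len(weights):
        raise ValueError("params and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    averaged = 0.0
    for k in range(len(params)):  # start at 0 so the first server is included
        averaged += weights[k] * params[k]  # index with k, never the final index n
    return averaged / total_weight
```

In idiomatic Python, `zip(params, weights)` removes the index entirely and with it both failure modes.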
### `AP-CREDIT-RISK-010` — API response format inconsistency breaking federated coordination <sub>(high)</sub>
When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and 'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
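A coordinator can fail early with a clear message instead of a bare `KeyError` by checking the contract before reading the payload. The key names `intercept` and `coefficient` come from the text; `parse_model_response` is a hypothetical helper.

```python
REQUIRED_KEYS = ("intercept", "coefficient")  # key names taken from the text


def parse_model_response(payload: dict) -> tuple:
    """Return (intercept, coefficient) or raise a descriptive error."""
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(
            f"response missing keys {missing}; got {sorted(payload)}"
        )
    return payload["intercept"], payload["coefficient"]


intercept, coefficient = parse_model_response({"intercept": 0.1, "coefficient": [0.5]})
```

Using the same `REQUIRED_KEYS` constant on the server side when building responses keeps both endpoints aligned with one definition.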
## finance-bp-119--transitionMatrix (4)
### `AP-CREDIT-RISK-011` — Invalid transition probabilities corrupting Markov matrices <sub>(high)</sub>
When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0, 1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable credit risk estimates.
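The two conditions named above, entries in [0, 1] and rows summing to 1.0, can be checked with the standard library alone. This is a sketch over nested lists; the tolerance value is an assumption and the function is not part of transitionMatrix.

```python
import math


def validate_transition_matrix(matrix, tol=1e-9):
    """Raise ValueError unless `matrix` is a square row-stochastic matrix."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError(f"row {i} has length {len(row)}, expected {n}")
        for p in row:
            # NaN fails this comparison too, so it is rejected here.
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"row {i} contains invalid probability {p}")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {i} sums to {sum(row)}, expected 1.0")


validate_transition_matrix([[0.9, 0.1], [0.2, 0.8]])
```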
### `AP-CREDIT-RISK-012` — Unsorted event data causing incorrect transition matrix estimates <sub>(high)</sub>
When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate the Markov property.
### `AP-CREDIT-RISK-013` — Zero-count division causing NaN in transition matrices <sub>(high)</sub>
When normalizing counts to produce transition probabilities without checking source state population count is greater than zero before division, division by zero occurs and causes NaN values in the transition matrix. These NaN values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
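A guarded normalization might look like the following sketch. How to fill a row with no observations is a modeling decision the text does not settle; here the state is treated as absorbing (probability 1.0 on the diagonal), which is an assumption. Raising an error would be an equally defensible choice.

```python
def normalize_counts(counts):
    """Convert a square matrix of transition counts into row probabilities."""
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            # Assumption: an unobserved source state is treated as absorbing,
            # which avoids 0/0 and keeps the row sum at 1.0.
            matrix.append([1.0 if j == i else 0.0 for j in range(len(row))])
    return matrix


probabilities = normalize_counts([[8, 2], [0, 0]])
```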
### `AP-CREDIT-RISK-014` — Wrong matrix logarithm method producing invalid generator matrices <sub>(medium)</sub>
When implementing generator() method without using scipy.linalg.logm for matrix logarithm computation, using numpy.log or other approximation methods produces invalid generator matrices with row sums not equal to zero. This violates the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-112--openLGD
**Scan date**: 2026-04-22
**Stats**: {'total_files': 5, 'total_classes': 12, 'total_functions': 0, 'total_stages': 5}
## Modules (5)
- [data_acquisition](components/data_acquisition.md): 2 classes
- [model_estimation](components/model_estimation.md): 2 classes
- [model_serving](components/model_serving.md): 3 classes
- [federated_coordination](components/federated_coordination.md): 3 classes
- [standalone_execution](components/standalone_execution.md): 2 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 91
fatal_constraints_count: 31
non_fatal_constraints_count: 99
use_cases_count: 1
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
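As an illustration of how a lock such as `SL-05` could be enforced in code, the sketch below checks that exactly one sizing field is set. The three field names come from the lock text; the function itself is hypothetical and is not the framework's own validation.

```python
def check_exactly_one(position_pct=None, order_money=None, order_amount=None):
    """Return the name of the single field that is set, or raise ValueError."""
    fields = {
        "position_pct": position_pct,
        "order_money": order_money,
        "order_amount": order_amount,
    }
    provided = [name for name, value in fields.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"exactly one sizing field is required, got {provided or 'none'}"
        )
    return provided[0]


chosen = check_exactly_one(position_pct=0.2)
```

Comparing against `None` rather than truthiness matters here, because `0` and `0.0` are legitimate explicit values.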
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **1**
## `KUC-101`
**Source**: `docs/source/conf.py`
This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata, version information, and path configurations needed to generate developer documentation.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-CREDIT-RISK-001` — Strict input DataFrame schema validation
**From**: finance-bp-050--skorecard, finance-bp-112--openLGD · **Applicable to**: credit-risk
Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation stages where downstream modules access columns by name without defensive checking. Always validate column existence before pipeline execution.
## `CW-CREDIT-RISK-002` — Explicit random_state for ML model reproducibility
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due to random data shuffling and weight initialization. This breaks federated learning convergence guarantees. Always set random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.
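The effect of a fixed seed on shuffling can be shown with the standard library alone. This is a plain-Python stand-in, not scikit-learn code: with `SGDRegressor` the analogous step is passing an integer as its `random_state` argument.

```python
import random


def shuffled_indices(n: int, seed: int) -> list:
    """Return a reproducible permutation of range(n) for the given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    order = list(range(n))
    rng.shuffle(order)
    return order


first_run = shuffled_indices(10, seed=42)
second_run = shuffled_indices(10, seed=42)
```

Because each call builds its own generator, two nodes using the same seed visit the data in the same order regardless of what else has consumed random numbers in the process.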
## `CW-CREDIT-RISK-003` — Mandatory data sorting before multi-stage estimation
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce processing order in multi-stage pipelines.
## `CW-CREDIT-RISK-004` — Consistent API response key naming across all endpoints
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define a schema contract upfront and enforce key naming consistency across all response types.
## `CW-CREDIT-RISK-005` — Cardinality bounds checking before array operations
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations. Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array initialization.
## `CW-CREDIT-RISK-006` — Financial validation gates before transaction execution
**From**: finance-bp-072--lending · **Applicable to**: credit-risk
Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.
## `CW-CREDIT-RISK-007` — Mathematical constraint validation for probability outputs
**From**: finance-bp-050--skorecard, finance-bp-119--transitionMatrix · **Applicable to**: credit-risk
Credit risk models must validate mathematical constraints on outputs: skorecard's WoE requires valid bin assignments, transitionMatrix's transition matrices require row sums equals 1.0 and generator matrices require row sums equals 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning results.
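For the generator-matrix half of this rule, a check could be sketched as below. The zero row-sum condition comes from the text; the non-negative off-diagonal condition is the standard additional property of an infinitesimal generator and is included here as such. The tolerance and the function name are assumptions.

```python
import math


def validate_generator_matrix(matrix, tol=1e-9):
    """Raise ValueError unless `matrix` has the basic properties of a generator."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError(f"row {i} has length {len(row)}, expected {n}")
        for j, rate in enumerate(row):
            # Off-diagonal entries are transition rates and cannot be negative.
            if i != j and rate < 0:
                raise ValueError(f"negative off-diagonal rate at ({i}, {j})")
        if not math.isclose(sum(row), 0.0, abs_tol=tol):
            raise ValueError(f"row {i} sums to {sum(row)}, expected 0.0")


validate_generator_matrix([[-0.1, 0.1], [0.0, 0.0]])
```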
## `CW-CREDIT-RISK-008` — Port-to-ID mapping consistency in distributed model serving
**From**: finance-bp-112--openLGD · **Applicable to**: credit-risk
When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port 5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
FILE:references/components/data_acquisition.md
# data_acquisition (2 classes)
## `dataSource`
`data_acquisition/datasource.py:0`
## `Data Transport Layer`
`data_acquisition/data-transport-layer.py:0`
FILE:references/components/federated_coordination.md
# federated_coordination (3 classes)
## `federated_run`
`federated_coordination/federated-run.py:0`
## `Aggregation Algorithm`
`federated_coordination/aggregation-algorithm.py:0`
## `Communication Pattern`
`federated_coordination/communication-pattern.py:0`
FILE:references/components/model_estimation.md
# model_estimation (2 classes)
## `lgdModel`
`model_estimation/lgdmodel.py:0`
## `Regression Algorithm`
`model_estimation/regression-algorithm.py:0`
FILE:references/components/model_serving.md
# model_serving (3 classes)
## `start_calculation`
`model_serving/start-calculation.py:0`
## `update_calculation`
`model_serving/update-calculation.py:0`
## `Server Framework`
`model_serving/server-framework.py:0`
FILE:references/components/standalone_execution.md
# standalone_execution (2 classes)
## `standalone_run`
`standalone_execution/standalone-run.py:0`
## `Execution Mode`
`standalone_execution/execution-mode.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-112-v5.3
version: v6.1
blueprint_id: finance-bp-112
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:54.441302+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- credit-risk
upgraded_from: finance-bp-112-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:30.477210+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-112--openLGD/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-112--openLGD/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-CREDIT-RISK-001
title: Empty DataFrame passed to bucketing pipeline
description: When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate
ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline
from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
project_source: finance-bp-050--skorecard
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-002
title: Multi-dimensional target array causing WoE shape mismatch
description: When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation,
downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with
incorrect credit risk scores that misrepresent default probability estimates.
project_source: finance-bp-050--skorecard
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-003
title: OptimalBucketer receiving high-cardinality numerical features
description: When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique
values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer
fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
project_source: finance-bp-050--skorecard
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-004
title: Special values distorting optimal bin boundaries
description: When implementing fit() for bucketers without filtering special values from X before computing bin boundaries
using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence
calculations and unreliable credit risk scores that misrepresent borrower default probabilities.
project_source: finance-bp-050--skorecard
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-005
title: Two-phase bucketing ordering violation causing special value loss
description: When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline,
special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials()
after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.
project_source: finance-bp-050--skorecard
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-006
title: Loan amount exceeding product and collateral limits
description: When validating loan applications without enforcing that loan_amount does not exceed the maximum_loan_amount
allowed by the loan product or the proposed securities, disbursements can exceed product or collateral limits and expose
the lender to uncollateralized risk. This violates lending policy and creates direct financial loss exposure through unauthorized
lending.
project_source: finance-bp-072--lending
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-007
title: Disbursement validation failures creating unauthorized exposure
description: When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned
security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit
loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory
compliance violations.
project_source: finance-bp-072--lending
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-008
title: Interest accrual on written-off loans inflating income
description: When processing interest accrual for Written Off loans without checking whether posting_date falls on or after
the loan write-off date, interest keeps accruing on non-performing assets and inflates income. This misrepresents loan portfolio value, violates
provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.
project_source: finance-bp-072--lending
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-009
title: Loop index errors in federated parameter averaging
description: When implementing federated parameter averaging logic, using the final index n instead of the loop variable
k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop
index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates
across all nodes.
project_source: finance-bp-112--openLGD
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-010
title: API response format inconsistency breaking federated coordination
description: When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and
'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return
key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
project_source: finance-bp-112--openLGD
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-011
title: Invalid transition probabilities corrupting Markov matrices
description: When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0,
1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic
transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable
credit risk estimates.
project_source: finance-bp-119--transitionMatrix
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-012
title: Unsorted event data causing incorrect transition matrix estimates
description: When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending
time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes
the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate
the Markov property.
project_source: finance-bp-119--transitionMatrix
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-013
title: Zero-count division causing NaN in transition matrices
description: When normalizing counts to produce transition probabilities without checking source state population count
is greater than zero before division, division by zero occurs and causes NaN values in the transition matrix. These NaN
values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
project_source: finance-bp-119--transitionMatrix
severity: high
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-014
title: Wrong matrix logarithm method producing invalid generator matrices
description: When implementing the generator() method without scipy.linalg.logm for the matrix logarithm, substituting
the element-wise numpy.log or another approximation produces invalid generator matrices whose row sums are not zero. This violates
the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
project_source: finance-bp-119--transitionMatrix
severity: medium
applicable_to_tags:
markets:
- global
activities:
- credit-risk
_source_file: anti-patterns/credit-risk.yaml
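The zero-count and invalid-probability pitfalls above (AP-CREDIT-RISK-011 and AP-CREDIT-RISK-013) can be illustrated with a minimal sketch. The helper names `normalize_counts` and `is_stochastic` are hypothetical and are not part of the transitionMatrix API. Treating a state with no observations as absorbing is an assumption made only for this example; the right fallback depends on the modeling context.

```python
from typing import List


def normalize_counts(counts: List[List[float]]) -> List[List[float]]:
    """Turn a square matrix of transition counts into row-wise probabilities."""
    n = len(counts)
    matrix = []
    for i, row in enumerate(counts):
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} entries, expected {n}")
        if any(c < 0 for c in row):
            raise ValueError(f"row {i} contains a negative count")
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            # Assumption for this sketch: a source state with no observations
            # is treated as absorbing, which avoids a division by zero.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
    return matrix


def is_stochastic(matrix: List[List[float]], tol: float = 1e-9) -> bool:
    """Check that every entry lies in [0, 1] and every row sums to 1."""
    return all(
        all(-tol <= p <= 1.0 + tol for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )
```

Calling `is_stochastic` on the result before any downstream step is one way to catch a malformed matrix early, before it reaches generator or credit curve computations.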
cross_project_wisdom:
- wisdom_id: CW-CREDIT-RISK-001
source_project: finance-bp-050--skorecard, finance-bp-112--openLGD
pattern_name: Strict input DataFrame schema validation
description: Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns
(X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation
stages where downstream modules access columns by name without defensive checking. Always validate column existence before
pipeline execution.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-002
source_project: finance-bp-112--openLGD
pattern_name: Explicit random_state for ML model reproducibility
description: In federated learning scenarios with SGDRegressor, omitting random_state makes results non-deterministic because
the training data is shuffled randomly. Runs then cannot be reproduced across nodes, which undermines convergence checks. Always set
random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-003
source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
pattern_name: Mandatory data sorting before multi-stage estimation
description: Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in
a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting
by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce
processing order in multi-stage pipelines.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-004
source_project: finance-bp-112--openLGD
pattern_name: Consistent API response key naming across all endpoints
description: In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names
for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define
a schema contract upfront and enforce key naming consistency across all response types.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-005
source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
pattern_name: Cardinality bounds checking before array operations
description: Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality
matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations.
Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array
initialization.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-006
source_project: finance-bp-072--lending
pattern_name: Financial validation gates before transaction execution
description: Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized
periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing
these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-007
source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
pattern_name: Mathematical constraint validation for probability outputs
description: 'Credit risk models must validate mathematical constraints on outputs: skorecard''s WoE requires valid bin
assignments, transitionMatrix''s transition matrices require row sums equal to 1.0, and generator matrices require row sums
equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning
results.'
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-008
source_project: finance-bp-112--openLGD
pattern_name: Port-to-ID mapping consistency in distributed model serving
description: When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port
5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause
incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
applicable_to_activity: credit-risk
_source_file: cross-project-wisdom/credit-risk.yaml
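The consistent key naming pattern (CW-CREDIT-RISK-004) and the loop-index pitfall in parameter averaging (AP-CREDIT-RISK-009) can be sketched together. This is a minimal illustration, not openLGD's coordinator code. The function name `average_parameters` and the `REQUIRED_KEYS` constant are hypothetical, and each server response is assumed to be a flat dict of floats.

```python
from typing import Dict, List, Optional

# Key names taken from the pattern above; every endpoint must return both.
REQUIRED_KEYS = ("intercept", "coefficient")


def average_parameters(
    estimates: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Weighted average of per-server parameters, validating keys first."""
    if not estimates:
        raise ValueError("no server estimates to average")
    if weights is None:
        # Equal weights, e.g. 0.25 each for a 4-server cluster.
        weights = [1.0 / len(estimates)] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("one weight is required per server")

    averaged = {key: 0.0 for key in REQUIRED_KEYS}
    # enumerate starts at 0, so the first server is included, and the loop
    # variable k (not the final index) selects each server's weight.
    for k, estimate in enumerate(estimates):
        missing = [key for key in REQUIRED_KEYS if key not in estimate]
        if missing:
            raise KeyError(f"server {k + 1} response is missing {missing}")
        for key in REQUIRED_KEYS:
            averaged[key] += weights[k] * estimate[key]
    return averaged
```

Validating the keys before averaging turns a late, hard-to-trace KeyError in the coordination loop into an error that names the offending server.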
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: docs/source/conf.py
business_problem: This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata,
version information, and path configurations needed to generate developer documentation.
intent_keywords:
- documentation
- sphinx
- configuration
- build docs
- project setup
stage: documentation
data_domain: mixed
type: extension_example
component_capability_map:
project: finance-bp-112--openLGD
scan_date: '2026-04-22'
stats:
total_files: 5
total_classes: 12
total_functions: 0
total_stages: 5
modules:
data_acquisition:
class_count: 2
stage_id: data_acquisition
stage_order: 1
responsibility: Retrieves LGD regression data from either local CSV files or REST API endpoints. Supports two transport
modes enabling development and production deployments without code changes.
classes:
- name: dataSource
file: data_acquisition/datasource.py
line: 0
kind: required_method
signature: ''
- name: Data Transport Layer
file: data_acquisition/data-transport-layer.py
line: 0
kind: replaceable_point
design_decision_count: 3
model_estimation:
class_count: 2
stage_id: model_estimation
stage_order: 2
responsibility: Executes iterative linear regression using stochastic gradient descent. Supports warm-start mode for
federated learning where prior averaged parameters initialize local estimation.
classes:
- name: lgdModel
file: model_estimation/lgdmodel.py
line: 0
kind: required_method
signature: ''
- name: Regression Algorithm
file: model_estimation/regression-algorithm.py
line: 0
kind: replaceable_point
design_decision_count: 4
model_serving:
class_count: 3
stage_id: model_serving
stage_order: 3
responsibility: Flask-based HTTP server that exposes LGD estimation via REST endpoints. Each server instance maintains
local data access and provides cold-start and warm-start estimation paths.
classes:
- name: start_calculation
file: model_serving/start-calculation.py
line: 0
kind: required_method
signature: ''
- name: update_calculation
file: model_serving/update-calculation.py
line: 0
kind: required_method
signature: ''
- name: Server Framework
file: model_serving/server-framework.py
line: 0
kind: replaceable_point
design_decision_count: 4
federated_coordination:
class_count: 3
stage_id: federated_coordination
stage_order: 4
responsibility: 'Orchestrates federated learning across multiple model servers using parameter averaging. Implements
the FedAvg algorithm: local estimation, parameter collection, weighted averaging, and broadcast to each server.'
classes:
- name: federated_run
file: federated_coordination/federated-run.py
line: 0
kind: required_method
signature: ''
- name: Aggregation Algorithm
file: federated_coordination/aggregation-algorithm.py
line: 0
kind: replaceable_point
- name: Communication Pattern
file: federated_coordination/communication-pattern.py
line: 0
kind: replaceable_point
design_decision_count: 5
standalone_execution:
class_count: 2
stage_id: standalone_execution
stage_order: 5
responsibility: Single-process LGD estimation loop for development and testing. Validates environment setup and core
estimation logic without federation overhead.
classes:
- name: standalone_run
file: standalone_execution/standalone-run.py
line: 0
kind: required_method
signature: ''
- name: Execution Mode
file: standalone_execution/execution-mode.py
line: 0
kind: replaceable_point
design_decision_count: 2
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.20987654320987653
evidence_invalid: 64
evidence_verified: 17
evidence_auto_fixed: 0
audit_coverage: 30/30 (100%)
audit_pass_rate: 1/30 (3%)
audit_fail_total: 23
audit_finance_universal:
pass: 1
warn: 3
fail: 15
audit_subdomain_totals:
pass: 0
warn: 3
fail: 8
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-112. Evidence verify ratio
= 21.0% and audit fail total = 23. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-112-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc: []
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Sphinx Documentation Configuration
positive_terms:
- documentation
- sphinx
- configuration
- build docs
- project setup
data_domain: mixed
negative_terms:
- trading strategy
- screening
- data pipeline
- monitoring
- live trading
- factor computation
- machine learning
ambiguity_question: Are you looking to configure documentation build tools, or are you trying to implement a trading strategy,
data pipeline, or analytical workflow?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
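Hint IH-03 can be illustrated with Python's standard sqlite3 module. This is a minimal sketch, not recorder code from the source project: the table name `kdata`, the function `write_rows_from_threads`, and the explicit lock are assumptions made for the example. Passing `check_same_thread=False` only lifts sqlite3's same-thread check, so the caller still has to serialize writes.

```python
import sqlite3
import threading


def write_rows_from_threads(n_threads: int = 4) -> int:
    """Insert one row per worker thread through a single shared connection."""
    # Without check_same_thread=False, using the connection from a thread
    # other than the one that created it raises sqlite3.ProgrammingError.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE kdata (worker INTEGER)")
    lock = threading.Lock()  # serialize writes on the shared connection

    def worker(worker_id: int) -> None:
        with lock:
            conn.execute("INSERT INTO kdata (worker) VALUES (?)", (worker_id,))
            conn.commit()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    count = conn.execute("SELECT COUNT(*) FROM kdata").fetchone()[0]
    conn.close()
    return count
```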
preservation_manifest:
required_objects:
business_decisions_count: 91
fatal_constraints_count: 31
non_fatal_constraints_count: 99
use_cases_count: 1
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes,
otherwise initialization fails with an assertion error (fatal violation finance-C-001).'
business_decisions: []
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose a registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering rather than current-only filtering because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 27 source groups: API(3), Aggregation(1),
Algorithm(5), Architecture(2), Configuration(4), Deployment(2), and 21 more.'
key_decisions: 91 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-035
type: B/BA
summary: Use GET /start endpoint to initiate cold start and retrieve initial local estimates
- id: BD-036
type: B
summary: Use POST /update endpoint to receive averaged parameters and return new estimates
- id: BD-037
type: B
summary: Use GET / as health check endpoint to verify server liveness
- id: BD-024
type: B
summary: Use equal weights (0.25 each) for federated parameter averaging across 4 servers
- id: BD-020
type: B
summary: Use SGDRegressor from scikit-learn for linear regression with stochastic gradient descent
- id: BD-021
type: B
summary: Set max_iter=1 per epoch for incremental/online learning style updates
- id: BD-022
type: B/BA
summary: Disable regularization and early stopping (tol=None turns off the tolerance-based stopping check) for pure empirical loss
- id: BD-052
type: B
summary: Set warm_start=False to erase previous solution on each fit call
- id: BD-053
type: B
summary: Use verbose=0 for silent training output
- id: BD-019
type: BA/DK
summary: Use federated learning architecture where data stays local and only model parameters are aggregated
- id: BD-038
type: BA/M
summary: Design stateless model servers where each request is computed independently
- id: BD-041
type: B
summary: Use YAML configuration file for cluster parameters (hosts, epochs, servers)
- id: BD-042
type: B/RC
summary: Run Flask in debug mode (debug=True) for development
- id: BD-057
type: B
summary: Configure base URL as 'http://127.0.0.1:500' in config.yml for local demo
- id: BD-058
type: B/RC
summary: Use ruamel.yaml for YAML parsing with safe loading
- id: BD-039
type: B
summary: Use Fabric deployment tool for cluster management tasks
- id: BD-040
type: B/BA
summary: Use Docker containers for openNPL data backend deployment
- id: BD-031
type: B/BA
summary: Run Flask model servers on ports 5001-5004 for the federated cluster
- id: BD-032
type: B
summary: Run openNPL data backend servers on ports 8001-8004 for database-backed demo
- id: BD-033
type: B
summary: 'Derive server ID from port number: server_id = port - 5000'
- id: BD-034
type: B/BA
summary: Configure 4 federated servers as default cluster size
- id: BD-054
type: B
summary: Print server estimates and averaged parameters after each epoch
- id: BD-047
type: B/DK
summary: Provide /stop endpoint for graceful server shutdown
- id: BD-048
type: B
summary: Recommend Linux environment for running the federated demo
- id: BD-049
type: B/DK
summary: Use virtual environment for dependency isolation
- id: BD-050
type: B/DK
summary: Use XTerm windows for displaying model server output during demo
- id: BD-045
type: B
summary: Run separate client/coordinator process to orchestrate federated rounds
- id: BD-046
type: B
summary: Check each model server health before starting federated calculation
- id: BD-028
type: B/DK
summary: 'Exchange two parameters between coordinator and servers: intercept and coefficient'
- id: BD-029
type: B/BA
summary: Use cold start (no initial params) for first iteration, warm start thereafter
- id: BD-030
type: B/DK
summary: Use JSON serialization for parameter exchange in federated protocol
- id: BD-051
type: B/BA
summary: Use HTTP requests library for client-server communication
- id: BD-061
type: DK/B
summary: 'TODO: Implement fractional regression variations for LGD models'
- id: BD-062
type: DK/B
summary: 'TODO: Adopt different data loading strategies for standalone vs federated learning'
- id: BD-059
type: DK/B
summary: 'TODO: Remove hardcoded weights - fetch node data shape via controlled API'
- id: BD-060
type: DK
summary: 'TODO: Remove file/URL path hardwiring in dataSource'
- id: BD-055
type: B/BA
summary: Provide standalone_run.py as single-server validation before federated demo
- id: BD-023
type: B/BA
summary: Set 10 epochs as default training iterations
- id: BD-056
type: B
summary: Iterate federated rounds by calling lgdModel with previous averaged params
- id: BD-001
type: B
summary: Choice parameter controls data transport rather than separate functions
- id: BD-002
type: BA/DK
summary: Port-derived server ID convention (port - 5000 = server number)
- id: BD-003
type: B/BA
summary: Hardcoded data schema (X, Y column names)
- id: BD-025
type: B
summary: 'Provide two data source modes: local filesystem (choice=1) and REST API (choice=2)'
- id: BD-026
type: B/BA
summary: Store CSV data in server_dirs/{server_id}/regression_data.csv pattern
- id: BD-027
type: B/BA
summary: Define CSV data format with X column as target and Y as explanatory variable
- id: BD-043
type: B/RC
summary: Query openNPL API endpoint /api/npl_data/counterparties for data backend
- id: BD-044
type: B/BA
summary: Extract current_assets and cash_and_cash_equivalent_items as X and Y features
- id: BD-073
type: BA/DK
summary: 'SGDRegressor defaults encode iterative forcing: max_iter=1, tol=None, early_stopping=False'
- id: BD-075
type: BA/DK
summary: 'Server ID derived from port via hardcoded offset: n = int(port) - 5000'
- id: BD-077
type: BA/DK
summary: Data source choice=1 loads from ./server_dirs/{server}/regression_data.csv
- id: BD-081
type: BA
summary: Epochs count hardcoded in config.yml (10) vs standalone_run.py (10) - dual maintenance risk
- id: BD-012
type: M/BA
summary: Federated Averaging (FedAvg) algorithm
- id: BD-013
type: B/BA
summary: Equal weighting across servers
- id: BD-014
type: M/BA
summary: Per-epoch parameter collection and averaging
- id: BD-015
type: B/BA
summary: Hardcoded weight dictionary
- id: BD-016
type: B/BA
summary: Blocking sequential server communication
- id: BD-082
type: B/BA
summary: 'INTERACTION: BD-038 (stateless servers) × BD-005/BD-029 (warm-start via intercept_init/coef_init) → Paradox:
warm-start REQUIRES state persistence across requests, contradicting stateless server design'
- id: BD-083
type: B/BA
summary: 'INTERACTION: BD-003/BD-078 (X=target, Y=explanatory column convention) × BD-025/BD-043 (dual data source modes)
→ Convention fragility amplified by data source variability'
- id: BD-084
type: BA
summary: 'INTERACTION: BD-002/BD-033/BD-075 (port-derived server ID: n = port - 5000) × BD-080 (exactly 4 servers required)
→ Port availability dependency creates cascading failure risk'
- id: BD-085
type: B
summary: 'INTERACTION: BD-013/BD-024/BD-076 (equal 0.25 weighting) × BD-016 (blocking sequential communication) → Unequal
convergence quality with linear latency penalty'
- id: BD-086
type: B
summary: 'INTERACTION: BD-004/BD-021/BD-052/BD-064/BD-073 (SGDRegressor single-epoch settings) × BD-019 (federated learning
architecture) → Training limitation undermines federated convergence benefit'
- id: BD-087
type: B/RC
summary: 'INTERACTION: BD-072 (start BEFORE update ordering) × BD-074 (averaging BEFORE next epoch) × BD-016 (sequential
blocking) → Single slow server creates cascading deadlock risk in federated rounds'
- id: BD-088
type: BA
summary: 'INTERACTION: BD-023 (epochs: 10) × BD-081 (epochs dual-hardcoded) → Configuration inconsistency risk between
federated and standalone modes'
- id: BD-089
type: B/BA
summary: 'RISK CASCADE: BD-076 (equal weighting) → BD-085 (latency amplification) → BD-087 (cascading deadlock) → federation
failure when data is heterogeneous'
- id: BD-090
type: BA
summary: 'RISK CASCADE: BD-075 (port-derived ID) → BD-080 (4-server hardcode) → BD-046 (health check) → deployment failure
cascades to federation inability'
- id: BD-091
type: BA
summary: 'CONTRADICTION: BD-038 (stateless servers) states ''each request is computed independently'' while BD-072 (start
BEFORE update) mandates stateful request ordering across federated rounds'
- id: BD-074
type: B
summary: Federated averaging MUST complete before sending averaged params to next epoch
- id: BD-076
type: B/BA
summary: Equal federated weights (0.25 each) hardcoded for 4 servers - no API to fetch data size
- id: BD-063
type: B/BA
summary: Linear regression model using SGDRegressor instead of closed-form OLS
- id: BD-064
type: B/BA
summary: SGD optimization with max_iter=1 per fit call and warm_start disabled
- id: BD-065
type: B/BA
summary: Early stopping disabled with no convergence tolerance criterion
- id: BD-066
type: B
summary: No explicit regularization penalty applied to loss function
- id: BD-067
type: B
summary: Server-based data source selection with file vs REST API input method
- id: BD-068
type: B/RC
summary: 'Variable assignment convention: X as target, y as explanatory variables'
- id: BD-069
type: B/BA
summary: Default squared error loss (squared_error, named squared_loss before scikit-learn 1.0) with the default invscaling learning rate schedule
- id: BD-070
type: B/BA
summary: Model parameter initialization supported via coef_init and intercept_init
- id: BD-071
type: B/BA
summary: Fitted parameters returned as dictionary with predictions and metadata
- id: BD-004
type: M/BA
summary: SGDRegressor with max_iter=1 for iterative control
- id: BD-005
type: M/DK
summary: Warm-start via intercept_init/coef_init parameters
- id: BD-006
type: B/BA
summary: None-checking as cold/warm start toggle
- id: BD-007
type: B/BA
summary: SGDRegressor hardcoded (not abstracted or configurable)
- id: BD-008
type: B/BA
summary: Port-to-server-ID derivation at runtime
- id: BD-009
type: BA
summary: Server ID derived from request.host header parsing
- id: BD-010
type: B/DK
summary: Signal-based shutdown via SIGKILL/SIGTERM selection
- id: BD-011
type: B/BA
summary: Three-endpoint API design (/, /start, /update)
- id: BD-072
type: RC
summary: Federated workflow REQUIRES /start cold-start BEFORE /update warm-start calls per epoch
- id: BD-078
type: RC
summary: 'CSV column convention: X=target, Y=explanatory; extraction order matters for regression'
- id: BD-079
type: DK/B
summary: Standalone and federated modes implement identical iterative SGD loop - code duplication
- id: BD-017
type: B
summary: Identical epoch loop structure to federated_run
- id: BD-018
type: B
summary: Direct lgdModel imports without server abstraction
- id: BD-080
type: B/BA
summary: server_dirs/X/ requires exactly 4 subdirectories with identical CSV structure
resources:
packages:
- name: Flask
version_pin: latest
- name: scikit-learn
version_pin: latest
- name: numpy
version_pin: latest
- name: pandas
version_pin: latest
- name: scipy
version_pin: latest
- name: requests
version_pin: latest
- name: ruamel.yaml
version_pin: latest
- name: fabric
version_pin: latest
- name: Sphinx
version_pin: latest
- name: sphinx-rtd-theme
version_pin: latest
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install Flask
- python3 -m pip install scikit-learn
- python3 -m pip install numpy
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When implementing data acquisition for LGD regression model
action: Return a DataFrame containing exactly 'X' and 'Y' columns
severity: fatal
kind: domain_rule
modality: must
consequence: The downstream lgdModel.py module accesses df[['X']] and df['Y'] columns without validation, causing KeyError
exceptions if column names are different
stage_ids:
- data_acquisition
- id: finance-C-002
when: When implementing local file mode (choice=1) in dataSource
action: Read CSV file from server_dirs/{server_id}/regression_data.csv path
severity: fatal
kind: domain_rule
modality: must
consequence: pandas.read_csv will raise FileNotFoundError if the file path is incorrect, and there is no try-except handler
to provide meaningful error messages
stage_ids:
- data_acquisition
- id: finance-C-004
when: When configuring data transport for the LGD model
action: Pass choice values other than 1 or 2 to dataSource
severity: fatal
kind: resource_boundary
modality: must_not
consequence: If choice is neither 1 nor 2, the function returns None implicitly, causing lgdModel.py to fail when trying
to access df[['X']] columns
stage_ids:
- data_acquisition
- id: finance-C-005
when: When deploying the federated model server infrastructure
action: Start model servers on ports 5001-500N matching server IDs for correct port-to-ID mapping
severity: fatal
kind: architecture_guardrail
modality: must
consequence: model_server.py:35 computes n = int(port) - 5000 to derive server ID; wrong port causes incorrect data directory
selection
stage_ids:
- data_acquisition
- id: finance-C-011
when: When implementing SGDRegressor for federated LGD estimation
action: Set random_state parameter explicitly to ensure reproducibility across federated nodes
severity: fatal
kind: domain_rule
modality: must
consequence: Without random_state, each lgdModel call produces non-deterministic results due to random data shuffling.
This undermines reproducible federated convergence, as nodes produce different parameter estimates from run
to run.
stage_ids:
- model_estimation
- id: finance-C-013
when: When providing input data to lgdModel
action: Verify data source contains columns exactly named 'X' (explanatory) and 'Y' (target)
severity: fatal
kind: domain_rule
modality: must
consequence: lgdModel accesses df[['X']] and df['Y'] without validation. Missing or misnamed columns will raise KeyError
at runtime, breaking both standalone and federated execution flows.
stage_ids:
- model_estimation
- id: finance-C-014
when: When implementing federated parameter averaging logic
action: Iterate over every participating server (k from 1 to n) when computing the weighted average
severity: fatal
kind: domain_rule
modality: must
consequence: federated_run.py uses weights[str(n)] for all servers instead of weights[str(k)] for each server k. This
causes the averaged parameters to use only the last server's weight, corrupting federated model convergence and producing
incorrect global LGD estimates.
stage_ids:
- model_estimation
- id: finance-C-018
when: When returning fitted parameters from lgdModel
action: Return dict with keys 'intercept' and 'coefficient' containing scalar values
severity: fatal
kind: architecture_guardrail
modality: must
consequence: model_server.py and standalone_run.py access params['intercept'] and params['coefficient']. Returning different
key names (e.g., 'coef' or 'coefficients') would cause KeyError in all downstream consumers, breaking both standalone
and federated modes.
stage_ids:
- model_estimation
- id: finance-C-033
when: When implementing GET /start endpoint for cold-start LGD estimation
action: Return JSON with 'intercept' and 'coefficient' keys from lgdModel cold-start calculation
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Federated coordinator fails to parse response causing KeyError in federated_run.py:58-59 when accessing data['coefficient']
stage_ids:
- model_serving
- id: finance-C-034
when: When implementing POST /update endpoint for warm-start LGD estimation
action: Accept JSON body with 'intercept' and 'coefficient' fields and return updated parameters in same JSON structure
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Federated coordination loop breaks when /update response format differs from /start response format
stage_ids:
- model_serving
- id: finance-C-037
when: When presenting LGD estimation results from model server for regulatory credit risk reporting
action: Claim that backtest model parameters equal live production model parameters
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Regulatory non-compliance when presenting simulated model estimates as actual risk quantification without
noting estimation methodology differences
stage_ids:
- model_serving
- id: finance-C-041
when: When implementing the initial parameter averaging loop
action: skip the first server by starting loop index at 1 instead of 0
severity: fatal
kind: domain_rule
modality: must_not
consequence: The first server's parameters are excluded from initial averaging, causing all subsequent averaged parameters
to be incorrect and breaking federated convergence. Server 0 contribution is completely lost.
stage_ids:
- federated_coordination
- id: finance-C-042
when: When implementing the epoch averaging loop
action: use the loop variable k to index weights instead of using the final index n
severity: fatal
kind: domain_rule
modality: must
consequence: Epoch averaging uses weights[str(n)] inside the loop instead of weights[str(k)], causing only the last server's
weight (0.0 for n=4) to be applied repeatedly, producing meaningless averaged parameters.
stage_ids:
- federated_coordination
- id: finance-C-043
when: When initializing SGDRegressor with warm start parameters
action: pass intercept_init and coef_init to the SGDRegressor constructor instead of to fit()
severity: fatal
kind: domain_rule
modality: must_not
consequence: The SGDRegressor constructor does not accept intercept_init or coef_init; they are keyword arguments of fit().
Passing them to the constructor raises a TypeError, breaking all federated update cycles and preventing convergence.
stage_ids:
- federated_coordination
- id: finance-C-053
when: When implementing SGDRegressor warm-start parameter initialization
action: Pass initial coefficient and intercept values through the coef_init and intercept_init keyword arguments of SGDRegressor.fit(), not through set_params()
severity: fatal
kind: domain_rule
modality: must
consequence: set_params() only accepts constructor hyperparameters, so passing coefficient or intercept values to it raises
ValueError at runtime when warm-starting with pre-existing parameter values
stage_ids:
- standalone_execution
- id: finance-C-054
when: When implementing warm-start SGDRegressor with external parameter initialization
action: Supply initial parameters via the coef_init and intercept_init keyword arguments of fit(), or use partial_fit() on a persistent estimator for stateful updates
severity: fatal
kind: domain_rule
modality: must
consequence: Initial parameters supplied through any other route are either rejected or discarded when fit() re-initializes
the estimator, so each epoch restarts from a cold start and the epoch iteration loop breaks
stage_ids:
- standalone_execution
- id: finance-C-057
when: When configuring dataSource for local CSV mode
action: Set choice parameter to 1 and verify server_dirs/{server}/regression_data.csv exists before calling dataSource()
severity: fatal
kind: resource_boundary
modality: must
consequence: Missing data directory or incorrect file path will trigger FileNotFoundError, preventing model estimation
from executing
stage_ids:
- standalone_execution
- id: finance-C-066
when: When implementing data acquisition, ensure DataFrame schema matches model expectations
action: Return a pandas DataFrame with columns exactly named 'X' (target variable) and 'Y' (explanatory variable)
severity: fatal
kind: domain_rule
modality: must
consequence: If column names differ, lgdModel.py line 73 df[['X']] and line 75 df['Y'] will raise KeyError, causing model
estimation to crash
stage_ids:
- data_acquisition
- model_estimation
- id: finance-C-067
when: When passing SGDRegressor parameters to HTTP endpoints, ensure proper type extraction
action: Extract scalar values from numpy arrays using index [0] before returning dict
severity: fatal
kind: architecture_guardrail
modality: must
consequence: numpy arrays are not JSON serializable, so Flask jsonify raises TypeError when given them, causing
HTTP endpoint failures
stage_ids:
- model_estimation
- model_serving
- id: finance-C-078
when: When implementing or validating DataFrame inputs to the LGD estimation model
action: Provide DataFrames containing both 'X' (target/LGD variable) and 'Y' (explanatory variable) columns, as the model
extracts X=df[['X']] and y=df['Y']
severity: fatal
kind: domain_rule
modality: must
consequence: KeyError or incorrect regression results when the model tries to access missing 'X' or 'Y' columns, breaking
the LGD estimation pipeline
- id: finance-C-079
when: When initializing or updating LGD model parameters in federated mode
action: Pass parameter dictionaries containing both 'intercept' and 'coefficient' keys as required by the sklearn SGDRegressor
warm-start interface
severity: fatal
kind: domain_rule
modality: must
consequence: KeyError or TypeError when sklearn fit() receives incorrect parameter dict structure, breaking federated
parameter exchange
- id: finance-C-080
when: When spawning Flask model servers for federated LGD estimation
action: Assign server ports in the range 5001-5004, as the server code derives server ID via n = int(port) - 5000 to map
to server_dirs/N/data
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect data directory mapping causing FileNotFoundError or loading wrong server's data, breaking the entire
federated estimation
- id: finance-C-081
when: When executing a federated LGD training epoch
action: Call /start (cold-start) before any /update (warm-start) calls, as the model requires initial parameters to be
established first
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Incorrect model parameters propagated to all servers when warm-start is called without prior cold-start,
leading to divergent or invalid federated estimates
- id: finance-C-082
when: When implementing the SGDRegressor-based LGD model iteration
action: Set max_iter=1 and tol=None to enforce a single epoch per fit() call, as each local epoch must be performed independently
across federated nodes
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Multi-epoch convergence within a single fit() call breaks the federated averaging contract, causing incorrect
parameter aggregation across nodes
- id: finance-C-085
when: When deploying the openLGD Flask model servers
action: Run Flask with debug=True in production or any security-sensitive environment, as this enables the interactive
debugger and arbitrary code execution
severity: fatal
kind: domain_rule
modality: must_not
consequence: Remote code execution vulnerability when werkzeug debugger is exposed in production, allowing attackers to
execute arbitrary code on the server
- id: finance-C-086
when: When presenting or marketing openLGD's capabilities to users
action: Claim that openLGD is suitable for production deployment, as it is explicitly documented as early alpha software
with unstable API
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Users deploy alpha software in production, experiencing unexpected API breaking changes, unhandled edge cases,
and security vulnerabilities
- id: finance-C-099
when: When implementing or evaluating federated learning architecture decisions
action: Verify raw data remains local to each server node and only model parameters (intercept and coefficient) traverse
the network — must NOT implement centralized data pooling even if technically feasible
severity: fatal
kind: domain_rule
modality: must
consequence: Centralizing raw financial data violates data sovereignty requirements for multi-institution scenarios, causing
regulatory non-compliance with GDPR, banking secrecy laws, and institutional data sharing prohibitions
derived_from_bd_id: BD-019
- id: finance-C-102
when: When implementing data loading and regression preparation in lgdModel.py and dataSource.py
action: Extract column 'X' as target values and column 'Y' as explanatory variables in the exact order specified at lgdModel.py:73-75
— must NOT swap, rename, or use alternative column mappings
severity: fatal
kind: domain_rule
modality: must
consequence: Inverting the X/Y column convention produces an inverted regression model where target and predictor variables
are swapped, causing completely incorrect LGD estimates and invalid credit risk assessments
derived_from_bd_id: BD-078
- id: finance-C-106
when: When parsing YAML configuration files in federated_run.py
action: Use ruamel.yaml with safe loading (typ='safe') for YAML parsing; never use yaml.load without specifying a safe
Loader to prevent arbitrary code execution through YAML deserialization vulnerabilities
severity: fatal
kind: domain_rule
modality: must
consequence: Unsafe YAML loading allows arbitrary code execution from malicious configuration files, creating remote code
execution vulnerability in production deployments
derived_from_bd_id: BD-058
- id: finance-C-108
when: When implementing federated learning parameter synchronization across distributed servers
action: Implement per-epoch parameter collection and averaging with explicit epochs configuration; verify each server's
partial_fit() results are collected and averaged before the next round begins
severity: fatal
kind: domain_rule
modality: must
consequence: Missing per-epoch synchronization causes parameter drift across servers, leading to inconsistent model states
and failed federated convergence
derived_from_bd_id: BD-014
- id: finance-C-121
when: When implementing or refactoring federated learning aggregation logic
action: Verify federated averaging operation (parameter aggregation from servers) completes entirely before sending averaged
parameters to the next epoch — do not parallelize or reorder the averaging step with subsequent epoch processing
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Skipping or parallelizing the averaging step causes stale or inconsistent parameters to propagate, corrupting
federated learning convergence and producing models that do not represent true global consensus
derived_from_bd_id: BD-074
regular:
- id: finance-C-003
when: When implementing API mode (choice=2) in dataSource
action: Construct URL using localhost:800{server_id} pattern and /api/npl_data/counterparties endpoint
severity: high
kind: resource_boundary
modality: must
consequence: requests.get will raise ConnectionError if the target server is not running, with no error handling in the
code
stage_ids:
- data_acquisition
- id: finance-C-006
when: When running openLGD in production
action: Expect production-grade API stability from openLGD
severity: high
kind: claim_boundary
modality: must_not
consequence: README.md:18 explicitly states 'early alpha release' and CHANGELOG.rst:3 warns 'API IS STILL VERY UNSTABLE
AS MORE USE CASES / FEATURES ARE ADDED REGULARLY'
stage_ids:
- data_acquisition
- id: finance-C-007
when: When deploying federated mode with API data source (choice=2)
action: Verify target API server is running before calling dataSource with choice=2
severity: high
kind: resource_boundary
modality: must
consequence: requests.get() in dataSource.py:41 will raise ConnectionError if the localhost:800X API server is not running,
and there is no error handling
stage_ids:
- data_acquisition
- id: finance-C-008
when: When implementing local file mode (choice=1) data acquisition
action: Verify server_dirs/{server_id} directory exists before attempting to read CSV
severity: high
kind: operational_lesson
modality: must
consequence: Missing directory will cause pandas.read_csv to raise FileNotFoundError with no custom error message or recovery
mechanism
stage_ids:
- data_acquisition
- id: finance-C-009
when: When adding new data sources or modifying data acquisition
action: Hardcode file paths, URLs, or column names directly in dataSource implementation
severity: medium
kind: operational_lesson
modality: must_not
consequence: dataSource.py:25 TODO comment explicitly states 'remove file / url path hardwiring', hardcoded paths make
deployment brittle and non-portable
stage_ids:
- data_acquisition
- id: finance-C-010
when: When using API mode (choice=2) data acquisition
action: Handle the nested API call pattern (counterparty list then individual records)
severity: medium
kind: operational_lesson
modality: must
consequence: dataSource.py:40-47 makes two sequential requests.get calls; if any individual data_url fails, the loop continues
with incomplete data
stage_ids:
- data_acquisition
- id: finance-C-012
when: When implementing the cold/warm start toggle logic
action: Verify both intercept and coef parameters are provided together for warm-start mode
severity: high
kind: domain_rule
modality: must
consequence: The condition 'if intercept is None or coef is None' triggers cold-start if either parameter is missing.
Partial initialization with only one parameter will silently fall back to cold-start initialization, producing incorrect
model updates in the federated loop.
stage_ids:
- model_estimation
- id: finance-C-015
when: When deploying federated model servers
action: Use ports other than 5001-5004 for the default configuration without updating both server and client code
severity: high
kind: resource_boundary
modality: must_not
consequence: model_server.py:35 derives server ID as 'int(port) - 5000', and Federated_Demo.md documents ports 5001-5004.
Port mismatches between server and controller cause requests to reach wrong servers, breaking federated coordination.
stage_ids:
- model_estimation
- id: finance-C-016
when: When selecting regression algorithm for LGD estimation
action: Accept that SGDRegressor is the only available algorithm (not abstracted or configurable)
severity: medium
kind: resource_boundary
modality: must
consequence: The regression algorithm is hardcoded to sklearn.linear_model.SGDRegressor. Replacing it requires modifying
lgdModel.py directly. This creates a tight coupling and prevents using alternative algorithms (e.g., ridge regression,
ElasticNet) without code changes.
stage_ids:
- model_estimation
- id: finance-C-017
when: When running lgdModel in iterative fashion for federated learning
action: Set max_iter=1 to ensure exactly one pass over the local data (one epoch) per function call
severity: high
kind: architecture_guardrail
modality: must
consequence: The max_iter=1 setting is critical for the federated learning architecture where each call represents one
epoch. Increasing max_iter would perform multiple passes over the local data per call, breaking the per-epoch parameter
update contract required by federated averaging.
stage_ids:
- model_estimation
- id: finance-C-019
when: When claiming LGD estimation capabilities
action: Claim statistical rigor equivalent to pooled dataset analysis
severity: high
kind: claim_boundary
modality: must_not
consequence: Federated LGD estimation with SGD produces parameters that may not converge to the pooled optimum due to
data heterogeneity across servers. Presenting federated estimates as equivalent to centralized estimation would misrepresent
the statistical properties of the model.
stage_ids:
- model_estimation
- id: finance-C-020
when: When considering replacing the SGDRegressor implementation
action: Claim that federated learning produces identical results to centralized estimation
severity: medium
kind: claim_boundary
modality: should_not
consequence: Federated averaging with SGD is an approximation that depends on data distribution across servers. Different
server configurations will produce different model parameters even with identical hyperparameters, which is expected
behavior, not a bug.
stage_ids:
- model_estimation
- id: finance-C-021
when: When evaluating federated averaging convergence
action: Skip monitoring parameter stability across epochs
severity: medium
kind: operational_lesson
modality: must_not
consequence: Without tracking parameter change magnitude across epochs, users cannot determine if the federated process
has converged. Parameters oscillating or diverging indicate misconfiguration or data quality issues that would go unnoticed.
stage_ids:
- model_estimation
- id: finance-C-022
when: When performing initial cold-start call
action: Pass None for both intercept and coef parameters
severity: high
kind: architecture_guardrail
modality: must
consequence: The first federated iteration requires random initialization. Passing non-None values on cold-start would
improperly seed the federated process with arbitrary values, corrupting the initial global model state.
stage_ids:
- model_estimation
- id: finance-C-023
when: When implementing port-to-server-ID derivation in Flask endpoints
action: Validate that port can be converted to integer before subtracting 5000
severity: high
kind: domain_rule
modality: must
consequence: ValueError exception when Host header contains non-numeric port, causing HTTP 500 response to clients
stage_ids:
- model_serving
- id: finance-C-024
when: When implementing POST /update endpoint that parses JSON request body
action: Validate JSON parsing result and check required fields 'intercept' and 'coefficient' exist
severity: high
kind: domain_rule
modality: must
consequence: KeyError exception when client sends JSON without 'intercept' or 'coefficient' fields, causing HTTP 500 response
stage_ids:
- model_serving
- id: finance-C-026
when: When implementing Flask model server endpoint that accesses local data
action: Use server_dirs/{port-5000}/regression_data.csv as the data directory path pattern
severity: high
kind: domain_rule
modality: must
consequence: FileNotFoundError when server tries to access non-existent data directory, causing cold-start estimation
to fail
stage_ids:
- model_serving
- id: finance-C-027
when: When implementing /update endpoint that expects model parameters
action: Verify request Content-Type is application/json before parsing request body
severity: medium
kind: domain_rule
modality: must
consequence: Malformed JSON response or HTTP 415 Unsupported Media Type when client sends non-JSON data
stage_ids:
- model_serving
- id: finance-C-028
when: When implementing model server in federated cluster topology
action: Run server instances on ports 5001-5004 matching server_dirs/1 through server_dirs/4
severity: high
kind: resource_boundary
modality: must
consequence: Server with port 5005 incorrectly maps to server_dirs/5 which may not exist, causing data loading failure
stage_ids:
- model_serving
- id: finance-C-029
when: When deploying Flask-based model server for federated LGD estimation
action: Accept that Flask development server is single-threaded and not suitable for high-concurrency production workloads
severity: medium
kind: resource_boundary
modality: must
consequence: HTTP request blocking causing federated coordination timeouts when multiple clients connect simultaneously
stage_ids:
- model_serving
- id: finance-C-030
when: When configuring model servers for federated estimation workflow
action: Start all model servers before executing the federated_run.py coordinator script
severity: high
kind: architecture_guardrail
modality: must
consequence: ConnectionError when coordinator attempts GET /start or POST /update on unavailable server, breaking federated
iteration
stage_ids:
- model_serving
- id: finance-C-031
when: When using openLGD in early alpha release for credit risk estimation
action: Expect API instability and prepare for breaking changes in each release cycle
severity: medium
kind: operational_lesson
modality: should
consequence: Silent model parameter changes causing inconsistent LGD estimates across federated nodes after library upgrade
stage_ids:
- model_serving
- id: finance-C-032
when: When implementing federated LGD estimation with multiple server instances
action: Verify each server instance has a unique port and a corresponding server_dirs/{n}/data directory provisioned
severity: high
kind: architecture_guardrail
modality: must
consequence: Multiple servers accessing same data directory causing race conditions in CSV file read operations
stage_ids:
- model_serving
- id: finance-C-035
when: When implementing Flask model server that derives server ID from HTTP Host header
action: Use the request.host header for port extraction to ensure multi-tenant isolation per server instance
severity: high
kind: architecture_guardrail
modality: must
consequence: Wrong server ID used for data directory access causing data contamination between federated nodes
stage_ids:
- model_serving
- id: finance-C-036
when: When implementing root endpoint (/) for health check
action: Return HTTP 200 OK with JSON response indicating server liveness and identity
severity: medium
kind: architecture_guardrail
modality: must
consequence: Health check monitoring tools fail to detect server availability, causing false alarms in federated cluster
monitoring
stage_ids:
- model_serving
- id: finance-C-038
when: When deploying model_server.py in a federated credit risk production system
action: Advertise the Flask development server as a production-ready HTTP service
severity: high
kind: claim_boundary
modality: must_not
consequence: Security audit failure and operational risk when relying on Flask debug server lacking production hardening
features
stage_ids:
- model_serving
- id: finance-C-039
when: When using model server for openLGD federated estimation in alpha stage
action: Assume API compatibility between minor version upgrades without regression testing
severity: high
kind: claim_boundary
modality: must_not
consequence: Silent breaking changes in JSON response format causing federated coordination to fail silently or produce
incorrect averaged parameters
stage_ids:
- model_serving
- id: finance-C-040
when: When estimating LGD model parameters with federated learning across multiple servers
action: Claim federated-averaged parameters are equivalent to centrally-computed parameters without mathematical proof
of convergence
severity: medium
kind: claim_boundary
modality: must_not
consequence: Incorrect credit risk estimates when federated averaging assumptions (data homogeneity, equal weighting)
are violated in practice
stage_ids:
- model_serving
- id: finance-C-044
when: When making HTTP requests to federated servers
action: include timeout parameters to prevent indefinite blocking on unreachable servers
severity: high
kind: resource_boundary
modality: must
consequence: Without HTTP timeouts, a single unresponsive server causes the entire federated run to hang indefinitely.
In production, this blocks all participating servers waiting for the coordinator.
stage_ids:
- federated_coordination
- id: finance-C-045
when: When configuring server weights for federated averaging
action: dynamically calculate weights based on actual data volumes or sample counts per server
severity: high
kind: operational_lesson
modality: must
consequence: Equal weighting assumes equal data volumes across servers. If servers have unequal data (e.g., 100 vs 10000
samples), the weighted average under-represents larger datasets, producing biased LGD estimates that misrepresent actual
credit risk.
stage_ids:
- federated_coordination
- id: finance-C-046
when: When running Flask model servers in production
action: run with debug=True enabled in production environments
severity: high
kind: resource_boundary
modality: must_not
consequence: Flask debug mode enables code reloading and Werkzeug debugger, exposing the Python traceback to attackers.
This creates remote code execution vulnerabilities in production deployments.
stage_ids:
- federated_coordination
- id: finance-C-047
when: When configuring the number of servers in config.yml
action: verify the server count matches exactly the number of weights defined in the weights dictionary
severity: high
kind: architecture_guardrail
modality: must
consequence: The weights dictionary is hardcoded for 4 servers. If config.yml specifies servers != 4, the URL construction
and weight indexing will fail, causing KeyError or IndexError exceptions.
stage_ids:
- federated_coordination
- id: finance-C-048
when: When presenting federated learning results
action: claim that the system provides production-ready real-time federated credit risk modeling
severity: high
kind: claim_boundary
modality: must_not
consequence: The README explicitly states 'This is an early alpha release. openLGD is still in active development'. Presenting
alpha software as production-ready violates user expectations and regulatory requirements for credit risk models.
stage_ids:
- federated_coordination
- id: finance-C-049
when: When processing JSON responses from federated servers
action: validate response structure before accessing dictionary keys
severity: medium
kind: domain_rule
modality: must
consequence: Without validation, if a server returns malformed JSON or missing keys ('coefficient', 'intercept'), the
code raises KeyError, crashing the entire federated run mid-epoch.
stage_ids:
- federated_coordination
- id: finance-C-050
when: When handling HTTP errors from server communication
action: check HTTP status codes and implement retry logic for transient failures
severity: medium
kind: resource_boundary
modality: must
consequence: Network partitions, server overload, or temporary unavailability cause HTTP errors that crash the federated
run. Without error handling, a single epoch failure prevents any parameter updates from being applied.
stage_ids:
- federated_coordination
- id: finance-C-051
when: When implementing data source abstraction
action: externalize file paths and URL patterns to configuration instead of hardcoding in source
severity: medium
kind: operational_lesson
modality: must
consequence: Hardcoded paths like './server_dirs/' and 'http://localhost:800' prevent the system from running in different
environments without code modifications.
stage_ids:
- federated_coordination
- id: finance-C-052
when: When scaling the federated system to more than 4 servers
action: assume the hardcoded weights dictionary remains valid without modification
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Increasing servers > 4 in config.yml causes KeyError when accessing weights beyond the 4 hardcoded keys,
crashing the federated run.
stage_ids:
- federated_coordination
- id: finance-C-055
when: When configuring SGDRegressor for iterative model estimation
action: Set warm_start=True to enable parameter reuse across consecutive fit() calls
severity: high
kind: domain_rule
modality: must
consequence: With warm_start=False, each fit() call reinitializes the coefficients instead of continuing from the previous
solution, preventing convergence across epochs and producing non-monotonic parameter estimates
stage_ids:
- standalone_execution
- id: finance-C-056
when: When preparing CSV data for LGD model estimation
action: Verify data files contain exactly two columns named 'X' (target variable) and 'Y' (explanatory variable) without
missing values
severity: high
kind: domain_rule
modality: must
consequence: Mismatched column names or missing values will cause KeyError during DataFrame extraction or produce NaN
coefficients, invalidating the LGD estimation
stage_ids:
- standalone_execution
- id: finance-C-058
when: When running standalone execution for environment validation
action: Execute standalone_run.py first to verify paths, dependencies, and core estimation logic before launching federated
servers
severity: medium
kind: architecture_guardrail
modality: should
consequence: Skipping standalone validation may lead to cryptic errors during federated execution when environment issues
could have been caught earlier
stage_ids:
- standalone_execution
- id: finance-C-059
when: When configuring standalone execution epochs
action: Hardcode the Epochs value in standalone_run.py; it should be configurable via config.yml, as in federated_run.py
severity: medium
kind: resource_boundary
modality: must_not
consequence: Hardcoded 10 epochs prevents testing different convergence behaviors and creates inconsistency between standalone
and federated execution configurations
stage_ids:
- standalone_execution
- id: finance-C-060
when: When comparing standalone vs federated estimation results
action: Use identical epoch loop structure in standalone_run.py and federated_run.py to enable deterministic result comparison
severity: high
kind: architecture_guardrail
modality: must
consequence: Different loop structures prevent meaningful validation that standalone lgdModel produces identical results
to model_server endpoint, defeating the purpose of standalone as validation framework
stage_ids:
- standalone_execution
- id: finance-C-061
when: When accessing LGD estimation logic from standalone execution
action: Import lgdModel directly without model_server abstraction to validate core estimation works independently of federation
infrastructure
severity: medium
kind: architecture_guardrail
modality: must
consequence: Using model_server endpoints in standalone mode introduces unnecessary HTTP overhead and masks potential
lgdModel issues behind server interface complexity
stage_ids:
- standalone_execution
- id: finance-C-062
when: When presenting standalone execution as production system
action: Claim standalone execution produces production-ready LGD estimates equivalent to enterprise financial systems
severity: high
kind: claim_boundary
modality: must_not
consequence: openLGD is explicitly documented as 'early alpha' research software; presenting alpha results as production-ready
violates the project's stated development status
stage_ids:
- standalone_execution
- id: finance-C-063
when: When claiming standalone results validate federated production deployments
action: Claim that single-server standalone LGD estimates equal federated multi-server estimates without accounting for
data partitioning and averaging differences
severity: medium
kind: claim_boundary
modality: must_not
consequence: Standalone runs use consolidated data while federated runs partition data across servers and average parameters,
producing different estimation landscapes
stage_ids:
- standalone_execution
- id: finance-C-064
when: When executing standalone_run.py without understanding SGDRegressor convergence
action: Assume 10 epochs produces converged parameters without verifying coefficient stability across consecutive epochs
severity: medium
kind: operational_lesson
modality: must_not
consequence: With tol=None and max_iter=1 per fit() call, 10 external epochs may be insufficient for convergence with
complex datasets, leading to unreliable LGD estimates
stage_ids:
- standalone_execution
- id: finance-C-065
when: When using sklearn SGDRegressor with stochastic gradient descent for financial modeling
action: Set the random_state parameter explicitly to ensure reproducible coefficient estimates across executions
severity: high
kind: domain_rule
modality: must
consequence: Without explicit random_state, SGDRegressor will produce different coefficient estimates on each run due
to random shuffling of training samples, preventing reproducible validation
stage_ids:
- standalone_execution
- id: finance-C-068
when: When sending parameters to POST /update endpoint, ensure Content-Type header is set
action: 'Set HTTP header ''Content-Type'': ''application/json'' when posting JSON data'
severity: high
kind: architecture_guardrail
modality: must
consequence: Without proper Content-Type header, Flask may parse request.data incorrectly, causing json.loads to fail
with UnicodeDecodeError
stage_ids:
- federated_coordination
- model_serving
- id: finance-C-069
when: When implementing federated averaging, ensure the loop iterates over all servers
action: Index the weights dictionary with n instead of the loop variable k; the weight index must be k, not n
severity: high
kind: domain_rule
modality: must_not
consequence: Loop on line 64 uses range(1, n) which excludes server 0, then uses weights[str(n)] which may be undefined,
causing KeyError or incorrect weighted averaging
stage_ids:
- federated_coordination
- id: finance-C-070
when: When configuring federated workflow, ensure weights sum to 1.0 for proper averaging
action: Validate that the weight sum equals 1.0 or is proportional across all participating servers
severity: high
kind: domain_rule
modality: must
consequence: Incorrect weights cause model parameters to be improperly averaged, leading to biased LGD estimates and incorrect
credit risk capital calculations
stage_ids:
- federated_coordination
- id: finance-C-071
when: When loading config.yml for federated coordination, validate that all required keys exist
action: Check that config contains 'hosts', 'epochs', and 'servers' keys before accessing them
severity: high
kind: resource_boundary
modality: must
consequence: Missing config keys cause KeyError when accessing config['hosts'], config['epochs'], or config['servers'],
preventing federated coordination from starting
stage_ids:
- federated_coordination
- id: finance-C-072
when: When mapping server port to server ID, ensure port follows the 5000+ convention
action: Format model server URL as base URL plus server number without trailing slash
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect URL format causes requests.get() to fail with ConnectionError, preventing federated parameter aggregation
stage_ids:
- federated_coordination
- model_serving
- id: finance-C-073
when: When loading data from CSV, ensure the file has exactly 2 columns with proper headers
action: Validate CSV structure returns DataFrame with exactly columns 'X' and 'Y'
severity: medium
kind: domain_rule
modality: must
consequence: CSV parsing errors or missing columns cause KeyError in lgdModel.py, producing incorrect LGD estimates without
warning
stage_ids:
- data_acquisition
- id: finance-C-074
when: When receiving JSON in POST request, ensure both 'intercept' and 'coefficient' keys exist
action: Validate params dictionary contains both 'intercept' and 'coefficient' keys before passing to lgdModel
severity: high
kind: architecture_guardrail
modality: must
consequence: Missing keys cause KeyError in model_server.py:44-45, crashing the update endpoint and halting federated
training
stage_ids:
- model_serving
- id: finance-C-075
when: When returning parameters from GET /start endpoint, ensure dict can be serialized
action: Return dict with float values (not numpy scalars) for JSON serialization compatibility
severity: medium
kind: architecture_guardrail
modality: must
consequence: Flask jsonify can fail on numpy types (arrays, float32 scalars), returning 500 Internal Server Error and breaking federated coordination
stage_ids:
- model_serving
- id: finance-C-076
when: When using openLGD for credit risk decisions, do not present simulated results as validated outcomes
action: Present simulated or alpha-stage results as production credit risk quantification; describe them as being for 'research and validation purposes' only
severity: medium
kind: claim_boundary
modality: must_not
consequence: Presenting early alpha LGD estimates as validated credit risk parameters may violate regulatory requirements
for capital calculation under Basel frameworks
stage_ids:
- model_estimation
- model_serving
- id: finance-C-077
when: When encountering model convergence warnings, do not skip investigation by assuming 'early iterations are normal'
action: Dismiss convergence warnings as normal early-iteration behavior; investigate every convergence warning before continuing federated iterations
severity: low
kind: rationalization_guard
modality: must_not
consequence: Skipping convergence investigation may hide numerical instability or data quality issues, producing unreliable
LGD estimates
stage_ids:
- model_estimation
- federated_coordination
- id: finance-C-083
when: When completing a federated averaging round before sending parameters to the next epoch
action: Complete the averaging of all node parameters before distributing averaged params to any node, as premature sending
of partial averages corrupts model convergence
severity: high
kind: architecture_guardrail
modality: must
consequence: Partial averages sent to nodes cause parameter drift and non-convergence in subsequent epochs, producing
invalid LGD model coefficients
- id: finance-C-084
when: When configuring federated averaging weights across model servers
action: Adjust server weights proportionally when changing the number of participating servers, as the default 0.25 weight
assumes exactly 4 equal-data-volume nodes
severity: high
kind: architecture_guardrail
modality: must
consequence: Incorrect weighted averaging causes biased LGD model parameters when server data volumes differ from the
4-node equal-weight assumption
- id: finance-C-087
when: When using openLGD for credit risk decision-making
action: Claim that federated LGD model estimates are equivalent to centralized model estimates, as data distribution assumptions
differ between modes
severity: high
kind: claim_boundary
modality: must_not
consequence: Incorrect regulatory capital calculations when federated estimates diverge from centralized benchmarks without
proper validation methodology
- id: finance-C-088
when: When selecting an LGD modeling approach
action: Claim that openLGD supports non-linear LGD modeling, as the implementation uses only sklearn SGDRegressor with
linear loss
severity: high
kind: claim_boundary
modality: must_not
consequence: Incorrect credit risk assessments when users expect non-linear LGD capabilities (GLM with binomial, beta
regression) that openLGD does not provide
- id: finance-C-089
when: When scaling the federated LGD deployment beyond the demo configuration
action: Claim that openLGD supports large-scale federated deployments with many servers, as the sequential communication
architecture creates a bottleneck
severity: high
kind: claim_boundary
modality: must_not
consequence: Severe performance degradation or timeout failures when scaling beyond 4 servers due to sequential HTTP-based
parameter exchange
- id: finance-C-090
when: When using SGDRegressor with the configured hyperparameters
action: Claim that the model will converge to an optimal solution within any specific number of epochs, as tol=None disables
convergence checking
severity: medium
kind: claim_boundary
modality: must_not
consequence: Users may stop training prematurely expecting convergence, leading to under-fitted LGD models with suboptimal
coefficient estimates
- id: finance-C-091
when: When implementing federated learning workflows based on this blueprint
action: Fetch actual node data shapes via a controlled API instead of hardcoding weights, as TODO at federated_run.py:33
acknowledges this limitation
severity: medium
kind: operational_lesson
modality: must
consequence: Incorrect model averaging when actual server data volumes differ from the assumed equal distribution, producing
biased LGD estimates
- id: finance-C-092
when: When sourcing LGD training data through the dataSource abstraction
action: Use parametric paths (server_dirs/N/datafile.csv or proper openNPL API endpoints) as hardcoded in dataSource.py,
or risk data loading failures
severity: high
kind: resource_boundary
modality: must
consequence: FileNotFoundError or data loading failures when data files are not in the expected parametric locations,
breaking both standalone and federated modes
- id: finance-C-093
when: When selecting data loading method in the LGD estimation workflow
action: Use choice=1 for local CSV files or choice=2 for openNPL REST API, as the dataSource function branches on these
discrete values only
severity: medium
kind: resource_boundary
modality: must
consequence: Unexpected behavior or data loading failure when using unsupported choice values for data sourcing
- id: finance-C-094
when: When implementing or refactoring training loop logic in standalone_run.py
action: Maintain the same epoch loop structure and Epochs configuration as federated_run.py to ensure a valid comparison
between standalone and federated training results
severity: high
kind: domain_rule
modality: must
consequence: Modifying the epoch loop structure independently in standalone_run.py breaks the comparison guarantee between
standalone and federated training modes, making it impossible to verify that federation complexity does not introduce
behavioral changes
derived_from_bd_id: BD-017
- id: finance-C-095
when: When implementing or refactoring model initialization logic in lgdModel.py
action: Assume parameters can be passed as None when intending to use existing values — always distinguish between 'parameter
not provided' (use existing) and 'parameter explicitly set to None' (cold start)
severity: high
kind: operational_lesson
modality: must_not
consequence: Confusing None vs not-present causes silent cold starts that reset model state, producing incorrect LGD estimates
and invalidating credit risk calculations
derived_from_bd_id: BD-006
- id: finance-C-096
when: When implementing federated protocol communication between coordinator and servers
action: Exchange only the two scalar parameters (intercept and coefficient) per communication round — must NOT add gradient
vectors, Hessian information, or additional statistics to the payload
severity: high
kind: domain_rule
modality: must
consequence: Adding extra parameters to federated exchanges increases bandwidth requirements and attack surface for parameter
tampering, violating the minimal payload design essential for bandwidth-constrained environments
derived_from_bd_id: BD-028
- id: finance-C-097
when: When implementing data acquisition for LGD model training
action: Query the openNPL API endpoint /api/npl_data/counterparties for structured entity data including financial metrics
— must NOT use alternative data sources without validation against openNPL schema
severity: high
kind: domain_rule
modality: must
consequence: Using mismatched data sources causes schema incompatibilities with downstream LGD estimation, potentially
producing meaningless regression results or silent data corruption
derived_from_bd_id: BD-043
- id: finance-C-098
when: When modifying training epoch configuration across the federated learning system
action: Update epoch count in both config.yml and standalone_run.py simultaneously — implement a centralized constant
or import from a shared module to prevent dual maintenance drift
severity: medium
kind: architecture_guardrail
modality: should
consequence: Updating epochs in only one location causes divergent training duration between federated and standalone
modes, invalidating comparative results and producing non-reproducible experiments
derived_from_bd_id: BD-081
- id: finance-C-100
when: When implementing FedAvg aggregation logic in federated_run.py
action: Fetch actual node data shapes (sample counts) via controlled API and apply weighted averaging proportional to
local data volumes — must NOT use hardcoded equal weights (0.25 each) in production environments
severity: high
kind: domain_rule
modality: must
consequence: Using equal weights when datasets have heterogeneous sizes causes model convergence bias toward smaller nodes,
producing suboptimal LGD estimates that systematically underestimate risk for larger institutions
derived_from_bd_id: BD-059
- id: finance-C-101
when: When implementing federated coordination and model aggregation in federated_run.py
action: Use Federated Averaging (FedAvg) algorithm with synchronous rounds and parameter-level averaging only — must NOT
implement asynchronous averaging, differential privacy mechanisms, or secure aggregation without explicit architectural
approval
severity: high
kind: architecture_guardrail
modality: must
consequence: Using alternative aggregation methods without architectural review breaks the synchronous FedAvg assumption,
potentially causing parameter staleness, convergence failures, or compatibility issues with existing server implementations
derived_from_bd_id: BD-012
- id: finance-C-103
when: When importing and using the lgdModel module for standalone execution
action: Import lgdModel directly via 'from lgdModel import lgdModel' so that standalone execution has no network dependencies;
do not introduce HTTP client abstractions that would couple core estimation to the model_server layer
severity: high
kind: domain_rule
modality: must
consequence: Refactoring to use HTTP client for lgdModel would break standalone execution, preventing unit testing and
local development without network infrastructure
derived_from_bd_id: BD-018
- id: finance-C-104
when: When implementing linear regression for LGD estimation in federated learning
action: Use SGDRegressor from scikit-learn with partial_fit() for incremental learning; do not replace with OLS closed-form
solution or other batch-only algorithms that require centralized data
severity: high
kind: domain_rule
modality: must
consequence: Switching to OLS or batch-only regression breaks the federated learning architecture, requiring centralized
data aggregation that violates distributed processing assumptions
derived_from_bd_id: BD-020
- id: finance-C-105
when: When implementing LGD estimation logic
action: Use SGDRegressor as the regression algorithm; be aware that switching to alternative algorithms (e.g., RandomForest,
neural networks) requires implementing abstract base class, factory pattern, and serialization abstraction layers
severity: medium
kind: architecture_guardrail
modality: must
consequence: Hardcoded SGDRegressor assumption means alternative algorithms require significant refactoring; strategy
accuracy depends on regression model choice and must be validated independently
derived_from_bd_id: BD-007
- id: finance-C-107
when: When configuring federated learning server infrastructure
action: Use explicit server ID configuration via environment variable instead of port-derived ID (n = port - 5000); verify
port availability in the 5001-5004 range before startup; implement graceful degradation when ports are unavailable
severity: high
kind: architecture_guardrail
modality: must
consequence: Port-derived server ID creates cascading failure risk where port conflicts prevent server startup, causing
health check failures that halt entire federated execution
derived_from_bd_id: BD-084
- id: finance-C-109
when: When implementing model training with partial_fit() in lgdModel.py
action: Use max_iter=1 for each partial_fit() call to maintain explicit per-epoch control in the federated orchestration
loop — do not change to higher values as this blurs the distinction between local and global epochs
severity: high
kind: domain_rule
modality: must
consequence: Increasing max_iter beyond 1 causes local optimization iterations to blend with global federated rounds,
making convergence analysis unreliable and breaking the federated coordination contract
derived_from_bd_id: BD-021
- id: finance-C-110
when: When implementing server ID derivation in model_server.py
action: Assume consecutive port allocation starting from 5001 for server ID derivation — use explicit server ID configuration
instead
severity: medium
kind: operational_lesson
modality: should_not
consequence: Port-based ID derivation assumes a specific port numbering scheme that may not hold in all deployment scenarios,
causing server ID mismatches when ports are allocated non-consecutively
derived_from_bd_id: BD-008
- id: finance-C-111
when: When designing APIs for model serving endpoints
action: Implement external state management for distributed deployments — the /, /start, /update three-endpoint design
assumes stateless operation and does not handle distributed state coordination
severity: high
kind: architecture_guardrail
modality: must
consequence: Stateless API design fails in distributed scenarios where multiple server instances require coordination,
causing inconsistent state across requests
derived_from_bd_id: BD-011
- id: finance-C-112
when: When implementing server lifecycle management
action: Implement the /stop endpoint for graceful shutdown so that in-flight requests complete and state is properly
finalized before termination
severity: medium
kind: operational_lesson
modality: should
consequence: Abrupt server termination without graceful shutdown risks data corruption and leaves clients with incomplete
responses
derived_from_bd_id: BD-047
- id: finance-C-113
when: When setting up project dependencies
action: Use virtual environment isolation for dependency management to prevent conflicts with system packages and ensure
reproducible builds
severity: medium
kind: operational_lesson
modality: should
consequence: System-wide package installation risks breaking system packages and causes dependency conflicts across projects
derived_from_bd_id: BD-049
- id: finance-C-114
when: When integrating with sklearn utilities or ML pipelines
action: Assume the standard sklearn convention (X=features, y=target); the code uses the reversed convention, with X as the target
and y as the explanatory variable
severity: high
kind: domain_rule
modality: must_not
consequence: Reversed X/y convention causes silent failures when using sklearn utilities expecting standard ordering,
producing incorrect model predictions or cryptic errors
derived_from_bd_id: BD-068
- id: finance-C-115
when: When implementing federated round orchestration with sequential blocking
action: Implement timeout handling and retry logic for individual server calls — sequential blocking with BD-072 (/start
before /update) and BD-074 (averaging before next epoch) creates cascading deadlock if any server becomes unresponsive
mid-round
severity: high
kind: architecture_guardrail
modality: must
consequence: A single slow or unresponsive server during /start or /update blocks the entire federated round with no timeout
mechanism, causing cascading timeouts across all rounds
derived_from_bd_id: BD-087
- id: finance-C-116
when: When configuring training epochs for LGD model
action: Centralize epoch configuration in config.yml and import from it in both standalone_run.py and federated_run.py
— do not hardcode epoch values separately
severity: high
kind: domain_rule
modality: must
consequence: Dual-hardcoded epoch values create maintenance hazard; updating epochs in one file but not the other causes
federated and standalone modes to train for different durations, invalidating BD-055 validation baseline
derived_from_bd_id: BD-088
- id: finance-C-117
when: When parsing server port numbers to derive server IDs
action: Hardcode the magic formula n = int(port) - 5000 — the port-to-ID mapping depends on a specific port range allocation
that must remain consistent
severity: medium
kind: architecture_guardrail
modality: must_not
consequence: Hardcoded port offset makes server ID derivation brittle; changing port allocation scheme breaks ID mapping
silently throughout the system
derived_from_bd_id: BD-075
- id: finance-C-118
when: When refactoring training loop code
action: Consider extracting the shared SGD training loop from standalone_run.py and federated_run.py into a common module
to eliminate duplication — duplicate logic in epoch_loop across both files creates maintenance risk
severity: low
kind: operational_lesson
modality: should
consequence: Identical training loop logic duplicated in two files requires synchronized updates; changes applied to one
file but not the other cause divergent behavior between modes
derived_from_bd_id: BD-079
- id: finance-C-119
when: When implementing federated learning fit logic using partial_fit() calls
action: Enable warm_start on the SGDRegressor; warm_start must remain False so that each partial_fit() call starts
fresh without leveraging optimizer state from previous iterations
severity: high
kind: domain_rule
modality: must_not
consequence: Setting warm_start=True preserves optimizer state across partial_fit() calls, causing unintended state carryover
between federated rounds and breaking the parameter averaging protocol semantics
derived_from_bd_id: BD-052
- id: finance-C-120
when: When selecting features for Loss Given Default (LGD) credit risk estimation
action: Verify that current_assets and cash_and_cash_equivalent_items are the intended features — if replacing with alternative
features, verify liquidity characteristics are still captured as these are fundamental to credit risk modeling
severity: medium
kind: operational_lesson
modality: should
consequence: Using alternative features without liquidity coverage may cause the LGD model to underestimate default losses
for asset-heavy borrowers, leading to insufficient provision calculations in live trading
derived_from_bd_id: BD-044
- id: finance-C-122
when: When implementing warm-start functionality using coef_init or intercept_init parameters
action: Explicitly set warm_start=True before calling fit() to enable parameter reuse — without warm_start=True, coef_init
and intercept_init only apply to the first fit() call and subsequent calls will reinitialize parameters, silently discarding
warm-start behavior
severity: high
kind: domain_rule
modality: must
consequence: If warm_start=False (default), coef_init and intercept_init parameters are ignored after the first fit()
call, causing warm-start attempts to silently fail and lose previously learned parameter state
derived_from_bd_id: BD-070
- id: finance-C-123
when: When consuming model output from lgdModel.fit()
action: Verify that the consumer code expects dictionary return type from fitted_params — if integrating with downstream
systems, verify dict interface compatibility or implement explicit type handling; coordinate with team before changing
return format to tuple or dataclass
severity: medium
kind: operational_lesson
modality: must
consequence: Return format assumes dict interface consumer; if downstream systems expect a different type or the return
format changes, data consumption breaks silently causing downstream processing failures
derived_from_bd_id: BD-071
- id: finance-C-124
when: When implementing stateless server endpoints with warm-start functionality
action: Do not rely on stateless server architecture for /update endpoints that require warm-start — implement state persistence
via coordinator tracking iteration state, sticky sessions with persistent server instances, or shared state store; parameters
received via intercept_init/coef_init must be preserved across requests
severity: high
kind: domain_rule
modality: must
consequence: Stateless server design initializes fresh state per request, but warm-start requires parameter state preservation;
parameters passed via intercept_init/coef_init are silently discarded, causing the federated protocol to produce inconsistent
models across rounds
derived_from_bd_id: BD-082
- id: finance-C-125
when: When integrating data from multiple sources using X/Y column conventions
action: Implement schema validation for column mapping between openNPL API fields and X/Y conventions — verify that hardcoded
field mappings (current_assets, cash_and_cash_equivalent_items) in dataSource function remain synchronized with upstream
API schema; add explicit error handling if expected columns are missing or renamed
severity: high
kind: domain_rule
modality: must
consequence: Hardcoded X/Y column convention breaks silently when openNPL API schema changes, causing incorrect feature
extraction with no obvious error; downstream models train on misaligned data producing invalid LGD estimates
derived_from_bd_id: BD-083
- id: finance-C-126
when: When implementing federated learning coordination logic in the framework
action: Implement sequential blocking without timeout handling for inter-server ordering contracts — this creates a cascading
deadlock vulnerability when servers have heterogeneous data volumes
severity: high
kind: architecture_guardrail
modality: must_not
consequence: Sequential blocking with no timeouts causes the federation to deadlock when any server experiences extended
training time due to larger datasets; all participating servers hang waiting for the slowest server, causing complete
federation failure
derived_from_bd_id: BD-089
- id: finance-C-127
when: When implementing federated learning round coordination in the framework
action: Implement timeout handling for inter-server ordering contracts and add data volume heterogeneity checks before
initiating coordination rounds
severity: high
kind: domain_rule
modality: must
consequence: Without timeout handling and heterogeneity checks, the federation will experience cascading deadlocks when
servers have significantly different dataset sizes, causing complete round failures and federation collapse
derived_from_bd_id: BD-089
- id: finance-C-128
when: When implementing or modifying model serving logic
action: Initialize model from scratch for each request using server identifier — do not cache model state in server memory
between requests
severity: high
kind: architecture_guardrail
modality: must
consequence: 'Stateful model servers cause backtest-live inconsistency: in containerized deployments, instances may be
created/destroyed with stale cached state, and load-balanced multi-instance setups may route requests to instances with
outdated models, leading to unpredictable execution results'
derived_from_bd_id: BD-038
- id: finance-C-129
when: When deploying model servers in containerized or load-balanced environments
action: Verify each request loads model parameters from persistent storage (server_dirs/{server_id}) independently — confirm
no in-memory model caching across requests
severity: high
kind: domain_rule
modality: must
consequence: 'In-memory caching of model state causes platform-dependent behavior: different container instances may have
different cached states, making backtest results non-reproducible across deployment configurations'
derived_from_bd_id: BD-038
- id: finance-C-130
when: When implementing federated averaging weight configuration
action: Verify that sample_count is equal across all servers before using equal weights; if sample counts differ significantly,
implement proportional weighting based on actual sample counts per server
severity: medium
kind: operational_lesson
modality: should
consequence: Equal weighting (25% each) silently distorts federated model accuracy when servers have unequal data volumes;
in production, servers with smaller datasets are over-weighted while larger datasets are under-weighted, leading to
suboptimal model convergence and degraded prediction accuracy
derived_from_bd_id: BD-013/BD-015
- id: finance-C-131
when: When scaling the federated learning system beyond demo scale (>4 servers)
action: Replace blocking sequential HTTP requests with async parallel execution (asyncio with aiohttp) or thread pool
to reduce latency from O(n) linear scaling to near-constant time
severity: medium
kind: operational_lesson
modality: should
consequence: Sequential blocking communication creates linear latency growth O(n) with server count; for 8+ servers, round-trip
time doubles compared to parallel execution, causing unacceptable delays in production federated training rounds
derived_from_bd_id: BD-016
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-112 / Sphinx Documentation Configuration
version: v5.3
intent_keywords:
- documentation
- sphinx
- configuration
- build docs
- project setup
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
groups:
- group_id: all
name: All Capabilities
description: ''
emoji: 📦
uc_count: 1
ucs:
- uc_id: UC-101
name: Sphinx Documentation Configuration
short_description: This file configures the Sphinx documentation builder for the openLGD project, setting up project
metadata, version information, and path configuratio
sample_triggers:
- documentation
- sphinx
- configuration
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try sphinx documentation configuration
auto_selected: true
- uc_id: UC-100
beginner_prompt: Try capability UC-100
auto_selected: true
- uc_id: UC-101
beginner_prompt: Try capability UC-101
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 1 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- Sphinx Documentation Configuration
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
- Institutional fund holdings tracker via joinquant_fund_runner pattern
- Custom Transformer + Accumulator factor with per-entity rolling state
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
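Climate and ESG investment analysis with Fama-French factor models: supports monthly stock price downloads, factor correlation calculation, OLS regression diagnostics, and significance screening to help users build factor portfolios and assess risk.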
---
name: climate-esg-investing
description: |-
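Climate and ESG investment analysis with Fama-French factor models: supports monthly stock price downloads, factor correlation calculation, OLS regression diagnostics, and significance screening to help users build factor portfolios and assess risk.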
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-105"
compiled_at: "2026-04-22T13:00:49.775031+00:00"
capability_markets: "global"
capability_activities: "macro-data"
sop_version: "crystal-compilation-v6.1"
---
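# ESG Climate Investing (climate-esg-investing)
> Climate and ESG investment analysis with Fama-French factor models: supports monthly stock price downloads, factor correlation calculation, OLS regression diagnostics, and significance screening to help users build factor portfolios and assess risk.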
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (9 total)
### Sector Stock Count and Significant Factor Regression Analyzer (`UC-101`)
Identifies how many stocks from an index fall into each sector and screens for stocks with statistically significant factor regression results based o
**Triggers**: sector composition, significant regression, p-value screening
### Factor Correlation Calculator (`UC-102`)
Computes correlations between different factors over time to understand factor relationship dynamics and potential multicollinearity issues
**Triggers**: factor correlation, correlation matrix, factor relationships
### OLS Regression with Diagnostic Statistics (`UC-103`)
Performs ordinary least squares regression on factor data with comprehensive diagnostic tests including Durbin-Watson, Jarque-Bera, and Breusch-Pagan
**Triggers**: OLS regression, diagnostic tests, statistical tests
For all **9** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (14 total)
- **`AP-MACRO-DATA-001`**: SEC EDGAR Rate Limit Violation
- **`AP-MACRO-DATA-002`**: Temporal Knowledge Graph Look-Ahead Bias
- **`AP-MACRO-DATA-003`**: Technical Indicator Look-Ahead Bias via Missing Shift
All 14 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-105. Evidence verify ratio = 3.3% and audit fail total = 20. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative spec (source-of-truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 14 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-105` blueprint at 2026-04-22T13:00:49.775031+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['OLS Regression with Diagnostic Statistics', 'Factor Correlation Calculator', 'Sector Stock Count and Significant Factor Regression Analyzer', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **14**
## finance-bp-074--FinRobot (1)
### `AP-MACRO-DATA-001` — SEC EDGAR Rate Limit Violation <sub>(high)</sub>
When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10 requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits and missing User-Agent headers compound this by causing silent request failures.
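A minimal sketch of the rate-limiting decorator idea, using only the standard library. The names `rate_limited` and `build_sec_request` and the User-Agent value are hypothetical placeholders, not FinRobot's actual code, and the limiter is single-threaded only.

```python
import time
from functools import wraps


def rate_limited(max_per_second: float):
    """Space successive calls at least 1/max_per_second seconds apart.

    Hypothetical helper for illustration; not thread-safe.
    """
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [float("-inf")]  # monotonic time at which the last call started

        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)  # hold back until the interval has elapsed
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited(max_per_second=10)  # the 10 requests/second limit named above
def build_sec_request(url: str) -> dict:
    """Return request parameters with an explicit User-Agent (placeholder value)."""
    return {"url": url, "headers": {"User-Agent": "Example Research admin@example.com"}}
```

Any real HTTP call would go inside the decorated function, so every request passes through the same limiter.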
## finance-bp-077--Open_Source_Economic_Model (2)
### `AP-MACRO-DATA-004` — EIOPA Non-Compliant Curve Extrapolation <sub>(high)</sub>
When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
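The convergence-point rule quoted above can be written as a one-line helper. This sketch assumes `U` denotes the last liquid point in years; the function name is illustrative and is not taken from the source project.

```python
def convergence_point(last_liquid_point: float) -> float:
    """Convergence point in years, following the max(U + 40, 60) rule quoted above.

    Assumption: `last_liquid_point` is U, expressed in years.
    """
    return max(last_liquid_point + 40, 60)
```

For example, a last liquid point of 20 years gives 60, while 30 years gives 70.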
### `AP-MACRO-DATA-009` — CSV BOM Encoding Corruption in Data Import <sub>(medium)</sub>
When CSV portfolio files are imported without the 'utf-8-sig' encoding, files that begin with a UTF-8 BOM marker fail to parse correctly. The BOM silently corrupts the first column name read by pandas, so reading row fields raises KeyError and the portfolio data does not load at all.
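The effect can be reproduced with the standard-library `csv` module, used here instead of pandas to keep the sketch dependency-free. The column names `ticker` and `weight` are made up for illustration.

```python
import csv
import io


def read_rows(data: bytes, encoding: str) -> list:
    """Decode raw CSV bytes with the given encoding and return rows as dicts."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Bytes as a BOM-writing tool would produce them (hypothetical portfolio file)
raw = "ticker,weight\nAAA,0.6\n".encode("utf-8-sig")

bad_rows = read_rows(raw, "utf-8")       # first key becomes '\ufeffticker'
good_rows = read_rows(raw, "utf-8-sig")  # BOM stripped, first key is 'ticker'
```

With plain `utf-8`, `bad_rows[0]["ticker"]` raises `KeyError`, which matches the failure described above.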
## finance-bp-080--FinDKG (3)
### `AP-MACRO-DATA-002` — Temporal Knowledge Graph Look-Ahead Bias <sub>(high)</sub>
When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes the model to see future events during training. Violating the requirement that train_edges precede val_edges and test_edges in time results in inflated metrics that do not reflect real-world performance. This produces overfit models that fail catastrophically when deployed for actual temporal prediction tasks.
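A chronological split can be sketched as follows, assuming each edge is a tuple whose first element is a sortable timestamp. The function name and the 80/10/10 fractions are illustrative, not FinDKG's actual configuration.

```python
def temporal_split(edges, train_frac=0.8, val_frac=0.1):
    """Split edges chronologically so train precedes val, which precedes test.

    Assumption: edges are tuples of (timestamp, head, relation, tail).
    """
    ordered = sorted(edges, key=lambda e: e[0])  # oldest first
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Edges that share a timestamp can still straddle a boundary; a stricter variant would cut on timestamp values rather than on row counts.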
### `AP-MACRO-DATA-008` — DGL Graph Attribute Propagation Failure in Temporal Batching <sub>(medium)</sub>
When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations, num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs, causing training to fail with AttributeError.
### `AP-MACRO-DATA-014` — Temporal DataLoader Shuffling Breaking Graph Ordering <sub>(medium)</sub>
When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
## finance-bp-083--Economic-Dashboard (3)
### `AP-MACRO-DATA-003` — Technical Indicator Look-Ahead Bias via Missing Shift <sub>(high)</sub>
When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this with 'we need the current bar signal immediately' leads to future information leaking into current signals.
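The prior-bar comparison can be sketched in plain Python. Indexing `i - 1` plays the role that `shift(1)` plays in pandas. The function names and the small window sizes are illustrative only.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def golden_cross_bars(closes, fast=2, slow=3):
    """Return bar indices where the fast SMA moves above the slow SMA.

    Assumes fast < slow. The state at bar i is compared with the state at
    bar i - 1, so a cross is only confirmed once bar i has closed.
    """
    fast_sma, slow_sma = sma(closes, fast), sma(closes, slow)
    crosses = []
    for i in range(1, len(closes)):
        if slow_sma[i - 1] is None:
            continue  # prior bar has no full slow window yet
        was_above = fast_sma[i - 1] > slow_sma[i - 1]
        is_above = fast_sma[i] > slow_sma[i]
        if is_above and not was_above:
            crosses.append(i)
    return crosses
```

A cross confirmed at the close of bar `i` would be acted on at bar `i + 1`, consistent with the next-bar execution rule in SL-02.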
### `AP-MACRO-DATA-010` — OHLCV Data Quality Validation Failure <sub>(medium)</sub>
When technical indicators are calculated from OHLCV data without first verifying the required columns (open, high, low, close, volume), a missing column raises ValueError and prevents indicator calculation for the affected tickers. This blocks downstream regime classification and pattern detection for every ticker with incomplete data.
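A pre-check along these lines can separate usable tickers from incomplete ones before any indicator runs. The helper names and the case-insensitive matching are assumptions made for this sketch.

```python
REQUIRED_OHLCV = ("open", "high", "low", "close", "volume")


def missing_ohlcv_columns(columns):
    """Return the required OHLCV columns absent from `columns` (case-insensitive)."""
    present = {str(c).lower() for c in columns}
    return [c for c in REQUIRED_OHLCV if c not in present]


def partition_tickers(columns_by_ticker: dict):
    """Split tickers into (usable, skipped); skipped maps ticker -> missing columns."""
    usable, skipped = [], {}
    for ticker, columns in columns_by_ticker.items():
        missing = missing_ohlcv_columns(columns)
        if missing:
            skipped[ticker] = missing
        else:
            usable.append(ticker)
    return usable, skipped
```

Reporting the skipped tickers explicitly keeps one incomplete dataset from silently dropping out of the downstream steps.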
### `AP-MACRO-DATA-011` — Inconsistent Primary Key Schema Causing JOIN Failures <sub>(medium)</sub>
When storing derived features in DuckDB with a different primary key schema than technical_features table, inconsistent primary keys prevent JOIN operations between tables. This breaks regime classification and pattern detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient querying and data integrity.
## finance-bp-105--open-climate-investing (5)
### `AP-MACRO-DATA-005` — Factor Regression Using Raw Returns Instead of Excess Returns <sub>(high)</sub>
When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns (Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure. This leads to fundamentally flawed risk attribution and portfolio construction decisions.
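The excess-return step is small enough to show directly. This sketch assumes both series are already aligned by period and expressed in the same units; the function name is illustrative.

```python
def excess_returns(stock_returns, risk_free):
    """Subtract the per-period risk-free rate (Rf) from each stock return.

    Assumption: both inputs are aligned by period and use the same units
    (both decimals, or both percentages).
    """
    if len(stock_returns) != len(risk_free):
        raise ValueError("return series and risk-free series must have equal length")
    return [r - rf for r, rf in zip(stock_returns, risk_free)]
```

The result, not the raw return series, is what would be passed to the CAPM/Fama-French regression as the dependent variable.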
### `AP-MACRO-DATA-006` — Percentage vs Decimal Unit Mismatch in Factor Data <sub>(high)</sub>
When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2) by 100 before regression causes coefficients to be scaled by 100x. This produces statistically invalid inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
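The scaling effect can be checked on a toy single-factor regression. The data below is made up so that the true loading is 1.2, and `ols_slope` is a hand-rolled helper for illustration rather than the project's regression code.

```python
def percent_to_decimal(values):
    """Convert percentage-formatted values (e.g. 5.2) to decimals (0.052)."""
    return [v / 100.0 for v in values]


def ols_slope(x, y):
    """Slope of a one-variable least-squares fit of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


factor_pct = [1.0, -2.0, 3.0, 0.5]            # factor as stored in the CSV, in percent
stock_dec = [0.012, -0.024, 0.036, 0.006]     # stock returns in decimals (toy data)

slope_wrong = ols_slope(factor_pct, stock_dec)                      # units mismatched
slope_right = ols_slope(percent_to_decimal(factor_pct), stock_dec)  # units consistent
```

With mismatched units the estimated loading comes out 100 times too small in this setup; converting first recovers the intended value.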
### `AP-MACRO-DATA-007` — Insufficient Regression Observations for Statistical Validity <sub>(medium)</sub>
When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join, winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise. This commonly occurs when combining multiple data sources with missing values.
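One way to guard against this is to count observations after all filtering and refuse to fit when the sample is too small. The sketch below assumes the threshold of 20 named above; the function name and the extra degrees-of-freedom check are illustrative additions.

```python
MIN_OBSERVATIONS = 20  # threshold named in the text above


def check_observations(n_obs, n_regressors, minimum=MIN_OBSERVATIONS):
    """Return residual degrees of freedom, or raise if the sample is too small."""
    if n_obs < minimum:
        raise ValueError(
            f"only {n_obs} observations after filtering; need at least {minimum}"
        )
    # Residual degrees of freedom for OLS with an intercept.
    dof = n_obs - n_regressors - 1
    if dof <= 0:
        raise ValueError(f"{n_regressors} regressors leave no degrees of freedom")
    return dof
```

The check belongs after the inner join, winsorization, and date filtering, since those are the steps that shrink the sample.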
### `AP-MACRO-DATA-012` — Frequency Column Enforcement Missing in Time Series Schema <sub>(medium)</sub>
When a PostgreSQL schema for time series tables does not constrain the frequency column to the text values 'MONTHLY' or 'DAILY', rows of mixed frequency can enter the same table and corrupt regression calculations. Combining incompatible data frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent silent data corruption.
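The sketch below shows a `CHECK` constraint on the frequency column. It runs against Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `CHECK (... IN (...))` clause itself is standard SQL. The table name `stock_returns` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stock_returns (
        ticker    TEXT NOT NULL,
        date      TEXT NOT NULL,
        frequency TEXT NOT NULL CHECK (frequency IN ('MONTHLY', 'DAILY')),
        ret       REAL,
        PRIMARY KEY (ticker, frequency, date)
    )
    """
)


def insert_return(conn, ticker, date, frequency, ret):
    """Insert one return row; the database rejects unknown frequencies."""
    conn.execute(
        "INSERT INTO stock_returns VALUES (?, ?, ?, ?)",
        (ticker, date, frequency, ret),
    )


def returns_for(conn, ticker, frequency):
    """Read returns for exactly one frequency so a regression never mixes them."""
    cur = conn.execute(
        "SELECT date, ret FROM stock_returns "
        "WHERE ticker = ? AND frequency = ? ORDER BY date",
        (ticker, frequency),
    )
    return cur.fetchall()
```

Including `frequency` in the primary key lets daily and monthly rows for the same date coexist, while the read path always filters on a single frequency.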
### `AP-MACRO-DATA-013` — PostgreSQL Fork in Multiprocessing Context <sub>(medium)</sub>
When parallel regression execution uses multiprocessing with the fork start method, child processes inherit the parent's psycopg2 database connections in an unusable state. This surfaces as 'connection already closed' errors or corrupted connection state in the children, resulting in failed database writes and incomplete factor regression calculations.
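A common remedy, sketched below under stated assumptions, is to open a fresh connection inside each worker and to request the `spawn` start method so that no connection object is inherited. `sqlite3` stands in for psycopg2 so the example runs without a database server; the table `regression_results` and the function names are hypothetical.

```python
import multiprocessing as mp
import sqlite3


def init_results_table(db_path):
    """Create the (hypothetical) results table once, from the parent process."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS regression_results "
                "(ticker TEXT PRIMARY KEY, status TEXT)"
            )
    finally:
        conn.close()


def regress_and_store(task):
    """Worker: open its own connection, write one result, close it."""
    db_path, ticker = task
    # Stand-in for psycopg2.connect(dsn); never reuse a connection from the parent.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO regression_results VALUES (?, ?)",
                (ticker, "done"),
            )
    finally:
        conn.close()
    return ticker


def run_parallel(db_path, tickers, processes=2):
    """Fan out over tickers using the spawn start method."""
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(regress_and_store, [(db_path, t) for t in tickers])
```

With `spawn`, the call to `run_parallel` must sit under an `if __name__ == "__main__":` guard in the launching script, because each child re-imports the main module.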
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-105--open-climate-investing
**Scan date**: 2026-04-22
**Stats**: 7 files, 25 classes, 0 functions, 7 stages
## Modules (7)
- [stock_data_collection](components/stock_data_collection.md): 2 classes
- [database_setup_&_data_import](components/database_setup_-_data_import.md): 4 classes
- [bmg_factor_computation](components/bmg_factor_computation.md): 4 classes
- [factor_regression_analysis](components/factor_regression_analysis.md): 4 classes
- [bulk_regression_execution](components/bulk_regression_execution.md): 5 classes
- [factor_correlation_&_orthogonalization](components/factor_correlation_-_orthogonalization.md): 3 classes
- [regression_results_analysis](components/regression_results_analysis.md): 3 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 124
fatal_constraints_count: 39
non_fatal_constraints_count: 139
use_cases_count: 9
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
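A minimal sketch of next-bar execution on a plain list of per-bar signals: a signal computed on bar t is only acted on at bar t+1. The helper name is an assumption; with pandas the equivalent idea is usually expressed with `shift(1)`.

```python
def to_next_bar(signals):
    """Delay each signal by one bar; the first bar has nothing to act on."""
    signals = list(signals)
    if not signals:
        return []
    # The last signal is dropped because no later bar exists to execute it.
    return [None] + signals[:-1]
```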
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
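The format can be checked with a small parser like the sketch below. The `EntityId` tuple and the function name are hypothetical, and keeping any further underscores inside the code part is an assumption.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    entity_type: str
    exchange: str
    code: str


def parse_entity_id(entity_id):
    """Split 'entity_type_exchange_code' into its three parts or raise."""
    # Assumption: only the first two underscores are separators.
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"entity id {entity_id!r} does not match entity_type_exchange_code"
        )
    return EntityId(*parts)
```

For example, `parse_entity_id("stock_sh_600000")` yields the entity type `stock`, the exchange `sh`, and the code `600000`.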
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
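The exactly-one rule can be enforced at construction time. The class below is a hypothetical stand-in written for illustration, not the framework's own `TradingSignal`; it only models the three sizing fields named in the lock.

```python
from dataclasses import dataclass
from typing import Optional

SIZING_FIELDS = ("position_pct", "order_money", "order_amount")


@dataclass(frozen=True)
class SignalSizing:
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def __post_init__(self):
        # A value of 0 still counts as "provided"; only None means "unset".
        provided = [name for name in SIZING_FIELDS if getattr(self, name) is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one of {SIZING_FIELDS} must be set, got {provided}"
            )
```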
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
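The three-valued semantics are easy to break with a plain truthiness test, because `None` and `False` are both falsy. The sketch below maps each value explicitly; the function name and the action labels are assumptions, and it handles plain Python values only (not NumPy scalar types).

```python
import math


def action_for(filter_result):
    """Map a filter_result cell to BUY, SELL, or NO_ACTION."""
    if filter_result is None:
        return "NO_ACTION"
    if isinstance(filter_result, float) and math.isnan(filter_result):
        return "NO_ACTION"
    if isinstance(filter_result, bool):
        return "BUY" if filter_result else "SELL"
    raise ValueError(f"unexpected filter_result value: {filter_result!r}")
```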
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run `python3 -m pip install zvt`, then run `python3 -m zvt.init_dirs` to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run the recorder first: `python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000` (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run `python3 -m zvt.init_dirs`
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions (`chmod u+w ~/.zvt`) or set the `ZVT_HOME` environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **9**
## `KUC-101`
**Source**: `scripts/bmg_analyze.py`
Identifies how many stocks from an index fall into each sector and screens for stocks with statistically significant factor regression results based on p-values.
## `KUC-102`
**Source**: `scripts/correlate.py`
Computes correlations between different factors over time to understand factor relationship dynamics and potential multicollinearity issues.
## `KUC-103`
**Source**: `scripts/regression_function.py`
Performs ordinary least squares regression on factor data with comprehensive diagnostic tests including Durbin-Watson, Jarque-Bera, and Breusch-Pagan tests.
## `KUC-104`
**Source**: `scripts/bulk_script.py`
Builds custom Fama-French style factor models by merging stock returns, Fama-French factors, and carbon risk factors into unified datasets for analysis.
## `KUC-105`
**Source**: `scripts/stock_price_function.py`
Downloads historical stock price data from Yahoo Finance with support for daily and monthly frequencies, including automatic retry on timeout.
## `KUC-106`
**Source**: `scripts/bmg_series.py`
Creates Brown-Minus-Green (BMG) factor series by calculating the return differential between brown (high carbon) and green (low carbon) stocks for carbon risk analysis.
## `KUC-107`
**Source**: `scripts/get_regressions.py`
Executes factor regression analysis across multiple stocks in parallel using multiprocessing, loading Fama-French and carbon risk factors from database.
## `KUC-108`
**Source**: `scripts/get_stocks.py`
Imports stock return data from CSV or downloads from yfinance, with support for incremental updates to maintain current database with stock returns.
## `KUC-109`
**Source**: `scripts/setup_db.py`
Initializes database schema and imports Fama-French, bond, and carbon risk factors into PostgreSQL tables, including BMG factor data.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-MACRO-DATA-001` — Temporal Ordering Enforcement
**From**: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline to prevent look-ahead bias that inflates evaluation metrics.
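A minimal sketch of a chronological split that never shuffles and rejects a split where one timestamp straddles a boundary. The function name, the default fractions, and the convention that the timestamp is the first element of each record are assumptions.

```python
def chronological_split(records, train_frac=0.7, val_frac=0.15, key=lambda r: r[0]):
    """Split records into train/val/test by time, oldest first."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    ordered = sorted(records, key=key)
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    parts = (ordered[:train_end], ordered[train_end:val_end], ordered[val_end:])

    # Enforce strict ordering: the last item of each part must precede the next part.
    non_empty = [p for p in parts if p]
    for earlier, later in zip(non_empty, non_empty[1:]):
        if key(earlier[-1]) >= key(later[0]):
            raise ValueError("a timestamp straddles a split boundary")
    return parts
```

Any loader built on top of these parts should keep the order as returned instead of shuffling.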
## `CW-MACRO-DATA-002` — Regulatory Formula Compliance
**From**: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When implementing financial calculations governed by regulatory or established methodological specifications (EIOPA Solvency II, CAPM, Fama-French), use the exact formulas from authoritative sources. The Smith-Wilson convergence point must follow EIOPA paragraph 120; factor regressions must use excess returns with properly scaled inputs. Apply this pattern when calculations will be used for regulatory reporting or investment decision-making.
## `CW-MACRO-DATA-003` — Strict Data Schema Enforcement
**From**: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Financial data pipelines require strict schema validation at ingestion points. OHLCV data requires specific columns; CSV imports require column names that exactly match the fields accessed; INI files require specific sections. Missing or malformed schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch errors early before downstream calculations use bad data.
## `CW-MACRO-DATA-004` — Composite Primary Key Uniqueness
**From**: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply this pattern when designing any financial database schema involving time-series measurements with multiple entities.
## `CW-MACRO-DATA-005` — External API Rate Limiting
**From**: finance-bp-074--FinRobot · **Applicable to**: macro-data
When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented before deployment. SEC EDGAR enforces a limit of 10 requests per second and blocks IP addresses that exceed it. Use rate-limiting decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent service disruption that blocks critical data access.
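A rate-limiting decorator could look like the sketch below. It enforces a minimum gap between consecutive calls, which is one simple way to stay under a per-second cap. The decorator name and the injectable `clock` and `sleep` parameters are assumptions made so the behavior can be tested without waiting; the sketch is not thread-safe.

```python
import functools
import time


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Decorator enforcing a minimum gap of 1/max_per_second between calls."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = [None]  # mutable cell holding the time of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = min_interval - (clock() - last_call[0])
                if wait > 0:
                    sleep(wait)
            last_call[0] = clock()
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Wrapping the single function that issues HTTP requests, for example with `@rate_limited(10)`, keeps every caller under the limit without each call site having to remember it.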
## `CW-MACRO-DATA-006` — Graph Attribute Propagation in Batching
**From**: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing · **Applicable to**: macro-data
When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes, num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks to prevent training failures.
## `CW-MACRO-DATA-007` — Statistical Validity Thresholds
**From**: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard · **Applicable to**: macro-data
Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful rather than spurious.
## `CW-MACRO-DATA-008` — Data Type Strictness for ML Operations
**From**: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model · **Applicable to**: macro-data
Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline to catch dtype issues early.
FILE:references/components/bmg_factor_computation.md
# bmg_factor_computation (4 classes)
## `add_bmg_series`
`bmg_factor_computation/add-bmg-series.py:0`
## `get_bmg_series`
`bmg_factor_computation/get-bmg-series.py:0`
## `load_stocks_returns_from_db`
`bmg_factor_computation/load-stocks-returns-from-db.py:0`
## `factor_definition`
`bmg_factor_computation/factor-definition.py:0`
FILE:references/components/bulk_regression_execution.md
# bulk_regression_execution (5 classes)
## `run_regression`
`bulk_regression_execution/run-regression.py:0`
## `run_regression_internal`
`bulk_regression_execution/run-regression-internal.py:0`
## `store_regression_into_db`
`bulk_regression_execution/store-regression-into-db.py:0`
## `window_type`
`bulk_regression_execution/window-type.py:0`
## `parallelization`
`bulk_regression_execution/parallelization.py:0`
FILE:references/components/database_setup_-_data_import.md
# database_setup_&_data_import (4 classes)
## `load_data_files`
`database_setup_&_data_import/load-data-files.py:0`
## `import_monthly_ff_data_factors_into_sql`
`database_setup_&_data_import/import-monthly-ff-data-factors-into-sql.py:0`
## `import_msci_constituents_into_sql`
`database_setup_&_data_import/import-msci-constituents-into-sql.py:0`
## `database`
`database_setup_&_data_import/database.py:0`
FILE:references/components/factor_correlation_-_orthogonalization.md
# factor_correlation_&_orthogonalization (3 classes)
## `process_factor`
`factor_correlation_&_orthogonalization/process-factor.py:0`
## `execute_batch`
`factor_correlation_&_orthogonalization/execute-batch.py:0`
## `orthogonalization_method`
`factor_correlation_&_orthogonalization/orthogonalization-method.py:0`
FILE:references/components/factor_regression_analysis.md
# factor_regression_analysis (4 classes)
## `run_regression`
`factor_regression_analysis/run-regression.py:0`
## `regression_input_output`
`factor_regression_analysis/regression-input-output.py:0`
## `merge_data`
`factor_regression_analysis/merge-data.py:0`
## `regression_model`
`factor_regression_analysis/regression-model.py:0`
FILE:references/components/regression_results_analysis.md
# regression_results_analysis (3 classes)
## `get_stocks_with_significant_regressions`
`regression_results_analysis/get-stocks-with-significant-regressions.py:0`
## `get_sectors_with_significant_final_regression`
`regression_results_analysis/get-sectors-with-significant-final-regre.py:0`
## `significance_criteria`
`regression_results_analysis/significance-criteria.py:0`
FILE:references/components/stock_data_collection.md
# stock_data_collection (2 classes)
## `stock_grabber`
`stock_data_collection/stock-grabber.py:0`
## `data_source`
`stock_data_collection/data-source.py:0`
FILE:references/seed.yaml
meta:
id: finance-bp-105-v5.3
version: v6.1
blueprint_id: finance-bp-105
sop_version: crystal-compilation-v6.1
source_language: en
compiled_at: '2026-04-22T13:00:49.775031+00:00'
target_host: openclaw
authoritative_artifact:
primary: seed.yaml
non_authoritative_derivatives:
- SKILL.md (host-generated summary, may lag)
- HEARTBEAT.md (host telemetry)
- memory/*.md (host conversational memory)
rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
execution_protocol:
install_trigger:
- Execute resources.host_adapter.install_recipes[] in declared order
- Verify each package with import check before proceeding
execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)
on_execute:
- Reload seed.yaml (do not rely on SKILL.md or cached summaries)
- Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
- Enter context_state_machine.CA1_MEMORY_CHECKED state
- Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
- Translate user_facing_fields to user locale per locale_contract
- "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
\ MUST read them in order:\n 1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n 2. cross_project_wisdom[]\
\ — cross-project wisdom (with CW-* ids)\n 3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
\ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n 5. component_capability_map — AST component map\
\ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
\ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
\ KUC-* with example file."
workspace_resolution:
scripts_path: '{host_workspace}/scripts/'
skills_path: '{host_workspace}/skills/'
trace_path: '{host_workspace}/.trace/'
capability_tags:
markets:
- global
activities:
- macro-data
upgraded_from: finance-bp-105-v1.seed.yaml
upgraded_at: '2026-04-22T13:20:28.159836+00:00'
v6_inputs:
ast_mind_map: knowledge/sources/finance/finance-bp-105--open-climate-investing/v6_inputs/ast_mind_map.yaml
anti_patterns: null
cross_project_wisdom: null
examples_kuc: knowledge/sources/finance/finance-bp-105--open-climate-investing/v6_inputs/examples_kuc.yaml
shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-MACRO-DATA-001
title: SEC EDGAR Rate Limit Violation
description: When implementing SEC API calls without applying rate limiting decorators, requests exceed the regulatory 10
requests per second limit. This causes IP blocking from SEC EDGAR, preventing all subsequent access to critical financial
filings and completely disrupting the data collection pipeline. FinRobot demonstrates that SEC enforces strict rate limits
and missing User-Agent headers compound this by causing silent request failures.
project_source: finance-bp-074--FinRobot
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-002
title: Temporal Knowledge Graph Look-Ahead Bias
description: When implementing temporal data splitting for knowledge graphs, using non-temporal train/val/test splits causes
the model to see future events during training. The violation of train_edges occurring before val_edges and test_edges
temporally results in inflated metrics that do not reflect real-world performance. This produces overfit models that fail
catastrophically when deployed for actual temporal prediction tasks.
project_source: finance-bp-080--FinDKG
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-003
title: Technical Indicator Look-Ahead Bias via Missing Shift
description: When implementing SMA crossover detection (golden/death cross) without using shift(1) to compare current bar
state with prior bar state, crossover detection uses current bar data causing look-ahead bias. Signals appear to fire
at the same bar as the cross occurs, producing unrealistic backtest results that fail in live trading. Rationalizing this
with 'we need the current bar signal immediately' leads to future information leaking into current signals.
project_source: finance-bp-083--Economic-Dashboard
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-004
title: EIOPA Non-Compliant Curve Extrapolation
description: When implementing the Smith-Wilson algorithm for EIOPA Solvency II calculations, using non-EIOPA compliant
formulas or incorrect convergence point calculations violates regulatory specifications. The convergence point must use
max(U+40, 60) years per EIOPA paragraph 120. Non-compliant formulas will fail regulatory audits for insurance liability
calculations and produce incorrect risk-free rates, leading to materially wrong liability valuations.
project_source: finance-bp-077--Open_Source_Economic_Model
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-005
title: Factor Regression Using Raw Returns Instead of Excess Returns
description: When computing returns for CAPM/Fama-French factor regression, using raw stock returns instead of subtracting
the risk-free rate (Rf) violates standard financial econometric methodology. CAPM/FF regression requires excess returns
(Return - Rf); using raw returns produces incorrect beta estimates that misrepresent a stock's systematic risk exposure.
This leads to fundamentally flawed risk attribution and portfolio construction decisions.
project_source: finance-bp-105--open-climate-investing
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-006
title: Percentage vs Decimal Unit Mismatch in Factor Data
description: When importing Fama-French factors from CSV files, failing to divide percentage-formatted factors (e.g., 5.2)
by 100 before regression leaves the estimated coefficients mis-scaled by a factor of 100. This produces statistically invalid
inference and meaningless factor loadings. The same issue applies to risk-free rate values, corrupting all CAPM beta calculations downstream.
project_source: finance-bp-105--open-climate-investing
severity: high
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-007
title: Insufficient Regression Observations for Statistical Validity
description: When implementing factor regression analysis, using fewer than 20 data points after filtering (inner join,
winsorization, date range) produces unreliable or undefined t-statistics and p-values. OLS with insufficient observations
produces meaningless regression coefficients, making it impossible to distinguish significant factor exposures from noise.
This commonly occurs when combining multiple data sources with missing values.
project_source: finance-bp-105--open-climate-investing
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-008
title: DGL Graph Attribute Propagation Failure in Temporal Batching
description: When implementing temporal knowledge graph data collation without propagating graph attributes (num_relations,
num_all_nodes, time_interval) to subgraph variants created by collate_fn, downstream model components encounter missing
attribute errors. The EmbeddingUpdater and EdgeModel expect these attributes on all graph objects including subgraphs,
causing training to fail with AttributeError.
project_source: finance-bp-080--FinDKG
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-009
title: CSV BOM Encoding Corruption in Data Import
description: When importing CSV portfolio files with special characters without using 'utf-8-sig' encoding to handle BOM
markers, CSV files with UTF-8 BOM markers fail to parse correctly. This causes KeyError exceptions when reading row fields,
preventing portfolio data from loading entirely. The BOM marker silently corrupts the first column name read by pandas.
project_source: finance-bp-077--Open_Source_Economic_Model
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-010
title: OHLCV Data Quality Validation Failure
description: Calculating technical indicators from OHLCV data without first verifying that the required columns (open, high,
low, close, volume) are present raises a ValueError for every ticker with missing columns. No indicators are produced for
those tickers, which blocks downstream regime classification and pattern detection for all tickers with incomplete data.
project_source: finance-bp-083--Economic-Dashboard
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-011
title: Inconsistent Primary Key Schema Causing JOIN Failures
description: When derived features are stored in DuckDB under a primary key schema that differs from the technical_features
table, the inconsistent primary keys prevent JOIN operations between the tables. This breaks regime classification and pattern
detection pipelines. The composite primary key (ticker, date) must be consistent across all feature tables to enable efficient
querying and data integrity.
project_source: finance-bp-083--Economic-Dashboard
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-012
title: Frequency Column Enforcement Missing in Time Series Schema
description: When a PostgreSQL schema for time series tables does not constrain the frequency column to the text values 'MONTHLY'
or 'DAILY', rows of mixed frequency can enter the same table and corrupt regression calculations. Combining incompatible data
frequencies produces statistically invalid regression results. The database must enforce frequency consistency to prevent
silent data corruption.
project_source: finance-bp-105--open-climate-investing
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-013
title: PostgreSQL Fork in Multiprocessing Context
description: When parallel regression execution uses multiprocessing with the fork start method, child processes inherit the
parent's psycopg2 database connections in an unusable state. This surfaces as 'connection already closed' errors or corrupted
connection state in the children, resulting in failed database writes and incomplete factor regression calculations.
project_source: finance-bp-105--open-climate-investing
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
- id: AP-MACRO-DATA-014
title: Temporal DataLoader Shuffling Breaking Graph Ordering
description: When configuring DataLoader for temporal knowledge graph training with shuffle=True, the temporal ordering
required for cumulative graph construction is violated. The model receives edges in non-chronological order, breaking
the prior_G, batch_G, cumulative_G construction logic that depends on edges_before_batch occurring before edges_in_batch.
project_source: finance-bp-080--FinDKG
severity: medium
applicable_to_tags:
markets:
- global
activities:
- macro-data
_source_file: anti-patterns/macro-data.yaml
cross_project_wisdom:
- wisdom_id: CW-MACRO-DATA-001
source_project: finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
pattern_name: Temporal Ordering Enforcement
description: Across temporal knowledge graphs and financial time series, strict temporal ordering must be enforced in train/val/test
splits and data loading. Train edges must occur strictly before validation edges, which must occur strictly before test
edges. DataLoaders must never shuffle temporal data. Apply this pattern whenever implementing any time-series ML pipeline
to prevent look-ahead bias that inflates evaluation metrics.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-002
source_project: finance-bp-077--Open_Source_Economic_Model, finance-bp-105--open-climate-investing
pattern_name: Regulatory Formula Compliance
description: When implementing financial calculations governed by regulatory or established methodological specifications
(EIOPA Solvency II, CAPM, Fama-French), use the exact formulas from authoritative sources. The Smith-Wilson convergence point
must follow EIOPA paragraph 120; factor regressions must use excess returns with properly scaled inputs. Apply this pattern
when calculations will be used for regulatory reporting or investment decision-making.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-003
source_project: finance-bp-083--Economic-Dashboard, finance-bp-077--Open_Source_Economic_Model
pattern_name: Strict Data Schema Enforcement
description: Financial data pipelines require strict schema validation at ingestion points. OHLCV data requires specific columns;
CSV imports require column names that exactly match the fields accessed; INI files require specific sections. Missing or malformed
schema elements should fail loudly rather than produce silent corruption. Apply this pattern during data import to catch
errors early before downstream calculations use bad data.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-004
source_project: finance-bp-105--open-climate-investing, finance-bp-080--FinDKG, finance-bp-083--Economic-Dashboard
pattern_name: Composite Primary Key Uniqueness
description: Time-series financial databases require composite primary keys (ticker, date) to ensure uniqueness and enable
efficient querying. Inconsistent primary keys across tables break JOIN operations essential for feature merging. Apply
this pattern when designing any financial database schema involving time-series measurements with multiple entities.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-005
source_project: finance-bp-074--FinRobot
pattern_name: External API Rate Limiting
description: When accessing external financial APIs (SEC EDGAR, data vendors), strict rate limiting must be implemented
before deployment. SEC EDGAR enforces a limit of 10 requests per second and blocks IP addresses that exceed it. Use rate-limiting
decorators and proper User-Agent headers. Apply this pattern when integrating any external financial data API to prevent
service disruption that blocks critical data access.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-006
source_project: finance-bp-080--FinDKG, finance-bp-105--open-climate-investing
pattern_name: Graph Attribute Propagation in Batching
description: When creating subgraph variants during batch collation in graph-based ML, all metadata attributes (num_nodes,
num_relations, time_interval) must be explicitly propagated to each subgraph. Downstream model components expect these
attributes on all graph objects. Apply this pattern whenever implementing custom collate functions for graph neural networks
to prevent training failures.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-007
source_project: finance-bp-105--open-climate-investing, finance-bp-083--Economic-Dashboard
pattern_name: Statistical Validity Thresholds
description: Factor regressions and statistical calculations require minimum observation counts (typically 20+) for reliable
inference. Inner joins, winsorization, and date filtering reduce observations; pipeline validation must check for sufficient
data points before regression. Apply this pattern whenever computing regression statistics to ensure results are meaningful
rather than spurious.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
- wisdom_id: CW-MACRO-DATA-008
source_project: finance-bp-080--FinDKG, finance-bp-077--Open_Source_Economic_Model
pattern_name: Data Type Strictness for ML Operations
description: Graph operations and time calculations require strict dtype consistency (float32 for time values, integer for
node types, boolean for masks). Type mismatches cause silent failures in edge_subgraph, degree calculations, and time
interval transformations. Apply this pattern when preparing data for graph neural networks or any numerical ML pipeline
to catch dtype issues early.
applicable_to_activity: macro-data
_source_file: cross-project-wisdom/macro-data.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
source_file: scripts/bmg_analyze.py
business_problem: Identifies how many stocks from an index fall into each sector and screens for stocks with statistically
significant factor regression results based on p-values.
intent_keywords:
- sector composition
- significant regression
- p-value screening
- stock sectors
- factor analysis
stage: factor_computation
data_domain: holding_data
type: screening
- kuc_id: KUC-102
source_file: scripts/correlate.py
business_problem: Computes correlations between different factors over time to understand factor relationship dynamics and
potential multicollinearity issues.
intent_keywords:
- factor correlation
- correlation matrix
- factor relationships
- multicollinearity
- time series correlation
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-103
source_file: scripts/regression_function.py
business_problem: Performs ordinary least squares regression on factor data with comprehensive diagnostic tests including
Durbin-Watson, Jarque-Bera, and Breusch-Pagan tests.
intent_keywords:
- OLS regression
- diagnostic tests
- statistical tests
- regression analysis
- residual analysis
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-104
source_file: scripts/bulk_script.py
business_problem: Builds custom Fama-French style factor models by merging stock returns, Fama-French factors, and carbon
risk factors into unified datasets for analysis.
intent_keywords:
- Fama-French model
- factor model
- carbon risk
- factor construction
- data merging
stage: data_collection
data_domain: mixed
type: data_pipeline
- kuc_id: KUC-105
source_file: scripts/stock_price_function.py
business_problem: Downloads historical stock price data from Yahoo Finance with support for daily and monthly frequencies,
including automatic retry on timeout.
intent_keywords:
- stock prices
- price download
- yfinance
- historical data
- market data
stage: data_collection
data_domain: market_data
type: data_pipeline
- kuc_id: KUC-106
source_file: scripts/bmg_series.py
business_problem: Creates Brown-Minus-Green (BMG) factor series by calculating the return differential between brown (high carbon)
and green (low carbon) stocks for carbon risk analysis.
intent_keywords:
- BMG factor
- carbon risk
- brown green stocks
- factor creation
- environmental factor
stage: factor_computation
data_domain: financial_data
type: builtin_factor
- kuc_id: KUC-107
source_file: scripts/get_regressions.py
business_problem: Executes factor regression analysis across multiple stocks in parallel using multiprocessing, loading
Fama-French and carbon risk factors from database.
intent_keywords:
- regression
- multiprocessing
- parallel analysis
- factor regression
- batch analysis
stage: factor_computation
data_domain: financial_data
type: research_analysis
- kuc_id: KUC-108
source_file: scripts/get_stocks.py
business_problem: Imports stock return data from CSV or downloads from yfinance, with support for incremental updates to
maintain current database with stock returns.
intent_keywords:
- import stocks
- stock data
- data import
- stock returns
- database update
stage: data_collection
data_domain: trading_data
type: data_pipeline
- kuc_id: KUC-109
source_file: scripts/setup_db.py
business_problem: Initializes database schema and imports Fama-French, bond, and carbon risk factors into PostgreSQL tables,
including BMG factor data.
intent_keywords:
- database setup
- schema initialization
- factor import
- carbon data
- bond factors
stage: data_collection
data_domain: financial_data
type: data_pipeline
component_capability_map:
project: finance-bp-105--open-climate-investing
scan_date: '2026-04-22'
stats:
total_files: 7
total_classes: 25
total_functions: 0
total_stages: 7
modules:
stock_data_collection:
class_count: 2
stage_id: data_collection
stage_order: 1
responsibility: Fetches stock price data from Yahoo Finance API with monthly/daily frequency support. Provides retry
logic and incomplete data filtering. Acts as the primary data ingestion point for all downstream analysis.
classes:
- name: stock_grabber
file: stock_data_collection/stock-grabber.py
line: 0
kind: required_method
signature: ''
- name: data_source
file: stock_data_collection/data-source.py
line: 0
kind: replaceable_point
design_decision_count: 4
database_setup_&_data_import:
class_count: 4
stage_id: data_import
stage_order: 2
responsibility: Initializes PostgreSQL schema, imports Fama-French factors, carbon risk factors, stock constituents,
and other reference data. Provides batch CSV ingestion with idempotent upsert behavior.
classes:
- name: load_data_files
file: database_setup_&_data_import/load-data-files.py
line: 0
kind: required_method
signature: ''
- name: import_monthly_ff_data_factors_into_sql
file: database_setup_&_data_import/import-monthly-ff-data-factors-into-sql.py
line: 0
kind: required_method
signature: ''
- name: import_msci_constituents_into_sql
file: database_setup_&_data_import/import-msci-constituents-into-sql.py
line: 0
kind: required_method
signature: ''
- name: database
file: database_setup_&_data_import/database.py
line: 0
kind: replaceable_point
design_decision_count: 5
bmg_factor_computation:
class_count: 4
stage_id: bmg_factor_computation
stage_order: 3
responsibility: Computes Brown-Minus-Green (BMG) carbon risk factor from index constituent returns. BMG = Brown Returns - Green
Returns. Positive BMG means brown stocks outperform green, indicating carbon risk premium.
classes:
- name: add_bmg_series
file: bmg_factor_computation/add-bmg-series.py
line: 0
kind: required_method
signature: ''
- name: get_bmg_series
file: bmg_factor_computation/get-bmg-series.py
line: 0
kind: required_method
signature: ''
- name: load_stocks_returns_from_db
file: bmg_factor_computation/load-stocks-returns-from-db.py
line: 0
kind: required_method
signature: ''
- name: factor_definition
file: bmg_factor_computation/factor-definition.py
line: 0
kind: replaceable_point
design_decision_count: 3
factor_regression_analysis:
class_count: 4
stage_id: factor_regression
stage_order: 4
responsibility: Runs OLS regressions of stock returns on carbon risk factor and Fama-French factors. Computes coefficients,
t-stats, p-values, and diagnostic statistics with proper data alignment and validation.
classes:
- name: run_regression
file: factor_regression_analysis/run-regression.py
line: 0
kind: required_method
signature: ''
- name: regression_input_output
file: factor_regression_analysis/regression-input-output.py
line: 0
kind: required_method
signature: ''
- name: merge_data
file: factor_regression_analysis/merge-data.py
line: 0
kind: required_method
signature: ''
- name: regression_model
file: factor_regression_analysis/regression-model.py
line: 0
kind: replaceable_point
design_decision_count: 6
bulk_regression_execution:
class_count: 5
stage_id: bulk_regression
stage_order: 5
responsibility: Runs rolling-window regressions across multiple stocks with multiprocessing support. Stores results
incrementally in PostgreSQL using UPSERT pattern. Enables analysis of large stock universes.
classes:
- name: run_regression
file: bulk_regression_execution/run-regression.py
line: 0
kind: required_method
signature: ''
- name: run_regression_internal
file: bulk_regression_execution/run-regression-internal.py
line: 0
kind: required_method
signature: ''
- name: store_regression_into_db
file: bulk_regression_execution/store-regression-into-db.py
line: 0
kind: required_method
signature: ''
- name: window_type
file: bulk_regression_execution/window-type.py
line: 0
kind: replaceable_point
- name: parallelization
file: bulk_regression_execution/parallelization.py
line: 0
kind: replaceable_point
design_decision_count: 6
factor_correlation_&_orthogonalization:
class_count: 3
stage_id: factor_orthogonalization
stage_order: 6
responsibility: Analyzes correlation between BMG factor and other factors (Fama-French), then orthogonalizes BMG by
regressing on significantly correlated factors and storing residuals. Removes factor contamination.
classes:
- name: process_factor
file: factor_correlation_&_orthogonalization/process-factor.py
line: 0
kind: required_method
signature: ''
- name: execute_batch
file: factor_correlation_&_orthogonalization/execute-batch.py
line: 0
kind: required_method
signature: ''
- name: orthogonalization_method
file: factor_correlation_&_orthogonalization/orthogonalization-method.py
line: 0
kind: replaceable_point
design_decision_count: 3
regression_results_analysis:
class_count: 3
stage_id: results_analysis
stage_order: 7
responsibility: Queries stored regression results to identify stocks and sectors with significant carbon risk exposure.
Aggregates statistics by sector and significance level for actionable insights.
classes:
- name: get_stocks_with_significant_regressions
file: regression_results_analysis/get-stocks-with-significant-regressions.py
line: 0
kind: required_method
signature: ''
- name: get_sectors_with_significant_final_regression
file: regression_results_analysis/get-sectors-with-significant-final-regre.py
line: 0
kind: required_method
signature: ''
- name: significance_criteria
file: regression_results_analysis/significance-criteria.py
line: 0
kind: replaceable_point
design_decision_count: 2
data_flow_hints: []
locale_contract:
source_language: en
user_facing_fields:
- human_summary.what_i_can_do.tagline
- human_summary.what_i_can_do.use_cases[]
- human_summary.what_i_auto_fetch[]
- human_summary.what_i_ask_you[]
- evidence_quality.user_disclosure_template
- post_install_notice.message_template.positioning
- post_install_notice.message_template.capability_catalog.groups[].name
- post_install_notice.message_template.capability_catalog.groups[].description
- post_install_notice.message_template.capability_catalog.groups[].ucs[].name
- post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
- post_install_notice.message_template.call_to_action
- post_install_notice.message_template.featured_entries[].beginner_prompt
- post_install_notice.message_template.more_info_hint
- preconditions[].description
- preconditions[].on_fail
- intent_router.uc_entries[].name
- intent_router.uc_entries[].ambiguity_question
- architecture.pipeline
- architecture.stages[].narrative.does_what
- architecture.stages[].narrative.key_decisions
- architecture.stages[].narrative.common_pitfalls
- constraints.fatal[].consequence
- constraints.regular[].consequence
- output_validator.assertions[].failure_message
- acceptance.hard_gates[].on_fail
- skill_crystallization.action
locale_detection_order:
- explicit_user_declaration
- first_message_language
- system_locale
translation_enforcement:
trigger: on_first_user_message
action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
verbatim
violation_code: LOCALE-01
violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
declared:
evidence_coverage_ratio: 1.0
evidence_verify_ratio: 0.03260869565217391
evidence_invalid: 89
evidence_verified: 3
evidence_auto_fixed: 0
audit_coverage: 39/39 (100%)
audit_pass_rate: 0/39 (0%)
audit_fail_total: 20
audit_finance_universal:
pass: 0
warn: 9
fail: 11
audit_subdomain_totals:
pass: 0
warn: 10
fail: 9
enforcement_rules:
- id: EQ-01
trigger: declared.evidence_verify_ratio < 0.5
action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
for each BD referenced
violation_code: EQ-01-V
violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-105. Evidence verify ratio
= 3.3% and audit fail total = 20. Generated results may have uncaptured requirement gaps. Verify critical decisions against
source files (LATEST.yaml / LATEST.jsonl).'
traceback:
source_files:
blueprint: LATEST.yaml
constraints: LATEST.jsonl
mandatory_lookup_scenarios:
- id: TB-01
condition: Two constraints have apparently conflicting enforcement rules
lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
- id: TB-02
condition: A business decision rationale is unclear or disputed
lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
- id: TB-03
condition: evidence_invalid > 0 in evidence_quality.declared
lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
- id: TB-04
condition: User asks where a rule comes from
lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
- id: TB-05
condition: Generated code does not match expected ZVT API behavior
lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
degraded_lookup:
no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
in question. Crystal ID: finance-bp-105-v5.0.'
trace_schema:
event_types:
- precondition_check
- spec_lock_check
- evidence_rule_fired
- evidence_rule_skipped
- locale_translation_emitted
- hard_gate_passed
- hard_gate_failed
- skill_emitted
- false_completion_claim
preconditions:
- id: PC-01
description: zvt package installed and importable
check_command: python3 -c 'import zvt; print(zvt.__version__)'
on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize data directories'
severity: fatal
- id: PC-02
description: K-data exists for target entities (required before backtesting)
check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
assert df is not None and len(df) > 0, 'No kdata found'"
on_fail: 'Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
with your target entity IDs)'
severity: fatal
applies_to_uc:
- UC-101
- UC-102
- UC-103
- UC-106
- UC-107
- id: PC-03
description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
/ ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
on_fail: 'Run: python3 -m zvt.init_dirs'
severity: fatal
- id: PC-04
description: SQLite write permission for ZVT data directory
check_command: python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
/ '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
on_fail: 'Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location'
severity: warn
intent_router:
uc_entries:
- uc_id: UC-101
name: Sector Stock Count and Significant Factor Regression Analyzer
positive_terms:
- sector composition
- significant regression
- p-value screening
- stock sectors
- factor analysis
data_domain: holding_data
negative_terms:
- stock price download
- correlation calculation
- factor series creation
- database setup
- bulk factor model
ambiguity_question: Are you looking to screen stocks by sector distribution from an index, or to find stocks with statistically
significant factor relationships?
- uc_id: UC-102
name: Factor Correlation Calculator
positive_terms:
- factor correlation
- correlation matrix
- factor relationships
- multicollinearity
- time series correlation
data_domain: financial_data
negative_terms:
- stock screening
- price download
- BMG factor
- regression execution
- database setup
ambiguity_question: Do you want to see how different factors correlate with each other, or are you looking to run factor
regressions on individual stocks?
- uc_id: UC-103
name: OLS Regression with Diagnostic Statistics
positive_terms:
- OLS regression
- diagnostic tests
- statistical tests
- regression analysis
- residual analysis
data_domain: financial_data
negative_terms:
- correlation
- stock screening
- data download
- factor creation
- database utilities
ambiguity_question: Are you trying to run regression analysis with statistical diagnostics, or do you need something else
like factor correlations or data loading?
- uc_id: UC-104
name: Fama-French Factor Model Generator
positive_terms:
- Fama-French model
- factor model
- carbon risk
- factor construction
- data merging
data_domain: mixed
negative_terms:
- stock screening
- correlation
- price download
- database setup
- BMG factor
ambiguity_question: Are you building a custom factor model combining multiple data sources, or are you analyzing existing
factors for correlations or regressions?
- uc_id: UC-105
name: Stock Price Data Downloader
positive_terms:
- stock prices
- price download
- yfinance
- historical data
- market data
data_domain: market_data
negative_terms:
- regression
- correlation
- screening
- BMG factor
- database setup
ambiguity_question: Are you trying to download raw stock price data, or are you looking to perform analysis like regressions,
correlations, or screening?
- uc_id: UC-106
name: BMG Factor Series Creator
positive_terms:
- BMG factor
- carbon risk
- brown green stocks
- factor creation
- environmental factor
data_domain: financial_data
negative_terms:
- regression
- correlation
- stock prices
- screening
- database setup
ambiguity_question: Are you creating a new BMG (brown-minus-green) carbon risk factor series, or are you using an existing
factor for analysis like regressions or correlations?
- uc_id: UC-107
name: Multi-Stock Factor Regression Runner
positive_terms:
- regression
- multiprocessing
- parallel analysis
- factor regression
- batch analysis
data_domain: financial_data
negative_terms:
- correlation
- screening
- price download
- BMG creation
- database setup
ambiguity_question: Are you running factor regressions on multiple stocks at scale, or are you looking for single-stock
analysis, factor correlations, or data loading?
- uc_id: UC-108
name: Stock Data Import and Update
positive_terms:
- import stocks
- stock data
- data import
- stock returns
- database update
data_domain: trading_data
negative_terms:
- regression
- correlation
- screening
- BMG factor
- factor model
ambiguity_question: Are you loading or updating stock return data in the database, or are you performing analysis like
regressions, screening, or factor computations?
- uc_id: UC-109
name: Database Schema Initialization and Data Import
positive_terms:
- database setup
- schema initialization
- factor import
- carbon data
- bond factors
data_domain: financial_data
negative_terms:
- regression
- correlation
- screening
- stock prices
- factor analysis
ambiguity_question: Are you setting up the database schema and loading factor data, or are you performing analysis like
regressions, correlations, or screening?
context_state_machine:
states:
- id: CA1_MEMORY_CHECKED
entry: Task started
exit: All memory queries attempted and recorded; memory_unavailable set if failed
timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
- id: CA2_GAPS_FILLED
entry: CA1 complete
exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
- id: CA3_PATH_SELECTED
entry: CA2 complete
exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
timeout: Trigger ambiguity_question for top-2 candidates, await user selection
- id: CA4_EXECUTING
entry: CA3 complete + user explicit confirmation received
exit: All hard gates G1-Gn passed and output files written
timeout: NOT skippable — user confirmation of execution path required
enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
semantic_locks:
- id: SL-01
description: Execute sell orders before buy orders in every trading cycle
locked_value: sell() called before buy() in each Trader.run() iteration
violation_is: fatal
source_bd_ids:
- BD-018
- id: SL-02
description: Trading signals MUST use next-bar execution (no look-ahead)
locked_value: due_timestamp = happen_timestamp + level.to_second()
violation_is: fatal
source_bd_ids:
- BD-014
- BD-025
- id: SL-03
description: Entity IDs MUST follow format entity_type_exchange_code
locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
violation_is: fatal
source_bd_ids: []
- id: SL-04
description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
locked_value: df.index.names == ['entity_id', 'timestamp']
violation_is: fatal
source_bd_ids: []
- id: SL-05
description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
locked_value: XOR enforcement in trading/__init__.py:68
violation_is: fatal
source_bd_ids: []
- id: SL-06
description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
locked_value: factor.py:475 order_type_flag mapping
violation_is: fatal
source_bd_ids: []
- id: SL-07
description: Transformer MUST run BEFORE Accumulator in factor pipeline
locked_value: 'compute_result(): transform at :403 before accumulator at :409'
violation_is: fatal
source_bd_ids: []
- id: SL-08
description: 'MACD parameters locked: fast=12, slow=26, signal=9'
locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
violation_is: fatal
source_bd_ids:
- BD-036
- id: SL-09
description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
locked_value: sim_account.py:25 SimAccountService default costs
violation_is: warning
source_bd_ids:
- BD-029
- id: SL-10
description: A-share equity trading is T+1 (no same-day close of buy positions)
locked_value: sim_account.available_long filters by trading_t
violation_is: fatal
source_bd_ids: []
- id: SL-11
description: Recorder subclass MUST define provider AND data_schema class attributes
locked_value: contract/recorder.py:71 Meta; register_schema decorator
violation_is: fatal
source_bd_ids: []
- id: SL-12
description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
violation_is: fatal
source_bd_ids: []
implementation_hints:
- id: IH-01
hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
- id: IH-02
hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
- id: IH-03
hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
- id: IH-04
hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
- id: IH-05
hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
required_objects:
business_decisions_count: 124
fatal_constraints_count: 39
non_fatal_constraints_count: 139
use_cases_count: 9
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
architecture:
pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
stages:
- id: data_collection
narrative:
does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
schema provider-agnostic.
common_pitfalls: 'SL-11: a Recorder subclass MUST declare both provider and data_schema class attributes; otherwise
initialization fails with an assertion error (fatal violation finance-C-001).'
business_decisions:
- id: BD-001
type: B/BA
summary: Monthly interval uses 1mo yfinance interval (NOT 1d)
- id: BD-002
type: BA
summary: Drops last entry assuming incomplete candle
- id: BD-003
type: BA/DK
summary: Monthly dates aligned to MonthEnd via pandas offset
- id: BD-004
type: B
summary: 3 retry attempts on JSON decode errors
- id: BD-031
type: T
summary: Default data frequency is MONTHLY across all operations
- id: BD-035
type: T
summary: Minimum 20 data points required for valid regression
- id: BD-036
type: T
summary: Returns capped at ±50% (0.5) during regression to remove outliers
- id: BD-037
type: T
summary: Abnormal returns > 100% (>1) are filtered out from stock data
- id: BD-038
type: T
summary: Inner join used when merging stock returns with factor data
- id: BD-039
type: B/DK
summary: Use Yahoo Finance (yfinance) as source for stock price data
- id: BD-040
type: B
summary: Store BMG series in database for reuse across analysis
- id: BD-047
type: T
summary: Drop last (incomplete) data point when fetching stock history
- id: BD-048
type: T
summary: For composites with no price data, compute and store only returns
- id: BD-054
type: T
summary: Skip components with missing percentage when computing composite returns
- id: BD-058
type: T
summary: Fama-French factors converted from percentage to decimal (divided by 100)
- id: BD-061
type: B/RC
summary: 'Store stock details: EBITDA, enterprise value, P/E, cash, debt, shares outstanding'
- id: BD-GAP-001
type: RC
summary: 'Missing: Timezone explicit annotation + UTC'
- id: BD-GAP-002
type: DK
summary: 'Missing: Point-in-Time data availability'
- id: BD-GAP-003
type: DK
summary: 'Missing: Stale data detection and expiry'
- id: BD-GAP-004
type: B
summary: 'Missing: PnL conservation (realized + unrealized)'
- id: BD-GAP-005
type: B
summary: 'Missing: Train/test time split integrity'
- id: BD-GAP-006
type: DK
summary: 'Missing: Random seed full coverage'
- id: BD-GAP-007
type: RC
summary: 'Missing: Settlement and delivery time convention'
- id: BD-GAP-008
type: DK
summary: 'Missing: Rebalancing Trigger Mechanism'
- id: BD-GAP-009
type: M
summary: 'Missing: Transition Matrix Time Homogeneity & Conditioning'
- id: BD-GAP-010
type: DK
summary: 'Missing: Versioned Writes & Snapshot Semantics'
- id: BD-GAP-011
type: DK
summary: 'Missing: Implement UTC timezone normalization for all datetime fields. Add tzinfo awareness to stock_data,
carbon_risk_factor, and ff_factor tables.'
- id: BD-GAP-012
type: DK
summary: 'Missing: Add random.seed(42) or equivalent to all stochastic operations. Document reproducibility requirements
in regression_function.py and correlate.py.'
- id: BD-GAP-013
type: RC
summary: 'Missing: Replace ON CONFLICT DO UPDATE with versioned INSERT (add valid_from/valid_to timestamps). Implement
append-only audit table for carbon_risk_factor changes.'
- id: BD-GAP-014
type: B
summary: 'Missing: Add data_version column to all factor tables. Track run_id and experiment_id in regression results
table.'
- id: BD-GAP-015
type: DK
summary: 'Missing: Stale data detection and expiry'
- id: BD-GAP-016
type: B
summary: 'Missing: PnL conservation (realized + unrealized)'
- id: BD-GAP-017
type: M/DK
summary: 'Missing: Day count convention'
- id: BD-GAP-018
type: M
summary: 'Missing: Covariance Matrix PSD Fix Strategy'
- id: BD-GAP-019
type: B
summary: 'Missing: Default Definition & IFRS 9 Staging'
- id: BD-GAP-020
type: B
summary: 'Missing: PD/LGD/EAD Estimation (IRB vs Standard)'
- id: BD-GAP-021
type: B
summary: 'Missing: Vasicek Single-Factor Asset Correlation'
- id: BD-GAP-022
type: M
summary: 'Missing: Implement covariance matrix PSD repair strategy (nearest correlation, eigenvalue clipping, or
shrinkage estimator).'
- id: data_storage
narrative:
does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
derives db_name from data_schema __tablename__ for per-domain database isolation.
common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
== ['entity_id', 'timestamp'] before calling record_data.
business_decisions: []
- id: factor_computation
narrative:
does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
business_decisions: []
- id: target_selection
narrative:
does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
filtering not current-only because backtests need historical point-in-time correctness.
common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
that look like no signals but are actually level-mismatch bugs.
business_decisions: []
- id: trading_execution
narrative:
does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
+ level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
target selection.
key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
risk asymmetry.
common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
business_decisions: []
- id: visualization
narrative:
does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
to define entry/exit visuals without modifying base drawing logic.
common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
environments to avoid Plotly server startup overhead.
business_decisions: []
- id: cross_cutting_concerns
narrative:
does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 25 source groups: Composite
Calculation(1), Date Handling(1), Date Normalization(2), Index Definition(1), Model Diagnostics(1), Model Validity(1),
and 19 more.'
key_decisions: 86 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
main stages via shared IDs.
common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
business_decisions:
- id: BD-043
type: T
summary: 'Composite stock returns calculated as weighted average: sum(pct*return) / sum(pct)'
- id: BD-053
type: T
summary: Use minimum available date when user start_date precedes data range
- id: BD-051
type: T
summary: Monthly dates anchored to end-of-month using MonthEnd(1)
- id: BD-052
type: T
summary: WML factors adjusted to month-end by subtracting 1 day from date
- id: BD-042
type: B/DK
summary: 'Index stocks: IVV = S&P 500, XWD.TO = MSCI World'
- id: BD-046
type: T
summary: 'Store three statistical tests: Jarque-Bera, Breusch-Pagan, Durbin-Watson'
- id: BD-060
type: T
summary: Require at least N+10 data points where N is number of factors in regression
- id: BD-049
type: T
summary: '''DEFAULT'' is reserved factor name, cannot be used for custom BMG series'
- id: BD-050
type: T
summary: Bulk regression mode loads all stocks from the DB and joins them in memory
- id: BD-041
type: T
summary: Convert stock prices to excess returns by subtracting risk-free rate (Rf)
- id: BD-032
type: T
summary: Default significance threshold = 0.05 for regression analysis
- id: BD-010
type: B
summary: BMG = Brown Returns - Green Returns
- id: BD-011
type: B
summary: Composite stock returns = weighted average of components
- id: BD-012
type: BA
summary: NaN values removed before storing BMG
- id: BD-030
type: B/BA
summary: BMG (Brown-Minus-Green) factor = Brown stock returns minus Green stock returns, calculated as return_x - return_y
- id: BD-044
type: T
summary: Orthogonalize BMG factor by regressing on FF factors, storing residuals
- id: BD-019
type: BA/M
summary: Default interval = 60 months (5 years) for MONTHLY
- id: BD-020
type: BA/DK
summary: Default interval = 730 days (2 years) for DAILY
- id: BD-021
type: B/DK
summary: Rolling window advances by interval (NOT by 1 month)
- id: BD-022
type: B/RC
summary: multiprocessing.set_start_method('spawn')
- id: BD-023
type: B/BA
summary: Connection pool size = 20
- id: BD-024
type: BA
summary: Data loaded once, passed to workers via itertools.repeat
- id: BD-005
type: B
summary: ON CONFLICT DO NOTHING for data imports
- id: BD-006
type: BA
summary: Uses COPY FROM PROGRAM with grep to filter raw FF files
- id: BD-007
type: B
summary: WML (Winners Minus Losers momentum factor) stored in the same ff_factor table
- id: BD-008
type: B
summary: Stores both close price AND computed return in stock_data
- id: BD-009
type: B
summary: Frequency column on all time series tables
- id: BD-057
type: T
summary: CSV dates must be in YYYYMMDD or YYYYMM format for import
- id: BD-059
type: T
summary: Delete existing data before importing new factor data
- id: BD-080
type: B/BA
summary: Frequency defaults to MONTHLY; DAILY requires explicit specification
- id: BD-081
type: B/BA
summary: Factor data divided by 100 before regression; Close adjusted by risk-free rate
- id: BD-088
type: M/BA
summary: '''DEFAULT'' is reserved factor_name; cannot be used for user-defined BMG series'
- id: BD-092
type: B/BA
summary: Default interval 0 becomes 60 months (MONTHLY) or 730 days (DAILY)
- id: BD-025
type: BA/M
summary: Significance threshold filter (default 0.1)
- id: BD-026
type: B
summary: Orthogonalized factor stored with -ORTHO suffix
- id: BD-027
type: B
summary: Residuals become the orthogonalized factor
- id: BD-033
type: T
summary: Orthogonalization significance threshold = 0.10
- id: BD-013
type: B
summary: Inner join on dates for all DataFrame merges
- id: BD-014
type: B
summary: Excess returns = Close - Rf (risk-free rate)
- id: BD-015
type: BA
summary: 'Winsorization: keep only |value| < 0.5'
- id: BD-016
type: BA/M
summary: Minimum 20 data points required
- id: BD-017
type: B/BA
summary: FF factors and RF stored as percentage, divided by 100 before use
- id: BD-018
type: M/DK
summary: Custom DateInRangeError for date boundary validation
- id: BD-062
type: B/BA
summary: Pearson correlation for factor correlation matrix
- id: BD-063
type: B/BA
summary: OLS regression for factor orthogonalization
- id: BD-064
type: B/BA
summary: Significance threshold for variable selection at p-value 0.1
- id: BD-065
type: B
summary: Two-stage sequential regression for orthogonalization
- id: BD-066
type: B/BA
summary: OLS residuals as orthogonalized factor values
- id: BD-094
type: B/DK
summary: 'INTERACTION: BD-021 (rolling window by interval) × BD-055 (rolling window by 1 period) → Contradictory window
advancement logic'
- id: BD-095
type: B/BA
summary: 'INTERACTION: BD-036 (±0.5 winsorization) × BD-091 (≤1.0 abnormal filter) → Inconsistent outlier handling across
pipeline stages'
- id: BD-096
type: BA
summary: 'INTERACTION: BD-009 (frequency column on tables) × BD-013 (inner join on dates) × BD-038 (inner join for stock-factor
merge) → Hidden data loss cascade'
- id: BD-097
type: B/BA
summary: 'INTERACTION: BD-021 (rolling window by interval) × BD-019 (60-month default) × BD-020 (730-day default) ×
BD-080 (MONTHLY default) → Risk cascade of window size misinterpretation'
- id: BD-098
type: BA
summary: 'INTERACTION: BD-039 (Yahoo Finance source) × BD-008 (precomputed returns stored) × BD-047 (drop last incomplete
entry) → Amplification of data quality risks'
- id: BD-099
type: BA/M
summary: 'INTERACTION: BD-017 (FF factors /100) × BD-041 (excess returns = Close - Rf) × BD-081 (factor /100 and Close
- Rf) → Overlapping normalization decisions with latent contradiction risk'
- id: BD-100
type: BA/DK
summary: 'INTERACTION: BD-022 (spawn multiprocessing) × BD-090 (connection pool get/put pairing) × BD-024 (data loaded
once, passed via repeat) → Hidden dependency on spawn-specific behavior'
- id: BD-101
type: B
summary: 'INTERACTION: BD-059 (DELETE before INSERT) × BD-005 (ON CONFLICT DO NOTHING) × BD-093 (refresh_views after
completion) → Risk cascade of atomicity violations'
- id: BD-102
type: BA
summary: 'INTERACTION: BD-015 (±0.5 winsorization) × BD-072 (outlier filter [-0.5, 0.5]) × BD-087 ([-0.5, 0.5] bounds)
→ Redundant outlier decisions across scripts'
- id: BD-085
type: DK
summary: Composite stocks compute returns via component weighted average BEFORE regression
- id: BD-082
type: BA
summary: Sliding window regression requires data_end_date <= end_date to terminate loop
- id: BD-083
type: BA/M
summary: Minimum 20 data points required for valid OLS regression
- id: BD-087
type: BA
summary: Outlier filter bounds returns to [-0.5, 0.5] before regression
- id: BD-091
type: BA/DK
summary: 'Abnormal returns filtered: data[''r''] <= 1 before DB insert'
- id: BD-079
type: B/BA
summary: BMG factor MUST be brown minus green (subtraction order is critical)
- id: BD-084
type: B/DK
summary: 'Date range validation: start_date < max_data AND end_date > min_data required'
- id: BD-089
type: B
summary: 'Data merge order: stock → carbon → ff → rf (inner join on dates)'
- id: BD-086
type: BA
summary: Multiprocessing uses spawn method with explicit data duplication per worker
- id: BD-090
type: BA
summary: 'Connection pool lifecycle: getconn() must be paired with putconn()'
- id: BD-093
type: M
summary: Database refresh_views called after all regressions complete
- id: BD-071
type: B/BA
summary: OLS regression for stock factor analysis
- id: BD-072
type: B
summary: Data clipping to [-0.5, 0.5] range for outlier removal
- id: BD-073
type: B
summary: Minimum sample size requirement n > k + 10 for regression
- id: BD-074
type: B/BA
summary: Jarque-Bera test for normality of residuals
- id: BD-075
type: B/BA
summary: Breusch-Pagan test for heteroscedasticity
- id: BD-076
type: B/BA
summary: Durbin-Watson test for autocorrelation
- id: BD-077
type: B
summary: R-squared for model fit assessment
- id: BD-028
type: B/BA
summary: Significance defined as bmg_p_gt_abs_t < threshold
- id: BD-029
type: BA
summary: 'Majority rule: >50% of periods significant'
- id: BD-034
type: T
summary: 'Regression interval defaults: 60 months for MONTHLY, 730 days for DAILY'
- id: BD-045
type: T
summary: 'Run two-stage OLS: first with all factors, then with only the significant factors'
- id: BD-055
type: T
summary: Rolling window advances by 1 period (month or day) between regressions
- id: BD-056
type: T
summary: Daily frequency uses 730-day (2-year) interval when not specified
- id: BD-067
type: B
summary: Simple percentage change for return calculation
- id: BD-068
type: B
summary: Return cap at 100% for abnormal return filtering
- id: BD-069
type: B/BA
summary: Weighted average for composite stock returns
- id: BD-070
type: B/RC
summary: Outer join for merging component stock data
- id: BD-078
type: B
summary: Decimal type for percentage-weighted return multiplication
resources:
packages:
- name: pandas
version_pin: ==1.5.3
- name: numpy
version_pin: ==1.24.4
- name: matplotlib
version_pin: '>=2'
- name: requests
version_pin: ==2.31.0
- name: scipy
version_pin: '>=1.3.0'
- name: scikit-learn
version_pin: '>1.4.2'
- name: pytest
version_pin: '>=8.3'
strategy_scaffold:
entry_point_name: run_backtest
output_path: result.csv
execution_mode: backtest
conditional_entry_points:
backtest:
entry_point_name: run_backtest
output_path: result.csv
collector:
entry_point_name: run_collector
output_path: result.json
factor:
entry_point_name: run_factor
output_path: result.parquet
training:
entry_point_name: run_training
output_path: result.json
serving:
entry_point_name: run_server
output_path: result.json
research:
entry_point_name: run_research
output_path: result.json
tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest() #\
\ implement above\n from validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\"\
)\n# === END DO NOT MODIFY ==="
host_adapter:
target: openclaw
timeout_seconds: 1800
shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
exec calls.'
install_recipes:
- python3 -m pip install zvt
credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
generated scripts.
path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
paths only).
constraints:
fatal:
- id: finance-C-001
when: When fetching monthly frequency stock data
action: Use yfinance interval='1mo' (monthly candles), NOT interval='1d'
severity: fatal
kind: domain_rule
modality: must
consequence: Using daily data for monthly frequency causes incorrect MonthEnd date alignment, leading to misaligned returns
that corrupt downstream factor regressions
stage_ids:
- data_collection
- id: finance-C-002
when: When fetching stock price data from yfinance
action: Drop the last entry assuming it is an incomplete candle
severity: fatal
kind: domain_rule
modality: must
consequence: Partial last-period candle causes incomplete return calculations, producing NaN or incorrect values when
computing pct_change in downstream stages
stage_ids:
- data_collection
- id: finance-C-007
when: When fetching stock data with yfinance
action: Limit frequency to DAILY or MONTHLY only (raises Exception for other values)
severity: fatal
kind: resource_boundary
modality: must
consequence: Unsupported frequency triggers Exception at stock_price_function.py:25, aborting the data fetch and leaving
downstream stages without required data
stage_ids:
- data_collection
- id: finance-C-011
when: When computing returns from stock price data
action: Verify Close column contains numeric float values (not strings or objects)
severity: fatal
kind: domain_rule
modality: must
consequence: Non-numeric Close values cause pct_change() to return NaN for all rows, producing empty regression inputs
and invalid factor loadings
stage_ids:
- data_collection
- id: finance-C-016
when: When importing Fama-French factor data from raw CSV files
action: apply grep preprocessing to filter header rows before COPY import
severity: fatal
kind: domain_rule
modality: must
consequence: Raw FF CSV files contain header rows that would corrupt the database with invalid date strings, causing all
factor data imports to fail or contain garbage values
stage_ids:
- data_import
- id: finance-C-017
when: When creating PostgreSQL schema for time series tables
action: define frequency column with text type and enforce 'MONTHLY' or 'DAILY' values
severity: fatal
kind: domain_rule
modality: must
consequence: Without explicit frequency enforcement, mixed frequency data causes regression calculations to combine incompatible
data, producing statistically invalid results
stage_ids:
- data_import
- id: finance-C-020
when: When importing Fama-French factors
action: remove rows with any null factor values (mkt_rf, smb, hml, wml) after import
severity: fatal
kind: domain_rule
modality: must
consequence: Incomplete factor rows cause matrix inversion failures in regressions, as factor covariance matrix cannot
be computed with null values
stage_ids:
- data_import
- id: finance-C-022
when: When setting up the data infrastructure
action: use PostgreSQL as the database backend with COPY FROM PROGRAM support
severity: fatal
kind: resource_boundary
modality: must
consequence: The import scripts rely on PostgreSQL-specific features (COPY FROM PROGRAM, materialized views, ON CONFLICT).
Using incompatible databases causes all imports to fail
stage_ids:
- data_import
- id: finance-C-029
when: When initializing database schema
action: 'create all seven required tables: stocks, stock_data, carbon_risk_factor, ff_factor, risk_free, stock_components,
stock_stats'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Missing any required table causes downstream scripts to fail with relation does not exist errors, blocking
all factor regression calculations
stage_ids:
- data_import
- id: finance-C-046
when: When implementing factor regression data alignment
action: use inner join to merge stock returns, carbon factor, FF factors, and risk-free rate on matching dates
severity: fatal
kind: domain_rule
modality: must
consequence: Regression coefficients become statistically invalid when data points lack complete factor coverage, causing
biased estimates and unreliable inference
stage_ids:
- factor_regression
- id: finance-C-047
when: When computing excess returns for factor regression
action: subtract risk-free rate (Rf) from stock Close price to compute excess returns, not raw returns
severity: fatal
kind: domain_rule
modality: must
consequence: CAPM/FF regression requires excess returns (Close - Rf); using raw returns violates standard financial econometric
methodology and produces incorrect beta estimates
stage_ids:
- factor_regression
- id: finance-C-048
when: When preparing Fama-French factors and risk-free rate for OLS regression
action: divide percentage-formatted FF factors and risk-free rate by 100 before using in regression
severity: fatal
kind: domain_rule
modality: must
consequence: FF factors stored as percentages (e.g., 5.2) must be converted to decimals (0.052); using raw percentages
produces coefficients scaled by 100x and invalid inference
stage_ids:
- factor_regression
- id: finance-C-050
when: When validating regression data sufficiency
action: require at least 20 data points after all filtering steps (inner join, winsorization, date range)
severity: fatal
kind: domain_rule
modality: must
consequence: OLS with fewer than 20 observations produces unreliable t-statistics and p-values; standard practice requires
minimum 20 monthly observations for statistical validity
stage_ids:
- factor_regression
- id: finance-C-061
when: When implementing regression analysis with factor models
action: Use at least 20 data points for statistical validity of regression coefficients
severity: fatal
kind: domain_rule
modality: must
consequence: Regression statistics (t-stats, p-values, R²) become unreliable or undefined with fewer than 20 observations,
producing meaningless or misleading factor loadings
stage_ids:
- bulk_regression
- id: finance-C-065
when: When implementing multiprocessing for parallel regression execution
action: Use spawn method instead of fork for multiprocessing start
severity: fatal
kind: resource_boundary
modality: must
consequence: Forking a process with psycopg2 connections causes 'connection already closed' errors or corrupted connection
state in child processes, resulting in failed database writes
stage_ids:
- bulk_regression
- id: finance-C-067
when: When acquiring database connections from the pool
action: Return connections to the pool after use with putconn()
severity: fatal
kind: resource_boundary
modality: must
consequence: Connection leak exhausts the pool, causing subsequent operations to block indefinitely waiting for available
connections
stage_ids:
- bulk_regression
- id: finance-C-075
when: When implementing factor orthogonalization using OLS regression
action: Add a constant term (intercept) to the regression model via x.insert(0, 'Constant', 1)
severity: fatal
kind: domain_rule
modality: must
consequence: Regression without intercept will produce biased estimates of factor loadings, causing the orthogonalization
to incorrectly attribute variance to the constant term instead of the BMG factor
stage_ids:
- factor_orthogonalization
- id: finance-C-076
when: When selecting factors for orthogonalization regression
action: Use p-values from the initial full regression to filter factors where p-value is below the significance threshold
severity: fatal
kind: domain_rule
modality: must
consequence: Including non-significant factors in orthogonalization introduces noise and may overfit the regression model,
producing unreliable residuals that contaminate downstream factor analysis
stage_ids:
- factor_orthogonalization
- id: finance-C-077
when: When naming the orthogonalized factor
action: Append the -ORTHO suffix to the original factor name to distinguish it from raw factor values
severity: fatal
kind: domain_rule
modality: must
consequence: Without the -ORTHO suffix, downstream regression scripts cannot distinguish between raw and orthogonalized
BMG factors, causing incorrect factor selection and contaminated regression results
stage_ids:
- factor_orthogonalization
- id: finance-C-078
when: When storing orthogonalization results in the database
action: Delete any existing orthogonalized factor with the same name before inserting new results to prevent duplicate
primary key violations
severity: fatal
kind: domain_rule
modality: must
consequence: Duplicate entries in carbon_risk_factor table will cause primary key constraint violations and corrupt downstream
regression analysis that depends on unique factor-date combinations
stage_ids:
- factor_orthogonalization
- id: finance-C-080
when: When storing orthogonalized factor values
action: Store the regression residuals (model.resid.values) as the bmg column in carbon_risk_factor table
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Storing fitted values instead of residuals defeats the purpose of orthogonalization, as the resulting factor
still contains variance explained by correlated factors
stage_ids:
- factor_orthogonalization
- id: finance-C-082
when: When ensuring orthogonalized factors are available for downstream regressions
action: Store orthogonalized factors in the carbon_risk_factor table with matching frequency parameter so get_regressions.py
can query them by factor_name
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Storing orthogonalized factors in a separate table or with different schema breaks the data access pattern
used by subsequent regression scripts, making the orthogonalized factors unavailable for selection
stage_ids:
- factor_orthogonalization
- id: finance-C-091
when: When implementing significance filtering for BMG coefficient
action: Use bmg_p_gt_abs_t < threshold to identify significant results
severity: fatal
kind: domain_rule
modality: must
consequence: Using incorrect significance test direction (e.g., bmg_p_gt_abs_t > threshold) would include non-significant
results, causing false positive climate risk identifications
stage_ids:
- results_analysis
- id: finance-C-092
when: When implementing majority significance counting across periods
action: Use HAVING count(CASE WHEN bmg_p_gt_abs_t < threshold THEN 1 END) > count(CASE WHEN bmg_p_gt_abs_t >= threshold
THEN 1 END) for majority rule
severity: fatal
kind: domain_rule
modality: must
consequence: Using >= 50% threshold instead of > 50% would misclassify stocks with exactly 50% significant periods as
having significant exposure
stage_ids:
- results_analysis
- id: finance-C-101
when: When presenting analysis results to users
action: Present backtest regression results as definitive proof of live trading performance
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Presenting backtest results as live performance would mislead users into expecting similar trading returns,
violating financial regulatory guidance and causing potential financial loss
stage_ids:
- results_analysis
- id: finance-C-102
when: When using analysis results for investment decisions
action: Treat BMG factor regression results as guaranteed investment returns
severity: fatal
kind: claim_boundary
modality: must_not
consequence: Regression coefficients reflect historical market pricing of climate risk and do not predict future returns,
leading to potential financial losses if used for direct trading
stage_ids:
- results_analysis
- id: finance-C-128
when: When implementing time series data handling across all stages
action: Use Date index with name 'Date' for all time series DataFrames
severity: fatal
kind: domain_rule
modality: must
consequence: Merging and correlation calculations fail when date index naming is inconsistent, causing regression to produce
incorrect factor loadings and invalid climate risk measurements
- id: finance-C-129
when: When storing or processing time series data across all stages
action: Use only 'MONTHLY' or 'DAILY' values for the frequency field
severity: fatal
kind: domain_rule
modality: must
consequence: Invalid frequency values cause unsupported frequency exceptions, preventing data storage and blocking all
regression analysis
- id: finance-C-130
when: When storing or processing percentage values across all stages
action: Store percentage values as percentage format (e.g., 5.2) NOT decimal format (e.g., 0.052)
severity: fatal
kind: domain_rule
modality: must
consequence: Factor values multiplied by 100 during calculation but stored without conversion, causing regression coefficients
to be 100x too large and producing meaningless BMG climate risk loadings
- id: finance-C-131
when: When running regression analysis across all stages
action: Verify inner join on dates across stock returns, carbon risk factor, and Fama-French factors
severity: fatal
kind: domain_rule
modality: must
consequence: Missing date alignment causes rows with NaN values in regression, producing invalid OLS coefficients and
unreliable climate risk factor loadings
- id: finance-C-132
when: When running regression analysis across all stages
action: Require at least 20 data points for valid statistical regression results
severity: fatal
kind: domain_rule
modality: must
consequence: Regression with insufficient data points produces unreliable t-statistics and p-values, causing incorrect
conclusions about climate risk factor significance
- id: finance-C-134
when: When naming BMG climate risk factors across all stages
action: Use 'DEFAULT' as a custom factor name — it is reserved as the system default BMG factor
severity: fatal
kind: domain_rule
modality: must_not
consequence: Overwriting DEFAULT factor causes all downstream regressions to use wrong climate risk data, invalidating
entire analysis
- id: finance-C-143
when: When operating this system in production
action: Require PostgreSQL database infrastructure; the system stores all stock data, factor data, and regression results
in database tables
severity: fatal
kind: resource_boundary
modality: must
consequence: Without PostgreSQL, no data can be stored or retrieved, completely blocking all analysis and regression workflows
stage_ids:
- data_import
- id: finance-C-148
when: When executing the regression workflow
action: 'Follow the required data flow order: load factors from DB → merge on dates using inner join → calculate stock
returns with pct_change → filter outliers → run OLS regression'
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Deviation from the required workflow order causes data misalignment, resulting in incorrect factor loadings
and invalid climate risk measurements
- id: finance-C-165
when: When implementing BMG (Brown-Minus-Green) factor calculation for carbon risk regression analysis
action: Calculate BMG factor as return_x minus return_y (Brown stock returns minus Green stock returns) to construct a
long-short portfolio capturing pure carbon risk premium without contamination from other factor exposures
severity: fatal
kind: domain_rule
modality: must
consequence: Incorrect BMG calculation (e.g., reversed subtraction order or different weighting) corrupts the primary
dependent variable in carbon risk analysis, causing strategies to be built on wrong factor exposures and potentially
losing capital on mispriced carbon risk
derived_from_bd_id: BD-030
- id: finance-C-174
when: When implementing or refactoring BMG factor computation logic
action: Maintain BMG formula as Brown Returns minus Green Returns (Brown - Green); changing the order to Green minus Brown
inverts the entire framework's interpretation of positive values
severity: fatal
kind: domain_rule
modality: must
consequence: Inverting the BMG formula causes all climate factor analysis to report opposite results, producing false
carbon alpha signals and misidentifying brown stock outperformance as green stock outperformance
derived_from_bd_id: BD-010
- id: finance-C-177
when: When implementing or using rolling window regression API
action: Explicitly specify overlap parameter behavior when requesting rolling window advancement; do not rely on the ambiguous
window advancement logic on which BD-021 (non-overlapping for statistical independence) and BD-055 (1-period
advancement) contradict each other
severity: fatal
kind: architecture_guardrail
modality: must
consequence: Contradictory window advancement logic between non-overlapping (BD-021) and overlapping (BD-055) specifications
produces invalid rolling regression results with silent errors in beta estimates and statistical inference
derived_from_bd_id: BD-094
- id: finance-C-181
when: When normalizing factor returns for regression
action: 'Verify consistent application of factor normalization: FF factors must be divided by 100 AND excess returns must
be computed as Close minus risk-free rate; verify both operations are applied to maintain consistent scale between factors
and returns'
severity: fatal
kind: domain_rule
modality: must
consequence: Inconsistent normalization (applying only one of factor/100 or Close-Rf) creates scale mismatches in regression
coefficients, causing systematic pricing errors and invalid factor risk premium estimates
derived_from_bd_id: BD-099
- id: finance-C-205
when: When configuring rolling window regression parameters
action: Verify window advancement size matches the interval default (60 months for MONTHLY, 730 days for DAILY) — BD-019,
BD-020, BD-080, and BD-021 must be consistent; verify that interval advancement logic uses the same unit (months vs
days) as the window size
severity: fatal
kind: domain_rule
modality: must
consequence: Window size misinterpretation due to unit mismatch (month-based window advancing by day-based interval) causes
incorrect beta estimates, invalid t-statistics, and wrong BMG premium conclusions that appear statistically valid but
are structurally flawed
derived_from_bd_id: BD-097
regular:
- id: finance-C-003
when: When aligning monthly stock data timestamps
action: Add MonthEnd(1) offset to index for proper financial convention alignment
severity: high
kind: domain_rule
modality: must
consequence: Without MonthEnd alignment, monthly return dates do not match financial reporting periods, causing misalignment
with Fama-French factors and BMG climate data
stage_ids:
- data_collection
- id: finance-C-004
when: When processing stock data with the Close column
action: Drop NaN values from the Close series before returning
severity: high
kind: domain_rule
modality: must
consequence: NaN values in Close column propagate to pct_change calculations, producing all-NaN rows and corrupting downstream
regression inputs
stage_ids:
- data_collection
- id: finance-C-005
when: When returning stock data from stock_df_grab
action: Verify the DataFrame index is named 'Date' as per interface contract
severity: high
kind: domain_rule
modality: must
consequence: Downstream stages like bulk_script.py:15 expect index.rename('Date') and merge operations fail when index
name is missing or incorrect
stage_ids:
- data_collection
- id: finance-C-006
when: When using yfinance API for stock data
action: Assume yfinance provides real-time data without delay
severity: medium
kind: resource_boundary
modality: must_not
consequence: yfinance data has inherent delays (15+ minutes for US stocks), presenting historical data as real-time causes
incorrect trading signals and performance claims
stage_ids:
- data_collection
- id: finance-C-008
when: When yfinance JSON decode fails (transient network error)
action: Retry up to 3 attempts before raising ValueError timeout
severity: high
kind: resource_boundary
modality: must
consequence: Without retry logic, transient network failures cause immediate data fetch failures, preventing bulk stock
imports from completing
stage_ids:
- data_collection
- id: finance-C-009
when: When importing stock data into the database
action: Skip stocks that fail to fetch (raise ValueError and catch in calling code)
severity: high
kind: operational_lesson
modality: must
consequence: Uncaught exceptions from invalid tickers or API failures halt the entire batch import process, preventing
valid stocks from being processed
stage_ids:
- data_collection
- id: finance-C-010
when: When the fetch ticker is invalid or data unavailable
action: Return an empty DataFrame instead of crashing the calling process
severity: high
kind: operational_lesson
modality: must
consequence: Invalid ticker causing unhandled exception terminates multiprocessing pool workers, losing progress on all
pending stocks in bulk imports
stage_ids:
- data_collection
- id: finance-C-012
when: When representing fetched stock data as results
action: Claim yfinance historical data represents real-time trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting historical backtest results as live trading performance violates financial regulations and misleads
investors about expected returns
stage_ids:
- data_collection
- id: finance-C-013
when: When handling fetched stock price data
action: Use raw yfinance data as the sole source without validation against other data providers
severity: medium
kind: claim_boundary
modality: must_not
consequence: Relying exclusively on yfinance without cross-validation risks data quality issues (delays, missing data,
incorrect splits) being undetected and propagated to analysis
stage_ids:
- data_collection
- id: finance-C-014
when: When accessing stock price data in downstream stages
action: Use stock_df_grab() wrapper function as the data entry point, not yfinance directly
severity: high
kind: architecture_guardrail
modality: must
consequence: Bypassing the data ingestion layer causes missing Date column formatting, incorrect index naming, and inconsistent
data format across downstream stages
stage_ids:
- data_collection
- id: finance-C-015
when: When formatting stock data output
action: Convert Date index to date objects and position Date column as first column in DataFrame
severity: high
kind: architecture_guardrail
modality: must
consequence: Inconsistent date handling causes merge operations in bulk_script.py and factor_regression.py to fail, producing
incorrect or empty merged datasets
stage_ids:
- data_collection
- id: finance-C-018
when: When storing financial factor values in PostgreSQL
action: use DECIMAL data type with specified precision instead of FLOAT
severity: high
kind: domain_rule
modality: must
consequence: Floating-point representation causes rounding errors in financial calculations, leading to incorrect factor
loadings and misstated investment performance metrics
stage_ids:
- data_import
- id: finance-C-019
when: When inserting stock return data
action: accept returns greater than 100% or NaN values into stock_data table
severity: high
kind: domain_rule
modality: must_not
consequence: Abnormal returns (>100%) and NaN values cause division errors and corrupt downstream regressions, producing
meaningless beta/alpha estimates
stage_ids:
- data_import
- id: finance-C-021
when: When performing data imports via COPY command
action: use staging tables with ON CONFLICT DO NOTHING to prevent duplicate key violations on re-runs
severity: high
kind: domain_rule
modality: must
consequence: Without idempotent upsert behavior, re-running data imports throws unique constraint violations, blocking
incremental updates and requiring full database recreation
stage_ids:
- data_import
- id: finance-C-023
when: When importing MSCI constituents and weights
action: populate both stocks table (with ticker, name, sector) and stock_components table (with parent ticker, component
stock, percentage)
severity: high
kind: domain_rule
modality: must
consequence: Missing either stocks or stock_components records breaks the parent-component relationship, preventing composite
ETF return calculations and sector analysis
stage_ids:
- data_import
- id: finance-C-024
when: When importing Fama-French factors
action: store all four factors (Mkt-RF, SMB, HML, WML) in the ff_factor table with unified structure
severity: high
kind: architecture_guardrail
modality: must
consequence: Without WML (Winners Minus Losers, the momentum factor) in the same table, factor regressions miss the momentum factor, producing incorrect
four-factor model estimates
stage_ids:
- data_import
- id: finance-C-025
when: When storing stock price data
action: store both close price and pre-computed return in stock_data table
severity: medium
kind: architecture_guardrail
modality: must
consequence: Without pre-computed returns, every query must recalculate pct_change, causing repeated O(n) scans over large
time series and slow dashboard queries
stage_ids:
- data_import
- id: finance-C-026
when: When creating database tables
action: define explicit PRIMARY KEY constraints on all tables
severity: high
kind: architecture_guardrail
modality: must
consequence: Without primary keys, duplicate rows can accumulate silently, causing one-to-many relationship failures in
JOIN queries and incorrect regression results
stage_ids:
- data_import
- id: finance-C-027
when: When importing index constituent data
action: clear existing constituents for the same parent ticker before inserting new ones
severity: high
kind: operational_lesson
modality: must
consequence: Without deleting old constituents first, the stock_components table accumulates stale entries, causing outdated
index weights to pollute regression analysis
stage_ids:
- data_import
- id: finance-C-028
when: When importing bond factor data
action: delete all existing rows before import (non-idempotent full replacement)
severity: medium
kind: operational_lesson
modality: must
consequence: Unlike other tables, bond_factor uses DELETE + COPY without ON CONFLICT, so stale data remains if import
fails midway through the file
stage_ids:
- data_import
- id: finance-C-030
when: When performing data imports
action: claim that imported historical data equals real-time market conditions
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting historical backtest results as equivalent to live trading violates regulatory standards and investor
protection requirements, as past performance does not guarantee future results
stage_ids:
- data_import
- id: finance-C-031
when: When using the system for investment decisions
action: claim that simulated portfolio returns from imported data represent actual trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Simulated returns exclude transaction costs, slippage, and liquidity constraints, causing material overstatement
of expected live trading profits
stage_ids:
- data_import
- id: finance-C-049
when: When filtering extreme returns from regression input
action: exclude observations where |return| >= 0.5 (50%) to remove outliers (trimming; values are dropped, not capped as in winsorization)
severity: high
kind: domain_rule
modality: must
consequence: Extreme returns exceeding 50% likely represent data errors or corporate actions that dominate regression
coefficients, causing spurious factor loadings
stage_ids:
- factor_regression
- id: finance-C-051
when: When generating output DataFrames from factor regression
action: set DateTime index name to 'Date' in merged DataFrame to maintain naming consistency
severity: high
kind: architecture_guardrail
modality: must
consequence: Downstream scripts (get_regressions.py) expect 'Date' index name; mismatched index names cause merge failures
and silent data loss
stage_ids:
- factor_regression
- id: finance-C-052
when: When validating date range inputs against available data
action: raise custom DateInRangeError when start_date >= max_date or end_date <= min_date
severity: high
kind: architecture_guardrail
modality: must
consequence: Invalid date boundaries cause empty regression datasets or silent data truncation, producing meaningless
or missing results without clear error messages
stage_ids:
- factor_regression
- id: finance-C-053
when: When ensuring OLS has sufficient degrees of freedom
action: require the number of observations to exceed the number of factors by more than 10 (shape[0] > shape[1] + 10)
severity: high
kind: domain_rule
modality: must
consequence: Insufficient degrees of freedom causes OLS estimation to fail or produce unstable estimates; the +10 buffer
ensures reliable standard error estimation
stage_ids:
- factor_regression
- id: finance-C-054
when: When computing factor regression coefficients
action: include intercept term (constant) as first column in regression design matrix
severity: high
kind: architecture_guardrail
modality: must
consequence: FF factor models require an intercept to capture abnormal returns; omitting the constant produces biased
coefficient estimates
stage_ids:
- factor_regression
- id: finance-C-055
when: When performing rolling/interval regression analysis
action: only run regression when overlapping data exists between stock, FF factors, and carbon factor after start_date
severity: high
kind: architecture_guardrail
modality: must
consequence: Regression without complete factor coverage produces invalid coefficients; attempting regression with missing
data raises ValueError in code
stage_ids:
- factor_regression
- id: finance-C-056
when: When presenting factor regression results
action: claim that backtest regression coefficients predict future stock performance or guarantee investment returns
severity: high
kind: claim_boundary
modality: must_not
consequence: Past factor loadings do not guarantee future results; regression captures historical relationships that may
change due to market regime shifts, structural breaks, or factor decay
stage_ids:
- factor_regression
- id: finance-C-057
when: When using regression coefficients for investment decisions
action: present the software output as financial investment advice or recommendation
severity: high
kind: claim_boundary
modality: must_not
consequence: Disclaimer explicitly states this is not investment advice; presenting regression output as recommendations
violates the stated purpose and creates legal/regulatory risk
stage_ids:
- factor_regression
- id: finance-C-058
when: When running bulk regressions for multiple stocks
action: enable silent mode to suppress verbose output and improve processing performance
severity: medium
kind: operational_lesson
modality: should
consequence: Verbose printing for each of hundreds of stocks creates excessive I/O overhead, significantly slowing bulk
regression processing
stage_ids:
- factor_regression
- id: finance-C-059
when: When converting date inputs for factor regression
action: parse date strings in YYYY-MM-DD format; non-string dates are passed through unchanged
severity: medium
kind: architecture_guardrail
modality: must
consequence: Incorrectly formatted date strings cause ValueError in datetime.strptime; silent pass-through of non-string
dates prevents type checking
stage_ids:
- factor_regression
- id: finance-C-060
when: When handling edge case of empty merged DataFrame
action: raise ValueError immediately when merged DataFrame length is zero before attempting regression
severity: high
kind: architecture_guardrail
modality: must
consequence: Attempting OLS on empty DataFrame causes downstream errors; explicit validation with clear error message
prevents cryptic failures
stage_ids:
- factor_regression
- id: finance-C-062
when: When implementing bulk data processing with rolling windows
action: Handle missing data by dropping rows with NaN values before regression
severity: high
kind: domain_rule
modality: must
consequence: Inner join on dates with missing factor data produces empty DataFrames, causing silent failures and missing
regression results for affected stocks
stage_ids:
- bulk_regression
- id: finance-C-063
when: When setting the default regression interval
action: Use 60 months (5 years) as default for MONTHLY frequency and 730 days (2 years) for DAILY frequency
severity: medium
kind: domain_rule
modality: must
consequence: Using insufficient observations reduces statistical significance; using too many observations includes stale
data that degrades factor model accuracy
stage_ids:
- bulk_regression
- id: finance-C-064
when: When configuring PostgreSQL connection pool for bulk regression
action: Limit connection pool to maximum 20 concurrent connections
severity: high
kind: resource_boundary
modality: must
consequence: Exceeding pool size causes connection exhaustion errors, blocking all database operations and halting regression
execution mid-process
stage_ids:
- bulk_regression
- id: finance-C-066
when: When loading factor data for bulk regression workers
action: Load all factor data once and share it via itertools.repeat across workers
severity: high
kind: resource_boundary
modality: must
consequence: Per-worker data loading causes redundant database queries, multiplying I/O load by number of workers and
drastically slowing bulk regression execution
stage_ids:
- bulk_regression
- id: finance-C-068
when: When storing regression results in the database
action: Use DELETE before INSERT pattern for UPSERT semantics on stock_stats table
severity: high
kind: architecture_guardrail
modality: must
consequence: Duplicate key violations prevent regression storage; without idempotent writes, re-running regressions produces
constraint errors instead of updates
stage_ids:
- bulk_regression
- id: finance-C-069
when: When completing bulk regression processing
action: Refresh materialized views (stock_and_stats, stock_component_and_stats, stock_parent_and_stats) after bulk completion
severity: high
kind: architecture_guardrail
modality: must
consequence: Stale materialized view data causes queries to return outdated regression results, misleading downstream
analysis and portfolio construction
stage_ids:
- bulk_regression
- id: finance-C-070
when: When implementing rolling window regression advancement
action: Advance window by interval_freq (1 day/month) to create overlapping windows
severity: medium
kind: architecture_guardrail
modality: must
consequence: Non-overlapping windows reduce time-series observations; overlapping windows provide more data points for
statistical significance while maintaining temporal ordering
stage_ids:
- bulk_regression
- id: finance-C-071
when: When configuring frequency parameter for regression
action: Use only MONTHLY or DAILY as valid frequency values
severity: high
kind: domain_rule
modality: must
consequence: 'Unsupported frequency causes Exception with message ''Unsupported frequency: {value}'', preventing any regression
from running'
stage_ids:
- bulk_regression
- id: finance-C-072
when: When presenting regression results to users
action: Claim that backtest regression results predict future trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Presenting historical factor loadings as predictive of future returns violates regulatory guidance and misleads
investors about actual expected performance
stage_ids:
- bulk_regression
- id: finance-C-073
when: When displaying regression coefficient results
action: Present statistical results as investment recommendations
severity: high
kind: claim_boundary
modality: must_not
consequence: Factor loadings (beta, t-stats) are analytical outputs, not buy/sell signals; presenting them as recommendations
violates the project's stated purpose and legal disclaimers
stage_ids:
- bulk_regression
- id: finance-C-074
when: When using ThreadedConnectionPool for database operations
action: Use ThreadedConnectionPool (not SimpleConnectionPool) to support concurrent thread access
severity: high
kind: resource_boundary
modality: must
consequence: SimpleConnectionPool does not handle concurrent thread access safely, causing race conditions and intermittent
'connection already closed' errors
stage_ids:
- bulk_regression
- id: finance-C-079
when: When merging factor dataframes for orthogonalization
action: Use left join when merging additional factors to preserve all BMG observations, then call dropna() to remove
incomplete rows
severity: high
kind: domain_rule
modality: must
consequence: Using outer join or skipping dropna() will include rows with missing factor values, producing NaN residuals
that corrupt the orthogonalized factor series
stage_ids:
- factor_orthogonalization
- id: finance-C-081
when: When running batch processing of all BMG series
action: Skip factor names that already end with -ORTHO to prevent re-orthogonalization of already-orthogonalized factors
severity: high
kind: architecture_guardrail
modality: must
consequence: Re-orthogonalizing already-orthogonalized factors produces doubly-transformed residuals with degraded statistical
properties and unpredictable factor loadings
stage_ids:
- factor_orthogonalization
- id: finance-C-083
when: When executing orthogonalization batch processing
action: Use psycopg2.extras.execute_batch() with parameterized queries to safely insert DataFrame rows into the database
severity: high
kind: resource_boundary
modality: must
consequence: String formatting SQL queries exposes the system to SQL injection attacks and will fail when factor names
contain special characters like dashes or underscores
stage_ids:
- factor_orthogonalization
- id: finance-C-084
when: When requiring data availability for orthogonalization
action: Verify carbon_risk_factor, ff_factor, and additional_factors tables contain overlapping date ranges for the selected
frequency
severity: high
kind: resource_boundary
modality: must
consequence: If date ranges do not overlap, the inner merge will produce an empty DataFrame, the regression will fail,
and no orthogonalized factor will be generated
stage_ids:
- factor_orthogonalization
- id: finance-C-085
when: When processing orthogonalization in batch mode with --each flag
action: Verify that the database connection pool is properly initialized before accessing it in process_factor()
severity: high
kind: resource_boundary
modality: must
consequence: Accessing uninitialized connection pool will raise an AttributeError, causing the batch orthogonalization
script to fail entirely
stage_ids:
- factor_orthogonalization
- id: finance-C-086
when: When interpreting orthogonalization results
action: Claim that orthogonalized factor returns equal or predict live trading performance
severity: high
kind: claim_boundary
modality: must_not
consequence: Backtested orthogonalization results do not guarantee future live trading returns. Orthogonalization removes
historical correlation structure but market dynamics may change, invalidating the factor's predictive power in forward-looking
trading
stage_ids:
- factor_orthogonalization
- id: finance-C-087
when: When using the orthogonalization system
action: Present orthogonalization results as investment advice or specific security recommendations
severity: high
kind: claim_boundary
modality: must_not
consequence: The system is for informational and research purposes only. Orthogonalization is a mathematical transformation
that removes statistical correlation—it does not constitute financial advice about particular securities or investment
strategies
stage_ids:
- factor_orthogonalization
- id: finance-C-088
when: When running orthogonalization with the --each flag
action: Process factors that are already orthogonalized (names ending with -ORTHO), which would apply a double transformation
severity: high
kind: operational_lesson
modality: must_not
consequence: Processing already-orthogonalized factors produces doubly-transformed residuals that have degraded statistical
properties and unpredictable behavior in subsequent regressions
stage_ids:
- factor_orthogonalization
- id: finance-C-089
when: When configuring the significance threshold for orthogonalization
action: Use a threshold of 0.1 (10%) as the default p-value cutoff for including factors in orthogonalization
severity: medium
kind: operational_lesson
modality: should
consequence: Using an overly strict threshold (e.g., 0.05) may exclude factors that have meaningful correlation with the
BMG factor, leaving residual correlation that contaminates downstream analysis
stage_ids:
- factor_orthogonalization
- id: finance-C-090
when: When verifying orthogonalization completeness
action: Confirm that no FF factor remains significantly correlated with the orthogonalized factor (p-value above the significance threshold)
after orthogonalization
severity: high
kind: domain_rule
modality: must
consequence: If orthogonalized factor still shows significant correlation with any FF factor, the orthogonalization was
incomplete, leading to factor contamination in subsequent regression analysis
stage_ids:
- factor_orthogonalization
- id: finance-C-093
when: When retrieving the most recent regression results
action: Query max(thru_date) within each (ticker, bmg_factor_name) group to identify final period results
severity: high
kind: domain_rule
modality: must
consequence: Without max(thru_date) filtering, results may include stale regression periods causing outdated analysis
and incorrect conclusions
stage_ids:
- results_analysis
- id: finance-C-094
when: When joining stock_stats with stocks table
action: Use LEFT JOIN for stocks table to preserve stocks without sector assignments
severity: high
kind: domain_rule
modality: must
consequence: Using INNER JOIN would exclude stocks with missing sector information, causing incomplete sector aggregation
and missing climate risk identification
stage_ids:
- results_analysis
- id: finance-C-095
when: When specifying BMG factor name for analysis
action: Use 'DEFAULT' as a factor_name parameter value
severity: high
kind: resource_boundary
modality: must_not
consequence: DEFAULT is a reserved factor name in the carbon_risk_factor table; passing it as a filter parameter would
return no results or unintended data
stage_ids:
- results_analysis
- id: finance-C-096
when: When accessing regression results from the database
action: REFRESH MATERIALIZED VIEW before querying to ensure the latest results are available
severity: medium
kind: resource_boundary
modality: must
consequence: Without refreshing materialized views, query results may be stale, showing outdated regression periods instead
of current analysis
stage_ids:
- results_analysis
- id: finance-C-097
when: When analyzing sector-level climate risk exposure
action: Group by both sector and bmg_factor_name to avoid mixing different climate factors
severity: high
kind: architecture_guardrail
modality: must
consequence: Without proper grouping by factor_name, sector aggregations would mix results from different BMG series,
causing misleading climate risk assessments
stage_ids:
- results_analysis
- id: finance-C-098
when: When aggregating stock-level results to sector level
action: Count distinct tickers per sector, not raw rows, to avoid double-counting
severity: high
kind: architecture_guardrail
modality: must
consequence: Without distinct ticker counting, stocks appearing in multiple regression periods would be counted multiple
times, inflating sector significance numbers
stage_ids:
- results_analysis
- id: finance-C-099
when: When accepting p-value threshold input from users
action: Use default significance threshold of 0.05 (5% level) when no threshold specified
severity: medium
kind: operational_lesson
modality: should
consequence: Without a reasonable default, users might accidentally filter results with threshold 0 (no significant results)
or fail to specify threshold at all
stage_ids:
- results_analysis
- id: finance-C-100
when: When calculating ratio of significant stocks to index stocks by sector
action: Handle missing sector counts by returning None or zero ratio, not causing division errors
severity: medium
kind: operational_lesson
modality: must
consequence: Division by sector count of zero would crash the analysis output, preventing users from identifying climate
risk patterns
stage_ids:
- results_analysis
- id: finance-C-103
when: When interpreting BMG coefficient significance results
action: Consider p-value significance only within the context of model assumption diagnostics (Jarque-Bera, Breusch-Pagan,
Durbin-Watson)
severity: high
kind: claim_boundary
modality: must
consequence: Ignoring model diagnostics and treating p-value alone as proof of climate risk could identify spurious correlations
due to non-normal residuals or heteroskedasticity
stage_ids:
- results_analysis
- id: finance-C-104
when: When evaluating statistical significance of BMG coefficients
action: Skip the Jarque-Bera check on residuals (p-value > 0.05, i.e. normality of the residuals is not rejected)
severity: high
kind: domain_rule
modality: must_not
consequence: OLS p-values rely on approximately Gaussian residuals, especially in small samples; when this assumption is violated, p-values become unreliable
and significance conclusions may be invalid
stage_ids:
- results_analysis
- id: finance-C-105
when: When comparing climate risk results across different BMG factor series
action: Separate analysis by bmg_factor_name to avoid mixing orthogonalized and non-orthogonalized climate factors
severity: high
kind: architecture_guardrail
modality: must
consequence: Mixing results from different BMG series (e.g., XOP-SMOG vs XOP-SMOG_HML_HYGIEI) would produce inconsistent
and incomparable climate risk assessments
stage_ids:
- results_analysis
- id: finance-C-106
when: When interpreting sector-level climate risk aggregation
action: Conclude that all stocks in a sector have the same climate risk profile based on aggregate counts
severity: medium
kind: claim_boundary
modality: must_not
consequence: Sector aggregation shows distribution of significant exposures, not uniform risk; individual stock-level
analysis is required for portfolio construction
stage_ids:
- results_analysis
- id: finance-C-133
when: When handling composite stock tickers across all stages
action: Populate stock_components table with component_stock and percentage weights before processing composite tickers
severity: high
kind: domain_rule
modality: must
consequence: Missing stock_components entries cause division by zero or NaN composite returns, breaking portfolio-level
BMG climate risk analysis
- id: finance-C-135
when: When processing stock return data across all stages
action: Filter out and exclude returns exceeding 100% (return > 1) as abnormal data
severity: high
kind: domain_rule
modality: must
consequence: Abnormal returns from data errors distort regression coefficients, leading to incorrect BMG factor loadings
and misguided climate risk assessments
- id: finance-C-136
when: When running OLS regression across all stages
action: Cap regression input values at [-0.5, 0.5] range to filter extreme outliers before estimation
severity: high
kind: domain_rule
modality: must
consequence: Extreme outliers in returns corrupt OLS estimation, producing unreliable coefficients and invalid statistical
tests (Jarque-Bera, Breusch-Pagan, Durbin-Watson)
- id: finance-C-137
when: When presenting or reporting this system's regression results to users
action: Claim that backtested factor loadings equal expected live trading returns — regression analysis ignores transaction
costs, slippage, market impact, and execution delays
severity: high
kind: claim_boundary
modality: must_not
consequence: Users allocate live capital based on inflated backtest returns, leading to severe underperformance in live
trading and potential financial loss
- id: finance-C-138
when: When marketing or describing this system's capabilities
action: Claim real-time trading support — this is a historical factor analysis and regression system, NOT a live trading
execution platform
severity: high
kind: claim_boundary
modality: must_not
consequence: Users purchase or implement this system expecting live trading capabilities that do not exist, leading to
implementation failures and missed investment opportunities
- id: finance-C-139
when: When applying this system's models to non-equity asset classes
action: Claim BMG climate risk factor applicability to bonds, commodities, or other non-equity assets — the Fama-French-Carhart
model is designed for equity analysis
severity: high
kind: claim_boundary
modality: must_not
consequence: Invalid factor model application to non-equity assets produces meaningless risk loadings, leading to incorrect
portfolio construction and financial losses
- id: finance-C-140
when: When presenting statistical significance results
action: Claim predictive certainty based on p-values < 0.05 — statistical significance in historical regression does not
guarantee future factor effectiveness
severity: high
kind: claim_boundary
modality: must_not
consequence: Overconfidence in statistical significance leads to over-trading, concentrated positions, and losses when
market regimes change
- id: finance-C-141
when: When using yfinance for stock data retrieval
action: Acknowledge that yfinance provides delayed data (approximately 15 minutes for US stocks, longer for international)
— it is NOT real-time market data
severity: high
kind: resource_boundary
modality: must
consequence: Backtested returns computed with delayed data differ from those at real-time prices, so treating the data as real-time overestimates
achievable strategy performance
- id: finance-C-142
when: When depending on external factor data providers
action: Verify Fama-French factor data availability from Ken French Data Library — external factors have coverage limitations
and may not extend to current dates
severity: high
kind: resource_boundary
modality: must
consequence: Missing factor data blocks regression execution, leaving portfolios without updated climate risk loadings
during critical market periods
- id: finance-C-144
when: When computing BMG factor loadings for stocks
action: Use rolling window intervals of at least 60 months for MONTHLY frequency or 730 days for DAILY frequency to ensure
statistical validity
severity: medium
kind: resource_boundary
modality: must
consequence: Shorter intervals produce statistically unreliable factor loadings due to insufficient sample size, leading
to incorrect climate risk rankings
- id: finance-C-145
when: When updating stock or factor data in the database
action: Refresh materialized views (stock_and_stats, stock_component_and_stats, stock_parent_and_stats) after data modifications
severity: high
kind: operational_lesson
modality: must
consequence: Stale materialized views show outdated regression results, causing users to make portfolio decisions based
on old climate risk assessments
- id: finance-C-146
when: When loading international stock tickers
action: Use correct exchange suffix format for non-US tickers (e.g., .L for London, .TO for Toronto, .AS for Amsterdam,
.SW for Switzerland)
severity: medium
kind: operational_lesson
modality: must
consequence: Incorrect ticker format causes data retrieval failures, blocking climate risk analysis for international
equities
- id: finance-C-147
when: When accessing data across all analysis stages
action: Access all stock data through database load functions (load_stocks_from_db, load_stocks_data_with_returns_from_db)
to ensure consistent data format and caching
severity: high
kind: architecture_guardrail
modality: must
consequence: Direct data access bypasses caching and format conversion, causing inconsistent regression results across
different analysis runs
- id: finance-C-149
when: When accessing database resources
action: Use database connection pool for concurrent operations — never create new connections in parallel processing loops
severity: high
kind: architecture_guardrail
modality: must
consequence: Unmanaged connection creation exhausts database connections, causing connection failures and blocking all
analysis operations
- id: finance-C-150
when: When maintaining data integrity across database operations
action: Use autocommit mode for read-only database operations to avoid holding locks on materialized views
severity: medium
kind: architecture_guardrail
modality: must
consequence: Transaction locks on materialized views block concurrent analysis requests, causing analysis delays and timeout
failures
- id: finance-C-151
when: When collecting monthly interval data using yfinance
action: Verify that the 1mo yfinance interval provides MonthEnd-aligned data for the specific stock universe being analyzed;
validate that returned dates match known MonthEnd dates (e.g., 2023-01-31, 2023-02-28) and not arbitrary calendar month
boundaries
severity: medium
kind: operational_lesson
modality: should
consequence: Using 1mo interval assumes proper MonthEnd alignment, but yfinance may return non-aligned dates causing monthly
return calculations to shift by days, leading to systematic misalignment between backtest returns and actual financial
reporting periods
derived_from_bd_id: BD-001
- id: finance-C-152
when: When implementing multiprocessing for bulk regression processing
action: Use multiprocessing.set_start_method('spawn') to create fresh processes that do not inherit parent state such as open psycopg2 connections; must NOT use the 'fork'
method, which shares the parent's psycopg2 connections with child processes on Linux, is unsafe on macOS, and is unavailable on Windows
severity: high
kind: architecture_guardrail
modality: must
consequence: Forking processes with psycopg2 connections causes connection sharing corruption, leading to 'connection
already closed' errors or data corruption in database operations during bulk regression batches
derived_from_bd_id: BD-022
- id: finance-C-153
when: When implementing database materialized view refresh strategy for batch regression workloads
action: Call refresh_views only after each regression batch completes to batch materialized view updates into single operations;
must NOT refresh during regression processing as concurrent queries may see partially-updated views
severity: high
kind: domain_rule
modality: must
consequence: Refreshing views during regression processing creates a window where concurrent queries receive partially-updated
data, causing inconsistent factor values and potentially generating incorrect regression coefficients that differ between
runs
derived_from_bd_id: BD-093
- id: finance-C-154
when: When validating factor_name input for BMG series insertion
action: Reject or sanitize any user-provided factor_name equal to 'DEFAULT' at insert time; enforce schema constraint
that 'DEFAULT' is reserved for system use and cannot be user-defined
severity: high
kind: domain_rule
modality: must
consequence: Allowing 'DEFAULT' as a user-defined factor_name creates ambiguity in queries that rely on the schema default
value, causing queries to return unexpected mixed results instead of the intended default factor
derived_from_bd_id: BD-088
- id: finance-C-155
when: When implementing factor data storage or modifying database schema in the data_import stage
action: Store WML (Winners Minus Losers, the momentum factor) in the same ff_factor table as other Fama-French factors; do not separate into a different
table or external storage
severity: high
kind: domain_rule
modality: must
consequence: Separating WML into a different table breaks existing JOIN queries that assume unified factor storage, causing
factor data access failures and requiring schema refactoring across multiple modules
derived_from_bd_id: BD-007
- id: finance-C-156
when: When configuring database connection pool for bulk_regression workers
action: Set connection pool size to exactly 20 to support up to 20 concurrent regression workers — do not exceed PostgreSQL
connection limits or reduce below this threshold
severity: high
kind: domain_rule
modality: must
consequence: Connection pool smaller than 20 causes worker queuing and dramatically slows bulk regression; pool larger
than 20 exceeds PostgreSQL connection limits and causes database connection failures
derived_from_bd_id: BD-023
- id: finance-C-157
when: When using the framework's default connection pool size parameter for bulk regression processing
action: Verify that connection pool size = 20 matches the available PostgreSQL connection limits and actual worker count
requirements; adjust based on system resources if needed
severity: medium
kind: operational_lesson
modality: should
consequence: Hardcoded pool size of 20 may exhaust connections on systems with limited PostgreSQL allocation or cause
insufficient parallelism when running more than 20 workers concurrently
derived_from_bd_id: BD-023
- id: finance-C-158
when: When defining index benchmarks for factor model estimation in the Index Definition stage
action: Use IVV for S&P 500 US market proxy and XWD.TO for MSCI World global market proxy — do not substitute with SPY,
EEM, or other ETFs without re-evaluating factor model representativeness
severity: high
kind: domain_rule
modality: must
consequence: Using SPY instead of IVV increases expense ratio costs; using EEM changes the benchmark from developed markets
to emerging markets, distorting global factor exposure estimates
derived_from_bd_id: BD-042
- id: finance-C-159
when: When implementing stock data merging logic in the returns.calculation stage
action: Use OUTER JOIN when merging component stock data to preserve all observations from each stock; do not use INNER
JOIN as it loses data when trading dates misalign
severity: high
kind: domain_rule
modality: must
consequence: Inner join silently drops observations where dates don't align, reducing sample size unpredictably and causing
survivorship bias in factor calculations
derived_from_bd_id: BD-070
- id: finance-C-160
when: When computing BMG factor values in the bmg_factor_computation stage
action: Remove NaN values before storing BMG — inner join on returns to verify valid paired brown and green observations;
do not forward-fill or leave NaN values
severity: high
kind: domain_rule
modality: must
consequence: NaN values in BMG series corrupt factor time series and cause regression analysis to fail or produce invalid
coefficients due to missing observation pairs
derived_from_bd_id: BD-012
- id: finance-C-161
when: When using the framework's default abnormal return filter for data validation in the invariant stage
action: Verify that the threshold data['r'] <= 1 correctly identifies data errors versus valid extreme returns for your
asset universe; manually review returns above 100% before inclusion
severity: medium
kind: operational_lesson
modality: should
consequence: The 100% threshold may filter legitimate intraday spikes or overnight gap returns in volatile markets, removing
valid observations that could significantly affect factor calculations
derived_from_bd_id: BD-091
- id: finance-C-162
when: When implementing rolling window logic for MONTHLY factor estimation
action: Use exactly 60 months (5 years) as the default rolling window for MONTHLY frequency — this window provides sufficient
data points for statistical significance while remaining responsive to regime changes
severity: high
kind: domain_rule
modality: must
consequence: Windows shorter than 60 months for MONTHLY frequency have insufficient data points for statistically significant
factor detection; windows longer than 60 months introduce regime shift contamination
derived_from_bd_id: BD-019
- id: finance-C-163
when: When implementing factor orthogonalization logic in the factor_orthogonalization stage
action: Only orthogonalize against factors whose correlation has a p-value below 0.1; correlations with higher p-values
are likely due to chance, so those factors should be retained as-is to avoid destroying valid signal
severity: high
kind: domain_rule
modality: must
consequence: Orthogonalizing factors with non-significant correlation removes useful information from the model, causing
factor exposure estimates to be systematically biased and backtest returns to be overstated
derived_from_bd_id: BD-025
- id: finance-C-164
when: When implementing or modifying statistical significance testing in carbon risk regression analysis
action: Define statistical significance using the two-tailed p-value condition bmg_p_gt_abs_t < threshold (typically 0.05),
where bmg_p_gt_abs_t is the P>|t| value for the BMG coefficient, that is, the two-tailed p-value of its t-statistic
severity: high
kind: domain_rule
modality: must
consequence: Using one-tailed test or different p-value formulations produces incorrect significance decisions, causing
strategies to incorrectly accept or reject carbon risk factors based on flawed statistical inference
derived_from_bd_id: BD-028
- id: finance-C-166
when: When implementing date range validation for factor regression input data
action: 'Enforce strict inequality validation: start_date < max_data AND end_date > min_data, ensuring the requested window
overlaps the available data range; raise DateInRangeError if violated instead of silent truncation'
severity: high
kind: domain_rule
modality: must
consequence: Omitting the bidirectional date range check allows regressions to run with truncated input data, producing
biased coefficients and misleading performance attribution that leads to strategies with incorrect risk factor loadings
derived_from_bd_id: BD-084
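# Illustrative sketch for finance-C-166, assuming min_data and max_data are the first and last dates available
# in the input. DateInRangeError is named in the rule above; its definition here is hypothetical.
#
#   class DateInRangeError(ValueError):
#       pass
#
#   def check_window(start_date, end_date, min_data, max_data):
#       # both strict inequalities must hold, otherwise fail loudly instead of truncating
#       if not (start_date < max_data and end_date > min_data):
#           raise DateInRangeError(f"{start_date}..{end_date} is outside {min_data}..{max_data}")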
- id: finance-C-167
when: When applying winsorization to factor regression input data
action: Apply winsorization threshold of |value| < 0.5 to remove extreme returns exceeding ±50% in a single period as likely
data errors (stock splits, delistings); verify the threshold matches actual data error patterns and adjust if needed
severity: medium
kind: operational_lesson
modality: should
consequence: Using a different winsorization threshold or disabling it entirely causes outliers to dominate regression
coefficients, leading to strategies that overweight stocks with data errors and underweight legitimate high-volatility
periods
derived_from_bd_id: BD-015
- id: finance-C-168
when: When implementing multiprocessing for production database operations
action: Maintain the interaction contract between BD-022 (spawn start method), BD-090 (connection get/put pairing), and
BD-024 (data pre-loading via itertools.repeat); do not change spawn to fork, do not reorder data pre-loading relative
to worker spawning, and preserve connection pairing discipline
severity: high
kind: architecture_guardrail
modality: must
consequence: Breaking the spawn/get-put/preload interaction causes connection pool corruption or N×memory consumption
from deep-copying DataFrames to each worker, leading to database connection errors or memory exhaustion in production
derived_from_bd_id: BD-100
- id: finance-C-169
when: When implementing OLS regression for factor analysis
action: Enforce minimum 20 data points for valid OLS regression; reject or warn when fewer observations are available
as standard errors become unreliable, t-statistics lose validity, and confidence intervals widen unpredictably
severity: high
kind: domain_rule
modality: must
consequence: Running regression with fewer than 20 data points produces unreliable standard errors and invalid t-statistics,
causing strategies to be selected based on statistically meaningless regressions that appear valid
derived_from_bd_id: BD-083
- id: finance-C-170
when: When processing data in production data collection pipeline
action: Assume the framework handles stale data detection and automatic expiry — this capability is confirmed absent;
the framework does not validate data freshness or expire outdated records
severity: high
kind: claim_boundary
modality: must_not
consequence: Using stale data in regression analysis produces incorrect coefficients based on outdated market conditions,
leading to strategies with wrong factor loadings and significant capital losses in live trading
derived_from_bd_id: BD-GAP-003
- id: finance-C-171
when: When implementing data collection pipeline for factor regression
action: 'Implement timestamp validation for each data record: compare record timestamp against current time, flag records
exceeding configured expiry_threshold (e.g., 30 days for daily data), and prevent processing of stale data without explicit
user acknowledgment via a data_freshness_warning parameter'
severity: high
kind: operational_lesson
modality: must
consequence: Without stale data detection, regression analysis uses outdated market data leading to wrong factor coefficients
and strategies that fail to adapt to current market conditions
derived_from_bd_id: BD-GAP-003
- id: finance-C-172
when: When implementing or modifying any code involving random number generation
action: Assume all random number generators have reproducible seeds by default; random seed coverage is confirmed incomplete,
and numpy.random, the random module, and other RNG calls may lack explicit seed configuration
severity: high
kind: claim_boundary
modality: must_not
consequence: Using non-reproducible RNG seeds causes backtest results to vary between runs, making it impossible to reproduce
strategy performance or verify code changes against established benchmarks
derived_from_bd_id: BD-GAP-006
- id: finance-C-173
when: When implementing any code involving random number generation for factor analysis
action: Audit all numpy.random, random, and other RNG calls in the codebase; verify each has an explicit seed parameter
or inherits from a global RNG with a configurable seed; call set_random_seed(global_seed) at initialization and document
seed requirements in code comments
severity: high
kind: operational_lesson
modality: must
consequence: Missing random seed coverage causes backtest results to be non-reproducible, preventing verification of strategy
changes and making it impossible to distinguish code changes from random variation in results
derived_from_bd_id: BD-GAP-006
- id: finance-C-175
when: When calculating composite index fund returns (IVV, XWD.TO)
action: Use weighted average of components based on index constituent weights, not equal weighting; weighted averaging
accurately represents actual index composition while equal weighting over-weights small-cap stocks
severity: high
kind: domain_rule
modality: must
consequence: Using equal weighting instead of weighted averaging over-weights small-cap stocks that comprise small portions
of indices, causing composite returns to diverge from actual index performance and distorting backtest results
derived_from_bd_id: BD-011
- id: finance-C-176
when: When performing factor orthogonalization using OLS regression
action: Validate homoscedasticity and absence of autocorrelation in factor residuals before applying OLS; if these assumptions
are violated, use Generalized Least Squares (GLS) to handle heteroscedastic errors
severity: medium
kind: operational_lesson
modality: should
consequence: OLS assumes homoscedastic errors and no autocorrelation; violating these assumptions leaves the estimates inefficient
and their standard errors unreliable, so the orthogonalization may attribute variance incorrectly between market and idiosyncratic components
derived_from_bd_id: BD-063
- id: finance-C-178
when: When loading data for parallel bulk regression processing
action: Load all data before spawning worker processes and pass the pre-loaded DataFrames to each task (e.g., via itertools.repeat);
do not implement per-worker data loading as it creates race conditions if underlying data changes between worker start
times
severity: high
kind: operational_lesson
modality: must
consequence: Per-worker data loading introduces race conditions where different workers may process inconsistent snapshots
of data if the dataset changes between worker initialization, causing inconsistent regression results across parallel
windows
derived_from_bd_id: BD-024
- id: finance-C-179
when: When determining factor significance using rolling window analysis
action: Apply majority rule threshold (>50% of non-overlapping periods significant) for factor significance determination;
do not use stricter 100% threshold (every period significant) or lenient single-period threshold
severity: high
kind: operational_lesson
modality: must
consequence: Using incorrect significance thresholds produces false positives (single-period) or false negatives (100%
rule); majority rule guards against finding statistical significance by chance in small numbers of non-overlapping windows
while remaining robust to power limitations
derived_from_bd_id: BD-029
- id: finance-C-180
when: When implementing factor normalization across the framework
action: Centralize normalization logic into a single, documented normalization function that both divides FF factors
by 100 and computes excess returns as Close minus risk-free rate; avoid distributed normalization code that applies
these operations inconsistently across different modules
severity: high
kind: architecture_guardrail
modality: must
consequence: Distributed normalization decisions (BD-017, BD-041, BD-081) create latent contradiction risk where future
modifications may apply division/subtraction incorrectly, masking errors across multiple overlapping decisions and causing
silent factor mis-specification
derived_from_bd_id: BD-099
- id: finance-C-182
when: When implementing data merging logic for factor regression
action: Use inner join to merge DataFrames by date; only periods with all required data should be included in regression
analysis
severity: high
kind: domain_rule
modality: must
consequence: Using outer join introduces NaN values that cause regression failures or produce biased coefficient estimates,
making factor analysis unreliable and non-reproducible
derived_from_bd_id: BD-013
- id: finance-C-183
when: When calculating returns for factor regression input
action: Calculate excess returns as Close minus risk-free rate (Rf) to isolate the risk premium component that factors
aim to explain
severity: high
kind: domain_rule
modality: must
consequence: Using raw returns violates standard CAPM/FF financial econometrics methodology, producing meaningless factor
loadings that cannot be compared to academic literature or used for portfolio construction
derived_from_bd_id: BD-014
- id: finance-C-184
when: When storing orthogonalized factors in database or analysis outputs
action: Append -ORTHO suffix to orthogonalized factor names to distinguish processed factors from raw factors in database
queries and analysis scripts
severity: high
kind: architecture_guardrail
modality: must
consequence: Without the -ORTHO suffix, raw and orthogonalized factors become indistinguishable in database queries, causing
data misuse where raw factors are accidentally used where orthogonalized ones are required
derived_from_bd_id: BD-026
- id: finance-C-185
when: When implementing variable selection logic for factor regression
action: Use p-value 0.1 as the significance threshold for variable inclusion — factors with p-value below 0.1 should be
included in the model
severity: medium
kind: operational_lesson
modality: should
consequence: Using a stricter 0.05 threshold may exclude economically meaningful factors, while a looser 0.2 threshold may
introduce overfitting; incorrect threshold selection leads to either underfitting or overfitting
derived_from_bd_id: BD-064
- id: finance-C-186
when: When implementing factor orthogonalization for multi-factor models
action: Use OLS residuals as orthogonalized factor values — regress each factor against market and use residuals to capture
the component orthogonal to market
severity: high
kind: domain_rule
modality: must
consequence: OLS orthogonalization assumes a linear relationship between factors and market; non-linear patterns remain
in the residuals. Using residuals enables multi-factor models without multicollinearity; alternative methods (e.g., Gram-Schmidt)
may produce different results
derived_from_bd_id: BD-066
- id: finance-C-187
when: When implementing sliding window regression loop termination logic
action: Check that data_end_date <= end_date as the loop termination condition — return (None, False) when data extends
beyond requested range to prevent silent failures
severity: high
kind: domain_rule
modality: must
consequence: Without this termination check, the loop produces incomplete or misleading regression results when data extends
beyond requested range, causing silent data loss at boundaries
derived_from_bd_id: BD-082
- id: finance-C-188
when: When implementing parallel OLS computations using multiprocessing
action: Use spawn method with explicit data duplication via itertools.repeat for each worker — pass carbon_data, ff_data,
and rf_data as independent copies to prevent shared-state corruption
severity: high
kind: architecture_guardrail
modality: must
consequence: Using fork method with shared memory causes race conditions in pandas operations, producing non-reproducible
results across runs and potentially incorrect factor coefficients
derived_from_bd_id: BD-086
- id: finance-C-189
when: When implementing outlier filtering for returns data before regression
action: Clip returns to [-0.5, 0.5] (±50% single-period returns) to exclude data entry errors and survivorship bias artifacts
while retaining legitimate extreme events
severity: high
kind: domain_rule
modality: must
consequence: Without proper outlier bounds, extreme returns disproportionately influence OLS coefficients, which are sensitive
to leverage points; standard-deviation based bounds are rejected due to sensitivity to the very outliers they would
filter
derived_from_bd_id: BD-087
- id: finance-C-190
when: When processing datetime fields from multi-timezone data sources (US, EU, Asia markets)
action: Assume the framework handles UTC timezone normalization for datetime fields — the framework does not implement
timezone awareness for stock_data, carbon_risk_factor, and ff_factor tables
severity: high
kind: claim_boundary
modality: must_not
consequence: Without UTC timezone normalization, multi-timezone data ingestion produces incorrect timestamp alignment,
causing cross-market factor regression to use mismatched dates and produce invalid results
derived_from_bd_id: BD-GAP-011
- id: finance-C-191
when: When processing datetime fields from multi-timezone data sources
action: Add explicit timezone conversion ensuring all datetime fields are normalized to UTC before storage; add tzinfo
awareness to stock_data, carbon_risk_factor, and ff_factor tables using standard library timezone utilities
severity: high
kind: domain_rule
modality: must
consequence: Multi-timezone data ingestion without explicit timezone handling causes timestamp misalignment across markets,
making factor regression results unreliable and non-reproducible
derived_from_bd_id: BD-GAP-011
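# Illustrative sketch for finance-C-191 using only the standard library (Python 3.9+ for zoneinfo).
# The source timezone and timestamp are assumptions chosen for the example, not values from the framework.
#
#   from datetime import datetime, timezone
#   from zoneinfo import ZoneInfo
#
#   # a US market close expressed in exchange-local time
#   local = datetime(2024, 1, 2, 16, 0, tzinfo=ZoneInfo("America/New_York"))
#   # normalize to UTC before storage (New York is UTC-5 in January, so this is 21:00 UTC)
#   utc = local.astimezone(timezone.utc)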
- id: finance-C-192
when: When implementing or refactoring stochastic operations in factor regression scripts
action: Assume the framework sets random seeds for reproducibility — stochastic operations do not have deterministic behavior
by default
severity: high
kind: claim_boundary
modality: must_not
consequence: Without random seed initialization, stochastic operations produce different results on each run, compromising
research reproducibility, model comparison, and peer review
derived_from_bd_id: BD-GAP-012
- id: finance-C-193
when: When implementing or refactoring stochastic operations in factor regression scripts
action: Add random.seed(42) or equivalent seed initialization to all stochastic operations in regression_function.py
and correlate.py; document reproducibility requirements in both modules
severity: high
kind: domain_rule
modality: must
consequence: Research results without seeded randomness cannot be reproduced, compromising model comparison and peer review
— different runs produce different factor coefficients, making validation impossible
derived_from_bd_id: BD-GAP-012
- id: finance-C-194
when: When implementing factor regression or using FF factor data in calculations
action: Divide stored FF factor values (RF, market premium, SMB, HML, etc.) by 100 before passing to OLS or statistical
functions — factors are stored as percentages (e.g., 5.2) but must be converted to decimals (0.052) for proper model
estimation
severity: high
kind: domain_rule
modality: must
consequence: Using undivided percentage values in OLS regression causes coefficients to be off by a factor of 100, making
factor loadings meaningless and alpha estimates completely incorrect
derived_from_bd_id: BD-017
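# Illustrative sketch for finance-C-194, assuming factors are held in a pandas DataFrame.
# The column names and values are hypothetical; only the division by 100 comes from the rule above.
#
#   import pandas as pd
#
#   ff = pd.DataFrame({"Mkt-RF": [5.2], "SMB": [1.1], "HML": [-0.3], "Rf": [0.4]})
#   # stored as percentages, so convert to decimals before OLS (5.2 becomes about 0.052)
#   ff_decimal = ff / 100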
- id: finance-C-195
when: When selecting or switching stock price data sources in production pipelines
action: Verify any data source alternative to yfinance provides adjusted close prices that account for dividends and splits
— verify the source delivers total return data, not just price returns
severity: high
kind: domain_rule
modality: must
consequence: Switching to a data source that provides unadjusted close prices causes return calculations to misstate
actual investment performance, because omitted dividends understate total return and unadjusted splits appear as
spurious price jumps, leading to systematic errors in backtested returns
derived_from_bd_id: BD-039
- id: finance-C-196
when: When constructing value-investing factors or screening stocks by fundamentals
action: Verify that EBITDA, enterprise value, P/E ratio, cash, debt, and shares outstanding fields are populated for all
target stocks before executing EV/EBITDA or P/E screening; handle NULL/missing values explicitly (do not assume default
zero)
severity: high
kind: domain_rule
modality: must
consequence: Screening for value stocks using incomplete fundamental data causes stocks with missing P/E or EV fields
to be incorrectly included or excluded, distorting factor returns and potentially selecting overvalued stocks
derived_from_bd_id: BD-061
- id: finance-C-197
when: When importing Fama-French factor files into the database
action: Preprocess FF CSV files with grep to remove header rows before executing COPY FROM PROGRAM — the raw FF files
contain metadata headers that would cause parsing errors or corrupt factor data if loaded directly
severity: high
kind: domain_rule
modality: must
consequence: Loading FF factor files without header removal causes PostgreSQL to interpret header strings as numeric factor
values, corrupting all factor data and producing NaN coefficients in regression outputs
derived_from_bd_id: BD-006
- id: finance-C-198
when: When implementing BMG factor calculation in factor analysis
action: Verify BMG is calculated as brown minus green (merged['return_x'] - merged['return_y']), NOT green minus brown
severity: high
kind: domain_rule
modality: must
consequence: Reversing the subtraction order flips the sign of every BMG return and inverts the factor direction,
corrupting all downstream alpha calculations and portfolio exposure estimates
derived_from_bd_id: BD-079
- id: finance-C-199
when: When running factor regressions without explicit frequency parameter
action: Verify frequency matches the actual data source periodicity; be aware that MONTHLY default uses 60 months while
DAILY uses 730 days with 12x memory footprint
severity: medium
kind: operational_lesson
modality: should
consequence: Using DAILY frequency on monthly-sourced fundamental data produces misleadingly precise estimates that don't
reflect actual data availability, inflating apparent statistical power while degrading model validity
derived_from_bd_id: BD-080
- id: finance-C-200
when: When preparing data for factor regression analysis
action: Divide factor returns by 100 to convert percentage to decimal form AND subtract risk-free rate from Close to isolate
excess returns
severity: high
kind: domain_rule
modality: must
consequence: Omitting the /100 normalization produces coefficients 100x larger than expected, making interpretation impossible
and comparisons across factors misleading
derived_from_bd_id: BD-081
- id: finance-C-201
when: When using Durbin-Watson test to detect autocorrelation in factor regression residuals
action: Be aware that DW test has a 'dead zone' (1.5 to 2.5) where it cannot detect autocorrelation; use Breusch-Godfrey
test or Newey-West HAC standard errors when DW is inconclusive
severity: medium
kind: operational_lesson
modality: should
consequence: DW values between 1.5 and 2.5 indicate inconclusive results where autocorrelation may be present but undetected,
leading to underestimated standard errors and spurious statistical significance
derived_from_bd_id: BD-076
- id: finance-C-202
when: When implementing or refactoring interval parameter handling in rolling regression logic
action: Treat interval=0 as a request for default history length (60 months for MONTHLY, 730 days for DAILY) — do not
interpret interval=0 as zero-length window or skip processing
severity: high
kind: domain_rule
modality: must
consequence: Interpreting interval=0 as zero produces empty windows, causing rolling regressions to return NaN or default
to incomplete samples, making all beta estimates and BMG premium conclusions unreliable
derived_from_bd_id: BD-092
- id: finance-C-203
when: When using the framework's default interval parameter for rolling window calculations
action: Verify that interval=0 defaults align with the intended statistical sample size (60 months for monthly data, 730
days for daily data) and document any override of these defaults
severity: medium
kind: operational_lesson
modality: should
consequence: Default interval=0 produces 5-year monthly or 2-year daily samples; changing defaults to shorter windows
reduces statistical power while longer windows may include structural breaks, both distorting BMG premium estimates
derived_from_bd_id: BD-092
- id: finance-C-204
when: When implementing outlier filtering logic in the returns processing pipeline
action: 'Verify consistent outlier threshold application: winsorization threshold (±0.5 or ±50%) must match or be derived
from the abnormal return filter threshold (<=1.0 or 100%) — document any intentional asymmetry'
severity: high
kind: domain_rule
modality: must
consequence: Inconsistent thresholds (±50% winsorization vs <=100% filter) cause the same return data to be treated differently
across pipeline stages, biasing factor loadings toward moderate-return stocks and distorting BMG premium conclusions
derived_from_bd_id: BD-095
- id: finance-C-206
when: When implementing return calculation logic that multiplies percentage values
action: Use Decimal type for percentage-weighted return multiplication to prevent floating-point rounding errors — do
not replace Decimal with float even if performance concerns are raised
severity: high
kind: domain_rule
modality: must
consequence: Using float instead of Decimal in percentage multiplication introduces rounding errors that accumulate across
thousands of transactions, causing reported returns to differ from actual returns by basis points that compound into
material losses in high-frequency strategies
derived_from_bd_id: BD-078
- id: finance-C-207
when: When implementing data merge logic in the factor regression pipeline
action: 'Maintain the sequential merge order: stock → carbon → ff → rf using inner joins on dates — do not reorder or
use different join types'
severity: high
kind: domain_rule
modality: must
consequence: Reversing the merge order changes which dates are retained at each step, producing different filtered datasets
and invalidating regression results. The invariant that all final rows contain dates present across all four datasets
would be broken.
derived_from_bd_id: BD-089
- id: finance-C-208
when: When calculating portfolio returns, equity curves, or position PnL
action: Assume the framework automatically conserves PnL across realized and unrealized components — the framework does
not implement PnL conservation validation
severity: high
kind: claim_boundary
modality: must_not
consequence: Without PnL conservation validation, realized and unrealized PnL components can diverge silently, causing
equity curve misstatement and incorrect performance attribution in backtesting results
derived_from_bd_id: BD-GAP-004
- id: finance-C-209
when: When implementing PnL tracking and portfolio accounting
action: Verify total_pnl = realized_pnl + unrealized_pnl is mathematically conserved across all calculations; validate
that closing a position correctly transfers unrealized PnL to realized PnL and maintains the invariant at every rebalance
point
severity: high
kind: domain_rule
modality: must
consequence: PnL components that fail to reconcile indicate accounting errors; in live trading this causes position discrepancies
and incorrect profit/loss reporting
derived_from_bd_id: BD-GAP-004
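# Illustrative sketch for finance-C-209. check_pnl and its tolerance are hypothetical helpers, not part of
# the framework; the invariant total_pnl = realized_pnl + unrealized_pnl is taken from the rule above.
#
#   import math
#
#   def check_pnl(total_pnl, realized_pnl, unrealized_pnl, tol=1e-9):
#       # call at every rebalance point; raise rather than let the components drift apart silently
#       if not math.isclose(total_pnl, realized_pnl + unrealized_pnl, abs_tol=tol):
#           raise ValueError("PnL not conserved")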
- id: finance-C-210
when: When performing model training, backtesting, or evaluating strategy performance
action: Assume the framework enforces strict temporal train/test split integrity without data leakage — the framework
does not implement temporal split validation
severity: high
kind: claim_boundary
modality: must_not
consequence: Without temporal split integrity enforcement, future data can leak into training sets, causing look-ahead
bias where backtest results are systematically inflated compared to live trading
derived_from_bd_id: BD-GAP-005
- id: finance-C-211
when: When implementing train/test split logic for factor regression models
action: Verify that all training samples precede test samples chronologically using split_date as the boundary; verify
no temporal overlap exists and that the split is deterministic based on experiment_id for reproducibility
severity: high
kind: domain_rule
modality: must
consequence: Temporal data leakage causes strategies to appear profitable in backtesting but fail in live trading, leading
to direct financial losses from deploying overfitted models
derived_from_bd_id: BD-GAP-005
- id: finance-C-212
when: When running factor regressions, model experiments, or any reproducible analysis
action: Assume the framework tracks data versions and experiment lineage automatically — the framework does not implement
data versioning or run tracking columns
severity: high
kind: claim_boundary
modality: must_not
consequence: Without data versioning and run tracking, exact model state cannot be reproduced, making A/B testing impossible
and invalidating scientific claims about strategy improvements
derived_from_bd_id: BD-GAP-014
- id: finance-C-213
when: When defining factor tables and regression metadata schema
action: Add data_version column to each factor table to track which input dataset was used; populate run_id and experiment_id
columns in the regression_results table for every execution to enable traceability and A/B comparison
severity: high
kind: domain_rule
modality: must
consequence: Missing versioning metadata makes it impossible to reproduce model state, conduct valid A/B tests between
strategy versions, or diagnose why performance changed over time
derived_from_bd_id: BD-GAP-014
output_validator:
assertions:
- id: OV-01
check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
indicators and non-reproducible.
source_ids:
- SL-08
- BD-036
- id: OV-02
check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
Structural non-emptiness check is insufficient — we need business confirmation.
source_ids:
- SL-01
- finance-C-073
- id: OV-03
check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
bias or corrupt data.
source_ids: []
- id: OV-04
check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
source_ids:
- BD-029
- id: OV-05
check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
bias.
source_ids: []
- id: OV-06
check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
for i in range(len(result.trade_log)-1)))
failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
buying, risking duplicate positions.
source_ids:
- SL-01
scaffold:
validate_py_path: '{workspace}/validate.py'
tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n result = run_backtest()\n from\
\ validate import enforce_validation\n enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
\ END DO NOT MODIFY ==="
enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
hard_gates:
- id: G1
check: '{workspace}/result.csv exists AND file size > 0'
on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
- id: G2
check: '{workspace}/result.csv.validation_passed marker file exists'
on_fail: Validation did not complete; review validate.py output and fix assertion failures
- id: G3
check: 'Main script contains literal: from validate import enforce_validation'
on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
- id: G4
check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
- id: G5
check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
- id: G6
check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
- id: G7
check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
writing
- id: G8
check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
soft_gates:
- id: SG-01
rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
(buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
match user intent [1-5, pass>=4].'
- id: SG-02
rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
- id: SG-03
rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
[1-5, pass>=4].'
skill_crystallization:
trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
output_path_template: '{workspace}/../skills/{slug}.skill'
slug_template: '{blueprint_id_short}-{uc_id_lower}'
captured_fields:
- name
- intent_keywords
- entry_point_script
- validate_script
- fatal_constraints
- spec_locks
- preconditions
- install_recipes
- human_summary_translated
action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
from the matched UC to invoke directly.'''
violation_signal: All hard gates passed but no .skill file exists at expected path
skill_file_schema:
name: finance-bp-105 / Sector Stock Count and Significant Factor Regression Analyzer
version: v5.3
intent_keywords:
- sector composition
- significant regression
- p-value screening
- stock sectors
- factor analysis
entry_point: run_backtest
fatal_guards:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-10
- SL-11
- SL-12
spec_locks:
- SL-01
- SL-02
- SL-03
- SL-04
- SL-05
- SL-06
- SL-07
- SL-08
- SL-09
- SL-10
- SL-11
- SL-12
preconditions:
- PC-01
- PC-02
- PC-03
- PC-04
post_install_notice:
trigger: skill_installation_complete
message_template:
positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
capability_catalog:
group_strategy:
source: auto_grouped
strategy_reason: auto-grouped by UC.type (4 distinct values, balanced distribution)
groups:
- group_id: screening
name: Screening
description: ''
emoji: 📦
uc_count: 1
ucs:
- uc_id: UC-101
name: Sector Stock Count and Significant Factor Regression Analyzer
short_description: Identifies how many stocks from an index fall into each sector and screens for stocks with statistically
significant factor regression results based o
sample_triggers:
- sector composition
- significant regression
- p-value screening
- group_id: research_analysis
name: Research Analysis
description: ''
emoji: 📦
uc_count: 3
ucs:
- uc_id: UC-102
name: Factor Correlation Calculator
short_description: Computes correlations between different factors over time to understand factor relationship dynamics
and potential multicollinearity issues
sample_triggers:
- factor correlation
- correlation matrix
- factor relationships
- uc_id: UC-103
name: OLS Regression with Diagnostic Statistics
short_description: 'Performs ordinary least squares regression on factor data with comprehensive diagnostic tests
including Durbin-Watson, Jarque-Bera, and Breusch-Pagan '
sample_triggers:
- OLS regression
- diagnostic tests
- statistical tests
- uc_id: UC-107
name: Multi-Stock Factor Regression Runner
short_description: Executes factor regression analysis across multiple stocks in parallel using multiprocessing,
loading Fama-French and carbon risk factors from databas
sample_triggers:
- regression
- multiprocessing
- parallel analysis
- group_id: data_pipeline
name: Data Pipeline
description: ''
emoji: 📊
uc_count: 4
ucs:
- uc_id: UC-104
name: Fama-French Factor Model Generator
short_description: Builds custom Fama-French style factor models by merging stock returns, Fama-French factors,
and carbon risk factors into unified datasets for analysi
sample_triggers:
- Fama-French model
- factor model
- carbon risk
- uc_id: UC-105
name: Stock Price Data Downloader
short_description: Downloads historical stock price data from Yahoo Finance with support for daily and monthly frequencies,
including automatic retry on timeout
sample_triggers:
- stock prices
- price download
- yfinance
- uc_id: UC-108
name: Stock Data Import and Update
short_description: Imports stock return data from CSV or downloads from yfinance, with support for incremental updates
to maintain current database with stock returns
sample_triggers:
- import stocks
- stock data
- data import
- uc_id: UC-109
name: Database Schema Initialization and Data Import
short_description: Initializes database schema and imports Fama-French, bond, and carbon risk factors into PostgreSQL
tables, including BMG factor data
sample_triggers:
- database setup
- schema initialization
- factor import
- group_id: builtin_factor
name: Builtin Factor
description: ''
emoji: 🧮
uc_count: 1
ucs:
- uc_id: UC-106
name: BMG Factor Series Creator
short_description: Creates Brown-Green (BMG) factor series by calculating the return differential between brown
(high carbon) and green (low carbon) stocks for carbon ri
sample_triggers:
- BMG factor
- carbon risk
- brown green stocks
call_to_action: Tell me which one you want to try.
featured_entries:
- uc_id: UC-101
beginner_prompt: Try sector stock count and significant factor regression analyzer
auto_selected: true
- uc_id: UC-102
beginner_prompt: Try factor correlation calculator
auto_selected: true
- uc_id: UC-103
beginner_prompt: Try OLS regression with diagnostic statistics
auto_selected: true
more_info_hint: Ask me 'what else can you do?' to see all 9 capabilities.
locale_rendering:
instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
+ capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
+ more_info_hint) into detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, and sample_triggers
verbatim.
preserve_verbatim:
- UC-IDs
- group_id
- emoji
- sample_triggers
- technical_class_names
enforcement:
action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
call_to_action, and more_info_hint.'
violation_code: PIN-01
violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
OR skips featured_entries OR skips call_to_action.
human_summary:
persona: Doraemon
what_i_can_do:
tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
what you want; I''ll write the code, you don''t have to dig docs. (Heads up: ZVT natively supports A-share, HK, and
crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
use_cases:
- OLS Regression with Diagnostic Statistics
- Factor Correlation Calculator
- Sector Stock Count and Significant Factor Regression Analyzer
- A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
- 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
- Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
- Index composition data collection (SZ1000, SZ2000) with EM recorder
what_i_auto_fetch:
- ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
- Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
- Fatal constraints (finance-C-*) relevant to your target strategy type
- 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
- Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
- Provider-specific recorder class names and required class attributes
what_i_ask_you:
- 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
is thin)'
- 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
or qmt (broker)?'
- 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
- 'Time range: start_timestamp and end_timestamp for backtest period'
- 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
locale_rendering:
instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
(direct, frank, mildly snarky, knows limits).
preserve_verbatim:
- BD-IDs
- SL-IDs
- UC-IDs
- finance-C-IDs
- class_names
- function_names
- file_paths
- numeric_thresholds
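The CCXT library provides a unified wrapper around the trading APIs of major global cryptocurrency exchanges, supporting core operations such as order management, market data queries, account balance monitoring, and automated lending.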
---
name: ccxt-crypto-api
description: |-
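The CCXT library provides a unified wrapper around the trading APIs of major global cryptocurrency exchanges, supporting core operations such as order management, market data queries, account balance monitoring, and automated lending.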
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-111"
compiled_at: "2026-04-22T13:00:53.651332+00:00"
capability_markets: "crypto"
capability_activities: "crypto-trading"
sop_version: "crystal-compilation-v6.1"
---
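# CCXT Crypto Trading API (ccxt-crypto-api)
> The CCXT library provides a unified wrapper around the trading APIs of major global cryptocurrency exchanges, supporting core operations such as order management, market data queries, account balance monitoring, and automated lending.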
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (100 total)
### Bitfinex fUST Lending Bot (`UC-101`)
Automates cryptocurrency lending on Bitfinex by checking for lending opportunities and executing market orders to deploy funds into lending markets
**Triggers**: lending, bot, bitfinex
### Cross-Exchange Spot Arbitrage Bot (`UC-102`)
Scans multiple exchanges (OKX, Bybit, Binance, KuCoin, BitMart, Gate.io) for price discrepancies in spot markets and executes arbitrage trades
**Triggers**: arbitrage, spot trading, cross-exchange
### Binance Create and Cancel Order (`UC-103`)
Demonstrates creating a limit order on Binance and then canceling it, useful for testing order workflows
**Triggers**: create order, cancel order, binance
For all **100** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (13 total)
- **`AP-CRYPTO-TRADING-001`**: Float Arithmetic for Monetary Values
- **`AP-CRYPTO-TRADING-002`**: Missing Market Initialization Before Access
- **`AP-CRYPTO-TRADING-003`**: Bypassing API Facade Layer
All 13 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-111. Evidence verify ratio = 60.5% and audit fail total = 9. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
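| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative definition (source of truth) | Required reading when behavior or decisions are in dispute |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 13 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | Domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest or trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |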
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-111` blueprint at 2026-04-22T13:00:53.651332+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**:
  - Binance Create and Cancel Order
  - Cross-Exchange Spot Arbitrage Bot
  - Bitfinex fUST Lending Bot
  - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
  - End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader
  - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
  - Index composition data collection (SZ1000, SZ2000) with EM recorder
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **13**
## ccxt (1)
### `AP-CRYPTO-TRADING-002` — Missing Market Initialization Before Access <sub>(high)</sub>
Attempting to access market data via symbol lookups before load_markets() is called leaves self.markets empty, causing KeyError or BadSymbol exceptions on all trading operations and data retrieval. This breaks the entire trading workflow at the first market interaction.
## cryptofeed (3)
### `AP-CRYPTO-TRADING-009` — Applying Order Book Deltas Before Snapshot <sub>(high)</sub>
Processing order book delta messages before receiving a snapshot for the symbol applies updates to an uninitialized or stale book state. Price levels are incorrectly added/removed, corrupting the local book representation with no way to recover without full reset.
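The guard described above can be sketched as a book that drops deltas until a snapshot has been applied. This is a minimal illustration, not cryptofeed's implementation: `LocalBook`, `apply_snapshot`, and `apply_delta` are hypothetical names, and the book is one-sided for brevity.

```python
from decimal import Decimal


class LocalBook:
    """Hypothetical one-sided order book that refuses deltas until a snapshot arrives."""

    def __init__(self) -> None:
        self.levels: dict[Decimal, Decimal] = {}
        self.has_snapshot = False

    def apply_snapshot(self, levels: dict[Decimal, Decimal]) -> None:
        # A snapshot replaces the whole book and unlocks delta processing.
        self.levels = dict(levels)
        self.has_snapshot = True

    def apply_delta(self, price: Decimal, size: Decimal) -> bool:
        """Apply one price-level update; return False if it was dropped."""
        if not self.has_snapshot:
            # No base state yet: applying this update would corrupt the book.
            return False
        if size == 0:
            self.levels.pop(price, None)  # size 0 removes the level
        else:
            self.levels[price] = size
        return True
```

A caller would typically buffer or discard the rejected deltas and request a fresh snapshot, depending on what the exchange feed allows.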
### `AP-CRYPTO-TRADING-010` — Silent HTTP Error Handling <sub>(medium)</sub>
Ignoring non-200 HTTP response status codes without raising exceptions causes silent failures for data requests. Market data is missing or corrupted, failed requests are not retried, and downstream consumers receive incomplete data with no indication of failure.
### `AP-CRYPTO-TRADING-011` — Missing Sequence Number Validation <sub>(medium)</sub>
Not validating that order book sequence numbers increment by exactly 1 allows out-of-order or missing messages to corrupt local book state. Stale or incorrect price levels persist in the book, leading to wrong trading signals and corrupted market depth data.
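One way to enforce the increment-by-one rule is a small tracker that raises as soon as a gap appears. The sketch below assumes integer sequence numbers that start at an arbitrary value; `SequenceTracker` and `SequenceGapError` are hypothetical names, not part of any library.

```python
from typing import Optional


class SequenceGapError(Exception):
    """Raised when a message arrives out of order or a message was missed."""


class SequenceTracker:
    """Hypothetical helper that checks sequence numbers increase by exactly 1."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def check(self, seq: int) -> None:
        if self.last_seq is not None and seq != self.last_seq + 1:
            # Leave last_seq untouched so the caller can report the gap precisely.
            raise SequenceGapError(f"expected {self.last_seq + 1}, got {seq}")
        self.last_seq = seq

    def reset(self) -> None:
        # Call after re-subscribing and receiving a new snapshot.
        self.last_seq = None
```

On `SequenceGapError` the usual response is to discard the local book, call `reset()`, and wait for a new snapshot rather than trying to patch the state.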
## hummingbot (5)
### `AP-CRYPTO-TRADING-005` — Unvalidated Collateral for Order Execution <sub>(high)</sub>
Submitting orders without checking collateral requirements including order cost, percent fees, and fixed fees against available balance causes orders to exceed margin. This triggers immediate liquidation or forced position closure at unfavorable prices with partial or total loss of collateral.
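As a rough illustration of the pre-submission check, the sketch below adds order cost, a percentage fee, and a fixed fee, then compares the total with the available balance. It assumes an unleveraged order with all values in the same quote currency; `required_collateral` and `can_afford` are hypothetical helpers, not hummingbot functions.

```python
from decimal import Decimal


def required_collateral(
    price: Decimal, amount: Decimal, percent_fee: Decimal, fixed_fee: Decimal
) -> Decimal:
    """Order cost plus percentage fee plus fixed fee (assumes no leverage)."""
    order_cost = price * amount
    return order_cost + order_cost * percent_fee + fixed_fee


def can_afford(
    available: Decimal,
    price: Decimal,
    amount: Decimal,
    percent_fee: Decimal,
    fixed_fee: Decimal,
) -> bool:
    """Return True only if the balance covers the full collateral requirement."""
    return required_collateral(price, amount, percent_fee, fixed_fee) <= available
```

For example, buying 2 units at 100 with a 0.1% fee and a 0.5 fixed fee requires 200.7, so a balance of exactly 200 is not enough.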
### `AP-CRYPTO-TRADING-006` — Close Order Placed Before Open Order Fills <sub>(high)</sub>
Placing a close order before verifying the open order is fully filled causes mismatched position sizes. The executor attempts to close a larger or smaller position than actually exists, leading to unintended directional exposure and potential losses exceeding the configured risk parameters.
### `AP-CRYPTO-TRADING-007` — Arbitrage Across Non-Interchangeable Tokens <sub>(high)</sub>
Executing arbitrage trades between tokens that appear similar but are not interchangeable causes permanent loss of funds. The received tokens cannot be used to close the opposing position, stranding capital and creating one-sided exposure with no recovery path.
### `AP-CRYPTO-TRADING-008` — Skipping Triple Barrier Evaluations <sub>(high)</sub>
Omitting control_stop_loss, control_take_profit, or control_time_limit calls in the control_barriers cycle leaves positions unprotected. Losses exceed configured thresholds as barrier checks never trigger, positions remain open beyond risk tolerance, resulting in amplified losses.
### `AP-CRYPTO-TRADING-012` — Wrong Position Key for Perpetual Modes <sub>(medium)</sub>
Using trading_pair only as the position key in HEDGE mode causes different position sides to collide and overwrite each other. Position tracking becomes incorrect, leading to wrong order matching and potential financial loss when the system misidentifies position direction.
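The collision can be avoided by making the position side part of the key whenever hedge mode is active. The following is a minimal sketch with hypothetical names (`position_key`, `record_position`) and plain string sides; it does not mirror hummingbot's internal types.

```python
from decimal import Decimal


def position_key(trading_pair: str, side: str, hedge_mode: bool) -> tuple[str, ...]:
    """Return the dictionary key under which a position should be tracked."""
    if hedge_mode:
        # HEDGE mode: long and short legs coexist, so the side is part of the key.
        return (trading_pair, side)
    # ONE-WAY mode: a single net position per trading pair.
    return (trading_pair,)


def record_position(
    positions: dict, trading_pair: str, side: str, amount: Decimal, hedge_mode: bool
) -> None:
    """Store a position under the key appropriate for the position mode."""
    positions[position_key(trading_pair, side, hedge_mode)] = amount
```

With this keying, a long and a short on the same pair occupy two entries in hedge mode, while one-way mode keeps a single entry per pair.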
## rotki (3)
### `AP-CRYPTO-TRADING-003` — Bypassing API Facade Layer <sub>(high)</sub>
Directly accessing internal service methods without routing through the RestAPI facade bypasses authentication, task tracking, and error handling mechanisms. Anonymous requests can execute privileged operations, creating critical security vulnerabilities where unauthorized users access sensitive financial data or execute trades.
### `AP-CRYPTO-TRADING-004` — Non-Checksummed EVM Addresses <sub>(high)</sub>
Passing lowercase or mixed-case Ethereum addresses to RPC nodes causes InvalidAddress exceptions since nodes enforce EIP-55 checksum format. This results in RemoteError failures that halt all blockchain data collection for the affected chain, with no graceful degradation or fallback.
### `AP-CRYPTO-TRADING-013` — Overwriting User-Customized Event Classifications <sub>(medium)</sub>
Re-decoding operations silently replace user-modified events marked as CUSTOMIZED without explicit user action. User edits to event classifications are permanently lost, causing incorrect accounting treatment and potential tax reporting errors that may not be detected until audit.
## rotki, hummingbot, cryptofeed, ccxt (1)
### `AP-CRYPTO-TRADING-001` — Float Arithmetic for Monetary Values <sub>(high)</sub>
Using Python float type instead of Decimal for price, amount, balance, PnL, and other financial calculations causes precision errors due to binary floating-point representation. Rounding errors compound across multiple calculations, leading to incorrect order sizing, wrong profit/loss reporting, and potentially incorrect trading decisions or tax calculations.
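The precision problem is easy to reproduce with the standard library alone. The sketch below contrasts float and `Decimal` addition and shows one possible way to round an order amount down to an exchange step size; `quantize_amount` is a hypothetical helper and the step value is illustrative.

```python
from decimal import Decimal, ROUND_DOWN

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2

# Decimals built from strings keep the exact decimal value.
decimal_total = Decimal("0.1") + Decimal("0.2")


def quantize_amount(amount: Decimal, step: Decimal) -> Decimal:
    """Round an order amount down to a whole multiple of the step size."""
    steps = (amount / step).to_integral_value(rounding=ROUND_DOWN)
    return steps * step
```

Constructing `Decimal` from strings rather than from floats matters here: `Decimal(0.1)` would carry the float's representation error into the decimal value.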
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-111--ccxt
**Scan date**: 2026-04-22
**Stats**: 6 files, 21 classes, 0 functions, 6 stages
## Modules (6)
- [market_and_currency_loading](components/market_and_currency_loading.md): 3 classes
- [api_request_construction](components/api_request_construction.md): 3 classes
- [network_request_execution](components/network_request_execution.md): 3 classes
- [response_parsing_and_normalization](components/response_parsing_and_normalization.md): 3 classes
- [websocket_real-time_streaming](components/websocket_real-time_streaming.md): 4 classes
- [trading_operations](components/trading_operations.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 138
fatal_constraints_count: 46
non_fatal_constraints_count: 175
use_cases_count: 100
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
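Two of the locks above lend themselves to simple pre-flight checks. The sketch below validates the SL-03 ID format and the SL-05 exactly-one rule; `SignalSizing` and `parse_entity_id` are hypothetical stand-ins written for illustration, not classes or functions taken from the framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSizing:
    """Hypothetical stand-in for the sizing fields of a trading signal (SL-05)."""

    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[float] = None

    def validate(self) -> None:
        fields = (self.position_pct, self.order_money, self.order_amount)
        provided = [value for value in fields if value is not None]
        if len(provided) != 1:
            raise ValueError(
                f"exactly one sizing field must be set, got {len(provided)}"
            )


def parse_entity_id(entity_id: str) -> tuple[str, str, str]:
    """Split an SL-03 style ID into (entity_type, exchange, code)."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed entity ID: {entity_id!r}")
    return parts[0], parts[1], parts[2]
```

Running both checks before a backtest starts turns a fatal lock violation into an early, readable error instead of a failure deep inside the trading loop.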
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **100**
## `KUC-101`
**Source**: `examples/bots/py/bitfinex-lending-bot.py`
Automates cryptocurrency lending on Bitfinex by checking for lending opportunities and executing market orders to deploy funds into lending markets.
## `KUC-102`
**Source**: `examples/bots/py/spot-arbitrage-bot.py`
Scans multiple exchanges (OKX, Bybit, Binance, KuCoin, BitMart, Gate.io) for price discrepancies in spot markets and executes arbitrage trades.
## `KUC-103`
**Source**: `examples/ccxt.pro/py/binance-create-order-cancel-order.py`
Demonstrates creating a limit order on Binance and then canceling it, useful for testing order workflows.
## `KUC-104`
**Source**: `examples/ccxt.pro/py/binance-fetch-balance-snapshot-watch-balance-updates.py`
Captures initial balance snapshot and continuously monitors for balance updates via WebSocket, printing changes when they occur.
## `KUC-105`
**Source**: `examples/ccxt.pro/py/binance-futures-watch-balance.py`
Continuously watches futures, delivery, and spot balances on Binance simultaneously using asyncio.
## `KUC-106`
**Source**: `examples/ccxt.pro/py/binance-futures-watch-order-book.py`
Streams real-time order book updates for BTC/USDT futures contract on Binance.
## `KUC-107`
**Source**: `examples/ccxt.pro/py/binance-futures.py`
Continuously monitors and prints order book updates with timestamps for BTC/USDT on Binance.
## `KUC-108`
**Source**: `examples/ccxt.pro/py/binance-reload-markets.py`
Periodically reloads market data from Binance while simultaneously watching order books, ensuring market data stays current.
## `KUC-109`
**Source**: `examples/ccxt.pro/py/binance-spot-and-futures.py`
Watches multiple order books across different market types (spot, futures) and displays them together.
## `KUC-110`
**Source**: `examples/ccxt.pro/py/binance-watch-many-orderbooks.py`
Subscribes to order book updates for multiple trading pairs simultaneously on Binance, printing each update.
## `KUC-111`
**Source**: `examples/ccxt.pro/py/binance-watch-margin-balance.py`
Monitors margin account balance changes on Binance via WebSocket, printing updates when margin positions change.
## `KUC-112`
**Source**: `examples/ccxt.pro/py/binance-watch-ohlcv.py`
Streams real-time OHLCV (candlestick) data for ETH/USDT on Binance with configurable timeframe and limit.
## `KUC-113`
**Source**: `examples/ccxt.pro/py/binance-watch-order-book-individual-updates.py`
Captures and displays individual high-frequency order book updates by subclassing Binance exchange to intercept messages.
## `KUC-114`
**Source**: `examples/ccxt.pro/py/binance-watch-orderbook-watch-balance.py`
Simultaneously monitors order book and balance updates, displaying them together with common handler logic.
## `KUC-115`
**Source**: `examples/ccxt.pro/py/binance-watch-orders-being-placed.py`
Watches active orders and balance updates while also placing delayed orders to demonstrate order lifecycle monitoring.
## `KUC-116`
**Source**: `examples/ccxt.pro/py/binance-watch-spot-futures-balances-continuously.py`
Continuously monitors balances across multiple Binance accounts (spot, USD-M futures, COIN-M futures) and prints the totals for each currency.
## `KUC-117`
**Source**: `examples/ccxt.pro/py/bitmex_watch_ohlcv.py`
Streams real-time OHLCV candlestick data for BTC/USD perpetual contract on Bitmex with formatted table output.
## `KUC-118`
**Source**: `examples/ccxt.pro/py/bitmex_watch_ticker_and_ohlcv.py`
Simultaneously streams ticker data and OHLCV candlesticks on Bitmex with color-coded output for visual distinction.
## `KUC-119`
**Source**: `examples/ccxt.pro/py/bitvavo-watch-order-book.py`
Streams real-time order book updates for BTC/EUR on Bitvavo European exchange with nonce verification.
## `KUC-120`
**Source**: `examples/ccxt.pro/py/build-ohlcv-many-symbols.py`
Constructs OHLCV candlesticks from individual trades for multiple symbols, supporting both complete and incomplete candles.
## `KUC-121`
**Source**: `examples/ccxt.pro/py/coinbase-watch-all-trades.py`
Watches all trade updates on Coinbase for BTC/USD and tracks the last trade ID to avoid duplicates.
## `KUC-122`
**Source**: `examples/ccxt.pro/py/coinbase-watch-trades.py`
Streams trade data for BTC/USD on Coinbase, printing the latest trade with the count of cached trades.
## `KUC-123`
**Source**: `examples/ccxt.pro/py/consume-all-trades.py`
Continuously consumes and prints all trade updates for BTC/USD on Bitmex, clearing the trade cache after processing.
## `KUC-124`
**Source**: `examples/ccxt.pro/py/gateio-watch-trades.py`
Watches trade updates on Gate.io for BTC/USDT with timestamp-based pagination to fetch incremental updates.
## `KUC-125`
**Source**: `examples/ccxt.pro/py/intercept-original-ohlcv-updates.py`
Subclasses Binance to intercept and process raw OHLCV WebSocket messages before passing to standard handler.
## `KUC-126`
**Source**: `examples/ccxt.pro/py/kucoin-watch-multiple-orderbooks.py`
Watches order books for multiple symbols (KDA/USDT, KDA/BTC, BTC/USDT) simultaneously on KuCoin with authentication.
## `KUC-127`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-different-streams.py`
Monitors multiple data streams (order book, ticker, trades) across multiple exchanges simultaneously.
## `KUC-128`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-orderbooks-synchronized.py`
Watches order books across multiple exchanges and displays all current order books together in a synchronized view.
## `KUC-129`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-orderbooks-throttled.py`
Watches order books across multiple exchanges with throttled output every 5 seconds to manage display bandwidth.
## `KUC-130`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-streams-with-keys.py`
Advanced multi-exchange monitoring supporting both symbol-specific and global streams (like server time) with API authentication.
## `KUC-131`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-streams.py`
Watches order books for multiple symbols across OKX and Binance simultaneously using async gather patterns.
## `KUC-132`
**Source**: `examples/ccxt.pro/py/many-exchanges-many-symbols-watch-trades.py`
Streams trade data for multiple symbols across OKX and Binance with incremental updates enabled.
## `KUC-133`
**Source**: `examples/ccxt.pro/py/many-exchanges.py`
Simple example watching BTC/USDT order books across Kraken, Binance, and Bitmex simultaneously.
## `KUC-134`
**Source**: `examples/ccxt.pro/py/multiple-exchanges-watch-orderbook-continuously.py`
Monitors CELO/USD order books across Coinbase Pro, OKCoin, and Bittrex, printing when top bid changes.
## `KUC-135`
**Source**: `examples/ccxt.pro/py/okex-create-swap-order.py`
Places a market order for BTC/USDT perpetual swap contract on OKX with configurable position direction.
## `KUC-136`
**Source**: `examples/ccxt.pro/py/okex-watch-margin-balance-with-params.py`
Watches the OKX margin account balance for a specific symbol (BTC/USDT) using the params-based approach with verbose output.
## `KUC-137`
**Source**: `examples/ccxt.pro/py/okex-watch-margin-balance.py`
Continuously monitors OKX margin account balance changes for BTC/USDT with verbose debugging enabled.
## `KUC-138`
**Source**: `examples/ccxt.pro/py/okx-bbo-tbt.py`
Streams best bid/ask data tick-by-tick on OKX for high-frequency price monitoring.
## `KUC-139`
**Source**: `examples/ccxt.pro/py/on-connected-user-hook.py`
Demonstrates WebSocket connection lifecycle hook by placing an order immediately upon connection establishment.
## `KUC-140`
**Source**: `examples/ccxt.pro/py/one-exchange-different-streams.py`
Watches both order book and trades streams simultaneously for BTC/USD on Bitstamp.
## `KUC-141`
**Source**: `examples/ccxt.pro/py/one-exchange-many-streams.py`
Watches order books for multiple symbols (BTC/USDT, ETH/USDT, ETH/BTC) on FTX exchange with throttling.
## `KUC-142`
**Source**: `examples/ccxt.pro/py/phemex-cancel-all-orders.py`
Cancels all open orders for a specific symbol (UNI/USDT) on Phemex exchange.
## `KUC-143`
**Source**: `examples/ccxt.pro/py/spot-vs-future-arbitrage-bitmart.py`
Monitors both spot and futures order books on BitMart to detect arbitrage opportunities between the two markets.
## `KUC-144`
**Source**: `examples/ccxt.pro/py/watch-all-symbols.py`
Watches order books for all available trading pairs on Kraken, printing every 100th update to manage output.
## `KUC-145`
**Source**: `examples/ccxt.pro/py/watch-custom-exchange-specific-streams.py`
Implements custom WebSocket handler for Binance mini ticker stream not natively supported in CCXT Pro.
## `KUC-146`
**Source**: `examples/ccxt.pro/py/watch-many-exchanges-many-tickers.py`
Streams ticker data (bid/ask/last) for multiple symbols across Binance and FTX simultaneously.
## `KUC-147`
**Source**: `examples/ccxt.pro/py/watch-ticker-to-csv.py`
Streams ticker data for multiple symbols and writes results to CSV files for historical analysis.
## `KUC-148`
**Source**: `examples/py/aiohttp-custom-session-connector.py`
Configures CCXT async support to use SOCKS proxy for exchanges that require it for connectivity.
## `KUC-149`
**Source**: `examples/py/all-exchanges.py`
Lists all cryptocurrency exchanges supported by the CCXT library for discovery purposes.
## `KUC-150`
**Source**: `examples/py/arbitrage-pairs.py`
Scans multiple exchanges to find arbitrage opportunities by comparing prices across different trading pairs.
## `KUC-151`
**Source**: `examples/py/asciichart.py`
Provides terminal-based charting capability to visualize price data using ASCII art.
## `KUC-152`
**Source**: `examples/py/async-analyse-augur-v1-vs-v2-exchanges.py`
Compares trading pairs across Augur v1 and v2 exchanges to identify differences in available markets.
## `KUC-153`
**Source**: `examples/py/async-balance-coinbasepro.py`
Fetches account balance from Coinbase Pro exchange using sandbox environment for testing.
## `KUC-154`
**Source**: `examples/py/async-balance-gdax.py`
Fetches account balance from GDAX (Coinbase) exchange using sandbox mode.
## `KUC-155`
**Source**: `examples/py/async-balance.py`
Fetches account balance from Bittrex exchange asynchronously.
## `KUC-156`
**Source**: `examples/py/async-balances.py`
Fetches balances from multiple exchanges (Kraken, Bitfinex) concurrently.
## `KUC-157`
**Source**: `examples/py/async-basic-callchain.py`
Demonstrates sequential async operations pattern: load markets, fetch ticker, fetch order book on multiple exchanges.
## `KUC-158`
**Source**: `examples/py/async-basic-orderbook.py`
Fetches order book data from OKX exchange asynchronously.
## `KUC-159`
**Source**: `examples/py/async-basic-rate-limiter.py`
Demonstrates CCXT's built-in rate limiting by making 100 consecutive API calls without hitting exchange limits.
## `KUC-160`
**Source**: `examples/py/async-basic.py`
Simple example demonstrating async market loading from Binance exchange.
## `KUC-161`
**Source**: `examples/py/async-binance-cancel-option-order.py`
Cancels a specific options order on Binance using the implicit API for options trading.
## `KUC-162`
**Source**: `examples/py/async-binance-create-margin-order.py`
Places a limit buy order on Binance using margin trading account type.
## `KUC-163`
**Source**: `examples/py/async-binance-create-option-order.py`
Places a call options order on Binance USDT Options market.
## `KUC-164`
**Source**: `examples/py/async-binance-create-trailing-percent-order.py`
Places a trailing percent stop order on Binance USD-M futures with reduce-only flag.
## `KUC-165`
**Source**: `examples/py/async-binance-fetch-margin-balance-with-options.py`
Fetches margin account balance from Binance using options-based configuration.
## `KUC-166`
**Source**: `examples/py/async-binance-fetch-margin-balance-with-params.py`
Fetches Binance margin balance using params-based approach specifying type.
## `KUC-167`
**Source**: `examples/py/async-binance-fetch-option-OHLCV.py`
Fetches historical candlestick data for Binance options contracts.
## `KUC-168`
**Source**: `examples/py/async-binance-fetch-option-details.py`
Fetches options market details (mark price, etc.) from Binance using implicit API.
## `KUC-169`
**Source**: `examples/py/async-binance-fetch-option-order.py`
Fetches open options orders from Binance with pagination support.
## `KUC-170`
**Source**: `examples/py/async-binance-fetch-option-orderbook.py`
Fetches options order book from Binance USDT Options market.
## `KUC-171`
**Source**: `examples/py/async-binance-fetch-option-position.py`
Fetches options position information from Binance.
## `KUC-172`
**Source**: `examples/py/async-binance-fetch-option-ticker.py`
Fetches ticker/price information for Binance options contracts.
## `KUC-173`
**Source**: `examples/py/async-binance-fetch-ticker-continuously.py`
Continuously fetches ticker data from Binance with error handling and retry logic for robustness.
## `KUC-174`
**Source**: `examples/py/async-binance-futures-vs-spot.py`
Compares account data (balance, orders, trades) between Binance spot and futures accounts.
## `KUC-175`
**Source**: `examples/py/async-binance-margin-borrow.py`
Borrows cryptocurrency from Binance margin account for trading or other purposes.
## `KUC-176`
**Source**: `examples/py/async-binance-margin-repay.py`
Repays borrowed cryptocurrency on Binance margin account to reduce margin debt.
## `KUC-177`
**Source**: `examples/py/async-binance-usdm-fetch-continuous-klines-ohlcv.py`
Fetches continuous klines (perpetual contract) data from Binance USD-M futures.
## `KUC-178`
**Source**: `examples/py/async-bitfinex-public-get-symbols.py`
Fetches list of trading symbols available on Bitfinex exchange via public API.
## `KUC-179`
**Source**: `examples/py/async-bitget-perpetual-futures-swaps.py`
Places perpetual swap orders and fetches balance on Bitget exchange with API authentication.
## `KUC-180`
**Source**: `examples/py/async-bitstamp-create-limit-buy-order.py`
Places a limit buy order on Bitstamp exchange with configurable price and amount.
## `KUC-181`
**Source**: `examples/py/async-bitstamp-create-order-cancel-order.py`
Places a sell limit order on Bitstamp then cancels it, demonstrating full order lifecycle.
## `KUC-182`
**Source**: `examples/py/async-bittrex-orderbook.py`
Async generator that continuously polls order book data from Bittrex exchange.
## `KUC-183`
**Source**: `examples/py/async-bybit-transfer.py`
Fetches transfer history and executes internal transfers between Bybit account wallets (spot, derivatives, options).
## `KUC-184`
**Source**: `examples/py/async-fetch-balance.py`
Simple async example to fetch account balance from Bitstamp exchange.
## `KUC-185`
**Source**: `examples/py/async-fetch-many-orderbooks-continuously.py`
Continuously fetches order books for multiple symbols across OKX and Binance exchanges.
## `KUC-186`
**Source**: `examples/py/async-fetch-ohlcv-indicators-discord-webhook.py`
Fetches OHLCV data, calculates RSI indicator, and sends alerts to Discord when RSI conditions are met.
## `KUC-187`
**Source**: `examples/py/async-fetch-ohlcv-multiple-symbols-continuously.py`
Continuously fetches latest OHLCV candles for multiple symbols on Binance in a loop.
## `KUC-188`
**Source**: `examples/py/async-fetch-order-book-from-many-exchanges.py`
Fetches order book from multiple exchanges (Binance, KuCoin, Huobi) concurrently for the same symbol.
## `KUC-189`
**Source**: `examples/py/async-fetch-ticker.py`
Simple one-liner to fetch current ticker price from Binance.
## `KUC-190`
**Source**: `examples/py/async-gather-concurrency.py`
Demonstrates concurrent API calls using asyncio.gather to fetch order books from multiple symbols efficiently.
## `KUC-191`
**Source**: `examples/py/async-gdax-fetch-order-book-continuously.py`
Continuously polls order book data from Binance (mislabeled as GDAX in example) in a while loop.
## `KUC-192`
**Source**: `examples/py/async-generator-basic.py`
Demonstrates async generator pattern to continuously yield ticker data from Poloniex.
## `KUC-193`
**Source**: `examples/py/async-generator-multiple-tickers.py`
Async generator that cycles through multiple tickers on Kraken with round-robin approach.
## `KUC-194`
**Source**: `examples/py/async-generator-ticker-poller.py`
Authenticated async generator that polls BTC/USD ticker from Kraken continuously.
## `KUC-195`
**Source**: `examples/py/async-hollaex-sandbox.py`
Tests Hollaex API connectivity using sandbox mode with test API keys.
## `KUC-196`
**Source**: `examples/py/async-instantiate-all-at-once.py`
Creates instances of all CCXT-supported exchanges and demonstrates accessing one.
## `KUC-197`
**Source**: `examples/py/async-kucoin-rate-limit.py`
Demonstrates robust OHLCV fetching from KuCoin with proper rate limit handling and retry logic.
## `KUC-198`
**Source**: `examples/py/async-macd.py`
Calculates MACD (Moving Average Convergence Divergence) indicator on live OHLCV data for trading decisions.
## `KUC-199`
**Source**: `examples/py/async-market-making-symbols.py`
Scans all exchanges to find symbols with 0% maker fees, useful for market making strategies.
## `KUC-200`
**Source**: `examples/py/async-multiple-accounts.py`
Manages multiple exchange accounts simultaneously, fetching balance data from each account.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **8**
## `CW-CRYPTO-TRADING-001` — Decimal Type for All Monetary Values
**From**: rotki, hummingbot, cryptofeed, ccxt · **Applicable to**: crypto-trading
All four projects mandate Decimal type for price, amount, balance, quantity, and PnL fields. Float arithmetic causes rounding errors that compound across financial calculations, leading to incorrect order sizing and reporting. Always use Decimal for any value representing money in crypto trading systems.
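The difference between float and Decimal arithmetic can be seen in a few lines. This is a minimal sketch using only the standard library; `order_cost` is a hypothetical helper, not an API from any of the four projects.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2

# Decimal values built from strings keep the exact decimal digits.
decimal_total = Decimal("0.1") + Decimal("0.2")


def order_cost(price: str, amount: str) -> Decimal:
    """Hypothetical helper: exact cost of an order from string inputs."""
    return Decimal(price) * Decimal(amount)


print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.3"))   # True
print(order_cost("27123.45", "0.003"))
```

Building Decimal values from strings rather than floats matters: `Decimal(0.1)` would carry the float's binary error into the Decimal.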
## `CW-CRYPTO-TRADING-002` — Initialize Data Structures Before Access
**From**: ccxt, cryptofeed, rotki · **Applicable to**: crypto-trading
Projects consistently require explicit initialization before data access: load_markets() before symbol lookups, check symbol population before mapping access, establish RPC connections before queries. Skipping initialization causes KeyError, AttributeError, or silent data corruption that breaks downstream operations.
## `CW-CRYPTO-TRADING-003` — Precise String Arithmetic for Financial Calculations
**From**: ccxt · **Applicable to**: crypto-trading
CCXT mandates Precise.string_* static methods (string_mul, string_div, string_add, string_sub) for monetary calculations to avoid floating-point precision errors. This is especially critical for high-precision exchange data where rounding errors cause incorrect order costs, fees, and balances that may result in financial loss.
## `CW-CRYPTO-TRADING-004` — Respect Exchange Rate Limits
**From**: ccxt · **Applicable to**: crypto-trading
Disabling rate limiting via enableRateLimit=False causes HTTP 429 responses and potential temporary or permanent API key suspension by exchanges. CCXT enforces rate limits per IP/API key pair, and bypassing throttle() gates results in compliance violations that disrupt all trading activity until exchanges lift bans.
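The idea behind a throttle gate can be sketched without any exchange library. The class below is a hypothetical illustration of enforcing a minimum interval between requests; it is not CCXT's throttler, and in CCXT itself the usual approach is simply to leave `enableRateLimit` on.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: keep calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock      # injectable so the gate can be tested
        self._sleep = sleep
        self._last = None        # time at which the previous call was released

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` before every request keeps the request rate at or below `1 / interval` per second; the real limit to use comes from the exchange's documentation.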
## `CW-CRYPTO-TRADING-005` — Inverse Contract Price Adjustment
**From**: ccxt, hummingbot · **Applicable to**: crypto-trading
Perpetual swap cost calculations require applying inverse price adjustment (1/price) before multiplying by contractSize for inverse contracts. Incorrect cost calculation causes wrong position sizing, leading to unexpected liquidation or insufficient margin for perpetual trading positions.
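A small sketch of the adjustment described above, assuming cost is `amount * contractSize * price` for linear contracts and `amount * contractSize / price` for inverse contracts. The function name and signature are illustrative, not taken from ccxt or hummingbot.

```python
from decimal import Decimal


def contract_cost(amount: str, price: str, contract_size: str, inverse: bool) -> Decimal:
    """Cost of `amount` contracts; inverse contracts use 1/price (assumed formula)."""
    p = Decimal(price)
    if p <= 0:
        raise ValueError("price must be positive")
    quantity = Decimal(amount) * Decimal(contract_size)
    # Linear: result is in quote currency. Inverse: result is in base currency.
    return quantity / p if inverse else quantity * p


# 2 linear contracts of 0.001 BTC each at 50000 USDT
print(contract_cost("2", "50000", "0.001", inverse=False))
# 100 inverse contracts of 1 USD each at 50000 USD
print(contract_cost("100", "50000", "1", inverse=True))
```

Mixing up the two branches changes the result by a factor of price squared, which is why the wrong formula shows up as badly mis-sized positions.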
## `CW-CRYPTO-TRADING-006` — Strict Connection Lifecycle Ordering
**From**: cryptofeed, ccxt · **Applicable to**: crypto-trading
Both projects enforce strict execution order for connection operations: cryptofeed requires authenticate -> subscribe -> message handler sequence, while ccxt mandates connect -> on_connected_callback -> subscriptions -> on_close_callback. Out-of-order operations cause subscription failures and no data flow through connections.
## `CW-CRYPTO-TRADING-007` — Validate Input Data Structure Before Processing
**From**: rotki, cryptofeed · **Applicable to**: crypto-trading
Rotki validates EVM address checksum format before RPC calls; cryptofeed checks Symbols.populated() before symbol mapping access. Validating data structure before processing prevents downstream crashes (KeyError, InvalidAddress) and data corruption that is harder to debug when symptoms appear in unrelated code paths.
## `CW-CRYPTO-TRADING-008` — Validate Order Sizes Against Exchange Minimums
**From**: hummingbot · **Applicable to**: crypto-trading
DCAExecutor amounts must be validated against min_notional_size and amounts_quote/prices against min_order_size before execution. Orders below exchange minimums are rejected, breaking strategy execution and potentially leaving positions partially unfilled at unfavorable prices.
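The two checks named above can be sketched as a pre-flight validation. The function below is a hypothetical stand-in that reuses the parameter names from the text; it is not hummingbot's DCAExecutor code.

```python
from decimal import Decimal


def validate_dca_levels(amounts_quote, prices, min_notional_size, min_order_size):
    """Return a list of error strings; an empty list means every level passes."""
    errors = []
    for level, (quote, price) in enumerate(zip(amounts_quote, prices)):
        # Check 1: quote-denominated size against the minimum notional.
        if quote < min_notional_size:
            errors.append(f"level {level}: notional {quote} < {min_notional_size}")
        # Check 2: base-denominated size (quote / price) against the minimum order size.
        base_amount = quote / price
        if base_amount < min_order_size:
            errors.append(f"level {level}: amount {base_amount} < {min_order_size}")
    return errors


errors = validate_dca_levels(
    [Decimal("10"), Decimal("4")],
    [Decimal("100"), Decimal("100")],
    min_notional_size=Decimal("5"),
    min_order_size=Decimal("0.05"),
)
print(errors)
```

Running the validation before any order is placed avoids the partially filled state the text warns about.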
FILE:references/components/api_request_construction.md
# api_request_construction (3 classes)
## `Exchange.sign`
`api_request_construction/exchange-sign.py:0`
## `Entry descriptor`
`api_request_construction/entry-descriptor.py:0`
## `sign`
`api_request_construction/sign.py:0`
FILE:references/components/market_and_currency_loading.md
# market_and_currency_loading (3 classes)
## `Exchange.load_markets`
`market_and_currency_loading/exchange-load-markets.py:0`
## `binance.fetch_markets`
`market_and_currency_loading/binance-fetch-markets.py:0`
## `fetch_markets`
`market_and_currency_loading/fetch-markets.py:0`
FILE:references/components/network_request_execution.md
# network_request_execution (3 classes)
## `Exchange.fetch`
`network_request_execution/exchange-fetch.py:0`
## `Throttler.wait`
`network_request_execution/throttler-wait.py:0`
## `rateLimit`
`network_request_execution/ratelimit.py:0`
FILE:references/components/response_parsing_and_normalization.md
# response_parsing_and_normalization (3 classes)
## `Exchange.safe_float`
`response_parsing_and_normalization/exchange-safe-float.py:0`
## `Precise.string_mul`
`response_parsing_and_normalization/precise-string-mul.py:0`
## `parse_* methods`
`response_parsing_and_normalization/parse-methods.py:0`
FILE:references/components/trading_operations.md
# trading_operations (5 classes)
## `binance.create_order`
`trading_operations/binance-create-order.py:0`
## `binance.create_orders`
`trading_operations/binance-create-orders.py:0`
## `Exchange.check_order_arguments`
`trading_operations/exchange-check-order-arguments.py:0`
## `margin modes`
`trading_operations/margin-modes.py:0`
## `position mode`
`trading_operations/position-mode.py:0`
FILE:references/components/websocket_real-time_streaming.md
# websocket_real-time_streaming (4 classes)
## `Client.connect`
`websocket_real-time_streaming/client-connect.py:0`
## `Future.race`
`websocket_real-time_streaming/future-race.py:0`
## `binance.watch_trades`
`websocket_real-time_streaming/binance-watch-trades.py:0`
## `handle_* methods`
`websocket_real-time_streaming/handle-methods.py:0`
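Build and backtest multi-strategy portfolios with the bt framework, supporting portfolio construction methods such as risk parity, equal risk contribution, and inverse-volatility weighting, plus simulated backtests of rolling government bond trades.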
---
name: bt-portfolio-backtest
description: |-
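Build and backtest multi-strategy portfolios with the bt framework, supporting portfolio construction methods such as risk parity, equal risk contribution, and inverse-volatility weighting, plus simulated backtests of rolling government bond trades.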
license: Proprietary. See LICENSE.txt in project root.
compatibility: Designed for Doramagic-host ecosystem (Claude Code / openclaw / Cursor). Requires Python 3.12+ with uv package manager.
metadata:
version: "v6.1"
blueprint_id: "finance-bp-125"
compiled_at: "2026-04-22T13:01:02.252610+00:00"
capability_markets: "multi-market"
capability_activities: "backtesting, factor-research"
sop_version: "crystal-compilation-v6.1"
---
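# bt Portfolio Backtest (bt-portfolio-backtest)
> Build and backtest multi-strategy portfolios with the bt framework, supporting portfolio construction methods such as risk parity, equal risk contribution, and inverse-volatility weighting, plus simulated backtests of rolling government bond trades.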
## Pipeline
`data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization`
## Top Use Cases (20 total)
### Buy and Hold Monthly Rebalancing Strategy (`UC-101`)
Implements a passive buy-and-hold strategy with monthly rebalancing to fixed target weights, demonstrating core backtesting framework capabilities
**Triggers**: buy and hold, monthly rebalance, fixed weights
### Equal Risk Contribution Portfolio Construction (`UC-102`)
Demonstrates Equal Risk Contribution (ERC) portfolio weighting using multivariate normal returns and covariance matrix inputs for risk parity allocati
**Triggers**: equal risk contribution, risk parity, covariance
### Fixed Income Government Bond Rolling Strategy (`UC-103`)
Simulates rolling government bond trading with synthetic price-to-yield calculations and bond lifecycle management for fixed income backtesting
**Triggers**: fixed income, government bonds, rolling bonds
For all **20** use cases, see [references/USE_CASES.md](references/USE_CASES.md).
**Execute trigger**: `When user intent matches intent_router.uc_entries[].positive_terms AND user uses action verb (run/execute/跑/执行/backtest/fetch/collect)`
## What I'll Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
## Semantic Locks (Fatal)
| ID | Rule | On Violation |
|---|---|---|
| `SL-01` | Execute sell orders before buy orders in every trading cycle | halt |
| `SL-02` | Trading signals MUST use next-bar execution (no look-ahead) | halt |
| `SL-03` | Entity IDs MUST follow format entity_type_exchange_code | halt |
| `SL-04` | DataFrame index MUST be MultiIndex (entity_id, timestamp) | halt |
| `SL-05` | TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount | halt |
| `SL-06` | filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION | halt |
| `SL-07` | Transformer MUST run BEFORE Accumulator in factor pipeline | halt |
| `SL-08` | MACD parameters locked: fast=12, slow=26, signal=9 | halt |
Full lock definitions: [references/LOCKS.md](references/LOCKS.md)
## Top Anti-Patterns (25 total)
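- **`AP-ZVT-183`**: Ex-rights factor of inf/NaN used directly in multiplication makes price adjustment fail silently
- **`AP-ZVT-179`**: Exceptions swallowed after a third-party data API hits its quota; data goes silently missing
- **`AP-ZVT-183B`**: Using the wrong bar table, HFQ (backward-adjusted) vs QFQ (forward-adjusted), causes factor calculation drift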
All 25 anti-patterns: [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md)
## Evidence Quality Notice
> [QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-125. Evidence verify ratio = 10.0% and audit fail total = 14. Generated results may have uncaptured requirement gaps. Verify critical decisions against source files (LATEST.yaml / LATEST.jsonl).
## Reference Files
| File | Contents | When to Load |
|---|---|---|
| [references/seed.yaml](references/seed.yaml) | V6+ full authoritative data (source-of-truth) | Required reading when behavior or decisions are disputed |
| [references/ANTI_PATTERNS.md](references/ANTI_PATTERNS.md) | 25 cross-project anti-patterns | Before starting implementation |
| [references/WISDOM.md](references/WISDOM.md) | Cross-project lessons worth borrowing | When making architecture decisions |
| [references/CONSTRAINTS.md](references/CONSTRAINTS.md) | domain + fatal constraints | When rules conflict |
| [references/USE_CASES.md](references/USE_CASES.md) | All KUC-* business scenarios | When complete examples are needed |
| [references/LOCKS.md](references/LOCKS.md) | SL-* + preconditions + hints | Before generating backtest/trading code |
| [references/COMPONENTS.md](references/COMPONENTS.md) | AST component map (split by module) | When looking up APIs |
---
*Compiled by Doramagic crystal-compilation-v6.1 from `finance-bp-125` blueprint at 2026-04-22T13:01:02.252610+00:00.*
*See [human_summary.md](human_summary.md) for non-technical overview.*
FILE:human_summary.md
# Human Summary
> I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
## What I Can Do
- **tagline**: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me what you want; I'll write the code, you don't have to dig docs. (Heads up: ZVT natively supports A-share, HK, and crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don't bother for serious work.)
- **use_cases**: ['Fixed Income Government Bond Rolling Strategy', 'Equal Risk Contribution Portfolio Construction', 'Buy and Hold Monthly Rebalancing Strategy', 'A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney', 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader', 'Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout', 'Index composition data collection (SZ1000, SZ2000) with EM recorder']
## What I Ask You
- Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage is thin)
- Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare, or qmt (broker)?
- Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?
- Time range: start_timestamp and end_timestamp for backtest period
- Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?
FILE:references/ANTI_PATTERNS.md
# Anti-Patterns (Cross-Project)
Total: **25**
## qlib (9)
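### `AP-QLIB-1930` — Backtest results independent of the model: a shared dataset object lets the first model's predictions overwrite the rest <sub>(high)</sub>
When several Qlib models reuse one already-fitted DatasetH instance, the dataset's internal normalization parameters (the statistics determined by fit_start_time/fit_end_time) are frozen after the first fit. Switching models without re-initializing the dataset means every model effectively uses the same prediction signal. The symptom: the backtest NAV curve is identical whether you use LightGBM, XGBoost, or a DNN. This is the most dangerous anti-pattern of the "the experiment appears to run, but every conclusion is invalid" kind.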
Source: https://github.com/microsoft/qlib/issues/1930
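### `AP-QLIB-2090` — Double configuration of fit_start_time and the train segment causes implicit data leakage <sub>(high)</sub>
Qlib's DatasetH has two "training data ranges": the handler's fit_start_time/fit_end_time (which set the range used to fit the normalizer) and segments.train (which sets the range used to train the model). A common mistake is letting fit_end_time extend over the valid/test segments, so the normalization statistics (mean, standard deviation) include future data and introduce look-ahead bias. The two are configured independently but are semantically coupled, and the documentation does not state that fit_end_time must be <= train_end.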
Source: https://github.com/microsoft/qlib/issues/2090
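The constraint that fit_end_time must not exceed the end of the train segment can be checked before any dataset is built. This is a minimal sketch over plain dictionaries shaped like the handler kwargs and segments of a Qlib config; the function name is hypothetical and it does not call Qlib.

```python
from datetime import date


def assert_no_normalizer_leak(handler_kwargs: dict, segments: dict) -> None:
    """Raise if the normalizer would be fitted on data after the train segment."""
    fit_end = date.fromisoformat(handler_kwargs["fit_end_time"])
    train_end = date.fromisoformat(segments["train"][1])
    if fit_end > train_end:
        raise ValueError(
            f"fit_end_time {fit_end} is after train end {train_end}: "
            "normalization statistics would include future data"
        )


segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}
assert_no_normalizer_leak({"fit_end_time": "2014-12-31"}, segments)  # passes
```

The dates are illustrative; the point is that the check runs on the config alone, so it can sit in a test suite or a pre-run hook.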
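### `AP-QLIB-2036` — MACD factor formula error in the docs: DEA is divided by CLOSE one extra time, making the units inconsistent <sub>(high)</sub>
The Alpha formula example in Qlib's official documentation defines MACD's DEA as EMA(DIF, 9) / CLOSE, but DIF is already dimensionless (already divided by CLOSE), so dividing by CLOSE again gives DEA units of 1/price. A MACD factor built from this documented formula differs markedly from the correct one after cross-sectional standardization, and its IC drops. Documentation-level formula errors like this get copied verbatim into production factor libraries by many users.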
Source: https://github.com/microsoft/qlib/issues/2036
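### `AP-QLIB-2184` — Custom A-share data imported without filling suspension days with NaN as the convention requires, causing downstream factor noise <sub>(high)</sub>
Qlib's convention is that on suspension days the open/close/high/low/volume/factor fields should all be NaN, so the framework can detect and skip them during factor computation. If a user-built A-share dataset keeps the previous day's price on suspension days (common in data exported directly from Eastmoney/Wind), price momentum factors show "false signals" during the suspension (price unchanged but factor non-zero). Qlib does not validate this convention, so the error flows silently into the training data.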
Source: https://github.com/microsoft/qlib/issues/2184
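### `AP-QLIB-1892` — PIT (Point-In-Time) financial data collector depends on an external stock-list API, so full A-share coverage is incomplete <sub>(high)</sub>
Qlib's PIT data collector (point-in-time snapshots of financial data) calls get_hs_stock_symbols() at initialization to fetch the Shanghai/Shenzhen stock list. That function depends on the Eastmoney API, which often returns only a partial list rather than the full 5000+ stocks, and the function raises ValueError outright when the list is incomplete. Users who follow the documented steps end up with a financial dataset covering only some stocks, and backtests on PIT financial factors carry severe survivorship bias (uncollected stocks are implicitly excluded).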
Source: https://github.com/microsoft/qlib/issues/1892
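### `AP-QLIB-2097` — Whole-market instrument="all" runs out of memory on a 32GB machine while CSI300 works <sub>(medium)</sub>
When loading Alpha158 features, Qlib loads the entire feature matrix for the specified universe into memory at once. Memory use differs by about 16x between instrument="csi300" (300 stocks) and instrument="all" (5000+ stocks). A 32GB machine running the whole market hits OOM during init_instance_by_config, and the error message gives no hint of a memory problem. Users easily mistake it for a configuration error; the actual fix is batched loading or streaming feature computation.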
Source: https://github.com/microsoft/qlib/issues/2097
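### `AP-QLIB-1984` — LightGBM label-dimension check never triggers, so multi-label training fails silently <sub>(medium)</sub>
Qlib's gbdt.py uses y.values.ndim == 2 to detect multi-label input, but a Series taken from a DataFrame always has ndim 1, so the condition is always False. Multi-label training therefore never takes the squeeze branch; it goes straight into LightGBM training and raises an obscure error deeper in the stack. Users attempting a custom multi-label task cannot trace the error message back to this root cause.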
Source: https://github.com/microsoft/qlib/issues/1984
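### `AP-QLIB-1915` — After dump_bin of custom CSV data, DataHandler reports Length mismatch while D.features works <sub>(high)</sub>
Qlib has two data access paths: D.features (reads the binary files directly) and DataHandler/DataHandlerLP (with a processor pipeline). If custom A-share CSV data is dumped with a field order or symbol format (e.g. 600000.SH vs SH600000) that does not match Qlib's conventions, DataHandler's processors trigger Length mismatch during align/reindex, while D.features succeeds because it bypasses the processors. This inconsistency between the two paths leads users to believe the data was imported correctly.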
Source: https://github.com/microsoft/qlib/issues/1915
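### `AP-QLIB-1949` — Colab/Linux multiprocessing backend conflicts with Qlib ParallelExt, making DataHandler completely unusable <sub>(medium)</sub>
In non-fork environments (Windows or Google Colab), when DataHandler loads features in parallel with joblib, ParallelExt fails at initialization while accessing the _backend_args attribute (AttributeError). The root cause is that joblib 1.5+ removed this internal attribute and Qlib's compatibility layer was not updated. The symptom is a multi-level nested exception from D.features calls, and users cannot tell from the stack trace whether the problem is the parallel backend or the data.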
Source: https://github.com/microsoft/qlib/issues/1949
## vnpy (4)
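### `AP-VNPY-3691` — Bar generator's first bar timestamp is misaligned, producing a wrong signal in the first period <sub>(high)</sub>
When vnpy's BarGenerator synthesizes N-minute bars, the first bar it pushes is stamped with "the minute of the current tick" rather than "the end of a complete N-minute period". Concretely, a tick at 09:59 triggers the push of an incomplete 5-minute bar (which should not have been pushed until 10:04). If the strategy filters in on_bar with datetime.minute % 5, the first bar happens to pass the filter, but it holds less than a full period of data and yields a wrong entry signal when used for signal calculation.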
Source: https://github.com/vnpy/vnpy/issues/3691
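One way to guard against the incomplete first bar is to compute the first window that the generator can fully cover and discard anything pushed before it. The sketch below works at minute granularity and assumes windows are aligned to the hour and `window_minutes` divides 60; it is a standalone illustration, not vnpy's BarGenerator.

```python
from datetime import datetime, timedelta


def first_complete_window(first_tick: datetime, window_minutes: int):
    """Return (first_minute, last_minute) of the first fully covered window."""
    minute_start = first_tick.replace(second=0, microsecond=0)
    offset = minute_start.minute % window_minutes
    if offset == 0:
        start = minute_start  # the tick's minute opens a window
    else:
        # Skip the partial window and start at the next boundary.
        start = minute_start + timedelta(minutes=window_minutes - offset)
    return start, start + timedelta(minutes=window_minutes - 1)


start, end = first_complete_window(datetime(2024, 1, 2, 9, 59, 30), 5)
print(start.time(), end.time())
```

For a first tick at 09:59 the first complete 5-minute window covers 10:00 through 10:04, which matches the timing described above; a bar stamped earlier than that can be dropped in `on_bar`.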
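### `AP-VNPY-3669` — Alpha module incremental history save: old and new DataFrame schemas are incompatible, raising SchemaError <sub>(medium)</sub>
When vnpy's Alpha module saves bar data to Parquet files, it passes newly downloaded data (which may contain Float64 columns) and the existing file (historical Int64 columns) directly to polars.concat. polars is strongly typed and does not allow implicit type promotion, so it raises SchemaError. The root cause is that field types differ across data sources and versions (e.g. volume is an integer in some feeds and a float in others), and there is no schema alignment step before concat. This affects every historical data build flow that uses vnpy alpha for backtesting.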
Source: https://github.com/vnpy/vnpy/issues/3669
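### `AP-VNPY-3685` — Spread trading module run_backtesting() fails silently under Jupyter; results are untrustworthy <sub>(high)</sub>
In vnpy 4.10, run_backtesting() in the SpreadTrading module hits an event loop conflict under Jupyter (asyncio already running), so part of the backtest engine logic does not execute, yet no exception is raised and seemingly normal backtest statistics are returned. The same code has no such problem in command-line Python. vnpy 4.x moved some IO to async, which is incompatible with Jupyter's event loop; this is a hidden trap where backtest results look correct but are actually incomplete.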
Source: https://github.com/vnpy/vnpy/issues/3685
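### `AP-VNPY-3700` — Install script does not use a venv, so the global numpy is downgraded and other dependencies break <sub>(medium)</sub>
vnpy's install.bat installs directly into the system/conda base environment and force-downgrades numpy to <2.0 to satisfy vnpy's dependencies, breaking other quant tools that depend on numpy 2.x (such as newer scipy and pytorch). There is no requirements.txt, so the dependency boundaries are opaque. In quant research environments where multiple tools coexist, vnpy's install script is a common source of global environment pollution.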
Source: https://github.com/vnpy/vnpy/issues/3700
## zipline (6)
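### `AP-ZIPLINE-138` — Backtest prices are unadjusted; tutorial charts mislead users into misjudging strategy returns <sub>(high)</sub>
The Zipline tutorial demonstrates with an AAPL price chart, but the bundle stores unadjusted (raw) prices rather than prices adjusted for splits and dividends. The historical prices shown differ from actual market prices by about 4x (Apple's cumulative split factor), and users mistake "price quadrupled" for strategy return. The A-share case is worse: price jumps around ex-rights dates form huge "signals" in unadjusted data, drawing technical indicators into false breakout signals on ex-rights dates.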
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/138
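### `AP-ZIPLINE-235` — Default fill at the current bar's close understates live slippage and inflates backtest returns <sub>(high)</sub>
After a signal fires on a bar, Zipline's default slippage model fills at that same bar's close (current bar close fill). In live trading, a signal can only be filled near the next bar's open (T+1 order execution). Taking A-share daily bars as an example, backtesting at the close overstates daily return by roughly 0.1-0.3% on average compared with filling at the next day's open, and the annualized gap can exceed 30%. Explicitly configure the slippage model as VolumeShareSlippage or FixedSlippage and set a reasonable volume_limit.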
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/235
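Next-bar execution itself is simple to express. The sketch below assumes bars are plain dicts with `open` and `close` keys and signals are booleans aligned with the bars; it illustrates the fill rule only and does not use Zipline's slippage API.

```python
def next_bar_fills(bars, signals):
    """A signal on bar i fills at the open of bar i + 1.

    Returns a list of (fill_bar_index, fill_price). A signal on the last
    bar has no next bar, so it produces no fill.
    """
    fills = []
    for i, fired in enumerate(signals[:-1]):
        if fired:
            fills.append((i + 1, bars[i + 1]["open"]))
    return fills


bars = [
    {"open": 10.0, "close": 11.0},
    {"open": 11.5, "close": 12.0},
    {"open": 12.2, "close": 12.0},
]
print(next_bar_fills(bars, [True, False, True]))
```

In this toy data the signal on the first bar fills at 11.5 instead of that bar's close of 11.0, which is the gap a same-bar fill would hide.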
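### `AP-ZIPLINE-190` — Setting the calendar start_session to a non-trading day triggers DateOutOfBounds with no hint on how to fix it <sub>(medium)</sub>
When registering a bundle or running an algorithm, if the start_session parameter falls on a non-trading day (e.g. New Year's Day, 1998-01-01), Calendar validation raises DateOutOfBounds("cannot be earlier than the first session"). The error message shows only the calendar's first session and does not suggest changing the value to the first trading day. A-share case: with the SSE/SZSE calendar, if start_date falls on the day after the last trading day before Spring Festival (a holiday), the same kind of error is triggered, and debugging is very costly.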
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/190
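A requested start date can be snapped to the first session on or after it before it is handed to the calendar. This is a minimal sketch over a sorted list of session dates; the session list is illustrative and the helper is hypothetical rather than part of Zipline or any calendar library.

```python
from bisect import bisect_left
from datetime import date


def first_session_on_or_after(sessions, requested):
    """Return the first trading session >= requested; sessions must be sorted."""
    i = bisect_left(sessions, requested)
    if i == len(sessions):
        raise ValueError(f"{requested} is after the last known session")
    return sessions[i]


# Illustrative session list, not a real exchange calendar.
sessions = [date(1998, 1, 2), date(1998, 1, 5), date(1998, 1, 6)]
print(first_session_on_or_after(sessions, date(1998, 1, 1)))
```

With a real calendar the session list would come from the calendar object, and the snapped date is then used as start_session.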
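### `AP-ZIPLINE-181` — Stale asset db makes Pipeline report "no assets traded", misleading users into checking the data range <sub>(high)</sub>
Zipline's asset database (SQLite) records each stock's start/end trading dates. If an old Quandl or self-built bundle is used without re-ingesting, Pipeline raises "Failed to find any assets with country_code 'US' that traded between [dates]" when backtesting a new date range. A-share case: if only price data is updated after re-downloading quotes and the asset db is not rebuilt, date ranges for delisted and newly listed stocks are not updated, and Pipeline filtering quietly excludes those stocks, producing survivorship bias.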
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/181
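### `AP-ZIPLINE-285` — week_start()/week_end() silently fail under custom (non-US) calendars <sub>(medium)</sub>
date_rules.week_start() and date_rules.week_end() in Zipline's schedule_function rely on the trading calendar's logic for finding the first and last session of a week. Under non-US calendars (such as ASX or SSE) that logic is incompatible with the offset calculation built around the NYSE calendar, so the schedule either never fires or fires on the wrong date. A-share case: with the SSE calendar, in weeks containing long holidays such as Spring Festival, week_start may skip the entire holiday week without rebalancing, and the logs give no sign of the missed schedule.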
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/285
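### `AP-ZIPLINE-240` — Backtest dates must be UTC; a naive datetime raises a deep AssertionError <sub>(medium)</sub>
Zipline requires every timestamp to be a UTC-aware datetime internally. When a user passes a naive datetime (no timezone info, e.g. pd.Timestamp('2020-01-01')), no error is raised at the entry point. Instead, `AssertionError: Algorithm should have a utc datetime` fires deep inside algorithm execution, and the deep stack makes it hard to locate. A-share developers importing data from local CST time hit this trap easily; call tz_localize('UTC') explicitly when registering the bundle.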
Source: https://github.com/stefan-jansen/zipline-reloaded/issues/240
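
One way to surface this problem at the entry point is to reject naive datetimes and convert local times to UTC before anything reaches the engine. The following is a minimal standard-library sketch; `to_utc` and the fixed UTC+8 offset used for China Standard Time are illustrative assumptions, not part of Zipline's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: China Standard Time modeled as a fixed UTC+8 offset.
CST = timezone(timedelta(hours=8))


def to_utc(ts: datetime, local_tz: Optional[timezone] = None) -> datetime:
    """Return ts as a UTC-aware datetime, failing fast on ambiguous input."""
    if ts.tzinfo is None:
        if local_tz is None:
            # Fail here rather than deep inside the backtest loop.
            raise ValueError(f"naive datetime {ts!r}: pass local_tz or localize first")
        ts = ts.replace(tzinfo=local_tz)
    return ts.astimezone(timezone.utc)


start = to_utc(datetime(2020, 1, 2, 9, 30), local_tz=CST)
print(start.isoformat())  # 2020-01-02T01:30:00+00:00
```

With pandas timestamps the equivalent step would be `tz_localize` followed by `tz_convert('UTC')`.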
## zvt (6)
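### `AP-ZVT-183` — inf/NaN ex-rights factor goes straight into multiplication, so price adjustment fails silently <sub>(high)</sub>
When computing the forward-adjustment factor, ZVT derives qfq_factor from the new/old price ratio. When old==0 (a new stock's first day, or missing data) the factor is inf; when kdata.open itself is None (an unfilled suspension day) the multiplication raises TypeError. Result: adjustment for the whole entity aborts and all subsequent K-lines are lost, but the main flow only logs ERROR and keeps running, so users often do not know that data for many stocks is corrupted.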
Source: https://github.com/zvtvz/zvt/issues/183
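
A guard of the following shape would keep a single bad bar from aborting a whole entity. This is a sketch only: `safe_adjust_factor` and `adjust_open` are hypothetical helper names, not ZVT functions, and returning `None` for an unusable bar is an assumed convention.

```python
import math
from typing import Optional


def safe_adjust_factor(new_price: Optional[float], old_price: Optional[float]) -> Optional[float]:
    """Return new/old as an adjustment factor, or None when it is unusable."""
    if new_price is None or old_price is None:
        return None  # suspended day or missing data
    if old_price == 0:
        return None  # the ratio would be undefined / infinite
    factor = new_price / old_price
    return factor if math.isfinite(factor) else None  # also rejects NaN inputs


def adjust_open(open_price: Optional[float], factor: Optional[float]) -> Optional[float]:
    """Apply the factor only when both inputs are usable."""
    if open_price is None or factor is None:
        return None  # caller should count and report these bars
    return open_price * factor
```

Counting the `None` results per entity and reporting them at the end of a run makes the damage visible instead of leaving it in an ERROR log line.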
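### `AP-ZVT-179` — Exceptions swallowed after a third-party data API quota is exceeded; data goes silently missing <sub>(high)</sub>
When ZVT uses JoinQuant's jqdatasdk to bulk-fetch K-lines for the whole market (4000+ stocks), it hits JoinQuant's daily maximum query count (error: 已超过每日最大查询数量, "daily maximum query count exceeded"). ZVT catches the exception and moves on to the next entity, so every stock processed after the quota is hit silently lacks that day's data. A backtest on this incomplete database yields systematically biased factor values, with no warning.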
Source: https://github.com/zvtvz/zvt/issues/179
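
The failure mode above comes from treating a quota error like any other per-entity error. A hedged sketch of the alternative, in which quota exhaustion stops the run loudly while other failures are collected: `QuotaExceeded`, `record_all`, and the `fetch` callable are hypothetical names for illustration and do not come from ZVT or jqdatasdk.

```python
class QuotaExceeded(Exception):
    """Hypothetical error type standing in for a provider's quota failure."""


def record_all(entity_ids, fetch):
    """Fetch every entity; stop loudly on quota errors, collect other failures."""
    done, failed = [], {}
    for position, entity_id in enumerate(entity_ids):
        try:
            fetch(entity_id)
        except QuotaExceeded as exc:
            skipped = entity_ids[position:]
            raise RuntimeError(
                f"quota exhausted at {entity_id}; {len(skipped)} entities not recorded"
            ) from exc
        except Exception as exc:  # keep going, but remember what failed
            failed[entity_id] = repr(exc)
        else:
            done.append(entity_id)
    return done, failed
```

The caller can then refuse to start a backtest when `failed` is non-empty or when the run ended with the quota error.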
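### `AP-ZVT-161` — Whole-market batch factor computation on SQLite triggers "too many SQL variables" <sub>(medium)</sub>
When computing multi-stock factors such as VolumeUpMaFactor, ZVT puts every entity_id into the IN clause of a single SQL statement. Querying the whole A-share market (5000+ stocks) at once hits SQLite's default limit SQLITE_MAX_VARIABLE_NUMBER=999 (the default in SQLite builds before 3.32.0; newer builds default to a higher value). Raising max_allowed_packet (a MySQL parameter) has no effect; the root cause is SQLite's cap on bound variables. The correct fix is to query in batches, but early ZVT versions did not handle this boundary.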
Source: https://github.com/zvtvz/zvt/issues/161
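
Batching the IN clause can be sketched with the standard `sqlite3` module as follows. The `kdata` table, its columns, and the batch size of 500 are assumptions chosen for the demo, not ZVT's actual schema or settings.

```python
import sqlite3


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_by_entity_ids(conn, entity_ids, batch_size=500):
    """Query rows for many entity_ids without exceeding the bound-variable limit."""
    rows = []
    for batch in chunked(entity_ids, batch_size):
        placeholders = ",".join("?" for _ in batch)
        sql = f"SELECT entity_id, close FROM kdata WHERE entity_id IN ({placeholders})"
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows


# Demo on an in-memory database with a hypothetical `kdata` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kdata (entity_id TEXT, close REAL)")
ids = [f"stock_sh_{600000 + i}" for i in range(5000)]
conn.executemany("INSERT INTO kdata VALUES (?, ?)", [(e, 10.0) for e in ids])
rows = fetch_by_entity_ids(conn, ids)
print(len(rows))  # 5000
```

Keeping the batch size well under 999 makes the query safe on older SQLite builds as well as newer ones.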
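### `AP-ZVT-129` — Wildcard import hides API version changes; enums such as AdjustType vanish without explanation <sub>(medium)</sub>
ZVT's documentation examples import every symbol with `from zvt import *`. After a ZVT upgrade restructures enums (for example moving AdjustType into a submodule), the wildcard import no longer includes the symbol and an AttributeError is raised. Users assume an installation problem, when it is actually an API breaking change between versions that was not noted in the CHANGELOG, with the wildcard import hiding where the symbol came from. Import enum classes explicitly.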
Source: https://github.com/zvtvz/zvt/issues/129
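### `AP-ZVT-187` — Backtest engine does not stop early on an empty data-layer result, causing a cascading null-pointer-style crash <sub>(medium)</sub>
After load_data completes and the data turns out to be empty, ZVT Trader does not exit early. It passes the empty DataFrame into selector computation, which sets off a chain of NoneType failures downstream. The stack is deep and the root cause hard to locate, so users assume a strategy logic problem. The root cause is a misconfigured data time window (start/end outside the database's coverage) with no effective validation.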
Source: https://github.com/zvtvz/zvt/issues/187
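
A window check before the backtest starts turns this cascade into one readable error. The sketch below assumes the caller can look up the first and last stored dates for an entity; `require_covered` and its parameters are hypothetical names, not ZVT API.

```python
from datetime import date
from typing import Optional


def require_covered(start: date, end: date,
                    db_first: Optional[date], db_last: Optional[date]) -> None:
    """Raise a clear error when the requested window is not covered by stored data."""
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    if db_first is None or db_last is None:
        raise ValueError("no stored data for this entity; run the recorder first")
    if start < db_first or end > db_last:
        raise ValueError(
            f"window {start}..{end} is outside stored coverage {db_first}..{db_last}"
        )


# Passes silently: the window lies inside the stored range.
require_covered(date(2020, 1, 2), date(2020, 6, 30), date(2015, 1, 5), date(2023, 12, 29))
```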
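### `AP-ZVT-183B` — Misusing HFQ (backward-adjusted) and QFQ (forward-adjusted) K-line tables causes factor drift <sub>(high)</sub>
ZVT provides three separate tables: Stock1dKdata (unadjusted), Stock1dHfqKdata (backward-adjusted), and Stock1dQfqKdata (forward-adjusted). Users computing price momentum or moving-average factors mix two of them (for example unadjusted prices for the moving average and backward-adjusted prices for returns), which makes factor values jump around ex-rights dates. ZVT does no cross-table check that adjustment types are consistent, so the mix passes silently.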
Source: https://github.com/zvtvz/zvt/issues/183
FILE:references/COMPONENTS.md
# Component Capability Map
**Project**: finance-bp-125--bt
**Scan date**: 2026-04-22
**Stats**: {'total_files': 6, 'total_classes': 30, 'total_functions': 0, 'total_stages': 6}
## Modules (6)
- [data_input_&_preprocessing](components/data_input_-_preprocessing.md): 3 classes
- [tree_structure_construction](components/tree_structure_construction.md): 6 classes
- [strategy_logic_execution_(algostack)](components/strategy_logic_execution_-algostack.md): 6 classes
- [capital_allocation_&_rebalancing](components/capital_allocation_-_rebalancing.md): 6 classes
- [value_propagation_&_stale_resolution](components/value_propagation_-_stale_resolution.md): 4 classes
- [result_analysis_&_benchmarking](components/result_analysis_-_benchmarking.md): 5 classes
FILE:references/CONSTRAINTS.md
# Constraints
## preservation_manifest
```yaml
required_objects:
business_decisions_count: 105
fatal_constraints_count: 34
non_fatal_constraints_count: 129
use_cases_count: 20
semantic_locks_count: 12
preconditions_count: 4
evidence_quality_rules_count: 2
traceback_scenarios_count: 5
```
## Domain Constraints Injected (39)
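- **`SHARED-BT-LAB-001`** <sub>(fatal)</sub>: Lookahead bias: when simulating a trading decision at historical time t, do not use information that only becomes known after t. Most common forms: (1) computing a signal from the close and filling at that same day's close; (2) stamping an indicator computed after day T's close onto the same bar; (3) using the day's high/low as the assumed fill price. Signal computation and fill time must be aligned: compute the signal after day T's close, fill at day T+1's open.
- **`SHARED-BT-LAB-002`** <sub>(high)</sub>: Indicator warmup period: rolling-window indicators are NaN for the first N bars, and those bars must not take part in signal computation or position decisions. The indicator's warmup_period must equal the longest lookback, and positions must be zero during warmup.
- **`SHARED-BT-LAB-003`** <sub>(fatal)</sub>: Time-series splits for ML/DL models must follow time order: TRAIN < VALID < TEST. Do not use random k-fold splits (they mix future data into the training set). Use TimeSeriesSplit or walk-forward validation.
- **`SHARED-BT-LAB-004`** <sub>(fatal)</sub>: Open/high/low fill assumptions: assuming in a daily backtest that you can sell at the day's high or buy at the day's low (for example a momentum strategy that "takes profit at the high") is plain lookahead, because the intraday high and low are only confirmed after the close. Fill prices may only use the open or the previous day's close (with slippage).
- **`SHARED-BT-LAB-005`** <sub>(high)</sub>: Off-by-one data alignment: pandas operations such as rolling/shift easily introduce subtle one-period offset errors. Record the observation time of every series explicitly in code, and verify key time-alignment relationships with assert.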
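- **`SHARED-BT-LAB-006`** <sub>(high)</sub>: Overfitting: the more backtests you run, the higher the probability of overfitting. Bailey et al. (2014) show that the expected maximum Sharpe ratio across trials grows with the number of backtests even when the true Sharpe is zero, so the best observed Sharpe is inflated. Use walk-forward validation instead of exhaustive in-sample parameter search, and report the Deflated Sharpe Ratio (DSR) rather than the peak Sharpe.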
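- **`SHARED-BT-SURV-001`** <sub>(fatal)</sub>: Survivorship bias: using today's index constituents as the historical backtest universe omits stocks that once existed but were later delisted, removed, or merged, and systematically overstates historical strategy returns. The backtest universe must use point-in-time snapshots (point-in-time universe).
- **`SHARED-BT-SURV-002`** <sub>(high)</sub>: In-sample / out-of-sample split: strategy development and parameter selection must be completed in-sample. Out-of-sample data is for final validation only; do not keep tuning after repeatedly "looking" at it (that turns the out-of-sample set into a new in-sample set and repeats the overfitting).
- **`SHARED-BT-SURV-003`** <sub>(high)</sub>: Fill policy for suspended/missing data: do not simply forward-fill suspension-day prices with the previous close, because that lets zero-volume days take part in factor computation and signal generation during the backtest. Filter missing trading days explicitly at the factor computation layer and do not fill.
- **`SHARED-BT-SURV-004`** <sub>(high)</sub>: Extreme-value contamination: raw market data can contain data-source errors (such as ex-rights events not adjusted in time, or extreme prices from manual entry mistakes). Feeding it into factor computation uncleaned produces extreme signals that contaminate the whole cross-section. Filter 3-sigma outliers at the pipeline entry.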
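- **`SHARED-BT-COST-001`** <sub>(fatal)</sub>: Transaction costs (commission + stamp duty/transfer tax + transfer fee) must be configured at backtest initialization; zero-cost defaults are not allowed. Performance metrics from a backtest that ignores costs are misleading, especially for high-turnover strategies (round-trip costs often consume 50%+ of gross return).
- **`SHARED-BT-COST-002`** <sub>(high)</sub>: Slippage modeling: a backtest without slippage assumes every order fills at the ideal price, and high-frequency strategies then lose heavily in live trading as fill prices deteriorate. Configure at least a fixed spread or proportional slippage; large orders should use a volume-share model (for example no more than 5% of daily volume).
- **`SHARED-BT-COST-003`** <sub>(high)</sub>: Turnover must be shown in the backtest performance report and analyzed together with costs. When monthly turnover exceeds 50% (600%+ annualized), net return is extremely sensitive to the cost assumption: each 10 bps change in cost can flip the profit/loss conclusion, so a cost sensitivity analysis is mandatory.
- **`SHARED-BT-COST-004`** <sub>(medium)</sub>: Position sizing must respect capital constraints: the backtest should simulate actual share counts (rounded) under a fixed amount of capital rather than assume fractional shares. For small caps, the minimum trading unit (A-shares: 100 shares per lot) makes the achievable position deviate from the target weight, and the backtest should simulate this rounding effect.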
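- **`SHARED-BT-TIME-001`** <sub>(high)</sub>: Timestamp timezone unification: when merging multiple data sources, mixing UTC and local time is a common source of data corruption. Convert all timestamps to one timezone at the pipeline entry (UTC for storage and the market's local timezone for display is recommended); do not mix timezones mid-pipeline.
- **`SHARED-BT-TIME-002`** <sub>(high)</sub>: Trading calendar alignment: when merging data from different markets or frequencies (such as daily prices + weekly factors), reindex/merge against an explicit trading calendar. Do not outer-join and then fillna, which creates fake rows on non-trading days (holidays).
- **`SHARED-BT-TIME-003`** <sub>(high)</sub>: Incremental update boundary check: when updating historical data incrementally, query the database for the latest stored date and download only data after it. Re-downloading and appending existing data creates duplicate-timestamp rows and corrupts backtest ordering. Verify that there are no duplicates before and after the update (index.duplicated().any() == False).
- **`SHARED-BT-TIME-004`** <sub>(medium)</sub>: Distorted performance attribution: a poorly chosen benchmark distorts the Alpha/Beta calculation. Choose a passive benchmark the strategy could actually invest in (such as an HS300 ETF) rather than a non-investable price index (such as the HS300 index). A price index excludes reinvested dividends and understates the benchmark's holding return.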
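- **`SHARED-BT-PERF-001`** <sub>(medium)</sub>: Max drawdown must be computed from the net value series (portfolio value), not from a cumulative return series. Summing log returns and reading the differences as percentages misstates drawdown depth, because log returns are more negative than simple returns in a decline (a 50% loss is a log return of about -0.69).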
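- **`SHARED-BT-PERF-002`** <sub>(medium)</sub>: Sharpe Ratio annualization convention: annualized Sharpe = daily Sharpe × sqrt(252) (equities, 252 trading days) or × sqrt(365) (crypto, 365 days). Systems differ in their defaults, so confirm the annualization factor before comparing across systems; otherwise the Sharpe values are not comparable.
- **`SHARED-BT-PERF-003`** <sub>(medium)</sub>: Calmar Ratio / Sortino Ratio are better risk-adjusted return measures than the Sharpe Ratio: Sharpe assumes normally distributed returns, while A-share and crypto return distributions are markedly left-skewed with fat tails, so Sharpe understates downside risk. Quantitative evaluation should report both Sortino (downside volatility only) and Calmar (annualized return / max drawdown) and not rely on Sharpe alone.
- **`SHARED-BT-PERF-004`** <sub>(medium)</sub>: Performance attribution should decompose returns into alpha (active return), beta (market return), factor exposure return (style/sector), and idiosyncratic return (stock selection). A backtest without attribution cannot tell a good strategy from a favorable market where the beta happened to be right.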
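- **`SHARED-FR-IC-001`** <sub>(high)</sub>: IC (information coefficient) is the core measure of a factor's predictive power, defined as the Spearman rank correlation between factor values and next-period returns (ICIR = mean(IC) / std(IC)). |IC| > 0.05 is taken as preliminary evidence of predictive power, and ICIR > 0.5 as stable. Reporting backtest performance without computing IC is a typical case of missing proof of factor validity.
- **`SHARED-FR-IC-002`** <sub>(high)</sub>: IC decay analysis: a factor's predictive power usually decays as the holding period lengthens. Compute the 1/5/10/20-day IC series to identify the factor's optimal holding period. A factor whose IC is high at 1 day but decays quickly by 20 days is a short-horizon factor and does not suit monthly rebalancing, and vice versa. Using the wrong holding period severely hurts live factor performance.
- **`SHARED-FR-IC-003`** <sub>(high)</sub>: Harvey, Liu & Zhu (2016) warn that academia has found 300+ "significant" factors, many of which are false discoveries under multiple testing. Requirements for factor validity: t-stat > 3.0 (not the traditional 1.96); or independent replication across periods/markets; or clear economic logic. Factors that meet none of these are very likely data-mining artifacts.
- **`SHARED-FR-IC-004`** <sub>(high)</sub>: Factor turnover control: for a factor with high IC but high turnover, net IC after transaction costs can be negative. Compute a turnover-adjusted effective IC: net_IC = IC - turnover × cost_per_turn. Target turnover ≤ 50% (monthly).
- **`SHARED-FR-IC-005`** <sub>(medium)</sub>: The factor's half-life is a core parameter of signal strength and directly determines the optimal rebalancing frequency. Half-life < 5 days: rebalance daily or weekly; 5-20 days: weekly or biweekly; > 20 days: monthly. Wrongly rebalancing a short-horizon factor monthly lets much of the alpha dissipate within the holding period.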
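- **`SHARED-FR-NEUT-001`** <sub>(high)</sub>: Industry neutralization: if factor values are not neutralized against industry means, factor returns mix in industry rotation returns, and it is hard to tell whether the factor itself or industry exposure drove the return. Operation: factor_neutral = factor - industry_mean(factor).
- **`SHARED-FR-NEUT-002`** <sub>(high)</sub>: Market cap neutralization: the small-cap effect (small caps outperforming large caps) is one of the most persistent anomalies in financial history and contaminates nearly every non-neutralized factor. If a factor is highly correlated with market cap, stock selection tilts systematically toward small caps and the return comes from size exposure rather than the factor. Neutralize industry and market cap together (Fama-MacBeth regression or the residual method).
- **`SHARED-FR-NEUT-003`** <sub>(high)</sub>: Outlier handling (Winsorize/MAD): raw factor values usually contain extremes, which distort group analysis (such as Q1/Q10 deciles). Winsorize raw factor values (clip to [1%, 99%] or 3-sigma) or clip with MAD (median absolute deviation) before ranking/neutralizing.
- **`SHARED-FR-NEUT-004`** <sub>(medium)</sub>: Factor orthogonalization: when several factors are combined into a composite score, combining highly correlated factors amounts to overweighting a single factor and dilutes signal diversity. Orthogonalize the factors with Gram-Schmidt or PCA before combining to remove multicollinearity among them.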
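- **`SHARED-FR-NEUT-005`** <sub>(medium)</sub>: Missing-data fill policy: filling NaNs in factor computation (suspension/new listing/data gaps) with a mean that draws on later dates introduces lookahead bias (the mean itself contains future information); dropping the affected stocks entirely introduces survivorship bias. The correct approach is to fill with the cross-sectional median (the median over all stocks on that day, which does not depend on the future) or to exclude the stock for that day only.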
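- **`SHARED-FR-PORT-001`** <sub>(high)</sub>: Quantile analysis: evaluate factors primarily by the long-short return spread (top minus bottom) between Q1/Q5 (quintile) or Q1/Q10 (decile) groups, not by long-only return. The monotonicity test across groups (long Q1, short Q5) is the core evidence of factor validity: monotonically increasing/decreasing > non-monotonic >> long-only effective.
- **`SHARED-FR-PORT-002`** <sub>(medium)</sub>: Alpha decay test: the stability of a factor's monthly IC across regimes (bull/bear/range-bound markets) is important evidence of robustness. A factor whose IC works only in one particular market regime is not suited to all-weather deployment; show the IC time series in segments (rolling 12M) to identify periods where the factor fails.
- **`SHARED-FR-PORT-003`** <sub>(medium)</sub>: Turnover-aware selection: for stocks ranked near the middle (the 49th-51st percentile), small rank fluctuations trigger rebalancing and generate a lot of wasted transaction cost. Set a buffer zone in selection: rebalance only when the rank change exceeds a threshold.
- **`SHARED-FR-PORT-004`** <sub>(medium)</sub>: Statistical significance of group returns (bootstrap test): a quantile spread (Q1-Q5) can be large on historical data and still be chance, so test significance with a bootstrap or t-test (p-value < 0.05). Quantile returns from short backtest periods (< 3 years) are especially unreliable.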
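- **`SHARED-FR-XFER-001`** <sub>(high)</sub>: Cross-market factor portability: a factor that works in one market does not necessarily work in another. Applying US equity factors directly to A-shares, or equity factors to futures/crypto, requires independent IC validation; do not assume cross-market generality. A-share-specific anomalies (such as the reversal effect and ST price anomalies) do not exist in US equities.
- **`SHARED-FR-XFER-002`** <sub>(medium)</sub>: Time stability of factor validity: once-effective factors gradually fail through market learning and arbitrage (McLean & Pontiff 2016 find an average 58% decline after publication). Re-evaluate factor IC regularly (quarterly/yearly), and replace or down-weight failed factors promptly.
- **`SHARED-FR-XFER-003`** <sub>(medium)</sub>: Interaction between factors and the macro environment: rate cycles, business cycles, and market sentiment significantly affect factor validity. Value factors (low P/B) work better when rates are rising; momentum factors work better in trending markets and fail in range-bound ones. Before deploying a factor, assess how well the current macro environment matches the one in which the factor performs best.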
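
As an illustration of the IC definition in `SHARED-FR-IC-001`, the sketch below computes a Spearman rank IC for one cross-section and an ICIR over a series of ICs using only the standard library. The function names are illustrative assumptions, the input numbers are made up for the demo, and a production pipeline would more likely use pandas or scipy.

```python
from statistics import mean, stdev


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_ic(factor_values, next_returns):
    """IC for one date: rank correlation of factor values with next-period returns."""
    return pearson(ranks(factor_values), ranks(next_returns))


def icir(ic_series):
    """ICIR = mean(IC) / std(IC) over a series of per-period ICs."""
    return mean(ic_series) / stdev(ic_series)


# Made-up cross-section: four stocks, factor perfectly ordered with returns.
print(spearman_ic([0.3, 1.2, 2.0, 2.5], [-0.01, 0.00, 0.01, 0.03]))  # 1.0
```

The next-period returns passed in must already be shifted so that each factor value is paired with the return that follows it, in line with `SHARED-BT-LAB-001`.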
FILE:references/LOCKS.md
# Semantic Locks + Preconditions
## Semantic Locks (12)
### `SL-01` <sub>(on_violation: fatal)</sub>
Execute sell orders before buy orders in every trading cycle
### `SL-02` <sub>(on_violation: fatal)</sub>
Trading signals MUST use next-bar execution (no look-ahead)
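
A minimal sketch of what next-bar execution means in code, using plain lists: a signal decided after bar `t` closes is paired with the open of bar `t + 1`. The function name and the tuple layout are assumptions for illustration, not zvt API.

```python
def next_bar_fills(signals, opens):
    """Pair each signal from bar t with the open price of bar t + 1.

    signals[t] is decided after bar t closes. The final signal has no next
    bar inside the data and is therefore left unfilled.
    """
    if len(signals) != len(opens):
        raise ValueError("signals and opens must have the same length")
    fills = []
    for t in range(len(signals) - 1):
        if signals[t] is not None:
            fills.append((t + 1, signals[t], opens[t + 1]))
    return fills


signals = [None, "BUY", None, "SELL"]
opens = [10.0, 10.5, 11.0, 10.8]
print(next_bar_fills(signals, opens))  # [(2, 'BUY', 11.0)]
```

Filling the `BUY` at `opens[1]` instead would use a price from the same bar that produced the signal, which is the look-ahead this lock forbids.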
### `SL-03` <sub>(on_violation: fatal)</sub>
Entity IDs MUST follow format entity_type_exchange_code
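
The format can be checked before any query runs. This sketch assumes the three parts are separated by the first two underscores, so a code that itself contains an underscore stays intact; `parse_entity_id` is a hypothetical helper, not a zvt function.

```python
def parse_entity_id(entity_id: str):
    """Split 'entity_type_exchange_code' into its three parts or raise ValueError."""
    parts = entity_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"entity id {entity_id!r} does not match entity_type_exchange_code"
        )
    entity_type, exchange, code = parts
    return entity_type, exchange, code


print(parse_entity_id("stock_sh_600000"))  # ('stock', 'sh', '600000')
```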
### `SL-04` <sub>(on_violation: fatal)</sub>
DataFrame index MUST be MultiIndex (entity_id, timestamp)
### `SL-05` <sub>(on_violation: fatal)</sub>
TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount
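
The exactly-one rule can be enforced at construction time. `SizedSignal` below is a hypothetical stand-in used to show the check; it is not zvt's `TradingSignal` class, and only the three field names come from the lock itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SizedSignal:
    """Hypothetical signal object carrying exactly one sizing field."""
    entity_id: str
    position_pct: Optional[float] = None
    order_money: Optional[float] = None
    order_amount: Optional[int] = None

    def __post_init__(self):
        sizing = {
            "position_pct": self.position_pct,
            "order_money": self.order_money,
            "order_amount": self.order_amount,
        }
        given = [name for name, value in sizing.items() if value is not None]
        if len(given) != 1:
            raise ValueError(
                f"exactly one sizing field is required, got {given or 'none'}"
            )


ok = SizedSignal("stock_sh_600000", position_pct=0.2)
```

Comparing against `None` rather than truthiness keeps an explicit `position_pct=0.0` valid as a sizing choice.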
### `SL-06` <sub>(on_violation: fatal)</sub>
filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION
### `SL-07` <sub>(on_violation: fatal)</sub>
Transformer MUST run BEFORE Accumulator in factor pipeline
### `SL-08` <sub>(on_violation: fatal)</sub>
MACD parameters locked: fast=12, slow=26, signal=9
### `SL-09` <sub>(on_violation: warning)</sub>
Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001
### `SL-10` <sub>(on_violation: fatal)</sub>
A-share equity trading is T+1 (no same-day close of buy positions)
### `SL-11` <sub>(on_violation: fatal)</sub>
Recorder subclass MUST define provider AND data_schema class attributes
### `SL-12` <sub>(on_violation: fatal)</sub>
Factor result_df MUST contain either 'filter_result' OR 'score_result' column
## Preconditions (4)
- **`PC-01`**: `python3 -c 'import zvt; print(zvt.__version__)'` → on_fail: Run: python3 -m pip install zvt then re-run: python3 -m zvt.init_dirs to initialize data directories
- **`PC-02`**: `python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1); assert df is not None and len(df) > 0, 'No kdata found'"` → on_fail: Run recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace with your target entity IDs)
- **`PC-03`**: `python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); assert zvt_home.exists(), f'ZVT home not found: {zvt_home}'"` → on_fail: Run: python3 -m zvt.init_dirs
- **`PC-04`**: `python3 -c "import os, tempfile; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home() / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"` → on_fail: Check directory permissions: chmod u+w ~/.zvt or set ZVT_HOME environment variable to a writable location
FILE:references/USE_CASES.md
# Known Use Cases (KUC)
Total: **20**
## `KUC-101`
**Source**: `docs/source/Buy_and_hold.ipynb`
Implements a passive buy-and-hold strategy with monthly rebalancing to fixed target weights, demonstrating core backtesting framework capabilities.
## `KUC-102`
**Source**: `docs/source/ERC.ipynb`
Demonstrates Equal Risk Contribution (ERC) portfolio weighting using multivariate normal returns and covariance matrix inputs for risk parity allocation.
## `KUC-103`
**Source**: `docs/source/Fixed_Income.ipynb`
Simulates rolling government bond trading with synthetic price-to-yield calculations and bond lifecycle management for fixed income backtesting.
## `KUC-104`
**Source**: `docs/source/PTE.ipynb`
Implements inverse volatility weighting to allocate more capital to lower-volatility assets, with 3-month lookback and 1-day lag for rebalancing.
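
The weighting idea itself can be sketched without the framework: each asset's weight is proportional to the inverse of its return volatility over the lookback window. This is a standalone illustration with made-up returns, not the notebook's code; it omits the 1-day lag, and `inverse_vol_weights` is a hypothetical helper.

```python
from statistics import stdev


def inverse_vol_weights(returns_by_asset):
    """Weights proportional to 1 / sample volatility, normalized to sum to 1."""
    vols = {name: stdev(rets) for name, rets in returns_by_asset.items()}
    if any(v == 0 for v in vols.values()):
        raise ValueError("zero volatility: inverse-vol weight is undefined")
    inverse = {name: 1.0 / v for name, v in vols.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


# Made-up lookback returns: B is twice as volatile as A.
weights = inverse_vol_weights({
    "A": [0.01, -0.01, 0.01, -0.01],
    "B": [0.02, -0.02, 0.02, -0.02],
})
print(weights)  # A gets about 2/3, B about 1/3
```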
## `KUC-105`
**Source**: `docs/source/Strategy_Combination.ipynb`
Combines multiple trading strategies into a single portfolio to test strategy allocation and diversification across different algorithmic approaches.
## `KUC-106`
**Source**: `docs/source/Target_Volatility.ipynb`
Controls portfolio-level volatility to a target annualized level (10%) using weekly rebalancing with inverse volatility asset weights.
## `KUC-107`
**Source**: `docs/source/Trend_1.ipynb`
Implements trend following using a rolling 12-month median as a moving average signal for asset selection and timing decisions.
## `KUC-108`
**Source**: `docs/source/Trend_2.ipynb`
Demonstrates custom algorithm creation by implementing a Signal algo that calculates total returns over a lookback period for monthly rebalancing decisions.
## `KUC-109`
**Source**: `docs/source/examples-nb.ipynb`
Demonstrates the SelectWhere algorithm for selecting securities based on custom signal DataFrames, using 50-day rolling mean as a sample indicator.
## `KUC-110`
**Source**: `docs/source/intro.ipynb`
Educational example comparing monthly equal-weight vs weekly inverse-volatility strategies using real market data (AAPL, MSFT, SPY, AGG).
## `KUC-111`
**Source**: `examples/pairs_trading.py`
Implements statistical arbitrage pairs trading by identifying cointegrated pairs whose indicator exceeds threshold for long/short positioning.
## `KUC-112`
**Source**: `examples/buy_and_hold.py`
Executable Python version of buy-and-hold strategy with monthly rebalancing, demonstrating standalone script execution for portfolio backtesting.
## `KUC-113`
**Source**: `examples/fixed_income.ipynb`
Creates synthetic government bond data with rolling maturity schedules, price-to-yield calculations, and bond lifecycle management for fixed income backtesting.
## `KUC-114`
**Source**: `examples/ERC.ipynb`
Equal Risk Contribution portfolio using multivariate normal returns and explicit covariance matrix for risk parity weighting across assets.
## `KUC-115`
**Source**: `examples/PTE.ipynb`
Inverse volatility weighting strategy using 3-month historical data and 1-day lag to reduce risk concentration in high-volatility assets.
## `KUC-116`
**Source**: `examples/Strategy_Combination.ipynb`
Combines multiple strategies into a unified portfolio allocation framework for testing strategy diversification and correlation effects.
## `KUC-117`
**Source**: `examples/Target_Volatility.ipynb`
Controls portfolio volatility to 10% annualized target using weekly rebalancing and inverse volatility asset weighting with 12-month lookback.
## `KUC-118`
**Source**: `examples/buy_and_hold.ipynb`
Basic buy-and-hold strategy with monthly rebalancing to fixed weights (60/40), demonstrating core framework rebalancing mechanics.
## `KUC-119`
**Source**: `examples/trend_1.ipynb`
Trend following strategy using 12-month rolling median as a baseline indicator, visualizing price vs moving average crossover signals.
## `KUC-120`
**Source**: `examples/trend_2.ipynb`
Custom Signal algorithm that calculates total returns over configurable lookback periods for monthly rebalancing decisions.
FILE:references/WISDOM.md
# Cross-Project Wisdom
Total: **10**
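## `CW-BT-001` — Cerebro unified orchestration engine
**From**: backtrader · **Applicable to**: backtesting
backtrader uses Cerebro as a single entry point that manages the lifecycle of data feeds, strategies, analyzers, and observers, and supports running multiple strategies and data sources in one cerebro.run(). zvt's StockTrader currently binds one set of factors per instance and lacks a unified multi-strategy orchestration layer; borrowing the Cerebro pattern would let users combine several Trader instances in one runner for side-by-side evaluation.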
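## `CW-BT-002` — Pluggable Analyzer performance evaluation
**From**: backtrader · **Applicable to**: backtesting
backtrader ships plug-and-play Analyzers such as SharpeRatio, DrawDown, TimeReturn, and TradeAnalyzer, which attach arbitrary performance metrics without changing strategy code. zvt's performance evaluation is currently weak and has no standardized Analyzer interface; borrowing this pattern would let users get a risk-adjusted return report with a call like cerebro.addanalyzer(SharpeRatio).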
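## `CW-BT-003` — Sizer: position sizing separated from signals
**From**: backtrader · **Applicable to**: backtesting
backtrader abstracts position sizing (how many shares or what fraction to buy on each entry) into a separate Sizer, fully decoupled from signal logic; FixedSize, PercentSizer, and others are built in, and users can define their own. zvt currently has no explicit Sizer concept, and position control logic is scattered across hooks such as Trader.on_profit_control; introducing a Sizer interface would let strategy signals and money-management rules evolve independently and be recombined.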
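## `CW-BT-004` — Full set of order types (Limit/Stop/OCO/Bracket)
**From**: backtrader · **Applicable to**: backtesting
backtrader supports a rich set of order types, including Market, Limit, Stop, StopLimit, OCO (one-cancels-other), and Bracket (a paired take-profit and stop-loss), and simulates fill slippage and commission schemes. zvt backtests currently support mainly market fills and lack limit orders and combined order simulation; for high-frequency or live-trading integration, fuller order types would greatly improve backtest realism.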
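## `CW-BT-005` — Data resampling and replaying (Resampling & Replaying)
**From**: backtrader · **Applicable to**: backtesting
backtrader can resample lower-timeframe data (such as 1 min) into a higher timeframe (such as 1 day) on the fly and drive strategies in sync, or replay tick by tick to simulate how each OHLC bar forms, enabling fine-grained intraday backtests. zvt currently handles multiple timeframes by pre-recording K-lines at each level and lacks runtime resampling; borrowing this pattern would support backtests combining arbitrary time granularities without recording data twice.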
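## `CW-VN-003` — Built-in visualization in the CTA backtest engine
**From**: vnpy · **Applicable to**: backtesting
vnpy's cta_backtester provides a GUI that directly shows the strategy's equity curve, max drawdown, daily P&L, and trade details, with no Jupyter Notebook needed. zvt currently visualizes backtest results by calling Plotly through the draw_result method but has no unified backtest report page; borrowing this pattern would allow packaging an out-of-the-box strategy performance dashboard.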
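## `CW-VN-004` — vnpy.alpha ML factor research lab (Lab)
**From**: vnpy · **Applicable to**: factor-research
vnpy 4.0's vnpy.alpha.lab provides an integrated workflow covering data management, model training, signal generation, and strategy backtesting, with standardized training interfaces and visual comparison for algorithms such as Lasso, LightGBM, and MLP. zvt's ML capability currently has a single entry point, MaStockMLMachine, and lacks a standardized Lab framework; borrowing the Lab pattern would establish a standard "feature engineering → training → signal → backtest" pipeline and lower the barrier to ML experiments.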
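## `CW-QL-001` — Point-in-Time database (prevents future data leakage)
**From**: qlib · **Applicable to**: backtesting
qlib's Point-in-Time Provider guarantees that a query at time t returns only data actually known at t (report publication delays and revision history are handled correctly), eliminating look-ahead bias from backtests. zvt currently timestamps financial data by reporting period and lacks a publication-date dimension, so stock selection may be biased by future financial reports; introducing the PIT pattern would greatly improve backtest credibility.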
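## `CW-QL-002` — Recorder + Experiment management (MLflow style)
**From**: qlib · **Applicable to**: factor-research
qlib's workflow module provides Experiment/Recorder, which automatically records the hyperparameters, features, metrics, and predictions of every training run and supports cross-experiment comparison and model versioning. zvt currently lacks ML experiment tracking, and each rerun overwrites the previous results; borrowing the Recorder pattern would persist the parameters and results of every factor experiment for quick reproduction and version comparison.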
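## `CW-QL-003` — Nested Decision Framework (multi-level nested decision execution)
**From**: qlib · **Applicable to**: backtesting
qlib supports nesting a high-frequency execution layer (minute-level order splitting) inside a low-frequency decision layer (daily portfolio rebalancing); the two layers are optimized independently and run together, enabling optimal intraday execution algorithms (such as TWAP or VWAP rebalancing). zvt backtests currently have only daily-level fill assumptions and lack execution algorithm modeling; borrowing the nested framework would let zvt separate "which stocks to hold and when" from "how to build the position at minimum impact cost".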
FILE:references/components/capital_allocation_-_rebalancing.md
# capital_allocation_&_rebalancing (6 classes)
## `StrategyBase.allocate`
`capital_allocation_&_rebalancing/strategybase-allocate.py:0`
## `StrategyBase.rebalance`
`capital_allocation_&_rebalancing/strategybase-rebalance.py:0`
## `SecurityBase.transact`
`capital_allocation_&_rebalancing/securitybase-transact.py:0`
## `SecurityBase.outlay`
`capital_allocation_&_rebalancing/securitybase-outlay.py:0`
## `weight_mode`
`capital_allocation_&_rebalancing/weight-mode.py:0`
## `commission_model`
`capital_allocation_&_rebalancing/commission-model.py:0`
FILE:references/components/data_input_-_preprocessing.md
# data_input_&_preprocessing (3 classes)
## `Backtest.run`
`data_input_&_preprocessing/backtest-run.py:0`
## `Backtest._process_data`
`data_input_&_preprocessing/backtest-process-data.py:0`
## `data_source`
`data_input_&_preprocessing/data-source.py:0`
FILE:references/components/result_analysis_-_benchmarking.md
# result_analysis_&_benchmarking (5 classes)
## `Result.__init__`
`result_analysis_&_benchmarking/result-init.py:0`
## `RandomBenchmarkResult.__init__`
`result_analysis_&_benchmarking/randombenchmarkresult-init.py:0`
## `RenormalizedFixedIncomeResult._price`
`result_analysis_&_benchmarking/renormalizedfixedincomeresult-price.py:0`
## `Result.get_transactions`
`result_analysis_&_benchmarking/result-get-transactions.py:0`
## `normalization`
`result_analysis_&_benchmarking/normalization.py:0`
FILE:references/components/strategy_logic_execution_-algostack.md
# strategy_logic_execution_(algostack) (6 classes)
## `Algo.__call__`
`strategy_logic_execution_(algostack)/algo-call.py:0`
## `AlgoStack.__call__`
`strategy_logic_execution_(algostack)/algostack-call.py:0`
## `Strategy.run`
`strategy_logic_execution_(algostack)/strategy-run.py:0`
## `run_timing`
`strategy_logic_execution_(algostack)/run-timing.py:0`
## `selection_logic`
`strategy_logic_execution_(algostack)/selection-logic.py:0`
## `weighting_logic`
`strategy_logic_execution_(algostack)/weighting-logic.py:0`
FILE:references/components/tree_structure_construction.md
# tree_structure_construction (6 classes)
## `Node.__init__`
`tree_structure_construction/node-init.py:0`
## `StrategyBase.__init__`
`tree_structure_construction/strategybase-init.py:0`
## `SecurityBase.__init__`
`tree_structure_construction/securitybase-init.py:0`
## `Node._add_children`
`tree_structure_construction/node-add-children.py:0`
## `security_type`
`tree_structure_construction/security-type.py:0`
## `lazy_creation`
`tree_structure_construction/lazy-creation.py:0`
FILE:references/components/value_propagation_-_stale_resolution.md
# value_propagation_&_stale_resolution (4 classes)
## `StrategyBase.update`
`value_propagation_&_stale_resolution/strategybase-update.py:0`
## `SecurityBase.update`
`value_propagation_&_stale_resolution/securitybase-update.py:0`
## `StrategyBase._sync_data`
`value_propagation_&_stale_resolution/strategybase-sync-data.py:0`
## `price_source`
`value_propagation_&_stale_resolution/price-source.py:0`