CengSin

@clawhub-cengsin-807851f459

Joined 3 months ago

Coze Cli

Skill

Interact with Coze CLI (@coze/cli) — create/deploy Coze projects, manage spaces and organizations, send messages to projects, generate images/audio/video, an...

---
name: coze-cli
description: Interact with Coze CLI (@coze/cli) — create/deploy Coze projects, manage spaces and organizations, send messages to projects, generate images/audio/video, and automate Coze workflows via terminal. Triggered when user mentions "coze", "coze cli", "扣子 CLI", or any coze command execution.
---

# Coze CLI Skill

## Overview

This skill enables AI agents to interact with Coze CLI (`@coze/cli`) — the official command-line tool for Coze/Cozeflow development. It supports project creation, deployment, messaging, multimedia generation, and space management via terminal commands.

**Use this skill when**: The user wants to create/deploy Coze projects, manage spaces/orgs, generate images/audio/video, send messages to Coze projects, or automate Coze workflows via CLI.

## Quick Start

### Installation

```bash
npm install -g @coze/cli
coze --version   # verify
```

### Authentication

```bash
coze auth login --oauth   # opens browser for OAuth flow
coze auth status          # verify login
```

### Initial Setup

```bash
# Select organization
coze org list
coze org use <organization_id>

# Select workspace
coze space list
coze space use <space_id>
```

---

## Core Workflows

### Create a Project

```bash
# Natural language project creation
coze code project create --message "Create a data analysis web app" --type web

# With wait (blocking until done)
coze code project create --message "Create a customer service bot" --type agent --wait
```

**Supported types**: `agent`, `workflow`, `app`, `skill`, `web`, `miniprogram`, `assistant`

### List / Get Projects

```bash
coze code project list                          # all projects
coze code project list --type agent --type web  # filter by type
coze code project list --name "support"         # search by name
coze code project get <project_id>              # detail
```

### Send Message to Project

```bash
coze code message send "Fix the styling issues on the login page" -p <project_id>

# With local file context
coze code message send "Refactor the code in @src/utils.ts" -p <project_id>

# Via pipe
cat error.log | coze code message send "Analyze this error log" -p <project_id>

# Check status / cancel
coze code message status -p <project_id>
coze code message cancel -p <project_id>
```

### Deploy Project

```bash
coze code deploy <project_id>           # deploy
coze code deploy <project_id> --wait    # wait for completion
coze code deploy status <project_id>    # check status
```

### Preview Project

```bash
coze code preview <project_id>
```

### Manage Environment Variables

```bash
coze code env list -p <project_id>                   # dev env
coze code env list -p <project_id> --env prod        # prod env
coze code env set API_KEY xxx -p <project_id>         # set
coze code env delete API_KEY -p <project_id>          # delete
```

### Generate Multimedia

```bash
# Image
coze generate image "A cat walking in space"
coze generate image "A futuristic city" --output-path ./city.png --size 4K --no-watermark

# Audio
coze generate audio "Hello, welcome to Coze CLI"
coze generate audio "Hello, world" --output-path ./hello.ogg --audio-format ogg_opus

# Video
coze generate video create "A dancing kitten"
coze generate video create "Sunset time-lapse" --wait --output-path ./sunset.mp4 --resolution 1080p --duration 8
coze generate video status <task_id>
```

### Upload File

```bash
coze file upload ./document.pdf
```

---

## Output Format

```bash
# Text (default)
coze space list

# JSON (for scripting)
coze space list --format json
coze code project list --format json | jq '.[].name'
```
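
For scripting from Python instead of `jq`, the JSON output can be parsed directly. The sketch below assumes that `coze code project list --format json` prints a JSON array of objects with a `name` field, which is what the `jq '.[].name'` filter above implies; `run_cli` and `project_names` are hypothetical helper names, not part of the CLI.

```python
import json
import subprocess
from typing import Callable, Sequence


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def project_names(run: Callable[[Sequence[str]], str] = run_cli) -> list[str]:
    """Return project names, assuming a JSON array of objects that carry a "name" key."""
    raw = run(["coze", "code", "project", "list", "--format", "json"])
    projects = json.loads(raw)
    # Skip anything that is not an object with a name rather than failing on it.
    return [p["name"] for p in projects if isinstance(p, dict) and "name" in p]
```

Passing the runner as a parameter keeps the parsing logic testable without the CLI installed.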

---

## CI/CD / Non-Interactive Use

```bash
export COZE_ORG_ID=<YOUR_ORG_ID>
export COZE_SPACE_ID=<YOUR_SPACE_ID>
export COZE_PROJECT_ID=<PROJECT_ID>

coze code deploy "$COZE_PROJECT_ID" --wait --format json
```
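
In a pipeline it can help to fail early when a variable is missing. This is a minimal sketch of such a preflight check; treating all three variables as required is an assumption made for illustration, and `missing_vars` is a hypothetical helper.

```python
import os
from typing import Mapping

# Assumption: the job needs all three variables exported above.
REQUIRED_VARS = ("COZE_ORG_ID", "COZE_SPACE_ID", "COZE_PROJECT_ID")


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```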

---

## Global Options

| Option | Description |
| --- | --- |
| `--format json\|text` | Output format (default: text) |
| `--no-color` | Disable ANSI colors |
| `--config <path>` | Custom config file |
| `--org-id <id>` | Override organization ID |
| `--space-id <id>` | Override space ID |
| `-p <project_id>` | Target project ID |
| `--verbose` | Verbose logging |
| `--debug` | Full diagnostic logs |

---

## Configuration

Config priority (high→low):
1. Environment variables (`COZE_ORG_ID`, `COZE_SPACE_ID`, etc.)
2. `--config` CLI flag
3. `COZE_CONFIG_FILE` env var
4. `.cozerc.json` in project dir
5. `~/.coze/config.json` global

```bash
coze config list
coze config get base_url
coze config set base_url https://api.coze.cn
```
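
The priority list above amounts to "the first source that defines a key wins". The sketch below illustrates only that lookup rule; it is not how the CLI is implemented, and the layer contents are made-up values.

```python
from typing import Mapping, Optional, Sequence


def resolve_setting(key: str, layers: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Return the value from the first layer that defines `key` (layers ordered high to low)."""
    for layer in layers:
        if key in layer:
            return layer[key]
    return None


# Hypothetical contents for each source, in the priority order listed above.
layers = [
    {},                                       # 1. environment variables
    {"base_url": "https://api.coze.cn"},      # 2. file passed via --config
    {},                                       # 3. file named by COZE_CONFIG_FILE
    {"base_url": "https://example.invalid"},  # 4. .cozerc.json in the project dir
    {"space_id": "global-space"},             # 5. ~/.coze/config.json
]

print(resolve_setting("base_url", layers))  # https://api.coze.cn
print(resolve_setting("space_id", layers))  # global-space
```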

---

## Detailed Command Reference

For the full command reference table, see:

→ **[references/commands.md](references/commands.md)**

Contains: auth, org/space, project CRUD, message, deploy, env, domain, skill, multimedia generation, file upload, config, completion, upgrade, CI/CD env vars, and quick command templates.

FILE:README.md
# Coze CLI Skill

> Interact with [Coze CLI](https://github.com/copilot66/coze-cli) — create/deploy Coze projects, manage spaces and organizations, send messages to projects, generate images/audio/video, and automate Coze workflows via terminal.

## Overview

This skill enables AI agents to interact with **Coze CLI** (`@coze/cli`), the official command-line tool for Coze/Cozeflow development. It supports project creation, deployment, messaging, multimedia generation, and space management via terminal commands.

**Trigger phrases**: `coze`, `coze cli`, `扣子 CLI`, or any coze command execution.

---

## Installation

```bash
npm install -g @coze/cli
coze --version   # verify
```

### Authentication

```bash
coze auth login --oauth   # opens browser for OAuth flow
coze auth status          # verify login
```

### Initial Setup

```bash
coze org list
coze org use <organization_id>

coze space list
coze space use <space_id>
```

---

## Core Commands

### Create a Project

```bash
coze code project create --message "Create a data analysis web app" --type web
coze code project create --message "Create a customer service bot" --type agent --wait
```

**Supported types**: `agent`, `workflow`, `app`, `skill`, `web`, `miniprogram`, `assistant`

### Send Message to Project

```bash
coze code message send "Fix the styling issues on the login page" -p <project_id>

# With local file context
coze code message send "Refactor the code in @src/utils.ts" -p <project_id>

# Via pipe
cat error.log | coze code message send "Analyze this error log" -p <project_id>
```

### Deploy Project

```bash
coze code deploy <project_id>           # deploy
coze code deploy <project_id> --wait    # wait for completion
coze code deploy status <project_id>    # check status
```

### Manage Environment Variables

```bash
coze code env list -p <project_id>                  # dev env
coze code env list -p <project_id> --env prod       # prod env
coze code env set API_KEY xxx -p <project_id>      # set
coze code env delete API_KEY -p <project_id>       # delete
```

### Multimedia Generation

```bash
# Image
coze generate image "A cat walking in space"
coze generate image "A futuristic city" --output-path ./city.png --size 4K --no-watermark

# Audio
coze generate audio "Hello, welcome to Coze CLI"
coze generate audio "Hello, world" --output-path ./hello.ogg --audio-format ogg_opus

# Video
coze generate video create "A dancing kitten"
coze generate video create "Sunset time-lapse" --wait --output-path ./sunset.mp4 --resolution 1080p --duration 8
```

---

## Environment Variables (CI/CD)

| Variable | Description |
|----------|-------------|
| `COZE_ORG_ID` | Organization ID |
| `COZE_SPACE_ID` | Space ID |
| `COZE_PROJECT_ID` | Project ID (for message commands) |
| `COZE_CONFIG_FILE` | Custom config file path |

---

## Configuration

Config priority (high→low):

1. Environment variables (`COZE_ORG_ID`, `COZE_SPACE_ID`, etc.)
2. `--config` CLI flag
3. `COZE_CONFIG_FILE` env var
4. `.cozerc.json` in project dir
5. `~/.coze/config.json` global

```bash
coze config list
coze config set base_url https://api.coze.cn
```

---

## Quick Command Templates

```bash
# Full init flow
npm install -g @coze/cli && \
coze auth login --oauth && \
coze org list && coze org use <org_id> && \
coze space list && coze space use <space_id>

# Create + wait + deploy
coze code project create --message "<requirement description>" --type <type> --wait && \
coze code deploy <project_id> --wait && \
coze code preview <project_id>

# Batch query projects
coze code project list --format json | jq '.[] | select(.type=="agent") | .name'
```

---

## File Structure

```
coze-cli/
├── README.md              # This file
├── SKILL.md               # OpenClaw skill definition
└── references/
    └── commands.md        # Full command reference
```

---

## References

- [Coze CLI Official Docs](https://docs.coze.cn/developer_guides/coze_cli)
- [Coze CLI Quickstart](https://docs.coze.cn/developer_guides/coze_cli_quickstart)

FILE:references/commands.md
# Coze CLI Command Cheat Sheet

> Source: <https://docs.coze.cn/developer_guides/coze_cli> + <https://docs.coze.cn/developer_guides/coze_cli_quickstart>

---

## Authentication

| Command | Description |
| --- | --- |
| `coze auth login --oauth` | OAuth login (opens the browser automatically) |
| `coze auth status` | Show login status |
| `coze auth logout` | Log out |

---

## Organizations & Spaces

| Command | Description |
| --- | --- |
| `coze org list` | List all organizations |
| `coze org use <org_id>` | Switch the default organization |
| `coze space list` | List all spaces in the current organization |
| `coze space use <space_id>` | Switch the default space |

---

## Project Management

| Command | Description |
| --- | --- |
| `coze code project create --message <description> --type <type>` | Create a project from a natural-language description |
| `coze code project list` | List all projects |
| `coze code project list --type <type>` | Filter by type (`--type` can be repeated) |
| `coze code project list --name <keyword>` | Search by name |
| `coze code project list --search-scope 1` | Show only projects you created |
| `coze code project list --size 20 --cursor-id <cursor>` | Paginated query |
| `coze code project list --order-by 1 --order-type 1` | Sort by creation time, ascending |
| `coze code project list --is-fav-filter` | Show only favorited projects |
| `coze code project get <project_id>` | Show project details |
| `coze code project delete <project_id>` | Delete a project |

**Supported project types**: `agent`, `workflow`, `app`, `skill`, `web`, `miniprogram`, `assistant`

---

## Messaging

| Command | Description |
| --- | --- |
| `coze code message send <message> -p <project_id>` | Send a message |
| `coze code message send <message> @<local_file> -p <id>` | Reference a local file |
| `coze code message send <message> -p <id> <file1> <file2>` | Reference multiple files |
| `cat <file> \| coze code message send <message> -p <id>` | Provide input via a pipe |
| `export COZE_PROJECT_ID=<id>` then `coze code message send <message>` | Specify the project via an environment variable |
| `coze code message status -p <project_id>` | Check message status |
| `coze code message cancel -p <project_id>` | Cancel an in-progress message |

---

## Deploy & Preview

| Command | Description |
| --- | --- |
| `coze code deploy <project_id>` | Deploy a project |
| `coze code deploy <project_id> --wait` | Deploy and wait for completion |
| `coze code deploy status <project_id>` | Show deployment status (latest record) |
| `coze code deploy status <project_id> --deploy-id <id>` | Show a specific deployment record |
| `coze code preview <project_id>` | Get the preview link |

---

## Environment Variables

| Command | Description |
| --- | --- |
| `coze code env list -p <project_id>` | List development environment variables |
| `coze code env list -p <project_id> --env prod` | List production environment variables |
| `coze code env set <KEY> <VALUE> -p <project_id>` | Set an environment variable |
| `coze code env delete <KEY> -p <project_id>` | Delete an environment variable |

---

## Custom Domains

| Command | Description |
| --- | --- |
| `coze code domain list <project_id>` | List the project's domains |
| `coze code domain add <domain> -p <project_id>` | Add a custom domain |
| `coze code domain remove <domain> -p <project_id>` | Remove a custom domain |

---

## Project Skills

| Command | Description |
| --- | --- |
| `coze code skill list -p <project_id>` | List project skills (with install status) |
| `coze code skill add <skill_id> -p <project_id>` | Add a skill |
| `coze code skill remove <skill_id> -p <project_id>` | Remove a skill |

---

## Multimedia Generation

### Image Generation

| Command | Description |
| --- | --- |
| `coze generate image "<prompt>"` | Text to image |
| `coze generate image "<prompt>" --output-path ./out.png` | Save to a local file |
| `coze generate image "<prompt>" --size 4K --no-watermark` | Set the resolution and disable the watermark |
| `coze generate image "<prompt>" --image <reference_image_url>` | Generate from a reference image |

### Audio Generation

| Command | Description |
| --- | --- |
| `coze generate audio "<text>"` | Text to speech |
| `coze generate audio "<text>" --output-path ./out.mp3` | Save as MP3 |
| `echo "<ssml>" \| coze generate audio --ssml` | SSML input |
| `coze generate audio "<text>" --audio-format ogg_opus --speech-rate 50` | Set the format and speech rate |

### Video Generation

| Command | Description |
| --- | --- |
| `coze generate video create "<prompt>"` | Create a video generation task |
| `coze generate video create "<prompt>" --wait --output-path ./out.mp4` | Wait for completion and save |
| `coze generate video create "<prompt>" --resolution 1080p --duration 8` | Set the resolution and duration |
| `coze generate video status <task_id>` | Check task status |

---

## File Management

| Command | Description |
| --- | --- |
| `coze file upload <path>` | Upload a local file and get its link |

---

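## Configuration Management

| Command | Description |
| --- | --- |
| `coze config list` | Show all settings (with their sources) |
| `coze config get <key>` | Show a single setting |
| `coze config set <key> <value>` | Set a setting |
| `coze config delete <key>` | Delete a setting |

**Configuration sources** (highest to lowest priority):
1. Environment variables
2. `--config` CLI flag
3. `COZE_CONFIG_FILE` environment variable
4. `.cozerc.json` (project level)
5. `~/.coze/config.json` (global)

---

## Shell Completion

| Command | Description |
| --- | --- |
| `coze completion --setup` | Install the completion script |
| `source ~/.zshrc` or `source ~/.bashrc` | Reload the shell configuration |
| `coze completion --cleanup` | Uninstall completion |

---

## Upgrading

| Command | Description |
| --- | --- |
| `coze upgrade` | Check for and upgrade to the latest version |
| `coze upgrade --force` | Force an upgrade |
| `coze upgrade --tag beta` | Upgrade to the version with the given tag |

---

## CI/CD Environment Variables

| Variable | Description |
| --- | --- |
| `COZE_ORG_ID` | Organization ID |
| `COZE_ENTERPRISE_ID` | Enterprise ID |
| `COZE_SPACE_ID` | Space ID |
| `COZE_PROJECT_ID` | Project ID (for message commands) |
| `COZE_CONFIG_FILE` | Custom config file path |
| `COZE_CONFIG_SCOPE` | Config scope (global or local) |
| `COZE_AUTO_CHECK_UPDATE` | Whether to check for updates automatically |

---

## Global Options

| Option | Description |
| --- | --- |
| `--format json\|text` | Output format |
| `--no-color` | Disable colored output |
| `--config <path>` | Use the given config file |
| `--org-id <id>` | Temporarily override the organization ID |
| `--space-id <id>` | Temporarily override the space ID |
| `--verbose` | Detailed business-flow logs |
| `--debug` | Full diagnostic logs |
| `--log-file <path>` | Write logs to a file |
| `-p <project_id>` | Target project ID |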

---

## Quick Command Templates

```bash
# Full init flow
npm install -g @coze/cli && \
coze auth login --oauth && \
coze org list && coze org use <org_id> && \
coze space list && coze space use <space_id>

# Create + wait + deploy
coze code project create --message "<requirement description>" --type <type> --wait && \
coze code deploy <project_id> --wait && \
coze code preview <project_id>

# Batch query projects
coze code project list --format json | jq '.[] | select(.type=="agent") | .name'

# Proxy configuration
export HTTPS_PROXY=http://your-proxy:8080
export HTTP_PROXY=http://your-proxy:8080
```

ClawHub Coding DevOps+2

Tickflow Realtime

Skill

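Query real-time quotes and daily K-line data through the TickFlow data center. Use it when the user wants the latest price, percent change, volume, or trading session for one or more symbols, or wants daily K-lines, the most recent N bars, or adjusted K-lines for one or more symbols.

---
name: tickflow-realtime
description: Query real-time quotes and daily K-line data through the TickFlow data center. Use it when the user wants the latest price, percent change, volume, or trading session for one or more symbols, or wants daily K-lines, the most recent N bars, or adjusted K-lines for one or more symbols.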
---

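# TickFlow Real-Time Quotes and Daily K-Lines

This skill uses the TickFlow HTTP API to query two kinds of data:

- Real-time quotes: latest price, percent change, volume, trading session
- K-line data: daily K by default; the most recent N bars can also be queried for other periods

## When to Use

Trigger in these scenarios:

- The user wants the latest quote for a stock, ETF, US stock, or Hong Kong stock
- The user wants to compare real-time prices or percent changes across multiple symbols
- The user wants real-time quotes for a symbol pool (`universes`)
- The user wants daily, weekly, or monthly K-lines for a symbol
- The user wants the latest bar or the latest few bars for multiple symbols

## Workflow

1. Determine whether the user wants real-time quotes or K-lines.
2. Read the API key from the environment variable `TICKFLOW_API_KEY`.
3. For real-time quotes, prefer `GET /v1/quotes`; switch to `POST /v1/quotes` when there are many symbols.
4. For K-lines, use `GET /v1/klines` for a single symbol and `GET /v1/klines/batch` for multiple symbols.
5. Validate the response structure.
6. Return a concise summary; return JSON only if the user explicitly asks for raw data.

## API Key

- Always read it from the environment variable `TICKFLOW_API_KEY`.
- Do not write the API key to files, logs, or output.

## Scripts

- Real-time quotes script: `scripts/query_quotes.py`
- K-line script: `scripts/query_klines.py`
- Shared utilities: `scripts/tickflow_common.py`

## Reference Docs

- API structure and field descriptions: `references/api.md`
- Output conventions: `references/output-contract.md`

## Conventions

- Percentage fields must be converted using TickFlow semantics: `0.01 -> 1%`
- `timestamp` is an epoch timestamp in milliseconds
- The `region` enum is currently `CN | US | HK`
- For the `session` enum, see `references/api.md`
- The K-line endpoints return a compact columnar structure; unpack it before building summaries or tables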
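
Because `timestamp` is in milliseconds, it has to be divided by 1000 before it is handed to most date libraries. A minimal sketch of that conversion follows; `epoch_ms_to_utc` is a hypothetical helper, not the function used by the bundled scripts, and rendering in UTC is an assumption.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ts_ms: int) -> str:
    """Convert epoch milliseconds to a UTC string with second precision."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")


print(epoch_ms_to_utc(1700000000000))  # 2023-11-14 22:13:20 UTC
```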

## Examples

Query the real-time quote for a single symbol:

```bash
python3 scripts/query_quotes.py --symbols 600519.SH --format summary
```

Query real-time quotes for multiple symbols:

```bash
python3 scripts/query_quotes.py --symbols 600519.SH,000001.SZ,AAPL.US --format table
```

Query daily K-lines for a single symbol:

```bash
python3 scripts/query_klines.py --symbol 600519.SH --period 1d --count 20 --format table
```

Query the latest daily bar for multiple symbols:

```bash
python3 scripts/query_klines.py --symbols 600519.SH,000001.SZ --period 1d --count 1 --format summary
```

FILE:README.md
# tickflow-realtime

A Codex skill for querying real-time quotes and daily K-lines from TickFlow.

## What It Does

This skill wraps the TickFlow HTTP API and provides two main capabilities:

- Real-time quotes for one or more symbols
- K-line queries, with `1d` as the default period

It supports:

- Single-symbol quote lookup
- Multi-symbol quote comparison
- Quote lookup by `universes`
- Single-symbol K-line queries
- Batch K-line queries for multiple symbols

## Project Structure

```text
.
├── SKILL.md
├── README.md
├── TODO.md
├── references/
│   ├── api.md
│   └── output-contract.md
└── scripts/
    ├── query_klines.py
    ├── query_quotes.py
    └── tickflow_common.py
```

## Requirements

- Python 3
- A valid TickFlow API key

## API Key

This project reads the API key from the environment variable:

```bash
export TICKFLOW_API_KEY=your_key_here
```

The key is sent as the `x-api-key` header.

## Usage

### Real-Time Quotes

Single symbol:

```bash
python3 scripts/query_quotes.py --symbols 600519.SH --format summary
```

Multiple symbols:

```bash
python3 scripts/query_quotes.py --symbols 600519.SH,000001.SZ,AAPL.US --format table
```

Universe query:

```bash
python3 scripts/query_quotes.py --universes my-universe-id --format json --pretty
```

### K-Lines

Single symbol daily K:

```bash
python3 scripts/query_klines.py --symbol 600519.SH --period 1d --count 20 --format table
```

Batch latest daily K:

```bash
python3 scripts/query_klines.py --symbols 600519.SH,000001.SZ --period 1d --count 1 --format summary
```

Adjusted K-lines:

```bash
python3 scripts/query_klines.py --symbol 600519.SH --period 1d --count 60 --adjust forward --format table
```

## Output Modes

Both scripts support:

- `summary`
- `table`
- `json`

For JSON output, add `--pretty` for formatted output.

## Notes

- TickFlow percentage fields use the convention `0.01 -> 1%`
- Timestamps are epoch milliseconds
- K-line responses are returned in compact columnar form and are expanded by the script before summary/table rendering

## References

- Skill definition: [SKILL.md](./SKILL.md)
- API notes: [references/api.md](./references/api.md)
- Output contract: [references/output-contract.md](./references/output-contract.md)

## Publish Checklist

Before pushing to GitHub:

- Confirm `TICKFLOW_API_KEY` is not committed anywhere
- Review `TODO.md` and decide whether to keep it public
- Remove local-only helper files if they are not meant for the repository
- Run:

```bash
python3 -m py_compile scripts/tickflow_common.py scripts/query_quotes.py scripts/query_klines.py
```

FILE:references/api.md
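# TickFlow API Notes

This file keeps only the minimal facts needed to implement this skill. Sources:

- The official documentation site
- `https://api.tickflow.org/openapi.json`

## Authentication

- All endpoints authenticate via the `x-api-key` request header
- This skill always reads the API key from the environment variable `TICKFLOW_API_KEY`

## Real-Time Quotes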

### GET `/v1/quotes`

- Parameters:
  - `symbols`: comma-separated string
  - `universes`: comma-separated string
- Suited to small queries

### POST `/v1/quotes`

- Body schema: `QuotesRequest`
- Fields:
  - `symbols: string[] | null`
  - `universes: string[] | null`
- Suited to large symbol sets; avoids overly long URLs
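
The two request shapes can be sketched with the standard library. This only builds the request objects and sends nothing. `BASE_URL` is an assumption inferred from the `openapi.json` address above, and `build_quotes_request` is a hypothetical helper rather than the code in `scripts/tickflow_common.py`.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.tickflow.org"  # assumption: inferred from the openapi.json URL


def build_quotes_request(symbols: list[str], api_key: str, use_post: bool = False) -> urllib.request.Request:
    """Build a GET or POST request for /v1/quotes without sending it."""
    headers = {"x-api-key": api_key}
    if use_post:
        # POST carries the symbols as a JSON array in the body.
        body = json.dumps({"symbols": symbols, "universes": None}).encode("utf-8")
        headers["Content-Type"] = "application/json"
        return urllib.request.Request(f"{BASE_URL}/v1/quotes", data=body, headers=headers, method="POST")
    # GET carries the symbols as one comma-separated query parameter.
    query = urlencode({"symbols": ",".join(symbols)})
    return urllib.request.Request(f"{BASE_URL}/v1/quotes?{query}", headers=headers, method="GET")
```

The resulting object can be passed to `urllib.request.urlopen` with a timeout once a real key is available.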

### `QuotesResponse`

- Top level: `data: Quote[]`

### `Quote`

- Required core fields:
  - `symbol`
  - `region`
  - `last_price`
  - `prev_close`
  - `open`
  - `high`
  - `low`
  - `volume`
  - `amount`
  - `timestamp`
- Optional fields:
  - `session`
  - `ext`

### Regions and Trading Sessions

- `region`: `CN | US | HK`
- `session`:
  - `pre_market`
  - `regular`
  - `after_hours`
  - `closed`
  - `halted`
  - `lunch_break`

### `ext`

`ext` is a union type that varies by market:

- `cn_equity`
  - `name`
  - `change_amount`
  - `change_pct`
  - `amplitude`
  - `turnover_rate`
- `us_equity`
  - `name`
  - `pre_market_price`
  - `pre_market_change_pct`
  - `after_hours_price`
  - `after_hours_change_pct`
- `hk_equity`
  - `name`

### Percentage Fields

TickFlow documents percentages as:

- `0.01 -> 1%`

So multiply by `100` and append `%` for display.
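
As a minimal sketch of that rule: `ratio_to_percent` is a hypothetical name, and printing `-` for a missing value is an assumption borrowed from the output contract's placeholder style.

```python
from typing import Optional


def ratio_to_percent(value: Optional[float], digits: int = 2) -> str:
    """Render a TickFlow ratio (0.01 means 1%) as a percent string; "-" when missing."""
    if value is None:
        return "-"
    return f"{value * 100:.{digits}f}%"


print(ratio_to_percent(0.01))     # 1.00%
print(ratio_to_percent(-0.0345))  # -3.45%
print(ratio_to_percent(None))     # -
```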

## K-Lines

### GET `/v1/klines`

- Single symbol
- Parameters:
  - `symbol` required
  - `period` optional
  - `count` optional
  - `start_time` optional, epoch milliseconds
  - `end_time` optional, epoch milliseconds
  - `adjust` optional

### GET `/v1/klines/batch`

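- Multiple symbols
- Parameters:
  - `symbols` required, comma-separated string
  - All other parameters are the same as for the single-symbol endpoint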
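
Since most parameters are optional, a client would typically drop unset ones before encoding the query string. A small sketch under that assumption follows; `build_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_query(params: dict) -> str:
    """Encode only the parameters that were actually provided."""
    return urlencode({key: value for key, value in params.items() if value is not None})


print(build_query({"symbols": "600519.SH,000001.SZ", "period": "1d", "count": 1, "adjust": None}))
# symbols=600519.SH%2C000001.SZ&period=1d&count=1
```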

### `Period`

- `1m`
- `5m`
- `10m`
- `15m`
- `30m`
- `60m`
- `1d`
- `1w`
- `1M`
- `1Q`
- `1Y`

For this skill, `1d` is the default.

### `AdjustType`

- `forward`
- `backward`
- `forward_additive`
- `backward_additive`
- `none`

### `KlinesResponse`

- Top level: `data: CompactKlineData`

### `KlinesBatchResponse`

- Top level: `data: { [symbol: string]: CompactKlineData }`

### `CompactKlineData`

K-lines are not an array of per-bar objects; they use a columnar structure:

- `timestamp: int64[]`
- `open: number[]`
- `high: number[]`
- `low: number[]`
- `close: number[]`
- `volume: int64[]`
- `amount: number[]`
- Optional:
  - `prev_close`
  - `open_interest`
  - `settlement_price`

Before building summaries or tables, unpack the columns by index into per-bar K-line objects.
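
A sketch of that unpacking step, assuming every column present is a list of the same length. `unpack_columns` is a hypothetical name and is not the `compact_kline_to_rows` helper from the scripts; the sample values are made up.

```python
def unpack_columns(compact: dict) -> list[dict]:
    """Turn {"timestamp": [...], "open": [...], ...} into one dict per bar."""
    columns = {name: values for name, values in compact.items() if isinstance(values, list)}
    if not columns:
        return []
    # All columns must line up index by index; refuse to guess otherwise.
    if len({len(values) for values in columns.values()}) != 1:
        raise ValueError("column lengths differ")
    names = list(columns)
    return [dict(zip(names, bar)) for bar in zip(*columns.values())]


sample = {"timestamp": [1, 2], "open": [10.0, 11.0], "close": [10.5, 11.5]}
print(unpack_columns(sample))
# [{'timestamp': 1, 'open': 10.0, 'close': 10.5}, {'timestamp': 2, 'open': 11.0, 'close': 11.5}]
```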

## Errors

Error response schema: `ApiError`

- `code`
- `message`
- `details` optional
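
A minimal sketch of turning an `ApiError` body into a one-line message. Falling back to the raw text for non-JSON bodies is an assumption, `describe_api_error` is a hypothetical helper, and the error code in the example is invented.

```python
import json
from typing import Optional


def describe_api_error(body: str, status: Optional[int] = None) -> str:
    """Build a one-line message from an ApiError JSON body, or fall back to the raw text."""
    try:
        err = json.loads(body)
    except json.JSONDecodeError:
        err = None
    prefix = f"HTTP {status}: " if status is not None else ""
    if isinstance(err, dict) and "code" in err and "message" in err:
        text = f"{err['code']}: {err['message']}"
        if err.get("details") is not None:
            text += f" (details: {err['details']})"
        return prefix + text
    return prefix + (body.strip() or "unknown error")


# "INVALID_SYMBOL" is a made-up code used only for illustration.
print(describe_api_error('{"code": "INVALID_SYMBOL", "message": "bad symbol"}', 400))
# HTTP 400: INVALID_SYMBOL: bad symbol
```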

## Related Universe Endpoints

- `GET /v1/universes`
- `POST /v1/universes/batch`
- `GET /v1/universes/{id}`

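When the user provides `universes`, they can be passed straight through to `/v1/quotes` for real-time quotes; these endpoints can be added later if validation or richer display is needed.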

FILE:references/output-contract.md
# Output Contract

## Real-Time Quotes

### Single-symbol `summary`

Output a concise summary that includes at least:

- `symbol`
- `name` or `-`
- `region`
- `last_price`
- `change_amount`
- `change_pct`
- `volume`
- `amount`
- `session`
- `timestamp`

### Multi-symbol `table`

Suggested columns:

- `symbol`
- `name`
- `region`
- `last`
- `chg`
- `chg%`
- `volume`
- `session`
- `time`

### `json`

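- Output the API JSON as-is
- With `--pretty`, format with indentation

## K-Lines

### Single-symbol `summary`

Include at least:

- `symbol`
- `period`
- `bars`
- The latest bar's `timestamp/open/high/low/close/volume/amount`

### Single-symbol `table`

Output the most recent bars in chronological order: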

- `time`
- `open`
- `high`
- `low`
- `close`
- `volume`
- `amount`

### Multi-symbol `summary`

Show only the latest bar for each symbol:

- `symbol`
- `period`
- `time`
- `open`
- `high`
- `low`
- `close`
- `volume`

### `json`

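- Output the API JSON as-is

## Empty Results

- Do not fabricate default values
- Output `No data returned.`

## Errors

- Prefer showing the API's `code` and `message`
- If an HTTP status code is available, show it as well
- Never output the API key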

FILE:scripts/query_klines.py
#!/usr/bin/env python3

import argparse

from tickflow_common import (
    DEFAULT_BASE_URL,
    TickFlowAPIError,
    TickFlowError,
    compact_kline_to_rows,
    ensure_dict,
    format_epoch_ms,
    format_number,
    join_csv,
    parse_csv_arg,
    render_table,
    request_json,
    resolve_api_key,
    to_pretty_json,
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query TickFlow K-lines.")
    parser.add_argument("--symbol", help="Single symbol")
    parser.add_argument("--symbols", help="Comma-separated symbols for batch query")
    parser.add_argument("--period", default="1d", help="K-line period, default: 1d")
    parser.add_argument("--count", type=int, help="Number of bars")
    parser.add_argument("--start-time", type=int, help="Start time in epoch milliseconds")
    parser.add_argument("--end-time", type=int, help="End time in epoch milliseconds")
    parser.add_argument("--adjust", help="Adjust type")
    parser.add_argument("--rows", type=int, default=10, help="Rows to show in table format")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help="TickFlow API base URL")
    parser.add_argument("--timeout", type=float, default=10.0, help="Request timeout in seconds")
    parser.add_argument("--format", choices=["json", "summary", "table"], default="summary")
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")
    return parser


def fetch_single(args: argparse.Namespace, api_key: str) -> dict:
    params = {
        "symbol": args.symbol,
        "period": args.period,
        "count": args.count,
        "start_time": args.start_time,
        "end_time": args.end_time,
        "adjust": args.adjust,
    }
    payload = request_json(
        "GET",
        "/v1/klines",
        api_key,
        base_url=args.base_url,
        params=params,
        timeout=args.timeout,
    )
    payload = ensure_dict(payload, name="response")
    payload["data"] = compact_kline_to_rows(payload.get("data"))
    return payload


def fetch_batch(args: argparse.Namespace, api_key: str, symbols: list[str]) -> dict:
    params = {
        "symbols": join_csv(symbols),
        "period": args.period,
        "count": args.count,
        "start_time": args.start_time,
        "end_time": args.end_time,
        "adjust": args.adjust,
    }
    payload = request_json(
        "GET",
        "/v1/klines/batch",
        api_key,
        base_url=args.base_url,
        params=params,
        timeout=args.timeout,
    )
    payload = ensure_dict(payload, name="response")
    raw_data = ensure_dict(payload.get("data"), name="response.data")
    payload["data"] = {symbol: compact_kline_to_rows(value) for symbol, value in raw_data.items()}
    return payload


def kline_row(symbol: str, period: str, item: dict) -> dict[str, str]:
    return {
        "symbol": symbol,
        "period": period,
        "time": format_epoch_ms(item.get("timestamp")),
        "open": format_number(item.get("open")),
        "high": format_number(item.get("high")),
        "low": format_number(item.get("low")),
        "close": format_number(item.get("close")),
        "volume": format_number(item.get("volume"), 0),
        "amount": format_number(item.get("amount")),
    }


def render_single_summary(symbol: str, period: str, rows: list[dict]) -> str:
    if not rows:
        return "No data returned."
    latest = kline_row(symbol, period, rows[-1])
    return "\n".join(
        [
            f"{symbol} period={period} bars={len(rows)}",
            f"time={latest['time']} open={latest['open']} high={latest['high']} low={latest['low']} close={latest['close']}",
            f"volume={latest['volume']} amount={latest['amount']}",
        ]
    )


def render_batch_summary(period: str, payload: dict[str, list[dict]]) -> str:
    if not payload:
        return "No data returned."
    blocks = []
    for symbol, rows in payload.items():
        if not rows:
            blocks.append(f"{symbol} period={period} No data returned.")
            continue
        latest = kline_row(symbol, period, rows[-1])
        blocks.append(
            "\n".join(
                [
                    f"{symbol} period={period}",
                    f"time={latest['time']} open={latest['open']} high={latest['high']} low={latest['low']} close={latest['close']}",
                    f"volume={latest['volume']} amount={latest['amount']}",
                ]
            )
        )
    return "\n\n".join(blocks)


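def render_single_table(symbol: str, period: str, rows: list[dict], limit: int) -> str:
    # Follow the output contract for empty results instead of rendering an empty table.
    if not rows:
        return "No data returned."
    selected = rows[-limit:] if limit > 0 else rows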
    table_rows = [kline_row(symbol, period, item) for item in selected]
    columns = [
        ("time", "time"),
        ("open", "open"),
        ("high", "high"),
        ("low", "low"),
        ("close", "close"),
        ("volume", "volume"),
        ("amount", "amount"),
    ]
    return render_table(table_rows, columns)


def render_batch_table(period: str, payload: dict[str, list[dict]]) -> str:
    rows = []
    for symbol, items in payload.items():
        if not items:
            rows.append(
                {
                    "symbol": symbol,
                    "period": period,
                    "time": "-",
                    "open": "-",
                    "high": "-",
                    "low": "-",
                    "close": "-",
                    "volume": "-",
                }
            )
            continue
        rows.append(kline_row(symbol, period, items[-1]))
    columns = [
        ("symbol", "symbol"),
        ("period", "period"),
        ("time", "time"),
        ("open", "open"),
        ("high", "high"),
        ("low", "low"),
        ("close", "close"),
        ("volume", "volume"),
    ]
    return render_table(rows, columns)


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    if bool(args.symbol) == bool(args.symbols):
        raise SystemExit("Provide exactly one of --symbol or --symbols.")

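    try:
        # Resolve the key inside the try block so a TickFlowError raised for a
        # missing key is reported like any other failure.
        api_key = resolve_api_key()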
        if args.symbol:
            payload = fetch_single(args, api_key)
            if args.format == "json":
                print(to_pretty_json(payload, args.pretty))
                return
            if args.format == "table":
                print(render_single_table(args.symbol, args.period, payload["data"], args.rows))
                return
            print(render_single_summary(args.symbol, args.period, payload["data"]))
            return

        symbols = parse_csv_arg(args.symbols)
        if not symbols:
            raise TickFlowError("No symbols provided.")

        payload = fetch_batch(args, api_key, symbols)
        if args.format == "json":
            print(to_pretty_json(payload, args.pretty))
            return
        if args.format == "table":
            print(render_batch_table(args.period, payload["data"]))
            return
        print(render_batch_summary(args.period, payload["data"]))
    except (TickFlowAPIError, TickFlowError) as exc:
        # Exit with the error text only; both error types are handled the same way.
        raise SystemExit(str(exc))


if __name__ == "__main__":
    main()

FILE:scripts/query_quotes.py
#!/usr/bin/env python3

import argparse

from tickflow_common import (
    DEFAULT_BASE_URL,
    TickFlowAPIError,
    TickFlowError,
    ensure_dict,
    ensure_list,
    format_epoch_ms,
    format_number,
    format_percent,
    join_csv,
    parse_csv_arg,
    render_table,
    request_json,
    resolve_api_key,
    to_pretty_json,
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query TickFlow real-time quotes.")
    parser.add_argument("--symbols", help="Comma-separated symbols, for example 600519.SH,AAPL.US")
    parser.add_argument("--universes", help="Comma-separated universe ids")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help="TickFlow API base URL")
    parser.add_argument("--timeout", type=float, default=10.0, help="Request timeout in seconds")
    parser.add_argument("--format", choices=["json", "summary", "table"], default="summary")
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")
    parser.add_argument("--force-post", action="store_true", help="Use POST /v1/quotes")
    return parser


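def choose_method(symbols: list[str], universes: list[str], force_post: bool) -> str:
    """Pick GET for small requests and POST for large ones to keep the URL short."""
    if force_post:
        return "POST"

    # Switch to POST when there are many items or the joined query values get long.
    total_items = len(symbols) + len(universes)
    joined_length = len(",".join(symbols)) + len(",".join(universes))
    if total_items > 20 or joined_length > 800:
        return "POST"
    return "GET"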


def fetch_quotes(args: argparse.Namespace) -> dict:
    symbols = parse_csv_arg(args.symbols)
    universes = parse_csv_arg(args.universes)
    if not symbols and not universes:
        raise TickFlowError("Provide at least one of --symbols or --universes.")

    api_key = resolve_api_key()
    method = choose_method(symbols, universes, args.force_post)

    if method == "POST":
        body = {
            "symbols": symbols or None,
            "universes": universes or None,
        }
        payload = request_json(
            "POST",
            "/v1/quotes",
            api_key,
            base_url=args.base_url,
            json_body=body,
            timeout=args.timeout,
        )
    else:
        params = {
            "symbols": join_csv(symbols) or None,
            "universes": join_csv(universes) or None,
        }
        payload = request_json(
            "GET",
            "/v1/quotes",
            api_key,
            base_url=args.base_url,
            params=params,
            timeout=args.timeout,
        )

    payload = ensure_dict(payload, name="response")
    payload["data"] = ensure_list(payload.get("data"), name="response.data")
    return payload


def quote_to_row(item: dict) -> dict[str, str]:
    ext = item.get("ext") if isinstance(item.get("ext"), dict) else {}
    return {
        "symbol": str(item.get("symbol", "-")),
        "name": str(ext.get("name") or "-"),
        "region": str(item.get("region", "-")),
        "last": format_number(item.get("last_price")),
        "chg": format_number(ext.get("change_amount")),
        "chg_pct": format_percent(ext.get("change_pct")),
        "volume": format_number(item.get("volume"), 0),
        "session": str(item.get("session") or "-"),
        "time": format_epoch_ms(item.get("timestamp")),
        "amount": format_number(item.get("amount")),
        "open": format_number(item.get("open")),
        "high": format_number(item.get("high")),
        "low": format_number(item.get("low")),
        "prev_close": format_number(item.get("prev_close")),
        "ext_type": str(ext.get("type") or "-"),
    }


def render_summary(data: list[dict]) -> str:
    if not data:
        return "No data returned."

    blocks = []
    for item in data:
        row = quote_to_row(item)
        blocks.append(
            "\n".join(
                [
                    f"{row['symbol']} {row['name']}",
                    f"region={row['region']} last={row['last']} change={row['chg']} ({row['chg_pct']})",
                    f"open={row['open']} high={row['high']} low={row['low']} prev_close={row['prev_close']}",
                    f"volume={row['volume']} amount={row['amount']} session={row['session']}",
                    f"time={row['time']} ext={row['ext_type']}",
                ]
            )
        )
    return "\n\n".join(blocks)


def render_quotes_table(data: list[dict]) -> str:
    rows = [quote_to_row(item) for item in data]
    columns = [
        ("symbol", "symbol"),
        ("name", "name"),
        ("region", "region"),
        ("last", "last"),
        ("chg", "chg"),
        ("chg_pct", "chg%"),
        ("volume", "volume"),
        ("session", "session"),
        ("time", "time"),
    ]
    return render_table(rows, columns)


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    try:
        payload = fetch_quotes(args)
        if args.format == "json":
            print(to_pretty_json(payload, args.pretty))
            return

        data = payload["data"]
        if args.format == "table":
            print(render_quotes_table(data))
            return

        print(render_summary(data))
    except TickFlowError as exc:
        # TickFlowAPIError subclasses TickFlowError, so one handler covers both.
        raise SystemExit(str(exc))


if __name__ == "__main__":
    main()

FILE:scripts/tickflow_common.py
#!/usr/bin/env python3

import json
import os
import sys
import urllib.error
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import Any, Iterable, NoReturn


DEFAULT_BASE_URL = "https://api.tickflow.org"
DEFAULT_API_KEY_ENV = "TICKFLOW_API_KEY"
DEFAULT_TIMEOUT = 10.0
USER_AGENT = "tickflow-realtime-skill/1.0"


class TickFlowError(Exception):
    pass


class TickFlowAPIError(TickFlowError):
    def __init__(self, status: int, code: str | None, message: str, details: Any = None):
        self.status = status
        self.code = code
        self.message = message
        self.details = details
        super().__init__(self.__str__())

    def __str__(self) -> str:
        parts = [f"HTTP {self.status}"]
        if self.code:
            parts.append(self.code)
        parts.append(self.message)
        return " - ".join(parts)


def eprint(message: str) -> None:
    print(message, file=sys.stderr)


def die(message: str, exit_code: int = 1) -> "NoReturn":
    eprint(message)
    raise SystemExit(exit_code)


def parse_csv_arg(value: str | None) -> list[str]:
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def resolve_api_key() -> str:
    value = os.environ.get(DEFAULT_API_KEY_ENV)
    if value:
        return value

    raise TickFlowError(f"Missing API key. Export {DEFAULT_API_KEY_ENV} before running the script.")


def build_url(base_url: str, path: str, params: dict[str, Any] | None = None) -> str:
    base = base_url.rstrip("/")
    url = f"{base}{path}"
    if not params:
        return url

    filtered = {key: value for key, value in params.items() if value is not None}
    query = urllib.parse.urlencode(filtered, doseq=True)
    return f"{url}?{query}" if query else url


def request_json(
    method: str,
    path: str,
    api_key: str,
    *,
    base_url: str = DEFAULT_BASE_URL,
    params: dict[str, Any] | None = None,
    json_body: dict[str, Any] | None = None,
    timeout: float = DEFAULT_TIMEOUT,
) -> Any:
    headers = {
        "Accept": "application/json",
        "User-Agent": USER_AGENT,
        "x-api-key": api_key,
    }
    data = None
    if json_body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(json_body).encode("utf-8")

    request = urllib.request.Request(
        build_url(base_url, path, params),
        method=method.upper(),
        headers=headers,
        data=data,
    )

    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        payload = exc.read().decode("utf-8", errors="replace")
        parsed = _safe_json_loads(payload)
        if isinstance(parsed, dict):
            raise TickFlowAPIError(exc.code, parsed.get("code"), parsed.get("message", payload), parsed.get("details"))
        raise TickFlowAPIError(exc.code, None, payload.strip() or "Request failed")
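    except urllib.error.URLError as exc:
        raise TickFlowError(f"Network error: {exc.reason}") from exc
    except TimeoutError as exc:
        # A read timeout can surface as a bare TimeoutError rather than URLError.
        raise TickFlowError(f"Request timed out after {timeout} seconds.") from exc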

    parsed = _safe_json_loads(payload)
    if parsed is None:
        raise TickFlowError("Response is not valid JSON.")
    return parsed


def _safe_json_loads(text: str) -> Any:
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def ensure_dict(value: Any, *, name: str) -> dict[str, Any]:
    if not isinstance(value, dict):
        raise TickFlowError(f"{name} must be an object.")
    return value


def ensure_list(value: Any, *, name: str) -> list[Any]:
    if not isinstance(value, list):
        raise TickFlowError(f"{name} must be an array.")
    return value


def format_epoch_ms(value: Any) -> str:
    if value in (None, ""):
        return "-"
    try:
        dt = datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    except (TypeError, ValueError, OSError):
        return str(value)
    return dt.isoformat().replace("+00:00", "Z")


def format_number(value: Any, decimals: int = 2) -> str:
    if value is None:
        return "-"
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        text = f"{value:.{decimals}f}"
        return text.rstrip("0").rstrip(".")
    return str(value)


def format_percent(value: Any, decimals: int = 2) -> str:
    if value is None:
        return "-"
    try:
        pct = float(value) * 100
    except (TypeError, ValueError):
        return str(value)
    return f"{pct:.{decimals}f}%"


def pad(value: Any, width: int) -> str:
    text = str(value)
    return text if len(text) >= width else text + (" " * (width - len(text)))


def render_table(rows: list[dict[str, Any]], columns: list[tuple[str, str]]) -> str:
    if not rows:
        return "No data returned."

    widths: dict[str, int] = {}
    for key, title in columns:
        widths[key] = len(title)
    for row in rows:
        for key, _ in columns:
            widths[key] = max(widths[key], len(str(row.get(key, "-"))))

    header = "  ".join(pad(title, widths[key]) for key, title in columns)
    divider = "  ".join("-" * widths[key] for key, _ in columns)
    body = []
    for row in rows:
        body.append("  ".join(pad(row.get(key, "-"), widths[key]) for key, _ in columns))
    return "\n".join([header, divider, *body])


def compact_kline_to_rows(data: dict[str, Any]) -> list[dict[str, Any]]:
    data = ensure_dict(data, name="kline data")
    timestamps = ensure_list(data.get("timestamp"), name="data.timestamp")
    row_count = len(timestamps)
    rows: list[dict[str, Any]] = []
    keys = [
        "timestamp",
        "open",
        "high",
        "low",
        "close",
        "volume",
        "amount",
        "prev_close",
        "open_interest",
        "settlement_price",
    ]

    for index in range(row_count):
        row: dict[str, Any] = {}
        for key in keys:
            column = data.get(key)
            if column is None:
                continue
            if not isinstance(column, list):
                raise TickFlowError(f"data.{key} must be an array.")
            if len(column) != row_count:
                raise TickFlowError(f"data.{key} length does not match data.timestamp.")
            row[key] = column[index]
        rows.append(row)
    return rows


def to_pretty_json(value: Any, pretty: bool) -> str:
    if pretty:
        return json.dumps(value, indent=2, ensure_ascii=False)
    return json.dumps(value, ensure_ascii=False)


def join_csv(items: Iterable[str]) -> str:
    return ",".join(item for item in items if item)

Weread Reading Recommender

Skill

Use this skill when the user wants to export local WeRead records, normalize WeRead data, analyze reading preferences from WeRead history, or get book recomm...

---
name: weread-reading-recommender
description: Use this skill when the user wants to export local WeRead records, normalize WeRead data, analyze reading preferences from WeRead history, or get book recommendations grounded in WeRead reading behavior and a current learning goal.
---

# WeRead Reading Recommender

## Overview

This is a local-first skill for exporting 微信读书 (WeRead) records from a cookie stored on the user's machine, normalizing those records into a recommendation-friendly JSON file, and using that data to analyze reading preferences or recommend what to read next.

Use this skill when the user wants to:
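- get book recommendations based on their WeRead records
- analyze their reading preferences or reading profile
- combine a topic they currently want to learn with their WeRead history to get recommendations
- export, refresh, or normalize local WeRead data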

## Trigger Cases

Activate this skill for requests like:
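- “根据我的微信读书记录推荐书” (recommend books based on my WeRead records)
- “分析我的阅读偏好” (analyze my reading preferences)
- “我最近想系统学 AI Agent，结合微信读书记录推荐 5 本书” (I want to study AI Agents systematically; recommend 5 books using my WeRead records)
- “帮我导出 / 刷新 / 归一化微信读书数据” (export / refresh / normalize my WeRead data)
- “基于我的阅读历史，推荐下一本最适合现在读的书” (based on my reading history, recommend the next book that best fits me right now)
- “分析我的阅读偏好，并给我 3 本稳妥推荐 + 2 本探索推荐” (analyze my reading preferences and give me 3 safe picks plus 2 exploratory picks)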

## Workflow

Follow this sequence:

1. Check whether a normalized JSON file already exists.
2. If normalized data is missing, or the user explicitly wants fresh data, check whether a local WeRead cookie is already available.
3. Look for a local cookie source in this order:
   - a cookie file path explicitly provided by the user
   - `WEREAD_COOKIE`
   - another env var name passed through `--env-var`
4. If no local cookie source exists, ask the user to set one locally and stop there. Do not tell the user to edit `SKILL.md`.
5. If a local cookie source exists, run the export script.
6. Run the normalize script on the raw export.
7. Read the normalized JSON and identify strong signals:
   - high-engagement books
   - recent books
   - unfinished books with momentum
   - repeated categories or lists
8. If the user provides a current goal, weight goal fit first.
9. If the user does not provide a goal, produce a reading-profile summary plus safe and exploratory recommendations.

## Recommendation Guidance

When the user provides a current goal, weight approximately:
- 60% goal fit
- 40% history fit

When the user provides no goal, weight approximately:
- 70% history fit
- 20% recency
- 10% exploration/diversity
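
The approximate weights above can be read as a simple linear blend. The following is a minimal sketch, not part of the skill's scripts: the function name `blend_score` and the assumption that every component score is already scaled to the range 0 to 1 are illustrative.

```python
from typing import Optional


def blend_score(
    history_fit: float,
    goal_fit: Optional[float] = None,
    recency: float = 0.0,
    exploration: float = 0.0,
) -> float:
    """Blend component scores (each assumed to lie in [0, 1]) into one ranking score."""
    if goal_fit is not None:
        # A current goal was provided: 60% goal fit, 40% history fit.
        return 0.6 * goal_fit + 0.4 * history_fit
    # No goal: 70% history fit, 20% recency, 10% exploration/diversity.
    return 0.7 * history_fit + 0.2 * recency + 0.1 * exploration
```

Because the weights are described as approximate, a score like this is best treated as a tie-breaking aid for ordering candidates rather than a strict ranking rule.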

For each recommendation, explain:
- why it fits the user's current goal or history
- which past books it resembles
- what gap it fills
- whether it is a safe pick or an exploration pick
- whether it is a good fit right now

Suggested response structure:
- 阅读画像 / Reading profile
- 推荐结果 / Recommendations
- 为什么适合现在 / Why now
- 暂缓推荐 / Skip for now (optional)

## Local Data Workflow

### 1. Check local cookie availability first

Before asking the user to set anything, first check whether a local cookie is already available through:
- a cookie file path the user provided
- `WEREAD_COOKIE`
- another env var name passed through `--env-var`

If none of these exist, ask the user to set the cookie locally, then continue.

### 2. Export raw WeRead data

If a local cookie is already available, export directly:

```bash
python3 scripts/export_weread.py --out data/weread-raw.json
```

Optional variants:

```bash
python3 scripts/export_weread.py --cookie-file ~/.config/weread.cookie --out data/weread-raw.json
python3 scripts/export_weread.py --env-var WEREAD_COOKIE --include-book-info --detail-limit 50 --out data/weread-raw.json
```

If the user does need to set one manually, keep it local. For example:

```bash
export WEREAD_COOKIE='wr_skey=...; wr_vid=...; ...'
```

### 3. Normalize the raw export

```bash
python3 scripts/normalize_weread.py --input data/weread-raw.json --output data/weread-normalized.json
```

### 4. Use the normalized file for recommendation turns

After normalization, this skill should reason primarily from the normalized JSON, not from a live cookie session, unless the user explicitly asks for a refresh.

## Security Boundary

This skill is local-first. Enforce these rules:

- Cookie is for local use only.
- Never write the cookie into `SKILL.md`, scripts, assets, logs, or exported JSON.
- Never echo the cookie back in responses.
- Prefer checking existing local cookie sources before asking the user to set one again.
- Do not rely on CookieCloud or any third-party cookie sync service by default.
- Do not suggest remote cookie hosting as the normal path.
- Recommendation work should use the normalized JSON whenever possible.
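
One way to enforce the "never write the cookie into exported JSON" rule is a last-moment check before the export is written to disk. This is a sketch under assumptions: `assert_no_cookie_leak` is a hypothetical helper, not a function confirmed to exist in `scripts/export_weread.py`, and the 8-character minimum is an arbitrary threshold chosen to avoid accidental matches on short values.

```python
import json


def assert_no_cookie_leak(export: dict, cookie: str) -> None:
    """Raise ValueError if the cookie, or one of its values, appears in the export."""
    text = json.dumps(export, ensure_ascii=False)
    secrets = [cookie.strip()]
    for pair in cookie.split(";"):
        _name, sep, value = pair.strip().partition("=")
        # Skip very short values that could match unrelated data by accident.
        if sep and len(value) >= 8:
            secrets.append(value)
    for secret in secrets:
        if secret and secret in text:
            # The message deliberately omits the secret itself.
            raise ValueError("Export contains cookie material; refusing to write it.")
```

Calling this right before `json.dump` keeps the check independent of how the export dictionary was assembled.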

## Files

Use these project files as the main references:
- `scripts/export_weread.py`
- `scripts/normalize_weread.py`
- `references/data-schema.md`
- `references/privacy-model.md`
- `references/recommendation-rubric.md`
- `assets/sample-weread-raw.json`
- `assets/sample-weread-normalized.json`

## Example Requests

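- 结合我的微信读书记录，我最近想系统学 AI Agent，推荐 5 本书 (Using my WeRead records, recommend 5 books for studying AI Agents systematically)
- 基于我的阅读历史，推荐下一本最适合现在读的书 (Based on my reading history, recommend the next book that best fits me right now)
- 分析我的阅读偏好，并给我 3 本稳妥推荐 + 2 本探索推荐 (Analyze my reading preferences and give me 3 safe picks plus 2 exploratory picks)
- 帮我刷新微信读书数据，然后按最近在读主题推荐下一批书 (Refresh my WeRead data, then recommend the next batch of books based on my recent reading topics)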

FILE:PLAN.md
# PLAN.md — weread-reading-recommender

## Goal
Build a local-first draft skill that reads 微信读书 (WeRead) records using a cookie kept only on the user's machine, exports the data to JSON, normalizes it into a recommendation-friendly format, and then lets the skill recommend books based on:

1. the user's current learning goal/question, or
2. the reading profile inferred from WeRead history when no current goal is provided.

## Product Positioning
This is **not** a cloud sync integration and **not** a hosted cookie proxy.

It is a:
- local exporter
- local normalizer
- recommendation-oriented AgentSkill

## MVP Scope
### In
- Read WeRead cookie from local env var or local file
- Export bookshelf + progress + notebook stats to local JSON
- Optionally fetch public-ish per-book metadata from WeRead book info API
- Normalize raw export into a stable JSON schema
- Use normalized JSON as the main input for recommendation turns
- Recommend books from a user goal when provided
- Fall back to general recommendations from reading history alone

### Out
- CookieCloud / third-party cookie sync
- Remote storage of cookie or reading records
- Automatic scheduling / background refresh
- Highlight/note text analysis in v1
- Full external-book search pipeline in code
- Installation / packaging to production skill directory

## Draft Deliverables
- `PLAN.md`
- `SPEC.md`
- `SKILL.md`
- `scripts/export_weread.py`
- `scripts/normalize_weread.py`
- `references/data-schema.md`
- `references/privacy-model.md`
- `references/recommendation-rubric.md`
- sample raw/normalized JSON assets

## Workflow
1. User keeps cookie locally.
2. Run `export_weread.py` to dump raw WeRead data.
3. Run `normalize_weread.py` to produce a stable normalized file.
4. Skill reads normalized JSON.
5. If the user provides a goal, weight goal fit first.
6. If no goal is provided, weight reading profile + recency + engagement.

## Data Strategy
### Raw export
Keep a mostly faithful copy of WeRead responses for debugging and future extension.

### Normalized export
Produce a compact, stable schema with:
- per-book status
- engagement score
- top categories
- recent books
- strongest books/signals
- enough metadata for recommendation reasoning

## Recommendation Strategy
### With a current goal
Weight:
- 60% goal fit
- 40% reading-history fit

### Without a current goal
Weight:
- 70% reading-history fit
- 20% recency
- 10% exploration/diversity

## Safety / Privacy Model
- Cookie must never be written to output JSON.
- Cookie must never be embedded in SKILL.md or assets.
- Export happens only on the local machine.
- Normalized JSON should contain only recommendation-relevant reading data.

## Success Criteria for This Draft
- A local export command exists and is readable/testable.
- A normalization command exists and is readable/testable.
- The skill instructions clearly describe when/how to use the exported data.
- The draft is good enough for you to review before deciding whether to continue to full implementation or install.

FILE:README.md
# weread-reading-recommender

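A local-first WeRead (微信读书) recommendation skill. Its goal is neither cloud sync nor hosting cookies on your behalf; instead, everything runs on your own machine:

1. Read the local WeRead cookie
2. Export the raw reading data
3. Normalize it into recommendation-friendly JSON
4. Recommend the next book based on reading history and the current learning goal

## Project Positioning

This project suits the following kinds of needs:

- Recommending books based on WeRead records
- Analyzing reading preferences and the reading profile
- Combining a topic you currently want to learn with your reading history to produce recommendations
- Refreshing, exporting, and normalizing local WeRead data

It is:

- a local export tool
- a local normalization tool
- a draft skill aimed at recommendation scenarios

It is not:

- a cloud sync service
- a default CookieCloud integration
- a remote proxy that hosts cookies

## Core Flow

The recommended order is:

1. Check whether a usable cookie already exists locally
2. Run the export script to produce the raw JSON
3. Run the normalization script to produce the normalized JSON
4. Use the normalized JSON for the reading profile and recommendations

If no cookie exists locally, the user sets one on their own machine. The project never writes the cookie into repository files and does not depend on third-party sync services by default.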

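## Current Capabilities

### 1. Export script

File: [`scripts/export_weread.py`](scripts/export_weread.py)

Supports:

- Reading the cookie from `--cookie`
- Reading the cookie from `--cookie-file`
- Reading the cookie from the environment variable named by `--env-var`
- Reading `WEREAD_COOKIE` by default
- Calling the WeRead shelf sync endpoint
- Calling the WeRead notebook endpoint
- Optionally calling `book/info` for supplementary book metadata
- Writing raw JSON that does not contain the cookie

### 2. Normalization script

File: [`scripts/normalize_weread.py`](scripts/normalize_weread.py)

Converts the raw JSON into a recommendation-friendly structure, including:

- Reading status for each book
- Reading progress and reading time
- Note, bookmark, and review counts
- `engagement_score`
- `summary`
- `profile_inputs`
- `llm_hints`

### 3. Skill instructions

File: [`SKILL.md`](SKILL.md)

Defines:

- When to use this skill
- The workflow when normalized data is missing
- How to check for a local cookie
- The explanation structure for recommendations
- Security and privacy boundaries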

## Quick Start

### Requirements

- Python 3
- `requests`

Install the dependency:

```bash
python3 -m pip install requests
```

### 1. Prepare a local cookie

First check whether a usable cookie already exists locally, for example:

- `WEREAD_COOKIE` is already set
- A local cookie file already exists

If not, set a local cookie yourself, for example:

```bash
export WEREAD_COOKIE='wr_skey=...; wr_vid=...; ...'
```

### 2. Export the raw data

```bash
python3 scripts/export_weread.py --out data/weread-raw.json
```

Optional variants:

```bash
python3 scripts/export_weread.py --cookie-file ~/.config/weread.cookie --out data/weread-raw.json
python3 scripts/export_weread.py --env-var WEREAD_COOKIE --include-book-info --detail-limit 50 --out data/weread-raw.json
```

### 3. Normalize the data

```bash
python3 scripts/normalize_weread.py --input data/weread-raw.json --output data/weread-normalized.json
```

### 4. Try it with the sample data

The repository ships with sample raw data:

- [`assets/sample-weread-raw.json`](assets/sample-weread-raw.json)

You can run it directly:

```bash
python3 scripts/normalize_weread.py \
  --input assets/sample-weread-raw.json \
  --output assets/sample-weread-normalized.json
```

## Output Structure

### Raw export

The top level contains at least:

- `exported_at`
- `source`
- `summary`
- `shelf_sync`
- `notebook`
- `book_info`
- `warnings`

### Normalized export

The top level contains at least:

- `generated_at`
- `source_file`
- `summary`
- `profile_inputs`
- `llm_hints`
- `books`

Each book contains at least:

- `book_id`
- `title`
- `author`
- `translator`
- `categories`
- `book_lists`
- `status`
- `is_finished`
- `progress`
- `reading_time_seconds`
- `last_read_at`
- `note_count`
- `bookmark_count`
- `review_count`
- `interaction_count`
- `engagement_score`
- `is_imported`
- `is_paid`
- `public_rating`
- `intro`

For a more complete description of the data, see:

- [`SPEC.md`](SPEC.md)
- [`references/data-schema.md`](references/data-schema.md)

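## Example Recommendation Requests

- 结合我的微信读书记录，我最近想系统学 AI Agent，推荐 5 本书 (Using my WeRead records, recommend 5 books for studying AI Agents systematically)
- 基于我的阅读历史，推荐下一本最适合现在读的书 (Based on my reading history, recommend the next book that best fits me right now)
- 分析我的阅读偏好，并给我 3 本稳妥推荐 + 2 本探索推荐 (Analyze my reading preferences and give me 3 safe picks plus 2 exploratory picks)
- 帮我刷新微信读书数据，然后按最近在读主题推荐下一批书 (Refresh my WeRead data, then recommend the next batch of books based on my recent reading topics)

## Privacy and Security

This project is local-first:

- The cookie is for local use only
- The cookie is never written into the exported JSON
- The cookie is never written into repository files
- CookieCloud and other third-party sync services are not used by default
- Remote hosting of the cookie is never required

For the privacy boundary, see:

- [`references/privacy-model.md`](references/privacy-model.md)

## Project Structure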

```text
weread-reading-recommender/
├── README.md
├── PLAN.md
├── SPEC.md
├── TODO.md
├── SKILL.md
├── scripts/
│   ├── export_weread.py
│   └── normalize_weread.py
├── references/
│   ├── data-schema.md
│   ├── privacy-model.md
│   └── recommendation-rubric.md
└── assets/
    ├── sample-weread-raw.json
    └── sample-weread-normalized.json
```

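## Current Status

The repository already includes:

- The final `SKILL.md`
- `export_weread.py`
- `normalize_weread.py`
- `sample-weread-normalized.json`

Possible future enhancements:

- Text analysis of highlights and notes
- Retrieval of external candidate books
- Normalization and deduplication of Chinese and English titles
- An automatic refresh flow
- A stronger recommendation-explanation layer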

FILE:SPEC.md
# SPEC.md — weread-reading-recommender

## 1. Objective
Provide a local-first skill for WeRead-based book recommendation. The skill should help answer prompts such as:

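- 结合我的微信读书记录，我最近想系统学 AI Agent，推荐几本书 (Using my WeRead records, recommend a few books for studying AI Agents systematically)
- 根据我的微信读书历史，推荐下一本最值得读的书 (Based on my WeRead history, recommend the next book most worth reading)
- 分析我的阅读偏好，再给我 3 本稳妥推荐和 2 本探索推荐 (Analyze my reading preferences, then give me 3 safe picks and 2 exploratory picks)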

## 2. Inputs
### 2.1 Local cookie input for export script
Supported sources:
- `--cookie "..."`
- `--cookie-file /path/to/weread.cookie`
- env var `WEREAD_COOKIE` (default)
- custom env var via `--env-var`

### 2.2 Raw export file
Default example path:
- `data/weread-raw.json`

### 2.3 Normalized file
Default example path:
- `data/weread-normalized.json`

### 2.4 Recommendation prompt
Optional natural-language goal supplied by the user.

## 3. Export Script Requirements
File:
- `scripts/export_weread.py`

### CLI
```bash
python3 scripts/export_weread.py --out data/weread-raw.json
python3 scripts/export_weread.py --cookie-file ~/.config/weread.cookie --out data/weread-raw.json
python3 scripts/export_weread.py --include-book-info --detail-limit 50 --out data/weread-raw.json
```

### Required behavior
- Read cookie locally
- Call WeRead bookshelf endpoint
- Call WeRead notebook endpoint
- Optionally call per-book info endpoint
- Write JSON output without cookie contents
- Print a short summary to stdout

### Required output fields
Top-level fields:
- `exported_at`
- `source`
- `summary`
- `shelf_sync`
- `notebook`
- `book_info` (optional)
- `warnings` (optional)

## 4. Normalization Script Requirements
File:
- `scripts/normalize_weread.py`

### CLI
```bash
python3 scripts/normalize_weread.py --input data/weread-raw.json --output data/weread-normalized.json
```

### Required behavior
- Read raw export JSON
- Build per-book normalized records
- Compute reading status buckets
- Compute engagement score
- Compute top categories / lists
- Produce recommendation-friendly summary sections

### Normalized top-level fields
- `generated_at`
- `source_file`
- `summary`
- `profile_inputs`
- `llm_hints`
- `books`

### Required per-book normalized fields
- `book_id`
- `title`
- `author`
- `translator`
- `categories`
- `book_lists`
- `status` (`finished` | `reading` | `unread`)
- `is_finished`
- `progress`
- `reading_time_seconds`
- `last_read_at`
- `note_count`
- `bookmark_count`
- `review_count`
- `interaction_count`
- `engagement_score`
- `is_imported`
- `is_paid`
- `public_rating` (if available)
- `intro` (if available)
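
The per-book record can be written down as a typed structure. The sketch below is illustrative only: the field names come from the list above, but the Python types are assumptions inferred from the sample normalized JSON, and the rule inside `compute_status` (any progress or reading time counts as `reading`) is a guess, not the confirmed behavior of `scripts/normalize_weread.py`.

```python
from typing import Literal, Optional, TypedDict

Status = Literal["finished", "reading", "unread"]


class NormalizedBook(TypedDict):
    """Assumed shape of one entry in the normalized `books` array."""

    book_id: str
    title: str
    author: str
    translator: str
    categories: list[str]
    book_lists: list[str]
    status: Status
    is_finished: bool
    progress: float                  # percentage, 0.0 to 100.0
    reading_time_seconds: int
    last_read_at: Optional[str]      # ISO-8601 UTC string, or None if never read
    note_count: int
    bookmark_count: int
    review_count: int
    interaction_count: int
    engagement_score: float
    is_imported: bool
    is_paid: bool
    public_rating: Optional[float]   # type assumed; only present if available
    intro: Optional[str]


def compute_status(is_finished: bool, progress: float, reading_time_seconds: int) -> Status:
    """Bucket a book into finished / reading / unread (assumed rule)."""
    if is_finished:
        return "finished"
    if progress > 0 or reading_time_seconds > 0:
        return "reading"
    return "unread"
```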

## 5. Skill Behavior Requirements
File:
- `SKILL.md`

### Trigger scope
Use when the user asks to:
- export local WeRead records
- normalize WeRead data
- analyze reading history
- recommend books from WeRead history

### Skill workflow
1. If normalized file is missing, help the user export + normalize first.
2. Read normalized JSON.
3. Extract strongest books, categories, recency, unfinished momentum, interaction patterns.
4. If the user has a current goal/question, prioritize that.
5. Produce recommendations with explanations.

### Output format expectations
Suggested reply sections:
- 阅读画像 / Reading profile
- 推荐结果 / Recommendations
- 为什么适合现在 / Why now
- 暂缓推荐 / Skip for now (optional)

Each recommendation should ideally explain:
- why it matches the user's current goal
- which past books it resembles
- what new value it adds
- whether it is a safe pick or an exploration pick

## 6. Security Requirements
- Never store cookie in exported JSON
- Never hardcode cookie in code or assets
- Never suggest third-party cookie sync as the default path
- Keep export local-first

## 7. Deferred Items
- highlight / note text analysis
- duplicate edition detection
- external candidate retrieval
- automatic refresh jobs
- packaging / installation

FILE:TODO.md
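# TODO.md — weread-reading-recommender

> Current status: the draft has been initialized and some documents are in place; the core code and the final skill instructions are not finished yet.

## Project Goal
Build a **local-first** WeRead (微信读书) recommendation skill:

- Export WeRead reading records using a **local cookie**
- Normalize the raw data into recommendation-friendly JSON
- Recommend the next book, or the next set of books, based on:
  1. the problem the user currently wants to solve or the topic they want to learn
  2. the user's reading trail in WeRead

## Done
### Files
- `PLAN.md`
- `SPEC.md`
- `references/data-schema.md`
- `references/privacy-model.md`
- `references/recommendation-rubric.md`
- `assets/sample-weread-raw.json`

### Directories
- `scripts/`
- `references/`
- `assets/`

## Not Done (must be completed)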

---

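# 1. Finalize SKILL.md
The current `SKILL.md` is still the template generated by init_skill.py and **must be replaced with the final version**.

## Requirements
### 1.1 State what the skill does
Make clear that:
- This is a local-first skill for WeRead reading profiles and book recommendations
- Its main uses are:
  - exporting local WeRead records
  - normalizing the data
  - recommending books based on the current goal and reading history

### 1.2 State when to use it
Cover these trigger scenarios:
- The user says “根据我的微信读书记录推荐书” (recommend books based on my WeRead records)
- The user says “分析我的阅读偏好” (analyze my reading preferences)
- The user says “我最近想学某个主题，结合微信读书记录推荐” (I want to learn a topic; recommend using my WeRead records)
- The user says “帮我导出 / 刷新 / 归一化微信读书数据” (export / refresh / normalize my WeRead data)

### 1.3 Spell out the workflow
Recommended structure:
1. Check whether a normalized JSON already exists
2. If not, run the export script first
3. Then run the normalize script
4. Then read the normalized data to make recommendations

### 1.4 Emphasize the security boundary
State explicitly:
- The cookie is for local use only
- Do not write it into the skill files
- Do not output it into the exported JSON
- Do not depend on CookieCloud or third-party sync services by default

### 1.5 Include a few realistic examples
For example:
- 结合我的微信读书记录，我最近想系统学 AI Agent，推荐 5 本书 (Using my WeRead records, recommend 5 books for studying AI Agents systematically)
- 基于我的阅读历史，推荐下一本最适合现在读的书 (Based on my reading history, recommend the next book that best fits me right now)
- 分析我的阅读偏好，并给我 3 本稳妥推荐 + 2 本探索推荐 (Analyze my reading preferences and give me 3 safe picks plus 2 exploratory picks)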

---

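# 2. Implement scripts/export_weread.py
This is the core export script and **must be implemented**.

## Goal
Read the cookie locally, call the WeRead endpoints, and export raw JSON.

## Technical requirements
- Python 3
- Depend only on the standard library plus `requests` where possible
- The code must run as-is
- Error messages must be explicit; do not swallow exceptions

## Supported cookie inputs and their priority
Read in this order of priority:
1. `--cookie`
2. `--cookie-file`
3. The environment variable named by `--env-var <NAME>`
4. The default environment variable `WEREAD_COOKIE`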
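
The priority order above could be implemented roughly as follows. This is a sketch under assumptions, not the actual body of `resolve_cookie(...)`: the parameter names mirror the CLI flags, and falling through to `WEREAD_COOKIE` when the variable named by `--env-var` is empty is an assumed behavior.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cookie(
    cookie: Optional[str] = None,
    cookie_file: Optional[str] = None,
    env_var: Optional[str] = None,
    default_env_var: str = "WEREAD_COOKIE",
) -> str:
    """Return the cookie from the highest-priority local source that has one."""
    # 1. Explicit --cookie value.
    if cookie and cookie.strip():
        return cookie.strip()
    # 2. --cookie-file path; a missing or empty file is an explicit error.
    if cookie_file:
        path = Path(cookie_file).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"Cookie file not found: {path}")
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            raise ValueError(f"Cookie file is empty: {path}")
        return text
    # 3. Custom env var, then 4. the default env var.
    for name in (env_var, default_env_var):
        if name:
            value = os.environ.get(name, "").strip()
            if value:
                return value
    # Error messages name the sources only and never echo cookie contents.
    raise ValueError(
        f"No cookie found. Pass --cookie or --cookie-file, or set {default_env_var} locally."
    )
```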

## CLI arguments
Support at least:
- `--out`
- `--cookie`
- `--cookie-file`
- `--env-var`
- `--include-book-info`
- `--detail-limit`
- `--timeout`

## Endpoints that must be called
### 2.1 Shelf sync
`https://weread.qq.com/web/shelf/sync`

### 2.2 Notebook book list
`https://weread.qq.com/api/user/notebook`

### 2.3 Optional: fetch book info by bookId
`https://weread.qq.com/api/book/info`

## Top-level fields of the output JSON
Include at least:
- `exported_at`
- `source`
- `summary`
- `shelf_sync`
- `notebook`
- `book_info`
- `warnings` (optional)

## Output requirements
- **Never write the cookie into the JSON**
- `summary` must count at least:
  - total_books
  - finished_books
  - reading_books
  - unread_books
  - notebook_books
- Print an export summary to stdout

## Error-handling requirements
Handle at least:
- No cookie provided
- Cookie file does not exist
- Endpoint request fails
- Endpoint returns invalid JSON
- Cookie expired / not logged in

## Suggested function breakdown
Split into at least:
- `resolve_cookie(...)`
- `build_session(...)`
- `fetch_shelf_sync(...)`
- `fetch_notebook(...)`
- `fetch_book_info(...)`
- `build_summary(...)`
- `main()`

---

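# 3. Implement scripts/normalize_weread.py
This is the second core script; it converts the raw JSON into a recommendation-friendly format.

## Goal
Read `weread-raw.json` and write `weread-normalized.json`.

## CLI
Support at least:
- `--input`
- `--output`

## Fields that must be generated for each book
- `book_id`
- `title`
- `author`
- `translator`
- `categories`
- `book_lists`
- `status`: `finished | reading | unread`
- `is_finished`
- `progress`
- `reading_time_seconds`
- `last_read_at`
- `note_count`
- `bookmark_count`
- `review_count`
- `interaction_count`
- `engagement_score`
- `is_imported`
- `is_paid`
- `public_rating`
- `intro`

## Top-level structures that must be generated
### 3.1 summary
Include at least:
- total_books
- status_counts
- top_categories
- top_book_lists
- imported_vs_native

### 3.2 profile_inputs
Include at least:
- `highest_engagement_books`
- `recent_books`
- `unfinished_books_with_momentum`

### 3.3 llm_hints
Generate a few short hints that help the recommendation stage grasp the reading profile quickly, for example:
- Which categories recent reading leans toward
- Which topics the high-interaction books cluster around
- Which types of books have a high completion rate

## engagement_score design requirements
Produce a heuristic score (a 0 to 100 range is enough) that takes into account:
- finishReading
- progress
- readingTime
- noteCount
- bookmarkCount
- reviewCount
- recency

### Suggested rough approach
For example:
- Add points for finishing the book
- Add points for high progress
- Add points for reading time (capped at an upper limit)
- Add points for the number of interactions
- Add points for having been read recently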
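
A heuristic along those lines might look like the sketch below. Every weight, cap, and the 90-day decay window is an invented assumption for illustration, so it will not reproduce the scores in `assets/sample-weread-normalized.json`; `interaction_count` is assumed to be the sum of notes, bookmarks, and reviews.

```python
from datetime import datetime, timezone
from typing import Optional


def compute_engagement_score(
    is_finished: bool,
    progress: float,
    reading_time_seconds: int,
    interaction_count: int,
    last_read_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> float:
    """Heuristic 0-100 score; all weights here are illustrative assumptions."""
    now = now or datetime.now(timezone.utc)
    score = 0.0
    if is_finished:
        score += 30.0  # completion bonus
    # Progress is a percentage; contributes up to 20 points.
    score += 20.0 * min(max(progress, 0.0), 100.0) / 100.0
    # Reading time contributes up to 20 points, capped at 10 hours.
    hours = max(reading_time_seconds, 0) / 3600.0
    score += 20.0 * min(hours, 10.0) / 10.0
    # Interactions contribute up to 20 points, capped at 20 interactions.
    score += 20.0 * min(max(interaction_count, 0), 20) / 20.0
    # Recency contributes up to 10 points and decays linearly over 90 days.
    # Both datetimes must be timezone-aware for the subtraction to work.
    if last_read_at is not None:
        days = max((now - last_read_at).days, 0)
        score += 10.0 * max(0.0, 1.0 - days / 90.0)
    return round(min(score, 100.0), 1)
```

Capping each component keeps a single extreme value, such as a very long reading time, from dominating the score.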

## recent_books
Sort by `last_read_at` and take the top few books.

## highest_engagement_books
Sort by `engagement_score` and take the top few books.

## unfinished_books_with_momentum
Select books that:
- are not finished
- but have progress > 0 or reading_time_seconds > 0
- and sort them by momentum
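
The three profile lists can be derived from the normalized books with plain sorting and filtering. In this sketch the `limit` default and the definition of "momentum" (most recently read first, then highest progress) are assumptions, since the notes above do not define them; `last_read_at` is assumed to be an ISO-8601 UTC string in one consistent format, which sorts chronologically as text.

```python
from typing import Any


def build_profile_inputs(
    books: list[dict[str, Any]], limit: int = 5
) -> dict[str, list[dict[str, Any]]]:
    """Build the three profile lists from normalized book records."""
    highest = sorted(
        books, key=lambda b: b.get("engagement_score") or 0.0, reverse=True
    )[:limit]

    # Books that were never read (last_read_at is None) are left out.
    recent = sorted(
        (b for b in books if b.get("last_read_at")),
        key=lambda b: b["last_read_at"],
        reverse=True,
    )[:limit]

    unfinished = [
        b
        for b in books
        if not b.get("is_finished")
        and ((b.get("progress") or 0) > 0 or (b.get("reading_time_seconds") or 0) > 0)
    ]
    # Assumed momentum ordering: latest read first, ties broken by progress.
    unfinished.sort(
        key=lambda b: (b.get("last_read_at") or "", b.get("progress") or 0),
        reverse=True,
    )

    return {
        "highest_engagement_books": highest,
        "recent_books": recent,
        "unfinished_books_with_momentum": unfinished[:limit],
    }
```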

## Suggested function breakdown
Split into at least:
- `load_raw(...)`
- `parse_categories(...)`
- `build_book_lists_map(...)`
- `compute_status(...)`
- `compute_engagement_score(...)`
- `normalize_books(...)`
- `build_summary(...)`
- `build_profile_inputs(...)`
- `build_llm_hints(...)`
- `main()`

---

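# 4. Generate assets/sample-weread-normalized.json
Based on the existing:
- `assets/sample-weread-raw.json`

generate the corresponding:
- `assets/sample-weread-normalized.json`

## Requirements
- Match the output structure of the normalize script
- Be usable as example input for the skill
- Include all fields; do not ship a minimal stub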

---

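# 5. Self-test
Run at least these basic checks:

## 5.1 Python syntax check
For example:
- `python3 -m py_compile scripts/export_weread.py`
- `python3 -m py_compile scripts/normalize_weread.py`

## 5.2 Run normalize on the sample
Run it once with the existing sample:
- Input: `assets/sample-weread-raw.json`
- Output: `assets/sample-weread-normalized.json`

## 5.3 Check the output
Confirm that:
- The file was generated successfully
- The JSON is valid
- Every book has the key fields
- summary / profile_inputs / llm_hints are present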
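
The output check can be automated with a small validator. This is a sketch, not part of the repository: `validate_normalized` is a hypothetical helper, and the required-field tuples are copied from the field lists in this document.

```python
import json

REQUIRED_TOP_LEVEL = (
    "generated_at", "source_file", "summary", "profile_inputs", "llm_hints", "books",
)
REQUIRED_BOOK_FIELDS = (
    "book_id", "title", "author", "translator", "categories", "book_lists",
    "status", "is_finished", "progress", "reading_time_seconds", "last_read_at",
    "note_count", "bookmark_count", "review_count", "interaction_count",
    "engagement_score", "is_imported", "is_paid", "public_rating", "intro",
)


def validate_normalized(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks complete."""
    problems = [
        f"missing top-level field: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in payload
    ]
    books = payload.get("books")
    if not isinstance(books, list):
        problems.append("books must be a list")
        return problems
    for index, book in enumerate(books):
        if not isinstance(book, dict):
            problems.append(f"books[{index}] must be an object")
            continue
        missing = [key for key in REQUIRED_BOOK_FIELDS if key not in book]
        if missing:
            problems.append(f"books[{index}] missing: {', '.join(missing)}")
    return problems


def validate_file(path: str) -> list[str]:
    """Load a normalized JSON file (raises on invalid JSON) and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_normalized(json.load(handle))
```

For instance, `validate_file("assets/sample-weread-normalized.json")` would return an empty list if the sample file is complete.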

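## 5.4 Ideally, also add README-level run instructions to SKILL.md
At a minimum, tell the user:
- How to set the cookie
- How to export the raw data
- How to normalize it
- How to then ask recommendation questions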

---

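# 6. Design requirements for the recommendation feature (reference for finalizing SKILL.md and later implementation)
Although this round focuses on export and normalization, the skill's final functional goals must be stated clearly:

## Functional goals
### 6.1 When there is a current goal
Recommend books based on:
- The current question or learning topic
- The WeRead history

For example:
- 我最近想系统学 AI Agent，结合我的微信读书记录推荐几本书 (I want to study AI Agents systematically; recommend a few books using my WeRead records)

### 6.2 When there is no current goal
Based on the reading history, produce:
- A reading-profile summary
- Safe recommendations
- Exploratory recommendations
- Optional: books to skip for now

## What each recommendation explanation should include
For each book, ideally explain:
- Why it is recommended
- Which books you have read that it resembles
- Which gap it fills
- Whether it is a good fit to read right now

---

# 7. Security requirements (mandatory during implementation)
- Do not write the cookie into any JSON / md / assets / logs
- Do not make third-party cookie sync the default approach
- Do not require a remote service to host the cookie
- Stay local-first

---

# 8. Suggested final deliverables
When finished, the directory should contain at least: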

```text
weread-reading-recommender/
├── PLAN.md
├── SPEC.md
├── TODO.md
├── SKILL.md
├── scripts/
│   ├── export_weread.py
│   └── normalize_weread.py
├── references/
│   ├── data-schema.md
│   ├── privacy-model.md
│   └── recommendation-rubric.md
└── assets/
    ├── sample-weread-raw.json
    └── sample-weread-normalized.json
```

---

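# 9. Definition of Done
This round counts as complete only when all of the following hold:

- [ ] `SKILL.md` has been changed from the template to the final version
- [ ] `scripts/export_weread.py` is implemented
- [ ] `scripts/normalize_weread.py` is implemented
- [ ] `assets/sample-weread-normalized.json` has been generated
- [ ] Both Python scripts pass the syntax check
- [ ] The normalize script has run successfully on the sample raw data
- [ ] The cookie has not been written into any output file

---

# 10. Optional enhancements (not required this round)
The following can be done later:
- Analyzing highlight and note text
- A stronger reading-profile summary
- Retrieval from external candidate book sources
- Normalization / deduplication of Chinese and English titles
- Including popular book reviews in the explanation layer
- An automatic refresh script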

FILE:assets/sample-weread-normalized.json
{
  "generated_at": "2026-03-19T09:08:30Z",
  "source_file": "assets/sample-weread-raw.json",
  "summary": {
    "total_books": 3,
    "status_counts": {
      "finished": 1,
      "reading": 1,
      "unread": 1
    },
    "top_categories": [
      {
        "name": "思维",
        "books": 2,
        "weighted_score": 139.0
      },
      {
        "name": "创业",
        "books": 1,
        "weighted_score": 100.0
      },
      {
        "name": "写作",
        "books": 1,
        "weighted_score": 39.0
      },
      {
        "name": "投资",
        "books": 1,
        "weighted_score": 0.0
      },
      {
        "name": "认知",
        "books": 1,
        "weighted_score": 0.0
      }
    ],
    "top_book_lists": [
      {
        "name": "方法论",
        "books": 2,
        "weighted_score": 139.0
      },
      {
        "name": "金融",
        "books": 1,
        "weighted_score": 0.0
      }
    ],
    "imported_vs_native": {
      "imported": 1,
      "native": 2
    }
  },
  "profile_inputs": {
    "highest_engagement_books": [
      {
        "book_id": "book_1",
        "title": "纳瓦尔宝典",
        "author": "埃里克·乔根森",
        "status": "finished",
        "progress": 100.0,
        "engagement_score": 100.0,
        "last_read_at": "2026-03-19T10:40:00Z",
        "categories": [
          "创业",
          "思维"
        ]
      },
      {
        "book_id": "book_2",
        "title": "金字塔原理",
        "author": "芭芭拉·明托",
        "status": "reading",
        "progress": 42.0,
        "engagement_score": 39.0,
        "last_read_at": "2026-03-20T10:40:00Z",
        "categories": [
          "写作",
          "思维"
        ]
      },
      {
        "book_id": "CB_book_3",
        "title": "The Psychology of Money",
        "author": "Morgan Housel",
        "status": "unread",
        "progress": 0.0,
        "engagement_score": 0.0,
        "last_read_at": null,
        "categories": [
          "投资",
          "认知"
        ]
      }
    ],
    "recent_books": [
      {
        "book_id": "book_2",
        "title": "金字塔原理",
        "author": "芭芭拉·明托",
        "status": "reading",
        "progress": 42.0,
        "engagement_score": 39.0,
        "last_read_at": "2026-03-20T10:40:00Z",
        "categories": [
          "写作",
          "思维"
        ]
      },
      {
        "book_id": "book_1",
        "title": "纳瓦尔宝典",
        "author": "埃里克·乔根森",
        "status": "finished",
        "progress": 100.0,
        "engagement_score": 100.0,
        "last_read_at": "2026-03-19T10:40:00Z",
        "categories": [
          "创业",
          "思维"
        ]
      }
    ],
    "unfinished_books_with_momentum": [
      {
        "book_id": "book_2",
        "title": "金字塔原理",
        "author": "芭芭拉·明托",
        "status": "reading",
        "progress": 42.0,
        "engagement_score": 39.0,
        "last_read_at": "2026-03-20T10:40:00Z",
        "categories": [
          "写作",
          "思维"
        ]
      }
    ]
  },
  "llm_hints": [
    "最近阅读重心偏向：思维, 写作, 创业。",
    "高互动书主要集中在：创业, 思维。",
    "完读率较高的类型包括：创业, 思维。",
    "当前有 1 本书处于在读状态，可优先判断是否续读。",
    "书架里有 1 本导入书，推荐时注意区分导入来源与原生微信读书书籍。"
  ],
  "books": [
    {
      "book_id": "book_1",
      "title": "纳瓦尔宝典",
      "author": "埃里克·乔根森",
      "translator": "",
      "categories": [
        "创业",
        "思维"
      ],
      "book_lists": [
        "方法论"
      ],
      "status": "finished",
      "is_finished": true,
      "progress": 100.0,
      "reading_time_seconds": 28800,
      "last_read_at": "2026-03-19T10:40:00Z",
      "note_count": 5,
      "bookmark_count": 18,
      "review_count": 2,
      "interaction_count": 25,
      "engagement_score": 100.0,
      "is_imported": false,
      "is_paid": true,
      "public_rating": 9.2,
      "intro": "一本关于财富、杠杆和人生选择的现代格言集。"
    },
    {
      "book_id": "book_2",
      "title": "金字塔原理",
      "author": "芭芭拉·明托",
      "translator": "",
      "categories": [
        "写作",
        "思维"
      ],
      "book_lists": [
        "方法论"
      ],
      "status": "reading",
      "is_finished": false,
      "progress": 42.0,
      "reading_time_seconds": 10800,
      "last_read_at": "2026-03-20T10:40:00Z",
      "note_count": 2,
      "bookmark_count": 4,
      "review_count": 0,
      "interaction_count": 6,
      "engagement_score": 39.0,
      "is_imported": false,
      "is_paid": true,
      "public_rating": 8.9,
      "intro": "结构化表达与思考训练经典。"
    },
    {
      "book_id": "CB_book_3",
      "title": "The Psychology of Money",
      "author": "Morgan Housel",
      "translator": "",
      "categories": [
        "投资",
        "认知"
      ],
      "book_lists": [
        "金融"
      ],
      "status": "unread",
      "is_finished": false,
      "progress": 0.0,
      "reading_time_seconds": 0,
      "last_read_at": null,
      "note_count": 0,
      "bookmark_count": 0,
      "review_count": 0,
      "interaction_count": 0,
      "engagement_score": 0.0,
      "is_imported": true,
      "is_paid": false,
      "public_rating": 9.1,
      "intro": "Timeless lessons on wealth, greed, and happiness."
    }
  ]
}

FILE:assets/sample-weread-raw.json
{
  "exported_at": "2026-03-19T16:00:00Z",
  "source": "weread-local-cookie",
  "summary": {
    "total_books": 3,
    "finished_books": 1,
    "reading_books": 1,
    "unread_books": 1,
    "notebook_books": 2
  },
  "shelf_sync": {
    "books": [
      {
        "bookId": "book_1",
        "title": "纳瓦尔宝典",
        "author": "埃里克·乔根森",
        "translator": "",
        "finishReading": 1,
        "paid": 1,
        "price": 0,
        "publishTime": "2021-01-01",
        "categories": [{"title": "创业"}, {"title": "思维"}]
      },
      {
        "bookId": "book_2",
        "title": "金字塔原理",
        "author": "芭芭拉·明托",
        "translator": "",
        "finishReading": 0,
        "paid": 1,
        "price": 0,
        "publishTime": "2019-01-01",
        "categories": [{"title": "写作"}, {"title": "思维"}]
      },
      {
        "bookId": "CB_book_3",
        "title": "The Psychology of Money",
        "author": "Morgan Housel",
        "translator": "",
        "finishReading": 0,
        "paid": 0,
        "price": 0,
        "publishTime": "2020-01-01",
        "categories": [{"title": "投资"}, {"title": "认知"}]
      }
    ],
    "bookProgress": [
      {
        "bookId": "book_1",
        "progress": 100,
        "readingTime": 28800,
        "updateTime": 1773916800
      },
      {
        "bookId": "book_2",
        "progress": 42,
        "readingTime": 10800,
        "updateTime": 1774003200
      },
      {
        "bookId": "CB_book_3",
        "progress": 0,
        "readingTime": 0,
        "updateTime": 0
      }
    ],
    "archive": [
      {
        "archiveId": 1,
        "name": "方法论",
        "bookIds": ["book_1", "book_2"]
      },
      {
        "archiveId": 2,
        "name": "金融",
        "bookIds": ["CB_book_3"]
      }
    ]
  },
  "notebook": {
    "books": [
      {
        "bookId": "book_1",
        "noteCount": 5,
        "bookmarkCount": 18,
        "reviewCount": 2
      },
      {
        "bookId": "book_2",
        "noteCount": 2,
        "bookmarkCount": 4,
        "reviewCount": 0
      }
    ]
  },
  "book_info": {
    "book_1": {
      "intro": "一本关于财富、杠杆和人生选择的现代格言集。",
      "newRating": 9200,
      "publisher": "中信出版社"
    },
    "book_2": {
      "intro": "结构化表达与思考训练经典。",
      "newRating": 8900,
      "publisher": "南海出版公司"
    },
    "CB_book_3": {
      "intro": "Timeless lessons on wealth, greed, and happiness.",
      "newRating": 9100,
      "publisher": "Harriman House"
    }
  }
}

FILE:references/data-schema.md
# Data Schema

## Raw export (`weread-raw.json`)

Top-level shape:

```json
{
  "exported_at": "2026-03-19T16:00:00Z",
  "source": "weread-local-cookie",
  "summary": {
    "total_books": 123,
    "finished_books": 40,
    "reading_books": 12,
    "unread_books": 71,
    "notebook_books": 18
  },
  "shelf_sync": { "...": "raw WeRead response" },
  "notebook": { "...": "raw WeRead response" },
  "book_info": {
    "book_id": { "...": "raw per-book info" }
  }
}
```

Notes:
- `shelf_sync` is intentionally close to the original response for debugging and future expansion.
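- `book_info` is always present but is populated only when the exporter runs with `--include-book-info` (otherwise it is an empty object), because fetching details for every book can be slow.
- The exporter also adds `source_meta` (which cookie source was used, never the cookie value) and, if any per-book fetch fails, a `warnings` list.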
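
Before normalizing, it can help to confirm that a raw export has the top-level shape shown above. This is a minimal sketch; `missing_raw_keys` is a hypothetical helper, and treating these five keys as required is an assumption rather than something the scripts enforce.

```python
from typing import Any, Dict, List

# Assumed-required keys; `book_info` is left out because it may be empty.
REQUIRED_KEYS = ("exported_at", "source", "summary", "shelf_sync", "notebook")


def missing_raw_keys(raw: Dict[str, Any]) -> List[str]:
    """Return the required top-level keys that are absent from a raw export."""
    return [key for key in REQUIRED_KEYS if key not in raw]


if __name__ == "__main__":
    example = {
        "exported_at": "2026-03-19T16:00:00Z",
        "source": "weread-local-cookie",
        "summary": {},
        "shelf_sync": {},
    }
    print(missing_raw_keys(example))
```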

## Normalized export (`weread-normalized.json`)

Top-level shape:

```json
{
  "generated_at": "2026-03-19T16:05:00Z",
  "source_file": "data/weread-raw.json",
  "summary": {
    "total_books": 123,
    "status_counts": {
      "finished": 40,
      "reading": 12,
      "unread": 71
    },
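    "top_categories": [
      {"name": "商业", "books": 10, "weighted_score": 421.0}
    ],
    "top_book_lists": [
      {"name": "方法论", "books": 2, "weighted_score": 139.0}
    ],
    "imported_vs_native": {"imported": 3, "native": 120}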
  },
  "profile_inputs": {
    "highest_engagement_books": [],
    "recent_books": [],
    "unfinished_books_with_momentum": []
  },
  "llm_hints": [],
  "books": []
}
```
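
A downstream consumer usually needs only a few fields from this file. The sketch below ranks `summary.top_categories` by `weighted_score`; `top_categories_by_weight` is a hypothetical helper, and the inline dictionary copies a few values from the bundled sample.

```python
from typing import Any, Dict, List


def top_categories_by_weight(normalized: Dict[str, Any], limit: int = 3) -> List[str]:
    """Return category names ordered by weighted_score, highest first."""
    categories = (normalized.get("summary") or {}).get("top_categories") or []
    ranked = sorted(
        categories,
        key=lambda item: item.get("weighted_score", 0.0),
        reverse=True,
    )
    return [item["name"] for item in ranked[:limit]]


if __name__ == "__main__":
    sample = {
        "summary": {
            "top_categories": [
                {"name": "思维", "books": 2, "weighted_score": 139.0},
                {"name": "创业", "books": 1, "weighted_score": 100.0},
                {"name": "写作", "books": 1, "weighted_score": 39.0},
            ]
        }
    }
    print(top_categories_by_weight(sample, limit=2))
```

In practice the dictionary would come from `json.load` on the normalized file rather than being written inline.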

## Engagement score
The normalizer computes a heuristic `engagement_score` in the range `0..100` from:
- reading progress
- finished state
- interaction count (notes/bookmarks/reviews)
- reading time
- recency (days since `last_read_at`, measured at the time the normalizer runs)

This score is not a universal truth; it is a ranking hint for recommendation reasoning.
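
To make the heuristic concrete, here is a sketch that breaks the score into its components, using the weights and caps found in `scripts/normalize_weread.py`. `engagement_components` is a hypothetical helper, and it takes `age_days` directly (an assumption made for repeatability) instead of deriving it from `last_read_at` and the current time.

```python
from typing import Dict


def engagement_components(
    *,
    is_finished: bool,
    progress: float,
    reading_time_seconds: int,
    note_count: int,
    bookmark_count: int,
    review_count: int,
    age_days: int,
) -> Dict[str, float]:
    """Break the heuristic score into its individually capped components."""
    interaction = note_count * 2.5 + bookmark_count * 1.0 + review_count * 4.0
    return {
        "finished": 25.0 if is_finished else 0.0,                        # flat bonus
        "progress": min(max(progress, 0.0), 100.0) * 0.25,               # up to 25
        "reading_time": min(reading_time_seconds / 3600.0, 20.0) * 1.5,  # up to 30
        "interaction": min(interaction, 25.0),                           # up to 25
        "recency": max(0.0, 15.0 - min(max(age_days, 0), 90) / 6.0),     # up to 15
    }


if __name__ == "__main__":
    # Inputs for book_2 in the sample files, assuming it was read on the scoring day.
    parts = engagement_components(
        is_finished=False,
        progress=42.0,
        reading_time_seconds=10800,
        note_count=2,
        bookmark_count=4,
        review_count=0,
        age_days=0,
    )
    total = round(min(sum(parts.values()), 100.0), 1)
    print(parts, total)
```

The component maxima add up to 120, which is why the final score is clamped to 100.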

## Status definition
- `finished`: `finishReading == 1`
- `reading`: not finished and `progress > 0`
- `unread`: not finished and no measurable progress
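
The three rules above reduce to a small function. This is a sketch; `classify_status` is a hypothetical name, and missing values are treated as zero.

```python
def classify_status(finish_reading: object, progress: object) -> str:
    """Classify a book as finished, reading, or unread from raw WeRead fields."""
    if int(finish_reading or 0) == 1:
        return "finished"
    if float(progress or 0) > 0:
        return "reading"
    return "unread"


if __name__ == "__main__":
    # (finishReading, progress) pairs for the three books in the sample raw file.
    for pair in [(1, 100), (0, 42), (0, 0)]:
        print(pair, classify_status(*pair))
```

Note that `finishReading` takes precedence: a book marked finished is `finished` even if its recorded progress is 0.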

FILE:references/privacy-model.md
# Privacy Model

## Principle
This skill is **local-first**.

The user may use a WeRead cookie locally to export their own reading records, but:
- the cookie must stay on the local machine
- the cookie must not be embedded in skill files
- the cookie must not be written into exported JSON
- the cookie must not be sent to third-party sync services by default

## Allowed local cookie sources
- environment variable
- local cookie file
- direct one-off CLI argument

## Not the default path
- CookieCloud
- remote cookie sync
- shared cloud storage of cookie

## Output hygiene
The exporter should write only reading-record data that is necessary for downstream recommendation work.

## Recommended workflow
1. User sets local cookie
2. Export raw JSON locally
3. Normalize locally
4. Skill reads normalized JSON
5. Recommendation happens from the normalized file, not from a live cookie session unless the user explicitly requests a refresh
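
One way to check the output-hygiene rule after an export is to scan the written JSON for the cookie or any of its values. The sketch below is a heuristic under stated assumptions: `cookie_fragments` and `leaks_cookie` are hypothetical helpers, the cookie is assumed to be a `name=value; name=value` string, and values shorter than eight characters are ignored to limit false positives.

```python
from typing import List


def cookie_fragments(cookie: str, min_length: int = 8) -> List[str]:
    """Extract cookie values long enough to be worth searching for."""
    fragments: List[str] = []
    for part in cookie.split(";"):
        _, _, value = part.strip().partition("=")
        if len(value) >= min_length:
            fragments.append(value)
    return fragments


def leaks_cookie(output_text: str, cookie: str) -> bool:
    """Return True if the full cookie or any long cookie value appears in the text."""
    cookie = cookie.strip()
    if cookie and cookie in output_text:
        return True
    return any(fragment in output_text for fragment in cookie_fragments(cookie))


if __name__ == "__main__":
    # Made-up cookie for illustration only.
    fake_cookie = "session=abcdEFGH1234; uid=12345678"
    print(leaks_cookie('{"source": "weread-local-cookie"}', fake_cookie))
    print(leaks_cookie('{"debug": "abcdEFGH1234"}', fake_cookie))
```

A check like this should run locally against the exported file, with the cookie read from the same local source the exporter used.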

FILE:references/recommendation-rubric.md
# Recommendation Rubric

## When the user provides a current goal
Weight approximately:
- 60% goal fit
- 40% history fit

Examples:
- "我最近想系统学 AI Agent"
- "我想补创业管理"
- "我想读几本建立世界史框架的书"

## When the user provides no goal
Weight approximately:
- 70% history fit
- 20% recency
- 10% exploration/diversity
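
The two weightings can be read as one blending rule. The following is a sketch, assuming each fit signal is already expressed on a 0 to 1 scale; `blend_score` and its parameter names are hypothetical, and the rubric's weights are approximate guidance rather than exact constants.

```python
from typing import Optional


def blend_score(
    *,
    history_fit: float,
    recency: float = 0.0,
    exploration: float = 0.0,
    goal_fit: Optional[float] = None,
) -> float:
    """Blend fit signals (each assumed to be in 0..1) using the rubric's weights."""
    if goal_fit is not None:
        # The user stated a goal: 60% goal fit, 40% history fit.
        return 0.6 * goal_fit + 0.4 * history_fit
    # No goal: 70% history fit, 20% recency, 10% exploration/diversity.
    return 0.7 * history_fit + 0.2 * recency + 0.1 * exploration


if __name__ == "__main__":
    print(blend_score(history_fit=0.5, goal_fit=1.0))
    print(blend_score(history_fit=0.5, recency=1.0, exploration=0.0))
```

Keeping the two branches separate mirrors the rubric: recency and exploration only enter the score when no goal is given.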

## What to look at in normalized data
### Strong positive signals
- high `engagement_score`
- finished books
- books with many notes/bookmarks/reviews
- recently active books
- repeated categories among high-engagement books

### Weak signals
- unread books with zero interaction
- imported/archive items with no progress
- books present only because they were added long ago

## Recommended reply structure
### 1. Reading profile
Summarize:
- dominant themes
- likely preferred style (framework / practical / narrative / theoretical)
- recency shift

### 2. Recommendations
For each book, explain:
- why it matches the goal or history
- which prior books it resembles
- what new angle it adds
- whether it is a safe pick or exploration pick

### 3. Why now
State why the recommendation fits the user's current stage.

### 4. Skip for now (optional)
Call out books that may be good later but are not the best next read.

## Avoid
- empty similarity claims
- recommending near-duplicate editions/translations as if they were new discoveries
- recommending only because a book is famous

FILE:scripts/export_weread.py
#!/usr/bin/env python3
"""Export local WeRead data into a raw JSON snapshot."""

from __future__ import annotations

import argparse
import json
import os
import sys
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional

import requests


SHELF_SYNC_URL = "https://weread.qq.com/web/shelf/sync"
NOTEBOOK_URL = "https://weread.qq.com/api/user/notebook"
BOOK_INFO_URL = "https://weread.qq.com/api/book/info"
DEFAULT_ENV_VAR = "WEREAD_COOKIE"
DEFAULT_TIMEOUT = 15.0


class ExportError(RuntimeError):
    """Raised when export cannot continue safely."""


@dataclass
class CookieSource:
    source: str
    env_var: Optional[str] = None


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Export WeRead shelf and notebook data to a raw JSON file."
    )
    parser.add_argument("--out", required=True, help="Path to write the raw JSON export.")
    parser.add_argument("--cookie", help="Cookie string to use for this run.")
    parser.add_argument("--cookie-file", help="Path to a local file containing the cookie.")
    parser.add_argument(
        "--env-var",
        default=DEFAULT_ENV_VAR,
        help=f"Environment variable name to read cookie from. Default: {DEFAULT_ENV_VAR}",
    )
    parser.add_argument(
        "--include-book-info",
        action="store_true",
        help="Fetch per-book metadata via the book info endpoint.",
    )
    parser.add_argument(
        "--detail-limit",
        type=int,
        default=0,
        help="Max number of books to fetch detail for. 0 means all books when enabled.",
    )
    parser.add_argument(
        "--timeout",
        type=float,
        default=DEFAULT_TIMEOUT,
        help=f"HTTP timeout in seconds. Default: {DEFAULT_TIMEOUT}",
    )
    return parser.parse_args()


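def resolve_cookie(args: argparse.Namespace) -> tuple[str, CookieSource]:
    """Resolve the cookie in order of precedence: --cookie, --cookie-file, then the environment variable."""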
    if args.cookie:
        cookie = args.cookie.strip()
        if not cookie:
            raise ExportError("--cookie was provided but is empty.")
        return cookie, CookieSource(source="cli")

    if args.cookie_file:
        cookie_path = Path(args.cookie_file).expanduser()
        if not cookie_path.exists():
            raise ExportError(f"Cookie file does not exist: {cookie_path}")
        if not cookie_path.is_file():
            raise ExportError(f"Cookie path is not a file: {cookie_path}")
        cookie = cookie_path.read_text(encoding="utf-8").strip()
        if not cookie:
            raise ExportError(f"Cookie file is empty: {cookie_path}")
        return cookie, CookieSource(source="file")

    env_name = args.env_var or DEFAULT_ENV_VAR
    cookie = os.environ.get(env_name, "").strip()
    if cookie:
        return cookie, CookieSource(source="env", env_var=env_name)

    raise ExportError(
        "No cookie available. Provide --cookie, --cookie-file, or set the "
        f"{env_name} environment variable."
    )


def build_session(cookie: str) -> requests.Session:
    session = requests.Session()
    session.headers.update(
        {
            "Cookie": cookie,
            "Referer": "https://weread.qq.com/",
            "Origin": "https://weread.qq.com",
            "User-Agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/123.0.0.0 Safari/537.36"
            ),
            "Accept": "application/json, text/plain, */*",
        }
    )
    return session


def request_json(
    session: requests.Session,
    method: str,
    url: str,
    *,
    timeout: float,
    params: Optional[Dict[str, Any]] = None,
    data: Optional[Dict[str, Any]] = None,
) -> Any:
    try:
        response = session.request(
            method,
            url,
            timeout=timeout,
            params=params,
            data=data,
        )
        response.raise_for_status()
    except requests.RequestException as exc:
        raise ExportError(f"Request failed for {url}: {exc}") from exc

    try:
        payload = response.json()
    except ValueError as exc:
        raise ExportError(f"Invalid JSON returned by {url}") from exc

    detect_auth_failure(payload, url)
    return payload


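def detect_auth_failure(payload: Any, url: str) -> None:
    """Raise ExportError if the payload looks like an auth failure.

    This is a heuristic: it checks for a 401/403 error code or a login-related
    keyword in the common message fields.
    """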
    if isinstance(payload, dict):
        text_fields = [
            str(payload.get("errMsg", "")),
            str(payload.get("errmsg", "")),
            str(payload.get("message", "")),
            str(payload.get("msg", "")),
        ]
        combined = " ".join(text_fields).lower()
        if payload.get("code") in {401, 403} or payload.get("errcode") in {401, 403}:
            raise ExportError(f"Authentication failed for {url}. Cookie may be expired.")
        if any(
            keyword in combined
            for keyword in ("login", "登录", "expired", "unauthorized", "cookie")
        ):
            raise ExportError(f"Authentication failed for {url}. Cookie may be expired.")


def fetch_shelf_sync(session: requests.Session, timeout: float) -> Dict[str, Any]:
    payload = request_json(session, "GET", SHELF_SYNC_URL, timeout=timeout)
    if not isinstance(payload, dict):
        raise ExportError("Unexpected shelf sync payload shape; expected a JSON object.")
    return payload


def fetch_notebook(session: requests.Session, timeout: float) -> Dict[str, Any]:
    payload = request_json(session, "GET", NOTEBOOK_URL, timeout=timeout)
    if not isinstance(payload, dict):
        raise ExportError("Unexpected notebook payload shape; expected a JSON object.")
    return payload


def fetch_book_info(
    session: requests.Session,
    book_ids: Iterable[str],
    timeout: float,
    detail_limit: int,
) -> tuple[Dict[str, Any], List[str]]:
    results: Dict[str, Any] = {}
    warnings: List[str] = []

    unique_book_ids = []
    seen = set()
    for book_id in book_ids:
        if not book_id or book_id in seen:
            continue
        seen.add(book_id)
        unique_book_ids.append(book_id)

    if detail_limit > 0:
        unique_book_ids = unique_book_ids[:detail_limit]

    for book_id in unique_book_ids:
        try:
            payload = request_json(
                session,
                "GET",
                BOOK_INFO_URL,
                timeout=timeout,
                params={"bookId": book_id},
            )
        except ExportError as exc:
            warnings.append(f"Failed to fetch book info for {book_id}: {exc}")
            continue

        results[book_id] = payload

    return results, warnings


def build_summary(shelf_sync: Dict[str, Any], notebook: Dict[str, Any]) -> Dict[str, int]:
    books = shelf_sync.get("books") or []
    progress_items = shelf_sync.get("bookProgress") or []
    notebook_books = notebook.get("books") or []

    progress_by_book = {
        str(item.get("bookId")): item for item in progress_items if item.get("bookId")
    }

    finished_books = 0
    reading_books = 0
    unread_books = 0

    for book in books:
        book_id = str(book.get("bookId", ""))
        is_finished = int(book.get("finishReading") or 0) == 1
        progress = progress_by_book.get(book_id, {}).get("progress") or 0

        if is_finished:
            finished_books += 1
        elif progress and float(progress) > 0:
            reading_books += 1
        else:
            unread_books += 1

    return {
        "total_books": len(books),
        "finished_books": finished_books,
        "reading_books": reading_books,
        "unread_books": unread_books,
        "notebook_books": len(notebook_books),
    }


def extract_book_ids(shelf_sync: Dict[str, Any], notebook: Dict[str, Any]) -> List[str]:
    book_ids: List[str] = []
    for book in shelf_sync.get("books") or []:
        if book.get("bookId"):
            book_ids.append(str(book["bookId"]))
    for book in notebook.get("books") or []:
        if book.get("bookId"):
            book_ids.append(str(book["bookId"]))
    return book_ids


def utc_now_iso() -> str:
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def write_json(out_path: Path, payload: Dict[str, Any]) -> None:
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )


def print_summary(out_path: Path, summary: Dict[str, int], warnings: List[str]) -> None:
    print(f"Exported WeRead data to {out_path}")
    print(f"  total_books: {summary['total_books']}")
    print(f"  finished_books: {summary['finished_books']}")
    print(f"  reading_books: {summary['reading_books']}")
    print(f"  unread_books: {summary['unread_books']}")
    print(f"  notebook_books: {summary['notebook_books']}")
    if warnings:
        print(f"  warnings: {len(warnings)}")


def main() -> int:
    args = parse_args()

    if args.detail_limit < 0:
        raise ExportError("--detail-limit must be >= 0.")
    if args.timeout <= 0:
        raise ExportError("--timeout must be > 0.")

    cookie, cookie_source = resolve_cookie(args)
    session = build_session(cookie)

    shelf_sync = fetch_shelf_sync(session, args.timeout)
    notebook = fetch_notebook(session, args.timeout)

    warnings: List[str] = []
    book_info: Dict[str, Any] = {}
    if args.include_book_info:
        book_ids = extract_book_ids(shelf_sync, notebook)
        book_info, detail_warnings = fetch_book_info(
            session,
            book_ids,
            args.timeout,
            args.detail_limit,
        )
        warnings.extend(detail_warnings)

    summary = build_summary(shelf_sync, notebook)
    payload: Dict[str, Any] = {
        "exported_at": utc_now_iso(),
        "source": "weread-local-cookie",
        "summary": summary,
        "shelf_sync": shelf_sync,
        "notebook": notebook,
        "book_info": book_info,
    }

    source_meta = {"cookie_source": cookie_source.source}
    if cookie_source.env_var:
        source_meta["cookie_env_var"] = cookie_source.env_var
    payload["source_meta"] = source_meta

    if warnings:
        payload["warnings"] = warnings

    out_path = Path(args.out).expanduser()
    write_json(out_path, payload)
    print_summary(out_path, summary, warnings)
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except ExportError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        raise SystemExit(1)

FILE:scripts/normalize_weread.py
#!/usr/bin/env python3
"""Normalize raw WeRead export data into recommendation-friendly JSON."""

from __future__ import annotations

import argparse
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Normalize a raw WeRead export into recommendation-friendly JSON."
    )
    parser.add_argument("--input", required=True, help="Path to raw WeRead JSON.")
    parser.add_argument("--output", required=True, help="Path to normalized JSON output.")
    return parser.parse_args()


def load_raw(path: Path) -> Dict[str, Any]:
    with path.open("r", encoding="utf-8") as handle:
        payload = json.load(handle)
    if not isinstance(payload, dict):
        raise ValueError("Raw WeRead export must be a JSON object.")
    return payload


def utc_now_iso() -> str:
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def timestamp_to_iso(value: Any) -> Optional[str]:
    if value in (None, "", 0, "0"):
        return None
    try:
        timestamp = int(value)
    except (TypeError, ValueError):
        return None
    if timestamp <= 0:
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def parse_categories(book: Dict[str, Any], info: Dict[str, Any]) -> List[str]:
    categories: List[str] = []
    for source in (book.get("categories"), info.get("categories")):
        if not source:
            continue
        for item in source:
            if isinstance(item, dict):
                name = item.get("title") or item.get("name")
            else:
                name = item
            if name:
                categories.append(str(name).strip())
    return dedupe_preserve_order(categories)


def build_book_lists_map(archives: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    mapping: Dict[str, List[str]] = {}
    for archive in archives or []:
        list_name = str(archive.get("name") or "").strip()
        if not list_name:
            continue
        for book_id in archive.get("bookIds") or []:
            book_key = str(book_id)
            mapping.setdefault(book_key, []).append(list_name)
    for book_id, names in mapping.items():
        mapping[book_id] = dedupe_preserve_order(names)
    return mapping


def compute_status(book: Dict[str, Any], progress: Dict[str, Any]) -> Tuple[str, bool]:
    is_finished = int(book.get("finishReading") or 0) == 1
    progress_value = float(progress.get("progress") or 0)
    if is_finished:
        return "finished", True
    if progress_value > 0:
        return "reading", False
    return "unread", False


def compute_engagement_score(
    *,
    is_finished: bool,
    progress: float,
    reading_time_seconds: int,
    note_count: int,
    bookmark_count: int,
    review_count: int,
    last_read_at: Optional[str],
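) -> float:
    """Compute a heuristic 0..100 engagement score.

    Components: finished bonus (25), progress (up to 25), reading time
    (up to 30, capped at 20 hours), interactions (up to 25), and a recency
    bonus (up to 15, decaying linearly to 0 at 90 days). The sum is capped at 100.
    """
    score = 0.0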

    if is_finished:
        score += 25.0

    score += min(max(progress, 0.0), 100.0) * 0.25
    score += min(reading_time_seconds / 3600.0, 20.0) * 1.5

    interaction_score = note_count * 2.5 + bookmark_count * 1.0 + review_count * 4.0
    score += min(interaction_score, 25.0)

    if last_read_at:
        try:
            last_dt = datetime.fromisoformat(last_read_at.replace("Z", "+00:00"))
            now = datetime.now(timezone.utc)
            age_days = max((now - last_dt).days, 0)
            recency_bonus = max(0.0, 15.0 - min(age_days, 90) / 6.0)
            score += recency_bonus
        except ValueError:
            pass

    return round(min(score, 100.0), 1)


def normalize_books(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    shelf_sync = raw.get("shelf_sync") or {}
    notebook = raw.get("notebook") or {}
    book_info = raw.get("book_info") or {}

    books = shelf_sync.get("books") or []
    progress_items = shelf_sync.get("bookProgress") or []
    progress_by_book = {
        str(item.get("bookId")): item for item in progress_items if item.get("bookId")
    }
    notebook_by_book = {
        str(item.get("bookId")): item for item in (notebook.get("books") or []) if item.get("bookId")
    }
    book_lists_map = build_book_lists_map(shelf_sync.get("archive") or [])

    normalized: List[Dict[str, Any]] = []
    for book in books:
        book_id = str(book.get("bookId") or "").strip()
        if not book_id:
            continue

        progress_item = progress_by_book.get(book_id, {})
        notebook_item = notebook_by_book.get(book_id, {})
        info = book_info.get(book_id) or {}

        status, is_finished = compute_status(book, progress_item)
        progress = float(progress_item.get("progress") or 0)
        reading_time_seconds = int(progress_item.get("readingTime") or 0)
        last_read_at = timestamp_to_iso(progress_item.get("updateTime"))
        note_count = int(notebook_item.get("noteCount") or 0)
        bookmark_count = int(notebook_item.get("bookmarkCount") or 0)
        review_count = int(notebook_item.get("reviewCount") or 0)
        interaction_count = note_count + bookmark_count + review_count
        public_rating = extract_public_rating(info)

        record = {
            "book_id": book_id,
            "title": str(book.get("title") or info.get("title") or "").strip(),
            "author": str(book.get("author") or info.get("author") or "").strip(),
            "translator": str(book.get("translator") or info.get("translator") or "").strip(),
            "categories": parse_categories(book, info),
            "book_lists": book_lists_map.get(book_id, []),
            "status": status,
            "is_finished": is_finished,
            "progress": round(progress, 1),
            "reading_time_seconds": reading_time_seconds,
            "last_read_at": last_read_at,
            "note_count": note_count,
            "bookmark_count": bookmark_count,
            "review_count": review_count,
            "interaction_count": interaction_count,
            "engagement_score": compute_engagement_score(
                is_finished=is_finished,
                progress=progress,
                reading_time_seconds=reading_time_seconds,
                note_count=note_count,
                bookmark_count=bookmark_count,
                review_count=review_count,
                last_read_at=last_read_at,
            ),
            "is_imported": book_id.startswith("CB_"),
            "is_paid": bool(int(book.get("paid") or 0)),
            "public_rating": public_rating,
            "intro": str(info.get("intro") or "").strip(),
        }
        normalized.append(record)

    normalized.sort(
        key=lambda item: (
            item["engagement_score"],
            item["last_read_at"] or "",
            item["title"],
        ),
        reverse=True,
    )
    return normalized


def extract_public_rating(info: Dict[str, Any]) -> Optional[float]:
    rating = info.get("newRating")
    if rating in (None, ""):
        return None
    try:
        numeric = float(rating)
    except (TypeError, ValueError):
        return None
    # Values above 100 are treated as scaled by 1000 (e.g. 9200 -> 9.2).
    return round(numeric / 1000.0, 2) if numeric > 100 else round(numeric, 2)


def build_summary(books: List[Dict[str, Any]]) -> Dict[str, Any]:
    status_counts = Counter(book["status"] for book in books)
    category_counter: Counter[str] = Counter()
    category_weighted: Counter[str] = Counter()
    book_list_counter: Counter[str] = Counter()
    book_list_weighted: Counter[str] = Counter()
    imported_vs_native = {"imported": 0, "native": 0}

    for book in books:
        if book["is_imported"]:
            imported_vs_native["imported"] += 1
        else:
            imported_vs_native["native"] += 1

        for name in book["categories"]:
            category_counter[name] += 1
            category_weighted[name] += book["engagement_score"]

        for name in book["book_lists"]:
            book_list_counter[name] += 1
            book_list_weighted[name] += book["engagement_score"]

    top_categories = [
        {
            "name": name,
            "books": count,
            "weighted_score": round(category_weighted[name], 1),
        }
        for name, count in category_counter.most_common(8)
    ]
    top_book_lists = [
        {
            "name": name,
            "books": count,
            "weighted_score": round(book_list_weighted[name], 1),
        }
        for name, count in book_list_counter.most_common(8)
    ]

    return {
        "total_books": len(books),
        "status_counts": {
            "finished": status_counts.get("finished", 0),
            "reading": status_counts.get("reading", 0),
            "unread": status_counts.get("unread", 0),
        },
        "top_categories": top_categories,
        "top_book_lists": top_book_lists,
        "imported_vs_native": imported_vs_native,
    }


def build_profile_inputs(books: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    highest_engagement = sorted(
        books, key=lambda item: (item["engagement_score"], item["title"]), reverse=True
    )[:5]
    recent_books = sorted(
        (book for book in books if book["last_read_at"]),
        key=lambda item: item["last_read_at"],
        reverse=True,
    )[:5]
    unfinished_books = sorted(
        (
            book
            for book in books
            if not book["is_finished"]
            and (book["progress"] > 0 or book["reading_time_seconds"] > 0)
        ),
        key=lambda item: (
            momentum_score(item),
            item["last_read_at"] or "",
        ),
        reverse=True,
    )[:5]

    return {
        "highest_engagement_books": [profile_book_view(book) for book in highest_engagement],
        "recent_books": [profile_book_view(book) for book in recent_books],
        "unfinished_books_with_momentum": [profile_book_view(book) for book in unfinished_books],
    }


def momentum_score(book: Dict[str, Any]) -> float:
    """Rank unfinished books by progress, reading time (capped at 12 hours), and interaction count."""
    return round(
        float(book["progress"]) * 0.6
        + min(book["reading_time_seconds"] / 3600.0, 12.0) * 3.0
        + book["interaction_count"] * 1.5,
        1,
    )


def profile_book_view(book: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "book_id": book["book_id"],
        "title": book["title"],
        "author": book["author"],
        "status": book["status"],
        "progress": book["progress"],
        "engagement_score": book["engagement_score"],
        "last_read_at": book["last_read_at"],
        "categories": book["categories"],
    }


def build_llm_hints(books: List[Dict[str, Any]], summary: Dict[str, Any]) -> List[str]:
    hints: List[str] = []

    recent_books = [book for book in books if book["last_read_at"]]
    if recent_books:
        recent_categories = Counter(
            category
            for book in sorted(recent_books, key=lambda item: item["last_read_at"], reverse=True)[:5]
            for category in book["categories"]
        )
        if recent_categories:
            names = ", ".join(name for name, _ in recent_categories.most_common(3))
            hints.append(f"最近阅读重心偏向：{names}。")

    engaged_books = [book for book in books if book["engagement_score"] >= 60]
    if engaged_books:
        engaged_categories = Counter(
            category for book in engaged_books for category in book["categories"]
        )
        if engaged_categories:
            names = ", ".join(name for name, _ in engaged_categories.most_common(3))
            # English: "High-engagement books are concentrated in: {names}."
            hints.append(f"高互动书主要集中在：{names}。")

    finished_categories = Counter(
        category for book in books if book["is_finished"] for category in book["categories"]
    )
    if finished_categories:
        names = ", ".join(name for name, _ in finished_categories.most_common(3))
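        # English: "Categories with higher completion rates include: {names}."
        # Note: the ranking uses the count of finished books per category, not a true rate.
        hints.append(f"完读率较高的类型包括：{names}。")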

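    status_counts = summary["status_counts"]
    # Use .get so a summary without a "reading" bucket does not raise KeyError.
    reading_count = status_counts.get("reading", 0)
    if reading_count > 0:
        # English: "{n} book(s) are currently being read; check first whether to continue them."
        hints.append(f"当前有 {reading_count} 本书处于在读状态，可优先判断是否续读。")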

    imported_count = summary["imported_vs_native"]["imported"]
    if imported_count > 0:
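        # English: "The shelf has {n} imported book(s); when recommending, distinguish
        # imported sources from native WeRead books."
        hints.append(f"书架里有 {imported_count} 本导入书，推荐时注意区分导入来源与原生微信读书书籍。")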

    return hints[:6]


def dedupe_preserve_order(values: Iterable[str]) -> List[str]:
    """Strip each value and drop blanks and repeats, keeping first-seen order."""
    seen = set()
    result: List[str] = []
    for value in values:
        cleaned = str(value).strip()
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(cleaned)
    return result


def write_json(path: Path, payload: Dict[str, Any]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")


def main() -> int:
    args = parse_args()
    input_path = Path(args.input).expanduser()
    output_path = Path(args.output).expanduser()

    raw = load_raw(input_path)
    books = normalize_books(raw)
    summary = build_summary(books)
    profile_inputs = build_profile_inputs(books)
    llm_hints = build_llm_hints(books, summary)

    payload = {
        "generated_at": utc_now_iso(),
        "source_file": str(input_path),
        "summary": summary,
        "profile_inputs": profile_inputs,
        "llm_hints": llm_hints,
        "books": books,
    }
    write_json(output_path, payload)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
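
As a quick sanity check of the scoring and de-duplication helpers in the script above, the sketch below copies `momentum_score` and `dedupe_preserve_order` as they appear there and runs them on made-up inputs. The sample books `steady` and `binge` and their field values are hypothetical and are not taken from any real WeRead export; the sketch assumes `progress` is a number and `reading_time_seconds` is in seconds, as the formula implies.

```python
from typing import Any, Dict, Iterable, List


def momentum_score(book: Dict[str, Any]) -> float:
    # Same formula as the script: progress, capped reading hours, interactions.
    return round(
        float(book["progress"]) * 0.6
        + min(book["reading_time_seconds"] / 3600.0, 12.0) * 3.0
        + book["interaction_count"] * 1.5,
        1,
    )


def dedupe_preserve_order(values: Iterable[str]) -> List[str]:
    # Same logic as the script: strip, skip blanks and repeats, keep order.
    seen = set()
    result: List[str] = []
    for value in values:
        cleaned = str(value).strip()
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(cleaned)
    return result


# Hypothetical records; only the fields momentum_score reads are filled in.
steady = {"progress": 50, "reading_time_seconds": 7200, "interaction_count": 4}
binge = {"progress": 10, "reading_time_seconds": 64800, "interaction_count": 2}

print(momentum_score(steady))  # 30.0 + 6.0 + 6.0 = 42.0
print(momentum_score(binge))   # 6.0 + 36.0 (18 h capped at 12 h) + 3.0 = 45.0
print(dedupe_preserve_order([" sci-fi ", "history", "sci-fi", "", "history "]))
# ['sci-fi', 'history']
```

The second book shows the effect of the 12-hour cap: 18 hours of reading contribute no more than 12 hours would, so a single long session cannot dominate the ranking of unfinished books.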
