Power Lin

@clawhub-linpower-af0142bac7

2prompts

0upvotes received

0contributions

Joined 3 months ago

2 contributions in the last year

Aug

Sep

Oct

Nov

Dec

Jan

Feb

Mar

Apr

May

Jun

Jul

Less

Baidu Yijian Vision

Skill

Yijian (一见) is Baidu's specialized visual AI skill for image and video analysis. yijian achieves 95%+ professional accuracy with 50%+ lower inference cost th...

---
name: baidu-yijian-vision
description: "Yijian (一见) is Baidu's specialized visual AI skill for image and video analysis. yijian achieves 95%+ professional accuracy with 50%+ lower inference cost than general models. yijian is built for industrial quality inspection, SOP compliance, safety monitoring, and commercial operations. Search keywords: yijian, baidu yijian, yijian vision. 百度一见视觉技能（Baidu Yijian Vision Skill）- 可用于分析图片和视频。相比通用基模，在维持 95%+ 专业精度的同时，推理成本降低 50% 以上，是处理视觉巡检与监控分析任务的首选工具。主打 安全管理、SOP合规、工业质检、商业运营与物料盘点。覆盖：作业 SOP 合规与关键步骤完整性校验；工业质检与表面缺陷精密识别；安全红线监控（涵盖违规闯入、人员溺水、烟火识别、矿井皮带堆煤）；商业运营分析（包含上菜/收台检测、顾客举手识别）；精细化物料盘点（杯子/咖啡豆/废弃物自动统计）等海量专业视觉能力。"
allowed-tools: Bash, Read, Write, Edit
metadata: {"openclaw":{"requires":{"bins":["node","npm"],"env":["YIJIAN_API_KEY"]},"primaryEnv":"YIJIAN_API_KEY"}}
---

# 百度一见视觉技能（Baidu Yijian Vision Skill）

> **Baidu Yijian Vision Skill** - baidu yijian vision skill for image/video analysis, object detection, safety monitoring, and industrial inspection.

## ⚠️ 必需条件

**此工具需要以下条件才能运行：**

1. **YIJIAN_API_KEY 环境变量**（必需）- 从[百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)获取
2. **Node.js >= 16.0.0** - 本工具依赖 Node.js 运行时
3. **npm >= 8.0.0** - 用于依赖管理和安装

**确保上述条件满足后再使用此工具。**

---

> **🔒 客户端工具 - 这是一个本地工具，用于与百度一见（Baidu Yijian）平台交互。所有数据处理遵循安全协议。**

## 🎯 此工具的功能

百度一见（[yijian-next.cloud.baidu.com](https://yijian-next.cloud.baidu.com)）是百度（Baidu）的视觉（vision）理解平台。此工具使你能够：

- **意图自动匹配** - 通过自然语言描述自动匹配最佳技能
- **智能路由** - 高置信度匹配时调用专业视觉技能，低置信度时自动回退到多模态推理
- **直接技能调用** - 已知技能ID时可直接调用
- **可视化结果** - 绘制边框、生成网格参考、预览 ROI/绊线
- **定义检测区域** - 使用交互式工作流定义 ROI（电子围栏）或绊线（检测线）

**支持的检测类型：** 人员检测、行人计数、车辆识别、OCR、姿态估计、目标跟踪等。

## 📋 快速开始

### 系统要求

- **Node.js** >= 16.0.0
- **npm** >= 8.0.0
- **YIJIAN_API_KEY** 环境变量

## 🔧 前置条件

### 获取 API Key

1. 登录 [百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)
2. 激活试用包
3. 生成 API Key（百度一见平台 → 系统管理 → 安全认证 → API Key）

### 配置环境

设置环境变量：

```
YIJIAN_API_KEY=your-api-key
```

## 📚 使用指南

### 意图驱动工作流（推荐）

**当你描述需求但不确定用哪个技能时**，系统会自动匹配最佳技能：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg
```

系统会自动：
1. 查询一见平台，根据意图匹配公共技能列表
2. 如果匹配置信度 ≥ 0.7，调用对应的专业技能（自动添加全图 ROI）
3. 如果公共技能无匹配或调用失败，搜索私有工作空间技能（由你从列表中选择最匹配的技能，再用 invoke 调用）
4. 如果私有空间也无合适技能，自动回退到多模态直接推理

> **自动 ROI：** 当用户未提供 ROI 时，系统会自动生成覆盖整张图片的 ROI。如需指定检测区域，请使用 `invoke.mjs` 传入自定义 ROI。

#### 自定义置信度阈值

```bash
# 仅当匹配度≥0.8时才使用技能，否则回退到多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg 0.8
```

#### 不使用图片（纯文本意图查询）

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒"
```

#### 返回格式

```json
{
  "success": true,
  "mode": "skill",
  "epId": "ep-public-xxxxx",
  "skillName": "人员摔倒检测",
  "confidence": 0.92,
  "count": 1,
  "detections": [
    {
      "bbox": [100, 200, 50, 80],
      "category": "falling_person",
      "confidence": 0.94
    }
  ]
}
```

**字段说明：**

| 字段 | 类型 | 说明 |
|------|------|------|
| `success` | boolean | 调用是否成功 |
| `mode` | string | `"skill"` 或 `"multimodal"`，表示使用的推理模式 |
| `epId` | string \| null | 技能ID（技能模式时有值） |
| `skillName` | string \| null | 技能名称（技能模式时有值） |
| `confidence` | number \| null | 技能匹配置信度（0-1） |
| `count` | number | 检测到的目标数量 |
| `detections` | array | 检测结果数组 |

**模式说明：**
- `"mode": "skill"` - 使用了百度一见平台的专业技能，精度高、成本低
- `"mode": "multimodal"` - 使用了多模态大模型直接推理，通用性强、无需预设技能

### 查询私有工作空间技能

当公共技能匹配不到或调用失败时，可以搜索你私有工作空间中的技能：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills
```

返回你默认工作空间中所有已发布的技能列表（含 epId、名称和描述）。列表会本地缓存1小时。

**使用流程：**
1. 运行 `list-skills` 获取私有技能列表
2. 根据用户意图，从列表中选择最匹配的技能
3. 用选中的 epId 调用 `invoke.mjs` 执行技能
4. 如果私有空间也无合适技能，走 multimodal 多模态推理

```bash
# 第1步：获取私有技能列表
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills

# 第2步：选择匹配的技能并调用
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-wsnyqcdj-0xdpgbt4
```

> **注意：** 私有技能列表按 API Key 关联，1小时内自动刷新缓存，无需频繁调用。

### 查询可用技能

如果你想了解有哪些技能可以匹配你的意图：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "人员检测"
```

这会返回匹配的技能列表及其置信度。

### 直接调用技能（已知技能ID）

**当你已经知道具体的技能 ID 时**，可以直接调用：

```bash
# 调用指定技能（从stdin读取输入）
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy -

# 或者直接作为参数
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

#### ROI（电子围栏）参数格式

ROI 用于限定检测区域。**必须包含 `id`、`name`、`kind`、`points` 四个字段，缺一不可**，否则 API 返回 500 错误。

```json
{
  "id": "1",
  "name": "zone",
  "kind": "ROI",
  "points": [x1,y1, x2,y2, x3,y3, x4,y4]
}
```

- `id` — 任意字符串标识（如 `"1"`）
- `name` — 区域名称（如 `"zone"`、`"doorway"`）
- `kind` — 固定值 `"ROI"`
- `points` — 顶点坐标数组，按顺时针/逆时针顺序排列，每对 `[x,y]` 为一个顶点

> **自动 ROI：** 如果不传 `roi` 参数，`invoke.mjs` 会自动生成覆盖全图的 ROI。

#### 绊线（Tripwire）参数格式

绊线用于检测穿越事件。**必须包含 `id`、`name`、`kind`、`points`、`direction` 五个字段**。

```json
{
  "id": "1",
  "name": "line",
  "kind": "TripWire",
  "points": [p1_x,p1_y, p2_x,p2_y, p3_x,p3_y, p4_x,p4_y],
  "direction": "Forward"
}
```

- `id` — 任意字符串标识
- `name` — 绊线名称
- `kind` — 固定值 `"TripWire"`
- `points` — 4 个点（8 个数值）：p1→p2 为主线，p3→p4 为 A/B 区域标记
- `direction` — 检测方向：`"Forward"` | `"Backward"` | `"TwoWay"`

> **绊线不会自动生成**，必须由用户指定。详见 [绊线工作流](./tripwire-workflow.md)。

**调用带 ROI 的技能：**
```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[100,100,500,100,500,400,100,400]}}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

**调用带绊线的技能：**
```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[0,540,1920,540,0,500,1920,500],"direction":"Forward"}}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

### 测试 Query 接口

如果你想单独测试意图匹配功能：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/query.mjs "检测人员摔倒"
```

返回匹配的技能列表（JSON格式）。

### 测试 Multimodal 接口

如果你想单独测试多模态直接推理：

```bash
# 纯文本
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片"

# 带图片URL
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片" "http://example.com/image.jpg"
```

### 定义检测区域

**需要定义电子围栏（ROI，又叫感兴趣区域）或绊线（Tripwire，又叫检测线）？**

- **[ROI 工作流](./roi-workflow.md)** — 创建电子围栏，仅在指定区域检测
- **[绊线工作流](./tripwire-workflow.md)** — 绘制检测线，统计穿越事件

两个工作流都包含完整的交互步骤和示例对话。

### 查看完整文档

- **[类型定义](./types-guide.md)** — 检测（Detection），图像（Image）、电子围栏（ROI）、绊线（Tripwire）等数据结构
- **[网格输入系统](./grid-guide.md)** — 使用网格坐标指定点

## 💡 常见任务

### 查询匹配的技能

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "检测人员"
```

### 意图驱动调用（自动路由）

```bash
# 系统会自动选择技能或多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg

# 自定义阈值
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg 0.8
```

### 直接调用技能（已知技能ID）

```bash
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-xxxxx
```

### 预览 ROI/绊线

在调用前在图像上预览：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs photo.jpg '[]' preview.png \
  --overlays '[{"kind":"ROI","name":"zone","points":[...]}]'
```

### 生成网格

帮助用户使用网格坐标指定点位置：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs photo.jpg grid.png
```

---

## 📋 使用示例

### 示例 1：意图驱动检测（推荐）

**场景：** 你有一张监控画面图像，想检测是否有人摔倒，但不确定使用哪个技能。

```bash
# 使用意图驱动工作流，系统自动匹配最佳技能
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" surveillance.jpg

# 返回结果包含检测到的目标
# {
#   "success": true,
#   "mode": "skill",
#   "epId": "ep-public-inqm15aq",
#   "skillName": "人员摔倒",
#   "confidence": 0.95,
#   "count": 1,
#   "detections": [...]
# }
```

### 示例 2：传统直接调用（已知技能ID）

**场景：** 你已经知道具体的技能 ID，直接调用。

```bash
# 第 1 步：调用指定技能
echo '{"input0":{"image":"surveillance.jpg"}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-inqm15aq -

# 第 2 步：可视化结果
detections='[{"bbox":[150,200,80,180],"confidence":0.94,"category":{"id":"person","name":"人体"}}]'
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs surveillance.jpg "$detections" output.jpg

# 第 3 步：处理结果
echo "$detections" | jq 'length'  # 计数人数
```

### 示例 3：基于网格的 ROI 设置

**场景：** 在走廊监控摄像机中计数进入特定房间的人员，使用 ROI 限制检测区域。

```bash
# 第 1 步：生成网格参考
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs hallway.jpg --cols 6 --rows 4

# 第 2 步：根据网格识别坐标（B1, D1, D3, B3）并创建 ROI
# 注意：ROI 必须包含 id 和 name 字段

# 第 3 步：验证 ROI
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs hallway.jpg '[]' roi_preview.jpg \
  --overlays "[{\"kind\":\"ROI\",\"name\":\"doorway\",\"points\":[320,270,960,270,960,810,320,810]}]"

# 第 4 步：使用 invoke.mjs 传入自定义 ROI（不要用 intent-invoke.mjs，它只会添加全图 ROI）
echo '{"input0":{"image":"hallway.jpg","roi":{"id":"1","name":"doorway","kind":"ROI","points":[320,270,960,270,960,810,320,810]}}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

### 示例 4：视频帧处理和跟踪

**场景：** 处理 30 秒监控视频，逐帧检测和跟踪人员。

```bash
# 第 1 步：提取帧
ffmpeg -i surveillance_30sec.mp4 -vf fps=1 frames/frame_%04d.jpg

# 第 2 步：计算 sourceId（视频标识符）
sourceId=$(head -c 65536 surveillance_30sec.mp4 | md5sum | awk '{print substr($1, 1, 16)}')

# 第 3 步：处理每个帧并跟踪
for frame_file in frames/frame_*.jpg; do
  frame_num=$(basename "$frame_file" | grep -oE '[0-9]+' | head -1)
  frame_index=$((10#$frame_num - 1))
  timestamp=$((frame_index * 1000))
  imageId="frame_$(printf '%04d' "$frame_num")"

  # 使用意图驱动调用
  result=$(node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员" "$frame_file")

  detections=$(echo "$result" | jq '.detections')
  echo "$detections" > "results/imageId_detections.json"
done
```

---

**API Key 从 `YIJIAN_API_KEY` 环境变量读取。所有脚本将 JSON 输出到标准输出，错误输出到标准错误。**

FILE:grid-guide.md
# 基于网格的 ROI / 绊线输入指南

## 概述

在命令行环境中手动指定 ROI 区域或绊线的确切像素坐标既繁琐又容易出错。基于网格的输入系统让你可以使用易于理解的网格坐标来代替。

## 工作原理

### 1. 生成网格参考图像

```bash
node scripts/show-grid.mjs photo.jpg [output-path] [--cols N] [--rows N]
```

这会在你的图像上创建带标签的网格叠加层：

```
       A     B     C     D     E     F     G
   0   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   1   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   2   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   3   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   4   ·─────·─────·─────·─────·─────·─────·
```

**输出**：
- `photo_grid.png` - 网格参考图像
- `photo_grid_metadata.json` - 坐标映射数据

### 2. 查看并识别坐标

查看网格图像并识别 ROI 或绊线的坐标：

- **列**：A、B、C、D、...（从左到右）
- **行**：0、1、2、3、...（从上到下）
- **交点**：网格线交叉的位置（用点标记）

### 3. 使用网格坐标指定

告诉系统网格坐标：

```
用户："在 B1、E1、E3、B3 创建检测区域"
```

系统自动转换为像素坐标：

```
B1 = (列B索引 × 网格宽度, 行1索引 × 网格高度)
E1 = (列E索引 × 网格宽度, 行1索引 × 网格高度)
E3 = (列E索引 × 网格宽度, 行3索引 × 网格高度)
B3 = (列B索引 × 网格宽度, 行3索引 × 网格高度)
```

## 使用示例

### 示例 1：定义检测区域（ROI）

```bash
# 生成网格
$ node scripts/show-grid.mjs office.jpg

# 查看网格图像以识别坐标，然后：
用户："我想要一个办公桌区域的检测区域：B2、G2、G5、B5"

转换为 ROI：
{
  "kind": "ROI",
  "points": [col_B, row_2, col_G, row_2, col_G, row_5, col_B, row_5]
}

# 使用 ROI 调用技能
$ echo '{"input0":{"image":"office.jpg","roi":"[...]"}}' | \
  node invoke.mjs ep-public-2403um2p
```

### 示例 2：定义穿越线（绊线）

```bash
# 生成网格
$ node scripts/show-grid.mjs hallway.jpg

# 查看网格图像，然后：
用户："在走廊中创建一条从 A3 到 H3 的绊线，检测从左到右的穿越"

转换为绊线：
{
  "kind": "TripWire",
  "points": [col_A, row_3, col_H, row_3],
  "direction": "Forward"
}

# 使用绊线调用技能
$ echo '{"input0":{"image":"hallway.jpg","tripwire":"[...]"}}' | \
  node invoke.mjs ep-public-ywbjb7tm
```

### 示例 3：多个 ROI

```bash
用户："我想要 3 个检测区域：入口（A1-C3）、中心（D1-F3）、出口（G1-H3）"

转换为三个 ROI 对象并作为 Array<ROI> 传递：
[
  {
    "kind": "ROI",
    "points": [col_A, row_1, col_C, row_1, col_C, row_3, col_A, row_3],
    "order": 0
  },
  {
    "kind": "ROI",
    "points": [col_D, row_1, col_F, row_1, col_F, row_3, col_D, row_3],
    "order": 1
  },
  {
    "kind": "ROI",
    "points": [col_G, row_1, col_H, row_1, col_H, row_3, col_G, row_3],
    "order": 2
  }
]
```

## 网格算法

网格大小会自动计算：

- **目标**：约 30-42 个交点，大致为方形单元
- **横向图像**（1920×1080）：7 列 × 4 行
- **纵向图像**（1080×1920）：4 列 × 7 行
- **方形图像**（800×800）：5 列 × 5 行

**手动覆盖**：
```bash
node scripts/show-grid.mjs photo.jpg --cols 10 --rows 6
```

## 命令参考

### 生成网格

```bash
node scripts/show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]
```

**参数**：
- `<input-image>` - 输入图像文件路径
- `[output-path]` - 可选输出图像路径（默认：`<input>_grid.png`）
- `--cols N` - 覆盖列数
- `--rows N` - 覆盖行数

**输出**：
- 网格图像：`<output>.png`
- 元数据 JSON：`<output>_metadata.json`

### 使用网格坐标

在指定坐标时：

**有效格式**：
- 单个点：`A1`
- 序列：`A1、E1、E3、A3`（用于 ROI）
- 线：`A2 → G2`（用于绊线，显示方向）

**网格参考格式**：
- 列：A-Z，然后是 AA-AZ 等
- 行：0-9，然后是 10-99 等

## 可视化

### 查看网格图像

生成网格后，系统应该显示网格图像以便你可以直观地识别坐标。

### 调用前验证

在用坐标调用技能之前：

```bash
# 在原始图像上可视化选定的 ROI/绊线
node scripts/visualize.mjs office.jpg '<roi-or-tripwire-json>' preview.jpg
```

这可帮助你在处理前验证坐标是否正确。

## 提示和技巧

1. **从简单开始**：在使用复杂多边形前先从简单的矩形 ROI 开始
2. **使用均匀网格**：5×5 或 6×6 网格最容易参考
3. **记录你的区域**：为每个 ROI 指定有意义的名称（例如"入口"、"收银台"）
4. **重用坐标**：保存类似摄像机角度的坐标集
5. **先测试**：在处理前始终用 `visualize.mjs` 预览

## 故障排除

**"网格图像过于拥挤"**
- 降低网格密度：`--cols 4 --rows 4`
- 如果可能，增大图像大小

**"看不清网格线"**
- 网格使用半透明白色线条（50% 不透明度）
- 放大网格图像以获得更好的可见性

**"坐标似乎不对"**
- 验证列/行顺序（列 A-Z 从左到右，行 0-N 从上到下）
- 检查第一个点和最后一个点是否不同（多边形必须闭合）
- 确保点的顺序一致（全部顺时针或全部逆时针）

**"网格不适合图像"**
- 非常宽或非常高的图像可能有不均匀的单元格
- 使用 `--cols` 和 `--rows` 强制指定特定的网格尺寸

## 参考

- [类型定义](./types-guide.md) - ROI 和绊线类型规范
- [SKILL.md](./SKILL.md) - 主技能指南

FILE:tripwire-workflow.md
# 绊线（检测线）交互工作流

**导航：** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)

> 当用户需要为穿越检测定义绊线时，遵循此交互工作流。

## 核心概念

- **p1→p2** 主检测线（用户定义，2 个点）
- **p3→p4** A-B 区域标记（系统自动生成或用户定义）
- **方向** Forward/Backward/TwoWay（要检测的方向）

**自动生成过程**：用户只需指定 p1 和 p2 点。系统会自动生成垂直于主线的 p3 和 p4 点用于预览和确认。

矢量穿越检测算法详情，见 [types-guide.md#tripwire](./types-guide.md#tripwire)

## 工作流步骤

### 第 1 步：生成网格参考

```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```

### 第 2 步：用户指定绊线

"查看网格，检测线应该在哪里？例如，从 B2 到 G2？"

### 第 3 步：预览绊线位置

系统基于用户的 2 个点（p1、p2）生成完整的 4 点结构，然后调用 `visualize.mjs` 生成预览：

```bash
# 自动计算 4 个点后
node scripts/visualize.mjs <image> '[]' preview.png \
  --overlays '[{
    "kind":"TripWire",
    "name":"tripwire preview",
    "direction":"Forward",
    "points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y]
  }]'
```

预览显示：
- 主线（p1→p2）虚线 - 橙色
- A/B 标记（p3→p4）虚线 - 橙色，带标签
- 方向箭头 - Forward=绿色在 p4，Backward=红色在 p3

### 第 4 步：确认 A/B 区域

"查看预览图像：
- A 区在这一侧（例如线的上方）
- B 区在那一侧（例如线的下方）
- 绿色箭头显示 Forward 是 A→B

这是否正确？"

### 第 5 步：选择方向模式

用户看到预览后，提问：
- Forward（相同方向）
- Backward（相反方向）
- TwoWay（检测两个方向）

### 第 6 步：最终确认

生成最终预览，用户确认，然后调用技能：

```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y],"direction":"Forward"}}}' | \
  node scripts/invoke.mjs ep-xxxx-yyyy
```

---

## 常见错误

❌ 不要在显示预览前解释 A/B
✅ 用户必须看到带 A/B 标记标签的预览

❌ 不要仅使用 2 个点
✅ 始终提供 4 个点（主线 + A/B 标记）

❌ 不要跳过预览验证
✅ 始终显示预览以确认

---

## 完整对话示例

**用户**："我需要检测进入门的人员"

**你**：
1. 生成网格：
```bash
node scripts/show-grid.mjs door-camera.jpg door-grid.png
```

2. 问："查看网格，检测线应该在哪里？例如 B2 到 G2？"

3. 用户："是的，从 B2 到 G2 的线"

4. 生成显示绊线位置和 A/B 标记的预览：
```bash
node scripts/visualize.mjs door-camera.jpg '[]' preview1.png \
  --overlays '[{"kind":"TripWire","name":"door","direction":"Forward","points":[B2_x,B2_y,G2_x,G2_y,B1_x,B1_y,G3_x,G3_y]}]'
```

5. 解释："这是绊线预览：
   - 区域 'A' 在线上方（外侧）
   - 区域 'B' 在线下方（内侧）
   - 绿色箭头显示 Forward 检测 A→B（从外侧进入）

   你对这个设置满意吗？"

6. 用户："是的，我想检测进入（Forward）"

7. 生成最终预览以确认

8. 调用技能进行检测

---

## 数据结构

详见 [types-guide.md#tripwire](./types-guide.md#tripwire) 了解完整的定义、矢量算法和示例。

FILE:package.json
{
  "name": "baidu-yijian-vision",
  "version": "0.9.38",
  "description": "Baidu Yijian Vision Skill - Intent-driven visual analysis with automatic skill routing",
  "type": "module",
  "private": true,
  "engines": {
    "node": ">=16.0.0",
    "npm": ">=8.0.0"
  },
  "scripts": {
    "pack": "node scripts/pack.mjs",
    "test": "node --test 'tests/*.test.mjs'",
    "test:integration": "node tests/run-integration.mjs --skip-e2e",
    "test:integration:all": "node tests/run-integration.mjs",
    "test:integration:e2e": "node tests/run-integration.mjs --e2e",
    "test:integration:agent": "node tests/run-integration.mjs --agent",
    "test:integration:docker": "node tests/run-docker.mjs",
    "test:all": "npm test && npm run test:integration"
  },
  "devDependencies": {
    "archiver": "^7.0.1",
    "openclaw": "github:openclaw/openclaw#main"
  },
  "optionalDevDependencies": {
    "_openclaw": "https://github.com/openclaw/openclaw.git#main"
  },
  "optionalDependencies": {
    "sharp": "^0.33.0"
  }
}

FILE:roi-workflow.md
# ROI（关注区域）交互工作流

**导航：** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)

> 当用户需要为对象检测定义 ROI（关注区域）时，遵循此交互工作流。

## 工作流步骤

### 第 1 步：生成网格参考

```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```

向用户显示网格图像并解释行和列标签。

### 第 2 步：询问 ROI 目的

"你想在这个区域检测什么？例如：
- 进入收银区域的人员
- 停车场中的车辆
- 货架上的产品"

### 第 3 步：用户指定顶点

用户根据网格标签指定角。

**矩形 ROI 示例：**
- 用户："从 B2 到 G5"
- 转换为 4 个顶点：[B2、G2、G5、B5]

### 第 4 步：验证顶点顺序

"这些点是否围绕周边形成闭合多边形？（顺时针或逆时针）"

### 第 5 步：预览 ROI

```bash
node scripts/visualize.mjs <image> '[]' preview.png \
  --overlays '[{"kind":"ROI","name":"test","points":[x1,y1,x2,y2,x3,y3,x4,y4]}]'
```

### 第 6 步：用户确认

"ROI 位置是否正确？"

### 第 7 步：调用技能

```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[...]}}}' | \
  node scripts/invoke.mjs ep-xxxx-yyyy
```

> **重要：** 自定义 ROI 必须通过 `invoke.mjs` 传入。不要使用 `intent-invoke.mjs`，因为它会自动添加全图 ROI 覆盖掉你的自定义区域。

---

## 常见错误

❌ 不要假设矩形 ROI
✅ 支持任意多边形

❌ 不要允许无序的顶点
✅ 顶点必须按顺序排列

❌ 不要跳过预览验证
✅ 始终显示预览图像以确认

---

## 完整对话示例

### 示例 1：矩形入口区域

**用户**："我想检测进入办公室的人员"

**你**：
1. 生成网格：
```bash
node scripts/show-grid.mjs office.jpg office-grid.png
```
2. 问："查看网格，入口大概在哪里？"
3. 用户："大约从 B1 到 G3"
4. 确认："所以顶点是 B1、G1、G3、B3，对吧？"
5. 用户："是的"
6. 预览 ROI
7. 问："这个矩形是否覆盖了入口？"
8. 用户："完美！"
9. 调用技能进行检测

### 示例 2：复杂多边形（L 形区域）

**用户**："我需要监视 L 形存储区域"

**你**：
1. 生成网格
2. 问："使用网格坐标标记 L 形的所有角，从一个角开始"
3. 用户："从左上角开始：A2、E2、E4、D4、D6、A6"
4. 确认："这些点是否按顺序形成 L 形的边界？确认它关闭回 A2"
5. 用户："是的"
6. 预览 L 形多边形
7. 验证没有自交
8. 调用技能

---

## 数据结构

详见 [types-guide.md](./types-guide.md) 了解完整的定义、子对象和示例。

FILE:types-guide.md
# 易见类型定义指南

## 概述

易见平台处理视觉信息并返回结构化数据。理解这些类型可帮助你有效地使用技能输入和输出。

## 核心类型

### 基本类型

这些是在整个平台中使用的标量值。

| 类型 | 描述 | 示例 |
|------|------|------|
| `String` | 文本数据 | `"hello"` |
| `TemplateString` | 格式化文本 | `"Result: {value}"` |
| `Integer` | 整数 | `42` |
| `Double` | 浮点数 | `3.14` |
| `Boolean` | 真/假 | `true` |
| `Time` | 时间戳 | `"2026-03-12T10:00:00Z"` |

### 复杂类型

这些是包含视觉或检测信息的结构化对象。

#### Detection（检测）

表示图像中的检测对象。

```json
{
  "bbox": [x, y, width, height],
  "confidence": 0.94,
  "category": {
    "id": "person",
    "name": "人体",
    "confidence": 0.98
  },
  "track_id": 1,
  "ocr": null,
  "keypoints": null
}
```

**字段**：
- `bbox` - 边框 [x, y, width, height]（像素坐标）
- `confidence` - 检测置信度（0-1）
- `category` - 对象分类，包含 id、name、confidence
- `track_id` - 跟踪 ID（用于帧间跟踪，可选）
- `ocr` - 文本识别结果（可选）
- `keypoints` - 姿态/骨骼关键点（可选）

#### TrackDetection（跟踪检测）

与 `Detection` 相同，但用于视频帧间的时间跟踪。

#### Attribute（属性）

表示图像中检测到的属性或属性。

```json
{
  "attribute": "age",
  "answer": "adult",
  "confidence": 0.87
}
```

**字段**：
- `attribute` - 属性名称（例如"age"、"gender"、"expression"）
- `answer` - 属性值
- `confidence` - 置信度分数

#### Image（图像）

表示技能的视觉输入。

```json
{
  "imageData": "base64-encoded-image-data",
  "imageWidth": 1920,
  "imageHeight": 1080,
  "sourceId": "abc123def456",
  "imageId": "img001",
  "timestamp": 1705056000000
}
```

**字段**：
- `imageData` - Base64 编码的图像字节（从文件路径自动编码）
- `imageWidth` - 图像宽度（像素）
- `imageHeight` - 图像高度（像素）
- `sourceId` - 源标识符，用于视频流（文件 MD5 哈希）
- `imageId` - 唯一的图像标识符
- `timestamp` - 帧时间戳（毫秒）

#### ROI（关注区域 / 电子围栏）

定义用于分析的多边形区域。

**完整工作流：** 见 [roi-workflow.md](./roi-workflow.md)

**数据结构**:

```json
{
  "kind": "ROI",
  "name": "entrance",
  "points": [x1, y1, x2, y2, x3, y3, x4, y4]
}
```

**字段说明**:
- `kind` - 固定值"ROI"
- `name` - 区域描述名称（可选）
- `points` - 多边形顶点坐标数组，按顺序排列：`[x1,y1, x2,y2, x3,y3, ...]`

**示例**:

**矩形ROI - 收银区域**
```json
{
  "kind": "ROI",
  "name": "checkout-zone",
  "points": [100, 100, 300, 100, 300, 250, 100, 250]
}
```

**多边形ROI - L形仓库区域**
```json
{
  "kind": "ROI",
  "name": "warehouse-l-zone",
  "points": [100, 100, 400, 100, 400, 200, 300, 200, 300, 350, 100, 350]
}
```

**网格坐标输入**:
使用 `show-grid.mjs` 生成的网格坐标：
```
B1, E1, E3, B3  →  自动转换为像素坐标
```

#### Tripwire（绊线 / 穿越检测线）

定义用于穿越事件的检测线。

**完整工作流：** 见 [tripwire-workflow.md](./tripwire-workflow.md)

**核心概念 - 向量点乘法则**:

绊线穿越检测使用向量点乘：

1. **绊线向量** = p1 → p2 方向
2. **运动向量** = 前一帧到当前帧的对象移动
3. **点乘计算**：`dot = V_line.x * V_motion.x + V_line.y * V_motion.y`
4. **穿越判定**：
   - `dot > 0` → 正向穿越（Forward，A→B）
   - `dot < 0` → 反向穿越（Backward，B→A）
   - TwoWay: 两个方向都检测

**数据结构**:

```json
{
  "kind": "TripWire",
  "name": "door-entrance",
  "direction": "Forward",
  "points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
}
```

**字段说明**:
- `kind` - 固定值"TripWire"
- `name` - 检测线描述名称（可选）
- `direction` - "Forward" | "Backward" | "TwoWay"
- `points` - 8元素数组：
  - p1, p2: 主检测线（用户指定）
  - p3, p4: A-B区域标记（标识穿越方向）

**自动生成A-B点**:

用户可以只提供 **2个点**（p1, p2 主检测线），系统会自动生成 **p3, p4**（A-B区域标记）：

1. **输入**: 用户给出2个点 `[p1_x, p1_y, p2_x, p2_y]`
2. **自动计算**:
   ```
   dx, dy = 主线方向向量 (p2 - p1)
   perpX, perpY = 旋转90度得到垂直向量
   distance = 30px (默认)
   p3 = p1 + perp * distance  (A区域标记)
   p4 = p2 + perp * distance  (B区域标记)
   ```
3. **结果**: 自动补全为完整的4点结构 `[p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]`
4. **生成预览图**: 使用 `visualize.mjs` 将Tripwire可视化
   ```bash
   node scripts/visualize.mjs <image> '[]' preview.png \
     --overlays '[{
       "kind": "TripWire",
       "name": "绊线预览",
       "direction": "Forward",
       "points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
     }]'
   ```
   输出：`preview.png` 显示绊线虚线、A/B标记、方向箭头
5. **用户确认**: 显示预览图让用户确认：
   - A/B区域位置是否正确
   - 方向箭头是否指向期望方向
   - 绊线是否跨越检测区域

**示例**:

用户输入：`[150, 100, 150, 300]` (竖直线)
系统自动生成：`[150, 100, 150, 300, 120, 100, 120, 300]` (左偏30px)
生成预览：
```bash
node scripts/visualize.mjs door.jpg '[]' preview.png \
  --overlays '[{"kind":"TripWire","name":"门","direction":"Forward","points":[150,100,150,300,120,100,120,300]}]'
```
用户看预览图确认A(左)→B(右)方向是否符合预期

**结构可视化**:
```
p1 --------> p2  (主检测线，p1→p2)
 ^            ^
 |            |
p3           p4  (A-B区域标记)

• Forward: 检测p3→p4方向的穿越（A→B）
• Backward: 检测p4→p3方向的穿越（B→A）
• TwoWay: 双向都检测
```

**方向可视化**:
- **Forward** - 绿色箭头在p4，表示A→B方向
- **Backward** - 红色箭头在p3，表示B→A方向
- **TwoWay** - 蓝色箭头在p3和p4，表示双向

**示例**:

**垂直检测线 - 人员进出统计**

```json
{
  "kind": "TripWire",
  "name": "door-entrance",
  "direction": "Forward",
  "points": [150, 100, 150, 300, 130, 100, 170, 300]
}
```

说明：
- p1(150,100) → p2(150,300): 竖直线
- p3(130,100) → p4(170,300): A-B标记（左右偏移30px）
- Forward: 只检测从左→右的穿越

**水平检测线 - 考勤打卡**

```json
{
  "kind": "TripWire",
  "name": "checkin-line",
  "direction": "TwoWay",
  "points": [100, 200, 400, 200, 100, 180, 400, 220]
}
```

说明：
- p1(100,200) → p2(400,200): 水平线
- p3(100,180) → p4(400,220): A-B标记（上下偏移20px）
- TwoWay: 双向都统计

### 数组类型

任何类型都可以包装在数组中：

```json
{
  "type": "Array<Detection>",
  "example": [
    { "bbox": [...], "confidence": 0.9, ... },
    { "bbox": [...], "confidence": 0.85, ... }
  ]
}
```

常见数组：
- `Array<Detection>` - 多个检测对象
- `Array<TrackDetection>` - 帧间跟踪的对象
- `Array<Attribute>` - 多个属性结果
- `Array<ROI>` - 多个区域
- `Array<Tripwire>` - 多条检测线

## 可视化和交互

### 检测可视化

使用 `visualize.mjs` 在图像上可视化检测结果：

```bash
node scripts/visualize.mjs photo.jpg '<detection-json>' output.jpg
```

**支持**：
- 带置信度分数的边框
- 类别标签
- 用于跟踪可视化的跟踪 ID
- 叠加层（ROI 区域、绊线线条）
- 文本注释

### ROI/绊线输入

用于区域和障碍的交互式输入：

1. 生成网格参考图像：
   ```bash
   node scripts/show-grid.mjs photo.jpg
   ```

2. 使用网格参考指定坐标：
   ```
   ROI: B1, E1, E3, B3
   Tripwire: A2 → G2 (direction: Forward)
   ```

3. 网格坐标自动转换为像素坐标

详见 [grid-guide.md](./grid-guide.md)。

## 视频帧提取

用于视频帧提取和多帧分析：

```json
{
  "imageData": "base64-frame-data",
  "sourceId": "video_abc123",        // 每个视频源唯一
  "imageId": "frame_001",             // 每帧唯一
  "timestamp": 1705056000000          // 帧时间戳，单位毫秒
}
```

**sourceId 计算**:
```
sourceId = MD5(视频文件的前 64KB) → 16 字符十六进制字符串
```

**timestamp 计算**:
```
timestamp = 帧索引 * 1000 / 帧率
```

详见 [video-guide.md](./video-guide.md)。

## 类型转换

### 输入转换

当将文件作为 `Image` 类型传递时：

```bash
# 文件路径自动转换为 base64
echo '{"input0":{"image":"photo.jpg"}}' | node invoke.mjs ep-detect
```

### 输出解析

检测结果会自动解析：

```javascript
// 原始输出
{ "detections": "[{\"bbox\": [...]}]" }

// 解析后的输出
{ "detections": [{ "bbox": [...] }] }  // JSON 数组，可以直接使用
```

## 最佳实践

1. **始终验证边框** - 确保坐标在图像边界内
2. **使用 track_id 以保持连续性** - 跟踪帧间的对象以获得更好的结果
3. **有效组合类型** - 使用 ROI 限制检测区域
4. **时间戳精度** - 为视频分析维持一致的时间戳计算
5. **sourceId 一致性** - 为来自同一源的所有帧保持 sourceId 常数

## 故障排除

**"无效的检测"**
- 检查边框格式：[x, y, width, height]
- 验证坐标为非负数
- 确保框在图像边界内

**"缺少关键点"**
- 并非所有技能都支持关键点
- 检查技能文档以了解支持的输出

**"ROI 未被识别"**
- 验证多边形是否闭合（第一个点 ≠ 最后一个点）
- 确保至少有 3 个点
- 检查点的顺序是否正确（顺时针或逆时针）

**"sourceId 不匹配"**
- 如果视频文件改变，重新计算 sourceId
- 对来自同一源的所有帧使用相同的 sourceId

FILE:scripts/visualize.mjs
#!/usr/bin/env node

import fs from 'fs';
import path from 'path';
import sharp from 'sharp';

/**
 * Read all stdin as a string (for piped input).
 */
function readStdin() {
  return new Promise((resolve, reject) => {
    const chunks = [];
    process.stdin.setEncoding('utf-8');
    process.stdin.on('data', (chunk) => chunks.push(chunk));
    process.stdin.on('end', () => resolve(chunks.join('')));
    process.stdin.on('error', reject);
  });
}

/**
 * Escape XML special characters for safe SVG embedding.
 */
export function escapeXml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/**
 * Build SVG for detection bounding boxes and labels (green).
 */
export function buildDetectionSvg(detections, width, height) {
  const rects = [];
  const fontSize = 14;
  const padding = 4;
  const labelHeight = fontSize + padding * 2;

  for (const det of detections) {
    const [x, y, w, h] = det.bbox;
    // Bounding box rectangle
    rects.push(
      `<rect x="x" y="y" width="w" height="h" fill="none" stroke="rgba(0,255,0,0.8)" stroke-width="2"/>`
    );

    // Label text
    const parts = [];
    if (det.track_id != null) parts.push(`#det.track_id`);
    if (det.category && det.category.name) parts.push(det.category.name);
    if (det.confidence != null) parts.push((det.confidence * 100).toFixed(0) + '%');
    const label = parts.join(' ');

    if (label) {
      const labelWidth = label.length * (fontSize * 0.65) + padding * 2;
      const labelY = Math.max(y - labelHeight, 0);
      // Label background
      rects.push(
        `<rect x="x" y="labelY" width="labelWidth" height="labelHeight" fill="rgba(0,255,0,0.7)" rx="2"/>`
      );
      // Label text
      rects.push(
        `<text x="x + padding" y="labelY + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(label)</text>`
      );
    }
  }

  return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">rects.join('')</svg>`;
}

/**
 * Helper function to draw an arrow at a specific point.
 * @param {Array} elements - Array to push SVG elements to
 * @param {Object} point - {x, y} coordinates
 * @param {number} dirX - Unit direction vector X component
 * @param {number} dirY - Unit direction vector Y component
 * @param {number} arrowSize - Size of the arrow
 * @param {string} color - Arrow color (rgba string)
 */
function drawArrowAtPoint(elements, point, dirX, dirY, arrowSize, color) {
  const tipX = point.x + dirX * arrowSize;
  const tipY = point.y + dirY * arrowSize;

  // Perpendicular base vectors
  const perpX = -dirY * arrowSize * 0.6;
  const perpY = dirX * arrowSize * 0.6;

  const b1x = point.x - dirX * arrowSize - perpX;
  const b1y = point.y - dirY * arrowSize - perpY;
  const b2x = point.x - dirX * arrowSize + perpX;
  const b2y = point.y - dirY * arrowSize + perpY;

  elements.push(
    `<polygon points="tipX,tipY b1x,b1y b2x,b2y" fill="color"/>`
  );
}

/**
 * Build SVG for ROI polygons and Tripwire polylines (input overlays).
 *
 * Each overlay item:
 *   { kind: "ROI", points: [x1,y1,x2,y2,...], name?: string }
 *   { kind: "TripWire", points: [x1,y1,x2,y2,...], name?: string, direction?: string }
 *
 * Each crossing item (optional):
 *   { tripwireId?: string, prevPos: [x,y], currPos: [x,y], dotProduct: number }
 */
export function buildOverlaysSvg(overlays, width, height, crossings = []) {
  const elements = [];
  const fontSize = 13;
  const padding = 4;
  const labelHeight = fontSize + padding * 2;

  for (const overlay of overlays) {
    const pts = overlay.points || [];
    // Convert flat [x1,y1,x2,y2,...] to "x1,y1 x2,y2 ..."
    const pointPairs = [];
    for (let i = 0; i < pts.length - 1; i += 2) {
      pointPairs.push(`pts[i],pts[i + 1]`);
    }
    const pointsStr = pointPairs.join(' ');

    if (overlay.kind === 'ROI') {
      // ROI: blue polygon with semi-transparent fill
      elements.push(
        `<polygon points="pointsStr" fill="rgba(0,150,255,0.15)" stroke="rgba(0,150,255,0.6)" stroke-width="2"/>`
      );

      // Label at first vertex
      const name = overlay.name || overlay.displayName || '';
      if (name && pts.length >= 2) {
        const lx = pts[0];
        const ly = Math.max(pts[1] - labelHeight, 0);
        const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
        elements.push(
          `<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(0,150,255,0.8)" rx="2"/>`
        );
        elements.push(
          `<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
        );
      }
    } else if (overlay.kind === 'TripWire') {
      // TripWire with 4-point structure support
      // points = [x1,y1, x2,y2, x3,y3, x4,y4]
      // p1-p2: main tripwire
      // p3-p4: A-B detection line with directional arrows
      // If only 2-point format provided, auto-generate perpendicular p3-p4

      let p1, p2, p3, p4;

      if (pts.length >= 8) {
        // 4-point structure (explicit format)
        p1 = {x: pts[0], y: pts[1]};
        p2 = {x: pts[2], y: pts[3]};
        p3 = {x: pts[4], y: pts[5]};
        p4 = {x: pts[6], y: pts[7]};
      } else if (pts.length >= 4) {
        // 2-point format: auto-generate perpendicular A-B points
        p1 = {x: pts[0], y: pts[1]};
        p2 = {x: pts[2], y: pts[3]};

        // Calculate perpendicular distance for A-B points
        const dx = p2.x - p1.x;
        const dy = p2.y - p1.y;
        const len = Math.sqrt(dx * dx + dy * dy) || 1;
        // Unit perpendicular vector (rotated 90 degrees counter-clockwise)
        const perpX = -dy / len;
        const perpY = dx / len;
        // Distance from the main line to A-B points
        const perpDist = 30;
        // Generate A-B points perpendicular to the main tripwire
        p3 = {
          x: p1.x + perpX * perpDist,
          y: p1.y + perpY * perpDist
        };
        p4 = {
          x: p2.x + perpX * perpDist,
          y: p2.y + perpY * perpDist
        };
      } else {
        // Not enough points, skip rendering
        p1 = p2 = p3 = p4 = null;
      }

      if (p1 && p2 && p3 && p4) {

        // Draw main tripwire line (p1-p2) - dashed, no arrow
        elements.push(
          `<line x1="p1.x" y1="p1.y" x2="p2.x" y2="p2.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
        );

        // Draw A-B detection line (p3-p4) - dashed with direction arrow
        elements.push(
          `<line x1="p3.x" y1="p3.y" x2="p4.x" y2="p4.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
        );

        // Calculate direction vector (p3 -> p4)
        const dx = p4.x - p3.x;
        const dy = p4.y - p3.y;
        const len = Math.sqrt(dx * dx + dy * dy) || 1;
        const unitX = dx / len;
        const unitY = dy / len;

        const dir = overlay.direction || 'TwoWay';
        const arrowSize = 12;

        // Draw directional arrows
        if (dir === 'Forward') {
          // Arrow at p4, pointing toward p4 (green)
          drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(76,175,80,0.8)');
        } else if (dir === 'Backward') {
          // Arrow at p3, pointing toward p3 (red)
          drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(244,67,54,0.8)');
        } else if (dir === 'TwoWay') {
          // Arrows at both p3 and p4 (blue)
          drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(33,150,243,0.8)');
          drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(33,150,243,0.8)');
        }

        // Add region labels 'A' and 'B'
        const labelFontSize = 12;
        const labelOffset = 8;
        // Label 'A' near p3
        elements.push(
          `<text x="p3.x - labelOffset" y="p3.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">A</text>`
        );
        // Label 'B' near p4
        elements.push(
          `<text x="p4.x + labelOffset" y="p4.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">B</text>`
        );
      }

      // Label at first vertex (applies to both formats)
      const name = overlay.name || overlay.displayName || '';
      if (name && pts.length >= 2) {
        const lx = pts[0];
        const ly = Math.max(pts[1] - labelHeight, 0);
        const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
        elements.push(
          `<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(255,165,0,0.8)" rx="2"/>`
        );
        elements.push(
          `<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
        );
      }
    }
  }

  // Draw crossing vectors perpendicular to tripwire
  for (const crossing of crossings) {
    const { tripwireId, prevPos, currPos, dotProduct, direction } = crossing;
    if (!prevPos || !currPos) continue;

    const [prevX, prevY] = prevPos;
    const [currX, currY] = currPos;

    // Crossing point is midpoint between previous and current position
    const crossX = (prevX + currX) / 2;
    const crossY = (prevY + currY) / 2;

    // Determine color based on crossing type
    let arrowColor, dotColor;
    if (direction === 'Forward') {
      // Forward crossing (dot > 0): green
      arrowColor = 'rgba(76,175,80,0.8)';
      dotColor = '#4caf50';
    } else if (direction === 'Backward') {
      // Backward crossing (dot < 0): red
      arrowColor = 'rgba(244,67,54,0.8)';
      dotColor = '#f44336';
    } else {
      // TwoWay crossing: blue
      arrowColor = 'rgba(33,150,243,0.8)';
      dotColor = '#2196f3';
    }

    // Find the associated tripwire to get its direction
    let tripwirePerpX = 1;   // default perpendicular
    let tripwirePerpY = 0;
    for (const overlay of overlays) {
      if (overlay.kind === 'TripWire' && overlay.name === tripwireId) {
        // Get tripwire direction vector
        const pts = Array.isArray(overlay.points[0]) ? overlay.points.flat() : overlay.points;
        if (pts.length >= 4) {
          const midIdx = Math.floor(pts.length / 4);
          const x1 = pts[midIdx * 2];
          const y1 = pts[midIdx * 2 + 1];
          const x2 = pts[(midIdx + 1) * 2];
          const y2 = pts[(midIdx + 1) * 2 + 1];
          const dx = x2 - x1;
          const dy = y2 - y1;
          const len = Math.sqrt(dx * dx + dy * dy) || 1;
          // Perpendicular vector (rotated 90 degrees)
          tripwirePerpX = -dy / len;
          tripwirePerpY = dx / len;
        }
        break;
      }
    }

    const arrowSize = 12;

    if (direction === 'TwoWay') {
      // TwoWay: draw arrows in both directions perpendicular to tripwire

      // Forward arrow
      const fx1 = crossX + tripwirePerpX * arrowSize;
      const fy1 = crossY + tripwirePerpY * arrowSize;
      elements.push(
        `<line x1="crossX" y1="crossY" x2="fx1" y2="fy1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
      );
      const perp2X = tripwirePerpY * arrowSize * 0.6;
      const perp2Y = -tripwirePerpX * arrowSize * 0.6;
      elements.push(
        `<polygon points="fx1,fy1 fx1 - tripwirePerpX * arrowSize - perp2X,fy1 - tripwirePerpY * arrowSize - perp2Y fx1 - tripwirePerpX * arrowSize + perp2X,fy1 - tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
      );

      // Backward arrow
      const bx1 = crossX - tripwirePerpX * arrowSize;
      const by1 = crossY - tripwirePerpY * arrowSize;
      elements.push(
        `<line x1="crossX" y1="crossY" x2="bx1" y2="by1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
      );
      elements.push(
        `<polygon points="bx1,by1 bx1 + tripwirePerpX * arrowSize - perp2X,by1 + tripwirePerpY * arrowSize - perp2Y bx1 + tripwirePerpX * arrowSize + perp2X,by1 + tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
      );

      // Center circle
      elements.push(
        `<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
      );

      // Label
      elements.push(
        `<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">⟷ TwoWay</text>`
      );
    } else {
      // Forward or Backward: single arrow perpendicular to tripwire
      const sign = direction === 'Forward' ? 1 : -1;
      const arrowX = crossX + tripwirePerpX * arrowSize * sign;
      const arrowY = crossY + tripwirePerpY * arrowSize * sign;

      // Arrow shaft
      elements.push(
        `<line x1="crossX" y1="crossY" x2="arrowX" y2="arrowY" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
      );

      // Arrow head
      const perpBaseX = tripwirePerpY * arrowSize * 0.6;
      const perpBaseY = -tripwirePerpX * arrowSize * 0.6;
      elements.push(
        `<polygon points="arrowX,arrowY arrowX - tripwirePerpX * arrowSize * sign - perpBaseX,arrowY - tripwirePerpY * arrowSize * sign - perpBaseY arrowX - tripwirePerpX * arrowSize * sign + perpBaseX,arrowY - tripwirePerpY * arrowSize * sign + perpBaseY" fill="arrowColor"/>`
      );

      // Center circle
      elements.push(
        `<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
      );

      // Label
      const dirLabel = direction === 'Forward' ? '✓ Forward' : '✗ Backward';
      elements.push(
        `<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">dirLabel</text>`
      );
    }
  }

  return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">elements.join('')</svg>`;
}

/**
 * Build SVG for text lines displayed in a footer area.
 * Returns { svg: string, height: number }.
 *
 * The SVG is positioned to sit at the bottom of the extended canvas,
 * starting at y=imageHeight.
 */
export function buildTextFooterSvg(textLines, width, imageHeight) {
  const fontSize = 16;
  const lineHeight = fontSize + 8;
  const paddingTop = 10;
  const paddingBottom = 10;
  const paddingLeft = 12;
  const totalHeight = paddingTop + textLines.length * lineHeight + paddingBottom;

  const elements = [];
  // Dark background
  elements.push(
    `<rect x="0" y="imageHeight" width="width" height="totalHeight" fill="rgba(0,0,0,0.75)"/>`
  );

  // Text lines
  for (let i = 0; i < textLines.length; i++) {
    const ty = imageHeight + paddingTop + (i + 1) * lineHeight - 4;
    elements.push(
      `<text x="paddingLeft" y="ty" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(textLines[i])</text>`
    );
  }

  const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="imageHeight + totalHeight">elements.join('')</svg>`;
  return { svg, height: totalHeight };
}

/**
 * Parse named CLI flags from argv.
 * Returns { positional: string[], flags: Record<string, string> }.
 */
function parseArgs(argv) {
  const positional = [];
  const flags = {};
  let i = 0;
  while (i < argv.length) {
    if (argv[i].startsWith('--') && i + 1 < argv.length) {
      const key = argv[i].slice(2);
      flags[key] = argv[i + 1];
      i += 2;
    } else {
      positional.push(argv[i]);
      i += 1;
    }
  }
  return { positional, flags };
}

async function main() {
  const { positional, flags } = parseArgs(process.argv.slice(2));

  const inputImage = positional[0];
  let detectionsArg = positional[1];
  let outputPath = positional[2];

  if (!inputImage || !detectionsArg) {
    console.error('Usage: node visualize.mjs <input-image> <detections-json | -> [output-path] [--overlays \'<json>\'] [--text \'<json>\'] [--crossings \'<json>\']');
    console.error('');
    console.error('  <detections-json>  JSON array of detections (parsedValue format), or "-" to read from stdin');
    console.error('  [output-path]      Optional output path. Default: <input>_detection.<ext>');
    console.error('  --overlays         JSON array of ROI/Tripwire overlay objects');
    console.error('  --text             JSON array of text lines to display as footer');
    console.error('  --crossings        JSON array of tripwire crossing events with motion vectors');
    process.exit(1);
  }

  // Resolve input image
  const resolvedInput = path.resolve(inputImage);
  if (!fs.existsSync(resolvedInput)) {
    console.error(`Input image not found: resolvedInput`);
    process.exit(1);
  }

  // Read detections JSON
  let detectionsJson;
  if (detectionsArg === '-') {
    detectionsJson = await readStdin();
  } else {
    detectionsJson = detectionsArg;
  }

  let detections;
  try {
    detections = JSON.parse(detectionsJson);
  } catch (err) {
    console.error(`Failed to parse detections JSON: err.message`);
    process.exit(1);
  }

  if (!Array.isArray(detections)) {
    console.error('Detections must be a JSON array');
    process.exit(1);
  }

  // Parse --overlays
  let overlays = [];
  if (flags.overlays) {
    try {
      overlays = JSON.parse(flags.overlays);
      if (!Array.isArray(overlays)) {
        console.error('--overlays must be a JSON array');
        process.exit(1);
      }
    } catch (err) {
      console.error(`Failed to parse --overlays JSON: err.message`);
      process.exit(1);
    }
  }

  // Parse --text
  let textLines = [];
  if (flags.text) {
    try {
      textLines = JSON.parse(flags.text);
      if (!Array.isArray(textLines)) {
        console.error('--text must be a JSON array');
        process.exit(1);
      }
    } catch (err) {
      console.error(`Failed to parse --text JSON: err.message`);
      process.exit(1);
    }
  }

  // Parse --crossings
  let crossings = [];
  if (flags.crossings) {
    try {
      crossings = JSON.parse(flags.crossings);
      if (!Array.isArray(crossings)) {
        console.error('--crossings must be a JSON array');
        process.exit(1);
      }
    } catch (err) {
      console.error(`Failed to parse --crossings JSON: err.message`);
      process.exit(1);
    }
  }

  // Validate: must have something to draw
  if (detections.length === 0 && overlays.length === 0 && textLines.length === 0) {
    console.error('Nothing to draw: detections, overlays, and text are all empty');
    process.exit(1);
  }

  // Default output path
  if (!outputPath) {
    const ext = path.extname(inputImage);
    const base = path.basename(inputImage, ext);
    const dir = path.dirname(resolvedInput);
    outputPath = path.join(dir, `base_detectionext`);
  } else {
    outputPath = path.resolve(outputPath);
  }

  // Read image metadata
  const image = sharp(resolvedInput);
  const metadata = await image.metadata();
  const { width, height } = metadata;

  // Build composite layers
  const composites = [];

  // Layer 1: input overlays (ROI/Tripwire) — drawn first (bottom layer)
  if (overlays.length > 0) {
    const overlaySvg = buildOverlaysSvg(overlays, width, height, crossings);
    composites.push({ input: Buffer.from(overlaySvg), top: 0, left: 0 });
  }

  // Layer 2: detection bboxes — drawn on top of overlays
  if (detections.length > 0) {
    const detSvg = buildDetectionSvg(detections, width, height);
    composites.push({ input: Buffer.from(detSvg), top: 0, left: 0 });
  }

  // Layer 3: text footer — extend canvas and draw at bottom
  let pipeline = image;
  if (textLines.length > 0) {
    const { svg: textSvg, height: textHeight } = buildTextFooterSvg(textLines, width, height);
    pipeline = pipeline.extend({
      bottom: textHeight,
      background: { r: 0, g: 0, b: 0, alpha: 1 },
    });
    composites.push({ input: Buffer.from(textSvg), top: 0, left: 0 });
  }

  // Composite and save
  await pipeline
    .composite(composites)
    .toFile(outputPath);

  console.log(outputPath);
}

import { fileURLToPath } from 'url';

if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main();
}

FILE:scripts/show-grid.mjs
#!/usr/bin/env node

import fs from 'fs';
import path from 'path';
import sharp from 'sharp';
import { fileURLToPath } from 'url';

/**
 * Escape XML special characters for safe SVG embedding.
 */
function escapeXml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/**
 * Compute adaptive grid dimensions so that cells are approximately square.
 *
 * @param {number} imageWidth
 * @param {number} imageHeight
 * @returns {{ cols: number, rows: number }}
 */
export function computeGridSize(imageWidth, imageHeight) {
  // Target ~30-42 intersection points with roughly square cells.
  // We pick the short side = 4 segments and scale the long side proportionally.
  const minSegments = 4;
  const ratio = imageWidth / imageHeight;

  let cols, rows;
  if (ratio >= 1) {
    // Landscape or square
    rows = minSegments;
    cols = Math.round(rows * ratio);
    // Clamp cols to a reasonable range
    cols = Math.max(minSegments, Math.min(cols, 12));
  } else {
    // Portrait
    cols = minSegments;
    rows = Math.round(cols / ratio);
    rows = Math.max(minSegments, Math.min(rows, 12));
  }

  return { cols, rows };
}

/**
 * Generate column labels: A, B, C, ..., Z, AA, AB, ...
 */
export function generateColLabels(count) {
  const labels = [];
  for (let i = 0; i < count; i++) {
    let label = '';
    let n = i;
    do {
      label = String.fromCharCode(65 + (n % 26)) + label;
      n = Math.floor(n / 26) - 1;
    } while (n >= 0);
    labels.push(label);
  }
  return labels;
}

/**
 * Generate row labels: 0, 1, 2, ...
 */
export function generateRowLabels(count) {
  return Array.from({ length: count }, (_, i) => String(i));
}

/**
 * Build an SVG overlay with grid lines, intersection dots, and labels.
 *
 * The SVG covers the full extended canvas (margin + image).
 *
 * @param {object} params
 * @param {number} params.imageWidth
 * @param {number} params.imageHeight
 * @param {number} params.cols - number of column segments (cols+1 vertical lines)
 * @param {number} params.rows - number of row segments (rows+1 horizontal lines)
 * @param {number} params.marginTop
 * @param {number} params.marginLeft
 * @param {number} params.marginBottom
 * @param {number} params.marginRight
 * @returns {string} SVG string
 */
export function buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight }) {
  const totalWidth = marginLeft + imageWidth + marginRight;
  const totalHeight = marginTop + imageHeight + marginBottom;
  const gridWidth = imageWidth / cols;
  const gridHeight = imageHeight / rows;

  const colLabels = generateColLabels(cols + 1);
  const rowLabels = generateRowLabels(rows + 1);

  const elements = [];
  const fontSize = 16;
  const dotR = 5;

  // Grid lines — vertical
  for (let c = 0; c <= cols; c++) {
    const x = marginLeft + c * gridWidth;
    elements.push(
      `<line x1="x" y1="marginTop" x2="x" y2="totalHeight" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
    );
  }

  // Grid lines — horizontal
  for (let r = 0; r <= rows; r++) {
    const y = marginTop + r * gridHeight;
    elements.push(
      `<line x1="marginLeft" y1="y" x2="totalWidth" y2="y" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
    );
  }

  // Intersection dots
  for (let c = 0; c <= cols; c++) {
    for (let r = 0; r <= rows; r++) {
      const cx = marginLeft + c * gridWidth;
      const cy = marginTop + r * gridHeight;
      elements.push(
        `<circle cx="cx" cy="cy" r="dotR" fill="white" stroke="rgba(0,0,0,0.7)" stroke-width="1.5"/>`
      );
    }
  }

  // Column labels — along the top margin
  for (let c = 0; c <= cols; c++) {
    const x = marginLeft + c * gridWidth;
    const textWidth = colLabels[c].length * fontSize * 0.65;
    const bgWidth = textWidth + 10;
    const bgX = x - bgWidth / 2;
    const bgY = 2;
    const bgH = fontSize + 10;
    elements.push(
      `<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
    );
    elements.push(
      `<text x="x" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(colLabels[c])</text>`
    );
  }

  // Row labels — along the left margin
  for (let r = 0; r <= rows; r++) {
    const y = marginTop + r * gridHeight;
    const label = rowLabels[r];
    const textWidth = label.length * fontSize * 0.65;
    const bgWidth = textWidth + 10;
    const bgH = fontSize + 10;
    const bgX = 2;
    const bgY = y - bgH / 2;
    elements.push(
      `<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
    );
    elements.push(
      `<text x="bgX + bgWidth / 2" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(label)</text>`
    );
  }

  return `<svg xmlns="http://www.w3.org/2000/svg" width="totalWidth" height="totalHeight">elements.join('')</svg>`;
}

/**
 * Parse named CLI flags from argv.
 */
function parseArgs(argv) {
  const positional = [];
  const flags = {};
  let i = 0;
  while (i < argv.length) {
    if (argv[i].startsWith('--') && i + 1 < argv.length) {
      const key = argv[i].slice(2);
      flags[key] = argv[i + 1];
      i += 2;
    } else {
      positional.push(argv[i]);
      i += 1;
    }
  }
  return { positional, flags };
}

async function main() {
  const { positional, flags } = parseArgs(process.argv.slice(2));
  const inputImage = positional[0];
  let outputPath = positional[1];

  if (!inputImage) {
    console.error('Usage: node show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]');
    console.error('');
    console.error('  Generates a grid reference image for specifying ROI/Tripwire coordinates.');
    console.error('  --cols   Number of column segments (default: auto based on aspect ratio)');
    console.error('  --rows   Number of row segments (default: auto based on aspect ratio)');
    process.exit(1);
  }

  const resolvedInput = path.resolve(inputImage);
  if (!fs.existsSync(resolvedInput)) {
    console.error(`Input image not found: resolvedInput`);
    process.exit(1);
  }

  // Read image metadata
  const image = sharp(resolvedInput);
  const metadata = await image.metadata();
  const { width: imageWidth, height: imageHeight } = metadata;

  // Compute grid size
  let { cols, rows } = computeGridSize(imageWidth, imageHeight);
  if (flags.cols) cols = parseInt(flags.cols, 10);
  if (flags.rows) rows = parseInt(flags.rows, 10);

  if (cols < 1 || rows < 1 || isNaN(cols) || isNaN(rows)) {
    console.error('--cols and --rows must be positive integers');
    process.exit(1);
  }

  const marginTop = 32;
  const marginLeft = 32;
  const marginBottom = 18;
  const marginRight = 18;

  // Build SVG
  const svg = buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight });

  // Default output path
  if (!outputPath) {
    const ext = path.extname(inputImage);
    const base = path.basename(inputImage, ext);
    const dir = path.dirname(resolvedInput);
    outputPath = path.join(dir, `base_gridext`);
  } else {
    outputPath = path.resolve(outputPath);
  }

  // Extend canvas for margins, composite grid SVG
  await image
    .extend({
      top: marginTop,
      left: marginLeft,
      bottom: marginBottom,
      right: marginRight,
      background: { r: 30, g: 30, b: 30, alpha: 1 },
    })
    .composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
    .toFile(outputPath);

  // Output metadata JSON for Agent consumption
  const gridWidth = imageWidth / cols;
  const gridHeight = imageHeight / rows;
  const colLabels = generateColLabels(cols + 1);
  const rowLabels = generateRowLabels(rows + 1);

  const result = {
    path: outputPath,
    cols,
    rows,
    colLabels,
    rowLabels,
    gridWidth: Math.round(gridWidth * 100) / 100,
    gridHeight: Math.round(gridHeight * 100) / 100,
    marginLeft,
    marginTop,
    imageWidth,
    imageHeight,
  };

  console.log(JSON.stringify(result));
}

if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main();
}

FILE:scripts/intent-invoke.mjs
#!/usr/bin/env node

import { querySkills } from './query.mjs';
import { callMultimodal } from './multimodal.mjs';
import { runSkill } from './invoke.mjs';
import { getWorkspaceSkills } from './workspace.mjs';

const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;
const MAX_SKILL_RETRIES = 3;

/**
 * Determine if we should use the skill based on confidence threshold.
 */
export function shouldUseSkill(skill, threshold = DEFAULT_CONFIDENCE_THRESHOLD) {
  return !!(skill && skill.epId && typeof skill.confidence === 'number' && skill.confidence >= threshold);
}

/**
 * Format the final result consistently.
 */
export function formatResult(mode, skill, detections = []) {
  return {
    success: true,
    mode,
    epId: skill?.epId || null,
    skillName: skill?.name || null,
    confidence: skill?.confidence || null,
    count: detections.length,
    detections
  };
}

/**
 * Extract detections array from a runSkill() result.
 */
function extractDetections(result) {
  if (result?.未检出) return [];
  const outputs = result?.result?.outputs;
  if (!outputs || outputs.length === 0) return [];
  const output = outputs[0];
  if (output?.parsedValue && Array.isArray(output.parsedValue)) {
    return output.parsedValue;
  }
  return [];
}

/**
 * Main orchestration function:
 * 1. Query router for skills matching intent
 * 2. Try top skills one by one (up to MAX_SKILL_RETRIES)
 * 3. If all skills fail, fallback to multimodal
 */
export async function intentInvoke(intent, imagePath, options = {}) {
  const { threshold = DEFAULT_CONFIDENCE_THRESHOLD, topK = 5 } = options;

  // Step 1: Query for matching skills
  let skills;
  try {
    skills = await querySkills(intent, { topK });
  } catch (err) {
    // Query failed, will fallback to multimodal
    skills = [];
  }

  // Step 2: Try top skills one by one
  const candidates = skills.filter(s => shouldUseSkill(s, threshold)).slice(0, MAX_SKILL_RETRIES);
  for (const skill of candidates) {
    try {
      const { result } = await runSkill(skill.epId, { input0: { image: imagePath } }, { autoROI: true });
      const detections = extractDetections(result);
      return formatResult('skill', skill, detections);
    } catch (err) {
      console.error(`Skill skill.epId (skill.name) failed: err.message`);
      // Continue to next skill
    }
  }

  // Step 3: Public skills all failed — search private workspace
  try {
    const privateSkills = await getWorkspaceSkills();
    if (privateSkills.length > 0) {
      return {
        success: true,
        mode: 'workspace-search',
        epId: null,
        skillName: null,
        confidence: null,
        count: 0,
        detections: [],
        privateSkills: privateSkills.map(s => ({
          epId: s.epId,
          displayName: s.displayName,
          description: s.description,
        })),
        hint: '公共技能无匹配。请从 privateSkills 列表中选择最匹配的技能，使用 invoke.mjs 调用。如无合适技能，可调用 multimodal.mjs 回退。',
      };
    }
  } catch (err) {
    console.error(`Private workspace search failed: err.message`);
    // Continue to multimodal fallback
  }

  // Step 4: Fallback to multimodal
  try {
    const detections = await callMultimodal(intent, imagePath);
    return formatResult('multimodal', null, detections);
  } catch (err) {
    return {
      success: false,
      error: err.message,
      mode: 'multimodal',
      epId: null,
      detections: []
    };
  }
}

/**
 * CLI entry point.
 */
async function main() {
  const intent = process.argv[2];
  const imagePath = process.argv[3];
  const threshold = parseFloat(process.argv[4]) || DEFAULT_CONFIDENCE_THRESHOLD;

  if (!intent) {
    console.error('Usage: node intent-invoke.mjs <intent> [image-path] [threshold]');
    console.error('Example: node intent-invoke.mjs "detect people falling" photo.jpg 0.8');
    console.error('');
    console.error('Parameters:');
    console.error('  intent      - Description of what to detect (required)');
    console.error('  image-path  - Path to image file (optional)');
    console.error('  threshold   - Confidence threshold 0.0-1.0 (default: 0.7)');
    process.exit(1);
  }

  try {
    const result = await intentInvoke(intent, imagePath, { threshold });
    console.log(JSON.stringify(result, null, 2));
    process.exit(result.success ? 0 : 1);
  } catch (err) {
    console.error(JSON.stringify({
      success: false,
      error: err.message
    }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/workspace.mjs
#!/usr/bin/env node

import path from 'path';
import { fileURLToPath } from 'url';
import { httpsRequest, getApiKey, workspacesGetUrl, workspaceSkillsGetUrl } from './utils.mjs';
import { readCache, writeCache, getCacheDir } from './cache.mjs';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const CACHE_DIR = getCacheDir();
const WORKSPACE_CACHE_FILE = path.join(CACHE_DIR, 'workspace-cache.json');
const SKILLS_CACHE_FILE = path.join(CACHE_DIR, 'skills-cache.json');
const SKILLS_CACHE_TTL = 60 * 60 * 1000; // 1 hour in ms

function buildEpId(workspaceId, localName) {
  // localName format: c-sk-XXXXX → extract XXXXX part
  const suffix = localName.replace(/^c-sk-/, '');
  return `ep-workspaceId-suffix`;
}

// ── API Functions ────────────────────────────────────────────────────

/**
 * Get the default workspace ID. Uses long-lived cache (tied to API key).
 */
export async function getDefaultWorkspaceId() {
  const cached = readCache(WORKSPACE_CACHE_FILE);
  if (cached?.workspaceId) return cached.workspaceId;

  const apiKey = getApiKey();
  const url = workspacesGetUrl();
  const { statusCode, body } = await httpsRequest(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
    },
  });

  if (statusCode !== 200) {
    throw new Error(`workspaces/get failed: HTTP statusCode - body`);
  }

  const resp = JSON.parse(body);
  if (!resp.success || !resp.page?.result) {
    throw new Error(`workspaces/get unexpected response: body`);
  }

  const defaultWorkspace = resp.page.result.find(ws => ws.type === 'default');
  if (!defaultWorkspace) {
    throw new Error('No default workspace found');
  }

  const workspaceId = defaultWorkspace.id;
  writeCache(WORKSPACE_CACHE_FILE, { workspaceId, timestamp: Date.now() });
  return workspaceId;
}

/**
 * Get all released SkillApi skills from the default workspace.
 * Uses file cache with 1-hour TTL. Each skill includes pre-constructed epId.
 */
export async function getWorkspaceSkills() {
  const cached = readCache(SKILLS_CACHE_FILE, SKILLS_CACHE_TTL);
  if (cached?.skills) return cached.skills;

  const workspaceId = await getDefaultWorkspaceId();
  const apiKey = getApiKey();
  const url = workspaceSkillsGetUrl(workspaceId);

  const allSkills = [];
  let pageNo = 1;
  const pageSize = 50;

  // Paginate to fetch all skills
  while (true) {
    const { statusCode, body } = await httpsRequest(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer apiKey`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        workspaceID: workspaceId,
        orderBy: 'last_published_at',
        order: 'desc',
        pageNo,
        pageSize,
        publicationKinds: ['SkillApi'],
        status: 'Released',
      }),
    });

    if (statusCode !== 200) {
      throw new Error(`skills/get failed: HTTP statusCode - body`);
    }

    const resp = JSON.parse(body);
    if (!resp.success || !resp.page?.result) {
      throw new Error(`skills/get unexpected response: body`);
    }

    const skills = resp.page.result.map(skill => ({
      epId: buildEpId(workspaceId, skill.localName),
      displayName: skill.displayName,
      description: skill.description || '',
      localName: skill.localName,
      workspaceId,
    }));

    allSkills.push(...skills);

    const totalCount = resp.page.totalCount || 0;
    if (allSkills.length >= totalCount || skills.length < pageSize) break;
    pageNo++;
  }

  writeCache(SKILLS_CACHE_FILE, { workspaceId, skills: allSkills, timestamp: Date.now() });
  return allSkills;
}

// ── CLI ──────────────────────────────────────────────────────────────

async function main() {
  const command = process.argv[2];

  if (command === 'list-skills') {
    try {
      const skills = await getWorkspaceSkills();
      if (skills.length === 0) {
        console.log('No private workspace skills found.');
        return;
      }
      console.log(`Private workspace skills (skills.length total):\n`);
      for (const skill of skills) {
        const desc = skill.description ? ` - skill.description` : '';
        console.log(`  skill.epId  skill.displayNamedesc`);
      }
    } catch (err) {
      console.error(`Error: err.message`);
      process.exit(1);
    }
  } else {
    console.log('Usage: node workspace.mjs list-skills');
    console.log('');
    console.log('Commands:');
    console.log('  list-skills    List all released private workspace skills');
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/query.mjs
#!/usr/bin/env node

import { httpsRequest, getApiKey, routerQueryUrl } from './utils.mjs';

/**
 * Parse and validate the router query response.
 * Returns array of skills sorted by confidence (highest first).
 */
export function parseQueryResponse(response) {
  // Handle internal API response format
  if (response?.data?.candidates && Array.isArray(response.data.candidates)) {
    return response.data.candidates
      .filter(s => s.skillId && typeof s.score === 'number')
      .map(s => ({
        epId: s.skillId,
        name: s.displayName,
        description: s.description,
        confidence: s.score
      }))
      .sort((a, b) => b.confidence - a.confidence);
  }
  // Fallback to empty array
  return [];
}

/**
 * Query skills by intent text.
 * @param {string} intent - User's intent text
 * @param {object} options - Optional parameters
 * @param {number} options.topK - Maximum number of results (default: 5)
 * @returns {Promise<Array>} Array of matching skills with confidence scores
 */
export async function querySkills(intent, options = {}) {
  const apiKey = getApiKey();
  const { topK = 5 } = options;

  const requestBody = JSON.stringify({
    query: intent,
    topK
  });

  const response = await httpsRequest(routerQueryUrl(), {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    body: requestBody,
  });

  if (response.statusCode !== 200) {
    throw new Error(`Query failed with status response.statusCode: response.body`);
  }

  let result;
  try {
    result = JSON.parse(response.body);
  } catch (err) {
    throw new Error(`Failed to parse query response: err.message`);
  }

  return parseQueryResponse(result);
}

/**
 * CLI entry point for testing query functionality.
 */
async function main() {
  const intent = process.argv[2];
  const topK = parseInt(process.argv[3], 10) || 5;

  if (!intent) {
    console.error('Usage: node query.mjs <intent> [topK]');
    console.error('Example: node query.mjs "detect person falling" 3');
    process.exit(1);
  }

  try {
    const skills = await querySkills(intent, { topK });
    console.log(JSON.stringify({
      success: true,
      intent,
      count: skills.length,
      skills
    }, null, 2));
  } catch (err) {
    console.error(JSON.stringify({
      success: false,
      error: err.message
    }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/types.mjs
/**
 * types.mjs — Yijian type constants, classification, serialization & deserialization.
 *
 * Centralizes all type-related logic previously scattered across invoke.mjs
 * and intent-invoke.mjs.
 */

import fs from 'fs';
import path from 'path';
import crypto from 'crypto';

// ---------------------------------------------------------------------------
// 1. Type constants & classification
// ---------------------------------------------------------------------------

/** Basic scalar types — value is a plain string. */
export const BASIC_TYPES = ['String', 'TemplateString', 'Integer', 'Double', 'Boolean', 'Time'];

/** Complex object types — value needs JSON.stringify. */
export const COMPLEX_TYPES = ['Image', 'Detection', 'TrackDetection', 'Attribute', 'ROI', 'Tripwire'];

/** Inner types that can be wrapped in Array<T>. */
export const ARRAY_INNER_TYPES = [...BASIC_TYPES, ...COMPLEX_TYPES, 'Target'];

const _basicSet = new Set(BASIC_TYPES);
const _complexSet = new Set(COMPLEX_TYPES);
const _arrayInnerSet = new Set(ARRAY_INNER_TYPES);

/** Types whose output can be visualized (draw bbox / polygon / polyline). */
const _visualizableTypes = new Set(['Detection', 'TrackDetection', 'ROI', 'Tripwire']);

/** Types that represent input-side visual overlays (ROI, Tripwire). */
const _inputVisualizableTypes = new Set(['ROI', 'Tripwire']);

export function isBasicType(type) {
  return _basicSet.has(type);
}

export function isComplexType(type) {
  return _complexSet.has(type);
}

/**
 * Returns true if `type` is `Array<...>`.
 */
export function isArrayType(type) {
  return typeof type === 'string' && type.startsWith('Array<') && type.endsWith('>');
}

/**
 * Unwrap `Array<T>` → `T`. Returns `null` for non-array types.
 */
export function unwrapArrayType(type) {
  if (!isArrayType(type)) return null;
  return type.slice(6, -1); // strip "Array<" and ">"
}

/**
 * Returns true if the type is a visual/complex type (Image, Detection, etc.),
 * including Array-wrapped forms.
 */
export function isVisualType(type) {
  if (_complexSet.has(type)) return true;
  const inner = unwrapArrayType(type);
  return inner !== null && _complexSet.has(inner);
}

/**
 * Returns true if the type supports output visualization
 * (Detection, TrackDetection, ROI, Tripwire, and their Array<> forms).
 */
export function hasVisualization(type) {
  if (_visualizableTypes.has(type)) return true;
  const inner = unwrapArrayType(type);
  return inner !== null && _visualizableTypes.has(inner);
}

/**
 * Returns true if the type is an input-side visualizable overlay
 * (ROI, Tripwire, and their Array<> forms).
 */
export function isInputVisualizableType(type) {
  if (_inputVisualizableTypes.has(type)) return true;
  const inner = unwrapArrayType(type);
  return inner !== null && _inputVisualizableTypes.has(inner);
}

// ---------------------------------------------------------------------------
// 2. Field schema definitions
// ---------------------------------------------------------------------------

export const IMAGE_FIELDS = {
  imageId:      { type: 'String',  required: false },
  imageData:    { type: 'String',  required: false },
  sourceId:     { type: 'String',  required: false },
  imageWidth:   { type: 'Integer', required: false },
  imageHeight:  { type: 'Integer', required: false },
  timestamp:    { type: 'Integer', required: false },
};

export const DETECTION_FIELDS = {
  image_id:     { type: 'String',         required: false },
  source_id:    { type: 'String',         required: false },
  image_url:    { type: 'String',         required: false },
  description:  { type: 'String',         required: false },
  answers:      { type: 'Array<String>',  required: false },
  image_base64: { type: 'Array<String>',  required: false },
  predictions:  { type: 'Array<Object>',  required: false },
  roi_ids:      { type: 'Array<String>',  required: false },
  answer:       { type: 'String',         required: false },
};

/** TrackDetection shares the same schema as Detection. */
export const TRACK_DETECTION_FIELDS = DETECTION_FIELDS;

export const ATTRIBUTE_FIELDS = {
  image_id:     { type: 'String',         required: false },
  image_url:    { type: 'String',         required: false },
  description:  { type: 'String',         required: false },
  answer:       { type: 'Array<String>',  required: false },
  image_base64: { type: 'Array<String>',  required: false },
  predictions:  { type: 'Array<Object>',  required: false },
  roiIds:       { type: 'Array<String>',  required: false },
};

export const ROI_FIELDS = {
  id:           { type: 'String',        required: false },
  name:         { type: 'String',        required: false },
  displayName:  { type: 'String',        required: false },
  points:       { type: 'Array<Number>', required: true },
  iou:          { type: 'Float',         required: false, default: 0.0 },
  target_ratio: { type: 'Float',         required: false, default: 0.5 },
  kind:         { type: 'String',        required: true,  default: 'ROI' },
  interested:   { type: 'Boolean',       required: false, default: true },
  direction:    { type: 'String',        required: false },
  visuals:      { type: 'Object',        required: false },
};

export const TRIPWIRE_FIELDS = {
  id:           { type: 'String',        required: false },
  name:         { type: 'String',        required: false },
  displayName:  { type: 'String',        required: false },
  points:       { type: 'Array<Number>', required: true },
  iou:          { type: 'Float',         required: false, default: 0.0 },
  target_ratio: { type: 'Float',         required: false, default: 0.5 },
  kind:         { type: 'String',        required: true,  default: 'TripWire' },
  interested:   { type: 'Boolean',       required: false, default: true },
  direction:    { type: 'String',        required: true,  default: 'TwoWay' },
  visuals:      { type: 'Object',        required: false },
};

/** Lookup field schema by type name. */
export const FIELD_SCHEMAS = {
  Image:          IMAGE_FIELDS,
  Detection:      DETECTION_FIELDS,
  TrackDetection: TRACK_DETECTION_FIELDS,
  Attribute:      ATTRIBUTE_FIELDS,
  ROI:            ROI_FIELDS,
  Tripwire:       TRIPWIRE_FIELDS,
};

// ---------------------------------------------------------------------------
// 3. Serialization — buildValue(type, userValue)
// ---------------------------------------------------------------------------

/**
 * Simple heuristic to check if a string looks like base64-encoded data.
 */
function isBase64(str) {
  if (typeof str !== 'string' || str.length < 100) return false;
  return /^[A-Za-z0-9+/\n]+=*$/.test(str.trim());
}

/**
 * Parse image dimensions from buffer (supports JPEG, PNG and WEBP).
 */
export function getImageDimensions(buffer) {
  // PNG: bytes 16-23 contain width and height in the IHDR chunk
  if (buffer[0] === 0x89 && buffer[1] === 0x50 && buffer[2] === 0x4E && buffer[3] === 0x47) {
    return {
      width: buffer.readUInt32BE(16),
      height: buffer.readUInt32BE(20),
    };
  }

  // JPEG: search for SOF markers
  if (buffer[0] === 0xFF && buffer[1] === 0xD8) {
    let offset = 2;
    while (offset < buffer.length - 1) {
      if (buffer[offset] !== 0xFF) { offset++; continue; }
      const marker = buffer[offset + 1];
      if (marker === 0xD9) break; // EOI
      if (marker >= 0xC0 && marker <= 0xCF && marker !== 0xC4 && marker !== 0xC8) {
        return {
          height: buffer.readUInt16BE(offset + 5),
          width: buffer.readUInt16BE(offset + 7),
        };
      }
      const segLen = buffer.readUInt16BE(offset + 2);
      offset += 2 + segLen;
    }
  }

  // WEBP: RIFF????WEBP VP8 /VP8L/VP8X chunks
  if (buffer.length >= 12 &&
      buffer[0] === 0x52 && buffer[1] === 0x49 && buffer[2] === 0x46 && buffer[3] === 0x46 &&
      buffer[8] === 0x57 && buffer[9] === 0x45 && buffer[10] === 0x42 && buffer[11] === 0x50) {
    const chunkId = buffer.slice(12, 16).toString('ascii');
    if (chunkId === 'VP8 ' && buffer.length >= 30) {
      // Lossy: width/height at bytes 26-29 (14-bit each, little-endian, mask 0x3FFF)
      return {
        width: (buffer.readUInt16LE(26) & 0x3FFF),
        height: (buffer.readUInt16LE(28) & 0x3FFF),
      };
    }
    if (chunkId === 'VP8L' && buffer.length >= 25) {
      // Lossless: 28-bit packed fields starting at byte 21
      const bits = buffer.readUInt32LE(21);
      return {
        width: (bits & 0x3FFF) + 1,
        height: ((bits >> 14) & 0x3FFF) + 1,
      };
    }
    if (chunkId === 'VP8X' && buffer.length >= 30) {
      // Extended: 24-bit little-endian width-1 at byte 24, height-1 at byte 27
      return {
        width: buffer.readUIntLE(24, 3) + 1,
        height: buffer.readUIntLE(27, 3) + 1,
      };
    }
  }

  return { width: 0, height: 0 };
}

/**
 * Read an image file and return { base64, width, height }.
 */
export function readImageFile(filePath) {
  const resolved = path.resolve(filePath);
  if (!fs.existsSync(resolved)) {
    throw new Error(`Image file not found: resolved`);
  }
  const buffer = fs.readFileSync(resolved);
  const { width, height } = getImageDimensions(buffer);
  return { base64: buffer.toString('base64'), width, height };
}

/**
 * Compute a short hash ID from the first 64 KB of a file.
 * Returns a stable 16-char hex string (MD5 prefix).
 */
export function fileHashId(filePath) {
  const fd = fs.openSync(path.resolve(filePath), 'r');
  const buf = Buffer.alloc(65536);
  const bytesRead = fs.readSync(fd, buf, 0, 65536, 0);
  fs.closeSync(fd);
  return crypto.createHash('md5').update(buf.subarray(0, bytesRead)).digest('hex').slice(0, 16);
}

/**
 * Build the Image value object and JSON-stringify it.
 */
function buildImageValue(userValue) {
  // (1) Pre-built object with imageData → pass through
  if (typeof userValue === 'object' && userValue !== null && userValue.imageData) {
    return JSON.stringify(userValue);
  }

  // (2) Object form { file, sourceId?, imageId?, timestamp? } or { image, sourceId?, imageId?, timestamp? } — for video frames
  if (typeof userValue === 'object' && userValue !== null && (userValue.file || userValue.image)) {
    const filePath = userValue.file || userValue.image;
    const img = readImageFile(filePath);
    return JSON.stringify({
      imageData: img.base64,
      imageId: userValue.imageId ?? fileHashId(filePath),
      sourceId: userValue.sourceId ?? fileHashId(filePath),
      imageWidth: img.width,
      imageHeight: img.height,
      timestamp: userValue.timestamp ?? Date.now(),
    });
  }

  // (3) String path or base64 (existing logic)
  let imageData, imageWidth = 0, imageHeight = 0;
  let filePath = null;
  if (typeof userValue === 'string') {
    if (isBase64(userValue)) {
      imageData = userValue;
    } else {
      filePath = userValue;
      const img = readImageFile(userValue);
      imageData = img.base64;
      imageWidth = img.width;
      imageHeight = img.height;
    }
  } else {
    throw new Error(`Unsupported image value type: typeof userValue`);
  }

  return JSON.stringify({
    imageData,
    imageId: filePath ? fileHashId(filePath) : '0',
    sourceId: filePath ? fileHashId(filePath) : '',
    imageWidth,
    imageHeight,
    timestamp: Date.now(),
  });
}

/**
 * Apply default values from a field schema to a user-provided object.
 */
function applyDefaults(obj, fields) {
  const result = { ...obj };
  for (const [key, def] of Object.entries(fields)) {
    if (result[key] === undefined && def.default !== undefined) {
      result[key] = def.default;
    }
  }
  return result;
}

/**
 * Build a ROI value: fill in defaults (kind='ROI', etc.), JSON.stringify.
 */
function buildROIValue(userValue) {
  if (typeof userValue === 'string') return userValue; // assume pre-serialized
  const obj = applyDefaults(userValue, ROI_FIELDS);
  return JSON.stringify(obj);
}

/**
 * Build a Tripwire value: fill in defaults (kind='TripWire', direction, etc.), JSON.stringify.
 */
function buildTripwireValue(userValue) {
  if (typeof userValue === 'string') return userValue;
  const obj = applyDefaults(userValue, TRIPWIRE_FIELDS);
  return JSON.stringify(obj);
}

/**
 * Build a complex object value (Detection, TrackDetection, Attribute).
 * If already an object, JSON.stringify; if string, assume pre-serialized.
 */
function buildGenericComplexValue(userValue) {
  if (typeof userValue === 'string') return userValue;
  return JSON.stringify(userValue);
}

/**
 * Unified entry point: serialize a user value based on its Yijian type.
 *
 * @param {string} type   - Yijian type (e.g. 'Image', 'ROI', 'Array<Detection>')
 * @param {*} userValue   - user-provided value
 * @returns {string}      - serialized string suitable for the API
 */
export function buildValue(type, userValue) {
  if (userValue === undefined || userValue === null) return '';

  // Array<T> — recurse on each element
  const inner = unwrapArrayType(type);
  if (inner !== null) {
    const arr = Array.isArray(userValue) ? userValue : [userValue];
    const built = arr.map(elem => {
      const s = buildValue(inner, elem);
      // For complex types the inner buildValue returns JSON string; parse it back
      // so the outer stringify produces a proper array.
      if (isComplexType(inner)) {
        try { return JSON.parse(s); } catch { return s; }
      }
      return s;
    });
    return JSON.stringify(built);
  }

  switch (type) {
    case 'Image':
      return buildImageValue(userValue);
    case 'ROI':
      return buildROIValue(userValue);
    case 'Tripwire':
      return buildTripwireValue(userValue);
    case 'Detection':
    case 'TrackDetection':
    case 'Attribute':
      return buildGenericComplexValue(userValue);
    default:
      // Basic types — convert to string
      if (typeof userValue === 'object') return JSON.stringify(userValue);
      return String(userValue);
  }
}

// ---------------------------------------------------------------------------
// 4. Deserialization — parseValue(type, valueString)
// ---------------------------------------------------------------------------

/**
 * Parse Detection / TrackDetection value string into structured results.
 * Extracts key fields from predictions.
 */
function parseDetections(value) {
  const images = JSON.parse(value);
  const detections = [];
  for (const image of images) {
    for (const pred of (image.predictions || [])) {
      detections.push({
        bbox: pred.bbox,
        confidence: pred.confidence,
        category: pred.categories && pred.categories.length > 0
          ? { id: pred.categories[0].id, name: pred.categories[0].name, confidence: pred.categories[0].confidence }
          : null,
        track_id: pred.track_id,
        area: pred.area,
      });
    }
  }
  return detections;
}

/**
 * Parse Attribute value string into structured results.
 * Similar to Detection but also extracts answer field.
 */
function parseAttributes(value) {
  const images = JSON.parse(value);
  const results = [];
  for (const image of images) {
    const entry = {
      answer: image.answer,
      predictions: [],
    };
    for (const pred of (image.predictions || [])) {
      entry.predictions.push({
        bbox: pred.bbox,
        confidence: pred.confidence,
        categories: pred.categories,
        answer: pred.answer,
      });
    }
    results.push(entry);
  }
  return results;
}

/**
 * Unified entry point: deserialize a value string based on its Yijian type.
 *
 * @param {string} type        - Yijian type
 * @param {string} valueString - raw value string from API response
 * @returns {*}                - parsed value
 */
export function parseValue(type, valueString) {
  if (valueString === undefined || valueString === null || valueString === '') return valueString;

  // Array<T>
  const inner = unwrapArrayType(type);
  if (inner !== null) {
    // For Detection/TrackDetection arrays the entire value is already the images array
    if (inner === 'Detection' || inner === 'TrackDetection') {
      return parseDetections(valueString);
    }
    if (inner === 'Attribute') {
      return parseAttributes(valueString);
    }
    // Generic array: parse, then recurse on each element
    const arr = JSON.parse(valueString);
    return arr.map(elem => {
      const s = typeof elem === 'string' ? elem : JSON.stringify(elem);
      return parseValue(inner, s);
    });
  }

  switch (type) {
    case 'Detection':
    case 'TrackDetection':
      // Single detection (wrapped in an array by API)
      return parseDetections(valueString);
    case 'Attribute':
      return parseAttributes(valueString);
    case 'ROI':
    case 'Tripwire':
      return JSON.parse(valueString);
    case 'Image':
      return JSON.parse(valueString);
    default:
      // Basic types — return as-is
      return valueString;
  }
}

/**
 * Post-process an outputs array: parse complex types and attach parsedValue.
 */
export function parseOutputs(outputs) {
  for (const field of outputs) {
    if (field.value && (isComplexType(field.type) || isArrayType(field.type))) {
      try {
        field.parsedValue = parseValue(field.type, field.value);
      } catch {
        // keep original value if parsing fails
      }
    }
  }
}

// ---------------------------------------------------------------------------
// 5. Image data URI helpers
// ---------------------------------------------------------------------------

const MIME_MAP = {
  '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
  '.png': 'image/png', '.gif': 'image/gif',
  '.webp': 'image/webp', '.bmp': 'image/bmp',
};

/**
 * Convert a local file path to a base64 data URI.
 */
export function imageToDataUri(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mime = MIME_MAP[ext] || 'image/jpeg';
  const data = fs.readFileSync(filePath);
  return `data:mime;base64,data.toString('base64')`;
}

/**
 * Check if a string looks like a local file path (not a URL or data URI).
 */
export function isLocalFilePath(str) {
  return str && !str.startsWith('http://') && !str.startsWith('https://') && !str.startsWith('data:');
}

// ---------------------------------------------------------------------------
// 6. SKILL.md generation helpers
// ---------------------------------------------------------------------------

/**
 * Returns hints used when generating SKILL.md for a given type.
 */
export function getSkillMdHints(type) {
  const hints = {
    inputNote: null,
    hasVisualization: false,
    visualizationCmd: null,
  };

  // Check the raw type or inner type
  const inner = unwrapArrayType(type) || type;

  if (inner === 'Image') {
    hints.inputNote = '> pass a local file path (auto base64-encoded), or `{ file, sourceId?, imageId?, timestamp? }` / `{ image, sourceId?, imageId?, timestamp? }` for video frames — see **Video Frame Extraction Guide**.';
  } else if (inner === 'ROI') {
    hints.inputNote = '> requires drawing regions on the image — see **ROI / Tripwire Input Guide** below.';
  } else if (inner === 'Tripwire') {
    hints.inputNote = '> requires drawing tripwire lines on the image — see **ROI / Tripwire Input Guide** below.';
  }

  if (hasVisualization(type)) {
    hints.hasVisualization = true;
    if (inner === 'Detection' || inner === 'TrackDetection') {
      hints.visualizationCmd = 'visualize.mjs';
    } else if (inner === 'ROI' || inner === 'Tripwire') {
      hints.visualizationCmd = 'visualize.mjs';
    }
  }

  return hints;
}

FILE:scripts/utils.mjs
#!/usr/bin/env node

import https from 'https';
import http from 'http';

/**
 * Make an HTTPS (or HTTP) request. Returns a Promise that resolves to { statusCode, headers, body }.
 */
export function httpsRequest(url, options = {}) {
  return new Promise((resolve, reject) => {
    const parsedUrl = new URL(url);
    const mod = parsedUrl.protocol === 'https:' ? https : http;
    const reqOptions = {
      hostname: parsedUrl.hostname,
      port: parsedUrl.port || (parsedUrl.protocol === 'https:' ? 443 : 80),
      path: parsedUrl.pathname + parsedUrl.search,
      method: options.method || 'GET',
      headers: options.headers || {},
    };

    const req = mod.request(reqOptions, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        const body = Buffer.concat(chunks).toString('utf-8');
        resolve({ statusCode: res.statusCode, headers: res.headers, body });
      });
    });

    req.on('error', reject);

    if (options.body) {
      req.write(options.body);
    }
    req.end();
  });
}

/**
 * Get the API key from environment. Throws if not set.
 */
export function getApiKey() {
  const key = process.env.YIJIAN_API_KEY;
  if (!key) {
    throw new Error('YIJIAN_API_KEY environment variable is not set. Please configure it in ~/.claude/settings.json under "env".');
  }
  return key;
}

/**
 * Construct the metadata URL for a given ep-id.
 */
export function metadataUrl(epId) {
  return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/metadata`;
}

/**
 * Construct the run URL for a given ep-id.
 */
export function runUrl(epId) {
  return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/run`;
}

/**
 * Construct the router query URL for intent-based skill matching.
 */
export function routerQueryUrl() {
  return 'https://yijian.baidubce.com/harness/v1/router/query';
}

/**
 * Construct the router multimodal URL for direct inference.
 */
export function routerMultimodalUrl() {
  return 'https://yijian.baidubce.com/harness/v1/router/multimodal';
}

/**
 * Construct the workspaces get URL.
 */
export function workspacesGetUrl() {
  return 'https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/get';
}

/**
 * Construct the workspace skills get URL.
 */
export function workspaceSkillsGetUrl(workspaceId) {
  return `https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/workspaceId/skills/get`;
}

FILE:scripts/invoke.mjs
#!/usr/bin/env node

import { httpsRequest, getApiKey, runUrl } from './utils.mjs';
import { buildValue, parseOutputs, readImageFile } from './types.mjs';

/**
 * Read all stdin as a string (for piped input).
 */
function readStdin() {
  return new Promise((resolve, reject) => {
    const chunks = [];
    process.stdin.setEncoding('utf-8');
    process.stdin.on('data', (chunk) => chunks.push(chunk));
    process.stdin.on('end', () => resolve(chunks.join('')));
    process.stdin.on('error', reject);
  });
}

/**
 * Build a full-image ROI covering the entire image bounds.
 */
export function fullImageROI(width, height) {
  return { id: '1', name: 'zone', kind: 'ROI', points: [0, 0, width, 0, width, height, 0, height] };
}

/**
 * Determine Yijian type from a field name heuristic.
 */
function detectFieldType(fieldName, fieldValue) {
  const lower = fieldName.toLowerCase();
  if (lower.includes('image') || lower.endsWith('image')) return 'Image';
  if (lower.includes('roi')) return 'ROI';
  if (lower.includes('tripwire')) return 'Tripwire';
  if (typeof fieldValue === 'number') return 'Integer';
  return 'String';
}

/**
 * Build the `inputs` array for the run request body.
 * Supports Image, ROI, Tripwire, Integer, and String types.
 * When autoROI is enabled and an Image field is present without a corresponding ROI,
 * automatically adds a full-image ROI.
 */
function buildRunInputs(userInputs, options = {}) {
  const { autoROI = false } = options;
  const inputs = [];

  for (const [inputName, inputData] of Object.entries(userInputs)) {
    const schema = [];
    let hasImage = false;
    let hasROI = false;
    let imagePath = null;

    for (const [fieldName, fieldValue] of Object.entries(inputData)) {
      const type = detectFieldType(fieldName, fieldValue);

      if (type === 'Image') {
        hasImage = true;
        imagePath = typeof fieldValue === 'object' && fieldValue !== null
          ? (fieldValue.file || fieldValue.image || null)
          : (typeof fieldValue === 'string' && !fieldValue.startsWith('data:') ? fieldValue : null);
      }
      if (type === 'ROI') hasROI = true;

      schema.push({
        name: fieldName,
        type: type,
        value: buildValue(type, fieldValue),
      });
    }

    // Auto-fill full-image ROI when enabled and not provided by user
    if (autoROI && hasImage && !hasROI && imagePath) {
      try {
        const img = readImageFile(imagePath);
        schema.push({
          name: 'roi',
          type: 'ROI',
          value: buildValue('ROI', fullImageROI(img.width, img.height)),
        });
      } catch (err) {
        // If we can't read the image for ROI, skip auto-fill
        // (buildValue('Image', ...) will read it again anyway and throw if invalid)
      }
    }

    inputs.push({ name: inputName, schema });
  }

  return inputs;
}

/**
 * Invoke a skill by epId with the given user inputs.
 *
 * @param {string} epId - Skill endpoint ID (e.g. "ep-public-xxx")
 * @param {object} userInputs - User input { input0: { image: "path", roi: {...} } }
 * @param {object} options
 * @param {boolean} options.autoROI - Auto-fill full-image ROI when not provided (default: true)
 * @returns {Promise<{success: boolean, epId: string, result?: object, message?: string}>}
 */
export async function runSkill(epId, userInputs = {}, options = {}) {
  const { autoROI = true } = options;
  const apiKey = getApiKey();

  const inputs = buildRunInputs(userInputs, { autoROI });
  const requestBody = JSON.stringify({ inputs });

  const response = await httpsRequest(runUrl(epId), {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    body: requestBody,
  });

  if (response.statusCode !== 200) {
    // Check for "conditional branch did not hit" error (no detection)
    try {
      const parsed = JSON.parse(response.body);
      if (parsed.message?.global?.detail?.includes('conditional branch did not hit')) {
        return {
          success: true,
          epId,
          result: { 未检出: true, detail: parsed.message.global.detail },
        };
      }
    } catch {}
    throw new Error(`Skill invocation failed (response.statusCode): response.body`);
  }

  let result;
  try {
    result = JSON.parse(response.body);
  } catch {
    result = { rawResponse: response.body };
  }

  // Handle "no detection" responses: API returns 200 but success:false
  // This means the image didn't produce results matching the skill's conditions,
  // NOT an actual error. Treat as a successful invocation with no detections.
  if (result.success === false && result.message?.code === 'UnknownError') {
    const detail = result.message?.global?.detail || '';
    return {
      success: true,
      epId,
      result: { 未检出: true, detail, rawMessage: result.message },
    };
  }

  // Post-process: parse complex-type outputs
  if (result.result?.outputs) {
    parseOutputs(result.result.outputs);
  }

  return { success: true, epId, result };
}

/**
 * CLI entry point.
 */
async function main() {
  const epId = process.argv[2];
  let inputArg = process.argv[3];

  if (!epId) {
    console.error('Usage: node invoke.mjs <ep-id> \'<input-json>\' | -');
    console.error('Use - as input arg to read from stdin');
    console.error('');
    console.error('Input JSON format:');
    console.error('  { "input0": { "image": "/path/to/image.jpg" } }');
    console.error('  { "input0": { "image": "/path/to/image.jpg", "roi": { "id":"1","name":"zone","kind":"ROI","points":[...] } } }');
    process.exit(1);
  }

  // Parse user input JSON
  let userInputs;
  if (inputArg === '-') {
    inputArg = await readStdin();
  }
  if (!inputArg) {
    userInputs = {};
  } else {
    try {
      userInputs = JSON.parse(inputArg);
    } catch (err) {
      console.error(`Failed to parse input JSON: err.message`);
      process.exit(1);
    }
  }

  try {
    const output = await runSkill(epId, userInputs, { autoROI: true });
    console.log(JSON.stringify(output, null, 2));
    process.exit(output.success ? 0 : 1);
  } catch (err) {
    console.error(JSON.stringify({ success: false, epId, error: err.message }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/cache.mjs
#!/usr/bin/env node

import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const CACHE_DIR = path.join(__dirname, '..', '.cache');

function ensureCacheDir() {
  if (!fs.existsSync(CACHE_DIR)) {
    fs.mkdirSync(CACHE_DIR, { recursive: true });
  }
}

/**
 * Read a JSON cache file. Returns null if missing, expired, or invalid.
 */
export function readCache(filePath, ttl) {
  if (!fs.existsSync(filePath)) return null;
  try {
    const data = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
    if (ttl && Date.now() - data.timestamp > ttl) return null;
    return data;
  } catch {
    return null;
  }
}

/**
 * Write a JSON cache file.
 */
export function writeCache(filePath, data) {
  ensureCacheDir();
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf-8');
}

/**
 * Get the cache directory path.
 */
export function getCacheDir() {
  return CACHE_DIR;
}

FILE:scripts/list.mjs
#!/usr/bin/env node

import { querySkills } from './query.mjs';

/**
 * List skills matching a query intent.
 * This helps users discover what skills are available for their use case.
 *
 * Usage: node list.mjs "detect people"
 */

async function main() {
  const intent = process.argv[2];
  const topK = parseInt(process.argv[3], 10) || 10;

  if (!intent) {
    console.log('Usage: node list.mjs <intent> [topK]');
    console.log('Example: node list.mjs "detect people falling" 5');
    console.log('');
    console.log('This queries the Yijian platform for skills matching your intent.');
    process.exit(0);
  }

  try {
    const skills = await querySkills(intent, { topK });

    if (skills.length === 0) {
      console.log('No skills found matching your intent.');
      console.log('You can still use multimodal inference: node intent-invoke.mjs "' + intent + '" <image>');
      process.exit(0);
    }

    console.log(`Found skills.length skill(s) matching "intent":\n`);

    skills.forEach((skill, index) => {
      const confidence = (skill.confidence * 100).toFixed(1);
      console.log(`index + 1. skill.name || skill.epId`);
      console.log(`   ID: skill.epId`);
      console.log(`   Match: confidence%`);
      if (skill.description) {
        console.log(`   Description: skill.description`);
      }
      console.log('');
    });

    console.log('To use a skill directly:');
    console.log(`  node intent-invoke.mjs "intent" <image-path>`);
  } catch (err) {
    console.error('Error:', err.message);
    process.exit(1);
  }
}

main();

FILE:scripts/multimodal.mjs
#!/usr/bin/env node

import { httpsRequest, getApiKey, routerMultimodalUrl } from './utils.mjs';
import { imageToDataUri, isLocalFilePath } from './types.mjs';

/**
 * Call multimodal router for direct inference using internal API.
 * @param {string} text - User's text query
 * @param {string} imageUrl - URL or local file path to the image (optional)
 * @returns {Promise<Object>} API response
 */
export async function callMultimodal(text, imageUrl) {
  const apiKey = getApiKey();
  const messages = [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: text
        }
      ]
    }
  ];

  // Add image if provided
  if (imageUrl) {
    // Convert local file path to data URI
    const resolvedUrl = isLocalFilePath(imageUrl) ? imageToDataUri(imageUrl) : imageUrl;
    messages[0].content.push({
      type: 'image_url',
      image_url: {
        url: resolvedUrl
      }
    });
  }

  const requestBody = {
    messages
  };

  const response = await httpsRequest(routerMultimodalUrl(), {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    body: JSON.stringify(requestBody),
  });

  if (response.statusCode !== 200) {
    throw new Error(`Multimodal call failed with status response.statusCode: response.body`);
  }

  let result;
  try {
    result = JSON.parse(response.body);
  } catch (err) {
    throw new Error(`Failed to parse multimodal response: err.message`);
  }

  return result;
}

/**
 * CLI entry point for testing multimodal functionality.
 */
async function main() {
  const text = process.argv[2];
  const imageUrl = process.argv[3];

  if (!text) {
    console.error('Usage: node multimodal.mjs <text> [image-url]');
    console.error('Example: node multimodal.mjs "图片里有什么"');
    console.error('Example: node multimodal.mjs "图片里有什么" "http://example.com/image.jpg"');
    process.exit(1);
  }

  try {
    const result = await callMultimodal(text, imageUrl);
    console.log(JSON.stringify({
      success: true,
      text,
      imageUrl: imageUrl || null,
      result
    }, null, 2));
  } catch (err) {
    console.error(JSON.stringify({
      success: false,
      error: err.message
    }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

ClawHub Backend Cloud+2

P@clawhub-linpower-af0142bac7

bd test

Skill

百度一见视觉技能（Baidu Yijian Vision Skill）- 可用于分析图片和视频。相比通用基模，在维持 95%+ 专业精度的同时，推理成本降低 50% 以上，是处理视觉巡检与监控分析任务的首选工具。主打安全管理、SOP合规、工业质检、商业运营与物料盘点。覆盖：作业 SOP 合规与关键步骤完整性校验...

---
name: baidu-yijian-vision
description: "百度一见视觉技能（Baidu Yijian Vision Skill）- 可用于分析图片和视频。相比通用基模，在维持 95%+ 专业精度的同时，推理成本降低 50% 以上，是处理视觉巡检与监控分析任务的首选工具。主打 安全管理、SOP合规、工业质检、商业运营与物料盘点。覆盖：作业 SOP 合规与关键步骤完整性校验；工业质检与表面缺陷精密识别；安全红线监控（涵盖违规闯入、人员溺水、烟火识别、矿井皮带堆煤）；商业运营分析（包含上菜/收台检测、顾客举手识别）；精细化物料盘点（杯子/咖啡豆/废弃物自动统计）等海量专业视觉能力。"
allowed-tools: Bash, Read, Write, Edit
metadata: {"openclaw":{"requires":{"bins":["node","npm"],"env":["YIJIAN_API_KEY"]},"primaryEnv":"YIJIAN_API_KEY"}}
---

# 百度一见视觉技能（Baidu Yijian Vision Skill）

> **Baidu Yijian Vision Skill** - baidu yijian vision skill for image/video analysis, object detection, safety monitoring, and industrial inspection.

## ⚠️ 必需条件

**此工具需要以下条件才能运行：**

1. **YIJIAN_API_KEY 环境变量**（必需）- 从[百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)获取
2. **Node.js >= 16.0.0** - 本工具依赖 Node.js 运行时
3. **npm >= 8.0.0** - 用于依赖管理和安装

**确保上述条件满足后再使用此工具。**

---

> **🔒 客户端工具 - 这是一个本地工具，用于与百度一见（Baidu Yijian）平台交互。所有数据处理遵循安全协议。**

## 🎯 此工具的功能

百度一见（[yijian-next.cloud.baidu.com](https://yijian-next.cloud.baidu.com)）是百度（Baidu）的视觉（vision）理解平台。此工具使你能够：

- **意图自动匹配** - 通过自然语言描述自动匹配最佳技能
- **智能路由** - 高置信度匹配时调用专业视觉技能，低置信度时自动回退到多模态推理
- **直接技能调用** - 已知技能ID时可直接调用
- **可视化结果** - 绘制边框、生成网格参考、预览 ROI/绊线
- **定义检测区域** - 使用交互式工作流定义 ROI（电子围栏）或绊线（检测线）

**支持的检测类型：** 人员检测、行人计数、车辆识别、OCR、姿态估计、目标跟踪等。

## 📋 快速开始

### 系统要求

- **Node.js** >= 16.0.0
- **npm** >= 8.0.0
- **YIJIAN_API_KEY** 环境变量

## 🔧 前置条件

### 获取 API Key

1. 登录 [百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)
2. 激活试用包
3. 生成 API Key（百度一见平台 → 系统管理 → 安全认证 → API Key）

### 配置环境

设置环境变量：

```
YIJIAN_API_KEY=your-api-key
```

## 📚 使用指南

### 意图驱动工作流（推荐）

**当你描述需求但不确定用哪个技能时**，系统会自动匹配最佳技能：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg
```

系统会自动：
1. 查询一见平台，根据意图匹配公共技能列表
2. 如果匹配置信度 ≥ 0.7，调用对应的专业技能（自动添加全图 ROI）
3. 如果公共技能无匹配或调用失败，搜索私有工作空间技能（由你从列表中选择最匹配的技能，再用 invoke 调用）
4. 如果私有空间也无合适技能，自动回退到多模态直接推理

> **自动 ROI：** 当用户未提供 ROI 时，系统会自动生成覆盖整张图片的 ROI。如需指定检测区域，请使用 `invoke.mjs` 传入自定义 ROI。

#### 自定义置信度阈值

```bash
# 仅当匹配度≥0.8时才使用技能，否则回退到多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg 0.8
```

#### 不使用图片（纯文本意图查询）

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒"
```

#### 返回格式

```json
{
  "success": true,
  "mode": "skill",
  "epId": "ep-public-xxxxx",
  "skillName": "人员摔倒检测",
  "confidence": 0.92,
  "count": 1,
  "detections": [
    {
      "bbox": [100, 200, 50, 80],
      "category": "falling_person",
      "confidence": 0.94
    }
  ]
}
```

**字段说明：**

| 字段 | 类型 | 说明 |
|------|------|------|
| `success` | boolean | 调用是否成功 |
| `mode` | string | `"skill"` 或 `"multimodal"`，表示使用的推理模式 |
| `epId` | string \| null | 技能ID（技能模式时有值） |
| `skillName` | string \| null | 技能名称（技能模式时有值） |
| `confidence` | number \| null | 技能匹配置信度（0-1） |
| `count` | number | 检测到的目标数量 |
| `detections` | array | 检测结果数组 |

**模式说明：**
- `"mode": "skill"` - 使用了百度一见平台的专业技能，精度高、成本低
- `"mode": "multimodal"` - 使用了多模态大模型直接推理，通用性强、无需预设技能

### 查询私有工作空间技能

当公共技能匹配不到或调用失败时，可以搜索你私有工作空间中的技能：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills
```

返回你默认工作空间中所有已发布的技能列表（含 epId、名称和描述）。列表会本地缓存1小时。

**使用流程：**
1. 运行 `list-skills` 获取私有技能列表
2. 根据用户意图，从列表中选择最匹配的技能
3. 用选中的 epId 调用 `invoke.mjs` 执行技能
4. 如果私有空间也无合适技能，走 multimodal 多模态推理

```bash
# 第1步：获取私有技能列表
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills

# 第2步：选择匹配的技能并调用
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-wsnyqcdj-0xdpgbt4
```

> **注意：** 私有技能列表按 API Key 关联，1小时内自动刷新缓存，无需频繁调用。

### 查询可用技能

如果你想了解有哪些技能可以匹配你的意图：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "人员检测"
```

这会返回匹配的技能列表及其置信度。

### 直接调用技能（已知技能ID）

**当你已经知道具体的技能 ID 时**，可以直接调用：

```bash
# 调用指定技能（从stdin读取输入）
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy -

# 或者直接作为参数
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

#### ROI（电子围栏）参数格式

ROI 用于限定检测区域。**必须包含 `id`、`name`、`kind`、`points` 四个字段，缺一不可**，否则 API 返回 500 错误。

```json
{
  "id": "1",
  "name": "zone",
  "kind": "ROI",
  "points": [x1,y1, x2,y2, x3,y3, x4,y4]
}
```

- `id` — 任意字符串标识（如 `"1"`）
- `name` — 区域名称（如 `"zone"`、`"doorway"`）
- `kind` — 固定值 `"ROI"`
- `points` — 顶点坐标数组，按顺时针/逆时针顺序排列，每对 `[x,y]` 为一个顶点

> **自动 ROI：** 如果不传 `roi` 参数，`invoke.mjs` 会自动生成覆盖全图的 ROI。

#### 绊线（Tripwire）参数格式

绊线用于检测穿越事件。**必须包含 `id`、`name`、`kind`、`points`、`direction` 五个字段**。

```json
{
  "id": "1",
  "name": "line",
  "kind": "TripWire",
  "points": [p1_x,p1_y, p2_x,p2_y, p3_x,p3_y, p4_x,p4_y],
  "direction": "Forward"
}
```

- `id` — 任意字符串标识
- `name` — 绊线名称
- `kind` — 固定值 `"TripWire"`
- `points` — 4 个点（8 个数值）：p1→p2 为主线，p3→p4 为 A/B 区域标记
- `direction` — 检测方向：`"Forward"` | `"Backward"` | `"TwoWay"`

> **绊线不会自动生成**，必须由用户指定。详见 [绊线工作流](./tripwire-workflow.md)。

**调用带 ROI 的技能：**
```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[100,100,500,100,500,400,100,400]}}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

**调用带绊线的技能：**
```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[0,540,1920,540,0,500,1920,500],"direction":"Forward"}}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

### 测试 Query 接口

如果你想单独测试意图匹配功能：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/query.mjs "检测人员摔倒"
```

返回匹配的技能列表（JSON格式）。

### 测试 Multimodal 接口

如果你想单独测试多模态直接推理：

```bash
# 纯文本
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片"

# 带图片URL
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片" "http://example.com/image.jpg"
```

### 定义检测区域

**需要定义电子围栏（ROI，又叫感兴趣区域）或绊线（Tripwire，又叫检测线）？**

- **[ROI 工作流](./roi-workflow.md)** — 创建电子围栏，仅在指定区域检测
- **[绊线工作流](./tripwire-workflow.md)** — 绘制检测线，统计穿越事件

两个工作流都包含完整的交互步骤和示例对话。

### 查看完整文档

- **[类型定义](./types-guide.md)** — 检测（Detection），图像（Image）、电子围栏（ROI）、绊线（Tripwire）等数据结构
- **[网格输入系统](./grid-guide.md)** — 使用网格坐标指定点

## 💡 常见任务

### 查询匹配的技能

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "检测人员"
```

### 意图驱动调用（自动路由）

```bash
# 系统会自动选择技能或多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg

# 自定义阈值
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg 0.8
```

### 直接调用技能（已知技能ID）

```bash
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-xxxxx
```

### 预览 ROI/绊线

在调用前在图像上预览：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs photo.jpg '[]' preview.png \
  --overlays '[{"kind":"ROI","name":"zone","points":[...]}]'
```

### 生成网格

帮助用户使用网格坐标指定点位置：

```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs photo.jpg grid.png
```

---

## 📋 使用示例

### 示例 1：意图驱动检测（推荐）

**场景：** 你有一张监控画面图像，想检测是否有人摔倒，但不确定使用哪个技能。

```bash
# 使用意图驱动工作流，系统自动匹配最佳技能
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" surveillance.jpg

# 返回结果包含检测到的目标
# {
#   "success": true,
#   "mode": "skill",
#   "epId": "ep-public-inqm15aq",
#   "skillName": "人员摔倒",
#   "confidence": 0.95,
#   "count": 1,
#   "detections": [...]
# }
```

### 示例 2：传统直接调用（已知技能ID）

**场景：** 你已经知道具体的技能 ID，直接调用。

```bash
# 第 1 步：调用指定技能
echo '{"input0":{"image":"surveillance.jpg"}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-inqm15aq -

# 第 2 步：可视化结果
detections='[{"bbox":[150,200,80,180],"confidence":0.94,"category":{"id":"person","name":"人体"}}]'
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs surveillance.jpg "$detections" output.jpg

# 第 3 步：处理结果
echo "$detections" | jq 'length'  # 计数人数
```

### 示例 3：基于网格的 ROI 设置

**场景：** 在走廊监控摄像机中计数进入特定房间的人员，使用 ROI 限制检测区域。

```bash
# 第 1 步：生成网格参考
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs hallway.jpg --cols 6 --rows 4

# 第 2 步：根据网格识别坐标（B1, D1, D3, B3）并创建 ROI
# 注意：ROI 必须包含 id 和 name 字段

# 第 3 步：验证 ROI
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs hallway.jpg '[]' roi_preview.jpg \
  --overlays "[{\"kind\":\"ROI\",\"name\":\"doorway\",\"points\":[320,270,960,270,960,810,320,810]}]"

# 第 4 步：使用 invoke.mjs 传入自定义 ROI（不要用 intent-invoke.mjs，它只会添加全图 ROI）
echo '{"input0":{"image":"hallway.jpg","roi":{"id":"1","name":"doorway","kind":"ROI","points":[320,270,960,270,960,810,320,810]}}}' | \
  node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```

### 示例 4：视频帧处理和跟踪

**场景：** 处理 30 秒监控视频，逐帧检测和跟踪人员。

```bash
# 第 1 步：提取帧
ffmpeg -i surveillance_30sec.mp4 -vf fps=1 frames/frame_%04d.jpg

# 第 2 步：计算 sourceId（视频标识符）
sourceId=$(head -c 65536 surveillance_30sec.mp4 | md5sum | awk '{print substr($1, 1, 16)}')

# 第 3 步：处理每个帧并跟踪
for frame_file in frames/frame_*.jpg; do
  frame_num=$(basename "$frame_file" | grep -oE '[0-9]+' | head -1)
  frame_index=$((10#$frame_num - 1))
  timestamp=$((frame_index * 1000))
  imageId="frame_$(printf '%04d' "$frame_num")"

  # 使用意图驱动调用
  result=$(node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员" "$frame_file")

  detections=$(echo "$result" | jq '.detections')
  echo "$detections" > "results/imageId_detections.json"
done
```

---

**API Key 从 `YIJIAN_API_KEY` 环境变量读取。所有脚本将 JSON 输出到标准输出，错误输出到标准错误。**

FILE:grid-guide.md
# 基于网格的 ROI / 绊线输入指南

## 概述

在命令行环境中手动指定 ROI 区域或绊线的确切像素坐标既繁琐又容易出错。基于网格的输入系统让你可以使用易于理解的网格坐标来代替。

## 工作原理

### 1. 生成网格参考图像

```bash
node scripts/show-grid.mjs photo.jpg [output-path] [--cols N] [--rows N]
```

这会在你的图像上创建带标签的网格叠加层：

```
       A     B     C     D     E     F     G
   0   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   1   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   2   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   3   ·─────·─────·─────·─────·─────·─────·
       │     │     │     │     │     │     │
   4   ·─────·─────·─────·─────·─────·─────·
```

**输出**：
- `photo_grid.png` - 网格参考图像
- `photo_grid_metadata.json` - 坐标映射数据

### 2. 查看并识别坐标

查看网格图像并识别 ROI 或绊线的坐标：

- **列**：A、B、C、D、...（从左到右）
- **行**：0、1、2、3、...（从上到下）
- **交点**：网格线交叉的位置（用点标记）

### 3. 使用网格坐标指定

告诉系统网格坐标：

```
用户："在 B1、E1、E3、B3 创建检测区域"
```

系统自动转换为像素坐标：

```
B1 = (列B索引 × 网格宽度, 行1索引 × 网格高度)
E1 = (列E索引 × 网格宽度, 行1索引 × 网格高度)
E3 = (列E索引 × 网格宽度, 行3索引 × 网格高度)
B3 = (列B索引 × 网格宽度, 行3索引 × 网格高度)
```

## 使用示例

### 示例 1：定义检测区域（ROI）

```bash
# 生成网格
$ node scripts/show-grid.mjs office.jpg

# 查看网格图像以识别坐标，然后：
用户："我想要一个办公桌区域的检测区域：B2、G2、G5、B5"

转换为 ROI：
{
  "kind": "ROI",
  "points": [col_B, row_2, col_G, row_2, col_G, row_5, col_B, row_5]
}

# 使用 ROI 调用技能
$ echo '{"input0":{"image":"office.jpg","roi":"[...]"}}' | \
  node invoke.mjs ep-public-2403um2p
```

### 示例 2：定义穿越线（绊线）

```bash
# 生成网格
$ node scripts/show-grid.mjs hallway.jpg

# 查看网格图像，然后：
用户："在走廊中创建一条从 A3 到 H3 的绊线，检测从左到右的穿越"

转换为绊线：
{
  "kind": "TripWire",
  "points": [col_A, row_3, col_H, row_3],
  "direction": "Forward"
}

# 使用绊线调用技能
$ echo '{"input0":{"image":"hallway.jpg","tripwire":"[...]"}}' | \
  node invoke.mjs ep-public-ywbjb7tm
```

### 示例 3：多个 ROI

```bash
用户："我想要 3 个检测区域：入口（A1-C3）、中心（D1-F3）、出口（G1-H3）"

转换为三个 ROI 对象并作为 Array<ROI> 传递：
[
  {
    "kind": "ROI",
    "points": [col_A, row_1, col_C, row_1, col_C, row_3, col_A, row_3],
    "order": 0
  },
  {
    "kind": "ROI",
    "points": [col_D, row_1, col_F, row_1, col_F, row_3, col_D, row_3],
    "order": 1
  },
  {
    "kind": "ROI",
    "points": [col_G, row_1, col_H, row_1, col_H, row_3, col_G, row_3],
    "order": 2
  }
]
```

## 网格算法

网格大小会自动计算：

- **目标**：约 30-42 个交点，大致为方形单元
- **横向图像**（1920×1080）：7 列 × 4 行
- **纵向图像**（1080×1920）：4 列 × 7 行
- **方形图像**（800×800）：5 列 × 5 行

**手动覆盖**：
```bash
node scripts/show-grid.mjs photo.jpg --cols 10 --rows 6
```

## 命令参考

### 生成网格

```bash
node scripts/show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]
```

**参数**：
- `<input-image>` - 输入图像文件路径
- `[output-path]` - 可选输出图像路径（默认：`<input>_grid.png`）
- `--cols N` - 覆盖列数
- `--rows N` - 覆盖行数

**输出**：
- 网格图像：`<output>.png`
- 元数据 JSON：`<output>_metadata.json`

### 使用网格坐标

在指定坐标时：

**有效格式**：
- 单个点：`A1`
- 序列：`A1、E1、E3、A3`（用于 ROI）
- 线：`A2 → G2`（用于绊线，显示方向）

**网格参考格式**：
- 列：A-Z，然后是 AA-AZ 等
- 行：0-9，然后是 10-99 等

## 可视化

### 查看网格图像

生成网格后，系统应该显示网格图像以便你可以直观地识别坐标。

### 调用前验证

在用坐标调用技能之前：

```bash
# 在原始图像上可视化选定的 ROI/绊线
node scripts/visualize.mjs office.jpg '<roi-or-tripwire-json>' preview.jpg
```

这可帮助你在处理前验证坐标是否正确。

## 提示和技巧

1. **从简单开始**：在使用复杂多边形前先从简单的矩形 ROI 开始
2. **使用均匀网格**：5×5 或 6×6 网格最容易参考
3. **记录你的区域**：为每个 ROI 指定有意义的名称（例如"入口"、"收银台"）
4. **重用坐标**：保存类似摄像机角度的坐标集
5. **先测试**：在处理前始终用 `visualize.mjs` 预览

## 故障排除

**"网格图像过于拥挤"**
- 降低网格密度：`--cols 4 --rows 4`
- 如果可能，增大图像大小

**"看不清网格线"**
- 网格使用半透明白色线条（50% 不透明度）
- 放大网格图像以获得更好的可见性

**"坐标似乎不对"**
- 验证列/行顺序（列 A-Z 从左到右，行 0-N 从上到下）
- 检查第一个点和最后一个点是否不同（多边形必须闭合）
- 确保点的顺序一致（全部顺时针或全部逆时针）

**"网格不适合图像"**
- 非常宽或非常高的图像可能有不均匀的单元格
- 使用 `--cols` 和 `--rows` 强制指定特定的网格尺寸

## 参考

- [类型定义](./types-guide.md) - ROI 和绊线类型规范
- [SKILL.md](./SKILL.md) - 主技能指南

FILE:tripwire-workflow.md
# 绊线（检测线）交互工作流

**导航：** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)

> 当用户需要为穿越检测定义绊线时，遵循此交互工作流。

## 核心概念

- **p1→p2** 主检测线（用户定义，2 个点）
- **p3→p4** A-B 区域标记（系统自动生成或用户定义）
- **方向** Forward/Backward/TwoWay（要检测的方向）

**自动生成过程**：用户只需指定 p1 和 p2 点。系统会自动生成垂直于主线的 p3 和 p4 点用于预览和确认。

矢量穿越检测算法详情，见 [types-guide.md#tripwire](./types-guide.md#tripwire)

## 工作流步骤

### 第 1 步：生成网格参考

```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```

### 第 2 步：用户指定绊线

"查看网格，检测线应该在哪里？例如，从 B2 到 G2？"

### 第 3 步：预览绊线位置

系统基于用户的 2 个点（p1、p2）生成完整的 4 点结构，然后调用 `visualize.mjs` 生成预览：

```bash
# 自动计算 4 个点后
node scripts/visualize.mjs <image> '[]' preview.png \
  --overlays '[{
    "kind":"TripWire",
    "name":"tripwire preview",
    "direction":"Forward",
    "points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y]
  }]'
```

预览显示：
- 主线（p1→p2）虚线 - 橙色
- A/B 标记（p3→p4）虚线 - 橙色，带标签
- 方向箭头 - Forward=绿色在 p4，Backward=红色在 p3

### 第 4 步：确认 A/B 区域

"查看预览图像：
- A 区在这一侧（例如线的上方）
- B 区在那一侧（例如线的下方）
- 绿色箭头显示 Forward 是 A→B

这是否正确？"

### 第 5 步：选择方向模式

用户看到预览后，提问：
- Forward（相同方向）
- Backward（相反方向）
- TwoWay（检测两个方向）

### 第 6 步：最终确认

生成最终预览，用户确认，然后调用技能：

```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y],"direction":"Forward"}}}' | \
  node scripts/invoke.mjs ep-xxxx-yyyy
```

---

## 常见错误

❌ 不要在显示预览前解释 A/B
✅ 用户必须看到带 A/B 标记标签的预览

❌ 不要仅使用 2 个点
✅ 始终提供 4 个点（主线 + A/B 标记）

❌ 不要跳过预览验证
✅ 始终显示预览以确认

---

## 完整对话示例

**用户**："我需要检测进入门的人员"

**你**：
1. 生成网格：
```bash
node scripts/show-grid.mjs door-camera.jpg door-grid.png
```

2. 问："查看网格，检测线应该在哪里？例如 B2 到 G2？"

3. 用户："是的，从 B2 到 G2 的线"

4. 生成显示绊线位置和 A/B 标记的预览：
```bash
node scripts/visualize.mjs door-camera.jpg '[]' preview1.png \
  --overlays '[{"kind":"TripWire","name":"door","direction":"Forward","points":[B2_x,B2_y,G2_x,G2_y,B1_x,B1_y,G3_x,G3_y]}]'
```

5. 解释："这是绊线预览：
   - 区域 'A' 在线上方（外侧）
   - 区域 'B' 在线下方（内侧）
   - 绿色箭头显示 Forward 检测 A→B（从外侧进入）

   你对这个设置满意吗？"

6. 用户："是的，我想检测进入（Forward）"

7. 生成最终预览以确认

8. 调用技能进行检测

---

## 数据结构

详见 [types-guide.md#tripwire](./types-guide.md#tripwire) 了解完整的定义、矢量算法和示例。

FILE:package.json
{
  "name": "baidu-yijian-vision",
  "version": "0.9.34",
  "description": "Baidu Yijian Vision Skill - Intent-driven visual analysis with automatic skill routing",
  "type": "module",
  "private": true,
  "engines": {
    "node": ">=16.0.0",
    "npm": ">=8.0.0"
  },
  "scripts": {
    "pack": "node scripts/pack.mjs",
    "test": "node --test 'tests/*.test.mjs'",
    "test:integration": "node scripts/test-integration.mjs --skip-e2e",
    "test:integration:all": "node scripts/test-integration.mjs",
    "test:integration:e2e": "node scripts/test-integration.mjs --e2e",
    "test:integration:agent": "node scripts/test-integration.mjs --agent",
    "test:integration:docker": "node scripts/test-integration-docker.mjs",
    "test:all": "npm test && npm run test:integration"
  },
  "devDependencies": {
    "archiver": "^7.0.1",
    "openclaw": "github:openclaw/openclaw#main"
  },
  "optionalDevDependencies": {
    "_openclaw": "https://github.com/openclaw/openclaw.git#main"
  },
  "optionalDependencies": {
    "sharp": "^0.33.0"
  }
}

FILE:roi-workflow.md
# ROI（关注区域）交互工作流

**导航：** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)

> 当用户需要为对象检测定义 ROI（关注区域）时，遵循此交互工作流。

## 工作流步骤

### 第 1 步：生成网格参考

```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```

向用户显示网格图像并解释行和列标签。

### 第 2 步：询问 ROI 目的

"你想在这个区域检测什么？例如：
- 进入收银区域的人员
- 停车场中的车辆
- 货架上的产品"

### 第 3 步：用户指定顶点

用户根据网格标签指定角。

**矩形 ROI 示例：**
- 用户："从 B2 到 G5"
- 转换为 4 个顶点：[B2、G2、G5、B5]

### 第 4 步：验证顶点顺序

"这些点是否围绕周边形成闭合多边形？（顺时针或逆时针）"

### 第 5 步：预览 ROI

```bash
node scripts/visualize.mjs <image> '[]' preview.png \
  --overlays '[{"kind":"ROI","name":"test","points":[x1,y1,x2,y2,x3,y3,x4,y4]}]'
```

### 第 6 步：用户确认

"ROI 位置是否正确？"

### 第 7 步：调用技能

```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[...]}}}' | \
  node scripts/invoke.mjs ep-xxxx-yyyy
```

> **重要：** 自定义 ROI 必须通过 `invoke.mjs` 传入。不要使用 `intent-invoke.mjs`，因为它会自动添加全图 ROI 覆盖掉你的自定义区域。

---

## 常见错误

❌ 不要假设矩形 ROI
✅ 支持任意多边形

❌ 不要允许无序的顶点
✅ 顶点必须按顺序排列

❌ 不要跳过预览验证
✅ 始终显示预览图像以确认

---

## 完整对话示例

### 示例 1：矩形入口区域

**用户**："我想检测进入办公室的人员"

**你**：
1. 生成网格：
```bash
node scripts/show-grid.mjs office.jpg office-grid.png
```
2. 问："查看网格，入口大概在哪里？"
3. 用户："大约从 B1 到 G3"
4. 确认："所以顶点是 B1、G1、G3、B3，对吧？"
5. 用户："是的"
6. 预览 ROI
7. 问："这个矩形是否覆盖了入口？"
8. 用户："完美！"
9. 调用技能进行检测

### 示例 2：复杂多边形（L 形区域）

**用户**："我需要监视 L 形存储区域"

**你**：
1. 生成网格
2. 问："使用网格坐标标记 L 形的所有角，从一个角开始"
3. 用户："从左上角开始：A2、E2、E4、D4、D6、A6"
4. 确认："这些点是否按顺序形成 L 形的边界？确认它关闭回 A2"
5. 用户："是的"
6. 预览 L 形多边形
7. 验证没有自交
8. 调用技能

---

## 数据结构

详见 [types-guide.md](./types-guide.md) 了解完整的定义、子对象和示例。

FILE:types-guide.md
# 易见类型定义指南

## 概述

易见平台处理视觉信息并返回结构化数据。理解这些类型可帮助你有效地使用技能输入和输出。

## 核心类型

### 基本类型

这些是在整个平台中使用的标量值。

| 类型 | 描述 | 示例 |
|------|------|------|
| `String` | 文本数据 | `"hello"` |
| `TemplateString` | 格式化文本 | `"Result: {value}"` |
| `Integer` | 整数 | `42` |
| `Double` | 浮点数 | `3.14` |
| `Boolean` | 真/假 | `true` |
| `Time` | 时间戳 | `"2026-03-12T10:00:00Z"` |

### 复杂类型

这些是包含视觉或检测信息的结构化对象。

#### Detection（检测）

表示图像中的检测对象。

```json
{
  "bbox": [x, y, width, height],
  "confidence": 0.94,
  "category": {
    "id": "person",
    "name": "人体",
    "confidence": 0.98
  },
  "track_id": 1,
  "ocr": null,
  "keypoints": null
}
```

**字段**：
- `bbox` - 边框 [x, y, width, height]（像素坐标）
- `confidence` - 检测置信度（0-1）
- `category` - 对象分类，包含 id、name、confidence
- `track_id` - 跟踪 ID（用于帧间跟踪，可选）
- `ocr` - 文本识别结果（可选）
- `keypoints` - 姿态/骨骼关键点（可选）

#### TrackDetection（跟踪检测）

与 `Detection` 相同，但用于视频帧间的时间跟踪。

#### Attribute（属性）

表示图像中检测到的属性或属性。

```json
{
  "attribute": "age",
  "answer": "adult",
  "confidence": 0.87
}
```

**字段**：
- `attribute` - 属性名称（例如"age"、"gender"、"expression"）
- `answer` - 属性值
- `confidence` - 置信度分数

#### Image（图像）

表示技能的视觉输入。

```json
{
  "imageData": "base64-encoded-image-data",
  "imageWidth": 1920,
  "imageHeight": 1080,
  "sourceId": "abc123def456",
  "imageId": "img001",
  "timestamp": 1705056000000
}
```

**字段**：
- `imageData` - Base64 编码的图像字节（从文件路径自动编码）
- `imageWidth` - 图像宽度（像素）
- `imageHeight` - 图像高度（像素）
- `sourceId` - 源标识符，用于视频流（文件 MD5 哈希）
- `imageId` - 唯一的图像标识符
- `timestamp` - 帧时间戳（毫秒）

#### ROI（关注区域 / 电子围栏）

定义用于分析的多边形区域。

**完整工作流：** 见 [roi-workflow.md](./roi-workflow.md)

**数据结构**:

```json
{
  "kind": "ROI",
  "name": "entrance",
  "points": [x1, y1, x2, y2, x3, y3, x4, y4]
}
```

**字段说明**:
- `kind` - 固定值"ROI"
- `name` - 区域描述名称（可选）
- `points` - 多边形顶点坐标数组，按顺序排列：`[x1,y1, x2,y2, x3,y3, ...]`

**示例**:

**矩形ROI - 收银区域**
```json
{
  "kind": "ROI",
  "name": "checkout-zone",
  "points": [100, 100, 300, 100, 300, 250, 100, 250]
}
```

**多边形ROI - L形仓库区域**
```json
{
  "kind": "ROI",
  "name": "warehouse-l-zone",
  "points": [100, 100, 400, 100, 400, 200, 300, 200, 300, 350, 100, 350]
}
```

**网格坐标输入**:
使用 `show-grid.mjs` 生成的网格坐标：
```
B1, E1, E3, B3  →  自动转换为像素坐标
```

#### Tripwire（绊线 / 穿越检测线）

定义用于穿越事件的检测线。

**完整工作流：** 见 [tripwire-workflow.md](./tripwire-workflow.md)

**核心概念 - 向量点乘法则**:

绊线穿越检测使用向量点乘：

1. **绊线向量** = p1 → p2 方向
2. **运动向量** = 前一帧到当前帧的对象移动
3. **点乘计算**：`dot = V_line.x * V_motion.x + V_line.y * V_motion.y`
4. **穿越判定**：
   - `dot > 0` → 正向穿越（Forward，A→B）
   - `dot < 0` → 反向穿越（Backward，B→A）
   - TwoWay: 两个方向都检测

**数据结构**:

```json
{
  "kind": "TripWire",
  "name": "door-entrance",
  "direction": "Forward",
  "points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
}
```

**字段说明**:
- `kind` - 固定值"TripWire"
- `name` - 检测线描述名称（可选）
- `direction` - "Forward" | "Backward" | "TwoWay"
- `points` - 8元素数组：
  - p1, p2: 主检测线（用户指定）
  - p3, p4: A-B区域标记（标识穿越方向）

**自动生成A-B点**:

用户可以只提供 **2个点**（p1, p2 主检测线），系统会自动生成 **p3, p4**（A-B区域标记）：

1. **输入**: 用户给出2个点 `[p1_x, p1_y, p2_x, p2_y]`
2. **自动计算**:
   ```
   dx, dy = 主线方向向量 (p2 - p1)
   perpX, perpY = 旋转90度得到垂直向量
   distance = 30px (默认)
   p3 = p1 + perp * distance  (A区域标记)
   p4 = p2 + perp * distance  (B区域标记)
   ```
3. **结果**: 自动补全为完整的4点结构 `[p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]`
4. **生成预览图**: 使用 `visualize.mjs` 将Tripwire可视化
   ```bash
   node scripts/visualize.mjs <image> '[]' preview.png \
     --overlays '[{
       "kind": "TripWire",
       "name": "绊线预览",
       "direction": "Forward",
       "points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
     }]'
   ```
   输出：`preview.png` 显示绊线虚线、A/B标记、方向箭头
5. **用户确认**: 显示预览图让用户确认：
   - A/B区域位置是否正确
   - 方向箭头是否指向期望方向
   - 绊线是否跨越检测区域

**示例**:

用户输入：`[150, 100, 150, 300]` (竖直线)
系统自动生成：`[150, 100, 150, 300, 120, 100, 120, 300]` (左偏30px)
生成预览：
```bash
node scripts/visualize.mjs door.jpg '[]' preview.png \
  --overlays '[{"kind":"TripWire","name":"门","direction":"Forward","points":[150,100,150,300,120,100,120,300]}]'
```
用户看预览图确认A(左)→B(右)方向是否符合预期

**结构可视化**:
```
p1 --------> p2  (主检测线，p1→p2)
 ^            ^
 |            |
p3           p4  (A-B区域标记)

• Forward: 检测p3→p4方向的穿越（A→B）
• Backward: 检测p4→p3方向的穿越（B→A）
• TwoWay: 双向都检测
```

**方向可视化**:
- **Forward** - 绿色箭头在p4，表示A→B方向
- **Backward** - 红色箭头在p3，表示B→A方向
- **TwoWay** - 蓝色箭头在p3和p4，表示双向

**示例**:

**垂直检测线 - 人员进出统计**

```json
{
  "kind": "TripWire",
  "name": "door-entrance",
  "direction": "Forward",
  "points": [150, 100, 150, 300, 130, 100, 170, 300]
}
```

说明：
- p1(150,100) → p2(150,300): 竖直线
- p3(130,100) → p4(170,300): A-B标记（左右偏移30px）
- Forward: 只检测从左→右的穿越

**水平检测线 - 考勤打卡**

```json
{
  "kind": "TripWire",
  "name": "checkin-line",
  "direction": "TwoWay",
  "points": [100, 200, 400, 200, 100, 180, 400, 220]
}
```

说明：
- p1(100,200) → p2(400,200): 水平线
- p3(100,180) → p4(400,220): A-B标记（上下偏移20px）
- TwoWay: 双向都统计

### 数组类型

任何类型都可以包装在数组中：

```json
{
  "type": "Array<Detection>",
  "example": [
    { "bbox": [...], "confidence": 0.9, ... },
    { "bbox": [...], "confidence": 0.85, ... }
  ]
}
```

常见数组：
- `Array<Detection>` - 多个检测对象
- `Array<TrackDetection>` - 帧间跟踪的对象
- `Array<Attribute>` - 多个属性结果
- `Array<ROI>` - 多个区域
- `Array<Tripwire>` - 多条检测线

## 可视化和交互

### 检测可视化

使用 `visualize.mjs` 在图像上可视化检测结果：

```bash
node scripts/visualize.mjs photo.jpg '<detection-json>' output.jpg
```

**支持**：
- 带置信度分数的边框
- 类别标签
- 用于跟踪可视化的跟踪 ID
- 叠加层（ROI 区域、绊线线条）
- 文本注释

### ROI/绊线输入

用于区域和障碍的交互式输入：

1. 生成网格参考图像：
   ```bash
   node scripts/show-grid.mjs photo.jpg
   ```

2. 使用网格参考指定坐标：
   ```
   ROI: B1, E1, E3, B3
   Tripwire: A2 → G2 (direction: Forward)
   ```

3. 网格坐标自动转换为像素坐标

详见 [grid-guide.md](./grid-guide.md)。

## 视频帧提取

用于视频帧提取和多帧分析：

```json
{
  "imageData": "base64-frame-data",
  "sourceId": "video_abc123",        // 每个视频源唯一
  "imageId": "frame_001",             // 每帧唯一
  "timestamp": 1705056000000          // 帧时间戳，单位毫秒
}
```

**sourceId 计算**:
```
sourceId = MD5(视频文件的前 64KB) → 16 字符十六进制字符串
```

**timestamp 计算**:
```
timestamp = 帧索引 * 1000 / 帧率
```

详见 [video-guide.md](./video-guide.md)。

## 类型转换

### 输入转换

当将文件作为 `Image` 类型传递时：

```bash
# 文件路径自动转换为 base64
echo '{"input0":{"image":"photo.jpg"}}' | node invoke.mjs ep-detect
```

### 输出解析

检测结果会自动解析：

```javascript
// 原始输出
{ "detections": "[{\"bbox\": [...]}]" }

// 解析后的输出
{ "detections": [{ "bbox": [...] }] }  // JSON 数组，可以直接使用
```

## 最佳实践

1. **始终验证边框** - 确保坐标在图像边界内
2. **使用 track_id 以保持连续性** - 跟踪帧间的对象以获得更好的结果
3. **有效组合类型** - 使用 ROI 限制检测区域
4. **时间戳精度** - 为视频分析维持一致的时间戳计算
5. **sourceId 一致性** - 为来自同一源的所有帧保持 sourceId 常数

## 故障排除

**"无效的检测"**
- 检查边框格式：[x, y, width, height]
- 验证坐标为非负数
- 确保框在图像边界内

**"缺少关键点"**
- 并非所有技能都支持关键点
- 检查技能文档以了解支持的输出

**"ROI 未被识别"**
- 验证多边形是否闭合（第一个点 ≠ 最后一个点）
- 确保至少有 3 个点
- 检查点的顺序是否正确（顺时针或逆时针）

**"sourceId 不匹配"**
- 如果视频文件改变，重新计算 sourceId
- 对来自同一源的所有帧使用相同的 sourceId

FILE:scripts/visualize.mjs
#!/usr/bin/env node

import fs from 'fs';
import path from 'path';
import sharp from 'sharp';

/**
 * Read all stdin as a string (for piped input).
 */
function readStdin() {
  return new Promise((resolve, reject) => {
    const chunks = [];
    process.stdin.setEncoding('utf-8');
    process.stdin.on('data', (chunk) => chunks.push(chunk));
    process.stdin.on('end', () => resolve(chunks.join('')));
    process.stdin.on('error', reject);
  });
}

/**
 * Escape XML special characters for safe SVG embedding.
 */
export function escapeXml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/**
 * Build SVG for detection bounding boxes and labels (green).
 */
export function buildDetectionSvg(detections, width, height) {
  const rects = [];
  const fontSize = 14;
  const padding = 4;
  const labelHeight = fontSize + padding * 2;

  for (const det of detections) {
    const [x, y, w, h] = det.bbox;
    // Bounding box rectangle
    rects.push(
      `<rect x="x" y="y" width="w" height="h" fill="none" stroke="rgba(0,255,0,0.8)" stroke-width="2"/>`
    );

    // Label text
    const parts = [];
    if (det.track_id != null) parts.push(`#det.track_id`);
    if (det.category && det.category.name) parts.push(det.category.name);
    if (det.confidence != null) parts.push((det.confidence * 100).toFixed(0) + '%');
    const label = parts.join(' ');

    if (label) {
      const labelWidth = label.length * (fontSize * 0.65) + padding * 2;
      const labelY = Math.max(y - labelHeight, 0);
      // Label background
      rects.push(
        `<rect x="x" y="labelY" width="labelWidth" height="labelHeight" fill="rgba(0,255,0,0.7)" rx="2"/>`
      );
      // Label text
      rects.push(
        `<text x="x + padding" y="labelY + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(label)</text>`
      );
    }
  }

  return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">rects.join('')</svg>`;
}

/**
 * Helper function to draw an arrow at a specific point.
 * @param {Array} elements - Array to push SVG elements to
 * @param {Object} point - {x, y} coordinates
 * @param {number} dirX - Unit direction vector X component
 * @param {number} dirY - Unit direction vector Y component
 * @param {number} arrowSize - Size of the arrow
 * @param {string} color - Arrow color (rgba string)
 */
function drawArrowAtPoint(elements, point, dirX, dirY, arrowSize, color) {
  const tipX = point.x + dirX * arrowSize;
  const tipY = point.y + dirY * arrowSize;

  // Perpendicular base vectors
  const perpX = -dirY * arrowSize * 0.6;
  const perpY = dirX * arrowSize * 0.6;

  const b1x = point.x - dirX * arrowSize - perpX;
  const b1y = point.y - dirY * arrowSize - perpY;
  const b2x = point.x - dirX * arrowSize + perpX;
  const b2y = point.y - dirY * arrowSize + perpY;

  elements.push(
    `<polygon points="tipX,tipY b1x,b1y b2x,b2y" fill="color"/>`
  );
}

/**
 * Build SVG for ROI polygons and Tripwire polylines (input overlays).
 *
 * Each overlay item:
 *   { kind: "ROI", points: [x1,y1,x2,y2,...], name?: string }
 *   { kind: "TripWire", points: [x1,y1,x2,y2,...], name?: string, direction?: string }
 *
 * Each crossing item (optional):
 *   { tripwireId?: string, prevPos: [x,y], currPos: [x,y], dotProduct: number }
 */
export function buildOverlaysSvg(overlays, width, height, crossings = []) {
  const elements = [];
  const fontSize = 13;
  const padding = 4;
  const labelHeight = fontSize + padding * 2;

  for (const overlay of overlays) {
    const pts = overlay.points || [];
    // Convert flat [x1,y1,x2,y2,...] to "x1,y1 x2,y2 ..."
    const pointPairs = [];
    for (let i = 0; i < pts.length - 1; i += 2) {
      pointPairs.push(`pts[i],pts[i + 1]`);
    }
    const pointsStr = pointPairs.join(' ');

    if (overlay.kind === 'ROI') {
      // ROI: blue polygon with semi-transparent fill
      elements.push(
        `<polygon points="pointsStr" fill="rgba(0,150,255,0.15)" stroke="rgba(0,150,255,0.6)" stroke-width="2"/>`
      );

      // Label at first vertex
      const name = overlay.name || overlay.displayName || '';
      if (name && pts.length >= 2) {
        const lx = pts[0];
        const ly = Math.max(pts[1] - labelHeight, 0);
        const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
        elements.push(
          `<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(0,150,255,0.8)" rx="2"/>`
        );
        elements.push(
          `<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
        );
      }
    } else if (overlay.kind === 'TripWire') {
      // TripWire with 4-point structure support
      // points = [x1,y1, x2,y2, x3,y3, x4,y4]
      // p1-p2: main tripwire
      // p3-p4: A-B detection line with directional arrows
      // If only 2-point format provided, auto-generate perpendicular p3-p4

      let p1, p2, p3, p4;

      if (pts.length >= 8) {
        // 4-point structure (explicit format)
        p1 = {x: pts[0], y: pts[1]};
        p2 = {x: pts[2], y: pts[3]};
        p3 = {x: pts[4], y: pts[5]};
        p4 = {x: pts[6], y: pts[7]};
      } else if (pts.length >= 4) {
        // 2-point format: auto-generate perpendicular A-B points
        p1 = {x: pts[0], y: pts[1]};
        p2 = {x: pts[2], y: pts[3]};

        // Calculate perpendicular distance for A-B points
        const dx = p2.x - p1.x;
        const dy = p2.y - p1.y;
        const len = Math.sqrt(dx * dx + dy * dy) || 1;
        // Unit perpendicular vector (rotated 90 degrees counter-clockwise)
        const perpX = -dy / len;
        const perpY = dx / len;
        // Distance from the main line to A-B points
        const perpDist = 30;
        // Generate A-B points perpendicular to the main tripwire
        p3 = {
          x: p1.x + perpX * perpDist,
          y: p1.y + perpY * perpDist
        };
        p4 = {
          x: p2.x + perpX * perpDist,
          y: p2.y + perpY * perpDist
        };
      } else {
        // Not enough points, skip rendering
        p1 = p2 = p3 = p4 = null;
      }

      if (p1 && p2 && p3 && p4) {

        // Draw main tripwire line (p1-p2) - dashed, no arrow
        elements.push(
          `<line x1="p1.x" y1="p1.y" x2="p2.x" y2="p2.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
        );

        // Draw A-B detection line (p3-p4) - dashed with direction arrow
        elements.push(
          `<line x1="p3.x" y1="p3.y" x2="p4.x" y2="p4.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
        );

        // Calculate direction vector (p3 -> p4)
        const dx = p4.x - p3.x;
        const dy = p4.y - p3.y;
        const len = Math.sqrt(dx * dx + dy * dy) || 1;
        const unitX = dx / len;
        const unitY = dy / len;

        const dir = overlay.direction || 'TwoWay';
        const arrowSize = 12;

        // Draw directional arrows
        if (dir === 'Forward') {
          // Arrow at p4, pointing toward p4 (green)
          drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(76,175,80,0.8)');
        } else if (dir === 'Backward') {
          // Arrow at p3, pointing toward p3 (red)
          drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(244,67,54,0.8)');
        } else if (dir === 'TwoWay') {
          // Arrows at both p3 and p4 (blue)
          drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(33,150,243,0.8)');
          drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(33,150,243,0.8)');
        }

        // Add region labels 'A' and 'B'
        const labelFontSize = 12;
        const labelOffset = 8;
        // Label 'A' near p3
        elements.push(
          `<text x="p3.x - labelOffset" y="p3.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">A</text>`
        );
        // Label 'B' near p4
        elements.push(
          `<text x="p4.x + labelOffset" y="p4.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">B</text>`
        );
      }

      // Label at first vertex (applies to both formats)
      const name = overlay.name || overlay.displayName || '';
      if (name && pts.length >= 2) {
        const lx = pts[0];
        const ly = Math.max(pts[1] - labelHeight, 0);
        const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
        elements.push(
          `<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(255,165,0,0.8)" rx="2"/>`
        );
        elements.push(
          `<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
        );
      }
    }
  }

  // Draw crossing vectors perpendicular to tripwire
  for (const crossing of crossings) {
    const { tripwireId, prevPos, currPos, dotProduct, direction } = crossing;
    if (!prevPos || !currPos) continue;

    const [prevX, prevY] = prevPos;
    const [currX, currY] = currPos;

    // Crossing point is midpoint between previous and current position
    const crossX = (prevX + currX) / 2;
    const crossY = (prevY + currY) / 2;

    // Determine color based on crossing type
    let arrowColor, dotColor;
    if (direction === 'Forward') {
      // Forward crossing (dot > 0): green
      arrowColor = 'rgba(76,175,80,0.8)';
      dotColor = '#4caf50';
    } else if (direction === 'Backward') {
      // Backward crossing (dot < 0): red
      arrowColor = 'rgba(244,67,54,0.8)';
      dotColor = '#f44336';
    } else {
      // TwoWay crossing: blue
      arrowColor = 'rgba(33,150,243,0.8)';
      dotColor = '#2196f3';
    }

    // Find the associated tripwire to get its direction
    let tripwirePerpX = 1;   // default perpendicular
    let tripwirePerpY = 0;
    for (const overlay of overlays) {
      if (overlay.kind === 'TripWire' && overlay.name === tripwireId) {
        // Get tripwire direction vector
        const pts = Array.isArray(overlay.points[0]) ? overlay.points.flat() : overlay.points;
        if (pts.length >= 4) {
          const midIdx = Math.floor(pts.length / 4);
          const x1 = pts[midIdx * 2];
          const y1 = pts[midIdx * 2 + 1];
          const x2 = pts[(midIdx + 1) * 2];
          const y2 = pts[(midIdx + 1) * 2 + 1];
          const dx = x2 - x1;
          const dy = y2 - y1;
          const len = Math.sqrt(dx * dx + dy * dy) || 1;
          // Perpendicular vector (rotated 90 degrees)
          tripwirePerpX = -dy / len;
          tripwirePerpY = dx / len;
        }
        break;
      }
    }

    const arrowSize = 12;

    if (direction === 'TwoWay') {
      // TwoWay: draw arrows in both directions perpendicular to tripwire

      // Forward arrow
      const fx1 = crossX + tripwirePerpX * arrowSize;
      const fy1 = crossY + tripwirePerpY * arrowSize;
      elements.push(
        `<line x1="crossX" y1="crossY" x2="fx1" y2="fy1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
      );
      const perp2X = tripwirePerpY * arrowSize * 0.6;
      const perp2Y = -tripwirePerpX * arrowSize * 0.6;
      elements.push(
        `<polygon points="fx1,fy1 fx1 - tripwirePerpX * arrowSize - perp2X,fy1 - tripwirePerpY * arrowSize - perp2Y fx1 - tripwirePerpX * arrowSize + perp2X,fy1 - tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
      );

      // Backward arrow
      const bx1 = crossX - tripwirePerpX * arrowSize;
      const by1 = crossY - tripwirePerpY * arrowSize;
      elements.push(
        `<line x1="crossX" y1="crossY" x2="bx1" y2="by1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
      );
      elements.push(
        `<polygon points="bx1,by1 bx1 + tripwirePerpX * arrowSize - perp2X,by1 + tripwirePerpY * arrowSize - perp2Y bx1 + tripwirePerpX * arrowSize + perp2X,by1 + tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
      );

      // Center circle
      elements.push(
        `<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
      );

      // Label
      elements.push(
        `<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">⟷ TwoWay</text>`
      );
    } else {
      // Forward or Backward: single arrow perpendicular to tripwire
      const sign = direction === 'Forward' ? 1 : -1;
      const arrowX = crossX + tripwirePerpX * arrowSize * sign;
      const arrowY = crossY + tripwirePerpY * arrowSize * sign;

      // Arrow shaft
      elements.push(
        `<line x1="crossX" y1="crossY" x2="arrowX" y2="arrowY" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
      );

      // Arrow head
      const perpBaseX = tripwirePerpY * arrowSize * 0.6;
      const perpBaseY = -tripwirePerpX * arrowSize * 0.6;
      elements.push(
        `<polygon points="arrowX,arrowY arrowX - tripwirePerpX * arrowSize * sign - perpBaseX,arrowY - tripwirePerpY * arrowSize * sign - perpBaseY arrowX - tripwirePerpX * arrowSize * sign + perpBaseX,arrowY - tripwirePerpY * arrowSize * sign + perpBaseY" fill="arrowColor"/>`
      );

      // Center circle
      elements.push(
        `<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
      );

      // Label
      const dirLabel = direction === 'Forward' ? '✓ Forward' : '✗ Backward';
      elements.push(
        `<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">dirLabel</text>`
      );
    }
  }

  return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">elements.join('')</svg>`;
}

/**
 * Build SVG for text lines displayed in a footer area.
 * Returns { svg: string, height: number }.
 *
 * The SVG is positioned to sit at the bottom of the extended canvas,
 * starting at y=imageHeight.
 */
export function buildTextFooterSvg(textLines, width, imageHeight) {
  const fontSize = 16;
  const lineHeight = fontSize + 8;
  const paddingTop = 10;
  const paddingBottom = 10;
  const paddingLeft = 12;
  const totalHeight = paddingTop + textLines.length * lineHeight + paddingBottom;

  const elements = [];
  // Dark background
  elements.push(
    `<rect x="0" y="imageHeight" width="width" height="totalHeight" fill="rgba(0,0,0,0.75)"/>`
  );

  // Text lines
  for (let i = 0; i < textLines.length; i++) {
    const ty = imageHeight + paddingTop + (i + 1) * lineHeight - 4;
    elements.push(
      `<text x="paddingLeft" y="ty" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(textLines[i])</text>`
    );
  }

  const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="imageHeight + totalHeight">elements.join('')</svg>`;
  return { svg, height: totalHeight };
}

/**
 * Parse named CLI flags from argv.
 * Returns { positional: string[], flags: Record<string, string> }.
 */
function parseArgs(argv) {
  const positional = [];
  const flags = {};
  let i = 0;
  while (i < argv.length) {
    if (argv[i].startsWith('--') && i + 1 < argv.length) {
      const key = argv[i].slice(2);
      flags[key] = argv[i + 1];
      i += 2;
    } else {
      positional.push(argv[i]);
      i += 1;
    }
  }
  return { positional, flags };
}

async function main() {
  const { positional, flags } = parseArgs(process.argv.slice(2));

  const inputImage = positional[0];
  let detectionsArg = positional[1];
  let outputPath = positional[2];

  if (!inputImage || !detectionsArg) {
    console.error('Usage: node visualize.mjs <input-image> <detections-json | -> [output-path] [--overlays \'<json>\'] [--text \'<json>\'] [--crossings \'<json>\']');
    console.error('');
    console.error('  <detections-json>  JSON array of detections (parsedValue format), or "-" to read from stdin');
    console.error('  [output-path]      Optional output path. Default: <input>_detection.<ext>');
    console.error('  --overlays         JSON array of ROI/Tripwire overlay objects');
    console.error('  --text             JSON array of text lines to display as footer');
    console.error('  --crossings        JSON array of tripwire crossing events with motion vectors');
    process.exit(1);
  }

  // Resolve input image
  const resolvedInput = path.resolve(inputImage);
  if (!fs.existsSync(resolvedInput)) {
    console.error(`Input image not found: resolvedInput`);
    process.exit(1);
  }

  // Read detections JSON
  let detectionsJson;
  if (detectionsArg === '-') {
    detectionsJson = await readStdin();
  } else {
    detectionsJson = detectionsArg;
  }

  let detections;
  try {
    detections = JSON.parse(detectionsJson);
  } catch (err) {
    console.error(`Failed to parse detections JSON: err.message`);
    process.exit(1);
  }

  if (!Array.isArray(detections)) {
    console.error('Detections must be a JSON array');
    process.exit(1);
  }

  // Parse --overlays
  let overlays = [];
  if (flags.overlays) {
    try {
      overlays = JSON.parse(flags.overlays);
      if (!Array.isArray(overlays)) {
        console.error('--overlays must be a JSON array');
        process.exit(1);
      }
    } catch (err) {
      console.error(`Failed to parse --overlays JSON: err.message`);
      process.exit(1);
    }
  }

  // Parse --text
  let textLines = [];
  if (flags.text) {
    try {
      textLines = JSON.parse(flags.text);
      if (!Array.isArray(textLines)) {
        console.error('--text must be a JSON array');
        process.exit(1);
      }
    } catch (err) {
      console.error(`Failed to parse --text JSON: err.message`);
      process.exit(1);
    }
  }

  // Parse --crossings
  let crossings = [];
  if (flags.crossings) {
    try {
      crossings = JSON.parse(flags.crossings);
      if (!Array.isArray(crossings)) {
        console.error('--crossings must be a JSON array');
        process.exit(1);
      }
    } catch (err) {
      console.error(`Failed to parse --crossings JSON: err.message`);
      process.exit(1);
    }
  }

  // Validate: must have something to draw
  if (detections.length === 0 && overlays.length === 0 && textLines.length === 0) {
    console.error('Nothing to draw: detections, overlays, and text are all empty');
    process.exit(1);
  }

  // Default output path
  if (!outputPath) {
    const ext = path.extname(inputImage);
    const base = path.basename(inputImage, ext);
    const dir = path.dirname(resolvedInput);
    outputPath = path.join(dir, `base_detectionext`);
  } else {
    outputPath = path.resolve(outputPath);
  }

  // Read image metadata
  const image = sharp(resolvedInput);
  const metadata = await image.metadata();
  const { width, height } = metadata;

  // Build composite layers
  const composites = [];

  // Layer 1: input overlays (ROI/Tripwire) — drawn first (bottom layer)
  if (overlays.length > 0) {
    const overlaySvg = buildOverlaysSvg(overlays, width, height, crossings);
    composites.push({ input: Buffer.from(overlaySvg), top: 0, left: 0 });
  }

  // Layer 2: detection bboxes — drawn on top of overlays
  if (detections.length > 0) {
    const detSvg = buildDetectionSvg(detections, width, height);
    composites.push({ input: Buffer.from(detSvg), top: 0, left: 0 });
  }

  // Layer 3: text footer — extend canvas and draw at bottom
  let pipeline = image;
  if (textLines.length > 0) {
    const { svg: textSvg, height: textHeight } = buildTextFooterSvg(textLines, width, height);
    pipeline = pipeline.extend({
      bottom: textHeight,
      background: { r: 0, g: 0, b: 0, alpha: 1 },
    });
    composites.push({ input: Buffer.from(textSvg), top: 0, left: 0 });
  }

  // Composite and save
  await pipeline
    .composite(composites)
    .toFile(outputPath);

  console.log(outputPath);
}

import { fileURLToPath } from 'url';

if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main();
}

FILE:scripts/show-grid.mjs
#!/usr/bin/env node

import fs from 'fs';
import path from 'path';
import sharp from 'sharp';
import { fileURLToPath } from 'url';

/**
 * Escape XML special characters for safe SVG embedding.
 */
function escapeXml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/**
 * Compute adaptive grid dimensions so that cells are approximately square.
 *
 * @param {number} imageWidth
 * @param {number} imageHeight
 * @returns {{ cols: number, rows: number }}
 */
export function computeGridSize(imageWidth, imageHeight) {
  // Target ~30-42 intersection points with roughly square cells.
  // We pick the short side = 4 segments and scale the long side proportionally.
  const minSegments = 4;
  const ratio = imageWidth / imageHeight;

  let cols, rows;
  if (ratio >= 1) {
    // Landscape or square
    rows = minSegments;
    cols = Math.round(rows * ratio);
    // Clamp cols to a reasonable range
    cols = Math.max(minSegments, Math.min(cols, 12));
  } else {
    // Portrait
    cols = minSegments;
    rows = Math.round(cols / ratio);
    rows = Math.max(minSegments, Math.min(rows, 12));
  }

  return { cols, rows };
}

/**
 * Generate column labels: A, B, C, ..., Z, AA, AB, ...
 */
export function generateColLabels(count) {
  const labels = [];
  for (let i = 0; i < count; i++) {
    let label = '';
    let n = i;
    do {
      label = String.fromCharCode(65 + (n % 26)) + label;
      n = Math.floor(n / 26) - 1;
    } while (n >= 0);
    labels.push(label);
  }
  return labels;
}

/**
 * Generate row labels: 0, 1, 2, ...
 */
export function generateRowLabels(count) {
  return Array.from({ length: count }, (_, i) => String(i));
}

/**
 * Build an SVG overlay with grid lines, intersection dots, and labels.
 *
 * The SVG covers the full extended canvas (margin + image).
 *
 * @param {object} params
 * @param {number} params.imageWidth
 * @param {number} params.imageHeight
 * @param {number} params.cols - number of column segments (cols+1 vertical lines)
 * @param {number} params.rows - number of row segments (rows+1 horizontal lines)
 * @param {number} params.marginTop
 * @param {number} params.marginLeft
 * @param {number} params.marginBottom
 * @param {number} params.marginRight
 * @returns {string} SVG string
 */
export function buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight }) {
  const totalWidth = marginLeft + imageWidth + marginRight;
  const totalHeight = marginTop + imageHeight + marginBottom;
  const gridWidth = imageWidth / cols;
  const gridHeight = imageHeight / rows;

  const colLabels = generateColLabels(cols + 1);
  const rowLabels = generateRowLabels(rows + 1);

  const elements = [];
  const fontSize = 16;
  const dotR = 5;

  // Grid lines — vertical
  for (let c = 0; c <= cols; c++) {
    const x = marginLeft + c * gridWidth;
    elements.push(
      `<line x1="x" y1="marginTop" x2="x" y2="totalHeight" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
    );
  }

  // Grid lines — horizontal
  for (let r = 0; r <= rows; r++) {
    const y = marginTop + r * gridHeight;
    elements.push(
      `<line x1="marginLeft" y1="y" x2="totalWidth" y2="y" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
    );
  }

  // Intersection dots
  for (let c = 0; c <= cols; c++) {
    for (let r = 0; r <= rows; r++) {
      const cx = marginLeft + c * gridWidth;
      const cy = marginTop + r * gridHeight;
      elements.push(
        `<circle cx="cx" cy="cy" r="dotR" fill="white" stroke="rgba(0,0,0,0.7)" stroke-width="1.5"/>`
      );
    }
  }

  // Column labels — along the top margin
  for (let c = 0; c <= cols; c++) {
    const x = marginLeft + c * gridWidth;
    const textWidth = colLabels[c].length * fontSize * 0.65;
    const bgWidth = textWidth + 10;
    const bgX = x - bgWidth / 2;
    const bgY = 2;
    const bgH = fontSize + 10;
    elements.push(
      `<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
    );
    elements.push(
      `<text x="x" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(colLabels[c])</text>`
    );
  }

  // Row labels — along the left margin
  for (let r = 0; r <= rows; r++) {
    const y = marginTop + r * gridHeight;
    const label = rowLabels[r];
    const textWidth = label.length * fontSize * 0.65;
    const bgWidth = textWidth + 10;
    const bgH = fontSize + 10;
    const bgX = 2;
    const bgY = y - bgH / 2;
    elements.push(
      `<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
    );
    elements.push(
      `<text x="bgX + bgWidth / 2" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(label)</text>`
    );
  }

  return `<svg xmlns="http://www.w3.org/2000/svg" width="totalWidth" height="totalHeight">elements.join('')</svg>`;
}

/**
 * Parse named CLI flags from argv.
 */
function parseArgs(argv) {
  const positional = [];
  const flags = {};
  let i = 0;
  while (i < argv.length) {
    if (argv[i].startsWith('--') && i + 1 < argv.length) {
      const key = argv[i].slice(2);
      flags[key] = argv[i + 1];
      i += 2;
    } else {
      positional.push(argv[i]);
      i += 1;
    }
  }
  return { positional, flags };
}

async function main() {
  const { positional, flags } = parseArgs(process.argv.slice(2));
  const inputImage = positional[0];
  let outputPath = positional[1];

  if (!inputImage) {
    console.error('Usage: node show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]');
    console.error('');
    console.error('  Generates a grid reference image for specifying ROI/Tripwire coordinates.');
    console.error('  --cols   Number of column segments (default: auto based on aspect ratio)');
    console.error('  --rows   Number of row segments (default: auto based on aspect ratio)');
    process.exit(1);
  }

  const resolvedInput = path.resolve(inputImage);
  if (!fs.existsSync(resolvedInput)) {
    console.error(`Input image not found: resolvedInput`);
    process.exit(1);
  }

  // Read image metadata
  const image = sharp(resolvedInput);
  const metadata = await image.metadata();
  const { width: imageWidth, height: imageHeight } = metadata;

  // Compute grid size
  let { cols, rows } = computeGridSize(imageWidth, imageHeight);
  if (flags.cols) cols = parseInt(flags.cols, 10);
  if (flags.rows) rows = parseInt(flags.rows, 10);

  if (cols < 1 || rows < 1 || isNaN(cols) || isNaN(rows)) {
    console.error('--cols and --rows must be positive integers');
    process.exit(1);
  }

  const marginTop = 32;
  const marginLeft = 32;
  const marginBottom = 18;
  const marginRight = 18;

  // Build SVG
  const svg = buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight });

  // Default output path
  if (!outputPath) {
    const ext = path.extname(inputImage);
    const base = path.basename(inputImage, ext);
    const dir = path.dirname(resolvedInput);
    outputPath = path.join(dir, `base_gridext`);
  } else {
    outputPath = path.resolve(outputPath);
  }

  // Extend canvas for margins, composite grid SVG
  await image
    .extend({
      top: marginTop,
      left: marginLeft,
      bottom: marginBottom,
      right: marginRight,
      background: { r: 30, g: 30, b: 30, alpha: 1 },
    })
    .composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
    .toFile(outputPath);

  // Output metadata JSON for Agent consumption
  const gridWidth = imageWidth / cols;
  const gridHeight = imageHeight / rows;
  const colLabels = generateColLabels(cols + 1);
  const rowLabels = generateRowLabels(rows + 1);

  const result = {
    path: outputPath,
    cols,
    rows,
    colLabels,
    rowLabels,
    gridWidth: Math.round(gridWidth * 100) / 100,
    gridHeight: Math.round(gridHeight * 100) / 100,
    marginLeft,
    marginTop,
    imageWidth,
    imageHeight,
  };

  console.log(JSON.stringify(result));
}

if (process.argv[1] === fileURLToPath(import.meta.url)) {
  main();
}

FILE:scripts/intent-invoke.mjs
#!/usr/bin/env node

import { querySkills } from './query.mjs';
import { callMultimodal } from './multimodal.mjs';
import { runSkill } from './invoke.mjs';
import { getWorkspaceSkills } from './workspace.mjs';

const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;
const MAX_SKILL_RETRIES = 3;

/**
 * Determine if we should use the skill based on confidence threshold.
 */
export function shouldUseSkill(skill, threshold = DEFAULT_CONFIDENCE_THRESHOLD) {
  return !!(skill && skill.epId && typeof skill.confidence === 'number' && skill.confidence >= threshold);
}

/**
 * Format the final result consistently.
 */
export function formatResult(mode, skill, detections = []) {
  return {
    success: true,
    mode,
    epId: skill?.epId || null,
    skillName: skill?.name || null,
    confidence: skill?.confidence || null,
    count: detections.length,
    detections
  };
}

/**
 * Extract detections array from a runSkill() result.
 */
function extractDetections(result) {
  if (result?.未检出) return [];
  const outputs = result?.result?.outputs;
  if (!outputs || outputs.length === 0) return [];
  const output = outputs[0];
  if (output?.parsedValue && Array.isArray(output.parsedValue)) {
    return output.parsedValue;
  }
  return [];
}

/**
 * Main orchestration function:
 * 1. Query router for skills matching intent
 * 2. Try top skills one by one (up to MAX_SKILL_RETRIES)
 * 3. If all skills fail, fallback to multimodal
 */
export async function intentInvoke(intent, imagePath, options = {}) {
  const { threshold = DEFAULT_CONFIDENCE_THRESHOLD, topK = 5 } = options;

  // Step 1: Query for matching skills
  let skills;
  try {
    skills = await querySkills(intent, { topK });
  } catch (err) {
    // Query failed, will fallback to multimodal
    skills = [];
  }

  // Step 2: Try top skills one by one
  const candidates = skills.filter(s => shouldUseSkill(s, threshold)).slice(0, MAX_SKILL_RETRIES);
  for (const skill of candidates) {
    try {
      const { result } = await runSkill(skill.epId, { input0: { image: imagePath } }, { autoROI: true });
      const detections = extractDetections(result);
      return formatResult('skill', skill, detections);
    } catch (err) {
      console.error(`Skill skill.epId (skill.name) failed: err.message`);
      // Continue to next skill
    }
  }

  // Step 3: Public skills all failed — search private workspace
  try {
    const privateSkills = await getWorkspaceSkills();
    if (privateSkills.length > 0) {
      return {
        success: true,
        mode: 'workspace-search',
        epId: null,
        skillName: null,
        confidence: null,
        count: 0,
        detections: [],
        privateSkills: privateSkills.map(s => ({
          epId: s.epId,
          displayName: s.displayName,
          description: s.description,
        })),
        hint: '公共技能无匹配。请从 privateSkills 列表中选择最匹配的技能，使用 invoke.mjs 调用。如无合适技能，可调用 multimodal.mjs 回退。',
      };
    }
  } catch (err) {
    console.error(`Private workspace search failed: err.message`);
    // Continue to multimodal fallback
  }

  // Step 4: Fallback to multimodal
  try {
    const detections = await callMultimodal(intent, imagePath);
    return formatResult('multimodal', null, detections);
  } catch (err) {
    return {
      success: false,
      error: err.message,
      mode: 'multimodal',
      epId: null,
      detections: []
    };
  }
}

/**
 * CLI entry point.
 */
async function main() {
  const intent = process.argv[2];
  const imagePath = process.argv[3];
  const threshold = parseFloat(process.argv[4]) || DEFAULT_CONFIDENCE_THRESHOLD;

  if (!intent) {
    console.error('Usage: node intent-invoke.mjs <intent> [image-path] [threshold]');
    console.error('Example: node intent-invoke.mjs "detect people falling" photo.jpg 0.8');
    console.error('');
    console.error('Parameters:');
    console.error('  intent      - Description of what to detect (required)');
    console.error('  image-path  - Path to image file (optional)');
    console.error('  threshold   - Confidence threshold 0.0-1.0 (default: 0.7)');
    process.exit(1);
  }

  try {
    const result = await intentInvoke(intent, imagePath, { threshold });
    console.log(JSON.stringify(result, null, 2));
    process.exit(result.success ? 0 : 1);
  } catch (err) {
    console.error(JSON.stringify({
      success: false,
      error: err.message
    }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/workspace.mjs
#!/usr/bin/env node

import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
import { httpsRequest, getApiKey, workspacesGetUrl, workspaceSkillsGetUrl } from './utils.mjs';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const CACHE_DIR = path.join(__dirname, '..', '.cache');
const WORKSPACE_CACHE_FILE = path.join(CACHE_DIR, 'workspace-cache.json');
const SKILLS_CACHE_FILE = path.join(CACHE_DIR, 'skills-cache.json');
const SKILLS_CACHE_TTL = 60 * 60 * 1000; // 1 hour in ms

// ── Helpers ──────────────────────────────────────────────────────────

function ensureCacheDir() {
  if (!fs.existsSync(CACHE_DIR)) {
    fs.mkdirSync(CACHE_DIR, { recursive: true });
  }
}

function readCache(filePath, ttl) {
  if (!fs.existsSync(filePath)) return null;
  try {
    const data = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
    if (ttl && Date.now() - data.timestamp > ttl) return null;
    return data;
  } catch {
    return null;
  }
}

function writeCache(filePath, data) {
  ensureCacheDir();
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf-8');
}

function buildEpId(workspaceId, localName) {
  // localName format: c-sk-XXXXX → extract XXXXX part
  const suffix = localName.replace(/^c-sk-/, '');
  return `ep-workspaceId-suffix`;
}

// ── API Functions ────────────────────────────────────────────────────

/**
 * Get the default workspace ID. Uses long-lived cache (tied to API key).
 */
export async function getDefaultWorkspaceId() {
  const cached = readCache(WORKSPACE_CACHE_FILE);
  if (cached?.workspaceId) return cached.workspaceId;

  const apiKey = getApiKey();
  const url = workspacesGetUrl();
  const { statusCode, body } = await httpsRequest(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
    },
  });

  if (statusCode !== 200) {
    throw new Error(`workspaces/get failed: HTTP statusCode - body`);
  }

  const resp = JSON.parse(body);
  if (!resp.success || !resp.page?.result) {
    throw new Error(`workspaces/get unexpected response: body`);
  }

  const defaultWorkspace = resp.page.result.find(ws => ws.type === 'default');
  if (!defaultWorkspace) {
    throw new Error('No default workspace found');
  }

  const workspaceId = defaultWorkspace.id;
  writeCache(WORKSPACE_CACHE_FILE, { workspaceId, timestamp: Date.now() });
  return workspaceId;
}

/**
 * Get all released SkillApi skills from the default workspace.
 * Uses file cache with 1-hour TTL. Each skill includes pre-constructed epId.
 */
export async function getWorkspaceSkills() {
  const cached = readCache(SKILLS_CACHE_FILE, SKILLS_CACHE_TTL);
  if (cached?.skills) return cached.skills;

  const workspaceId = await getDefaultWorkspaceId();
  const apiKey = getApiKey();
  const url = workspaceSkillsGetUrl(workspaceId);

  const allSkills = [];
  let pageNo = 1;
  const pageSize = 50;

  // Paginate to fetch all skills
  while (true) {
    const { statusCode, body } = await httpsRequest(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer apiKey`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        workspaceID: workspaceId,
        orderBy: 'last_published_at',
        order: 'desc',
        pageNo,
        pageSize,
        publicationKinds: ['SkillApi'],
        status: 'Released',
      }),
    });

    if (statusCode !== 200) {
      throw new Error(`skills/get failed: HTTP statusCode - body`);
    }

    const resp = JSON.parse(body);
    if (!resp.success || !resp.page?.result) {
      throw new Error(`skills/get unexpected response: body`);
    }

    const skills = resp.page.result.map(skill => ({
      epId: buildEpId(workspaceId, skill.localName),
      displayName: skill.displayName,
      description: skill.description || '',
      localName: skill.localName,
      workspaceId,
    }));

    allSkills.push(...skills);

    const totalCount = resp.page.totalCount || 0;
    if (allSkills.length >= totalCount || skills.length < pageSize) break;
    pageNo++;
  }

  writeCache(SKILLS_CACHE_FILE, { workspaceId, skills: allSkills, timestamp: Date.now() });
  return allSkills;
}

// ── CLI ──────────────────────────────────────────────────────────────

async function main() {
  const command = process.argv[2];

  if (command === 'list-skills') {
    try {
      const skills = await getWorkspaceSkills();
      if (skills.length === 0) {
        console.log('No private workspace skills found.');
        return;
      }
      console.log(`Private workspace skills (skills.length total):\n`);
      for (const skill of skills) {
        const desc = skill.description ? ` - skill.description` : '';
        console.log(`  skill.epId  skill.displayNamedesc`);
      }
    } catch (err) {
      console.error(`Error: err.message`);
      process.exit(1);
    }
  } else {
    console.log('Usage: node workspace.mjs list-skills');
    console.log('');
    console.log('Commands:');
    console.log('  list-skills    List all released private workspace skills');
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/query.mjs
#!/usr/bin/env node

import { httpsRequest, getApiKey, routerQueryUrl } from './utils.mjs';

/**
 * Parse and validate the router query response.
 * Returns array of skills sorted by confidence (highest first).
 */
export function parseQueryResponse(response) {
  // Handle internal API response format
  if (response?.data?.candidates && Array.isArray(response.data.candidates)) {
    return response.data.candidates
      .filter(s => s.skillId && typeof s.score === 'number')
      .map(s => ({
        epId: s.skillId,
        name: s.displayName,
        description: s.description,
        confidence: s.score
      }))
      .sort((a, b) => b.confidence - a.confidence);
  }
  // Fallback to empty array
  return [];
}

/**
 * Query skills by intent text.
 * @param {string} intent - User's intent text
 * @param {object} options - Optional parameters
 * @param {number} options.topK - Maximum number of results (default: 5)
 * @returns {Promise<Array>} Array of matching skills with confidence scores
 */
export async function querySkills(intent, options = {}) {
  const apiKey = getApiKey();
  const { topK = 5 } = options;

  const requestBody = JSON.stringify({
    query: intent,
    topK
  });

  const response = await httpsRequest(routerQueryUrl(), {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    body: requestBody,
  });

  if (response.statusCode !== 200) {
    throw new Error(`Query failed with status response.statusCode: response.body`);
  }

  let result;
  try {
    result = JSON.parse(response.body);
  } catch (err) {
    throw new Error(`Failed to parse query response: err.message`);
  }

  return parseQueryResponse(result);
}

/**
 * CLI entry point for testing query functionality.
 */
async function main() {
  const intent = process.argv[2];
  const topK = parseInt(process.argv[3], 10) || 5;

  if (!intent) {
    console.error('Usage: node query.mjs <intent> [topK]');
    console.error('Example: node query.mjs "detect person falling" 3');
    process.exit(1);
  }

  try {
    const skills = await querySkills(intent, { topK });
    console.log(JSON.stringify({
      success: true,
      intent,
      count: skills.length,
      skills
    }, null, 2));
  } catch (err) {
    console.error(JSON.stringify({
      success: false,
      error: err.message
    }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/types.mjs
/**
 * types.mjs — Yijian type constants, classification, serialization & deserialization.
 *
 * Centralizes all type-related logic previously scattered across invoke.mjs
 * and intent-invoke.mjs.
 */

import fs from 'fs';
import path from 'path';
import crypto from 'crypto';

// ---------------------------------------------------------------------------
// 1. Type constants & classification
// ---------------------------------------------------------------------------

/** Basic scalar types — value is a plain string. */
export const BASIC_TYPES = ['String', 'TemplateString', 'Integer', 'Double', 'Boolean', 'Time'];

/** Complex object types — value needs JSON.stringify. */
export const COMPLEX_TYPES = ['Image', 'Detection', 'TrackDetection', 'Attribute', 'ROI', 'Tripwire'];

/** Inner types that can be wrapped in Array<T>. */
export const ARRAY_INNER_TYPES = [...BASIC_TYPES, ...COMPLEX_TYPES, 'Target'];

const _basicSet = new Set(BASIC_TYPES);
const _complexSet = new Set(COMPLEX_TYPES);
const _arrayInnerSet = new Set(ARRAY_INNER_TYPES);

/** Types whose output can be visualized (draw bbox / polygon / polyline). */
const _visualizableTypes = new Set(['Detection', 'TrackDetection', 'ROI', 'Tripwire']);

/** Types that represent input-side visual overlays (ROI, Tripwire). */
const _inputVisualizableTypes = new Set(['ROI', 'Tripwire']);

export function isBasicType(type) {
  return _basicSet.has(type);
}

export function isComplexType(type) {
  return _complexSet.has(type);
}

/**
 * Returns true if `type` is `Array<...>`.
 */
export function isArrayType(type) {
  return typeof type === 'string' && type.startsWith('Array<') && type.endsWith('>');
}

/**
 * Unwrap `Array<T>` → `T`. Returns `null` for non-array types.
 */
export function unwrapArrayType(type) {
  if (!isArrayType(type)) return null;
  return type.slice(6, -1); // strip "Array<" and ">"
}

/**
 * Returns true if the type is a visual/complex type (Image, Detection, etc.),
 * including Array-wrapped forms.
 */
export function isVisualType(type) {
  if (_complexSet.has(type)) return true;
  const inner = unwrapArrayType(type);
  return inner !== null && _complexSet.has(inner);
}

/**
 * Returns true if the type supports output visualization
 * (Detection, TrackDetection, ROI, Tripwire, and their Array<> forms).
 */
export function hasVisualization(type) {
  if (_visualizableTypes.has(type)) return true;
  const inner = unwrapArrayType(type);
  return inner !== null && _visualizableTypes.has(inner);
}

/**
 * Returns true if the type is an input-side visualizable overlay
 * (ROI, Tripwire, and their Array<> forms).
 */
export function isInputVisualizableType(type) {
  if (_inputVisualizableTypes.has(type)) return true;
  const inner = unwrapArrayType(type);
  return inner !== null && _inputVisualizableTypes.has(inner);
}

// ---------------------------------------------------------------------------
// 2. Field schema definitions
// ---------------------------------------------------------------------------

export const IMAGE_FIELDS = {
  imageId:      { type: 'String',  required: false },
  imageData:    { type: 'String',  required: false },
  sourceId:     { type: 'String',  required: false },
  imageWidth:   { type: 'Integer', required: false },
  imageHeight:  { type: 'Integer', required: false },
  timestamp:    { type: 'Integer', required: false },
};

export const DETECTION_FIELDS = {
  image_id:     { type: 'String',         required: false },
  source_id:    { type: 'String',         required: false },
  image_url:    { type: 'String',         required: false },
  description:  { type: 'String',         required: false },
  answers:      { type: 'Array<String>',  required: false },
  image_base64: { type: 'Array<String>',  required: false },
  predictions:  { type: 'Array<Object>',  required: false },
  roi_ids:      { type: 'Array<String>',  required: false },
  answer:       { type: 'String',         required: false },
};

/** TrackDetection shares the same schema as Detection. */
export const TRACK_DETECTION_FIELDS = DETECTION_FIELDS;

export const ATTRIBUTE_FIELDS = {
  image_id:     { type: 'String',         required: false },
  image_url:    { type: 'String',         required: false },
  description:  { type: 'String',         required: false },
  answer:       { type: 'Array<String>',  required: false },
  image_base64: { type: 'Array<String>',  required: false },
  predictions:  { type: 'Array<Object>',  required: false },
  roiIds:       { type: 'Array<String>',  required: false },
};

export const ROI_FIELDS = {
  id:           { type: 'String',        required: false },
  name:         { type: 'String',        required: false },
  displayName:  { type: 'String',        required: false },
  points:       { type: 'Array<Number>', required: true },
  iou:          { type: 'Float',         required: false, default: 0.0 },
  target_ratio: { type: 'Float',         required: false, default: 0.5 },
  kind:         { type: 'String',        required: true,  default: 'ROI' },
  interested:   { type: 'Boolean',       required: false, default: true },
  direction:    { type: 'String',        required: false },
  visuals:      { type: 'Object',        required: false },
};

export const TRIPWIRE_FIELDS = {
  id:           { type: 'String',        required: false },
  name:         { type: 'String',        required: false },
  displayName:  { type: 'String',        required: false },
  points:       { type: 'Array<Number>', required: true },
  iou:          { type: 'Float',         required: false, default: 0.0 },
  target_ratio: { type: 'Float',         required: false, default: 0.5 },
  kind:         { type: 'String',        required: true,  default: 'TripWire' },
  interested:   { type: 'Boolean',       required: false, default: true },
  direction:    { type: 'String',        required: true,  default: 'TwoWay' },
  visuals:      { type: 'Object',        required: false },
};

/** Lookup field schema by type name. */
export const FIELD_SCHEMAS = {
  Image:          IMAGE_FIELDS,
  Detection:      DETECTION_FIELDS,
  TrackDetection: TRACK_DETECTION_FIELDS,
  Attribute:      ATTRIBUTE_FIELDS,
  ROI:            ROI_FIELDS,
  Tripwire:       TRIPWIRE_FIELDS,
};

// ---------------------------------------------------------------------------
// 3. Serialization — buildValue(type, userValue)
// ---------------------------------------------------------------------------

/**
 * Simple heuristic to check if a string looks like base64-encoded data.
 */
function isBase64(str) {
  if (typeof str !== 'string' || str.length < 100) return false;
  return /^[A-Za-z0-9+/\n]+=*$/.test(str.trim());
}

/**
 * Parse image dimensions from buffer (supports JPEG, PNG and WEBP).
 */
export function getImageDimensions(buffer) {
  // PNG: bytes 16-23 contain width and height in the IHDR chunk
  if (buffer[0] === 0x89 && buffer[1] === 0x50 && buffer[2] === 0x4E && buffer[3] === 0x47) {
    return {
      width: buffer.readUInt32BE(16),
      height: buffer.readUInt32BE(20),
    };
  }

  // JPEG: search for SOF markers
  if (buffer[0] === 0xFF && buffer[1] === 0xD8) {
    let offset = 2;
    while (offset < buffer.length - 1) {
      if (buffer[offset] !== 0xFF) { offset++; continue; }
      const marker = buffer[offset + 1];
      if (marker === 0xD9) break; // EOI
      if (marker >= 0xC0 && marker <= 0xCF && marker !== 0xC4 && marker !== 0xC8) {
        return {
          height: buffer.readUInt16BE(offset + 5),
          width: buffer.readUInt16BE(offset + 7),
        };
      }
      const segLen = buffer.readUInt16BE(offset + 2);
      offset += 2 + segLen;
    }
  }

  // WEBP: RIFF????WEBP VP8 /VP8L/VP8X chunks
  if (buffer.length >= 12 &&
      buffer[0] === 0x52 && buffer[1] === 0x49 && buffer[2] === 0x46 && buffer[3] === 0x46 &&
      buffer[8] === 0x57 && buffer[9] === 0x45 && buffer[10] === 0x42 && buffer[11] === 0x50) {
    const chunkId = buffer.slice(12, 16).toString('ascii');
    if (chunkId === 'VP8 ' && buffer.length >= 30) {
      // Lossy: width/height at bytes 26-29 (14-bit each, little-endian, mask 0x3FFF)
      return {
        width: (buffer.readUInt16LE(26) & 0x3FFF),
        height: (buffer.readUInt16LE(28) & 0x3FFF),
      };
    }
    if (chunkId === 'VP8L' && buffer.length >= 25) {
      // Lossless: 28-bit packed fields starting at byte 21
      const bits = buffer.readUInt32LE(21);
      return {
        width: (bits & 0x3FFF) + 1,
        height: ((bits >> 14) & 0x3FFF) + 1,
      };
    }
    if (chunkId === 'VP8X' && buffer.length >= 30) {
      // Extended: 24-bit little-endian width-1 at byte 24, height-1 at byte 27
      return {
        width: buffer.readUIntLE(24, 3) + 1,
        height: buffer.readUIntLE(27, 3) + 1,
      };
    }
  }

  return { width: 0, height: 0 };
}

/**
 * Read an image file and return { base64, width, height }.
 */
export function readImageFile(filePath) {
  const resolved = path.resolve(filePath);
  if (!fs.existsSync(resolved)) {
    throw new Error(`Image file not found: resolved`);
  }
  const buffer = fs.readFileSync(resolved);
  const { width, height } = getImageDimensions(buffer);
  return { base64: buffer.toString('base64'), width, height };
}

/**
 * Compute a short hash ID from the first 64 KB of a file.
 * Returns a stable 16-char hex string (MD5 prefix).
 */
export function fileHashId(filePath) {
  const fd = fs.openSync(path.resolve(filePath), 'r');
  const buf = Buffer.alloc(65536);
  const bytesRead = fs.readSync(fd, buf, 0, 65536, 0);
  fs.closeSync(fd);
  return crypto.createHash('md5').update(buf.subarray(0, bytesRead)).digest('hex').slice(0, 16);
}

/**
 * Build the Image value object and JSON-stringify it.
 */
function buildImageValue(userValue) {
  // (1) Pre-built object with imageData → pass through
  if (typeof userValue === 'object' && userValue !== null && userValue.imageData) {
    return JSON.stringify(userValue);
  }

  // (2) Object form { file, sourceId?, imageId?, timestamp? } or { image, sourceId?, imageId?, timestamp? } — for video frames
  if (typeof userValue === 'object' && userValue !== null && (userValue.file || userValue.image)) {
    const filePath = userValue.file || userValue.image;
    const img = readImageFile(filePath);
    return JSON.stringify({
      imageData: img.base64,
      imageId: userValue.imageId ?? fileHashId(filePath),
      sourceId: userValue.sourceId ?? fileHashId(filePath),
      imageWidth: img.width,
      imageHeight: img.height,
      timestamp: userValue.timestamp ?? Date.now(),
    });
  }

  // (3) String path or base64 (existing logic)
  let imageData, imageWidth = 0, imageHeight = 0;
  let filePath = null;
  if (typeof userValue === 'string') {
    if (isBase64(userValue)) {
      imageData = userValue;
    } else {
      filePath = userValue;
      const img = readImageFile(userValue);
      imageData = img.base64;
      imageWidth = img.width;
      imageHeight = img.height;
    }
  } else {
    throw new Error(`Unsupported image value type: typeof userValue`);
  }

  return JSON.stringify({
    imageData,
    imageId: filePath ? fileHashId(filePath) : '0',
    sourceId: filePath ? fileHashId(filePath) : '',
    imageWidth,
    imageHeight,
    timestamp: Date.now(),
  });
}

/**
 * Apply default values from a field schema to a user-provided object.
 */
function applyDefaults(obj, fields) {
  const result = { ...obj };
  for (const [key, def] of Object.entries(fields)) {
    if (result[key] === undefined && def.default !== undefined) {
      result[key] = def.default;
    }
  }
  return result;
}

/**
 * Build a ROI value: fill in defaults (kind='ROI', etc.), JSON.stringify.
 */
function buildROIValue(userValue) {
  if (typeof userValue === 'string') return userValue; // assume pre-serialized
  const obj = applyDefaults(userValue, ROI_FIELDS);
  return JSON.stringify(obj);
}

/**
 * Build a Tripwire value: fill in defaults (kind='TripWire', direction, etc.), JSON.stringify.
 */
function buildTripwireValue(userValue) {
  if (typeof userValue === 'string') return userValue;
  const obj = applyDefaults(userValue, TRIPWIRE_FIELDS);
  return JSON.stringify(obj);
}

/**
 * Build a complex object value (Detection, TrackDetection, Attribute).
 * If already an object, JSON.stringify; if string, assume pre-serialized.
 */
function buildGenericComplexValue(userValue) {
  if (typeof userValue === 'string') return userValue;
  return JSON.stringify(userValue);
}

/**
 * Unified entry point: serialize a user value based on its Yijian type.
 *
 * @param {string} type   - Yijian type (e.g. 'Image', 'ROI', 'Array<Detection>')
 * @param {*} userValue   - user-provided value
 * @returns {string}      - serialized string suitable for the API
 */
export function buildValue(type, userValue) {
  if (userValue === undefined || userValue === null) return '';

  // Array<T> — recurse on each element
  const inner = unwrapArrayType(type);
  if (inner !== null) {
    const arr = Array.isArray(userValue) ? userValue : [userValue];
    const built = arr.map(elem => {
      const s = buildValue(inner, elem);
      // For complex types the inner buildValue returns JSON string; parse it back
      // so the outer stringify produces a proper array.
      if (isComplexType(inner)) {
        try { return JSON.parse(s); } catch { return s; }
      }
      return s;
    });
    return JSON.stringify(built);
  }

  switch (type) {
    case 'Image':
      return buildImageValue(userValue);
    case 'ROI':
      return buildROIValue(userValue);
    case 'Tripwire':
      return buildTripwireValue(userValue);
    case 'Detection':
    case 'TrackDetection':
    case 'Attribute':
      return buildGenericComplexValue(userValue);
    default:
      // Basic types — convert to string
      if (typeof userValue === 'object') return JSON.stringify(userValue);
      return String(userValue);
  }
}

// ---------------------------------------------------------------------------
// 4. Deserialization — parseValue(type, valueString)
// ---------------------------------------------------------------------------

/**
 * Parse Detection / TrackDetection value string into structured results.
 * Extracts key fields from predictions.
 */
function parseDetections(value) {
  const images = JSON.parse(value);
  const detections = [];
  for (const image of images) {
    for (const pred of (image.predictions || [])) {
      detections.push({
        bbox: pred.bbox,
        confidence: pred.confidence,
        category: pred.categories && pred.categories.length > 0
          ? { id: pred.categories[0].id, name: pred.categories[0].name, confidence: pred.categories[0].confidence }
          : null,
        track_id: pred.track_id,
        area: pred.area,
      });
    }
  }
  return detections;
}

/**
 * Parse Attribute value string into structured results.
 * Similar to Detection but also extracts answer field.
 */
function parseAttributes(value) {
  const images = JSON.parse(value);
  const results = [];
  for (const image of images) {
    const entry = {
      answer: image.answer,
      predictions: [],
    };
    for (const pred of (image.predictions || [])) {
      entry.predictions.push({
        bbox: pred.bbox,
        confidence: pred.confidence,
        categories: pred.categories,
        answer: pred.answer,
      });
    }
    results.push(entry);
  }
  return results;
}

/**
 * Unified entry point: deserialize a value string based on its Yijian type.
 *
 * @param {string} type        - Yijian type
 * @param {string} valueString - raw value string from API response
 * @returns {*}                - parsed value
 */
export function parseValue(type, valueString) {
  if (valueString === undefined || valueString === null || valueString === '') return valueString;

  // Array<T>
  const inner = unwrapArrayType(type);
  if (inner !== null) {
    // For Detection/TrackDetection arrays the entire value is already the images array
    if (inner === 'Detection' || inner === 'TrackDetection') {
      return parseDetections(valueString);
    }
    if (inner === 'Attribute') {
      return parseAttributes(valueString);
    }
    // Generic array: parse, then recurse on each element
    const arr = JSON.parse(valueString);
    return arr.map(elem => {
      const s = typeof elem === 'string' ? elem : JSON.stringify(elem);
      return parseValue(inner, s);
    });
  }

  switch (type) {
    case 'Detection':
    case 'TrackDetection':
      // Single detection (wrapped in an array by API)
      return parseDetections(valueString);
    case 'Attribute':
      return parseAttributes(valueString);
    case 'ROI':
    case 'Tripwire':
      return JSON.parse(valueString);
    case 'Image':
      return JSON.parse(valueString);
    default:
      // Basic types — return as-is
      return valueString;
  }
}

/**
 * Post-process an outputs array: parse complex types and attach parsedValue.
 */
export function parseOutputs(outputs) {
  for (const field of outputs) {
    if (field.value && (isComplexType(field.type) || isArrayType(field.type))) {
      try {
        field.parsedValue = parseValue(field.type, field.value);
      } catch {
        // keep original value if parsing fails
      }
    }
  }
}

// ---------------------------------------------------------------------------
// 5. SKILL.md generation helpers
// ---------------------------------------------------------------------------

/**
 * Returns hints used when generating SKILL.md for a given type.
 */
export function getSkillMdHints(type) {
  const hints = {
    inputNote: null,
    hasVisualization: false,
    visualizationCmd: null,
  };

  // Check the raw type or inner type
  const inner = unwrapArrayType(type) || type;

  if (inner === 'Image') {
    hints.inputNote = '> pass a local file path (auto base64-encoded), or `{ file, sourceId?, imageId?, timestamp? }` / `{ image, sourceId?, imageId?, timestamp? }` for video frames — see **Video Frame Extraction Guide**.';
  } else if (inner === 'ROI') {
    hints.inputNote = '> requires drawing regions on the image — see **ROI / Tripwire Input Guide** below.';
  } else if (inner === 'Tripwire') {
    hints.inputNote = '> requires drawing tripwire lines on the image — see **ROI / Tripwire Input Guide** below.';
  }

  if (hasVisualization(type)) {
    hints.hasVisualization = true;
    if (inner === 'Detection' || inner === 'TrackDetection') {
      hints.visualizationCmd = 'visualize.mjs';
    } else if (inner === 'ROI' || inner === 'Tripwire') {
      hints.visualizationCmd = 'visualize.mjs';
    }
  }

  return hints;
}

FILE:scripts/utils.mjs
#!/usr/bin/env node

import https from 'https';
import http from 'http';

/**
 * Make an HTTPS (or HTTP) request. Returns a Promise that resolves to { statusCode, headers, body }.
 */
export function httpsRequest(url, options = {}) {
  return new Promise((resolve, reject) => {
    const parsedUrl = new URL(url);
    const mod = parsedUrl.protocol === 'https:' ? https : http;
    const reqOptions = {
      hostname: parsedUrl.hostname,
      port: parsedUrl.port || (parsedUrl.protocol === 'https:' ? 443 : 80),
      path: parsedUrl.pathname + parsedUrl.search,
      method: options.method || 'GET',
      headers: options.headers || {},
    };

    const req = mod.request(reqOptions, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        const body = Buffer.concat(chunks).toString('utf-8');
        resolve({ statusCode: res.statusCode, headers: res.headers, body });
      });
    });

    req.on('error', reject);

    if (options.body) {
      req.write(options.body);
    }
    req.end();
  });
}

/**
 * Get the API key from environment. Throws if not set.
 */
export function getApiKey() {
  const key = process.env.YIJIAN_API_KEY;
  if (!key) {
    throw new Error('YIJIAN_API_KEY environment variable is not set. Please configure it in ~/.claude/settings.json under "env".');
  }
  return key;
}

/**
 * Construct the metadata URL for a given ep-id.
 */
export function metadataUrl(epId) {
  return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/metadata`;
}

/**
 * Construct the run URL for a given ep-id.
 */
export function runUrl(epId) {
  return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/run`;
}

/**
 * Construct the router query URL for intent-based skill matching.
 */
export function routerQueryUrl() {
  return 'https://yijian.baidubce.com/harness/v1/router/query';
}

/**
 * Construct the router multimodal URL for direct inference.
 */
export function routerMultimodalUrl() {
  return 'https://yijian.baidubce.com/harness/v1/router/multimodal';
}

/**
 * Construct the workspaces get URL.
 */
export function workspacesGetUrl() {
  return 'https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/get';
}

/**
 * Construct the workspace skills get URL.
 */
export function workspaceSkillsGetUrl(workspaceId) {
  return `https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/workspaceId/skills/get`;
}

FILE:scripts/invoke.mjs
#!/usr/bin/env node

import { httpsRequest, getApiKey, runUrl } from './utils.mjs';
import { buildValue, parseOutputs, readImageFile } from './types.mjs';

/**
 * Read all stdin as a string (for piped input).
 */
function readStdin() {
  return new Promise((resolve, reject) => {
    const chunks = [];
    process.stdin.setEncoding('utf-8');
    process.stdin.on('data', (chunk) => chunks.push(chunk));
    process.stdin.on('end', () => resolve(chunks.join('')));
    process.stdin.on('error', reject);
  });
}

/**
 * Build a full-image ROI covering the entire image bounds.
 */
export function fullImageROI(width, height) {
  return { id: '1', name: 'zone', kind: 'ROI', points: [0, 0, width, 0, width, height, 0, height] };
}

/**
 * Determine Yijian type from a field name heuristic.
 */
function detectFieldType(fieldName, fieldValue) {
  const lower = fieldName.toLowerCase();
  if (lower.includes('image') || lower.endsWith('image')) return 'Image';
  if (lower.includes('roi')) return 'ROI';
  if (lower.includes('tripwire')) return 'Tripwire';
  if (typeof fieldValue === 'number') return 'Integer';
  return 'String';
}

/**
 * Build the `inputs` array for the run request body.
 * Supports Image, ROI, Tripwire, Integer, and String types.
 * When autoROI is enabled and an Image field is present without a corresponding ROI,
 * automatically adds a full-image ROI.
 */
function buildRunInputs(userInputs, options = {}) {
  const { autoROI = false } = options;
  const inputs = [];

  for (const [inputName, inputData] of Object.entries(userInputs)) {
    const schema = [];
    let hasImage = false;
    let hasROI = false;
    let imagePath = null;

    for (const [fieldName, fieldValue] of Object.entries(inputData)) {
      const type = detectFieldType(fieldName, fieldValue);

      if (type === 'Image') {
        hasImage = true;
        imagePath = typeof fieldValue === 'object' && fieldValue !== null
          ? (fieldValue.file || fieldValue.image || null)
          : (typeof fieldValue === 'string' && !fieldValue.startsWith('data:') ? fieldValue : null);
      }
      if (type === 'ROI') hasROI = true;

      schema.push({
        name: fieldName,
        type: type,
        value: buildValue(type, fieldValue),
      });
    }

    // Auto-fill full-image ROI when enabled and not provided by user
    if (autoROI && hasImage && !hasROI && imagePath) {
      try {
        const img = readImageFile(imagePath);
        schema.push({
          name: 'roi',
          type: 'ROI',
          value: buildValue('ROI', fullImageROI(img.width, img.height)),
        });
      } catch (err) {
        // If we can't read the image for ROI, skip auto-fill
        // (buildValue('Image', ...) will read it again anyway and throw if invalid)
      }
    }

    inputs.push({ name: inputName, schema });
  }

  return inputs;
}

/**
 * Invoke a skill by epId with the given user inputs.
 *
 * @param {string} epId - Skill endpoint ID (e.g. "ep-public-xxx")
 * @param {object} userInputs - User input { input0: { image: "path", roi: {...} } }
 * @param {object} options
 * @param {boolean} options.autoROI - Auto-fill full-image ROI when not provided (default: true)
 * @returns {Promise<{success: boolean, epId: string, result?: object, message?: string}>}
 */
export async function runSkill(epId, userInputs = {}, options = {}) {
  const { autoROI = true } = options;
  const apiKey = getApiKey();

  const inputs = buildRunInputs(userInputs, { autoROI });
  const requestBody = JSON.stringify({ inputs });

  const response = await httpsRequest(runUrl(epId), {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    body: requestBody,
  });

  if (response.statusCode !== 200) {
    // Check for "conditional branch did not hit" error (no detection)
    try {
      const parsed = JSON.parse(response.body);
      if (parsed.message?.global?.detail?.includes('conditional branch did not hit')) {
        return {
          success: true,
          epId,
          result: { 未检出: true, detail: parsed.message.global.detail },
        };
      }
    } catch {}
    throw new Error(`Skill invocation failed (response.statusCode): response.body`);
  }

  let result;
  try {
    result = JSON.parse(response.body);
  } catch {
    result = { rawResponse: response.body };
  }

  // Post-process: parse complex-type outputs
  if (result.result?.outputs) {
    parseOutputs(result.result.outputs);
  }

  return { success: true, epId, result };
}

/**
 * CLI entry point.
 */
async function main() {
  const epId = process.argv[2];
  let inputArg = process.argv[3];

  if (!epId) {
    console.error('Usage: node invoke.mjs <ep-id> \'<input-json>\' | -');
    console.error('Use - as input arg to read from stdin');
    console.error('');
    console.error('Input JSON format:');
    console.error('  { "input0": { "image": "/path/to/image.jpg" } }');
    console.error('  { "input0": { "image": "/path/to/image.jpg", "roi": { "id":"1","name":"zone","kind":"ROI","points":[...] } } }');
    process.exit(1);
  }

  // Parse user input JSON
  let userInputs;
  if (inputArg === '-') {
    inputArg = await readStdin();
  }
  if (!inputArg) {
    userInputs = {};
  } else {
    try {
      userInputs = JSON.parse(inputArg);
    } catch (err) {
      console.error(`Failed to parse input JSON: err.message`);
      process.exit(1);
    }
  }

  try {
    const output = await runSkill(epId, userInputs, { autoROI: true });
    console.log(JSON.stringify(output, null, 2));
    process.exit(output.success ? 0 : 1);
  } catch (err) {
    console.error(JSON.stringify({ success: false, epId, error: err.message }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

FILE:scripts/list.mjs
#!/usr/bin/env node

import { querySkills } from './query.mjs';

/**
 * List skills matching a query intent.
 * This helps users discover what skills are available for their use case.
 *
 * Usage: node list.mjs "detect people"
 */

async function main() {
  const intent = process.argv[2];
  const topK = parseInt(process.argv[3], 10) || 10;

  if (!intent) {
    console.log('Usage: node list.mjs <intent> [topK]');
    console.log('Example: node list.mjs "detect people falling" 5');
    console.log('');
    console.log('This queries the Yijian platform for skills matching your intent.');
    process.exit(0);
  }

  try {
    const skills = await querySkills(intent, { topK });

    if (skills.length === 0) {
      console.log('No skills found matching your intent.');
      console.log('You can still use multimodal inference: node intent-invoke.mjs "' + intent + '" <image>');
      process.exit(0);
    }

    console.log(`Found skills.length skill(s) matching "intent":\n`);

    skills.forEach((skill, index) => {
      const confidence = (skill.confidence * 100).toFixed(1);
      console.log(`index + 1. skill.name || skill.epId`);
      console.log(`   ID: skill.epId`);
      console.log(`   Match: confidence%`);
      if (skill.description) {
        console.log(`   Description: skill.description`);
      }
      console.log('');
    });

    console.log('To use a skill directly:');
    console.log(`  node intent-invoke.mjs "intent" <image-path>`);
  } catch (err) {
    console.error('Error:', err.message);
    process.exit(1);
  }
}

main();

FILE:scripts/multimodal.mjs
#!/usr/bin/env node

import { httpsRequest, getApiKey, routerMultimodalUrl } from './utils.mjs';
import fs from 'fs';
import path from 'path';

/**
 * Convert a local file path to a base64 data URI.
 */
function imageToDataUri(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mimeMap = {
    '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
    '.png': 'image/png', '.gif': 'image/gif',
    '.webp': 'image/webp', '.bmp': 'image/bmp',
  };
  const mime = mimeMap[ext] || 'image/jpeg';
  const data = fs.readFileSync(filePath);
  return `data:mime;base64,data.toString('base64')`;
}

/**
 * Check if a string looks like a local file path (not a URL or data URI).
 */
function isLocalFilePath(str) {
  return str && !str.startsWith('http://') && !str.startsWith('https://') && !str.startsWith('data:');
}

/**
 * Call multimodal router for direct inference using internal API.
 * @param {string} text - User's text query
 * @param {string} imageUrl - URL or local file path to the image (optional)
 * @returns {Promise<Object>} API response
 */
export async function callMultimodal(text, imageUrl) {
  const apiKey = getApiKey();
  const messages = [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: text
        }
      ]
    }
  ];

  // Add image if provided
  if (imageUrl) {
    // Convert local file path to data URI
    const resolvedUrl = isLocalFilePath(imageUrl) ? imageToDataUri(imageUrl) : imageUrl;
    messages[0].content.push({
      type: 'image_url',
      image_url: {
        url: resolvedUrl
      }
    });
  }

  const requestBody = {
    messages
  };

  const response = await httpsRequest(routerMultimodalUrl(), {
    method: 'POST',
    headers: {
      'Authorization': `Bearer apiKey`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    body: JSON.stringify(requestBody),
  });

  if (response.statusCode !== 200) {
    throw new Error(`Multimodal call failed with status response.statusCode: response.body`);
  }

  let result;
  try {
    result = JSON.parse(response.body);
  } catch (err) {
    throw new Error(`Failed to parse multimodal response: err.message`);
  }

  return result;
}

/**
 * CLI entry point for testing multimodal functionality.
 */
async function main() {
  const text = process.argv[2];
  const imageUrl = process.argv[3];

  if (!text) {
    console.error('Usage: node multimodal.mjs <text> [image-url]');
    console.error('Example: node multimodal.mjs "图片里有什么"');
    console.error('Example: node multimodal.mjs "图片里有什么" "http://example.com/image.jpg"');
    process.exit(1);
  }

  try {
    const result = await callMultimodal(text, imageUrl);
    console.log(JSON.stringify({
      success: true,
      text,
      imageUrl: imageUrl || null,
      result
    }, null, 2));
  } catch (err) {
    console.error(JSON.stringify({
      success: false,
      error: err.message
    }, null, 2));
    process.exit(1);
  }
}

if (import.meta.url === `file://process.argv[1]`) {
  main();
}

ClawHub Backend Testing+2

P@clawhub-linpower-af0142bac7