@clawhub-linpower-af0142bac7
Yijian (一见) is Baidu's specialized visual AI skill for image and video analysis. yijian achieves 95%+ professional accuracy with 50%+ lower inference cost th...
---
name: baidu-yijian-vision
description: "Yijian (一见) is Baidu's specialized visual AI skill for image and video analysis. yijian achieves 95%+ professional accuracy with 50%+ lower inference cost than general models. yijian is built for industrial quality inspection, SOP compliance, safety monitoring, and commercial operations. Search keywords: yijian, baidu yijian, yijian vision. 百度一见视觉技能(Baidu Yijian Vision Skill)- 可用于分析图片和视频。相比通用基模,在维持 95%+ 专业精度的同时,推理成本降低 50% 以上,是处理视觉巡检与监控分析任务的首选工具。主打 安全管理、SOP合规、工业质检、商业运营与物料盘点。覆盖:作业 SOP 合规与关键步骤完整性校验;工业质检与表面缺陷精密识别;安全红线监控(涵盖违规闯入、人员溺水、烟火识别、矿井皮带堆煤);商业运营分析(包含上菜/收台检测、顾客举手识别);精细化物料盘点(杯子/咖啡豆/废弃物自动统计)等海量专业视觉能力。"
allowed-tools: Bash, Read, Write, Edit
metadata: {"openclaw":{"requires":{"bins":["node","npm"],"env":["YIJIAN_API_KEY"]},"primaryEnv":"YIJIAN_API_KEY"}}
---
# 百度一见视觉技能(Baidu Yijian Vision Skill)
> **Baidu Yijian Vision Skill** - baidu yijian vision skill for image/video analysis, object detection, safety monitoring, and industrial inspection.
## ⚠️ 必需条件
**此工具需要以下条件才能运行:**
1. **YIJIAN_API_KEY 环境变量**(必需)- 从[百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)获取
2. **Node.js >= 16.0.0** - 本工具依赖 Node.js 运行时
3. **npm >= 8.0.0** - 用于依赖管理和安装
**确保上述条件满足后再使用此工具。**
---
> **🔒 客户端工具 - 这是一个本地工具,用于与百度一见(Baidu Yijian)平台交互。所有数据处理遵循安全协议。**
## 🎯 此工具的功能
百度一见([yijian-next.cloud.baidu.com](https://yijian-next.cloud.baidu.com))是百度(Baidu)的视觉(vision)理解平台。此工具使你能够:
- **意图自动匹配** - 通过自然语言描述自动匹配最佳技能
- **智能路由** - 高置信度匹配时调用专业视觉技能,低置信度时自动回退到多模态推理
- **直接技能调用** - 已知技能ID时可直接调用
- **可视化结果** - 绘制边框、生成网格参考、预览 ROI/绊线
- **定义检测区域** - 使用交互式工作流定义 ROI(电子围栏)或绊线(检测线)
**支持的检测类型:** 人员检测、行人计数、车辆识别、OCR、姿态估计、目标跟踪等。
## 📋 快速开始
### 系统要求
- **Node.js** >= 16.0.0
- **npm** >= 8.0.0
- **YIJIAN_API_KEY** 环境变量
## 🔧 前置条件
### 获取 API Key
1. 登录 [百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)
2. 激活试用包
3. 生成 API Key(百度一见平台 → 系统管理 → 安全认证 → API Key)
### 配置环境
设置环境变量:
```
YIJIAN_API_KEY=your-api-key
```
## 📚 使用指南
### 意图驱动工作流(推荐)
**当你描述需求但不确定用哪个技能时**,系统会自动匹配最佳技能:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg
```
系统会自动:
1. 查询一见平台,根据意图匹配公共技能列表
2. 如果匹配置信度 ≥ 0.7,调用对应的专业技能(自动添加全图 ROI)
3. 如果公共技能无匹配或调用失败,搜索私有工作空间技能(由你从列表中选择最匹配的技能,再用 invoke 调用)
4. 如果私有空间也无合适技能,自动回退到多模态直接推理
> **自动 ROI:** 当用户未提供 ROI 时,系统会自动生成覆盖整张图片的 ROI。如需指定检测区域,请使用 `invoke.mjs` 传入自定义 ROI。
#### 自定义置信度阈值
```bash
# 仅当匹配度≥0.8时才使用技能,否则回退到多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg 0.8
```
#### 不使用图片(纯文本意图查询)
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒"
```
#### 返回格式
```json
{
"success": true,
"mode": "skill",
"epId": "ep-public-xxxxx",
"skillName": "人员摔倒检测",
"confidence": 0.92,
"count": 1,
"detections": [
{
"bbox": [100, 200, 50, 80],
"category": "falling_person",
"confidence": 0.94
}
]
}
```
**字段说明:**
| 字段 | 类型 | 说明 |
|------|------|------|
| `success` | boolean | 调用是否成功 |
| `mode` | string | `"skill"` 或 `"multimodal"`,表示使用的推理模式 |
| `epId` | string \| null | 技能ID(技能模式时有值) |
| `skillName` | string \| null | 技能名称(技能模式时有值) |
| `confidence` | number \| null | 技能匹配置信度(0-1) |
| `count` | number | 检测到的目标数量 |
| `detections` | array | 检测结果数组 |
**模式说明:**
- `"mode": "skill"` - 使用了百度一见平台的专业技能,精度高、成本低
- `"mode": "multimodal"` - 使用了多模态大模型直接推理,通用性强、无需预设技能
### 查询私有工作空间技能
当公共技能匹配不到或调用失败时,可以搜索你私有工作空间中的技能:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills
```
返回你默认工作空间中所有已发布的技能列表(含 epId、名称和描述)。列表会本地缓存1小时。
**使用流程:**
1. 运行 `list-skills` 获取私有技能列表
2. 根据用户意图,从列表中选择最匹配的技能
3. 用选中的 epId 调用 `invoke.mjs` 执行技能
4. 如果私有空间也无合适技能,走 multimodal 多模态推理
```bash
# 第1步:获取私有技能列表
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills
# 第2步:选择匹配的技能并调用
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-wsnyqcdj-0xdpgbt4
```
> **注意:** 私有技能列表按 API Key 关联,1小时内自动刷新缓存,无需频繁调用。
### 查询可用技能
如果你想了解有哪些技能可以匹配你的意图:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "人员检测"
```
这会返回匹配的技能列表及其置信度。
### 直接调用技能(已知技能ID)
**当你已经知道具体的技能 ID 时**,可以直接调用:
```bash
# 调用指定技能(从stdin读取输入)
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy -
# 或者直接作为参数
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
#### ROI(电子围栏)参数格式
ROI 用于限定检测区域。**必须包含 `id`、`name`、`kind`、`points` 四个字段,缺一不可**,否则 API 返回 500 错误。
```json
{
"id": "1",
"name": "zone",
"kind": "ROI",
"points": [x1,y1, x2,y2, x3,y3, x4,y4]
}
```
- `id` — 任意字符串标识(如 `"1"`)
- `name` — 区域名称(如 `"zone"`、`"doorway"`)
- `kind` — 固定值 `"ROI"`
- `points` — 顶点坐标数组,按顺时针/逆时针顺序排列,每对 `[x,y]` 为一个顶点
> **自动 ROI:** 如果不传 `roi` 参数,`invoke.mjs` 会自动生成覆盖全图的 ROI。
#### 绊线(Tripwire)参数格式
绊线用于检测穿越事件。**必须包含 `id`、`name`、`kind`、`points`、`direction` 五个字段**。
```json
{
"id": "1",
"name": "line",
"kind": "TripWire",
"points": [p1_x,p1_y, p2_x,p2_y, p3_x,p3_y, p4_x,p4_y],
"direction": "Forward"
}
```
- `id` — 任意字符串标识
- `name` — 绊线名称
- `kind` — 固定值 `"TripWire"`
- `points` — 4 个点(8 个数值):p1→p2 为主线,p3→p4 为 A/B 区域标记
- `direction` — 检测方向:`"Forward"` | `"Backward"` | `"TwoWay"`
> **绊线不会自动生成**,必须由用户指定。详见 [绊线工作流](./tripwire-workflow.md)。
**调用带 ROI 的技能:**
```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[100,100,500,100,500,400,100,400]}}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
**调用带绊线的技能:**
```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[0,540,1920,540,0,500,1920,500],"direction":"Forward"}}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
### 测试 Query 接口
如果你想单独测试意图匹配功能:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/query.mjs "检测人员摔倒"
```
返回匹配的技能列表(JSON格式)。
### 测试 Multimodal 接口
如果你想单独测试多模态直接推理:
```bash
# 纯文本
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片"
# 带图片URL
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片" "http://example.com/image.jpg"
```
### 定义检测区域
**需要定义电子围栏(ROI,又叫感兴趣区域)或绊线(Tripwire,又叫检测线)?**
- **[ROI 工作流](./roi-workflow.md)** — 创建电子围栏,仅在指定区域检测
- **[绊线工作流](./tripwire-workflow.md)** — 绘制检测线,统计穿越事件
两个工作流都包含完整的交互步骤和示例对话。
### 查看完整文档
- **[类型定义](./types-guide.md)** — 检测(Detection),图像(Image)、电子围栏(ROI)、绊线(Tripwire)等数据结构
- **[网格输入系统](./grid-guide.md)** — 使用网格坐标指定点
## 💡 常见任务
### 查询匹配的技能
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "检测人员"
```
### 意图驱动调用(自动路由)
```bash
# 系统会自动选择技能或多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg
# 自定义阈值
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg 0.8
```
### 直接调用技能(已知技能ID)
```bash
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-xxxxx
```
### 预览 ROI/绊线
在调用前在图像上预览:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs photo.jpg '[]' preview.png \
--overlays '[{"kind":"ROI","name":"zone","points":[...]}]'
```
### 生成网格
帮助用户使用网格坐标指定点位置:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs photo.jpg grid.png
```
---
## 📋 使用示例
### 示例 1:意图驱动检测(推荐)
**场景:** 你有一张监控画面图像,想检测是否有人摔倒,但不确定使用哪个技能。
```bash
# 使用意图驱动工作流,系统自动匹配最佳技能
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" surveillance.jpg
# 返回结果包含检测到的目标
# {
# "success": true,
# "mode": "skill",
# "epId": "ep-public-inqm15aq",
# "skillName": "人员摔倒",
# "confidence": 0.95,
# "count": 1,
# "detections": [...]
# }
```
### 示例 2:传统直接调用(已知技能ID)
**场景:** 你已经知道具体的技能 ID,直接调用。
```bash
# 第 1 步:调用指定技能
echo '{"input0":{"image":"surveillance.jpg"}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-inqm15aq -
# 第 2 步:可视化结果
detections='[{"bbox":[150,200,80,180],"confidence":0.94,"category":{"id":"person","name":"人体"}}]'
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs surveillance.jpg "$detections" output.jpg
# 第 3 步:处理结果
echo "$detections" | jq 'length' # 计数人数
```
### 示例 3:基于网格的 ROI 设置
**场景:** 在走廊监控摄像机中计数进入特定房间的人员,使用 ROI 限制检测区域。
```bash
# 第 1 步:生成网格参考
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs hallway.jpg --cols 6 --rows 4
# 第 2 步:根据网格识别坐标(B1, D1, D3, B3)并创建 ROI
# 注意:ROI 必须包含 id 和 name 字段
# 第 3 步:验证 ROI
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs hallway.jpg '[]' roi_preview.jpg \
--overlays "[{\"kind\":\"ROI\",\"name\":\"doorway\",\"points\":[320,270,960,270,960,810,320,810]}]"
# 第 4 步:使用 invoke.mjs 传入自定义 ROI(不要用 intent-invoke.mjs,它只会添加全图 ROI)
echo '{"input0":{"image":"hallway.jpg","roi":{"id":"1","name":"doorway","kind":"ROI","points":[320,270,960,270,960,810,320,810]}}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
### 示例 4:视频帧处理和跟踪
**场景:** 处理 30 秒监控视频,逐帧检测和跟踪人员。
```bash
# 第 1 步:提取帧
ffmpeg -i surveillance_30sec.mp4 -vf fps=1 frames/frame_%04d.jpg
# 第 2 步:计算 sourceId(视频标识符)
sourceId=$(head -c 65536 surveillance_30sec.mp4 | md5sum | awk '{print substr($1, 1, 16)}')
# 第 3 步:处理每个帧并跟踪
for frame_file in frames/frame_*.jpg; do
frame_num=$(basename "$frame_file" | grep -oE '[0-9]+' | head -1)
frame_index=$((10#$frame_num - 1))
timestamp=$((frame_index * 1000))
imageId="frame_$(printf '%04d' "$frame_num")"
# 使用意图驱动调用
result=$(node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员" "$frame_file")
detections=$(echo "$result" | jq '.detections')
echo "$detections" > "results/imageId_detections.json"
done
```
---
**API Key 从 `YIJIAN_API_KEY` 环境变量读取。所有脚本将 JSON 输出到标准输出,错误输出到标准错误。**
FILE:grid-guide.md
# 基于网格的 ROI / 绊线输入指南
## 概述
在命令行环境中手动指定 ROI 区域或绊线的确切像素坐标既繁琐又容易出错。基于网格的输入系统让你可以使用易于理解的网格坐标来代替。
## 工作原理
### 1. 生成网格参考图像
```bash
node scripts/show-grid.mjs photo.jpg [output-path] [--cols N] [--rows N]
```
这会在你的图像上创建带标签的网格叠加层:
```
A B C D E F G
0 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
1 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
2 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
3 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
4 ·─────·─────·─────·─────·─────·─────·
```
**输出**:
- `photo_grid.png` - 网格参考图像
- `photo_grid_metadata.json` - 坐标映射数据
### 2. 查看并识别坐标
查看网格图像并识别 ROI 或绊线的坐标:
- **列**:A、B、C、D、...(从左到右)
- **行**:0、1、2、3、...(从上到下)
- **交点**:网格线交叉的位置(用点标记)
### 3. 使用网格坐标指定
告诉系统网格坐标:
```
用户:"在 B1、E1、E3、B3 创建检测区域"
```
系统自动转换为像素坐标:
```
B1 = (列B索引 × 网格宽度, 行1索引 × 网格高度)
E1 = (列E索引 × 网格宽度, 行1索引 × 网格高度)
E3 = (列E索引 × 网格宽度, 行3索引 × 网格高度)
B3 = (列B索引 × 网格宽度, 行3索引 × 网格高度)
```
## 使用示例
### 示例 1:定义检测区域(ROI)
```bash
# 生成网格
$ node scripts/show-grid.mjs office.jpg
# 查看网格图像以识别坐标,然后:
用户:"我想要一个办公桌区域的检测区域:B2、G2、G5、B5"
转换为 ROI:
{
"kind": "ROI",
"points": [col_B, row_2, col_G, row_2, col_G, row_5, col_B, row_5]
}
# 使用 ROI 调用技能
$ echo '{"input0":{"image":"office.jpg","roi":"[...]"}}' | \
node invoke.mjs ep-public-2403um2p
```
### 示例 2:定义穿越线(绊线)
```bash
# 生成网格
$ node scripts/show-grid.mjs hallway.jpg
# 查看网格图像,然后:
用户:"在走廊中创建一条从 A3 到 H3 的绊线,检测从左到右的穿越"
转换为绊线:
{
"kind": "TripWire",
"points": [col_A, row_3, col_H, row_3],
"direction": "Forward"
}
# 使用绊线调用技能
$ echo '{"input0":{"image":"hallway.jpg","tripwire":"[...]"}}' | \
node invoke.mjs ep-public-ywbjb7tm
```
### 示例 3:多个 ROI
```bash
用户:"我想要 3 个检测区域:入口(A1-C3)、中心(D1-F3)、出口(G1-H3)"
转换为三个 ROI 对象并作为 Array<ROI> 传递:
[
{
"kind": "ROI",
"points": [col_A, row_1, col_C, row_1, col_C, row_3, col_A, row_3],
"order": 0
},
{
"kind": "ROI",
"points": [col_D, row_1, col_F, row_1, col_F, row_3, col_D, row_3],
"order": 1
},
{
"kind": "ROI",
"points": [col_G, row_1, col_H, row_1, col_H, row_3, col_G, row_3],
"order": 2
}
]
```
## 网格算法
网格大小会自动计算:
- **目标**:约 30-42 个交点,大致为方形单元
- **横向图像**(1920×1080):7 列 × 4 行
- **纵向图像**(1080×1920):4 列 × 7 行
- **方形图像**(800×800):5 列 × 5 行
**手动覆盖**:
```bash
node scripts/show-grid.mjs photo.jpg --cols 10 --rows 6
```
## 命令参考
### 生成网格
```bash
node scripts/show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]
```
**参数**:
- `<input-image>` - 输入图像文件路径
- `[output-path]` - 可选输出图像路径(默认:`<input>_grid.png`)
- `--cols N` - 覆盖列数
- `--rows N` - 覆盖行数
**输出**:
- 网格图像:`<output>.png`
- 元数据 JSON:`<output>_metadata.json`
### 使用网格坐标
在指定坐标时:
**有效格式**:
- 单个点:`A1`
- 序列:`A1、E1、E3、A3`(用于 ROI)
- 线:`A2 → G2`(用于绊线,显示方向)
**网格参考格式**:
- 列:A-Z,然后是 AA-AZ 等
- 行:0-9,然后是 10-99 等
## 可视化
### 查看网格图像
生成网格后,系统应该显示网格图像以便你可以直观地识别坐标。
### 调用前验证
在用坐标调用技能之前:
```bash
# 在原始图像上可视化选定的 ROI/绊线
node scripts/visualize.mjs office.jpg '<roi-or-tripwire-json>' preview.jpg
```
这可帮助你在处理前验证坐标是否正确。
## 提示和技巧
1. **从简单开始**:在使用复杂多边形前先从简单的矩形 ROI 开始
2. **使用均匀网格**:5×5 或 6×6 网格最容易参考
3. **记录你的区域**:为每个 ROI 指定有意义的名称(例如"入口"、"收银台")
4. **重用坐标**:保存类似摄像机角度的坐标集
5. **先测试**:在处理前始终用 `visualize.mjs` 预览
## 故障排除
**"网格图像过于拥挤"**
- 降低网格密度:`--cols 4 --rows 4`
- 如果可能,增大图像大小
**"看不清网格线"**
- 网格使用半透明白色线条(50% 不透明度)
- 放大网格图像以获得更好的可见性
**"坐标似乎不对"**
- 验证列/行顺序(列 A-Z 从左到右,行 0-N 从上到下)
- 检查第一个点和最后一个点是否不同(多边形必须闭合)
- 确保点的顺序一致(全部顺时针或全部逆时针)
**"网格不适合图像"**
- 非常宽或非常高的图像可能有不均匀的单元格
- 使用 `--cols` 和 `--rows` 强制指定特定的网格尺寸
## 参考
- [类型定义](./types-guide.md) - ROI 和绊线类型规范
- [SKILL.md](./SKILL.md) - 主技能指南
FILE:tripwire-workflow.md
# 绊线(检测线)交互工作流
**导航:** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)
> 当用户需要为穿越检测定义绊线时,遵循此交互工作流。
## 核心概念
- **p1→p2** 主检测线(用户定义,2 个点)
- **p3→p4** A-B 区域标记(系统自动生成或用户定义)
- **方向** Forward/Backward/TwoWay(要检测的方向)
**自动生成过程**:用户只需指定 p1 和 p2 点。系统会自动生成垂直于主线的 p3 和 p4 点用于预览和确认。
矢量穿越检测算法详情,见 [types-guide.md#tripwire](./types-guide.md#tripwire)
## 工作流步骤
### 第 1 步:生成网格参考
```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```
### 第 2 步:用户指定绊线
"查看网格,检测线应该在哪里?例如,从 B2 到 G2?"
### 第 3 步:预览绊线位置
系统基于用户的 2 个点(p1、p2)生成完整的 4 点结构,然后调用 `visualize.mjs` 生成预览:
```bash
# 自动计算 4 个点后
node scripts/visualize.mjs <image> '[]' preview.png \
--overlays '[{
"kind":"TripWire",
"name":"tripwire preview",
"direction":"Forward",
"points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y]
}]'
```
预览显示:
- 主线(p1→p2)虚线 - 橙色
- A/B 标记(p3→p4)虚线 - 橙色,带标签
- 方向箭头 - Forward=绿色在 p4,Backward=红色在 p3
### 第 4 步:确认 A/B 区域
"查看预览图像:
- A 区在这一侧(例如线的上方)
- B 区在那一侧(例如线的下方)
- 绿色箭头显示 Forward 是 A→B
这是否正确?"
### 第 5 步:选择方向模式
用户看到预览后,提问:
- Forward(相同方向)
- Backward(相反方向)
- TwoWay(检测两个方向)
### 第 6 步:最终确认
生成最终预览,用户确认,然后调用技能:
```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y],"direction":"Forward"}}}' | \
node scripts/invoke.mjs ep-xxxx-yyyy
```
---
## 常见错误
❌ 不要在显示预览前解释 A/B
✅ 用户必须看到带 A/B 标记标签的预览
❌ 不要仅使用 2 个点
✅ 始终提供 4 个点(主线 + A/B 标记)
❌ 不要跳过预览验证
✅ 始终显示预览以确认
---
## 完整对话示例
**用户**:"我需要检测进入门的人员"
**你**:
1. 生成网格:
```bash
node scripts/show-grid.mjs door-camera.jpg door-grid.png
```
2. 问:"查看网格,检测线应该在哪里?例如 B2 到 G2?"
3. 用户:"是的,从 B2 到 G2 的线"
4. 生成显示绊线位置和 A/B 标记的预览:
```bash
node scripts/visualize.mjs door-camera.jpg '[]' preview1.png \
--overlays '[{"kind":"TripWire","name":"door","direction":"Forward","points":[B2_x,B2_y,G2_x,G2_y,B1_x,B1_y,G3_x,G3_y]}]'
```
5. 解释:"这是绊线预览:
- 区域 'A' 在线上方(外侧)
- 区域 'B' 在线下方(内侧)
- 绿色箭头显示 Forward 检测 A→B(从外侧进入)
你对这个设置满意吗?"
6. 用户:"是的,我想检测进入(Forward)"
7. 生成最终预览以确认
8. 调用技能进行检测
---
## 数据结构
详见 [types-guide.md#tripwire](./types-guide.md#tripwire) 了解完整的定义、矢量算法和示例。
FILE:package.json
{
"name": "baidu-yijian-vision",
"version": "0.9.38",
"description": "Baidu Yijian Vision Skill - Intent-driven visual analysis with automatic skill routing",
"type": "module",
"private": true,
"engines": {
"node": ">=16.0.0",
"npm": ">=8.0.0"
},
"scripts": {
"pack": "node scripts/pack.mjs",
"test": "node --test 'tests/*.test.mjs'",
"test:integration": "node tests/run-integration.mjs --skip-e2e",
"test:integration:all": "node tests/run-integration.mjs",
"test:integration:e2e": "node tests/run-integration.mjs --e2e",
"test:integration:agent": "node tests/run-integration.mjs --agent",
"test:integration:docker": "node tests/run-docker.mjs",
"test:all": "npm test && npm run test:integration"
},
"devDependencies": {
"archiver": "^7.0.1",
"openclaw": "github:openclaw/openclaw#main"
},
"optionalDevDependencies": {
"_openclaw": "https://github.com/openclaw/openclaw.git#main"
},
"optionalDependencies": {
"sharp": "^0.33.0"
}
}
FILE:roi-workflow.md
# ROI(关注区域)交互工作流
**导航:** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)
> 当用户需要为对象检测定义 ROI(关注区域)时,遵循此交互工作流。
## 工作流步骤
### 第 1 步:生成网格参考
```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```
向用户显示网格图像并解释行和列标签。
### 第 2 步:询问 ROI 目的
"你想在这个区域检测什么?例如:
- 进入收银区域的人员
- 停车场中的车辆
- 货架上的产品"
### 第 3 步:用户指定顶点
用户根据网格标签指定角。
**矩形 ROI 示例:**
- 用户:"从 B2 到 G5"
- 转换为 4 个顶点:[B2、G2、G5、B5]
### 第 4 步:验证顶点顺序
"这些点是否围绕周边形成闭合多边形?(顺时针或逆时针)"
### 第 5 步:预览 ROI
```bash
node scripts/visualize.mjs <image> '[]' preview.png \
--overlays '[{"kind":"ROI","name":"test","points":[x1,y1,x2,y2,x3,y3,x4,y4]}]'
```
### 第 6 步:用户确认
"ROI 位置是否正确?"
### 第 7 步:调用技能
```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[...]}}}' | \
node scripts/invoke.mjs ep-xxxx-yyyy
```
> **重要:** 自定义 ROI 必须通过 `invoke.mjs` 传入。不要使用 `intent-invoke.mjs`,因为它会自动添加全图 ROI 覆盖掉你的自定义区域。
---
## 常见错误
❌ 不要假设矩形 ROI
✅ 支持任意多边形
❌ 不要允许无序的顶点
✅ 顶点必须按顺序排列
❌ 不要跳过预览验证
✅ 始终显示预览图像以确认
---
## 完整对话示例
### 示例 1:矩形入口区域
**用户**:"我想检测进入办公室的人员"
**你**:
1. 生成网格:
```bash
node scripts/show-grid.mjs office.jpg office-grid.png
```
2. 问:"查看网格,入口大概在哪里?"
3. 用户:"大约从 B1 到 G3"
4. 确认:"所以顶点是 B1、G1、G3、B3,对吧?"
5. 用户:"是的"
6. 预览 ROI
7. 问:"这个矩形是否覆盖了入口?"
8. 用户:"完美!"
9. 调用技能进行检测
### 示例 2:复杂多边形(L 形区域)
**用户**:"我需要监视 L 形存储区域"
**你**:
1. 生成网格
2. 问:"使用网格坐标标记 L 形的所有角,从一个角开始"
3. 用户:"从左上角开始:A2、E2、E4、D4、D6、A6"
4. 确认:"这些点是否按顺序形成 L 形的边界?确认它关闭回 A2"
5. 用户:"是的"
6. 预览 L 形多边形
7. 验证没有自交
8. 调用技能
---
## 数据结构
详见 [types-guide.md](./types-guide.md) 了解完整的定义、子对象和示例。
FILE:types-guide.md
# 易见类型定义指南
## 概述
易见平台处理视觉信息并返回结构化数据。理解这些类型可帮助你有效地使用技能输入和输出。
## 核心类型
### 基本类型
这些是在整个平台中使用的标量值。
| 类型 | 描述 | 示例 |
|------|------|------|
| `String` | 文本数据 | `"hello"` |
| `TemplateString` | 格式化文本 | `"Result: {value}"` |
| `Integer` | 整数 | `42` |
| `Double` | 浮点数 | `3.14` |
| `Boolean` | 真/假 | `true` |
| `Time` | 时间戳 | `"2026-03-12T10:00:00Z"` |
### 复杂类型
这些是包含视觉或检测信息的结构化对象。
#### Detection(检测)
表示图像中的检测对象。
```json
{
"bbox": [x, y, width, height],
"confidence": 0.94,
"category": {
"id": "person",
"name": "人体",
"confidence": 0.98
},
"track_id": 1,
"ocr": null,
"keypoints": null
}
```
**字段**:
- `bbox` - 边框 [x, y, width, height](像素坐标)
- `confidence` - 检测置信度(0-1)
- `category` - 对象分类,包含 id、name、confidence
- `track_id` - 跟踪 ID(用于帧间跟踪,可选)
- `ocr` - 文本识别结果(可选)
- `keypoints` - 姿态/骨骼关键点(可选)
#### TrackDetection(跟踪检测)
与 `Detection` 相同,但用于视频帧间的时间跟踪。
#### Attribute(属性)
表示图像中检测到的属性或属性。
```json
{
"attribute": "age",
"answer": "adult",
"confidence": 0.87
}
```
**字段**:
- `attribute` - 属性名称(例如"age"、"gender"、"expression")
- `answer` - 属性值
- `confidence` - 置信度分数
#### Image(图像)
表示技能的视觉输入。
```json
{
"imageData": "base64-encoded-image-data",
"imageWidth": 1920,
"imageHeight": 1080,
"sourceId": "abc123def456",
"imageId": "img001",
"timestamp": 1705056000000
}
```
**字段**:
- `imageData` - Base64 编码的图像字节(从文件路径自动编码)
- `imageWidth` - 图像宽度(像素)
- `imageHeight` - 图像高度(像素)
- `sourceId` - 源标识符,用于视频流(文件 MD5 哈希)
- `imageId` - 唯一的图像标识符
- `timestamp` - 帧时间戳(毫秒)
#### ROI(关注区域 / 电子围栏)
定义用于分析的多边形区域。
**完整工作流:** 见 [roi-workflow.md](./roi-workflow.md)
**数据结构**:
```json
{
"kind": "ROI",
"name": "entrance",
"points": [x1, y1, x2, y2, x3, y3, x4, y4]
}
```
**字段说明**:
- `kind` - 固定值"ROI"
- `name` - 区域描述名称(可选)
- `points` - 多边形顶点坐标数组,按顺序排列:`[x1,y1, x2,y2, x3,y3, ...]`
**示例**:
**矩形ROI - 收银区域**
```json
{
"kind": "ROI",
"name": "checkout-zone",
"points": [100, 100, 300, 100, 300, 250, 100, 250]
}
```
**多边形ROI - L形仓库区域**
```json
{
"kind": "ROI",
"name": "warehouse-l-zone",
"points": [100, 100, 400, 100, 400, 200, 300, 200, 300, 350, 100, 350]
}
```
**网格坐标输入**:
使用 `show-grid.mjs` 生成的网格坐标:
```
B1, E1, E3, B3 → 自动转换为像素坐标
```
#### Tripwire(绊线 / 穿越检测线)
定义用于穿越事件的检测线。
**完整工作流:** 见 [tripwire-workflow.md](./tripwire-workflow.md)
**核心概念 - 向量点乘法则**:
绊线穿越检测使用向量点乘:
1. **绊线向量** = p1 → p2 方向
2. **运动向量** = 前一帧到当前帧的对象移动
3. **点乘计算**:`dot = V_line.x * V_motion.x + V_line.y * V_motion.y`
4. **穿越判定**:
- `dot > 0` → 正向穿越(Forward,A→B)
- `dot < 0` → 反向穿越(Backward,B→A)
- TwoWay: 两个方向都检测
**数据结构**:
```json
{
"kind": "TripWire",
"name": "door-entrance",
"direction": "Forward",
"points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
}
```
**字段说明**:
- `kind` - 固定值"TripWire"
- `name` - 检测线描述名称(可选)
- `direction` - "Forward" | "Backward" | "TwoWay"
- `points` - 8元素数组:
- p1, p2: 主检测线(用户指定)
- p3, p4: A-B区域标记(标识穿越方向)
**自动生成A-B点**:
用户可以只提供 **2个点**(p1, p2 主检测线),系统会自动生成 **p3, p4**(A-B区域标记):
1. **输入**: 用户给出2个点 `[p1_x, p1_y, p2_x, p2_y]`
2. **自动计算**:
```
dx, dy = 主线方向向量 (p2 - p1)
perpX, perpY = 旋转90度得到垂直向量
distance = 30px (默认)
p3 = p1 + perp * distance (A区域标记)
p4 = p2 + perp * distance (B区域标记)
```
3. **结果**: 自动补全为完整的4点结构 `[p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]`
4. **生成预览图**: 使用 `visualize.mjs` 将Tripwire可视化
```bash
node scripts/visualize.mjs <image> '[]' preview.png \
--overlays '[{
"kind": "TripWire",
"name": "绊线预览",
"direction": "Forward",
"points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
}]'
```
输出:`preview.png` 显示绊线虚线、A/B标记、方向箭头
5. **用户确认**: 显示预览图让用户确认:
- A/B区域位置是否正确
- 方向箭头是否指向期望方向
- 绊线是否跨越检测区域
**示例**:
用户输入:`[150, 100, 150, 300]` (竖直线)
系统自动生成:`[150, 100, 150, 300, 120, 100, 120, 300]` (左偏30px)
生成预览:
```bash
node scripts/visualize.mjs door.jpg '[]' preview.png \
--overlays '[{"kind":"TripWire","name":"门","direction":"Forward","points":[150,100,150,300,120,100,120,300]}]'
```
用户看预览图确认A(左)→B(右)方向是否符合预期
**结构可视化**:
```
p1 --------> p2 (主检测线,p1→p2)
^ ^
| |
p3 p4 (A-B区域标记)
• Forward: 检测p3→p4方向的穿越(A→B)
• Backward: 检测p4→p3方向的穿越(B→A)
• TwoWay: 双向都检测
```
**方向可视化**:
- **Forward** - 绿色箭头在p4,表示A→B方向
- **Backward** - 红色箭头在p3,表示B→A方向
- **TwoWay** - 蓝色箭头在p3和p4,表示双向
**示例**:
**垂直检测线 - 人员进出统计**
```json
{
"kind": "TripWire",
"name": "door-entrance",
"direction": "Forward",
"points": [150, 100, 150, 300, 130, 100, 170, 300]
}
```
说明:
- p1(150,100) → p2(150,300): 竖直线
- p3(130,100) → p4(170,300): A-B标记(左右偏移30px)
- Forward: 只检测从左→右的穿越
**水平检测线 - 考勤打卡**
```json
{
"kind": "TripWire",
"name": "checkin-line",
"direction": "TwoWay",
"points": [100, 200, 400, 200, 100, 180, 400, 220]
}
```
说明:
- p1(100,200) → p2(400,200): 水平线
- p3(100,180) → p4(400,220): A-B标记(上下偏移20px)
- TwoWay: 双向都统计
### 数组类型
任何类型都可以包装在数组中:
```json
{
"type": "Array<Detection>",
"example": [
{ "bbox": [...], "confidence": 0.9, ... },
{ "bbox": [...], "confidence": 0.85, ... }
]
}
```
常见数组:
- `Array<Detection>` - 多个检测对象
- `Array<TrackDetection>` - 帧间跟踪的对象
- `Array<Attribute>` - 多个属性结果
- `Array<ROI>` - 多个区域
- `Array<Tripwire>` - 多条检测线
## 可视化和交互
### 检测可视化
使用 `visualize.mjs` 在图像上可视化检测结果:
```bash
node scripts/visualize.mjs photo.jpg '<detection-json>' output.jpg
```
**支持**:
- 带置信度分数的边框
- 类别标签
- 用于跟踪可视化的跟踪 ID
- 叠加层(ROI 区域、绊线线条)
- 文本注释
### ROI/绊线输入
用于区域和障碍的交互式输入:
1. 生成网格参考图像:
```bash
node scripts/show-grid.mjs photo.jpg
```
2. 使用网格参考指定坐标:
```
ROI: B1, E1, E3, B3
Tripwire: A2 → G2 (direction: Forward)
```
3. 网格坐标自动转换为像素坐标
详见 [grid-guide.md](./grid-guide.md)。
## 视频帧提取
用于视频帧提取和多帧分析:
```json
{
"imageData": "base64-frame-data",
"sourceId": "video_abc123", // 每个视频源唯一
"imageId": "frame_001", // 每帧唯一
"timestamp": 1705056000000 // 帧时间戳,单位毫秒
}
```
**sourceId 计算**:
```
sourceId = MD5(视频文件的前 64KB) → 16 字符十六进制字符串
```
**timestamp 计算**:
```
timestamp = 帧索引 * 1000 / 帧率
```
详见 [video-guide.md](./video-guide.md)。
## 类型转换
### 输入转换
当将文件作为 `Image` 类型传递时:
```bash
# 文件路径自动转换为 base64
echo '{"input0":{"image":"photo.jpg"}}' | node invoke.mjs ep-detect
```
### 输出解析
检测结果会自动解析:
```javascript
// 原始输出
{ "detections": "[{\"bbox\": [...]}]" }
// 解析后的输出
{ "detections": [{ "bbox": [...] }] } // JSON 数组,可以直接使用
```
## 最佳实践
1. **始终验证边框** - 确保坐标在图像边界内
2. **使用 track_id 以保持连续性** - 跟踪帧间的对象以获得更好的结果
3. **有效组合类型** - 使用 ROI 限制检测区域
4. **时间戳精度** - 为视频分析维持一致的时间戳计算
5. **sourceId 一致性** - 为来自同一源的所有帧保持 sourceId 常数
## 故障排除
**"无效的检测"**
- 检查边框格式:[x, y, width, height]
- 验证坐标为非负数
- 确保框在图像边界内
**"缺少关键点"**
- 并非所有技能都支持关键点
- 检查技能文档以了解支持的输出
**"ROI 未被识别"**
- 验证多边形是否闭合(第一个点 ≠ 最后一个点)
- 确保至少有 3 个点
- 检查点的顺序是否正确(顺时针或逆时针)
**"sourceId 不匹配"**
- 如果视频文件改变,重新计算 sourceId
- 对来自同一源的所有帧使用相同的 sourceId
FILE:scripts/visualize.mjs
#!/usr/bin/env node
import fs from 'fs';
import path from 'path';
import sharp from 'sharp';
/**
* Read all stdin as a string (for piped input).
*/
function readStdin() {
return new Promise((resolve, reject) => {
const chunks = [];
process.stdin.setEncoding('utf-8');
process.stdin.on('data', (chunk) => chunks.push(chunk));
process.stdin.on('end', () => resolve(chunks.join('')));
process.stdin.on('error', reject);
});
}
/**
* Escape XML special characters for safe SVG embedding.
*/
export function escapeXml(str) {
return String(str)
.replace(/&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/"/g, '"')
.replace(/'/g, ''');
}
/**
* Build SVG for detection bounding boxes and labels (green).
*/
export function buildDetectionSvg(detections, width, height) {
const rects = [];
const fontSize = 14;
const padding = 4;
const labelHeight = fontSize + padding * 2;
for (const det of detections) {
const [x, y, w, h] = det.bbox;
// Bounding box rectangle
rects.push(
`<rect x="x" y="y" width="w" height="h" fill="none" stroke="rgba(0,255,0,0.8)" stroke-width="2"/>`
);
// Label text
const parts = [];
if (det.track_id != null) parts.push(`#det.track_id`);
if (det.category && det.category.name) parts.push(det.category.name);
if (det.confidence != null) parts.push((det.confidence * 100).toFixed(0) + '%');
const label = parts.join(' ');
if (label) {
const labelWidth = label.length * (fontSize * 0.65) + padding * 2;
const labelY = Math.max(y - labelHeight, 0);
// Label background
rects.push(
`<rect x="x" y="labelY" width="labelWidth" height="labelHeight" fill="rgba(0,255,0,0.7)" rx="2"/>`
);
// Label text
rects.push(
`<text x="x + padding" y="labelY + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(label)</text>`
);
}
}
return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">rects.join('')</svg>`;
}
/**
* Helper function to draw an arrow at a specific point.
* @param {Array} elements - Array to push SVG elements to
* @param {Object} point - {x, y} coordinates
* @param {number} dirX - Unit direction vector X component
* @param {number} dirY - Unit direction vector Y component
* @param {number} arrowSize - Size of the arrow
* @param {string} color - Arrow color (rgba string)
*/
function drawArrowAtPoint(elements, point, dirX, dirY, arrowSize, color) {
const tipX = point.x + dirX * arrowSize;
const tipY = point.y + dirY * arrowSize;
// Perpendicular base vectors
const perpX = -dirY * arrowSize * 0.6;
const perpY = dirX * arrowSize * 0.6;
const b1x = point.x - dirX * arrowSize - perpX;
const b1y = point.y - dirY * arrowSize - perpY;
const b2x = point.x - dirX * arrowSize + perpX;
const b2y = point.y - dirY * arrowSize + perpY;
elements.push(
`<polygon points="tipX,tipY b1x,b1y b2x,b2y" fill="color"/>`
);
}
/**
* Build SVG for ROI polygons and Tripwire polylines (input overlays).
*
* Each overlay item:
* { kind: "ROI", points: [x1,y1,x2,y2,...], name?: string }
* { kind: "TripWire", points: [x1,y1,x2,y2,...], name?: string, direction?: string }
*
* Each crossing item (optional):
* { tripwireId?: string, prevPos: [x,y], currPos: [x,y], dotProduct: number }
*/
export function buildOverlaysSvg(overlays, width, height, crossings = []) {
const elements = [];
const fontSize = 13;
const padding = 4;
const labelHeight = fontSize + padding * 2;
for (const overlay of overlays) {
const pts = overlay.points || [];
// Convert flat [x1,y1,x2,y2,...] to "x1,y1 x2,y2 ..."
const pointPairs = [];
for (let i = 0; i < pts.length - 1; i += 2) {
pointPairs.push(`pts[i],pts[i + 1]`);
}
const pointsStr = pointPairs.join(' ');
if (overlay.kind === 'ROI') {
// ROI: blue polygon with semi-transparent fill
elements.push(
`<polygon points="pointsStr" fill="rgba(0,150,255,0.15)" stroke="rgba(0,150,255,0.6)" stroke-width="2"/>`
);
// Label at first vertex
const name = overlay.name || overlay.displayName || '';
if (name && pts.length >= 2) {
const lx = pts[0];
const ly = Math.max(pts[1] - labelHeight, 0);
const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
elements.push(
`<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(0,150,255,0.8)" rx="2"/>`
);
elements.push(
`<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
);
}
} else if (overlay.kind === 'TripWire') {
// TripWire with 4-point structure support
// points = [x1,y1, x2,y2, x3,y3, x4,y4]
// p1-p2: main tripwire
// p3-p4: A-B detection line with directional arrows
// If only 2-point format provided, auto-generate perpendicular p3-p4
let p1, p2, p3, p4;
if (pts.length >= 8) {
// 4-point structure (explicit format)
p1 = {x: pts[0], y: pts[1]};
p2 = {x: pts[2], y: pts[3]};
p3 = {x: pts[4], y: pts[5]};
p4 = {x: pts[6], y: pts[7]};
} else if (pts.length >= 4) {
// 2-point format: auto-generate perpendicular A-B points
p1 = {x: pts[0], y: pts[1]};
p2 = {x: pts[2], y: pts[3]};
// Calculate perpendicular distance for A-B points
const dx = p2.x - p1.x;
const dy = p2.y - p1.y;
const len = Math.sqrt(dx * dx + dy * dy) || 1;
// Unit perpendicular vector (rotated 90 degrees counter-clockwise)
const perpX = -dy / len;
const perpY = dx / len;
// Distance from the main line to A-B points
const perpDist = 30;
// Generate A-B points perpendicular to the main tripwire
p3 = {
x: p1.x + perpX * perpDist,
y: p1.y + perpY * perpDist
};
p4 = {
x: p2.x + perpX * perpDist,
y: p2.y + perpY * perpDist
};
} else {
// Not enough points, skip rendering
p1 = p2 = p3 = p4 = null;
}
if (p1 && p2 && p3 && p4) {
// Draw main tripwire line (p1-p2) - dashed, no arrow
elements.push(
`<line x1="p1.x" y1="p1.y" x2="p2.x" y2="p2.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
);
// Draw A-B detection line (p3-p4) - dashed with direction arrow
elements.push(
`<line x1="p3.x" y1="p3.y" x2="p4.x" y2="p4.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
);
// Calculate direction vector (p3 -> p4)
const dx = p4.x - p3.x;
const dy = p4.y - p3.y;
const len = Math.sqrt(dx * dx + dy * dy) || 1;
const unitX = dx / len;
const unitY = dy / len;
const dir = overlay.direction || 'TwoWay';
const arrowSize = 12;
// Draw directional arrows
if (dir === 'Forward') {
// Arrow at p4, pointing toward p4 (green)
drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(76,175,80,0.8)');
} else if (dir === 'Backward') {
// Arrow at p3, pointing toward p3 (red)
drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(244,67,54,0.8)');
} else if (dir === 'TwoWay') {
// Arrows at both p3 and p4 (blue)
drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(33,150,243,0.8)');
drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(33,150,243,0.8)');
}
// Add region labels 'A' and 'B'
const labelFontSize = 12;
const labelOffset = 8;
// Label 'A' near p3
elements.push(
`<text x="p3.x - labelOffset" y="p3.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">A</text>`
);
// Label 'B' near p4
elements.push(
`<text x="p4.x + labelOffset" y="p4.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">B</text>`
);
}
// Label at first vertex (applies to both formats)
const name = overlay.name || overlay.displayName || '';
if (name && pts.length >= 2) {
const lx = pts[0];
const ly = Math.max(pts[1] - labelHeight, 0);
const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
elements.push(
`<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(255,165,0,0.8)" rx="2"/>`
);
elements.push(
`<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
);
}
}
}
// Draw crossing vectors perpendicular to tripwire
for (const crossing of crossings) {
const { tripwireId, prevPos, currPos, dotProduct, direction } = crossing;
if (!prevPos || !currPos) continue;
const [prevX, prevY] = prevPos;
const [currX, currY] = currPos;
// Crossing point is midpoint between previous and current position
const crossX = (prevX + currX) / 2;
const crossY = (prevY + currY) / 2;
// Determine color based on crossing type
let arrowColor, dotColor;
if (direction === 'Forward') {
// Forward crossing (dot > 0): green
arrowColor = 'rgba(76,175,80,0.8)';
dotColor = '#4caf50';
} else if (direction === 'Backward') {
// Backward crossing (dot < 0): red
arrowColor = 'rgba(244,67,54,0.8)';
dotColor = '#f44336';
} else {
// TwoWay crossing: blue
arrowColor = 'rgba(33,150,243,0.8)';
dotColor = '#2196f3';
}
// Find the associated tripwire to get its direction
let tripwirePerpX = 1; // default perpendicular
let tripwirePerpY = 0;
for (const overlay of overlays) {
if (overlay.kind === 'TripWire' && overlay.name === tripwireId) {
// Get tripwire direction vector
const pts = Array.isArray(overlay.points[0]) ? overlay.points.flat() : overlay.points;
if (pts.length >= 4) {
const midIdx = Math.floor(pts.length / 4);
const x1 = pts[midIdx * 2];
const y1 = pts[midIdx * 2 + 1];
const x2 = pts[(midIdx + 1) * 2];
const y2 = pts[(midIdx + 1) * 2 + 1];
const dx = x2 - x1;
const dy = y2 - y1;
const len = Math.sqrt(dx * dx + dy * dy) || 1;
// Perpendicular vector (rotated 90 degrees)
tripwirePerpX = -dy / len;
tripwirePerpY = dx / len;
}
break;
}
}
const arrowSize = 12;
if (direction === 'TwoWay') {
// TwoWay: draw arrows in both directions perpendicular to tripwire
// Forward arrow
const fx1 = crossX + tripwirePerpX * arrowSize;
const fy1 = crossY + tripwirePerpY * arrowSize;
elements.push(
`<line x1="crossX" y1="crossY" x2="fx1" y2="fy1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
);
const perp2X = tripwirePerpY * arrowSize * 0.6;
const perp2Y = -tripwirePerpX * arrowSize * 0.6;
elements.push(
`<polygon points="fx1,fy1 fx1 - tripwirePerpX * arrowSize - perp2X,fy1 - tripwirePerpY * arrowSize - perp2Y fx1 - tripwirePerpX * arrowSize + perp2X,fy1 - tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
);
// Backward arrow
const bx1 = crossX - tripwirePerpX * arrowSize;
const by1 = crossY - tripwirePerpY * arrowSize;
elements.push(
`<line x1="crossX" y1="crossY" x2="bx1" y2="by1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
);
elements.push(
`<polygon points="bx1,by1 bx1 + tripwirePerpX * arrowSize - perp2X,by1 + tripwirePerpY * arrowSize - perp2Y bx1 + tripwirePerpX * arrowSize + perp2X,by1 + tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
);
// Center circle
elements.push(
`<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
);
// Label
elements.push(
`<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">⟷ TwoWay</text>`
);
} else {
// Forward or Backward: single arrow perpendicular to tripwire
const sign = direction === 'Forward' ? 1 : -1;
const arrowX = crossX + tripwirePerpX * arrowSize * sign;
const arrowY = crossY + tripwirePerpY * arrowSize * sign;
// Arrow shaft
elements.push(
`<line x1="crossX" y1="crossY" x2="arrowX" y2="arrowY" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
);
// Arrow head
const perpBaseX = tripwirePerpY * arrowSize * 0.6;
const perpBaseY = -tripwirePerpX * arrowSize * 0.6;
elements.push(
`<polygon points="arrowX,arrowY arrowX - tripwirePerpX * arrowSize * sign - perpBaseX,arrowY - tripwirePerpY * arrowSize * sign - perpBaseY arrowX - tripwirePerpX * arrowSize * sign + perpBaseX,arrowY - tripwirePerpY * arrowSize * sign + perpBaseY" fill="arrowColor"/>`
);
// Center circle
elements.push(
`<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
);
// Label
const dirLabel = direction === 'Forward' ? '✓ Forward' : '✗ Backward';
elements.push(
`<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">dirLabel</text>`
);
}
}
return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">elements.join('')</svg>`;
}
/**
* Build SVG for text lines displayed in a footer area.
* Returns { svg: string, height: number }.
*
* The SVG is positioned to sit at the bottom of the extended canvas,
* starting at y=imageHeight.
*/
export function buildTextFooterSvg(textLines, width, imageHeight) {
const fontSize = 16;
const lineHeight = fontSize + 8;
const paddingTop = 10;
const paddingBottom = 10;
const paddingLeft = 12;
const totalHeight = paddingTop + textLines.length * lineHeight + paddingBottom;
const elements = [];
// Dark background
elements.push(
`<rect x="0" y="imageHeight" width="width" height="totalHeight" fill="rgba(0,0,0,0.75)"/>`
);
// Text lines
for (let i = 0; i < textLines.length; i++) {
const ty = imageHeight + paddingTop + (i + 1) * lineHeight - 4;
elements.push(
`<text x="paddingLeft" y="ty" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(textLines[i])</text>`
);
}
const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="imageHeight + totalHeight">elements.join('')</svg>`;
return { svg, height: totalHeight };
}
/**
* Parse named CLI flags from argv.
* Returns { positional: string[], flags: Record<string, string> }.
*/
function parseArgs(argv) {
const positional = [];
const flags = {};
let i = 0;
while (i < argv.length) {
if (argv[i].startsWith('--') && i + 1 < argv.length) {
const key = argv[i].slice(2);
flags[key] = argv[i + 1];
i += 2;
} else {
positional.push(argv[i]);
i += 1;
}
}
return { positional, flags };
}
async function main() {
const { positional, flags } = parseArgs(process.argv.slice(2));
const inputImage = positional[0];
let detectionsArg = positional[1];
let outputPath = positional[2];
if (!inputImage || !detectionsArg) {
console.error('Usage: node visualize.mjs <input-image> <detections-json | -> [output-path] [--overlays \'<json>\'] [--text \'<json>\'] [--crossings \'<json>\']');
console.error('');
console.error(' <detections-json> JSON array of detections (parsedValue format), or "-" to read from stdin');
console.error(' [output-path] Optional output path. Default: <input>_detection.<ext>');
console.error(' --overlays JSON array of ROI/Tripwire overlay objects');
console.error(' --text JSON array of text lines to display as footer');
console.error(' --crossings JSON array of tripwire crossing events with motion vectors');
process.exit(1);
}
// Resolve input image
const resolvedInput = path.resolve(inputImage);
if (!fs.existsSync(resolvedInput)) {
console.error(`Input image not found: resolvedInput`);
process.exit(1);
}
// Read detections JSON
let detectionsJson;
if (detectionsArg === '-') {
detectionsJson = await readStdin();
} else {
detectionsJson = detectionsArg;
}
let detections;
try {
detections = JSON.parse(detectionsJson);
} catch (err) {
console.error(`Failed to parse detections JSON: err.message`);
process.exit(1);
}
if (!Array.isArray(detections)) {
console.error('Detections must be a JSON array');
process.exit(1);
}
// Parse --overlays
let overlays = [];
if (flags.overlays) {
try {
overlays = JSON.parse(flags.overlays);
if (!Array.isArray(overlays)) {
console.error('--overlays must be a JSON array');
process.exit(1);
}
} catch (err) {
console.error(`Failed to parse --overlays JSON: err.message`);
process.exit(1);
}
}
// Parse --text
let textLines = [];
if (flags.text) {
try {
textLines = JSON.parse(flags.text);
if (!Array.isArray(textLines)) {
console.error('--text must be a JSON array');
process.exit(1);
}
} catch (err) {
console.error(`Failed to parse --text JSON: err.message`);
process.exit(1);
}
}
// Parse --crossings
let crossings = [];
if (flags.crossings) {
try {
crossings = JSON.parse(flags.crossings);
if (!Array.isArray(crossings)) {
console.error('--crossings must be a JSON array');
process.exit(1);
}
} catch (err) {
console.error(`Failed to parse --crossings JSON: err.message`);
process.exit(1);
}
}
// Validate: must have something to draw
if (detections.length === 0 && overlays.length === 0 && textLines.length === 0) {
console.error('Nothing to draw: detections, overlays, and text are all empty');
process.exit(1);
}
// Default output path
if (!outputPath) {
const ext = path.extname(inputImage);
const base = path.basename(inputImage, ext);
const dir = path.dirname(resolvedInput);
outputPath = path.join(dir, `base_detectionext`);
} else {
outputPath = path.resolve(outputPath);
}
// Read image metadata
const image = sharp(resolvedInput);
const metadata = await image.metadata();
const { width, height } = metadata;
// Build composite layers
const composites = [];
// Layer 1: input overlays (ROI/Tripwire) — drawn first (bottom layer)
if (overlays.length > 0) {
const overlaySvg = buildOverlaysSvg(overlays, width, height, crossings);
composites.push({ input: Buffer.from(overlaySvg), top: 0, left: 0 });
}
// Layer 2: detection bboxes — drawn on top of overlays
if (detections.length > 0) {
const detSvg = buildDetectionSvg(detections, width, height);
composites.push({ input: Buffer.from(detSvg), top: 0, left: 0 });
}
// Layer 3: text footer — extend canvas and draw at bottom
let pipeline = image;
if (textLines.length > 0) {
const { svg: textSvg, height: textHeight } = buildTextFooterSvg(textLines, width, height);
pipeline = pipeline.extend({
bottom: textHeight,
background: { r: 0, g: 0, b: 0, alpha: 1 },
});
composites.push({ input: Buffer.from(textSvg), top: 0, left: 0 });
}
// Composite and save
await pipeline
.composite(composites)
.toFile(outputPath);
console.log(outputPath);
}
import { fileURLToPath } from 'url';
if (process.argv[1] === fileURLToPath(import.meta.url)) {
main();
}
FILE:scripts/show-grid.mjs
#!/usr/bin/env node
import fs from 'fs';
import path from 'path';
import sharp from 'sharp';
import { fileURLToPath } from 'url';
/**
* Escape XML special characters for safe SVG embedding.
*/
function escapeXml(str) {
return String(str)
.replace(/&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/"/g, '"')
.replace(/'/g, ''');
}
/**
* Compute adaptive grid dimensions so that cells are approximately square.
*
* @param {number} imageWidth
* @param {number} imageHeight
* @returns {{ cols: number, rows: number }}
*/
export function computeGridSize(imageWidth, imageHeight) {
// Target ~30-42 intersection points with roughly square cells.
// We pick the short side = 4 segments and scale the long side proportionally.
const minSegments = 4;
const ratio = imageWidth / imageHeight;
let cols, rows;
if (ratio >= 1) {
// Landscape or square
rows = minSegments;
cols = Math.round(rows * ratio);
// Clamp cols to a reasonable range
cols = Math.max(minSegments, Math.min(cols, 12));
} else {
// Portrait
cols = minSegments;
rows = Math.round(cols / ratio);
rows = Math.max(minSegments, Math.min(rows, 12));
}
return { cols, rows };
}
/**
* Generate column labels: A, B, C, ..., Z, AA, AB, ...
*/
export function generateColLabels(count) {
const labels = [];
for (let i = 0; i < count; i++) {
let label = '';
let n = i;
do {
label = String.fromCharCode(65 + (n % 26)) + label;
n = Math.floor(n / 26) - 1;
} while (n >= 0);
labels.push(label);
}
return labels;
}
/**
* Generate row labels: 0, 1, 2, ...
*/
export function generateRowLabels(count) {
return Array.from({ length: count }, (_, i) => String(i));
}
/**
* Build an SVG overlay with grid lines, intersection dots, and labels.
*
* The SVG covers the full extended canvas (margin + image).
*
* @param {object} params
* @param {number} params.imageWidth
* @param {number} params.imageHeight
* @param {number} params.cols - number of column segments (cols+1 vertical lines)
* @param {number} params.rows - number of row segments (rows+1 horizontal lines)
* @param {number} params.marginTop
* @param {number} params.marginLeft
* @param {number} params.marginBottom
* @param {number} params.marginRight
* @returns {string} SVG string
*/
export function buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight }) {
const totalWidth = marginLeft + imageWidth + marginRight;
const totalHeight = marginTop + imageHeight + marginBottom;
const gridWidth = imageWidth / cols;
const gridHeight = imageHeight / rows;
const colLabels = generateColLabels(cols + 1);
const rowLabels = generateRowLabels(rows + 1);
const elements = [];
const fontSize = 16;
const dotR = 5;
// Grid lines — vertical
for (let c = 0; c <= cols; c++) {
const x = marginLeft + c * gridWidth;
elements.push(
`<line x1="x" y1="marginTop" x2="x" y2="totalHeight" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
);
}
// Grid lines — horizontal
for (let r = 0; r <= rows; r++) {
const y = marginTop + r * gridHeight;
elements.push(
`<line x1="marginLeft" y1="y" x2="totalWidth" y2="y" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
);
}
// Intersection dots
for (let c = 0; c <= cols; c++) {
for (let r = 0; r <= rows; r++) {
const cx = marginLeft + c * gridWidth;
const cy = marginTop + r * gridHeight;
elements.push(
`<circle cx="cx" cy="cy" r="dotR" fill="white" stroke="rgba(0,0,0,0.7)" stroke-width="1.5"/>`
);
}
}
// Column labels — along the top margin
for (let c = 0; c <= cols; c++) {
const x = marginLeft + c * gridWidth;
const textWidth = colLabels[c].length * fontSize * 0.65;
const bgWidth = textWidth + 10;
const bgX = x - bgWidth / 2;
const bgY = 2;
const bgH = fontSize + 10;
elements.push(
`<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
);
elements.push(
`<text x="x" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(colLabels[c])</text>`
);
}
// Row labels — along the left margin
for (let r = 0; r <= rows; r++) {
const y = marginTop + r * gridHeight;
const label = rowLabels[r];
const textWidth = label.length * fontSize * 0.65;
const bgWidth = textWidth + 10;
const bgH = fontSize + 10;
const bgX = 2;
const bgY = y - bgH / 2;
elements.push(
`<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
);
elements.push(
`<text x="bgX + bgWidth / 2" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(label)</text>`
);
}
return `<svg xmlns="http://www.w3.org/2000/svg" width="totalWidth" height="totalHeight">elements.join('')</svg>`;
}
/**
* Parse named CLI flags from argv.
*/
function parseArgs(argv) {
const positional = [];
const flags = {};
let i = 0;
while (i < argv.length) {
if (argv[i].startsWith('--') && i + 1 < argv.length) {
const key = argv[i].slice(2);
flags[key] = argv[i + 1];
i += 2;
} else {
positional.push(argv[i]);
i += 1;
}
}
return { positional, flags };
}
async function main() {
const { positional, flags } = parseArgs(process.argv.slice(2));
const inputImage = positional[0];
let outputPath = positional[1];
if (!inputImage) {
console.error('Usage: node show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]');
console.error('');
console.error(' Generates a grid reference image for specifying ROI/Tripwire coordinates.');
console.error(' --cols Number of column segments (default: auto based on aspect ratio)');
console.error(' --rows Number of row segments (default: auto based on aspect ratio)');
process.exit(1);
}
const resolvedInput = path.resolve(inputImage);
if (!fs.existsSync(resolvedInput)) {
console.error(`Input image not found: resolvedInput`);
process.exit(1);
}
// Read image metadata
const image = sharp(resolvedInput);
const metadata = await image.metadata();
const { width: imageWidth, height: imageHeight } = metadata;
// Compute grid size
let { cols, rows } = computeGridSize(imageWidth, imageHeight);
if (flags.cols) cols = parseInt(flags.cols, 10);
if (flags.rows) rows = parseInt(flags.rows, 10);
if (cols < 1 || rows < 1 || isNaN(cols) || isNaN(rows)) {
console.error('--cols and --rows must be positive integers');
process.exit(1);
}
const marginTop = 32;
const marginLeft = 32;
const marginBottom = 18;
const marginRight = 18;
// Build SVG
const svg = buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight });
// Default output path
if (!outputPath) {
const ext = path.extname(inputImage);
const base = path.basename(inputImage, ext);
const dir = path.dirname(resolvedInput);
outputPath = path.join(dir, `base_gridext`);
} else {
outputPath = path.resolve(outputPath);
}
// Extend canvas for margins, composite grid SVG
await image
.extend({
top: marginTop,
left: marginLeft,
bottom: marginBottom,
right: marginRight,
background: { r: 30, g: 30, b: 30, alpha: 1 },
})
.composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
.toFile(outputPath);
// Output metadata JSON for Agent consumption
const gridWidth = imageWidth / cols;
const gridHeight = imageHeight / rows;
const colLabels = generateColLabels(cols + 1);
const rowLabels = generateRowLabels(rows + 1);
const result = {
path: outputPath,
cols,
rows,
colLabels,
rowLabels,
gridWidth: Math.round(gridWidth * 100) / 100,
gridHeight: Math.round(gridHeight * 100) / 100,
marginLeft,
marginTop,
imageWidth,
imageHeight,
};
console.log(JSON.stringify(result));
}
if (process.argv[1] === fileURLToPath(import.meta.url)) {
main();
}
FILE:scripts/intent-invoke.mjs
#!/usr/bin/env node
import { querySkills } from './query.mjs';
import { callMultimodal } from './multimodal.mjs';
import { runSkill } from './invoke.mjs';
import { getWorkspaceSkills } from './workspace.mjs';
const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;
const MAX_SKILL_RETRIES = 3;
/**
* Determine if we should use the skill based on confidence threshold.
*/
export function shouldUseSkill(skill, threshold = DEFAULT_CONFIDENCE_THRESHOLD) {
return !!(skill && skill.epId && typeof skill.confidence === 'number' && skill.confidence >= threshold);
}
/**
* Format the final result consistently.
*/
export function formatResult(mode, skill, detections = []) {
return {
success: true,
mode,
epId: skill?.epId || null,
skillName: skill?.name || null,
confidence: skill?.confidence || null,
count: detections.length,
detections
};
}
/**
* Extract detections array from a runSkill() result.
*/
function extractDetections(result) {
if (result?.未检出) return [];
const outputs = result?.result?.outputs;
if (!outputs || outputs.length === 0) return [];
const output = outputs[0];
if (output?.parsedValue && Array.isArray(output.parsedValue)) {
return output.parsedValue;
}
return [];
}
/**
* Main orchestration function:
* 1. Query router for skills matching intent
* 2. Try top skills one by one (up to MAX_SKILL_RETRIES)
* 3. If all skills fail, fallback to multimodal
*/
export async function intentInvoke(intent, imagePath, options = {}) {
const { threshold = DEFAULT_CONFIDENCE_THRESHOLD, topK = 5 } = options;
// Step 1: Query for matching skills
let skills;
try {
skills = await querySkills(intent, { topK });
} catch (err) {
// Query failed, will fallback to multimodal
skills = [];
}
// Step 2: Try top skills one by one
const candidates = skills.filter(s => shouldUseSkill(s, threshold)).slice(0, MAX_SKILL_RETRIES);
for (const skill of candidates) {
try {
const { result } = await runSkill(skill.epId, { input0: { image: imagePath } }, { autoROI: true });
const detections = extractDetections(result);
return formatResult('skill', skill, detections);
} catch (err) {
console.error(`Skill skill.epId (skill.name) failed: err.message`);
// Continue to next skill
}
}
// Step 3: Public skills all failed — search private workspace
try {
const privateSkills = await getWorkspaceSkills();
if (privateSkills.length > 0) {
return {
success: true,
mode: 'workspace-search',
epId: null,
skillName: null,
confidence: null,
count: 0,
detections: [],
privateSkills: privateSkills.map(s => ({
epId: s.epId,
displayName: s.displayName,
description: s.description,
})),
hint: '公共技能无匹配。请从 privateSkills 列表中选择最匹配的技能,使用 invoke.mjs 调用。如无合适技能,可调用 multimodal.mjs 回退。',
};
}
} catch (err) {
console.error(`Private workspace search failed: err.message`);
// Continue to multimodal fallback
}
// Step 4: Fallback to multimodal
try {
const detections = await callMultimodal(intent, imagePath);
return formatResult('multimodal', null, detections);
} catch (err) {
return {
success: false,
error: err.message,
mode: 'multimodal',
epId: null,
detections: []
};
}
}
/**
* CLI entry point.
*/
async function main() {
const intent = process.argv[2];
const imagePath = process.argv[3];
const threshold = parseFloat(process.argv[4]) || DEFAULT_CONFIDENCE_THRESHOLD;
if (!intent) {
console.error('Usage: node intent-invoke.mjs <intent> [image-path] [threshold]');
console.error('Example: node intent-invoke.mjs "detect people falling" photo.jpg 0.8');
console.error('');
console.error('Parameters:');
console.error(' intent - Description of what to detect (required)');
console.error(' image-path - Path to image file (optional)');
console.error(' threshold - Confidence threshold 0.0-1.0 (default: 0.7)');
process.exit(1);
}
try {
const result = await intentInvoke(intent, imagePath, { threshold });
console.log(JSON.stringify(result, null, 2));
process.exit(result.success ? 0 : 1);
} catch (err) {
console.error(JSON.stringify({
success: false,
error: err.message
}, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/workspace.mjs
#!/usr/bin/env node
import path from 'path';
import { fileURLToPath } from 'url';
import { httpsRequest, getApiKey, workspacesGetUrl, workspaceSkillsGetUrl } from './utils.mjs';
import { readCache, writeCache, getCacheDir } from './cache.mjs';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const CACHE_DIR = getCacheDir();
const WORKSPACE_CACHE_FILE = path.join(CACHE_DIR, 'workspace-cache.json');
const SKILLS_CACHE_FILE = path.join(CACHE_DIR, 'skills-cache.json');
const SKILLS_CACHE_TTL = 60 * 60 * 1000; // 1 hour in ms
function buildEpId(workspaceId, localName) {
// localName format: c-sk-XXXXX → extract XXXXX part
const suffix = localName.replace(/^c-sk-/, '');
return `ep-workspaceId-suffix`;
}
// ── API Functions ────────────────────────────────────────────────────
/**
* Get the default workspace ID. Uses long-lived cache (tied to API key).
*/
export async function getDefaultWorkspaceId() {
const cached = readCache(WORKSPACE_CACHE_FILE);
if (cached?.workspaceId) return cached.workspaceId;
const apiKey = getApiKey();
const url = workspacesGetUrl();
const { statusCode, body } = await httpsRequest(url, {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
},
});
if (statusCode !== 200) {
throw new Error(`workspaces/get failed: HTTP statusCode - body`);
}
const resp = JSON.parse(body);
if (!resp.success || !resp.page?.result) {
throw new Error(`workspaces/get unexpected response: body`);
}
const defaultWorkspace = resp.page.result.find(ws => ws.type === 'default');
if (!defaultWorkspace) {
throw new Error('No default workspace found');
}
const workspaceId = defaultWorkspace.id;
writeCache(WORKSPACE_CACHE_FILE, { workspaceId, timestamp: Date.now() });
return workspaceId;
}
/**
* Get all released SkillApi skills from the default workspace.
* Uses file cache with 1-hour TTL. Each skill includes pre-constructed epId.
*/
export async function getWorkspaceSkills() {
const cached = readCache(SKILLS_CACHE_FILE, SKILLS_CACHE_TTL);
if (cached?.skills) return cached.skills;
const workspaceId = await getDefaultWorkspaceId();
const apiKey = getApiKey();
const url = workspaceSkillsGetUrl(workspaceId);
const allSkills = [];
let pageNo = 1;
const pageSize = 50;
// Paginate to fetch all skills
while (true) {
const { statusCode, body } = await httpsRequest(url, {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
workspaceID: workspaceId,
orderBy: 'last_published_at',
order: 'desc',
pageNo,
pageSize,
publicationKinds: ['SkillApi'],
status: 'Released',
}),
});
if (statusCode !== 200) {
throw new Error(`skills/get failed: HTTP statusCode - body`);
}
const resp = JSON.parse(body);
if (!resp.success || !resp.page?.result) {
throw new Error(`skills/get unexpected response: body`);
}
const skills = resp.page.result.map(skill => ({
epId: buildEpId(workspaceId, skill.localName),
displayName: skill.displayName,
description: skill.description || '',
localName: skill.localName,
workspaceId,
}));
allSkills.push(...skills);
const totalCount = resp.page.totalCount || 0;
if (allSkills.length >= totalCount || skills.length < pageSize) break;
pageNo++;
}
writeCache(SKILLS_CACHE_FILE, { workspaceId, skills: allSkills, timestamp: Date.now() });
return allSkills;
}
// ── CLI ──────────────────────────────────────────────────────────────
async function main() {
const command = process.argv[2];
if (command === 'list-skills') {
try {
const skills = await getWorkspaceSkills();
if (skills.length === 0) {
console.log('No private workspace skills found.');
return;
}
console.log(`Private workspace skills (skills.length total):\n`);
for (const skill of skills) {
const desc = skill.description ? ` - skill.description` : '';
console.log(` skill.epId skill.displayNamedesc`);
}
} catch (err) {
console.error(`Error: err.message`);
process.exit(1);
}
} else {
console.log('Usage: node workspace.mjs list-skills');
console.log('');
console.log('Commands:');
console.log(' list-skills List all released private workspace skills');
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/query.mjs
#!/usr/bin/env node
import { httpsRequest, getApiKey, routerQueryUrl } from './utils.mjs';
/**
* Parse and validate the router query response.
* Returns array of skills sorted by confidence (highest first).
*/
export function parseQueryResponse(response) {
// Handle internal API response format
if (response?.data?.candidates && Array.isArray(response.data.candidates)) {
return response.data.candidates
.filter(s => s.skillId && typeof s.score === 'number')
.map(s => ({
epId: s.skillId,
name: s.displayName,
description: s.description,
confidence: s.score
}))
.sort((a, b) => b.confidence - a.confidence);
}
// Fallback to empty array
return [];
}
/**
* Query skills by intent text.
* @param {string} intent - User's intent text
* @param {object} options - Optional parameters
* @param {number} options.topK - Maximum number of results (default: 5)
* @returns {Promise<Array>} Array of matching skills with confidence scores
*/
export async function querySkills(intent, options = {}) {
const apiKey = getApiKey();
const { topK = 5 } = options;
const requestBody = JSON.stringify({
query: intent,
topK
});
const response = await httpsRequest(routerQueryUrl(), {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
'Accept': 'application/json',
},
body: requestBody,
});
if (response.statusCode !== 200) {
throw new Error(`Query failed with status response.statusCode: response.body`);
}
let result;
try {
result = JSON.parse(response.body);
} catch (err) {
throw new Error(`Failed to parse query response: err.message`);
}
return parseQueryResponse(result);
}
/**
* CLI entry point for testing query functionality.
*/
async function main() {
const intent = process.argv[2];
const topK = parseInt(process.argv[3], 10) || 5;
if (!intent) {
console.error('Usage: node query.mjs <intent> [topK]');
console.error('Example: node query.mjs "detect person falling" 3');
process.exit(1);
}
try {
const skills = await querySkills(intent, { topK });
console.log(JSON.stringify({
success: true,
intent,
count: skills.length,
skills
}, null, 2));
} catch (err) {
console.error(JSON.stringify({
success: false,
error: err.message
}, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/types.mjs
/**
* types.mjs — Yijian type constants, classification, serialization & deserialization.
*
* Centralizes all type-related logic previously scattered across invoke.mjs
* and intent-invoke.mjs.
*/
import fs from 'fs';
import path from 'path';
import crypto from 'crypto';
// ---------------------------------------------------------------------------
// 1. Type constants & classification
// ---------------------------------------------------------------------------
/** Basic scalar types — value is a plain string. */
export const BASIC_TYPES = ['String', 'TemplateString', 'Integer', 'Double', 'Boolean', 'Time'];
/** Complex object types — value needs JSON.stringify. */
export const COMPLEX_TYPES = ['Image', 'Detection', 'TrackDetection', 'Attribute', 'ROI', 'Tripwire'];
/** Inner types that can be wrapped in Array<T>. */
export const ARRAY_INNER_TYPES = [...BASIC_TYPES, ...COMPLEX_TYPES, 'Target'];
const _basicSet = new Set(BASIC_TYPES);
const _complexSet = new Set(COMPLEX_TYPES);
const _arrayInnerSet = new Set(ARRAY_INNER_TYPES);
/** Types whose output can be visualized (draw bbox / polygon / polyline). */
const _visualizableTypes = new Set(['Detection', 'TrackDetection', 'ROI', 'Tripwire']);
/** Types that represent input-side visual overlays (ROI, Tripwire). */
const _inputVisualizableTypes = new Set(['ROI', 'Tripwire']);
export function isBasicType(type) {
return _basicSet.has(type);
}
export function isComplexType(type) {
return _complexSet.has(type);
}
/**
* Returns true if `type` is `Array<...>`.
*/
export function isArrayType(type) {
return typeof type === 'string' && type.startsWith('Array<') && type.endsWith('>');
}
/**
* Unwrap `Array<T>` → `T`. Returns `null` for non-array types.
*/
export function unwrapArrayType(type) {
if (!isArrayType(type)) return null;
return type.slice(6, -1); // strip "Array<" and ">"
}
/**
* Returns true if the type is a visual/complex type (Image, Detection, etc.),
* including Array-wrapped forms.
*/
export function isVisualType(type) {
if (_complexSet.has(type)) return true;
const inner = unwrapArrayType(type);
return inner !== null && _complexSet.has(inner);
}
/**
* Returns true if the type supports output visualization
* (Detection, TrackDetection, ROI, Tripwire, and their Array<> forms).
*/
export function hasVisualization(type) {
if (_visualizableTypes.has(type)) return true;
const inner = unwrapArrayType(type);
return inner !== null && _visualizableTypes.has(inner);
}
/**
* Returns true if the type is an input-side visualizable overlay
* (ROI, Tripwire, and their Array<> forms).
*/
export function isInputVisualizableType(type) {
if (_inputVisualizableTypes.has(type)) return true;
const inner = unwrapArrayType(type);
return inner !== null && _inputVisualizableTypes.has(inner);
}
// ---------------------------------------------------------------------------
// 2. Field schema definitions
// ---------------------------------------------------------------------------
export const IMAGE_FIELDS = {
imageId: { type: 'String', required: false },
imageData: { type: 'String', required: false },
sourceId: { type: 'String', required: false },
imageWidth: { type: 'Integer', required: false },
imageHeight: { type: 'Integer', required: false },
timestamp: { type: 'Integer', required: false },
};
export const DETECTION_FIELDS = {
image_id: { type: 'String', required: false },
source_id: { type: 'String', required: false },
image_url: { type: 'String', required: false },
description: { type: 'String', required: false },
answers: { type: 'Array<String>', required: false },
image_base64: { type: 'Array<String>', required: false },
predictions: { type: 'Array<Object>', required: false },
roi_ids: { type: 'Array<String>', required: false },
answer: { type: 'String', required: false },
};
/** TrackDetection shares the same schema as Detection. */
export const TRACK_DETECTION_FIELDS = DETECTION_FIELDS;
export const ATTRIBUTE_FIELDS = {
image_id: { type: 'String', required: false },
image_url: { type: 'String', required: false },
description: { type: 'String', required: false },
answer: { type: 'Array<String>', required: false },
image_base64: { type: 'Array<String>', required: false },
predictions: { type: 'Array<Object>', required: false },
roiIds: { type: 'Array<String>', required: false },
};
export const ROI_FIELDS = {
id: { type: 'String', required: false },
name: { type: 'String', required: false },
displayName: { type: 'String', required: false },
points: { type: 'Array<Number>', required: true },
iou: { type: 'Float', required: false, default: 0.0 },
target_ratio: { type: 'Float', required: false, default: 0.5 },
kind: { type: 'String', required: true, default: 'ROI' },
interested: { type: 'Boolean', required: false, default: true },
direction: { type: 'String', required: false },
visuals: { type: 'Object', required: false },
};
export const TRIPWIRE_FIELDS = {
id: { type: 'String', required: false },
name: { type: 'String', required: false },
displayName: { type: 'String', required: false },
points: { type: 'Array<Number>', required: true },
iou: { type: 'Float', required: false, default: 0.0 },
target_ratio: { type: 'Float', required: false, default: 0.5 },
kind: { type: 'String', required: true, default: 'TripWire' },
interested: { type: 'Boolean', required: false, default: true },
direction: { type: 'String', required: true, default: 'TwoWay' },
visuals: { type: 'Object', required: false },
};
/** Lookup field schema by type name. */
export const FIELD_SCHEMAS = {
Image: IMAGE_FIELDS,
Detection: DETECTION_FIELDS,
TrackDetection: TRACK_DETECTION_FIELDS,
Attribute: ATTRIBUTE_FIELDS,
ROI: ROI_FIELDS,
Tripwire: TRIPWIRE_FIELDS,
};
// ---------------------------------------------------------------------------
// 3. Serialization — buildValue(type, userValue)
// ---------------------------------------------------------------------------
/**
* Simple heuristic to check if a string looks like base64-encoded data.
*/
function isBase64(str) {
if (typeof str !== 'string' || str.length < 100) return false;
return /^[A-Za-z0-9+/\n]+=*$/.test(str.trim());
}
/**
* Parse image dimensions from buffer (supports JPEG, PNG and WEBP).
*/
export function getImageDimensions(buffer) {
// PNG: bytes 16-23 contain width and height in the IHDR chunk
if (buffer[0] === 0x89 && buffer[1] === 0x50 && buffer[2] === 0x4E && buffer[3] === 0x47) {
return {
width: buffer.readUInt32BE(16),
height: buffer.readUInt32BE(20),
};
}
// JPEG: search for SOF markers
if (buffer[0] === 0xFF && buffer[1] === 0xD8) {
let offset = 2;
while (offset < buffer.length - 1) {
if (buffer[offset] !== 0xFF) { offset++; continue; }
const marker = buffer[offset + 1];
if (marker === 0xD9) break; // EOI
if (marker >= 0xC0 && marker <= 0xCF && marker !== 0xC4 && marker !== 0xC8) {
return {
height: buffer.readUInt16BE(offset + 5),
width: buffer.readUInt16BE(offset + 7),
};
}
const segLen = buffer.readUInt16BE(offset + 2);
offset += 2 + segLen;
}
}
// WEBP: RIFF????WEBP VP8 /VP8L/VP8X chunks
if (buffer.length >= 12 &&
buffer[0] === 0x52 && buffer[1] === 0x49 && buffer[2] === 0x46 && buffer[3] === 0x46 &&
buffer[8] === 0x57 && buffer[9] === 0x45 && buffer[10] === 0x42 && buffer[11] === 0x50) {
const chunkId = buffer.slice(12, 16).toString('ascii');
if (chunkId === 'VP8 ' && buffer.length >= 30) {
// Lossy: width/height at bytes 26-29 (14-bit each, little-endian, mask 0x3FFF)
return {
width: (buffer.readUInt16LE(26) & 0x3FFF),
height: (buffer.readUInt16LE(28) & 0x3FFF),
};
}
if (chunkId === 'VP8L' && buffer.length >= 25) {
// Lossless: 28-bit packed fields starting at byte 21
const bits = buffer.readUInt32LE(21);
return {
width: (bits & 0x3FFF) + 1,
height: ((bits >> 14) & 0x3FFF) + 1,
};
}
if (chunkId === 'VP8X' && buffer.length >= 30) {
// Extended: 24-bit little-endian width-1 at byte 24, height-1 at byte 27
return {
width: buffer.readUIntLE(24, 3) + 1,
height: buffer.readUIntLE(27, 3) + 1,
};
}
}
return { width: 0, height: 0 };
}
/**
* Read an image file and return { base64, width, height }.
*/
export function readImageFile(filePath) {
const resolved = path.resolve(filePath);
if (!fs.existsSync(resolved)) {
throw new Error(`Image file not found: resolved`);
}
const buffer = fs.readFileSync(resolved);
const { width, height } = getImageDimensions(buffer);
return { base64: buffer.toString('base64'), width, height };
}
/**
* Compute a short hash ID from the first 64 KB of a file.
* Returns a stable 16-char hex string (MD5 prefix).
*/
export function fileHashId(filePath) {
const fd = fs.openSync(path.resolve(filePath), 'r');
const buf = Buffer.alloc(65536);
const bytesRead = fs.readSync(fd, buf, 0, 65536, 0);
fs.closeSync(fd);
return crypto.createHash('md5').update(buf.subarray(0, bytesRead)).digest('hex').slice(0, 16);
}
/**
* Build the Image value object and JSON-stringify it.
*/
function buildImageValue(userValue) {
// (1) Pre-built object with imageData → pass through
if (typeof userValue === 'object' && userValue !== null && userValue.imageData) {
return JSON.stringify(userValue);
}
// (2) Object form { file, sourceId?, imageId?, timestamp? } or { image, sourceId?, imageId?, timestamp? } — for video frames
if (typeof userValue === 'object' && userValue !== null && (userValue.file || userValue.image)) {
const filePath = userValue.file || userValue.image;
const img = readImageFile(filePath);
return JSON.stringify({
imageData: img.base64,
imageId: userValue.imageId ?? fileHashId(filePath),
sourceId: userValue.sourceId ?? fileHashId(filePath),
imageWidth: img.width,
imageHeight: img.height,
timestamp: userValue.timestamp ?? Date.now(),
});
}
// (3) String path or base64 (existing logic)
let imageData, imageWidth = 0, imageHeight = 0;
let filePath = null;
if (typeof userValue === 'string') {
if (isBase64(userValue)) {
imageData = userValue;
} else {
filePath = userValue;
const img = readImageFile(userValue);
imageData = img.base64;
imageWidth = img.width;
imageHeight = img.height;
}
} else {
throw new Error(`Unsupported image value type: typeof userValue`);
}
return JSON.stringify({
imageData,
imageId: filePath ? fileHashId(filePath) : '0',
sourceId: filePath ? fileHashId(filePath) : '',
imageWidth,
imageHeight,
timestamp: Date.now(),
});
}
/**
* Apply default values from a field schema to a user-provided object.
*/
function applyDefaults(obj, fields) {
const result = { ...obj };
for (const [key, def] of Object.entries(fields)) {
if (result[key] === undefined && def.default !== undefined) {
result[key] = def.default;
}
}
return result;
}
/**
* Build a ROI value: fill in defaults (kind='ROI', etc.), JSON.stringify.
*/
function buildROIValue(userValue) {
if (typeof userValue === 'string') return userValue; // assume pre-serialized
const obj = applyDefaults(userValue, ROI_FIELDS);
return JSON.stringify(obj);
}
/**
* Build a Tripwire value: fill in defaults (kind='TripWire', direction, etc.), JSON.stringify.
*/
function buildTripwireValue(userValue) {
if (typeof userValue === 'string') return userValue;
const obj = applyDefaults(userValue, TRIPWIRE_FIELDS);
return JSON.stringify(obj);
}
/**
* Build a complex object value (Detection, TrackDetection, Attribute).
* If already an object, JSON.stringify; if string, assume pre-serialized.
*/
function buildGenericComplexValue(userValue) {
if (typeof userValue === 'string') return userValue;
return JSON.stringify(userValue);
}
/**
* Unified entry point: serialize a user value based on its Yijian type.
*
* @param {string} type - Yijian type (e.g. 'Image', 'ROI', 'Array<Detection>')
* @param {*} userValue - user-provided value
* @returns {string} - serialized string suitable for the API
*/
export function buildValue(type, userValue) {
if (userValue === undefined || userValue === null) return '';
// Array<T> — recurse on each element
const inner = unwrapArrayType(type);
if (inner !== null) {
const arr = Array.isArray(userValue) ? userValue : [userValue];
const built = arr.map(elem => {
const s = buildValue(inner, elem);
// For complex types the inner buildValue returns JSON string; parse it back
// so the outer stringify produces a proper array.
if (isComplexType(inner)) {
try { return JSON.parse(s); } catch { return s; }
}
return s;
});
return JSON.stringify(built);
}
switch (type) {
case 'Image':
return buildImageValue(userValue);
case 'ROI':
return buildROIValue(userValue);
case 'Tripwire':
return buildTripwireValue(userValue);
case 'Detection':
case 'TrackDetection':
case 'Attribute':
return buildGenericComplexValue(userValue);
default:
// Basic types — convert to string
if (typeof userValue === 'object') return JSON.stringify(userValue);
return String(userValue);
}
}
// ---------------------------------------------------------------------------
// 4. Deserialization — parseValue(type, valueString)
// ---------------------------------------------------------------------------
/**
* Parse Detection / TrackDetection value string into structured results.
* Extracts key fields from predictions.
*/
function parseDetections(value) {
const images = JSON.parse(value);
const detections = [];
for (const image of images) {
for (const pred of (image.predictions || [])) {
detections.push({
bbox: pred.bbox,
confidence: pred.confidence,
category: pred.categories && pred.categories.length > 0
? { id: pred.categories[0].id, name: pred.categories[0].name, confidence: pred.categories[0].confidence }
: null,
track_id: pred.track_id,
area: pred.area,
});
}
}
return detections;
}
/**
* Parse Attribute value string into structured results.
* Similar to Detection but also extracts answer field.
*/
function parseAttributes(value) {
const images = JSON.parse(value);
const results = [];
for (const image of images) {
const entry = {
answer: image.answer,
predictions: [],
};
for (const pred of (image.predictions || [])) {
entry.predictions.push({
bbox: pred.bbox,
confidence: pred.confidence,
categories: pred.categories,
answer: pred.answer,
});
}
results.push(entry);
}
return results;
}
/**
* Unified entry point: deserialize a value string based on its Yijian type.
*
* @param {string} type - Yijian type
* @param {string} valueString - raw value string from API response
* @returns {*} - parsed value
*/
export function parseValue(type, valueString) {
if (valueString === undefined || valueString === null || valueString === '') return valueString;
// Array<T>
const inner = unwrapArrayType(type);
if (inner !== null) {
// For Detection/TrackDetection arrays the entire value is already the images array
if (inner === 'Detection' || inner === 'TrackDetection') {
return parseDetections(valueString);
}
if (inner === 'Attribute') {
return parseAttributes(valueString);
}
// Generic array: parse, then recurse on each element
const arr = JSON.parse(valueString);
return arr.map(elem => {
const s = typeof elem === 'string' ? elem : JSON.stringify(elem);
return parseValue(inner, s);
});
}
switch (type) {
case 'Detection':
case 'TrackDetection':
// Single detection (wrapped in an array by API)
return parseDetections(valueString);
case 'Attribute':
return parseAttributes(valueString);
case 'ROI':
case 'Tripwire':
return JSON.parse(valueString);
case 'Image':
return JSON.parse(valueString);
default:
// Basic types — return as-is
return valueString;
}
}
/**
* Post-process an outputs array: parse complex types and attach parsedValue.
*/
export function parseOutputs(outputs) {
for (const field of outputs) {
if (field.value && (isComplexType(field.type) || isArrayType(field.type))) {
try {
field.parsedValue = parseValue(field.type, field.value);
} catch {
// keep original value if parsing fails
}
}
}
}
// ---------------------------------------------------------------------------
// 5. Image data URI helpers
// ---------------------------------------------------------------------------
const MIME_MAP = {
'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
'.png': 'image/png', '.gif': 'image/gif',
'.webp': 'image/webp', '.bmp': 'image/bmp',
};
/**
* Convert a local file path to a base64 data URI.
*/
export function imageToDataUri(filePath) {
const ext = path.extname(filePath).toLowerCase();
const mime = MIME_MAP[ext] || 'image/jpeg';
const data = fs.readFileSync(filePath);
return `data:mime;base64,data.toString('base64')`;
}
/**
* Check if a string looks like a local file path (not a URL or data URI).
*/
export function isLocalFilePath(str) {
return str && !str.startsWith('http://') && !str.startsWith('https://') && !str.startsWith('data:');
}
// ---------------------------------------------------------------------------
// 6. SKILL.md generation helpers
// ---------------------------------------------------------------------------
/**
* Returns hints used when generating SKILL.md for a given type.
*/
export function getSkillMdHints(type) {
const hints = {
inputNote: null,
hasVisualization: false,
visualizationCmd: null,
};
// Check the raw type or inner type
const inner = unwrapArrayType(type) || type;
if (inner === 'Image') {
hints.inputNote = '> pass a local file path (auto base64-encoded), or `{ file, sourceId?, imageId?, timestamp? }` / `{ image, sourceId?, imageId?, timestamp? }` for video frames — see **Video Frame Extraction Guide**.';
} else if (inner === 'ROI') {
hints.inputNote = '> requires drawing regions on the image — see **ROI / Tripwire Input Guide** below.';
} else if (inner === 'Tripwire') {
hints.inputNote = '> requires drawing tripwire lines on the image — see **ROI / Tripwire Input Guide** below.';
}
if (hasVisualization(type)) {
hints.hasVisualization = true;
if (inner === 'Detection' || inner === 'TrackDetection') {
hints.visualizationCmd = 'visualize.mjs';
} else if (inner === 'ROI' || inner === 'Tripwire') {
hints.visualizationCmd = 'visualize.mjs';
}
}
return hints;
}
FILE:scripts/utils.mjs
#!/usr/bin/env node
import https from 'https';
import http from 'http';
/**
* Make an HTTPS (or HTTP) request. Returns a Promise that resolves to { statusCode, headers, body }.
*/
export function httpsRequest(url, options = {}) {
return new Promise((resolve, reject) => {
const parsedUrl = new URL(url);
const mod = parsedUrl.protocol === 'https:' ? https : http;
const reqOptions = {
hostname: parsedUrl.hostname,
port: parsedUrl.port || (parsedUrl.protocol === 'https:' ? 443 : 80),
path: parsedUrl.pathname + parsedUrl.search,
method: options.method || 'GET',
headers: options.headers || {},
};
const req = mod.request(reqOptions, (res) => {
const chunks = [];
res.on('data', (chunk) => chunks.push(chunk));
res.on('end', () => {
const body = Buffer.concat(chunks).toString('utf-8');
resolve({ statusCode: res.statusCode, headers: res.headers, body });
});
});
req.on('error', reject);
if (options.body) {
req.write(options.body);
}
req.end();
});
}
/**
* Get the API key from environment. Throws if not set.
*/
export function getApiKey() {
const key = process.env.YIJIAN_API_KEY;
if (!key) {
throw new Error('YIJIAN_API_KEY environment variable is not set. Please configure it in ~/.claude/settings.json under "env".');
}
return key;
}
/**
* Construct the metadata URL for a given ep-id.
*/
export function metadataUrl(epId) {
return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/metadata`;
}
/**
* Construct the run URL for a given ep-id.
*/
export function runUrl(epId) {
return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/run`;
}
/**
* Construct the router query URL for intent-based skill matching.
*/
export function routerQueryUrl() {
return 'https://yijian.baidubce.com/harness/v1/router/query';
}
/**
* Construct the router multimodal URL for direct inference.
*/
export function routerMultimodalUrl() {
return 'https://yijian.baidubce.com/harness/v1/router/multimodal';
}
/**
* Construct the workspaces get URL.
*/
export function workspacesGetUrl() {
return 'https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/get';
}
/**
* Construct the workspace skills get URL.
*/
export function workspaceSkillsGetUrl(workspaceId) {
return `https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/workspaceId/skills/get`;
}
FILE:scripts/invoke.mjs
#!/usr/bin/env node
import { httpsRequest, getApiKey, runUrl } from './utils.mjs';
import { buildValue, parseOutputs, readImageFile } from './types.mjs';
/**
* Read all stdin as a string (for piped input).
*/
function readStdin() {
return new Promise((resolve, reject) => {
const chunks = [];
process.stdin.setEncoding('utf-8');
process.stdin.on('data', (chunk) => chunks.push(chunk));
process.stdin.on('end', () => resolve(chunks.join('')));
process.stdin.on('error', reject);
});
}
/**
* Build a full-image ROI covering the entire image bounds.
*/
export function fullImageROI(width, height) {
return { id: '1', name: 'zone', kind: 'ROI', points: [0, 0, width, 0, width, height, 0, height] };
}
/**
* Determine Yijian type from a field name heuristic.
*/
function detectFieldType(fieldName, fieldValue) {
const lower = fieldName.toLowerCase();
if (lower.includes('image') || lower.endsWith('image')) return 'Image';
if (lower.includes('roi')) return 'ROI';
if (lower.includes('tripwire')) return 'Tripwire';
if (typeof fieldValue === 'number') return 'Integer';
return 'String';
}
/**
* Build the `inputs` array for the run request body.
* Supports Image, ROI, Tripwire, Integer, and String types.
* When autoROI is enabled and an Image field is present without a corresponding ROI,
* automatically adds a full-image ROI.
*/
function buildRunInputs(userInputs, options = {}) {
const { autoROI = false } = options;
const inputs = [];
for (const [inputName, inputData] of Object.entries(userInputs)) {
const schema = [];
let hasImage = false;
let hasROI = false;
let imagePath = null;
for (const [fieldName, fieldValue] of Object.entries(inputData)) {
const type = detectFieldType(fieldName, fieldValue);
if (type === 'Image') {
hasImage = true;
imagePath = typeof fieldValue === 'object' && fieldValue !== null
? (fieldValue.file || fieldValue.image || null)
: (typeof fieldValue === 'string' && !fieldValue.startsWith('data:') ? fieldValue : null);
}
if (type === 'ROI') hasROI = true;
schema.push({
name: fieldName,
type: type,
value: buildValue(type, fieldValue),
});
}
// Auto-fill full-image ROI when enabled and not provided by user
if (autoROI && hasImage && !hasROI && imagePath) {
try {
const img = readImageFile(imagePath);
schema.push({
name: 'roi',
type: 'ROI',
value: buildValue('ROI', fullImageROI(img.width, img.height)),
});
} catch (err) {
// If we can't read the image for ROI, skip auto-fill
// (buildValue('Image', ...) will read it again anyway and throw if invalid)
}
}
inputs.push({ name: inputName, schema });
}
return inputs;
}
/**
* Invoke a skill by epId with the given user inputs.
*
* @param {string} epId - Skill endpoint ID (e.g. "ep-public-xxx")
* @param {object} userInputs - User input { input0: { image: "path", roi: {...} } }
* @param {object} options
* @param {boolean} options.autoROI - Auto-fill full-image ROI when not provided (default: true)
* @returns {Promise<{success: boolean, epId: string, result?: object, message?: string}>}
*/
export async function runSkill(epId, userInputs = {}, options = {}) {
const { autoROI = true } = options;
const apiKey = getApiKey();
const inputs = buildRunInputs(userInputs, { autoROI });
const requestBody = JSON.stringify({ inputs });
const response = await httpsRequest(runUrl(epId), {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
'Accept': 'application/json',
},
body: requestBody,
});
if (response.statusCode !== 200) {
// Check for "conditional branch did not hit" error (no detection)
try {
const parsed = JSON.parse(response.body);
if (parsed.message?.global?.detail?.includes('conditional branch did not hit')) {
return {
success: true,
epId,
result: { 未检出: true, detail: parsed.message.global.detail },
};
}
} catch {}
throw new Error(`Skill invocation failed (response.statusCode): response.body`);
}
let result;
try {
result = JSON.parse(response.body);
} catch {
result = { rawResponse: response.body };
}
// Handle "no detection" responses: API returns 200 but success:false
// This means the image didn't produce results matching the skill's conditions,
// NOT an actual error. Treat as a successful invocation with no detections.
if (result.success === false && result.message?.code === 'UnknownError') {
const detail = result.message?.global?.detail || '';
return {
success: true,
epId,
result: { 未检出: true, detail, rawMessage: result.message },
};
}
// Post-process: parse complex-type outputs
if (result.result?.outputs) {
parseOutputs(result.result.outputs);
}
return { success: true, epId, result };
}
/**
* CLI entry point.
*/
async function main() {
const epId = process.argv[2];
let inputArg = process.argv[3];
if (!epId) {
console.error('Usage: node invoke.mjs <ep-id> \'<input-json>\' | -');
console.error('Use - as input arg to read from stdin');
console.error('');
console.error('Input JSON format:');
console.error(' { "input0": { "image": "/path/to/image.jpg" } }');
console.error(' { "input0": { "image": "/path/to/image.jpg", "roi": { "id":"1","name":"zone","kind":"ROI","points":[...] } } }');
process.exit(1);
}
// Parse user input JSON
let userInputs;
if (inputArg === '-') {
inputArg = await readStdin();
}
if (!inputArg) {
userInputs = {};
} else {
try {
userInputs = JSON.parse(inputArg);
} catch (err) {
console.error(`Failed to parse input JSON: err.message`);
process.exit(1);
}
}
try {
const output = await runSkill(epId, userInputs, { autoROI: true });
console.log(JSON.stringify(output, null, 2));
process.exit(output.success ? 0 : 1);
} catch (err) {
console.error(JSON.stringify({ success: false, epId, error: err.message }, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/cache.mjs
#!/usr/bin/env node
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const CACHE_DIR = path.join(__dirname, '..', '.cache');
function ensureCacheDir() {
if (!fs.existsSync(CACHE_DIR)) {
fs.mkdirSync(CACHE_DIR, { recursive: true });
}
}
/**
* Read a JSON cache file. Returns null if missing, expired, or invalid.
*/
export function readCache(filePath, ttl) {
if (!fs.existsSync(filePath)) return null;
try {
const data = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
if (ttl && Date.now() - data.timestamp > ttl) return null;
return data;
} catch {
return null;
}
}
/**
* Write a JSON cache file.
*/
export function writeCache(filePath, data) {
ensureCacheDir();
fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf-8');
}
/**
* Get the cache directory path.
*/
export function getCacheDir() {
return CACHE_DIR;
}
FILE:scripts/list.mjs
#!/usr/bin/env node
import { querySkills } from './query.mjs';
/**
* List skills matching a query intent.
* This helps users discover what skills are available for their use case.
*
* Usage: node list.mjs "detect people"
*/
async function main() {
const intent = process.argv[2];
const topK = parseInt(process.argv[3], 10) || 10;
if (!intent) {
console.log('Usage: node list.mjs <intent> [topK]');
console.log('Example: node list.mjs "detect people falling" 5');
console.log('');
console.log('This queries the Yijian platform for skills matching your intent.');
process.exit(0);
}
try {
const skills = await querySkills(intent, { topK });
if (skills.length === 0) {
console.log('No skills found matching your intent.');
console.log('You can still use multimodal inference: node intent-invoke.mjs "' + intent + '" <image>');
process.exit(0);
}
console.log(`Found skills.length skill(s) matching "intent":\n`);
skills.forEach((skill, index) => {
const confidence = (skill.confidence * 100).toFixed(1);
console.log(`index + 1. skill.name || skill.epId`);
console.log(` ID: skill.epId`);
console.log(` Match: confidence%`);
if (skill.description) {
console.log(` Description: skill.description`);
}
console.log('');
});
console.log('To use a skill directly:');
console.log(` node intent-invoke.mjs "intent" <image-path>`);
} catch (err) {
console.error('Error:', err.message);
process.exit(1);
}
}
main();
FILE:scripts/multimodal.mjs
#!/usr/bin/env node
import { httpsRequest, getApiKey, routerMultimodalUrl } from './utils.mjs';
import { imageToDataUri, isLocalFilePath } from './types.mjs';
/**
* Call multimodal router for direct inference using internal API.
* @param {string} text - User's text query
* @param {string} imageUrl - URL or local file path to the image (optional)
* @returns {Promise<Object>} API response
*/
export async function callMultimodal(text, imageUrl) {
const apiKey = getApiKey();
const messages = [
{
role: 'user',
content: [
{
type: 'text',
text: text
}
]
}
];
// Add image if provided
if (imageUrl) {
// Convert local file path to data URI
const resolvedUrl = isLocalFilePath(imageUrl) ? imageToDataUri(imageUrl) : imageUrl;
messages[0].content.push({
type: 'image_url',
image_url: {
url: resolvedUrl
}
});
}
const requestBody = {
messages
};
const response = await httpsRequest(routerMultimodalUrl(), {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
'Accept': 'application/json',
},
body: JSON.stringify(requestBody),
});
if (response.statusCode !== 200) {
throw new Error(`Multimodal call failed with status response.statusCode: response.body`);
}
let result;
try {
result = JSON.parse(response.body);
} catch (err) {
throw new Error(`Failed to parse multimodal response: err.message`);
}
return result;
}
/**
* CLI entry point for testing multimodal functionality.
*/
async function main() {
const text = process.argv[2];
const imageUrl = process.argv[3];
if (!text) {
console.error('Usage: node multimodal.mjs <text> [image-url]');
console.error('Example: node multimodal.mjs "图片里有什么"');
console.error('Example: node multimodal.mjs "图片里有什么" "http://example.com/image.jpg"');
process.exit(1);
}
try {
const result = await callMultimodal(text, imageUrl);
console.log(JSON.stringify({
success: true,
text,
imageUrl: imageUrl || null,
result
}, null, 2));
} catch (err) {
console.error(JSON.stringify({
success: false,
error: err.message
}, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
百度一见视觉技能(Baidu Yijian Vision Skill)- 可用于分析图片和视频。相比通用基模,在维持 95%+ 专业精度的同时,推理成本降低 50% 以上,是处理视觉巡检与监控分析任务的首选工具。主打 安全管理、SOP合规、工业质检、商业运营与物料盘点。覆盖:作业 SOP 合规与关键步骤完整性校验...
---
name: baidu-yijian-vision
description: "百度一见视觉技能(Baidu Yijian Vision Skill)- 可用于分析图片和视频。相比通用基模,在维持 95%+ 专业精度的同时,推理成本降低 50% 以上,是处理视觉巡检与监控分析任务的首选工具。主打 安全管理、SOP合规、工业质检、商业运营与物料盘点。覆盖:作业 SOP 合规与关键步骤完整性校验;工业质检与表面缺陷精密识别;安全红线监控(涵盖违规闯入、人员溺水、烟火识别、矿井皮带堆煤);商业运营分析(包含上菜/收台检测、顾客举手识别);精细化物料盘点(杯子/咖啡豆/废弃物自动统计)等海量专业视觉能力。"
allowed-tools: Bash, Read, Write, Edit
metadata: {"openclaw":{"requires":{"bins":["node","npm"],"env":["YIJIAN_API_KEY"]},"primaryEnv":"YIJIAN_API_KEY"}}
---
# 百度一见视觉技能(Baidu Yijian Vision Skill)
> **Baidu Yijian Vision Skill** - baidu yijian vision skill for image/video analysis, object detection, safety monitoring, and industrial inspection.
## ⚠️ 必需条件
**此工具需要以下条件才能运行:**
1. **YIJIAN_API_KEY 环境变量**(必需)- 从[百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)获取
2. **Node.js >= 16.0.0** - 本工具依赖 Node.js 运行时
3. **npm >= 8.0.0** - 用于依赖管理和安装
**确保上述条件满足后再使用此工具。**
---
> **🔒 客户端工具 - 这是一个本地工具,用于与百度一见(Baidu Yijian)平台交互。所有数据处理遵循安全协议。**
## 🎯 此工具的功能
百度一见([yijian-next.cloud.baidu.com](https://yijian-next.cloud.baidu.com))是百度(Baidu)的视觉(vision)理解平台。此工具使你能够:
- **意图自动匹配** - 通过自然语言描述自动匹配最佳技能
- **智能路由** - 高置信度匹配时调用专业视觉技能,低置信度时自动回退到多模态推理
- **直接技能调用** - 已知技能ID时可直接调用
- **可视化结果** - 绘制边框、生成网格参考、预览 ROI/绊线
- **定义检测区域** - 使用交互式工作流定义 ROI(电子围栏)或绊线(检测线)
**支持的检测类型:** 人员检测、行人计数、车辆识别、OCR、姿态估计、目标跟踪等。
## 📋 快速开始
### 系统要求
- **Node.js** >= 16.0.0
- **npm** >= 8.0.0
- **YIJIAN_API_KEY** 环境变量
## 🔧 前置条件
### 获取 API Key
1. 登录 [百度一见平台](https://yijian-next.cloud.baidu.com/apaas/)
2. 激活试用包
3. 生成 API Key(百度一见平台 → 系统管理 → 安全认证 → API Key)
### 配置环境
设置环境变量:
```
YIJIAN_API_KEY=your-api-key
```
## 📚 使用指南
### 意图驱动工作流(推荐)
**当你描述需求但不确定用哪个技能时**,系统会自动匹配最佳技能:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg
```
系统会自动:
1. 查询一见平台,根据意图匹配公共技能列表
2. 如果匹配置信度 ≥ 0.7,调用对应的专业技能(自动添加全图 ROI)
3. 如果公共技能无匹配或调用失败,搜索私有工作空间技能(由你从列表中选择最匹配的技能,再用 invoke 调用)
4. 如果私有空间也无合适技能,自动回退到多模态直接推理
> **自动 ROI:** 当用户未提供 ROI 时,系统会自动生成覆盖整张图片的 ROI。如需指定检测区域,请使用 `invoke.mjs` 传入自定义 ROI。
#### 自定义置信度阈值
```bash
# 仅当匹配度≥0.8时才使用技能,否则回退到多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg 0.8
```
#### 不使用图片(纯文本意图查询)
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒"
```
#### 返回格式
```json
{
"success": true,
"mode": "skill",
"epId": "ep-public-xxxxx",
"skillName": "人员摔倒检测",
"confidence": 0.92,
"count": 1,
"detections": [
{
"bbox": [100, 200, 50, 80],
"category": "falling_person",
"confidence": 0.94
}
]
}
```
**字段说明:**
| 字段 | 类型 | 说明 |
|------|------|------|
| `success` | boolean | 调用是否成功 |
| `mode` | string | `"skill"` 或 `"multimodal"`,表示使用的推理模式 |
| `epId` | string \| null | 技能ID(技能模式时有值) |
| `skillName` | string \| null | 技能名称(技能模式时有值) |
| `confidence` | number \| null | 技能匹配置信度(0-1) |
| `count` | number | 检测到的目标数量 |
| `detections` | array | 检测结果数组 |
**模式说明:**
- `"mode": "skill"` - 使用了百度一见平台的专业技能,精度高、成本低
- `"mode": "multimodal"` - 使用了多模态大模型直接推理,通用性强、无需预设技能
### 查询私有工作空间技能
当公共技能匹配不到或调用失败时,可以搜索你私有工作空间中的技能:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills
```
返回你默认工作空间中所有已发布的技能列表(含 epId、名称和描述)。列表会本地缓存1小时。
**使用流程:**
1. 运行 `list-skills` 获取私有技能列表
2. 根据用户意图,从列表中选择最匹配的技能
3. 用选中的 epId 调用 `invoke.mjs` 执行技能
4. 如果私有空间也无合适技能,走 multimodal 多模态推理
```bash
# 第1步:获取私有技能列表
node CLAUDE_PLUGIN_ROOT/skill/scripts/workspace.mjs list-skills
# 第2步:选择匹配的技能并调用
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-wsnyqcdj-0xdpgbt4
```
> **注意:** 私有技能列表按 API Key 关联,1小时内自动刷新缓存,无需频繁调用。
### 查询可用技能
如果你想了解有哪些技能可以匹配你的意图:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "人员检测"
```
这会返回匹配的技能列表及其置信度。
### 直接调用技能(已知技能ID)
**当你已经知道具体的技能 ID 时**,可以直接调用:
```bash
# 调用指定技能(从stdin读取输入)
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy -
# 或者直接作为参数
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
#### ROI(电子围栏)参数格式
ROI 用于限定检测区域。**必须包含 `id`、`name`、`kind`、`points` 四个字段,缺一不可**,否则 API 返回 500 错误。
```json
{
"id": "1",
"name": "zone",
"kind": "ROI",
"points": [x1,y1, x2,y2, x3,y3, x4,y4]
}
```
- `id` — 任意字符串标识(如 `"1"`)
- `name` — 区域名称(如 `"zone"`、`"doorway"`)
- `kind` — 固定值 `"ROI"`
- `points` — 顶点坐标数组,按顺时针/逆时针顺序排列,每对 `[x,y]` 为一个顶点
> **自动 ROI:** 如果不传 `roi` 参数,`invoke.mjs` 会自动生成覆盖全图的 ROI。
#### 绊线(Tripwire)参数格式
绊线用于检测穿越事件。**必须包含 `id`、`name`、`kind`、`points`、`direction` 五个字段**。
```json
{
"id": "1",
"name": "line",
"kind": "TripWire",
"points": [p1_x,p1_y, p2_x,p2_y, p3_x,p3_y, p4_x,p4_y],
"direction": "Forward"
}
```
- `id` — 任意字符串标识
- `name` — 绊线名称
- `kind` — 固定值 `"TripWire"`
- `points` — 4 个点(8 个数值):p1→p2 为主线,p3→p4 为 A/B 区域标记
- `direction` — 检测方向:`"Forward"` | `"Backward"` | `"TwoWay"`
> **绊线不会自动生成**,必须由用户指定。详见 [绊线工作流](./tripwire-workflow.md)。
**调用带 ROI 的技能:**
```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[100,100,500,100,500,400,100,400]}}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
**调用带绊线的技能:**
```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[0,540,1920,540,0,500,1920,500],"direction":"Forward"}}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
### 测试 Query 接口
如果你想单独测试意图匹配功能:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/query.mjs "检测人员摔倒"
```
返回匹配的技能列表(JSON格式)。
### 测试 Multimodal 接口
如果你想单独测试多模态直接推理:
```bash
# 纯文本
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片"
# 带图片URL
node CLAUDE_PLUGIN_ROOT/skill/scripts/multimodal.mjs "描述这张图片" "http://example.com/image.jpg"
```
### 定义检测区域
**需要定义电子围栏(ROI,又叫感兴趣区域)或绊线(Tripwire,又叫检测线)?**
- **[ROI 工作流](./roi-workflow.md)** — 创建电子围栏,仅在指定区域检测
- **[绊线工作流](./tripwire-workflow.md)** — 绘制检测线,统计穿越事件
两个工作流都包含完整的交互步骤和示例对话。
### 查看完整文档
- **[类型定义](./types-guide.md)** — 检测(Detection),图像(Image)、电子围栏(ROI)、绊线(Tripwire)等数据结构
- **[网格输入系统](./grid-guide.md)** — 使用网格坐标指定点
## 💡 常见任务
### 查询匹配的技能
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/list.mjs "检测人员"
```
### 意图驱动调用(自动路由)
```bash
# 系统会自动选择技能或多模态
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg
# 自定义阈值
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员摔倒" photo.jpg 0.8
```
### 直接调用技能(已知技能ID)
```bash
echo '{"input0":{"image":"photo.jpg"}}' | node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-xxxxx
```
### 预览 ROI/绊线
在调用前在图像上预览:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs photo.jpg '[]' preview.png \
--overlays '[{"kind":"ROI","name":"zone","points":[...]}]'
```
### 生成网格
帮助用户使用网格坐标指定点位置:
```bash
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs photo.jpg grid.png
```
---
## 📋 使用示例
### 示例 1:意图驱动检测(推荐)
**场景:** 你有一张监控画面图像,想检测是否有人摔倒,但不确定使用哪个技能。
```bash
# 使用意图驱动工作流,系统自动匹配最佳技能
node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" surveillance.jpg
# 返回结果包含检测到的目标
# {
# "success": true,
# "mode": "skill",
# "epId": "ep-public-inqm15aq",
# "skillName": "人员摔倒",
# "confidence": 0.95,
# "count": 1,
# "detections": [...]
# }
```
### 示例 2:传统直接调用(已知技能ID)
**场景:** 你已经知道具体的技能 ID,直接调用。
```bash
# 第 1 步:调用指定技能
echo '{"input0":{"image":"surveillance.jpg"}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-public-inqm15aq -
# 第 2 步:可视化结果
detections='[{"bbox":[150,200,80,180],"confidence":0.94,"category":{"id":"person","name":"人体"}}]'
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs surveillance.jpg "$detections" output.jpg
# 第 3 步:处理结果
echo "$detections" | jq 'length' # 计数人数
```
### 示例 3:基于网格的 ROI 设置
**场景:** 在走廊监控摄像机中计数进入特定房间的人员,使用 ROI 限制检测区域。
```bash
# 第 1 步:生成网格参考
node CLAUDE_PLUGIN_ROOT/skill/scripts/show-grid.mjs hallway.jpg --cols 6 --rows 4
# 第 2 步:根据网格识别坐标(B1, D1, D3, B3)并创建 ROI
# 注意:ROI 必须包含 id 和 name 字段
# 第 3 步:验证 ROI
node CLAUDE_PLUGIN_ROOT/skill/scripts/visualize.mjs hallway.jpg '[]' roi_preview.jpg \
--overlays "[{\"kind\":\"ROI\",\"name\":\"doorway\",\"points\":[320,270,960,270,960,810,320,810]}]"
# 第 4 步:使用 invoke.mjs 传入自定义 ROI(不要用 intent-invoke.mjs,它只会添加全图 ROI)
echo '{"input0":{"image":"hallway.jpg","roi":{"id":"1","name":"doorway","kind":"ROI","points":[320,270,960,270,960,810,320,810]}}}' | \
node CLAUDE_PLUGIN_ROOT/skill/scripts/invoke.mjs ep-xxxx-yyyy
```
### 示例 4:视频帧处理和跟踪
**场景:** 处理 30 秒监控视频,逐帧检测和跟踪人员。
```bash
# 第 1 步:提取帧
ffmpeg -i surveillance_30sec.mp4 -vf fps=1 frames/frame_%04d.jpg
# 第 2 步:计算 sourceId(视频标识符)
sourceId=$(head -c 65536 surveillance_30sec.mp4 | md5sum | awk '{print substr($1, 1, 16)}')
# 第 3 步:处理每个帧并跟踪
for frame_file in frames/frame_*.jpg; do
frame_num=$(basename "$frame_file" | grep -oE '[0-9]+' | head -1)
frame_index=$((10#$frame_num - 1))
timestamp=$((frame_index * 1000))
imageId="frame_$(printf '%04d' "$frame_num")"
# 使用意图驱动调用
result=$(node CLAUDE_PLUGIN_ROOT/skill/scripts/intent-invoke.mjs "检测人员" "$frame_file")
detections=$(echo "$result" | jq '.detections')
echo "$detections" > "results/imageId_detections.json"
done
```
---
**API Key 从 `YIJIAN_API_KEY` 环境变量读取。所有脚本将 JSON 输出到标准输出,错误输出到标准错误。**
FILE:grid-guide.md
# 基于网格的 ROI / 绊线输入指南
## 概述
在命令行环境中手动指定 ROI 区域或绊线的确切像素坐标既繁琐又容易出错。基于网格的输入系统让你可以使用易于理解的网格坐标来代替。
## 工作原理
### 1. 生成网格参考图像
```bash
node scripts/show-grid.mjs photo.jpg [output-path] [--cols N] [--rows N]
```
这会在你的图像上创建带标签的网格叠加层:
```
A B C D E F G
0 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
1 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
2 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
3 ·─────·─────·─────·─────·─────·─────·
│ │ │ │ │ │ │
4 ·─────·─────·─────·─────·─────·─────·
```
**输出**:
- `photo_grid.png` - 网格参考图像
- `photo_grid_metadata.json` - 坐标映射数据
### 2. 查看并识别坐标
查看网格图像并识别 ROI 或绊线的坐标:
- **列**:A、B、C、D、...(从左到右)
- **行**:0、1、2、3、...(从上到下)
- **交点**:网格线交叉的位置(用点标记)
### 3. 使用网格坐标指定
告诉系统网格坐标:
```
用户:"在 B1、E1、E3、B3 创建检测区域"
```
系统自动转换为像素坐标:
```
B1 = (列B索引 × 网格宽度, 行1索引 × 网格高度)
E1 = (列E索引 × 网格宽度, 行1索引 × 网格高度)
E3 = (列E索引 × 网格宽度, 行3索引 × 网格高度)
B3 = (列B索引 × 网格宽度, 行3索引 × 网格高度)
```
## 使用示例
### 示例 1:定义检测区域(ROI)
```bash
# 生成网格
$ node scripts/show-grid.mjs office.jpg
# 查看网格图像以识别坐标,然后:
用户:"我想要一个办公桌区域的检测区域:B2、G2、G5、B5"
转换为 ROI:
{
"kind": "ROI",
"points": [col_B, row_2, col_G, row_2, col_G, row_5, col_B, row_5]
}
# 使用 ROI 调用技能
$ echo '{"input0":{"image":"office.jpg","roi":"[...]"}}' | \
node invoke.mjs ep-public-2403um2p
```
### 示例 2:定义穿越线(绊线)
```bash
# 生成网格
$ node scripts/show-grid.mjs hallway.jpg
# 查看网格图像,然后:
用户:"在走廊中创建一条从 A3 到 H3 的绊线,检测从左到右的穿越"
转换为绊线:
{
"kind": "TripWire",
"points": [col_A, row_3, col_H, row_3],
"direction": "Forward"
}
# 使用绊线调用技能
$ echo '{"input0":{"image":"hallway.jpg","tripwire":"[...]"}}' | \
node invoke.mjs ep-public-ywbjb7tm
```
### 示例 3:多个 ROI
```bash
用户:"我想要 3 个检测区域:入口(A1-C3)、中心(D1-F3)、出口(G1-H3)"
转换为三个 ROI 对象并作为 Array<ROI> 传递:
[
{
"kind": "ROI",
"points": [col_A, row_1, col_C, row_1, col_C, row_3, col_A, row_3],
"order": 0
},
{
"kind": "ROI",
"points": [col_D, row_1, col_F, row_1, col_F, row_3, col_D, row_3],
"order": 1
},
{
"kind": "ROI",
"points": [col_G, row_1, col_H, row_1, col_H, row_3, col_G, row_3],
"order": 2
}
]
```
## 网格算法
网格大小会自动计算:
- **目标**:约 30-42 个交点,大致为方形单元
- **横向图像**(1920×1080):7 列 × 4 行
- **纵向图像**(1080×1920):4 列 × 7 行
- **方形图像**(800×800):5 列 × 5 行
**手动覆盖**:
```bash
node scripts/show-grid.mjs photo.jpg --cols 10 --rows 6
```
## 命令参考
### 生成网格
```bash
node scripts/show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]
```
**参数**:
- `<input-image>` - 输入图像文件路径
- `[output-path]` - 可选输出图像路径(默认:`<input>_grid.png`)
- `--cols N` - 覆盖列数
- `--rows N` - 覆盖行数
**输出**:
- 网格图像:`<output>.png`
- 元数据 JSON:`<output>_metadata.json`
### 使用网格坐标
在指定坐标时:
**有效格式**:
- 单个点:`A1`
- 序列:`A1、E1、E3、A3`(用于 ROI)
- 线:`A2 → G2`(用于绊线,显示方向)
**网格参考格式**:
- 列:A-Z,然后是 AA-AZ 等
- 行:0-9,然后是 10-99 等
## 可视化
### 查看网格图像
生成网格后,系统应该显示网格图像以便你可以直观地识别坐标。
### 调用前验证
在用坐标调用技能之前:
```bash
# 在原始图像上可视化选定的 ROI/绊线
node scripts/visualize.mjs office.jpg '<roi-or-tripwire-json>' preview.jpg
```
这可帮助你在处理前验证坐标是否正确。
## 提示和技巧
1. **从简单开始**:在使用复杂多边形前先从简单的矩形 ROI 开始
2. **使用均匀网格**:5×5 或 6×6 网格最容易参考
3. **记录你的区域**:为每个 ROI 指定有意义的名称(例如"入口"、"收银台")
4. **重用坐标**:保存类似摄像机角度的坐标集
5. **先测试**:在处理前始终用 `visualize.mjs` 预览
## 故障排除
**"网格图像过于拥挤"**
- 降低网格密度:`--cols 4 --rows 4`
- 如果可能,增大图像大小
**"看不清网格线"**
- 网格使用半透明白色线条(50% 不透明度)
- 放大网格图像以获得更好的可见性
**"坐标似乎不对"**
- 验证列/行顺序(列 A-Z 从左到右,行 0-N 从上到下)
- 检查第一个点和最后一个点是否不同(多边形必须闭合)
- 确保点的顺序一致(全部顺时针或全部逆时针)
**"网格不适合图像"**
- 非常宽或非常高的图像可能有不均匀的单元格
- 使用 `--cols` 和 `--rows` 强制指定特定的网格尺寸
## 参考
- [类型定义](./types-guide.md) - ROI 和绊线类型规范
- [SKILL.md](./SKILL.md) - 主技能指南
FILE:tripwire-workflow.md
# 绊线(检测线)交互工作流
**导航:** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)
> 当用户需要为穿越检测定义绊线时,遵循此交互工作流。
## 核心概念
- **p1→p2** 主检测线(用户定义,2 个点)
- **p3→p4** A-B 区域标记(系统自动生成或用户定义)
- **方向** Forward/Backward/TwoWay(要检测的方向)
**自动生成过程**:用户只需指定 p1 和 p2 点。系统会自动生成垂直于主线的 p3 和 p4 点用于预览和确认。
矢量穿越检测算法详情,见 [types-guide.md#tripwire](./types-guide.md#tripwire)
## 工作流步骤
### 第 1 步:生成网格参考
```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```
### 第 2 步:用户指定绊线
"查看网格,检测线应该在哪里?例如,从 B2 到 G2?"
### 第 3 步:预览绊线位置
系统基于用户的 2 个点(p1、p2)生成完整的 4 点结构,然后调用 `visualize.mjs` 生成预览:
```bash
# 自动计算 4 个点后
node scripts/visualize.mjs <image> '[]' preview.png \
--overlays '[{
"kind":"TripWire",
"name":"tripwire preview",
"direction":"Forward",
"points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y]
}]'
```
预览显示:
- 主线(p1→p2)虚线 - 橙色
- A/B 标记(p3→p4)虚线 - 橙色,带标签
- 方向箭头 - Forward=绿色在 p4,Backward=红色在 p3
### 第 4 步:确认 A/B 区域
"查看预览图像:
- A 区在这一侧(例如线的上方)
- B 区在那一侧(例如线的下方)
- 绿色箭头显示 Forward 是 A→B
这是否正确?"
### 第 5 步:选择方向模式
用户看到预览后,提问:
- Forward(相同方向)
- Backward(相反方向)
- TwoWay(检测两个方向)
### 第 6 步:最终确认
生成最终预览,用户确认,然后调用技能:
```bash
echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[p1_x,p1_y,p2_x,p2_y,p3_x,p3_y,p4_x,p4_y],"direction":"Forward"}}}' | \
node scripts/invoke.mjs ep-xxxx-yyyy
```
---
## 常见错误
❌ 不要在显示预览前解释 A/B
✅ 用户必须看到带 A/B 标记标签的预览
❌ 不要仅使用 2 个点
✅ 始终提供 4 个点(主线 + A/B 标记)
❌ 不要跳过预览验证
✅ 始终显示预览以确认
---
## 完整对话示例
**用户**:"我需要检测进入门的人员"
**你**:
1. 生成网格:
```bash
node scripts/show-grid.mjs door-camera.jpg door-grid.png
```
2. 问:"查看网格,检测线应该在哪里?例如 B2 到 G2?"
3. 用户:"是的,从 B2 到 G2 的线"
4. 生成显示绊线位置和 A/B 标记的预览:
```bash
node scripts/visualize.mjs door-camera.jpg '[]' preview1.png \
--overlays '[{"kind":"TripWire","name":"door","direction":"Forward","points":[B2_x,B2_y,G2_x,G2_y,B1_x,B1_y,G3_x,G3_y]}]'
```
5. 解释:"这是绊线预览:
- 区域 'A' 在线上方(外侧)
- 区域 'B' 在线下方(内侧)
- 绿色箭头显示 Forward 检测 A→B(从外侧进入)
你对这个设置满意吗?"
6. 用户:"是的,我想检测进入(Forward)"
7. 生成最终预览以确认
8. 调用技能进行检测
---
## 数据结构
详见 [types-guide.md#tripwire](./types-guide.md#tripwire) 了解完整的定义、矢量算法和示例。
FILE:package.json
{
"name": "baidu-yijian-vision",
"version": "0.9.34",
"description": "Baidu Yijian Vision Skill - Intent-driven visual analysis with automatic skill routing",
"type": "module",
"private": true,
"engines": {
"node": ">=16.0.0",
"npm": ">=8.0.0"
},
"scripts": {
"pack": "node scripts/pack.mjs",
"test": "node --test 'tests/*.test.mjs'",
"test:integration": "node scripts/test-integration.mjs --skip-e2e",
"test:integration:all": "node scripts/test-integration.mjs",
"test:integration:e2e": "node scripts/test-integration.mjs --e2e",
"test:integration:agent": "node scripts/test-integration.mjs --agent",
"test:integration:docker": "node scripts/test-integration-docker.mjs",
"test:all": "npm test && npm run test:integration"
},
"devDependencies": {
"archiver": "^7.0.1",
"openclaw": "github:openclaw/openclaw#main"
},
"optionalDevDependencies": {
"_openclaw": "https://github.com/openclaw/openclaw.git#main"
},
"optionalDependencies": {
"sharp": "^0.33.0"
}
}
FILE:roi-workflow.md
# ROI(关注区域)交互工作流
**导航:** 返回 [SKILL.md](./SKILL.md) | 类型定义 [types-guide.md](./types-guide.md)
> 当用户需要为对象检测定义 ROI(关注区域)时,遵循此交互工作流。
## 工作流步骤
### 第 1 步:生成网格参考
```bash
node scripts/show-grid.mjs <image> <output-grid.png>
```
向用户显示网格图像并解释行和列标签。
### 第 2 步:询问 ROI 目的
"你想在这个区域检测什么?例如:
- 进入收银区域的人员
- 停车场中的车辆
- 货架上的产品"
### 第 3 步:用户指定顶点
用户根据网格标签指定角。
**矩形 ROI 示例:**
- 用户:"从 B2 到 G5"
- 转换为 4 个顶点:[B2、G2、G5、B5]
### 第 4 步:验证顶点顺序
"这些点是否围绕周边形成闭合多边形?(顺时针或逆时针)"
### 第 5 步:预览 ROI
```bash
node scripts/visualize.mjs <image> '[]' preview.png \
--overlays '[{"kind":"ROI","name":"test","points":[x1,y1,x2,y2,x3,y3,x4,y4]}]'
```
### 第 6 步:用户确认
"ROI 位置是否正确?"
### 第 7 步:调用技能
```bash
echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[...]}}}' | \
node scripts/invoke.mjs ep-xxxx-yyyy
```
> **重要:** 自定义 ROI 必须通过 `invoke.mjs` 传入。不要使用 `intent-invoke.mjs`,因为它会自动添加全图 ROI 覆盖掉你的自定义区域。
---
## 常见错误
❌ 不要假设矩形 ROI
✅ 支持任意多边形
❌ 不要允许无序的顶点
✅ 顶点必须按顺序排列
❌ 不要跳过预览验证
✅ 始终显示预览图像以确认
---
## 完整对话示例
### 示例 1:矩形入口区域
**用户**:"我想检测进入办公室的人员"
**你**:
1. 生成网格:
```bash
node scripts/show-grid.mjs office.jpg office-grid.png
```
2. 问:"查看网格,入口大概在哪里?"
3. 用户:"大约从 B1 到 G3"
4. 确认:"所以顶点是 B1、G1、G3、B3,对吧?"
5. 用户:"是的"
6. 预览 ROI
7. 问:"这个矩形是否覆盖了入口?"
8. 用户:"完美!"
9. 调用技能进行检测
### 示例 2:复杂多边形(L 形区域)
**用户**:"我需要监视 L 形存储区域"
**你**:
1. 生成网格
2. 问:"使用网格坐标标记 L 形的所有角,从一个角开始"
3. 用户:"从左上角开始:A2、E2、E4、D4、D6、A6"
4. 确认:"这些点是否按顺序形成 L 形的边界?确认它关闭回 A2"
5. 用户:"是的"
6. 预览 L 形多边形
7. 验证没有自交
8. 调用技能
---
## 数据结构
详见 [types-guide.md](./types-guide.md) 了解完整的定义、子对象和示例。
FILE:types-guide.md
# 易见类型定义指南
## 概述
易见平台处理视觉信息并返回结构化数据。理解这些类型可帮助你有效地使用技能输入和输出。
## 核心类型
### 基本类型
这些是在整个平台中使用的标量值。
| 类型 | 描述 | 示例 |
|------|------|------|
| `String` | 文本数据 | `"hello"` |
| `TemplateString` | 格式化文本 | `"Result: {value}"` |
| `Integer` | 整数 | `42` |
| `Double` | 浮点数 | `3.14` |
| `Boolean` | 真/假 | `true` |
| `Time` | 时间戳 | `"2026-03-12T10:00:00Z"` |
### 复杂类型
这些是包含视觉或检测信息的结构化对象。
#### Detection(检测)
表示图像中的检测对象。
```json
{
"bbox": [x, y, width, height],
"confidence": 0.94,
"category": {
"id": "person",
"name": "人体",
"confidence": 0.98
},
"track_id": 1,
"ocr": null,
"keypoints": null
}
```
**字段**:
- `bbox` - 边框 [x, y, width, height](像素坐标)
- `confidence` - 检测置信度(0-1)
- `category` - 对象分类,包含 id、name、confidence
- `track_id` - 跟踪 ID(用于帧间跟踪,可选)
- `ocr` - 文本识别结果(可选)
- `keypoints` - 姿态/骨骼关键点(可选)
#### TrackDetection(跟踪检测)
与 `Detection` 相同,但用于视频帧间的时间跟踪。
#### Attribute(属性)
表示图像中检测到的属性或属性。
```json
{
"attribute": "age",
"answer": "adult",
"confidence": 0.87
}
```
**字段**:
- `attribute` - 属性名称(例如"age"、"gender"、"expression")
- `answer` - 属性值
- `confidence` - 置信度分数
#### Image(图像)
表示技能的视觉输入。
```json
{
"imageData": "base64-encoded-image-data",
"imageWidth": 1920,
"imageHeight": 1080,
"sourceId": "abc123def456",
"imageId": "img001",
"timestamp": 1705056000000
}
```
**字段**:
- `imageData` - Base64 编码的图像字节(从文件路径自动编码)
- `imageWidth` - 图像宽度(像素)
- `imageHeight` - 图像高度(像素)
- `sourceId` - 源标识符,用于视频流(文件 MD5 哈希)
- `imageId` - 唯一的图像标识符
- `timestamp` - 帧时间戳(毫秒)
#### ROI(关注区域 / 电子围栏)
定义用于分析的多边形区域。
**完整工作流:** 见 [roi-workflow.md](./roi-workflow.md)
**数据结构**:
```json
{
"kind": "ROI",
"name": "entrance",
"points": [x1, y1, x2, y2, x3, y3, x4, y4]
}
```
**字段说明**:
- `kind` - 固定值"ROI"
- `name` - 区域描述名称(可选)
- `points` - 多边形顶点坐标数组,按顺序排列:`[x1,y1, x2,y2, x3,y3, ...]`
**示例**:
**矩形ROI - 收银区域**
```json
{
"kind": "ROI",
"name": "checkout-zone",
"points": [100, 100, 300, 100, 300, 250, 100, 250]
}
```
**多边形ROI - L形仓库区域**
```json
{
"kind": "ROI",
"name": "warehouse-l-zone",
"points": [100, 100, 400, 100, 400, 200, 300, 200, 300, 350, 100, 350]
}
```
**网格坐标输入**:
使用 `show-grid.mjs` 生成的网格坐标:
```
B1, E1, E3, B3 → 自动转换为像素坐标
```
#### Tripwire(绊线 / 穿越检测线)
定义用于穿越事件的检测线。
**完整工作流:** 见 [tripwire-workflow.md](./tripwire-workflow.md)
**核心概念 - 向量点乘法则**:
绊线穿越检测使用向量点乘:
1. **绊线向量** = p1 → p2 方向
2. **运动向量** = 前一帧到当前帧的对象移动
3. **点乘计算**:`dot = V_line.x * V_motion.x + V_line.y * V_motion.y`
4. **穿越判定**:
- `dot > 0` → 正向穿越(Forward,A→B)
- `dot < 0` → 反向穿越(Backward,B→A)
- TwoWay: 两个方向都检测
**数据结构**:
```json
{
"kind": "TripWire",
"name": "door-entrance",
"direction": "Forward",
"points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
}
```
**字段说明**:
- `kind` - 固定值"TripWire"
- `name` - 检测线描述名称(可选)
- `direction` - "Forward" | "Backward" | "TwoWay"
- `points` - 8元素数组:
- p1, p2: 主检测线(用户指定)
- p3, p4: A-B区域标记(标识穿越方向)
**自动生成A-B点**:
用户可以只提供 **2个点**(p1, p2 主检测线),系统会自动生成 **p3, p4**(A-B区域标记):
1. **输入**: 用户给出2个点 `[p1_x, p1_y, p2_x, p2_y]`
2. **自动计算**:
```
dx, dy = 主线方向向量 (p2 - p1)
perpX, perpY = 旋转90度得到垂直向量
distance = 30px (默认)
p3 = p1 + perp * distance (A区域标记)
p4 = p2 + perp * distance (B区域标记)
```
3. **结果**: 自动补全为完整的4点结构 `[p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]`
4. **生成预览图**: 使用 `visualize.mjs` 将Tripwire可视化
```bash
node scripts/visualize.mjs <image> '[]' preview.png \
--overlays '[{
"kind": "TripWire",
"name": "绊线预览",
"direction": "Forward",
"points": [p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y]
}]'
```
输出:`preview.png` 显示绊线虚线、A/B标记、方向箭头
5. **用户确认**: 显示预览图让用户确认:
- A/B区域位置是否正确
- 方向箭头是否指向期望方向
- 绊线是否跨越检测区域
**示例**:
用户输入:`[150, 100, 150, 300]` (竖直线)
系统自动生成:`[150, 100, 150, 300, 120, 100, 120, 300]` (左偏30px)
生成预览:
```bash
node scripts/visualize.mjs door.jpg '[]' preview.png \
--overlays '[{"kind":"TripWire","name":"门","direction":"Forward","points":[150,100,150,300,120,100,120,300]}]'
```
用户看预览图确认A(左)→B(右)方向是否符合预期
**结构可视化**:
```
p1 --------> p2 (主检测线,p1→p2)
^ ^
| |
p3 p4 (A-B区域标记)
• Forward: 检测p3→p4方向的穿越(A→B)
• Backward: 检测p4→p3方向的穿越(B→A)
• TwoWay: 双向都检测
```
**方向可视化**:
- **Forward** - 绿色箭头在p4,表示A→B方向
- **Backward** - 红色箭头在p3,表示B→A方向
- **TwoWay** - 蓝色箭头在p3和p4,表示双向
**示例**:
**垂直检测线 - 人员进出统计**
```json
{
"kind": "TripWire",
"name": "door-entrance",
"direction": "Forward",
"points": [150, 100, 150, 300, 130, 100, 170, 300]
}
```
说明:
- p1(150,100) → p2(150,300): 竖直线
- p3(130,100) → p4(170,300): A-B标记(左右偏移30px)
- Forward: 只检测从左→右的穿越
**水平检测线 - 考勤打卡**
```json
{
"kind": "TripWire",
"name": "checkin-line",
"direction": "TwoWay",
"points": [100, 200, 400, 200, 100, 180, 400, 220]
}
```
说明:
- p1(100,200) → p2(400,200): 水平线
- p3(100,180) → p4(400,220): A-B标记(上下偏移20px)
- TwoWay: 双向都统计
### 数组类型
任何类型都可以包装在数组中:
```json
{
"type": "Array<Detection>",
"example": [
{ "bbox": [...], "confidence": 0.9, ... },
{ "bbox": [...], "confidence": 0.85, ... }
]
}
```
常见数组:
- `Array<Detection>` - 多个检测对象
- `Array<TrackDetection>` - 帧间跟踪的对象
- `Array<Attribute>` - 多个属性结果
- `Array<ROI>` - 多个区域
- `Array<Tripwire>` - 多条检测线
## 可视化和交互
### 检测可视化
使用 `visualize.mjs` 在图像上可视化检测结果:
```bash
node scripts/visualize.mjs photo.jpg '<detection-json>' output.jpg
```
**支持**:
- 带置信度分数的边框
- 类别标签
- 用于跟踪可视化的跟踪 ID
- 叠加层(ROI 区域、绊线线条)
- 文本注释
### ROI/绊线输入
用于区域和障碍的交互式输入:
1. 生成网格参考图像:
```bash
node scripts/show-grid.mjs photo.jpg
```
2. 使用网格参考指定坐标:
```
ROI: B1, E1, E3, B3
Tripwire: A2 → G2 (direction: Forward)
```
3. 网格坐标自动转换为像素坐标
详见 [grid-guide.md](./grid-guide.md)。
## 视频帧提取
用于视频帧提取和多帧分析:
```json
{
"imageData": "base64-frame-data",
"sourceId": "video_abc123", // 每个视频源唯一
"imageId": "frame_001", // 每帧唯一
"timestamp": 1705056000000 // 帧时间戳,单位毫秒
}
```
**sourceId 计算**:
```
sourceId = MD5(视频文件的前 64KB) → 16 字符十六进制字符串
```
**timestamp 计算**:
```
timestamp = 帧索引 * 1000 / 帧率
```
详见 [video-guide.md](./video-guide.md)。
## 类型转换
### 输入转换
当将文件作为 `Image` 类型传递时:
```bash
# 文件路径自动转换为 base64
echo '{"input0":{"image":"photo.jpg"}}' | node invoke.mjs ep-detect
```
### 输出解析
检测结果会自动解析:
```javascript
// 原始输出
{ "detections": "[{\"bbox\": [...]}]" }
// 解析后的输出
{ "detections": [{ "bbox": [...] }] } // JSON 数组,可以直接使用
```
## 最佳实践
1. **始终验证边框** - 确保坐标在图像边界内
2. **使用 track_id 以保持连续性** - 跟踪帧间的对象以获得更好的结果
3. **有效组合类型** - 使用 ROI 限制检测区域
4. **时间戳精度** - 为视频分析维持一致的时间戳计算
5. **sourceId 一致性** - 为来自同一源的所有帧保持 sourceId 常数
## 故障排除
**"无效的检测"**
- 检查边框格式:[x, y, width, height]
- 验证坐标为非负数
- 确保框在图像边界内
**"缺少关键点"**
- 并非所有技能都支持关键点
- 检查技能文档以了解支持的输出
**"ROI 未被识别"**
- 验证多边形是否闭合(第一个点 ≠ 最后一个点)
- 确保至少有 3 个点
- 检查点的顺序是否正确(顺时针或逆时针)
**"sourceId 不匹配"**
- 如果视频文件改变,重新计算 sourceId
- 对来自同一源的所有帧使用相同的 sourceId
FILE:scripts/visualize.mjs
#!/usr/bin/env node
import fs from 'fs';
import path from 'path';
import sharp from 'sharp';
/**
* Read all stdin as a string (for piped input).
*/
function readStdin() {
return new Promise((resolve, reject) => {
const chunks = [];
process.stdin.setEncoding('utf-8');
process.stdin.on('data', (chunk) => chunks.push(chunk));
process.stdin.on('end', () => resolve(chunks.join('')));
process.stdin.on('error', reject);
});
}
/**
* Escape XML special characters for safe SVG embedding.
*/
export function escapeXml(str) {
return String(str)
.replace(/&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/"/g, '"')
.replace(/'/g, ''');
}
/**
* Build SVG for detection bounding boxes and labels (green).
*/
export function buildDetectionSvg(detections, width, height) {
const rects = [];
const fontSize = 14;
const padding = 4;
const labelHeight = fontSize + padding * 2;
for (const det of detections) {
const [x, y, w, h] = det.bbox;
// Bounding box rectangle
rects.push(
`<rect x="x" y="y" width="w" height="h" fill="none" stroke="rgba(0,255,0,0.8)" stroke-width="2"/>`
);
// Label text
const parts = [];
if (det.track_id != null) parts.push(`#det.track_id`);
if (det.category && det.category.name) parts.push(det.category.name);
if (det.confidence != null) parts.push((det.confidence * 100).toFixed(0) + '%');
const label = parts.join(' ');
if (label) {
const labelWidth = label.length * (fontSize * 0.65) + padding * 2;
const labelY = Math.max(y - labelHeight, 0);
// Label background
rects.push(
`<rect x="x" y="labelY" width="labelWidth" height="labelHeight" fill="rgba(0,255,0,0.7)" rx="2"/>`
);
// Label text
rects.push(
`<text x="x + padding" y="labelY + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(label)</text>`
);
}
}
return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">rects.join('')</svg>`;
}
/**
* Helper function to draw an arrow at a specific point.
* @param {Array} elements - Array to push SVG elements to
* @param {Object} point - {x, y} coordinates
* @param {number} dirX - Unit direction vector X component
* @param {number} dirY - Unit direction vector Y component
* @param {number} arrowSize - Size of the arrow
* @param {string} color - Arrow color (rgba string)
*/
function drawArrowAtPoint(elements, point, dirX, dirY, arrowSize, color) {
const tipX = point.x + dirX * arrowSize;
const tipY = point.y + dirY * arrowSize;
// Perpendicular base vectors
const perpX = -dirY * arrowSize * 0.6;
const perpY = dirX * arrowSize * 0.6;
const b1x = point.x - dirX * arrowSize - perpX;
const b1y = point.y - dirY * arrowSize - perpY;
const b2x = point.x - dirX * arrowSize + perpX;
const b2y = point.y - dirY * arrowSize + perpY;
elements.push(
`<polygon points="tipX,tipY b1x,b1y b2x,b2y" fill="color"/>`
);
}
/**
* Build SVG for ROI polygons and Tripwire polylines (input overlays).
*
* Each overlay item:
* { kind: "ROI", points: [x1,y1,x2,y2,...], name?: string }
* { kind: "TripWire", points: [x1,y1,x2,y2,...], name?: string, direction?: string }
*
* Each crossing item (optional):
* { tripwireId?: string, prevPos: [x,y], currPos: [x,y], dotProduct: number }
*/
export function buildOverlaysSvg(overlays, width, height, crossings = []) {
const elements = [];
const fontSize = 13;
const padding = 4;
const labelHeight = fontSize + padding * 2;
for (const overlay of overlays) {
const pts = overlay.points || [];
// Convert flat [x1,y1,x2,y2,...] to "x1,y1 x2,y2 ..."
const pointPairs = [];
for (let i = 0; i < pts.length - 1; i += 2) {
pointPairs.push(`pts[i],pts[i + 1]`);
}
const pointsStr = pointPairs.join(' ');
if (overlay.kind === 'ROI') {
// ROI: blue polygon with semi-transparent fill
elements.push(
`<polygon points="pointsStr" fill="rgba(0,150,255,0.15)" stroke="rgba(0,150,255,0.6)" stroke-width="2"/>`
);
// Label at first vertex
const name = overlay.name || overlay.displayName || '';
if (name && pts.length >= 2) {
const lx = pts[0];
const ly = Math.max(pts[1] - labelHeight, 0);
const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
elements.push(
`<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(0,150,255,0.8)" rx="2"/>`
);
elements.push(
`<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
);
}
} else if (overlay.kind === 'TripWire') {
// TripWire with 4-point structure support
// points = [x1,y1, x2,y2, x3,y3, x4,y4]
// p1-p2: main tripwire
// p3-p4: A-B detection line with directional arrows
// If only 2-point format provided, auto-generate perpendicular p3-p4
let p1, p2, p3, p4;
if (pts.length >= 8) {
// 4-point structure (explicit format)
p1 = {x: pts[0], y: pts[1]};
p2 = {x: pts[2], y: pts[3]};
p3 = {x: pts[4], y: pts[5]};
p4 = {x: pts[6], y: pts[7]};
} else if (pts.length >= 4) {
// 2-point format: auto-generate perpendicular A-B points
p1 = {x: pts[0], y: pts[1]};
p2 = {x: pts[2], y: pts[3]};
// Calculate perpendicular distance for A-B points
const dx = p2.x - p1.x;
const dy = p2.y - p1.y;
const len = Math.sqrt(dx * dx + dy * dy) || 1;
// Unit perpendicular vector (rotated 90 degrees counter-clockwise)
const perpX = -dy / len;
const perpY = dx / len;
// Distance from the main line to A-B points
const perpDist = 30;
// Generate A-B points perpendicular to the main tripwire
p3 = {
x: p1.x + perpX * perpDist,
y: p1.y + perpY * perpDist
};
p4 = {
x: p2.x + perpX * perpDist,
y: p2.y + perpY * perpDist
};
} else {
// Not enough points, skip rendering
p1 = p2 = p3 = p4 = null;
}
if (p1 && p2 && p3 && p4) {
// Draw main tripwire line (p1-p2) - dashed, no arrow
elements.push(
`<line x1="p1.x" y1="p1.y" x2="p2.x" y2="p2.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
);
// Draw A-B detection line (p3-p4) - dashed with direction arrow
elements.push(
`<line x1="p3.x" y1="p3.y" x2="p4.x" y2="p4.y" stroke="rgba(255,165,0,0.8)" stroke-width="2" stroke-dasharray="8,4"/>`
);
// Calculate direction vector (p3 -> p4)
const dx = p4.x - p3.x;
const dy = p4.y - p3.y;
const len = Math.sqrt(dx * dx + dy * dy) || 1;
const unitX = dx / len;
const unitY = dy / len;
const dir = overlay.direction || 'TwoWay';
const arrowSize = 12;
// Draw directional arrows
if (dir === 'Forward') {
// Arrow at p4, pointing toward p4 (green)
drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(76,175,80,0.8)');
} else if (dir === 'Backward') {
// Arrow at p3, pointing toward p3 (red)
drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(244,67,54,0.8)');
} else if (dir === 'TwoWay') {
// Arrows at both p3 and p4 (blue)
drawArrowAtPoint(elements, p3, -unitX, -unitY, arrowSize, 'rgba(33,150,243,0.8)');
drawArrowAtPoint(elements, p4, unitX, unitY, arrowSize, 'rgba(33,150,243,0.8)');
}
// Add region labels 'A' and 'B'
const labelFontSize = 12;
const labelOffset = 8;
// Label 'A' near p3
elements.push(
`<text x="p3.x - labelOffset" y="p3.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">A</text>`
);
// Label 'B' near p4
elements.push(
`<text x="p4.x + labelOffset" y="p4.y - labelOffset" font-family="sans-serif" font-size="labelFontSize" fill="rgba(255,165,0,0.8)" font-weight="bold">B</text>`
);
}
// Label at first vertex (applies to both formats)
const name = overlay.name || overlay.displayName || '';
if (name && pts.length >= 2) {
const lx = pts[0];
const ly = Math.max(pts[1] - labelHeight, 0);
const labelWidth = name.length * (fontSize * 0.65) + padding * 2;
elements.push(
`<rect x="lx" y="ly" width="labelWidth" height="labelHeight" fill="rgba(255,165,0,0.8)" rx="2"/>`
);
elements.push(
`<text x="lx + padding" y="ly + fontSize + padding - 2" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(name)</text>`
);
}
}
}
// Draw crossing vectors perpendicular to tripwire
for (const crossing of crossings) {
const { tripwireId, prevPos, currPos, dotProduct, direction } = crossing;
if (!prevPos || !currPos) continue;
const [prevX, prevY] = prevPos;
const [currX, currY] = currPos;
// Crossing point is midpoint between previous and current position
const crossX = (prevX + currX) / 2;
const crossY = (prevY + currY) / 2;
// Determine color based on crossing type
let arrowColor, dotColor;
if (direction === 'Forward') {
// Forward crossing (dot > 0): green
arrowColor = 'rgba(76,175,80,0.8)';
dotColor = '#4caf50';
} else if (direction === 'Backward') {
// Backward crossing (dot < 0): red
arrowColor = 'rgba(244,67,54,0.8)';
dotColor = '#f44336';
} else {
// TwoWay crossing: blue
arrowColor = 'rgba(33,150,243,0.8)';
dotColor = '#2196f3';
}
// Find the associated tripwire to get its direction
let tripwirePerpX = 1; // default perpendicular
let tripwirePerpY = 0;
for (const overlay of overlays) {
if (overlay.kind === 'TripWire' && overlay.name === tripwireId) {
// Get tripwire direction vector
const pts = Array.isArray(overlay.points[0]) ? overlay.points.flat() : overlay.points;
if (pts.length >= 4) {
const midIdx = Math.floor(pts.length / 4);
const x1 = pts[midIdx * 2];
const y1 = pts[midIdx * 2 + 1];
const x2 = pts[(midIdx + 1) * 2];
const y2 = pts[(midIdx + 1) * 2 + 1];
const dx = x2 - x1;
const dy = y2 - y1;
const len = Math.sqrt(dx * dx + dy * dy) || 1;
// Perpendicular vector (rotated 90 degrees)
tripwirePerpX = -dy / len;
tripwirePerpY = dx / len;
}
break;
}
}
const arrowSize = 12;
if (direction === 'TwoWay') {
// TwoWay: draw arrows in both directions perpendicular to tripwire
// Forward arrow
const fx1 = crossX + tripwirePerpX * arrowSize;
const fy1 = crossY + tripwirePerpY * arrowSize;
elements.push(
`<line x1="crossX" y1="crossY" x2="fx1" y2="fy1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
);
const perp2X = tripwirePerpY * arrowSize * 0.6;
const perp2Y = -tripwirePerpX * arrowSize * 0.6;
elements.push(
`<polygon points="fx1,fy1 fx1 - tripwirePerpX * arrowSize - perp2X,fy1 - tripwirePerpY * arrowSize - perp2Y fx1 - tripwirePerpX * arrowSize + perp2X,fy1 - tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
);
// Backward arrow
const bx1 = crossX - tripwirePerpX * arrowSize;
const by1 = crossY - tripwirePerpY * arrowSize;
elements.push(
`<line x1="crossX" y1="crossY" x2="bx1" y2="by1" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
);
elements.push(
`<polygon points="bx1,by1 bx1 + tripwirePerpX * arrowSize - perp2X,by1 + tripwirePerpY * arrowSize - perp2Y bx1 + tripwirePerpX * arrowSize + perp2X,by1 + tripwirePerpY * arrowSize + perp2Y" fill="arrowColor"/>`
);
// Center circle
elements.push(
`<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
);
// Label
elements.push(
`<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">⟷ TwoWay</text>`
);
} else {
// Forward or Backward: single arrow perpendicular to tripwire
const sign = direction === 'Forward' ? 1 : -1;
const arrowX = crossX + tripwirePerpX * arrowSize * sign;
const arrowY = crossY + tripwirePerpY * arrowSize * sign;
// Arrow shaft
elements.push(
`<line x1="crossX" y1="crossY" x2="arrowX" y2="arrowY" stroke="arrowColor" stroke-width="2.5" opacity="0.8"/>`
);
// Arrow head
const perpBaseX = tripwirePerpY * arrowSize * 0.6;
const perpBaseY = -tripwirePerpX * arrowSize * 0.6;
elements.push(
`<polygon points="arrowX,arrowY arrowX - tripwirePerpX * arrowSize * sign - perpBaseX,arrowY - tripwirePerpY * arrowSize * sign - perpBaseY arrowX - tripwirePerpX * arrowSize * sign + perpBaseX,arrowY - tripwirePerpY * arrowSize * sign + perpBaseY" fill="arrowColor"/>`
);
// Center circle
elements.push(
`<circle cx="crossX" cy="crossY" r="4" fill="dotColor" stroke="white" stroke-width="1.5"/>`
);
// Label
const dirLabel = direction === 'Forward' ? '✓ Forward' : '✗ Backward';
elements.push(
`<text x="crossX + 10" y="crossY - 5" font-size="11" fill="dotColor" font-weight="bold">dirLabel</text>`
);
}
}
return `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="height">elements.join('')</svg>`;
}
/**
* Build SVG for text lines displayed in a footer area.
* Returns { svg: string, height: number }.
*
* The SVG is positioned to sit at the bottom of the extended canvas,
* starting at y=imageHeight.
*/
export function buildTextFooterSvg(textLines, width, imageHeight) {
const fontSize = 16;
const lineHeight = fontSize + 8;
const paddingTop = 10;
const paddingBottom = 10;
const paddingLeft = 12;
const totalHeight = paddingTop + textLines.length * lineHeight + paddingBottom;
const elements = [];
// Dark background
elements.push(
`<rect x="0" y="imageHeight" width="width" height="totalHeight" fill="rgba(0,0,0,0.75)"/>`
);
// Text lines
for (let i = 0; i < textLines.length; i++) {
const ty = imageHeight + paddingTop + (i + 1) * lineHeight - 4;
elements.push(
`<text x="paddingLeft" y="ty" font-family="sans-serif" font-size="fontSize" fill="white">escapeXml(textLines[i])</text>`
);
}
const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="width" height="imageHeight + totalHeight">elements.join('')</svg>`;
return { svg, height: totalHeight };
}
/**
* Parse named CLI flags from argv.
* Returns { positional: string[], flags: Record<string, string> }.
*/
function parseArgs(argv) {
const positional = [];
const flags = {};
let i = 0;
while (i < argv.length) {
if (argv[i].startsWith('--') && i + 1 < argv.length) {
const key = argv[i].slice(2);
flags[key] = argv[i + 1];
i += 2;
} else {
positional.push(argv[i]);
i += 1;
}
}
return { positional, flags };
}
async function main() {
const { positional, flags } = parseArgs(process.argv.slice(2));
const inputImage = positional[0];
let detectionsArg = positional[1];
let outputPath = positional[2];
if (!inputImage || !detectionsArg) {
console.error('Usage: node visualize.mjs <input-image> <detections-json | -> [output-path] [--overlays \'<json>\'] [--text \'<json>\'] [--crossings \'<json>\']');
console.error('');
console.error(' <detections-json> JSON array of detections (parsedValue format), or "-" to read from stdin');
console.error(' [output-path] Optional output path. Default: <input>_detection.<ext>');
console.error(' --overlays JSON array of ROI/Tripwire overlay objects');
console.error(' --text JSON array of text lines to display as footer');
console.error(' --crossings JSON array of tripwire crossing events with motion vectors');
process.exit(1);
}
// Resolve input image
const resolvedInput = path.resolve(inputImage);
if (!fs.existsSync(resolvedInput)) {
console.error(`Input image not found: resolvedInput`);
process.exit(1);
}
// Read detections JSON
let detectionsJson;
if (detectionsArg === '-') {
detectionsJson = await readStdin();
} else {
detectionsJson = detectionsArg;
}
let detections;
try {
detections = JSON.parse(detectionsJson);
} catch (err) {
console.error(`Failed to parse detections JSON: err.message`);
process.exit(1);
}
if (!Array.isArray(detections)) {
console.error('Detections must be a JSON array');
process.exit(1);
}
// Parse --overlays
let overlays = [];
if (flags.overlays) {
try {
overlays = JSON.parse(flags.overlays);
if (!Array.isArray(overlays)) {
console.error('--overlays must be a JSON array');
process.exit(1);
}
} catch (err) {
console.error(`Failed to parse --overlays JSON: err.message`);
process.exit(1);
}
}
// Parse --text
let textLines = [];
if (flags.text) {
try {
textLines = JSON.parse(flags.text);
if (!Array.isArray(textLines)) {
console.error('--text must be a JSON array');
process.exit(1);
}
} catch (err) {
console.error(`Failed to parse --text JSON: err.message`);
process.exit(1);
}
}
// Parse --crossings
let crossings = [];
if (flags.crossings) {
try {
crossings = JSON.parse(flags.crossings);
if (!Array.isArray(crossings)) {
console.error('--crossings must be a JSON array');
process.exit(1);
}
} catch (err) {
console.error(`Failed to parse --crossings JSON: err.message`);
process.exit(1);
}
}
// Validate: must have something to draw
if (detections.length === 0 && overlays.length === 0 && textLines.length === 0) {
console.error('Nothing to draw: detections, overlays, and text are all empty');
process.exit(1);
}
// Default output path
if (!outputPath) {
const ext = path.extname(inputImage);
const base = path.basename(inputImage, ext);
const dir = path.dirname(resolvedInput);
outputPath = path.join(dir, `base_detectionext`);
} else {
outputPath = path.resolve(outputPath);
}
// Read image metadata
const image = sharp(resolvedInput);
const metadata = await image.metadata();
const { width, height } = metadata;
// Build composite layers
const composites = [];
// Layer 1: input overlays (ROI/Tripwire) — drawn first (bottom layer)
if (overlays.length > 0) {
const overlaySvg = buildOverlaysSvg(overlays, width, height, crossings);
composites.push({ input: Buffer.from(overlaySvg), top: 0, left: 0 });
}
// Layer 2: detection bboxes — drawn on top of overlays
if (detections.length > 0) {
const detSvg = buildDetectionSvg(detections, width, height);
composites.push({ input: Buffer.from(detSvg), top: 0, left: 0 });
}
// Layer 3: text footer — extend canvas and draw at bottom
let pipeline = image;
if (textLines.length > 0) {
const { svg: textSvg, height: textHeight } = buildTextFooterSvg(textLines, width, height);
pipeline = pipeline.extend({
bottom: textHeight,
background: { r: 0, g: 0, b: 0, alpha: 1 },
});
composites.push({ input: Buffer.from(textSvg), top: 0, left: 0 });
}
// Composite and save
await pipeline
.composite(composites)
.toFile(outputPath);
console.log(outputPath);
}
import { fileURLToPath } from 'url';
if (process.argv[1] === fileURLToPath(import.meta.url)) {
main();
}
FILE:scripts/show-grid.mjs
#!/usr/bin/env node
import fs from 'fs';
import path from 'path';
import sharp from 'sharp';
import { fileURLToPath } from 'url';
/**
* Escape XML special characters for safe SVG embedding.
*/
function escapeXml(str) {
return String(str)
.replace(/&/g, '&')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/"/g, '"')
.replace(/'/g, ''');
}
/**
* Compute adaptive grid dimensions so that cells are approximately square.
*
* @param {number} imageWidth
* @param {number} imageHeight
* @returns {{ cols: number, rows: number }}
*/
export function computeGridSize(imageWidth, imageHeight) {
// Target ~30-42 intersection points with roughly square cells.
// We pick the short side = 4 segments and scale the long side proportionally.
const minSegments = 4;
const ratio = imageWidth / imageHeight;
let cols, rows;
if (ratio >= 1) {
// Landscape or square
rows = minSegments;
cols = Math.round(rows * ratio);
// Clamp cols to a reasonable range
cols = Math.max(minSegments, Math.min(cols, 12));
} else {
// Portrait
cols = minSegments;
rows = Math.round(cols / ratio);
rows = Math.max(minSegments, Math.min(rows, 12));
}
return { cols, rows };
}
/**
* Generate column labels: A, B, C, ..., Z, AA, AB, ...
*/
export function generateColLabels(count) {
const labels = [];
for (let i = 0; i < count; i++) {
let label = '';
let n = i;
do {
label = String.fromCharCode(65 + (n % 26)) + label;
n = Math.floor(n / 26) - 1;
} while (n >= 0);
labels.push(label);
}
return labels;
}
/**
* Generate row labels: 0, 1, 2, ...
*/
export function generateRowLabels(count) {
return Array.from({ length: count }, (_, i) => String(i));
}
/**
* Build an SVG overlay with grid lines, intersection dots, and labels.
*
* The SVG covers the full extended canvas (margin + image).
*
* @param {object} params
* @param {number} params.imageWidth
* @param {number} params.imageHeight
* @param {number} params.cols - number of column segments (cols+1 vertical lines)
* @param {number} params.rows - number of row segments (rows+1 horizontal lines)
* @param {number} params.marginTop
* @param {number} params.marginLeft
* @param {number} params.marginBottom
* @param {number} params.marginRight
* @returns {string} SVG string
*/
export function buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight }) {
const totalWidth = marginLeft + imageWidth + marginRight;
const totalHeight = marginTop + imageHeight + marginBottom;
const gridWidth = imageWidth / cols;
const gridHeight = imageHeight / rows;
const colLabels = generateColLabels(cols + 1);
const rowLabels = generateRowLabels(rows + 1);
const elements = [];
const fontSize = 16;
const dotR = 5;
// Grid lines — vertical
for (let c = 0; c <= cols; c++) {
const x = marginLeft + c * gridWidth;
elements.push(
`<line x1="x" y1="marginTop" x2="x" y2="totalHeight" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
);
}
// Grid lines — horizontal
for (let r = 0; r <= rows; r++) {
const y = marginTop + r * gridHeight;
elements.push(
`<line x1="marginLeft" y1="y" x2="totalWidth" y2="y" stroke="rgba(255,255,255,0.55)" stroke-width="2"/>`
);
}
// Intersection dots
for (let c = 0; c <= cols; c++) {
for (let r = 0; r <= rows; r++) {
const cx = marginLeft + c * gridWidth;
const cy = marginTop + r * gridHeight;
elements.push(
`<circle cx="cx" cy="cy" r="dotR" fill="white" stroke="rgba(0,0,0,0.7)" stroke-width="1.5"/>`
);
}
}
// Column labels — along the top margin
for (let c = 0; c <= cols; c++) {
const x = marginLeft + c * gridWidth;
const textWidth = colLabels[c].length * fontSize * 0.65;
const bgWidth = textWidth + 10;
const bgX = x - bgWidth / 2;
const bgY = 2;
const bgH = fontSize + 10;
elements.push(
`<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
);
elements.push(
`<text x="x" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(colLabels[c])</text>`
);
}
// Row labels — along the left margin
for (let r = 0; r <= rows; r++) {
const y = marginTop + r * gridHeight;
const label = rowLabels[r];
const textWidth = label.length * fontSize * 0.65;
const bgWidth = textWidth + 10;
const bgH = fontSize + 10;
const bgX = 2;
const bgY = y - bgH / 2;
elements.push(
`<rect x="bgX" y="bgY" width="bgWidth" height="bgH" fill="rgba(0,0,0,0.75)" rx="3"/>`
);
elements.push(
`<text x="bgX + bgWidth / 2" y="bgY + fontSize + 3" font-family="sans-serif" font-size="fontSize" font-weight="bold" fill="white" text-anchor="middle">escapeXml(label)</text>`
);
}
return `<svg xmlns="http://www.w3.org/2000/svg" width="totalWidth" height="totalHeight">elements.join('')</svg>`;
}
/**
* Parse named CLI flags from argv.
*/
function parseArgs(argv) {
const positional = [];
const flags = {};
let i = 0;
while (i < argv.length) {
if (argv[i].startsWith('--') && i + 1 < argv.length) {
const key = argv[i].slice(2);
flags[key] = argv[i + 1];
i += 2;
} else {
positional.push(argv[i]);
i += 1;
}
}
return { positional, flags };
}
async function main() {
const { positional, flags } = parseArgs(process.argv.slice(2));
const inputImage = positional[0];
let outputPath = positional[1];
if (!inputImage) {
console.error('Usage: node show-grid.mjs <input-image> [output-path] [--cols N] [--rows N]');
console.error('');
console.error(' Generates a grid reference image for specifying ROI/Tripwire coordinates.');
console.error(' --cols Number of column segments (default: auto based on aspect ratio)');
console.error(' --rows Number of row segments (default: auto based on aspect ratio)');
process.exit(1);
}
const resolvedInput = path.resolve(inputImage);
if (!fs.existsSync(resolvedInput)) {
console.error(`Input image not found: resolvedInput`);
process.exit(1);
}
// Read image metadata
const image = sharp(resolvedInput);
const metadata = await image.metadata();
const { width: imageWidth, height: imageHeight } = metadata;
// Compute grid size
let { cols, rows } = computeGridSize(imageWidth, imageHeight);
if (flags.cols) cols = parseInt(flags.cols, 10);
if (flags.rows) rows = parseInt(flags.rows, 10);
if (cols < 1 || rows < 1 || isNaN(cols) || isNaN(rows)) {
console.error('--cols and --rows must be positive integers');
process.exit(1);
}
const marginTop = 32;
const marginLeft = 32;
const marginBottom = 18;
const marginRight = 18;
// Build SVG
const svg = buildGridSvg({ imageWidth, imageHeight, cols, rows, marginTop, marginLeft, marginBottom, marginRight });
// Default output path
if (!outputPath) {
const ext = path.extname(inputImage);
const base = path.basename(inputImage, ext);
const dir = path.dirname(resolvedInput);
outputPath = path.join(dir, `base_gridext`);
} else {
outputPath = path.resolve(outputPath);
}
// Extend canvas for margins, composite grid SVG
await image
.extend({
top: marginTop,
left: marginLeft,
bottom: marginBottom,
right: marginRight,
background: { r: 30, g: 30, b: 30, alpha: 1 },
})
.composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
.toFile(outputPath);
// Output metadata JSON for Agent consumption
const gridWidth = imageWidth / cols;
const gridHeight = imageHeight / rows;
const colLabels = generateColLabels(cols + 1);
const rowLabels = generateRowLabels(rows + 1);
const result = {
path: outputPath,
cols,
rows,
colLabels,
rowLabels,
gridWidth: Math.round(gridWidth * 100) / 100,
gridHeight: Math.round(gridHeight * 100) / 100,
marginLeft,
marginTop,
imageWidth,
imageHeight,
};
console.log(JSON.stringify(result));
}
if (process.argv[1] === fileURLToPath(import.meta.url)) {
main();
}
FILE:scripts/intent-invoke.mjs
#!/usr/bin/env node
import { querySkills } from './query.mjs';
import { callMultimodal } from './multimodal.mjs';
import { runSkill } from './invoke.mjs';
import { getWorkspaceSkills } from './workspace.mjs';
const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;
const MAX_SKILL_RETRIES = 3;
/**
* Determine if we should use the skill based on confidence threshold.
*/
export function shouldUseSkill(skill, threshold = DEFAULT_CONFIDENCE_THRESHOLD) {
return !!(skill && skill.epId && typeof skill.confidence === 'number' && skill.confidence >= threshold);
}
/**
* Format the final result consistently.
*/
export function formatResult(mode, skill, detections = []) {
return {
success: true,
mode,
epId: skill?.epId || null,
skillName: skill?.name || null,
confidence: skill?.confidence || null,
count: detections.length,
detections
};
}
/**
* Extract detections array from a runSkill() result.
*/
function extractDetections(result) {
if (result?.未检出) return [];
const outputs = result?.result?.outputs;
if (!outputs || outputs.length === 0) return [];
const output = outputs[0];
if (output?.parsedValue && Array.isArray(output.parsedValue)) {
return output.parsedValue;
}
return [];
}
/**
* Main orchestration function:
* 1. Query router for skills matching intent
* 2. Try top skills one by one (up to MAX_SKILL_RETRIES)
* 3. If all skills fail, fallback to multimodal
*/
export async function intentInvoke(intent, imagePath, options = {}) {
const { threshold = DEFAULT_CONFIDENCE_THRESHOLD, topK = 5 } = options;
// Step 1: Query for matching skills
let skills;
try {
skills = await querySkills(intent, { topK });
} catch (err) {
// Query failed, will fallback to multimodal
skills = [];
}
// Step 2: Try top skills one by one
const candidates = skills.filter(s => shouldUseSkill(s, threshold)).slice(0, MAX_SKILL_RETRIES);
for (const skill of candidates) {
try {
const { result } = await runSkill(skill.epId, { input0: { image: imagePath } }, { autoROI: true });
const detections = extractDetections(result);
return formatResult('skill', skill, detections);
} catch (err) {
console.error(`Skill skill.epId (skill.name) failed: err.message`);
// Continue to next skill
}
}
// Step 3: Public skills all failed — search private workspace
try {
const privateSkills = await getWorkspaceSkills();
if (privateSkills.length > 0) {
return {
success: true,
mode: 'workspace-search',
epId: null,
skillName: null,
confidence: null,
count: 0,
detections: [],
privateSkills: privateSkills.map(s => ({
epId: s.epId,
displayName: s.displayName,
description: s.description,
})),
hint: '公共技能无匹配。请从 privateSkills 列表中选择最匹配的技能,使用 invoke.mjs 调用。如无合适技能,可调用 multimodal.mjs 回退。',
};
}
} catch (err) {
console.error(`Private workspace search failed: err.message`);
// Continue to multimodal fallback
}
// Step 4: Fallback to multimodal
try {
const detections = await callMultimodal(intent, imagePath);
return formatResult('multimodal', null, detections);
} catch (err) {
return {
success: false,
error: err.message,
mode: 'multimodal',
epId: null,
detections: []
};
}
}
/**
* CLI entry point.
*/
async function main() {
const intent = process.argv[2];
const imagePath = process.argv[3];
const threshold = parseFloat(process.argv[4]) || DEFAULT_CONFIDENCE_THRESHOLD;
if (!intent) {
console.error('Usage: node intent-invoke.mjs <intent> [image-path] [threshold]');
console.error('Example: node intent-invoke.mjs "detect people falling" photo.jpg 0.8');
console.error('');
console.error('Parameters:');
console.error(' intent - Description of what to detect (required)');
console.error(' image-path - Path to image file (optional)');
console.error(' threshold - Confidence threshold 0.0-1.0 (default: 0.7)');
process.exit(1);
}
try {
const result = await intentInvoke(intent, imagePath, { threshold });
console.log(JSON.stringify(result, null, 2));
process.exit(result.success ? 0 : 1);
} catch (err) {
console.error(JSON.stringify({
success: false,
error: err.message
}, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/workspace.mjs
#!/usr/bin/env node
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
import { httpsRequest, getApiKey, workspacesGetUrl, workspaceSkillsGetUrl } from './utils.mjs';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const CACHE_DIR = path.join(__dirname, '..', '.cache');
const WORKSPACE_CACHE_FILE = path.join(CACHE_DIR, 'workspace-cache.json');
const SKILLS_CACHE_FILE = path.join(CACHE_DIR, 'skills-cache.json');
const SKILLS_CACHE_TTL = 60 * 60 * 1000; // 1 hour in ms
// ── Helpers ──────────────────────────────────────────────────────────
function ensureCacheDir() {
if (!fs.existsSync(CACHE_DIR)) {
fs.mkdirSync(CACHE_DIR, { recursive: true });
}
}
function readCache(filePath, ttl) {
if (!fs.existsSync(filePath)) return null;
try {
const data = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
if (ttl && Date.now() - data.timestamp > ttl) return null;
return data;
} catch {
return null;
}
}
function writeCache(filePath, data) {
ensureCacheDir();
fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf-8');
}
function buildEpId(workspaceId, localName) {
// localName format: c-sk-XXXXX → extract XXXXX part
const suffix = localName.replace(/^c-sk-/, '');
return `ep-workspaceId-suffix`;
}
// ── API Functions ────────────────────────────────────────────────────
/**
* Get the default workspace ID. Uses long-lived cache (tied to API key).
*/
export async function getDefaultWorkspaceId() {
const cached = readCache(WORKSPACE_CACHE_FILE);
if (cached?.workspaceId) return cached.workspaceId;
const apiKey = getApiKey();
const url = workspacesGetUrl();
const { statusCode, body } = await httpsRequest(url, {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
},
});
if (statusCode !== 200) {
throw new Error(`workspaces/get failed: HTTP statusCode - body`);
}
const resp = JSON.parse(body);
if (!resp.success || !resp.page?.result) {
throw new Error(`workspaces/get unexpected response: body`);
}
const defaultWorkspace = resp.page.result.find(ws => ws.type === 'default');
if (!defaultWorkspace) {
throw new Error('No default workspace found');
}
const workspaceId = defaultWorkspace.id;
writeCache(WORKSPACE_CACHE_FILE, { workspaceId, timestamp: Date.now() });
return workspaceId;
}
/**
* Get all released SkillApi skills from the default workspace.
* Uses file cache with 1-hour TTL. Each skill includes pre-constructed epId.
*/
export async function getWorkspaceSkills() {
const cached = readCache(SKILLS_CACHE_FILE, SKILLS_CACHE_TTL);
if (cached?.skills) return cached.skills;
const workspaceId = await getDefaultWorkspaceId();
const apiKey = getApiKey();
const url = workspaceSkillsGetUrl(workspaceId);
const allSkills = [];
let pageNo = 1;
const pageSize = 50;
// Paginate to fetch all skills
while (true) {
const { statusCode, body } = await httpsRequest(url, {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
workspaceID: workspaceId,
orderBy: 'last_published_at',
order: 'desc',
pageNo,
pageSize,
publicationKinds: ['SkillApi'],
status: 'Released',
}),
});
if (statusCode !== 200) {
throw new Error(`skills/get failed: HTTP statusCode - body`);
}
const resp = JSON.parse(body);
if (!resp.success || !resp.page?.result) {
throw new Error(`skills/get unexpected response: body`);
}
const skills = resp.page.result.map(skill => ({
epId: buildEpId(workspaceId, skill.localName),
displayName: skill.displayName,
description: skill.description || '',
localName: skill.localName,
workspaceId,
}));
allSkills.push(...skills);
const totalCount = resp.page.totalCount || 0;
if (allSkills.length >= totalCount || skills.length < pageSize) break;
pageNo++;
}
writeCache(SKILLS_CACHE_FILE, { workspaceId, skills: allSkills, timestamp: Date.now() });
return allSkills;
}
// ── CLI ──────────────────────────────────────────────────────────────
async function main() {
const command = process.argv[2];
if (command === 'list-skills') {
try {
const skills = await getWorkspaceSkills();
if (skills.length === 0) {
console.log('No private workspace skills found.');
return;
}
console.log(`Private workspace skills (skills.length total):\n`);
for (const skill of skills) {
const desc = skill.description ? ` - skill.description` : '';
console.log(` skill.epId skill.displayNamedesc`);
}
} catch (err) {
console.error(`Error: err.message`);
process.exit(1);
}
} else {
console.log('Usage: node workspace.mjs list-skills');
console.log('');
console.log('Commands:');
console.log(' list-skills List all released private workspace skills');
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/query.mjs
#!/usr/bin/env node
import { httpsRequest, getApiKey, routerQueryUrl } from './utils.mjs';
/**
* Parse and validate the router query response.
* Returns array of skills sorted by confidence (highest first).
*/
export function parseQueryResponse(response) {
// Handle internal API response format
if (response?.data?.candidates && Array.isArray(response.data.candidates)) {
return response.data.candidates
.filter(s => s.skillId && typeof s.score === 'number')
.map(s => ({
epId: s.skillId,
name: s.displayName,
description: s.description,
confidence: s.score
}))
.sort((a, b) => b.confidence - a.confidence);
}
// Fallback to empty array
return [];
}
/**
* Query skills by intent text.
* @param {string} intent - User's intent text
* @param {object} options - Optional parameters
* @param {number} options.topK - Maximum number of results (default: 5)
* @returns {Promise<Array>} Array of matching skills with confidence scores
*/
export async function querySkills(intent, options = {}) {
const apiKey = getApiKey();
const { topK = 5 } = options;
const requestBody = JSON.stringify({
query: intent,
topK
});
const response = await httpsRequest(routerQueryUrl(), {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
'Accept': 'application/json',
},
body: requestBody,
});
if (response.statusCode !== 200) {
throw new Error(`Query failed with status response.statusCode: response.body`);
}
let result;
try {
result = JSON.parse(response.body);
} catch (err) {
throw new Error(`Failed to parse query response: err.message`);
}
return parseQueryResponse(result);
}
/**
* CLI entry point for testing query functionality.
*/
async function main() {
const intent = process.argv[2];
const topK = parseInt(process.argv[3], 10) || 5;
if (!intent) {
console.error('Usage: node query.mjs <intent> [topK]');
console.error('Example: node query.mjs "detect person falling" 3');
process.exit(1);
}
try {
const skills = await querySkills(intent, { topK });
console.log(JSON.stringify({
success: true,
intent,
count: skills.length,
skills
}, null, 2));
} catch (err) {
console.error(JSON.stringify({
success: false,
error: err.message
}, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/types.mjs
/**
* types.mjs — Yijian type constants, classification, serialization & deserialization.
*
* Centralizes all type-related logic previously scattered across invoke.mjs
* and intent-invoke.mjs.
*/
import fs from 'fs';
import path from 'path';
import crypto from 'crypto';
// ---------------------------------------------------------------------------
// 1. Type constants & classification
// ---------------------------------------------------------------------------
/** Basic scalar types — value is a plain string. */
export const BASIC_TYPES = ['String', 'TemplateString', 'Integer', 'Double', 'Boolean', 'Time'];
/** Complex object types — value needs JSON.stringify. */
export const COMPLEX_TYPES = ['Image', 'Detection', 'TrackDetection', 'Attribute', 'ROI', 'Tripwire'];
/** Inner types that can be wrapped in Array<T>. */
export const ARRAY_INNER_TYPES = [...BASIC_TYPES, ...COMPLEX_TYPES, 'Target'];
const _basicSet = new Set(BASIC_TYPES);
const _complexSet = new Set(COMPLEX_TYPES);
const _arrayInnerSet = new Set(ARRAY_INNER_TYPES);
/** Types whose output can be visualized (draw bbox / polygon / polyline). */
const _visualizableTypes = new Set(['Detection', 'TrackDetection', 'ROI', 'Tripwire']);
/** Types that represent input-side visual overlays (ROI, Tripwire). */
const _inputVisualizableTypes = new Set(['ROI', 'Tripwire']);
export function isBasicType(type) {
return _basicSet.has(type);
}
export function isComplexType(type) {
return _complexSet.has(type);
}
/**
* Returns true if `type` is `Array<...>`.
*/
export function isArrayType(type) {
return typeof type === 'string' && type.startsWith('Array<') && type.endsWith('>');
}
/**
* Unwrap `Array<T>` → `T`. Returns `null` for non-array types.
*/
export function unwrapArrayType(type) {
if (!isArrayType(type)) return null;
return type.slice(6, -1); // strip "Array<" and ">"
}
/**
* Returns true if the type is a visual/complex type (Image, Detection, etc.),
* including Array-wrapped forms.
*/
export function isVisualType(type) {
if (_complexSet.has(type)) return true;
const inner = unwrapArrayType(type);
return inner !== null && _complexSet.has(inner);
}
/**
* Returns true if the type supports output visualization
* (Detection, TrackDetection, ROI, Tripwire, and their Array<> forms).
*/
export function hasVisualization(type) {
if (_visualizableTypes.has(type)) return true;
const inner = unwrapArrayType(type);
return inner !== null && _visualizableTypes.has(inner);
}
/**
* Returns true if the type is an input-side visualizable overlay
* (ROI, Tripwire, and their Array<> forms).
*/
export function isInputVisualizableType(type) {
if (_inputVisualizableTypes.has(type)) return true;
const inner = unwrapArrayType(type);
return inner !== null && _inputVisualizableTypes.has(inner);
}
// ---------------------------------------------------------------------------
// 2. Field schema definitions
// ---------------------------------------------------------------------------
export const IMAGE_FIELDS = {
imageId: { type: 'String', required: false },
imageData: { type: 'String', required: false },
sourceId: { type: 'String', required: false },
imageWidth: { type: 'Integer', required: false },
imageHeight: { type: 'Integer', required: false },
timestamp: { type: 'Integer', required: false },
};
export const DETECTION_FIELDS = {
image_id: { type: 'String', required: false },
source_id: { type: 'String', required: false },
image_url: { type: 'String', required: false },
description: { type: 'String', required: false },
answers: { type: 'Array<String>', required: false },
image_base64: { type: 'Array<String>', required: false },
predictions: { type: 'Array<Object>', required: false },
roi_ids: { type: 'Array<String>', required: false },
answer: { type: 'String', required: false },
};
/** TrackDetection shares the same schema as Detection. */
export const TRACK_DETECTION_FIELDS = DETECTION_FIELDS;
export const ATTRIBUTE_FIELDS = {
image_id: { type: 'String', required: false },
image_url: { type: 'String', required: false },
description: { type: 'String', required: false },
answer: { type: 'Array<String>', required: false },
image_base64: { type: 'Array<String>', required: false },
predictions: { type: 'Array<Object>', required: false },
roiIds: { type: 'Array<String>', required: false },
};
export const ROI_FIELDS = {
id: { type: 'String', required: false },
name: { type: 'String', required: false },
displayName: { type: 'String', required: false },
points: { type: 'Array<Number>', required: true },
iou: { type: 'Float', required: false, default: 0.0 },
target_ratio: { type: 'Float', required: false, default: 0.5 },
kind: { type: 'String', required: true, default: 'ROI' },
interested: { type: 'Boolean', required: false, default: true },
direction: { type: 'String', required: false },
visuals: { type: 'Object', required: false },
};
export const TRIPWIRE_FIELDS = {
id: { type: 'String', required: false },
name: { type: 'String', required: false },
displayName: { type: 'String', required: false },
points: { type: 'Array<Number>', required: true },
iou: { type: 'Float', required: false, default: 0.0 },
target_ratio: { type: 'Float', required: false, default: 0.5 },
kind: { type: 'String', required: true, default: 'TripWire' },
interested: { type: 'Boolean', required: false, default: true },
direction: { type: 'String', required: true, default: 'TwoWay' },
visuals: { type: 'Object', required: false },
};
/** Lookup field schema by type name. */
export const FIELD_SCHEMAS = {
Image: IMAGE_FIELDS,
Detection: DETECTION_FIELDS,
TrackDetection: TRACK_DETECTION_FIELDS,
Attribute: ATTRIBUTE_FIELDS,
ROI: ROI_FIELDS,
Tripwire: TRIPWIRE_FIELDS,
};
// ---------------------------------------------------------------------------
// 3. Serialization — buildValue(type, userValue)
// ---------------------------------------------------------------------------
/**
* Simple heuristic to check if a string looks like base64-encoded data.
*/
function isBase64(str) {
if (typeof str !== 'string' || str.length < 100) return false;
return /^[A-Za-z0-9+/\n]+=*$/.test(str.trim());
}
/**
* Parse image dimensions from buffer (supports JPEG, PNG and WEBP).
*/
export function getImageDimensions(buffer) {
// PNG: bytes 16-23 contain width and height in the IHDR chunk
if (buffer[0] === 0x89 && buffer[1] === 0x50 && buffer[2] === 0x4E && buffer[3] === 0x47) {
return {
width: buffer.readUInt32BE(16),
height: buffer.readUInt32BE(20),
};
}
// JPEG: search for SOF markers
if (buffer[0] === 0xFF && buffer[1] === 0xD8) {
let offset = 2;
while (offset < buffer.length - 1) {
if (buffer[offset] !== 0xFF) { offset++; continue; }
const marker = buffer[offset + 1];
if (marker === 0xD9) break; // EOI
if (marker >= 0xC0 && marker <= 0xCF && marker !== 0xC4 && marker !== 0xC8) {
return {
height: buffer.readUInt16BE(offset + 5),
width: buffer.readUInt16BE(offset + 7),
};
}
const segLen = buffer.readUInt16BE(offset + 2);
offset += 2 + segLen;
}
}
// WEBP: RIFF????WEBP VP8 /VP8L/VP8X chunks
if (buffer.length >= 12 &&
buffer[0] === 0x52 && buffer[1] === 0x49 && buffer[2] === 0x46 && buffer[3] === 0x46 &&
buffer[8] === 0x57 && buffer[9] === 0x45 && buffer[10] === 0x42 && buffer[11] === 0x50) {
const chunkId = buffer.slice(12, 16).toString('ascii');
if (chunkId === 'VP8 ' && buffer.length >= 30) {
// Lossy: width/height at bytes 26-29 (14-bit each, little-endian, mask 0x3FFF)
return {
width: (buffer.readUInt16LE(26) & 0x3FFF),
height: (buffer.readUInt16LE(28) & 0x3FFF),
};
}
if (chunkId === 'VP8L' && buffer.length >= 25) {
// Lossless: 28-bit packed fields starting at byte 21
const bits = buffer.readUInt32LE(21);
return {
width: (bits & 0x3FFF) + 1,
height: ((bits >> 14) & 0x3FFF) + 1,
};
}
if (chunkId === 'VP8X' && buffer.length >= 30) {
// Extended: 24-bit little-endian width-1 at byte 24, height-1 at byte 27
return {
width: buffer.readUIntLE(24, 3) + 1,
height: buffer.readUIntLE(27, 3) + 1,
};
}
}
return { width: 0, height: 0 };
}
/**
* Read an image file and return { base64, width, height }.
*/
export function readImageFile(filePath) {
const resolved = path.resolve(filePath);
if (!fs.existsSync(resolved)) {
throw new Error(`Image file not found: resolved`);
}
const buffer = fs.readFileSync(resolved);
const { width, height } = getImageDimensions(buffer);
return { base64: buffer.toString('base64'), width, height };
}
/**
* Compute a short hash ID from the first 64 KB of a file.
* Returns a stable 16-char hex string (MD5 prefix).
*/
export function fileHashId(filePath) {
const fd = fs.openSync(path.resolve(filePath), 'r');
const buf = Buffer.alloc(65536);
const bytesRead = fs.readSync(fd, buf, 0, 65536, 0);
fs.closeSync(fd);
return crypto.createHash('md5').update(buf.subarray(0, bytesRead)).digest('hex').slice(0, 16);
}
/**
* Build the Image value object and JSON-stringify it.
*/
function buildImageValue(userValue) {
// (1) Pre-built object with imageData → pass through
if (typeof userValue === 'object' && userValue !== null && userValue.imageData) {
return JSON.stringify(userValue);
}
// (2) Object form { file, sourceId?, imageId?, timestamp? } or { image, sourceId?, imageId?, timestamp? } — for video frames
if (typeof userValue === 'object' && userValue !== null && (userValue.file || userValue.image)) {
const filePath = userValue.file || userValue.image;
const img = readImageFile(filePath);
return JSON.stringify({
imageData: img.base64,
imageId: userValue.imageId ?? fileHashId(filePath),
sourceId: userValue.sourceId ?? fileHashId(filePath),
imageWidth: img.width,
imageHeight: img.height,
timestamp: userValue.timestamp ?? Date.now(),
});
}
// (3) String path or base64 (existing logic)
let imageData, imageWidth = 0, imageHeight = 0;
let filePath = null;
if (typeof userValue === 'string') {
if (isBase64(userValue)) {
imageData = userValue;
} else {
filePath = userValue;
const img = readImageFile(userValue);
imageData = img.base64;
imageWidth = img.width;
imageHeight = img.height;
}
} else {
throw new Error(`Unsupported image value type: typeof userValue`);
}
return JSON.stringify({
imageData,
imageId: filePath ? fileHashId(filePath) : '0',
sourceId: filePath ? fileHashId(filePath) : '',
imageWidth,
imageHeight,
timestamp: Date.now(),
});
}
/**
* Apply default values from a field schema to a user-provided object.
*/
function applyDefaults(obj, fields) {
const result = { ...obj };
for (const [key, def] of Object.entries(fields)) {
if (result[key] === undefined && def.default !== undefined) {
result[key] = def.default;
}
}
return result;
}
/**
* Build a ROI value: fill in defaults (kind='ROI', etc.), JSON.stringify.
*/
function buildROIValue(userValue) {
if (typeof userValue === 'string') return userValue; // assume pre-serialized
const obj = applyDefaults(userValue, ROI_FIELDS);
return JSON.stringify(obj);
}
/**
* Build a Tripwire value: fill in defaults (kind='TripWire', direction, etc.), JSON.stringify.
*/
function buildTripwireValue(userValue) {
if (typeof userValue === 'string') return userValue;
const obj = applyDefaults(userValue, TRIPWIRE_FIELDS);
return JSON.stringify(obj);
}
/**
* Build a complex object value (Detection, TrackDetection, Attribute).
* If already an object, JSON.stringify; if string, assume pre-serialized.
*/
function buildGenericComplexValue(userValue) {
if (typeof userValue === 'string') return userValue;
return JSON.stringify(userValue);
}
/**
* Unified entry point: serialize a user value based on its Yijian type.
*
* @param {string} type - Yijian type (e.g. 'Image', 'ROI', 'Array<Detection>')
* @param {*} userValue - user-provided value
* @returns {string} - serialized string suitable for the API
*/
export function buildValue(type, userValue) {
if (userValue === undefined || userValue === null) return '';
// Array<T> — recurse on each element
const inner = unwrapArrayType(type);
if (inner !== null) {
const arr = Array.isArray(userValue) ? userValue : [userValue];
const built = arr.map(elem => {
const s = buildValue(inner, elem);
// For complex types the inner buildValue returns JSON string; parse it back
// so the outer stringify produces a proper array.
if (isComplexType(inner)) {
try { return JSON.parse(s); } catch { return s; }
}
return s;
});
return JSON.stringify(built);
}
switch (type) {
case 'Image':
return buildImageValue(userValue);
case 'ROI':
return buildROIValue(userValue);
case 'Tripwire':
return buildTripwireValue(userValue);
case 'Detection':
case 'TrackDetection':
case 'Attribute':
return buildGenericComplexValue(userValue);
default:
// Basic types — convert to string
if (typeof userValue === 'object') return JSON.stringify(userValue);
return String(userValue);
}
}
// ---------------------------------------------------------------------------
// 4. Deserialization — parseValue(type, valueString)
// ---------------------------------------------------------------------------
/**
* Parse Detection / TrackDetection value string into structured results.
* Extracts key fields from predictions.
*/
function parseDetections(value) {
const images = JSON.parse(value);
const detections = [];
for (const image of images) {
for (const pred of (image.predictions || [])) {
detections.push({
bbox: pred.bbox,
confidence: pred.confidence,
category: pred.categories && pred.categories.length > 0
? { id: pred.categories[0].id, name: pred.categories[0].name, confidence: pred.categories[0].confidence }
: null,
track_id: pred.track_id,
area: pred.area,
});
}
}
return detections;
}
/**
* Parse Attribute value string into structured results.
* Similar to Detection but also extracts answer field.
*/
function parseAttributes(value) {
const images = JSON.parse(value);
const results = [];
for (const image of images) {
const entry = {
answer: image.answer,
predictions: [],
};
for (const pred of (image.predictions || [])) {
entry.predictions.push({
bbox: pred.bbox,
confidence: pred.confidence,
categories: pred.categories,
answer: pred.answer,
});
}
results.push(entry);
}
return results;
}
/**
* Unified entry point: deserialize a value string based on its Yijian type.
*
* @param {string} type - Yijian type
* @param {string} valueString - raw value string from API response
* @returns {*} - parsed value
*/
export function parseValue(type, valueString) {
if (valueString === undefined || valueString === null || valueString === '') return valueString;
// Array<T>
const inner = unwrapArrayType(type);
if (inner !== null) {
// For Detection/TrackDetection arrays the entire value is already the images array
if (inner === 'Detection' || inner === 'TrackDetection') {
return parseDetections(valueString);
}
if (inner === 'Attribute') {
return parseAttributes(valueString);
}
// Generic array: parse, then recurse on each element
const arr = JSON.parse(valueString);
return arr.map(elem => {
const s = typeof elem === 'string' ? elem : JSON.stringify(elem);
return parseValue(inner, s);
});
}
switch (type) {
case 'Detection':
case 'TrackDetection':
// Single detection (wrapped in an array by API)
return parseDetections(valueString);
case 'Attribute':
return parseAttributes(valueString);
case 'ROI':
case 'Tripwire':
return JSON.parse(valueString);
case 'Image':
return JSON.parse(valueString);
default:
// Basic types — return as-is
return valueString;
}
}
/**
* Post-process an outputs array: parse complex types and attach parsedValue.
*/
export function parseOutputs(outputs) {
for (const field of outputs) {
if (field.value && (isComplexType(field.type) || isArrayType(field.type))) {
try {
field.parsedValue = parseValue(field.type, field.value);
} catch {
// keep original value if parsing fails
}
}
}
}
// ---------------------------------------------------------------------------
// 5. SKILL.md generation helpers
// ---------------------------------------------------------------------------
/**
* Returns hints used when generating SKILL.md for a given type.
*/
export function getSkillMdHints(type) {
const hints = {
inputNote: null,
hasVisualization: false,
visualizationCmd: null,
};
// Check the raw type or inner type
const inner = unwrapArrayType(type) || type;
if (inner === 'Image') {
hints.inputNote = '> pass a local file path (auto base64-encoded), or `{ file, sourceId?, imageId?, timestamp? }` / `{ image, sourceId?, imageId?, timestamp? }` for video frames — see **Video Frame Extraction Guide**.';
} else if (inner === 'ROI') {
hints.inputNote = '> requires drawing regions on the image — see **ROI / Tripwire Input Guide** below.';
} else if (inner === 'Tripwire') {
hints.inputNote = '> requires drawing tripwire lines on the image — see **ROI / Tripwire Input Guide** below.';
}
if (hasVisualization(type)) {
hints.hasVisualization = true;
if (inner === 'Detection' || inner === 'TrackDetection') {
hints.visualizationCmd = 'visualize.mjs';
} else if (inner === 'ROI' || inner === 'Tripwire') {
hints.visualizationCmd = 'visualize.mjs';
}
}
return hints;
}
FILE:scripts/utils.mjs
#!/usr/bin/env node
import https from 'https';
import http from 'http';
/**
* Make an HTTPS (or HTTP) request. Returns a Promise that resolves to { statusCode, headers, body }.
*/
export function httpsRequest(url, options = {}) {
return new Promise((resolve, reject) => {
const parsedUrl = new URL(url);
const mod = parsedUrl.protocol === 'https:' ? https : http;
const reqOptions = {
hostname: parsedUrl.hostname,
port: parsedUrl.port || (parsedUrl.protocol === 'https:' ? 443 : 80),
path: parsedUrl.pathname + parsedUrl.search,
method: options.method || 'GET',
headers: options.headers || {},
};
const req = mod.request(reqOptions, (res) => {
const chunks = [];
res.on('data', (chunk) => chunks.push(chunk));
res.on('end', () => {
const body = Buffer.concat(chunks).toString('utf-8');
resolve({ statusCode: res.statusCode, headers: res.headers, body });
});
});
req.on('error', reject);
if (options.body) {
req.write(options.body);
}
req.end();
});
}
/**
* Get the API key from environment. Throws if not set.
*/
export function getApiKey() {
const key = process.env.YIJIAN_API_KEY;
if (!key) {
throw new Error('YIJIAN_API_KEY environment variable is not set. Please configure it in ~/.claude/settings.json under "env".');
}
return key;
}
/**
* Construct the metadata URL for a given ep-id.
*/
export function metadataUrl(epId) {
return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/metadata`;
}
/**
* Construct the run URL for a given ep-id.
*/
export function runUrl(epId) {
return `https://yijian-next.cloud.baidu.com/api/skills/v1/epId/run`;
}
/**
* Construct the router query URL for intent-based skill matching.
*/
export function routerQueryUrl() {
return 'https://yijian.baidubce.com/harness/v1/router/query';
}
/**
* Construct the router multimodal URL for direct inference.
*/
export function routerMultimodalUrl() {
return 'https://yijian.baidubce.com/harness/v1/router/multimodal';
}
/**
* Construct the workspaces get URL.
*/
export function workspacesGetUrl() {
return 'https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/get';
}
/**
* Construct the workspace skills get URL.
*/
export function workspaceSkillsGetUrl(workspaceId) {
return `https://yijian-next.cloud.baidu.com/api/vistudio/v1/workspaces/workspaceId/skills/get`;
}
FILE:scripts/invoke.mjs
#!/usr/bin/env node
import { httpsRequest, getApiKey, runUrl } from './utils.mjs';
import { buildValue, parseOutputs, readImageFile } from './types.mjs';
/**
* Read all stdin as a string (for piped input).
*/
function readStdin() {
return new Promise((resolve, reject) => {
const chunks = [];
process.stdin.setEncoding('utf-8');
process.stdin.on('data', (chunk) => chunks.push(chunk));
process.stdin.on('end', () => resolve(chunks.join('')));
process.stdin.on('error', reject);
});
}
/**
* Build a full-image ROI covering the entire image bounds.
*/
export function fullImageROI(width, height) {
return { id: '1', name: 'zone', kind: 'ROI', points: [0, 0, width, 0, width, height, 0, height] };
}
/**
* Determine Yijian type from a field name heuristic.
*/
function detectFieldType(fieldName, fieldValue) {
const lower = fieldName.toLowerCase();
if (lower.includes('image') || lower.endsWith('image')) return 'Image';
if (lower.includes('roi')) return 'ROI';
if (lower.includes('tripwire')) return 'Tripwire';
if (typeof fieldValue === 'number') return 'Integer';
return 'String';
}
/**
* Build the `inputs` array for the run request body.
* Supports Image, ROI, Tripwire, Integer, and String types.
* When autoROI is enabled and an Image field is present without a corresponding ROI,
* automatically adds a full-image ROI.
*/
function buildRunInputs(userInputs, options = {}) {
const { autoROI = false } = options;
const inputs = [];
for (const [inputName, inputData] of Object.entries(userInputs)) {
const schema = [];
let hasImage = false;
let hasROI = false;
let imagePath = null;
for (const [fieldName, fieldValue] of Object.entries(inputData)) {
const type = detectFieldType(fieldName, fieldValue);
if (type === 'Image') {
hasImage = true;
imagePath = typeof fieldValue === 'object' && fieldValue !== null
? (fieldValue.file || fieldValue.image || null)
: (typeof fieldValue === 'string' && !fieldValue.startsWith('data:') ? fieldValue : null);
}
if (type === 'ROI') hasROI = true;
schema.push({
name: fieldName,
type: type,
value: buildValue(type, fieldValue),
});
}
// Auto-fill full-image ROI when enabled and not provided by user
if (autoROI && hasImage && !hasROI && imagePath) {
try {
const img = readImageFile(imagePath);
schema.push({
name: 'roi',
type: 'ROI',
value: buildValue('ROI', fullImageROI(img.width, img.height)),
});
} catch (err) {
// If we can't read the image for ROI, skip auto-fill
// (buildValue('Image', ...) will read it again anyway and throw if invalid)
}
}
inputs.push({ name: inputName, schema });
}
return inputs;
}
/**
* Invoke a skill by epId with the given user inputs.
*
* @param {string} epId - Skill endpoint ID (e.g. "ep-public-xxx")
* @param {object} userInputs - User input { input0: { image: "path", roi: {...} } }
* @param {object} options
* @param {boolean} options.autoROI - Auto-fill full-image ROI when not provided (default: true)
* @returns {Promise<{success: boolean, epId: string, result?: object, message?: string}>}
*/
export async function runSkill(epId, userInputs = {}, options = {}) {
const { autoROI = true } = options;
const apiKey = getApiKey();
const inputs = buildRunInputs(userInputs, { autoROI });
const requestBody = JSON.stringify({ inputs });
const response = await httpsRequest(runUrl(epId), {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
'Accept': 'application/json',
},
body: requestBody,
});
if (response.statusCode !== 200) {
// Check for "conditional branch did not hit" error (no detection)
try {
const parsed = JSON.parse(response.body);
if (parsed.message?.global?.detail?.includes('conditional branch did not hit')) {
return {
success: true,
epId,
result: { 未检出: true, detail: parsed.message.global.detail },
};
}
} catch {}
throw new Error(`Skill invocation failed (response.statusCode): response.body`);
}
let result;
try {
result = JSON.parse(response.body);
} catch {
result = { rawResponse: response.body };
}
// Post-process: parse complex-type outputs
if (result.result?.outputs) {
parseOutputs(result.result.outputs);
}
return { success: true, epId, result };
}
/**
* CLI entry point.
*/
async function main() {
const epId = process.argv[2];
let inputArg = process.argv[3];
if (!epId) {
console.error('Usage: node invoke.mjs <ep-id> \'<input-json>\' | -');
console.error('Use - as input arg to read from stdin');
console.error('');
console.error('Input JSON format:');
console.error(' { "input0": { "image": "/path/to/image.jpg" } }');
console.error(' { "input0": { "image": "/path/to/image.jpg", "roi": { "id":"1","name":"zone","kind":"ROI","points":[...] } } }');
process.exit(1);
}
// Parse user input JSON
let userInputs;
if (inputArg === '-') {
inputArg = await readStdin();
}
if (!inputArg) {
userInputs = {};
} else {
try {
userInputs = JSON.parse(inputArg);
} catch (err) {
console.error(`Failed to parse input JSON: err.message`);
process.exit(1);
}
}
try {
const output = await runSkill(epId, userInputs, { autoROI: true });
console.log(JSON.stringify(output, null, 2));
process.exit(output.success ? 0 : 1);
} catch (err) {
console.error(JSON.stringify({ success: false, epId, error: err.message }, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}
FILE:scripts/list.mjs
#!/usr/bin/env node
import { querySkills } from './query.mjs';
/**
* List skills matching a query intent.
* This helps users discover what skills are available for their use case.
*
* Usage: node list.mjs "detect people"
*/
async function main() {
const intent = process.argv[2];
const topK = parseInt(process.argv[3], 10) || 10;
if (!intent) {
console.log('Usage: node list.mjs <intent> [topK]');
console.log('Example: node list.mjs "detect people falling" 5');
console.log('');
console.log('This queries the Yijian platform for skills matching your intent.');
process.exit(0);
}
try {
const skills = await querySkills(intent, { topK });
if (skills.length === 0) {
console.log('No skills found matching your intent.');
console.log('You can still use multimodal inference: node intent-invoke.mjs "' + intent + '" <image>');
process.exit(0);
}
console.log(`Found skills.length skill(s) matching "intent":\n`);
skills.forEach((skill, index) => {
const confidence = (skill.confidence * 100).toFixed(1);
console.log(`index + 1. skill.name || skill.epId`);
console.log(` ID: skill.epId`);
console.log(` Match: confidence%`);
if (skill.description) {
console.log(` Description: skill.description`);
}
console.log('');
});
console.log('To use a skill directly:');
console.log(` node intent-invoke.mjs "intent" <image-path>`);
} catch (err) {
console.error('Error:', err.message);
process.exit(1);
}
}
main();
FILE:scripts/multimodal.mjs
#!/usr/bin/env node
import { httpsRequest, getApiKey, routerMultimodalUrl } from './utils.mjs';
import fs from 'fs';
import path from 'path';
/**
* Convert a local file path to a base64 data URI.
*/
function imageToDataUri(filePath) {
const ext = path.extname(filePath).toLowerCase();
const mimeMap = {
'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
'.png': 'image/png', '.gif': 'image/gif',
'.webp': 'image/webp', '.bmp': 'image/bmp',
};
const mime = mimeMap[ext] || 'image/jpeg';
const data = fs.readFileSync(filePath);
return `data:mime;base64,data.toString('base64')`;
}
/**
* Check if a string looks like a local file path (not a URL or data URI).
*/
function isLocalFilePath(str) {
return str && !str.startsWith('http://') && !str.startsWith('https://') && !str.startsWith('data:');
}
/**
* Call multimodal router for direct inference using internal API.
* @param {string} text - User's text query
* @param {string} imageUrl - URL or local file path to the image (optional)
* @returns {Promise<Object>} API response
*/
export async function callMultimodal(text, imageUrl) {
const apiKey = getApiKey();
const messages = [
{
role: 'user',
content: [
{
type: 'text',
text: text
}
]
}
];
// Add image if provided
if (imageUrl) {
// Convert local file path to data URI
const resolvedUrl = isLocalFilePath(imageUrl) ? imageToDataUri(imageUrl) : imageUrl;
messages[0].content.push({
type: 'image_url',
image_url: {
url: resolvedUrl
}
});
}
const requestBody = {
messages
};
const response = await httpsRequest(routerMultimodalUrl(), {
method: 'POST',
headers: {
'Authorization': `Bearer apiKey`,
'Content-Type': 'application/json',
'Accept': 'application/json',
},
body: JSON.stringify(requestBody),
});
if (response.statusCode !== 200) {
throw new Error(`Multimodal call failed with status response.statusCode: response.body`);
}
let result;
try {
result = JSON.parse(response.body);
} catch (err) {
throw new Error(`Failed to parse multimodal response: err.message`);
}
return result;
}
/**
* CLI entry point for testing multimodal functionality.
*/
async function main() {
const text = process.argv[2];
const imageUrl = process.argv[3];
if (!text) {
console.error('Usage: node multimodal.mjs <text> [image-url]');
console.error('Example: node multimodal.mjs "图片里有什么"');
console.error('Example: node multimodal.mjs "图片里有什么" "http://example.com/image.jpg"');
process.exit(1);
}
try {
const result = await callMultimodal(text, imageUrl);
console.log(JSON.stringify({
success: true,
text,
imageUrl: imageUrl || null,
result
}, null, 2));
} catch (err) {
console.error(JSON.stringify({
success: false,
error: err.message
}, null, 2));
process.exit(1);
}
}
if (import.meta.url === `file://process.argv[1]`) {
main();
}