@clawhub-sansan-mei-f5e0f1b566
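Collect content from Bilibili, Douyin, YouTube, and Zhihu through the local media-agent-crawler HTTP service (no MCP client installation required). Use when the user wants to collect content from these platforms and has already started the app locally (default http://127.0.0.1:39002).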
---
name: media-crawler-local
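description: Collect content from Bilibili, Douyin, YouTube, and Zhihu through the local media-agent-crawler HTTP service (no MCP client installation required). Use when the user wants to collect content from these platforms and has already started the app locally (default http://127.0.0.1:39002).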
---
# media-crawler-local
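Calls the local HTTP service directly, bypassing the MCP client configuration of OpenClaw/Cursor.
## Prerequisites
First extract the following from the user's message or context, and ask only when something is missing:
- Operation type: collect content / query archives / read task data
- Target link or keyword
- Platform (can be inferred from the link)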
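Inferring the platform from a link could look like the sketch below. The hostname list and the `inferPlatform` helper are assumptions made for illustration; the service may recognize other domains.

```javascript
// Hypothetical hostname suffixes per platform (assumed, not taken from the service).
const HOST_SUFFIXES = {
  bilibili: ["bilibili.com", "b23.tv"],
  douyin: ["douyin.com"],
  youtube: ["youtube.com", "youtu.be"],
  zhihu: ["zhihu.com"],
};

// Returns the platform name for an absolute URL, or null when it cannot be inferred.
function inferPlatform(link) {
  let host;
  try {
    host = new URL(link).hostname.toLowerCase();
  } catch {
    return null; // not an absolute URL, e.g. a bare BV ID or a keyword
  }
  for (const [platform, suffixes] of Object.entries(HOST_SUFFIXES)) {
    if (suffixes.some((s) => host === s || host.endsWith(`.${s}`))) return platform;
  }
  return null;
}

console.log(inferPlatform("https://www.bilibili.com/video/BV1xx411c7mD"));
```

When the helper returns `null`, asking the user for the platform is the safer fallback.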
---
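## Tool list
### Bilibili
| Tool | Required parameters | Description |
|---|---|---|
| `crawl_bilibili` | `url` | Video URL or BV ID |
| `crawl_bilibili_search` | `keyword` | Collects search results for a keyword |
| `crawl_bilibili_uploader` | `mid` | Numeric uploader (UP) ID; collects the uploader's video list |
| `crawl_bilibili_popular` | None | Collects popular videos |
| `crawl_bilibili_weekly` | None (optional `number`) | Weekly must-watch list; without `number`, the latest issue is used |
| `crawl_bilibili_history` | None (optional `max/view_at/business/ps/type/page_count`) | Aggregated watch-history collection; without `page_count`, follows `dailyRecommendPageCount` |
All Bilibili tools accept an optional `cookies` parameter (a string obtained from the browser extension).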
### Other platforms
| Tool | Required parameters | Description |
|---|---|---|
| `crawl_douyin` | `url` | Douyin video URL |
| `crawl_youtube` | `url` | YouTube video URL or video ID |
| `crawl_zhihu` | `url` | Zhihu question or answer URL |
### Archives and data reading
| Tool | Required parameters | Optional parameters | Description |
|---|---|---|---|
| `list_archives` | None | `platform` / `keyword` / `limit` / `sort_by` / `created_after` | Lists archived tasks; returns at most 50 by default, newest first |
| `get_task_data` | `task_id` | `type` | Reads the data files in the task directory |
`list_archives` parameters:
- `sort_by`: `date` (default, newest first by creation time) or `status` (running → failed → unknown → finished)
- `created_after`: ISO date, e.g. `2026-03-18` or `2026-03-18T10:00:00Z`
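As a sketch of how these parameters fit together, the hypothetical helper below assembles a `list_archives` arguments object and rejects values the list above does not allow. The helper name and its camelCase option names are illustrative only; the wire names (`sort_by`, `created_after`) come from the table.

```javascript
const SORT_VALUES = ["date", "status"];

// Builds the `arguments` object for list_archives; omitted options are left out
// so the service can apply its own defaults.
function buildListArchivesArgs({ platform, keyword, limit, sortBy, createdAfter } = {}) {
  const args = {};
  if (platform) args.platform = platform;
  if (keyword) args.keyword = keyword;
  if (Number.isInteger(limit) && limit > 0) args.limit = limit;
  if (sortBy !== undefined) {
    if (!SORT_VALUES.includes(sortBy)) {
      throw new Error(`sort_by must be one of: ${SORT_VALUES.join(", ")}`);
    }
    args.sort_by = sortBy;
  }
  if (createdAfter !== undefined) {
    // Client-side sanity check only; the service may be stricter about the format.
    if (Number.isNaN(Date.parse(createdAfter))) {
      throw new Error("created_after must be an ISO date");
    }
    args.created_after = createdAfter;
  }
  return args;
}

console.log(buildListArchivesArgs({ platform: "bilibili", limit: 20, sortBy: "status" }));
```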
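`get_task_data` accepts the following `type` values (aliases included):
| `type` value | Data returned |
|---|---|
| `comments` / `comment` | Comments |
| `danmaku` | Danmaku (bullet comments) |
| `subtitles` / `subtitle` / `caption` / `captions` | Subtitles |
| `detail` / `info` | Video/post details |
| `all` / `full` | Full aggregated data |
| `summary` / `ai_summary` | AI summary |
| Omitted | Returns every recognizable file in the directory |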
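A caller that wants to validate `type` before sending the request could fold the aliases as sketched here. Treating the first name in each row as the canonical one is an assumption; the service resolves aliases itself, so this is purely a client-side convenience.

```javascript
// Alias → canonical name, following the rows of the table above.
const TYPE_ALIASES = new Map([
  ["comments", "comments"], ["comment", "comments"],
  ["danmaku", "danmaku"],
  ["subtitles", "subtitles"], ["subtitle", "subtitles"],
  ["caption", "subtitles"], ["captions", "subtitles"],
  ["detail", "detail"], ["info", "detail"],
  ["all", "all"], ["full", "all"],
  ["summary", "summary"], ["ai_summary", "summary"],
]);

// Returns the canonical type, or undefined when `type` is omitted
// (meaning "return every recognizable file").
function normalizeType(type) {
  if (type === undefined || type === "") return undefined;
  const canonical = TYPE_ALIASES.get(String(type).toLowerCase());
  if (!canonical) throw new Error(`Unknown get_task_data type: ${type}`);
  return canonical;
}

console.log(normalizeType("captions"));
```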
---
## HTTP endpoints
The service address defaults to `http://127.0.0.1:39002` and can be overridden with the `BIL_CRAWL_URL` environment variable.
### Collection endpoint (REST)
```
POST /start-crawl/{platform}/{encodedUrl}
Content-Type: application/json
{ "source": "ai" }
```
`encodedUrl` must be encoded with `encodeURIComponent`; `platform` is one of `bilibili` / `douyin` / `youtube` / `zhihu`.
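A minimal sketch of building and calling this endpoint from Node.js 18+ (which ships a global `fetch`). The function names are illustrative, and the response is returned as raw text because its shape is not documented here.

```javascript
const PLATFORMS = ["bilibili", "douyin", "youtube", "zhihu"];

// Builds the REST collection URL; the target URL becomes a single encoded path segment.
function buildCrawlUrl(platform, targetUrl, baseUrl = "http://127.0.0.1:39002") {
  if (!PLATFORMS.includes(platform)) throw new Error(`Unsupported platform: ${platform}`);
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/start-crawl/${platform}/${encodeURIComponent(targetUrl)}`;
}

// Sends the POST request and returns the raw response text.
async function startCrawl(platform, targetUrl, baseUrl) {
  const res = await fetch(buildCrawlUrl(platform, targetUrl, baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: "ai" }),
  });
  return res.text();
}

console.log(buildCrawlUrl("bilibili", "https://www.bilibili.com/video/BV1xx411c7mD"));
```

Encoding the whole target URL matters because its `/` and `:` characters would otherwise be read as extra path segments.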
### MCP endpoint (JSON-RPC 2.0)
```
POST /mcp
Accept: application/json, text/event-stream
Content-Type: application/json
```
Request body format:
```json
{ "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "<tool>", "arguments": { } } }
```
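The request body can be assembled programmatically, as in the sketch below. Because the `Accept` header also lists `text/event-stream`, the sketch assumes the reply may arrive either as plain JSON or as server-sent events whose `data:` lines carry the JSON-RPC message; that response handling is an assumption, not documented behavior of this service.

```javascript
// Builds a JSON-RPC 2.0 `tools/call` request body.
function buildToolCall(name, args = {}, id = 1) {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    throw new TypeError("arguments must be a JSON object");
  }
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Parses a response body that is either plain JSON or an SSE stream.
// For SSE, returns the last `data:` line that parses as JSON, or null if none does.
function parseMcpBody(text) {
  try {
    return JSON.parse(text);
  } catch {
    // Not plain JSON; fall through and treat the body as SSE.
  }
  const dataLines = text
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim());
  for (let i = dataLines.length - 1; i >= 0; i--) {
    try {
      return JSON.parse(dataLines[i]);
    } catch {
      // Skip non-JSON data lines.
    }
  }
  return null;
}

console.log(JSON.stringify(buildToolCall("list_archives", { platform: "bilibili", limit: 20 }, 2)));
```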
---
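## Choosing how to call
Pick by priority according to the current environment:
| Priority | Condition | Method |
|---|---|---|
| 1 | Any system (no extra dependencies) | **Inline commands** (see below) |
| 2 | Node.js available | `node skills/scripts/*.mjs` |
| 3 | bash available (macOS/Linux/Git Bash) | `bash skills/scripts/*.sh` |
---
## Inline commands (preferred, no dependencies)
The AI runs these directly through its Shell tool, choosing by operating system:
### Windows (PowerShell, built in)
First switch the current session to UTF-8 (avoids garbled Chinese output):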
```powershell
[Console]::InputEncoding = [System.Text.UTF8Encoding]::new($false)
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)
$OutputEncoding = [Console]::OutputEncoding
```
**REST collection:**
```powershell
$encoded = [Uri]::EscapeDataString("https://www.bilibili.com/video/BV1xx411c7mD")
Invoke-RestMethod -Uri "http://127.0.0.1:39002/start-crawl/bilibili/$encoded" -Method POST -ContentType "application/json" -Body '{"source":"ai"}' | ConvertTo-Json -Depth 10
```
**MCP tool call:**
```powershell
$body = '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"list_archives","arguments":{"platform":"bilibili","limit":20}}}'
Invoke-RestMethod -Uri "http://127.0.0.1:39002/mcp" -Method POST -ContentType "application/json" -Headers @{Accept="application/json, text/event-stream"} -Body $body | ConvertTo-Json -Depth 10
```
### macOS / Linux (curl, bundled with the system)
**REST collection:**
```bash
curl -fsS -X POST "http://127.0.0.1:39002/start-crawl/bilibili/$(node -e 'process.stdout.write(encodeURIComponent(process.argv[1]))' 'https://www.bilibili.com/video/BV1xx411c7mD')" \
-H 'Content-Type: application/json' -d '{"source":"ai"}'
```
**MCP tool call:**
```bash
curl -fsS -X POST "http://127.0.0.1:39002/mcp" \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"list_archives","arguments":{"platform":"bilibili","limit":20}}}'
```
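> URL encoding: on Windows use `[Uri]::EscapeDataString()`; on macOS/Linux use `python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1], safe=''))" "<url>"` or `node -e` (if available). Pass `safe=''` to `quote`, because its default leaves `/` unencoded, which does not match `encodeURIComponent`.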
---
## Script usage (fallback)
All scripts live in `skills/scripts/` and come in two flavors: `.mjs` (Node.js) and `.sh` (bash).
### Node.js (`node skills/scripts/*.mjs`)
### 1. Quick collection (REST, `crawl.mjs`)
```
node skills/scripts/crawl.mjs <platform> <url> [base_url]
```
Example:
```
node skills/scripts/crawl.mjs bilibili "https://www.bilibili.com/video/BV1xx411c7mD"
```
### 2. Collection through MCP (`crawl_mcp.mjs`, URL-based tools only)
```
node skills/scripts/crawl_mcp.mjs <tool_name> <target_url> [base_url]
```
Example:
```
node skills/scripts/crawl_mcp.mjs crawl_bilibili "https://www.bilibili.com/video/BV1xx411c7mD"
```
Supported tools: `crawl_bilibili` / `crawl_douyin` / `crawl_youtube` / `crawl_zhihu`
> For the remaining tools (bilibili_search / bilibili_uploader / bilibili_popular / bilibili_weekly / bilibili_history / list_archives / get_task_data), use `mcp_tool.mjs`.
### 3. Archive query (`list_archives_mcp.mjs`)
```
node skills/scripts/list_archives_mcp.mjs [platform] [keyword] [limit] [base_url]
```
Example:
```
node skills/scripts/list_archives_mcp.mjs bilibili "蛋神" 20
```
### 4. Generic tool call (`mcp_tool.mjs`)
```
node skills/scripts/mcp_tool.mjs <tool_name> [args_json] [base_url]
```
Example:
```
node skills/scripts/mcp_tool.mjs crawl_bilibili_search '{"keyword":"蛋神"}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_uploader '{"mid":"123456"}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_popular '{}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_weekly '{}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_weekly '{"number":364}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_history '{}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_history '{"max":0,"view_at":0,"business":"","ps":20,"type":"all"}'
node skills/scripts/mcp_tool.mjs crawl_bilibili_history '{"page_count":2}'
node skills/scripts/mcp_tool.mjs get_task_data '{"task_id":"BV1xx411c7mD-123456","type":"comments"}'
```
---
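## Execution flow
1. **Detect the environment**: read the OS from the system information (`win32` → inline PowerShell; anything else → inline curl).
2. **Health check**: `GET /` (if the connection fails, remind the user to start the Electron app first).
3. **Start the collection**:
   - Simple URL collection → REST endpoint (`/start-crawl/...`)
   - Extra parameters needed (search keyword, uploader ID, etc.) → MCP endpoint (`/mcp`)
4. **Handle the result**:
   - Give the user a brief summary (task ID, status, key fields)
   - When there is a lot of content, show only the first few items and mention that `get_task_data` can read or filter the rest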
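The routing decision in step 3 can be sketched as a small pure function. The rule used here (REST only when the tool is one of the four URL-based tools and `url` is its sole argument) is one reading of the flow above, and the `chooseRoute` name and return shape are illustrative.

```javascript
// URL-based tools and the REST platform segment each maps to.
const REST_PLATFORMS = new Map([
  ["crawl_bilibili", "bilibili"],
  ["crawl_douyin", "douyin"],
  ["crawl_youtube", "youtube"],
  ["crawl_zhihu", "zhihu"],
]);

// Picks the REST endpoint for plain URL collection and the MCP endpoint otherwise.
function chooseRoute(tool, args = {}) {
  const platform = REST_PLATFORMS.get(tool);
  const keys = Object.keys(args);
  if (platform && keys.length === 1 && keys[0] === "url") {
    return {
      via: "rest",
      path: `/start-crawl/${platform}/${encodeURIComponent(args.url)}`,
      body: { source: "ai" },
    };
  }
  return {
    via: "mcp",
    path: "/mcp",
    body: { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: tool, arguments: args } },
  };
}

console.log(chooseRoute("crawl_bilibili_search", { keyword: "demo" }).via);
```

A call such as `crawl_bilibili` with an additional `cookies` argument falls through to MCP, since the REST body shown in this document carries only `source`.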
---
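## Troubleshooting
| Error | Handling |
|---|---|
| Connection failure | Remind the user to start the Electron app first (`bun run start` / `dev`) |
| 401 / 403 | Ask the user to check whether cookies are already in the store, or to re-import them from the extension |
| 429 | Back off according to the returned `Retry-After`; do not retry in rapid succession |
| 5xx | Retry at most once, then return an error summary and suggestions |
| `task_id` not found | Use `list_archives` first to look up the correct task ID |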
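The status-code rows of this table can be expressed as a small decision function, sketched below. The 5-second fallback used when `Retry-After` is missing or not a number of seconds is an assumed default, and the action names are illustrative.

```javascript
const DEFAULT_BACKOFF_MS = 5000; // assumed fallback, not specified by the service

// Maps an HTTP status to the next step. `attempt` counts retries already made.
function nextAction(status, retryAfter, attempt = 0) {
  if (status === 401 || status === 403) {
    return { action: "ask_user", hint: "check cookies in the store or re-import them" };
  }
  if (status === 429) {
    const seconds = retryAfter ? Number(retryAfter) : NaN;
    const waitMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : DEFAULT_BACKOFF_MS;
    return { action: "wait", waitMs };
  }
  if (status >= 500) {
    // Retry at most once, then report the error.
    return attempt < 1 ? { action: "retry" } : { action: "give_up" };
  }
  return { action: "ok" };
}

console.log(nextAction(429, "3"));
```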
FILE:scripts/mcp_tool.mjs
#!/usr/bin/env node
/**
 * Generic MCP tool call
 * Usage: node mcp_tool.mjs <tool_name> [args_json] [base_url]
 */
import { print, print_error } from "./stdio_utf8.mjs";
const [toolName, argsRaw = "{}", baseUrl = process.env.BIL_CRAWL_URL || "http://127.0.0.1:39002"] = process.argv.slice(2);
if (!toolName) {
print_error("Usage: node mcp_tool.mjs <tool_name> [args_json] [base_url]");
print_error(
'Example: node mcp_tool.mjs list_archives \'{"platform":"bilibili","limit":10}\''
);
process.exit(2);
}
let args;
try {
args = JSON.parse(argsRaw);
if (typeof args !== "object" || Array.isArray(args) || args === null) throw new Error();
} catch {
print_error("args_json must be a valid JSON object");
process.exit(2);
}
const mcpUrl = `${baseUrl}/mcp`;
// Health check: fail fast when the service is not running.
try {
  await fetch(`${baseUrl}/`, { signal: AbortSignal.timeout(3000) });
} catch {
  print_error(`Service not reachable at ${baseUrl}`);
  process.exit(1);
}
const headers = { "Content-Type": "application/json", Accept: "application/json, text/event-stream" };
const initPayload = {
jsonrpc: "2.0", id: 1, method: "initialize",
params: { protocolVersion: "2024-11-05", capabilities: {}, clientInfo: { name: "media-crawler-local", version: "1.0.0" } },
};
const callPayload = {
jsonrpc: "2.0", id: 2, method: "tools/call",
params: { name: toolName, arguments: args },
};
// Initialize session (best-effort)
try {
await fetch(mcpUrl, { method: "POST", headers, body: JSON.stringify(initPayload), signal: AbortSignal.timeout(5000) });
} catch {}
const res = await fetch(mcpUrl, {
method: "POST",
headers,
body: JSON.stringify(callPayload),
signal: AbortSignal.timeout(30000),
});
const text = await res.text();
try {
print(JSON.stringify(JSON.parse(text), null, 2));
} catch {
print(text);
}
FILE:scripts/mcp_tool.sh
#!/usr/bin/env bash
set -euo pipefail
# Positional arguments: <tool_name> [args_json] [base_url]
TOOL_NAME="${1:-}"
ARGS_JSON="${2:-}"
if [[ -z "$ARGS_JSON" ]]; then
  ARGS_JSON='{}'
fi
BASE_URL="${3:-${BIL_CRAWL_URL:-http://127.0.0.1:39002}}"
MCP_URL="$BASE_URL/mcp"
if [[ -z "$TOOL_NAME" ]]; then
  echo "Usage: $0 <tool_name> [args_json] [base_url]" >&2
  echo "Example: $0 list_archives '{\"platform\":\"bilibili\",\"limit\":10}'" >&2
  exit 2
fi
curl -fsS "$BASE_URL/" >/dev/null 2>&1 || {
echo "Service not reachable at $BASE_URL" >&2
exit 1
}
init_payload='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"media-crawler-local","version":"1.0.0"}}}'
tmp_args_file=$(mktemp)
trap 'rm -f "$tmp_args_file"' EXIT
printf '%s' "$ARGS_JSON" >"$tmp_args_file"
call_payload=$(TOOL_NAME_ENV="$TOOL_NAME" ARGS_FILE_ENV="$tmp_args_file" node -e '
const fs = require("node:fs");
const tool = process.env.TOOL_NAME_ENV || "";
const file = process.env.ARGS_FILE_ENV || "";
const raw = (file && fs.existsSync(file)) ? fs.readFileSync(file, "utf8") : "{}";
let args={};
try { args=JSON.parse(raw); } catch (e) {
console.error("Invalid args_json, must be valid JSON object");
process.exit(2);
}
if (typeof args !== "object" || Array.isArray(args) || args===null) {
console.error("args_json must be a JSON object");
process.exit(2);
}
process.stdout.write(JSON.stringify({jsonrpc:"2.0",id:2,method:"tools/call",params:{name:tool,arguments:args}}));
')
curl -sS -N "$MCP_URL" \
-H 'Accept: application/json, text/event-stream' \
-H 'Content-Type: application/json' \
--data "$init_payload" >/dev/null
curl -sS -N "$MCP_URL" \
-H 'Accept: application/json, text/event-stream' \
-H 'Content-Type: application/json' \
--data "$call_payload"
echo
FILE:scripts/list_archives_mcp.sh
#!/usr/bin/env bash
set -euo pipefail
# Positional arguments: [platform] [keyword] [limit] [base_url]
PLATFORM="${1:-}"
KEYWORD="${2:-}"
LIMIT="${3:-50}"
BASE_URL="${4:-${BIL_CRAWL_URL:-http://127.0.0.1:39002}}"
ARGS='{}'
if [[ -n "$PLATFORM" || -n "$KEYWORD" || -n "$LIMIT" ]]; then
ARGS=$(node -e '
const [platform, keyword, limitRaw] = process.argv.slice(1);
const out = {};
if (platform) out.platform = platform;
if (keyword) out.keyword = keyword;
const n = Number(limitRaw);
if (!Number.isNaN(n) && n > 0) out.limit = Math.floor(n);
process.stdout.write(JSON.stringify(out));
' "$PLATFORM" "$KEYWORD" "$LIMIT")
fi
bash "$(dirname "$0")/mcp_tool.sh" list_archives "$ARGS" "$BASE_URL"
FILE:scripts/crawl_mcp.sh
#!/usr/bin/env bash
set -euo pipefail
# Positional arguments: <tool_name> <target_url> [base_url]
TOOL="${1:-}"
TARGET_URL="${2:-}"
BASE_URL="${3:-${BIL_CRAWL_URL:-http://127.0.0.1:39002}}"
if [[ -z "$TOOL" || -z "$TARGET_URL" ]]; then
echo "Usage: $0 <crawl_bilibili|crawl_douyin|crawl_youtube|crawl_zhihu> <target_url> [base_url]" >&2
exit 2
fi
case "$TOOL" in
crawl_bilibili|crawl_douyin|crawl_youtube|crawl_zhihu) ;;
*)
echo "Unsupported tool: $TOOL (use mcp_tool.sh for bilibili_search/bilibili_uploader/bilibili_popular/bilibili_weekly)" >&2
exit 2
;;
esac
ARGS=$(node -e 'process.stdout.write(JSON.stringify({url: process.argv[1]}))' "$TARGET_URL")
bash "$(dirname "$0")/mcp_tool.sh" "$TOOL" "$ARGS" "$BASE_URL"
FILE:scripts/stdio_utf8.mjs
/**
 * Unified UTF-8 text output.
 * - TTY: write the string directly so Node uses the platform's native terminal path (avoids garbled Chinese on Windows).
 * - Pipe/File: write UTF-8 bytes so output is consistent across processes and redirection.
 * @param {NodeJS.WriteStream} stream
 * @param {string} text
 */
function writeLine(stream, text) {
  const line = `${text}\n`;
  if (stream.isTTY) {
    stream.write(line);
    return;
  }
  stream.write(Buffer.from(line, "utf8"));
}
/** @param {string} text */
export function print(text) {
writeLine(process.stdout, String(text));
}
/** @param {string} text */
export function print_error(text) {
writeLine(process.stderr, String(text));
}
FILE:scripts/crawl.mjs
#!/usr/bin/env node
/**
 * Quick collection (REST endpoint)
 * Usage: node crawl.mjs <platform> <url> [base_url]
 */
import { print, print_error } from "./stdio_utf8.mjs";
const SUPPORTED = ["bilibili", "douyin", "youtube", "zhihu"];
const [platform, targetUrl, baseUrl = process.env.BIL_CRAWL_URL || "http://127.0.0.1:39002"] = process.argv.slice(2);
if (!platform || !targetUrl) {
  print_error(
    `Usage: node crawl.mjs <${SUPPORTED.join("|")}> <target_url> [base_url]`
  );
  process.exit(2);
}
if (!SUPPORTED.includes(platform)) {
  print_error(`Unsupported platform: ${platform}`);
  process.exit(2);
}
// The target URL travels as a single path segment, so it must be fully encoded.
const encoded = encodeURIComponent(targetUrl);
const endpoint = `${baseUrl}/start-crawl/${platform}/${encoded}`;
/** @param {string} url */
async function probe(url) {
try {
const r = await fetch(url, { signal: AbortSignal.timeout(3000) });
return r.ok || r.status < 500;
} catch {
return false;
}
}
if (!(await probe(`${baseUrl}/health`)) && !(await probe(`${baseUrl}/`))) {
  print_error(`Service not reachable at ${baseUrl}`);
  process.exit(1);
}
const res = await fetch(endpoint, {
method: "POST",
headers: { "Content-Type": "application/json", Accept: "application/json" },
body: JSON.stringify({ source: "ai" }),
});
const text = await res.text();
try {
print(JSON.stringify(JSON.parse(text), null, 2));
} catch {
print(text);
}
FILE:scripts/crawl.sh
#!/usr/bin/env bash
set -euo pipefail
# Positional arguments: <platform> <target_url> [base_url]
PLATFORM="${1:-}"
TARGET_URL="${2:-}"
BASE_URL="${3:-${BIL_CRAWL_URL:-http://127.0.0.1:39002}}"
if [[ -z "$PLATFORM" || -z "$TARGET_URL" ]]; then
echo "Usage: $0 <bilibili|douyin|youtube|zhihu> <target_url> [base_url]" >&2
exit 2
fi
case "$PLATFORM" in
bilibili|douyin|youtube|zhihu) ;;
*)
echo "Unsupported platform: $PLATFORM" >&2
exit 2
;;
esac
encoded=$(node -e 'process.stdout.write(encodeURIComponent(process.argv[1]))' "$TARGET_URL")
endpoint="$BASE_URL/start-crawl/$PLATFORM/$encoded"
payload='{"source":"ai"}'
# Best-effort health check
curl -fsS "$BASE_URL/health" >/dev/null 2>&1 || curl -fsS "$BASE_URL/" >/dev/null 2>&1 || {
echo "Service not reachable at $BASE_URL" >&2
exit 1
}
curl -fsS -X POST "$endpoint" \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
--data "$payload"
echo
FILE:scripts/crawl_mcp.mjs
#!/usr/bin/env node
/**
 * Collection through MCP (URL-based tools only)
 * Usage: node crawl_mcp.mjs <tool_name> <target_url> [base_url]
 */
import { execFileSync } from "node:child_process";
import { fileURLToPath } from "node:url";
import { dirname, join } from "node:path";
import { print_error } from "./stdio_utf8.mjs";
const SUPPORTED = ["crawl_bilibili", "crawl_douyin", "crawl_youtube", "crawl_zhihu"];
const [tool, targetUrl, baseUrl = process.env.BIL_CRAWL_URL || "http://127.0.0.1:39002"] = process.argv.slice(2);
if (!tool || !targetUrl) {
  print_error(
    `Usage: node crawl_mcp.mjs <${SUPPORTED.join("|")}> <target_url> [base_url]`
  );
  process.exit(2);
}
if (!SUPPORTED.includes(tool)) {
  print_error(`Unsupported tool: ${tool}`);
  print_error(
    "Use mcp_tool.mjs for: bilibili_search / bilibili_uploader / bilibili_popular / bilibili_weekly / bilibili_history"
  );
  process.exit(2);
}
const argsJson = JSON.stringify({ url: targetUrl });
const scriptDir = dirname(fileURLToPath(import.meta.url));
execFileSync(process.execPath, [join(scriptDir, "mcp_tool.mjs"), tool, argsJson, baseUrl], { stdio: "inherit" });
FILE:scripts/list_archives_mcp.mjs
#!/usr/bin/env node
/**
 * Archive query
 * Usage: node list_archives_mcp.mjs [platform] [keyword] [limit] [base_url]
 */
import { execFileSync } from "node:child_process";
import { fileURLToPath } from "node:url";
import { dirname, join } from "node:path";
const [
platform = "",
keyword = "",
limitRaw = "50",
baseUrl = process.env.BIL_CRAWL_URL || "http://127.0.0.1:39002",
] = process.argv.slice(2);
/** @type {Record<string, unknown>} */
const out = {};
if (platform) out.platform = platform;
if (keyword) out.keyword = keyword;
const n = Number(limitRaw);
if (!Number.isNaN(n) && n > 0) out.limit = Math.floor(n);
const argsJson = Object.keys(out).length ? JSON.stringify(out) : "{}";
const scriptDir = dirname(fileURLToPath(import.meta.url));
execFileSync(process.execPath, [join(scriptDir, "mcp_tool.mjs"), "list_archives", argsJson, baseUrl], { stdio: "inherit" });
## Script usage (bash)
All scripts live in `skills/media-crawler-local/scripts/`; the working directory is the openclaw workspace root.
### 1. Quick collection (REST, `crawl.sh`)
```bash
bash skills/media-crawler-local/scripts/crawl.sh <platform> <url> [base_url]
```
Example:
```bash
bash skills/media-crawler-local/scripts/crawl.sh bilibili "https://www.bilibili.com/video/BV1xx411c7mD"
```
### 2. Collection through MCP (`crawl_mcp.sh`, URL-based tools only)
```bash
bash skills/media-crawler-local/scripts/crawl_mcp.sh <tool_name> <target_url> [base_url]
```
Example:
```bash
bash skills/media-crawler-local/scripts/crawl_mcp.sh crawl_bilibili "https://www.bilibili.com/video/BV1xx411c7mD"
```
Supported tools: `crawl_bilibili` / `crawl_douyin` / `crawl_youtube` / `crawl_zhihu`
> For the remaining tools (bilibili_search / bilibili_uploader / bilibili_popular / bilibili_weekly / bilibili_history / list_archives / get_task_data), use `mcp_tool.sh`.
### 3. Archive query (`list_archives_mcp.sh`)
```bash
bash skills/media-crawler-local/scripts/list_archives_mcp.sh [platform] [keyword] [limit] [base_url]
```
Example:
```bash
bash skills/media-crawler-local/scripts/list_archives_mcp.sh bilibili "蛋神" 20
```
### 4. Generic tool call (`mcp_tool.sh`)
```bash
bash skills/media-crawler-local/scripts/mcp_tool.sh <tool_name> [args_json] [base_url]
```
Example:
```bash
# Bilibili search
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_search '{"keyword":"蛋神"}'
# Uploader video list
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_uploader '{"mid":"123456"}'
# Popular videos
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_popular '{}'
# Weekly must-watch (latest issue)
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_weekly '{}'
# Weekly must-watch (specific issue)
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_weekly '{"number":364}'
# Watch history (default parameters)
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_history '{}'
# Watch history (explicit first-page cursor)
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_history '{"max":0,"view_at":0,"business":"","ps":20,"type":"all"}'
# Watch history (explicit number of pages)
bash skills/media-crawler-local/scripts/mcp_tool.sh crawl_bilibili_history '{"page_count":2}'
# Read a task's comment data
bash skills/media-crawler-local/scripts/mcp_tool.sh get_task_data '{"task_id":"BV1xx411c7mD-123456","type":"comments"}'
```
---
## Execution flow (bash scripts)
1. **Health check**: `GET /` (if the connection fails, remind the user to start the app first).
2. **Start the collection**: prefer the REST endpoint (`crawl.sh`); use MCP (`mcp_tool.sh`) when extra tool parameters are needed.
3. **Handle the result**:
   - Give the user a brief summary (task ID, status, key fields)
   - When there is a lot of content, show only the first few items and mention that `get_task_data` can read or filter the rest