@clawhub-aurora2035-5d8ab66ad3
Local TTS skill using OpenVINO Qwen3-TTS for voice cloning and emotion style synthesis, supporting QQBOT workflows with strict audio length and file retentio...
# Xeon TTS
A local speech-synthesis skill built on the OpenVINO Qwen3-TTS Base/Custom models, designed for OpenClaw QQBOT workflows.
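## Goals
- Install two local services: Flask TTS on port 5002 and the Node TTS workflow on port 9002
- Configure the target machine's own OpenClaw config automatically, writing only `channels.qqbot.xeonTts`
- Coexist with xeonasr without overwriting `tools.media.audio` or `channels.qqbot.stt`
- Support two workflows: voice cloning and TTS in a specified tone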
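## When to invoke xeontts
Use xeontts only in these cases:
- The user explicitly asks to "clone a voice" (克隆音色 / 克隆声音) or "copy my voice" (复制我的声音)
- The user asks to read aloud, announce, or generate speech "in a certain tone"
- The user wants audio generated to a local file rather than a transcription
Never route these requests to xeontts:
- Speech recognition
- Speech-to-text
- Dictation
- STT / ASR
These requests must go to xeonasr to avoid task conflicts.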
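The sketch below shows this routing rule, assuming plain substring matching. The `routeMessage` helper and both keyword lists are hypothetical. They only echo the triggers listed above; the real matching is done by `detectIntent()` in `server.js`. Because the ASR keywords are checked first, a mixed request always goes to xeonasr.

```javascript
// Hypothetical keyword lists that echo the triggers listed above (not exhaustive).
const ASR_KEYWORDS = ['识别语音', '语音转文字', '听写', 'stt', 'asr'];
const TTS_KEYWORDS = ['克隆音色', '克隆声音', '复制我的声音', '语气', '朗读', '播报', '生成语音'];

// Returns 'xeonasr', 'xeontts', or 'other' for a QQBOT text message.
function routeMessage(text) {
  const lower = String(text || '').toLowerCase();
  // Transcription intents win, so mixed requests never reach xeontts.
  if (ASR_KEYWORDS.some((kw) => lower.includes(kw))) return 'xeonasr';
  if (TTS_KEYWORDS.some((kw) => lower.includes(kw))) return 'xeontts';
  return 'other';
}
```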
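## OpenClaw / QQBOT usage rules
### Rule 1: voice cloning always takes two steps
When the user says "I want to clone a voice" (我要克隆音色):
1. Immediately switch the current session to the clone flow
2. Ask the user to upload 3 to 5 seconds of reference audio
3. Do not start synthesis until the reference audio has arrived
4. If xeonasr is also installed, voice messages from QQBOT hit ASR first; ASR must then hand the audio over to xeontts instead of transcribing it as usual
5. Validate the duration as soon as the audio arrives
6. If it is shorter than 3 seconds or longer than 5 seconds, reject it and ask for a new upload
7. Once validation passes, ask the user for the text to be read aloud
8. Generate the audio with the Base model and write it to disk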
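The two-step clone flow can be modeled as a small state machine. This is a sketch under stated assumptions. The stage names `idle`, `awaiting_reference_audio`, and `awaiting_clone_text` come from `server.js`. The event shapes, the reply labels, and the return to `idle` after synthesis are invented here for illustration.

```javascript
// Reference-audio bounds from Rule 1 (inclusive).
const MIN_REF_SEC = 3;
const MAX_REF_SEC = 5;

// Hypothetical transition function: (current stage, event) -> { stage, reply }.
function nextCloneStage(stage, event) {
  switch (event.type) {
    case 'clone_request':
      return { stage: 'awaiting_reference_audio', reply: 'ask_for_reference_audio' };
    case 'reference_audio':
      if (stage !== 'awaiting_reference_audio') return { stage, reply: 'ignore' };
      if (event.durationSec < MIN_REF_SEC || event.durationSec > MAX_REF_SEC) {
        // Reject and stay in the same stage so the user can re-upload.
        return { stage, reply: 'reject_duration' };
      }
      return { stage: 'awaiting_clone_text', reply: 'ask_for_text' };
    case 'text':
      if (stage === 'awaiting_clone_text') {
        return { stage: 'idle', reply: 'synthesize_with_base_model' };
      }
      if (stage === 'awaiting_reference_audio') {
        // Never synthesize before a valid reference clip arrives.
        return { stage, reply: 'remind_reference_audio' };
      }
      return { stage, reply: 'ignore' };
    default:
      return { stage, reply: 'ignore' };
  }
}
```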
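### Rule 2: tone-specific generation uses the Custom model by default
When the user says "read ... in a happy tone" (用开心的语气朗读……) or "generate speech ..." (生成语音……):
1. Check whether the user specified a tone
2. If not, default to `普通` (neutral)
3. Generate the audio with the Custom model
4. Save the result to the local `outputs/` directory
5. Reply with the file path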
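### Rule 3: duration limits are mandatory
- Reference audio: 3 to 5 seconds
- Base clone output: about 20 seconds at most
- Custom output: about 30 seconds at most
If the user explicitly asks for something longer, or the text length is estimated to clearly exceed the limit, tell the user to split the content instead of submitting the inference blindly.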
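Here is a sketch of the length estimate, assuming the default `estimatedCharsPerSecond` of 4 from `config.example.json`. Like `estimateOutputDurationSeconds()` in `server.js`, it removes whitespace and counts Unicode code points. `checkLength` and `LIMITS_SEC` are hypothetical names.

```javascript
// Defaults assumed from config.example.json and the caps listed above.
const CHARS_PER_SECOND = 4;
const LIMITS_SEC = { clone: 20, custom: 30 };

// Estimated output seconds: code points (whitespace removed) / chars per second.
function estimateSeconds(text) {
  const chars = Array.from(String(text).replace(/\s+/g, '')).length;
  return chars / CHARS_PER_SECOND;
}

// Hypothetical pre-flight check run before submitting inference.
function checkLength(text, mode) {
  const limit = LIMITS_SEC[mode];
  if (limit === undefined) throw new Error(`unknown mode: ${mode}`);
  const estimated = estimateSeconds(text);
  return { ok: estimated <= limit, estimated, maxChars: limit * CHARS_PER_SECOND };
}
```

With these defaults the clone budget works out to about 20 × 4 = 80 characters, and the Custom budget to about 30 × 4 = 120 characters.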
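### Rule 4: always state the file retention period
- Reference audio and generated results are kept for 7 days by default
- After that, the system automatically deletes old files from `references/` and `outputs/`
- Tell the user this explicitly, both when asking for reference audio and when reporting that generation is finished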
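The retention rule reduces to an mtime comparison. This sketch mirrors the check in `cleanupExpiredFiles()` in `server.js`, where a `fileRetentionDays` of 0 or less disables cleanup. `isExpired` itself is a hypothetical helper.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// A file is expired when its mtime is older than (now - retentionDays).
// retentionDays <= 0 disables cleanup, as in server.js.
function isExpired(mtimeMs, nowMs, retentionDays = 7) {
  if (retentionDays <= 0) return false;
  return mtimeMs < nowMs - retentionDays * DAY_MS;
}
```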
## Installation
When OpenClaw, QQBOT, or another agent is asked to install this skill, follow these steps:
1. Install the skill
```bash
clawhub install xeontts
cd "$HOME/.openclaw/workspace/skills/xeontts"
```
2. Run the installer script
```bash
bash install.sh
```
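3. The installer script then:
- Creates a Python 3.10 environment
- Installs `xdp-tts-service` from PyPI
- Generates `config.json` and `tts_config.json`
- Tries to download the Base / Custom OV models
- Downloads the original Base checkpoint only when the legacy compatibility parameter is explicitly supplied
- Configures OpenClaw's `channels.qqbot.xeonTts`
- Starts the services on ports 5002 and 9002
- Registers user-level systemd services
- Runs `self_check.sh`
Current default model repositories: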
- `aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8`
- `aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8`
## Runtime ports
| Service | Port | Role |
|------|------|------|
| Flask TTS | 5002 | Runs the actual TTS inference |
| Node Workflow | 9002 | Parses QQBOT tasks, maintains session state, validates audio/text duration |
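## OpenClaw configuration contract
xeontts writes only the `channels.qqbot.xeonTts` block. Its core fields are shown below; the installer also records fields such as `healthUrl`, `outputDir`, and the duration limits: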
```json
{
  "channels": {
    "qqbot": {
      "xeonTts": {
        "enabled": true,
        "baseUrl": "http://127.0.0.1:9002",
        "cloneModel": "qwen3_tts_0.6b_base_openvino",
        "customModel": "qwen3_tts_0.6b_custom_openvino"
      }
    }
  }
}
```
This means xeontts:
- Does not overwrite an existing `channels.qqbot.stt`
- Does not touch `tools.media.audio`
- Does not compete with xeonasr for the same STT pipeline
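Here is a sketch of a merge that honors this contract, assuming the OpenClaw config is an already-parsed JSON object. `mergeXeonTts` is a hypothetical helper. `configure_openclaw_integration.sh` performs the same spread-and-overwrite on `channels.qqbot.xeonTts`, with extra fields such as `healthUrl`, and never touches sibling keys. `structuredClone` needs Node 17+, which the `>=18` engine requirement covers.

```javascript
// Returns a new config with only channels.qqbot.xeonTts set; the input is not mutated.
function mergeXeonTts(config) {
  const next = structuredClone(config || {});
  next.channels = next.channels || {};
  next.channels.qqbot = next.channels.qqbot || {};
  // Keep any existing xeonTts fields, then overwrite the managed ones.
  next.channels.qqbot.xeonTts = {
    ...(next.channels.qqbot.xeonTts || {}),
    enabled: true,
    baseUrl: 'http://127.0.0.1:9002',
    cloneModel: 'qwen3_tts_0.6b_base_openvino',
    customModel: 'qwen3_tts_0.6b_custom_openvino',
  };
  // channels.qqbot.stt and tools.media.audio are deliberately left alone.
  return next;
}
```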
## Common commands
```bash
cd "$HOME/.openclaw/workspace/skills/xeontts"
bash start_all.sh
bash stop_tts.sh
bash self_check.sh
curl http://127.0.0.1:5002/api/health
curl http://127.0.0.1:9002/health
```
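## Key endpoints
- `POST /api/workflow/message`
  - Decides from the user's message whether this is a clone or a custom TTS request, or prompts for the missing reference audio
- `POST /api/workflow/reference-audio`
  - Uploads reference audio and stores it after checking that it is 3 to 5 seconds long
- `POST /api/tts/custom-speak`
  - Calls the Custom model directly to generate speech
- `POST /api/tts/clone-speak`
  - Calls the Base model directly for voice cloning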
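Here is a sketch of building `fetch()` requests for the first and third endpoints. The JSON body fields come from the curl examples in `README.md`, and the helper names are hypothetical. Defaulting `style` to `普通` and `speakerId` to `Vivian` assumes the endpoint accepts the same values the workflow uses.

```javascript
const BASE_URL = 'http://127.0.0.1:9002';

// Builds { url, init } for POST /api/workflow/message.
function workflowMessageRequest(sessionId, userId, text) {
  return {
    url: `${BASE_URL}/api/workflow/message`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ sessionId, userId, text }),
    },
  };
}

// Builds { url, init } for POST /api/tts/custom-speak.
function customSpeakRequest(text, style = '普通', speakerId = 'Vivian') {
  return {
    url: `${BASE_URL}/api/tts/custom-speak`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, style, speakerId }),
    },
  };
}

// Usage (Node 18+ provides a global fetch):
// const { url, init } = workflowMessageRequest('default', 'qq-1001', '我要克隆音色');
// const reply = await fetch(url, init).then((r) => r.json());
```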
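## Troubleshooting
- If port `5002` is unreachable, check `tts.log` first
- If port `9002` is unreachable, check `skill.log` first
- If reference audio keeps getting rejected, first confirm that a working `ffprobe` is available; the current version can also validate WAV reference audio without `ffprobe`
- If the user's intent is transcription, do not use xeontts
- If the Base model reports an error, first ask the user for a cleaner 3 to 5 second reference clip
- The default release only requires `Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8`; the original Base checkpoint is no longer required by default
- `BASE_CHECKPOINT_PATH` is needed only when an older exported model lacks the processor or speech tokenizer weights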
FILE:.clawhub.json
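{
"name": "xeon-tts",
"version": "1.0.0",
"description": "A local speech-synthesis skill built on the OpenVINO Qwen3-TTS Base/Custom models. A one-step installer creates the Python environment, installs xdp-tts-service, generates the config, and writes a dedicated TTS workflow config for OpenClaw's QQBOT.",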
"author": "aurora2035",
"license": "MIT",
"tags": ["xeon", "tts", "voice-clone", "qqbot", "openvino", "qwen3", "openclaw", "local", "audio"],
"main": "server.js",
"repository": "",
"homepage": "",
"config": {
"port": 9002,
"flaskTtsUrl": "http://127.0.0.1:5002/api/tts/synthesize",
"cloneModel": "qwen3_tts_0.6b_base_openvino",
"customModel": "qwen3_tts_0.6b_custom_openvino"
},
"scripts": {
"install:all": "bash ./install.sh",
"init": "xdp-tts-init-config --output ./tts_config.json",
"start": "node server.js",
"start:tts": "xdp-tts-service --host 127.0.0.1 --port 5002 --config ./tts_config.json"
},
"engines": {
"node": ">=18.0.0",
"python": "3.10"
}
}
FILE:config.example.json
{
"port": 9002,
"flaskTtsUrl": "http://127.0.0.1:5002/api/tts/synthesize",
"flaskHealthUrl": "http://127.0.0.1:5002/api/health",
"cloneModel": "qwen3_tts_0.6b_base_openvino",
"cloneMode": "voice_clone_xvector",
"customModel": "qwen3_tts_0.6b_custom_openvino",
"defaultSpeaker": "Vivian",
"defaultLanguage": "Chinese",
"minReferenceDurationSec": 3,
"maxReferenceDurationSec": 5,
"maxCloneOutputSeconds": 20,
"maxCustomOutputSeconds": 30,
"estimatedCharsPerSecond": 4,
"fileRetentionDays": 7,
"outputDir": "./outputs",
"referencesDir": "./references",
"runtimeDir": "./runtime",
"sessionStateFile": "./runtime/session_state.json",
"openclawSession": "default"
}
FILE:configure_openclaw_integration.sh
#!/usr/bin/env bash
set -euo pipefail
log() {
printf '[xeontts-config] %s\n' "$*"
}
fail() {
printf '[xeontts-config] ERROR: %s\n' "$*" >&2
exit 1
}
timestamp() {
date +%Y%m%d-%H%M%S
}
OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
CONFIG_FILE="${CONFIG_FILE:-$OPENCLAW_HOME/openclaw.json}"
RUN_ID="$(timestamp)"
QQBOT_TTS_BASE_URL="${QQBOT_TTS_BASE_URL:-http://127.0.0.1:9002}"
QQBOT_TTS_HEALTH_URL="${QQBOT_TTS_HEALTH_URL:-http://127.0.0.1:9002/health}"
OUTPUT_DIR="${OUTPUT_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/outputs}"
command -v node >/dev/null 2>&1 || fail "missing required command: node"
[[ -f "$CONFIG_FILE" ]] || fail "OpenClaw config not found: $CONFIG_FILE"
export CONFIG_FILE RUN_ID QQBOT_TTS_BASE_URL QQBOT_TTS_HEALTH_URL OUTPUT_DIR
node <<'NODE'
const fs = require('node:fs');
const {
CONFIG_FILE,
RUN_ID,
QQBOT_TTS_BASE_URL,
QQBOT_TTS_HEALTH_URL,
OUTPUT_DIR,
} = process.env;
const backupPath = `${CONFIG_FILE}.bak.${RUN_ID}`;
if (!fs.existsSync(backupPath)) {
fs.copyFileSync(CONFIG_FILE, backupPath);
}
const raw = fs.readFileSync(CONFIG_FILE, 'utf8');
const config = JSON.parse(raw);
config.channels = config.channels || {};
config.channels.qqbot = config.channels.qqbot || {};
config.channels.qqbot.xeonTts = {
...(config.channels.qqbot.xeonTts || {}),
enabled: true,
baseUrl: QQBOT_TTS_BASE_URL,
healthUrl: QQBOT_TTS_HEALTH_URL,
outputDir: OUTPUT_DIR,
cloneModel: 'qwen3_tts_0.6b_base_openvino',
customModel: 'qwen3_tts_0.6b_custom_openvino',
minReferenceDurationSec: 3,
maxReferenceDurationSec: 5,
maxCloneOutputSeconds: 20,
maxCustomOutputSeconds: 30,
modeRouting: {
cloneIntentKeywords: ['克隆音色', '克隆声音', 'voice clone'],
customIntentKeywords: ['生成语音', '朗读', '播报', 'tts'],
asrGuardKeywords: ['转写', '识别语音', 'speech to text', 'asr'],
},
};
if (config.skills && typeof config.skills === 'object' && 'xeontts' in config.skills) {
delete config.skills.xeontts;
if (Object.keys(config.skills).length === 0) {
delete config.skills;
}
}
fs.writeFileSync(CONFIG_FILE, `${JSON.stringify(config, null, 2)}\n`);
console.log('updated openclaw config with xeonTts block');
NODE
log "Wrote OpenClaw QQBOT TTS config: $CONFIG_FILE"
log "Note: xeontts does not modify tools.media.audio or channels.qqbot.stt, so it does not conflict with xeonasr over ports or config"
FILE:install.sh
#!/usr/bin/env bash
set -euo pipefail
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SKILL_DIR"
SKIP_START=0
SETUP_ARGS=()
while [[ $# -gt 0 ]]; do
case "$1" in
--skip-start) SKIP_START=1; shift ;;
--force|--skip-deps) SETUP_ARGS+=("$1"); shift ;;
*) log_error "Unknown argument: $1"; exit 1 ;;
esac
done
command -v node >/dev/null 2>&1 || { log_error "Node.js 18+ is required"; exit 1; }
command -v npm >/dev/null 2>&1 || { log_error "npm is required"; exit 1; }
log_step "Installing the Python environment and model config"
# The ${arr[@]+...} form keeps `set -u` from failing on an empty array in older bash versions
bash "$SKILL_DIR/setup_env.sh" ${SETUP_ARGS[@]+"${SETUP_ARGS[@]}"}
log_step "Installing Node dependencies"
npm install
log_step "Configuring the OpenClaw QQBOT TTS integration"
bash "$SKILL_DIR/configure_openclaw_integration.sh"
log_step "Installing start-on-boot services"
bash "$SKILL_DIR/install_systemd_services.sh"
if [[ "$SKIP_START" -ne 1 ]]; then
log_step "Starting local services"
bash "$SKILL_DIR/start_all.sh"
else
log_warn "Automatic start skipped; run ./start_all.sh manually"
fi
log_step "Running self-check"
bash "$SKILL_DIR/self_check.sh"
log_info "Xeon TTS installation complete"
FILE:install_systemd_services.sh
#!/usr/bin/env bash
set -euo pipefail
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SYSTEMD_USER_DIR="${SYSTEMD_USER_DIR:-$HOME/.config/systemd/user}"
TTS_UNIT_NAME="xeontts-tts.service"
NODE_UNIT_NAME="xeontts-node.service"
TTS_UNIT_PATH="$SYSTEMD_USER_DIR/$TTS_UNIT_NAME"
NODE_UNIT_PATH="$SYSTEMD_USER_DIR/$NODE_UNIT_NAME"
NODE_BIN="$(command -v node || true)"
TTS_BIN="$SKILL_DIR/venv/bin/xdp-tts-service"
[[ -n "$NODE_BIN" ]] || { echo "node not found" >&2; exit 1; }
[[ -x "$TTS_BIN" ]] || { echo "$TTS_BIN not found" >&2; exit 1; }
[[ -f "$SKILL_DIR/tts_config.json" ]] || { echo "tts_config.json not found" >&2; exit 1; }
[[ -f "$SKILL_DIR/config.json" ]] || { echo "config.json not found" >&2; exit 1; }
mkdir -p "$SYSTEMD_USER_DIR"
cat > "$TTS_UNIT_PATH" <<EOF
[Unit]
Description=Xeon TTS Flask Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
WorkingDirectory=$SKILL_DIR
ExecStart=$TTS_BIN --host 127.0.0.1 --port 5002 --config $SKILL_DIR/tts_config.json
Restart=always
RestartSec=3
Environment=HOME=$HOME
Environment=PATH=$HOME/.local/bin:$HOME/.npm-global/bin:$HOME/bin:/usr/local/bin:/usr/bin:/bin
Environment=XDP_TTS_CONFIG=$SKILL_DIR/tts_config.json
[Install]
WantedBy=default.target
EOF
cat > "$NODE_UNIT_PATH" <<EOF
[Unit]
Description=Xeon TTS Workflow Gateway
After=network-online.target $TTS_UNIT_NAME
Wants=network-online.target $TTS_UNIT_NAME
[Service]
Type=simple
WorkingDirectory=$SKILL_DIR
ExecStart=$NODE_BIN $SKILL_DIR/server.js
Restart=always
RestartSec=3
Environment=HOME=$HOME
Environment=PATH=$HOME/.local/bin:$HOME/.npm-global/bin:$HOME/bin:/usr/local/bin:/usr/bin:/bin
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now "$TTS_UNIT_NAME"
systemctl --user enable --now "$NODE_UNIT_NAME"
echo "xeontts start-on-boot enabled"
FILE:package-lock.json
{
"name": "xeon-tts",
"version": "1.0.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "xeon-tts",
"version": "1.0.0",
"license": "MIT",
"dependencies": {
"formidable": "^3.5.4"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/@noble/hashes": {
"version": "1.8.0",
"resolved": "https://registry.npmjs.org/@noble/hashes/-/hashes-1.8.0.tgz",
"integrity": "sha512-jCs9ldd7NwzpgXDIf6P3+NrHh9/sD6CQdxHyjQI+h/6rDNo88ypBxxz45UDuZHz9r3tNz7N/VInSVoVdtXEI4A==",
"license": "MIT",
"engines": {
"node": "^14.21.3 || >=16"
},
"funding": {
"url": "https://paulmillr.com/funding/"
}
},
"node_modules/@paralleldrive/cuid2": {
"version": "2.3.1",
"resolved": "https://registry.npmjs.org/@paralleldrive/cuid2/-/cuid2-2.3.1.tgz",
"integrity": "sha512-XO7cAxhnTZl0Yggq6jOgjiOHhbgcO4NqFqwSmQpjK3b6TEE6Uj/jfSk6wzYyemh3+I0sHirKSetjQwn5cZktFw==",
"license": "MIT",
"dependencies": {
"@noble/hashes": "^1.1.5"
}
},
"node_modules/asap": {
"version": "2.0.6",
"resolved": "https://registry.npmjs.org/asap/-/asap-2.0.6.tgz",
"integrity": "sha512-BSHWgDSAiKs50o2Re8ppvp3seVHXSRM44cdSsT9FfNEUUZLOGWVCsiWaRPWM1Znn+mqZ1OfVZ3z3DWEzSp7hRA==",
"license": "MIT"
},
"node_modules/dezalgo": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/dezalgo/-/dezalgo-1.0.4.tgz",
"integrity": "sha512-rXSP0bf+5n0Qonsb+SVVfNfIsimO4HEtmnIpPHY8Q1UCzKlQrDMfdobr8nJOOsRgWCyMRqeSBQzmWUMq7zvVig==",
"license": "ISC",
"dependencies": {
"asap": "^2.0.0",
"wrappy": "1"
}
},
"node_modules/formidable": {
"version": "3.5.4",
"resolved": "https://registry.npmjs.org/formidable/-/formidable-3.5.4.tgz",
"integrity": "sha512-YikH+7CUTOtP44ZTnUhR7Ic2UASBPOqmaRkRKxRbywPTe5VxF7RRCck4af9wutiZ/QKM5nME9Bie2fFaPz5Gug==",
"license": "MIT",
"dependencies": {
"@paralleldrive/cuid2": "^2.2.2",
"dezalgo": "^1.0.4",
"once": "^1.4.0"
},
"engines": {
"node": ">=14.0.0"
},
"funding": {
"url": "https://ko-fi.com/tunnckoCore/commissions"
}
},
"node_modules/once": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz",
"integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==",
"license": "ISC",
"dependencies": {
"wrappy": "1"
}
},
"node_modules/wrappy": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz",
"integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==",
"license": "ISC"
}
}
}
FILE:package.json
{
"name": "xeon-tts",
"version": "1.0.0",
"description": "Xeon TTS local speech-synthesis skill: voice cloning and styled TTS workflows for OpenClaw QQBOT",
"main": "server.js",
"scripts": {
"install:all": "bash ./install.sh",
"self-check": "bash ./self_check.sh",
"start": "node server.js",
"start:tts": "xdp-tts-service --host 127.0.0.1 --port 5002 --config ./tts_config.json",
"dev": "node --watch server.js"
},
"keywords": ["xeon", "tts", "voice-clone", "speech", "qqbot", "openclaw", "openvino", "qwen3"],
"author": "aurora2035",
"license": "MIT",
"engines": {
"node": ">=18.0.0"
},
"dependencies": {
"formidable": "^3.5.4"
}
}
FILE:README.md
# Xeon TTS Skill
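A local speech-synthesis skill built on the OpenVINO Qwen3-TTS Base/Custom models. It provides two workflows for OpenClaw's QQBOT:
- Voice cloning: the user first says they want to clone a voice, then uploads 3 to 5 seconds of reference audio, and the Base model generates the target speech
- Styled TTS: the user directly asks to "read a passage in a certain tone", and the Custom model generates the audio and saves it to disk
This skill deliberately stays off xeon_asr's ports and global audio config:
- Flask TTS service: 5002
- Node workflow gateway: 9002
- OpenClaw config written to: `channels.qqbot.xeonTts`
It does not overwrite `tools.media.audio` or modify `channels.qqbot.stt`, so it can coexist with an installed xeonasr.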
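## Architecture
Two-service architecture:
| Service | Port | Type | Role |
|------|------|------|------|
| Flask TTS | 5002 | Python | Loads the Base/Custom OpenVINO TTS models and runs inference |
| TTS Workflow | 9002 | Node.js | Maintains QQBOT session state, validates durations, saves outputs, routes clone/custom requests |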
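## Models and capabilities
The default configuration uses two models:
- Base: `qwen3_tts_0.6b_base_openvino`, for voice cloning
- Custom: `qwen3_tts_0.6b_custom_openvino`, for regular TTS and tone-specific generation
Default duration limits:
- Reference audio: must be between 3 and 5 seconds
- Base clone output: about 20 seconds at most
- Custom output: about 30 seconds at most
- Reference audio and generated results: kept for 7 days by default, then cleaned up automatically
If the user's text explicitly asks for more than the maximum duration, or the estimate based on text length exceeds the limit, the skill rejects the request outright and asks the user to split the content.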
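Here is a sketch of the explicit-duration check. It reuses the regular expressions from `extractRequestedDurationSeconds()` in `server.js`. `requestedSeconds` and `exceedsLimit` are hypothetical names. Only Arabic digits are recognized, so a request such as `三分钟` is not detected.

```javascript
// Minutes are checked before seconds, matching the order in server.js.
function requestedSeconds(text) {
  const minutes = String(text).match(/(\d+(?:\.\d+)?)\s*(分钟|分)/);
  if (minutes) return Number(minutes[1]) * 60;
  const seconds = String(text).match(/(\d+(?:\.\d+)?)\s*秒/);
  return seconds ? Number(seconds[1]) : null;
}

// limitSec is the relevant cap: 20 for Base clone output, 30 for Custom output.
function exceedsLimit(text, limitSec) {
  const requested = requestedSeconds(text);
  return requested !== null && requested > limitSec;
}
```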
## Quick start
### 1. Install the skill
```bash
clawhub install xeontts
cd "$HOME/.openclaw/workspace/skills/xeontts"
bash install.sh
```
If you are running from a source checkout:
```bash
cd /path/to/xeon_tts
bash install.sh
```
### 2. Model download
By default, the installer now tries to download the models from these two Hugging Face repositories:
```text
aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8
aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8
```
To switch to other repositories later, override these environment variables before installing:
```bash
export BASE_MODEL_REPO=your-org/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8
export CUSTOM_MODEL_REPO=your-org/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8
bash install.sh
```
`xdp-tts-service` is now published on PyPI, so by default the installer simply runs:
```bash
pip install xdp-tts-service
```
Override the install source only when you need to test a local wheel or a private package:
```bash
export XDP_TTS_PIP_SPEC=/absolute/path/to/xdp_tts_service-0.1.0-py3-none-any.whl
bash install.sh
```
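### 3. Does the Base model still need the original checkpoint?
If you uploaded a `Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8` exported with the latest conversion script, and the export directory already contains the processor files and the `speech_tokenizer/` subdirectory, you no longer need to upload the original Base checkpoint separately.
Only two cases still require `BASE_CHECKPOINT_PATH`:
- You uploaded an older Base OV export that lacks the processor or speech tokenizer weights
- You later find that Base voice cloning on some machines still needs the old fallback tokenizer path
In other words, under the current release plan it is reasonable to upload only `Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8`.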
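## OpenClaw / QQBOT workflows
### 1. Voice cloning
The user says in QQBOT:
- `我要克隆音色` ("I want to clone a voice")
- `帮我克隆我的声音` ("Clone my voice for me")
The workflow runs like this:
1. xeontts recognizes a clone task and does not route it through xeonasr
2. The bot replies: please upload 3 to 5 seconds of reference audio
3. If xeonasr is installed on the same machine, QQBOT voice messages reach port 9001 first; ASR detects that the session is in the clone flow and forwards the audio to xeontts
4. xeontts validates the duration with `ffprobe` when available; if no working `ffprobe` exists, it falls back to reading the WAV header for WAV files
5. If the duration is out of range, it returns an error immediately
6. If it is within range, it saves the reference audio to the local `references/` directory
7. The bot then asks the user for the text to read aloud
8. The Base model generates the audio and writes the wav file to `outputs/`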
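The system tells the user explicitly that reference audio and generated results are kept for 7 days by default and then cleaned up automatically.
The key point: ASR still deletes its own temporary files, but only after handing the reference audio to the TTS side for storage. Voice cloning therefore no longer fails because ASR cleaned up the reference audio first.
Note: the default path is `voice_clone_xvector` because it tolerates imperfect reference audio better; it still uses the Base model.
If you later find that an older Base export cannot complete voice cloning on a target machine, supply an original Base checkpoint; this does not affect the current open-source default.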
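Notes:
- `ffprobe` is the media probing tool from the FFmpeg toolchain; it reads metadata such as audio duration, codec, and sample rate
- It is preferred here because it handles mp3, m4a, wav, and other formats more reliably
- The current source already has a no-`ffprobe` fallback for WAV, so `ffprobe` is not a hard dependency for WAV input; if you want the release to support more reference audio formats out of the box, installing FFmpeg on the target machine is still recommended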
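The WAV fallback works because a PCM WAV header is enough to compute the length: duration = data chunk size / (sampleRate × channels × bitsPerSample / 8). For example, 16 kHz mono 16-bit audio uses 32,000 bytes per second, so a 128,000-byte data chunk lasts 4 seconds. The sketch below parses an in-memory `Buffer` and assumes uncompressed PCM. By contrast, `readWavDurationSec()` in `server.js` reads from a file descriptor. `wavDurationSec` is a hypothetical name.

```javascript
// Returns the duration in seconds of a PCM WAV buffer, or null if it cannot be determined.
function wavDurationSec(buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    return null;
  }
  let offset = 12;
  let fmt = null;
  let dataSize = null;
  // Walk the RIFF chunks until both the 'fmt ' and 'data' headers are found.
  while (offset + 8 <= buf.length && (fmt === null || dataSize === null)) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ' && size >= 16 && body + 16 <= buf.length) {
      fmt = {
        channels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      dataSize = size;
    }
    // Chunks are word-aligned: an odd-sized chunk is followed by one padding byte.
    offset = body + size + (size % 2);
  }
  if (!fmt || dataSize === null) return null;
  const bytesPerSecond = fmt.sampleRate * fmt.channels * (fmt.bitsPerSample / 8);
  return bytesPerSecond ? dataSize / bytesPerSecond : null;
}
```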
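### 2. Tone-specific generation
The user says in QQBOT:
- `用开心的语气朗读:今天是个好日子` ("Read in a happy tone: today is a good day")
- `生成语音:请提醒我下午三点开会` ("Generate speech: remind me about the 3 p.m. meeting")
The workflow runs like this:
1. If the user names a tone, convert it into `instruct_text`
2. If not, use the `普通` (neutral) tone by default
3. Call the Custom model to generate the audio
4. Save the audio to `outputs/`
5. Reply with the saved path and state explicitly that files are kept for 7 days by default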
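Here is a simplified sketch of steps 1 and 2. The style regex is the one in `parseStyleAndContent()` in `server.js`. Taking everything after the first ASCII or full-width colon as the content is a simplification made here, and `parseCustomRequest` is a hypothetical name.

```javascript
// Splits a request into { style, content }; style defaults to 普通 (neutral).
function parseCustomRequest(message) {
  const text = String(message || '').trim();
  // Same style pattern as server.js, e.g. 用开心的语气 -> 开心.
  const styleMatch = text.match(/用(.{1,12}?)(?:的)?(?:语气|风格|声音)/);
  const style = styleMatch ? styleMatch[1].trim() : '普通';
  // Simplification: content is whatever follows the first ':' or ':'.
  const colon = text.search(/[::]/);
  const content = colon >= 0 ? text.slice(colon + 1).trim() : text;
  return { style, content };
}
```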
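## File cleanup policy
Automatic day-based cleanup is enabled by default:
- Reference audio under `references/` is kept for 7 days by default
- Generated results under `outputs/` are kept for 7 days by default
- Files past the retention period are cleaned up when the service starts and after each newly saved file
The installer first generates a local `config.json` from `config.example.json`, and that file already includes `fileRetentionDays`.
To change the retention period, edit `fileRetentionDays` in the generated local `config.json`; setting it to `0` disables automatic cleanup. A complete example: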
```json
{
  "port": 9002,
  "flaskTtsUrl": "http://127.0.0.1:5002/api/tts/synthesize",
  "flaskHealthUrl": "http://127.0.0.1:5002/api/health",
  "cloneModel": "qwen3_tts_0.6b_base_openvino",
  "cloneMode": "voice_clone_xvector",
  "customModel": "qwen3_tts_0.6b_custom_openvino",
  "defaultSpeaker": "Vivian",
  "defaultLanguage": "Chinese",
  "minReferenceDurationSec": 3,
  "maxReferenceDurationSec": 5,
  "maxCloneOutputSeconds": 20,
  "maxCustomOutputSeconds": 30,
  "estimatedCharsPerSecond": 4,
  "fileRetentionDays": 7,
  "outputDir": "./outputs",
  "referencesDir": "./references",
  "runtimeDir": "./runtime",
  "sessionStateFile": "./runtime/session_state.json",
  "openclawSession": "default"
}
```
## Management commands
```bash
cd "$HOME/.openclaw/workspace/skills/xeontts"
bash start_all.sh
bash stop_tts.sh
bash self_check.sh
curl http://127.0.0.1:5002/api/health
curl http://127.0.0.1:9002/health
```
## Main endpoints
### Workflow intent entry point
```bash
curl -X POST http://127.0.0.1:9002/api/workflow/message \
-H 'Content-Type: application/json' \
-d '{"sessionId":"default","userId":"qq-1001","text":"我要克隆音色"}'
```
### Upload reference audio
```bash
curl -X POST http://127.0.0.1:9002/api/workflow/reference-audio \
-F sessionId=default \
-F userId=qq-1001 \
-F [email protected]
```
### Continue with the clone text
```bash
curl -X POST http://127.0.0.1:9002/api/workflow/message \
-H 'Content-Type: application/json' \
-d '{"sessionId":"default","userId":"qq-1001","text":"请用我的音色说,今天下班早点休息。"}'
```
### Tone-specific generation
```bash
curl -X POST http://127.0.0.1:9002/api/tts/custom-speak \
-H 'Content-Type: application/json' \
-d '{"text":"欢迎使用 Xeon TTS","style":"开心","speakerId":"Vivian"}'
```
## Directory structure
```text
xeon_tts/
├── install.sh
├── setup_env.sh
├── configure_openclaw_integration.sh
├── install_systemd_services.sh
├── start_tts_service.sh
├── start_all.sh
├── stop_tts.sh
├── self_check.sh
├── server.js
├── config.example.json
├── tts_config.example.json
├── SKILL.md
└── README.md
```
## Release cleanup
Do not commit these local runtime artifacts when open-sourcing:
- `venv/`
- `node_modules/`
- `runtime/`
- `outputs/`
- `references/`
- `config.json`
- `tts_config.json`
- `*.log`
- `*.pid`
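## Notes
- This skill handles only TTS and voice cloning, not transcription
- If the user asks to "recognize speech" (识别语音) or "convert to text" (转文字), hand the request to xeonasr
- Without a working `ffprobe`, WAV reference audio can still be validated through the built-in fallback; to reliably support more reference audio formats, installing `ffmpeg` is still recommended
- The Base model still works best with clean, single-speaker reference audio that has little background noise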
FILE:self_check.sh
#!/usr/bin/env bash
set -euo pipefail
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
log_ok() { echo -e "${GREEN}[PASS]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_fail() { echo -e "${RED}[FAIL]${NC} $1"; }
OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
CONFIG_FILE="${CONFIG_FILE:-$OPENCLAW_HOME/openclaw.json}"
NODE_CONFIG_FILE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/config.json"
TTS_CONFIG_FILE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/tts_config.json"
FAIL_COUNT=0
pass() { log_ok "$1"; }
fail() { log_fail "$1"; FAIL_COUNT=$((FAIL_COUNT + 1)); }
check_http_health() {
local name="$1"
local url="$2"
if curl -fsS "$url" >/dev/null 2>&1; then
pass "$name health check passed: $url"
else
fail "$name health check failed: $url"
fi
}
check_json_value() {
local file_path="$1"
local label="$2"
local expression="$3"
if node -e '
const fs = require("node:fs");
const cfg = JSON.parse(fs.readFileSync(process.argv[1], "utf8"));
const fn = new Function("cfg", `return (${process.argv[2]});`);
if (!fn(cfg)) process.exit(1);
' "$file_path" "$expression"; then
pass "$label"
else
fail "$label"
fi
}
log_step "Checking local services"
check_http_health "Flask TTS" "http://127.0.0.1:5002/api/health"
check_http_health "Node TTS Workflow" "http://127.0.0.1:9002/health"
log_step "Checking local config files"
[[ -f "$NODE_CONFIG_FILE" ]] && pass "Found config.json" || fail "Missing config.json"
[[ -f "$TTS_CONFIG_FILE" ]] && pass "Found tts_config.json" || fail "Missing tts_config.json"
if [[ -f "$NODE_CONFIG_FILE" ]]; then
check_json_value "$NODE_CONFIG_FILE" "Node gateway listens on 9002" 'cfg.port === 9002'
check_json_value "$NODE_CONFIG_FILE" "Node gateway forwards to Flask 5002" 'cfg.flaskTtsUrl === "http://127.0.0.1:5002/api/tts/synthesize"'
fi
if [[ -f "$TTS_CONFIG_FILE" ]]; then
check_json_value "$TTS_CONFIG_FILE" "Base model configured" 'Boolean(cfg.qwen3_tts_0_6b_base_openvino || cfg["qwen3_tts_0.6b_base_openvino"])'
check_json_value "$TTS_CONFIG_FILE" "Custom model configured" 'Boolean(cfg.qwen3_tts_0_6b_custom_openvino || cfg["qwen3_tts_0.6b_custom_openvino"])'
fi
log_step "Checking OpenClaw config"
if [[ -f "$CONFIG_FILE" ]]; then
pass "Found OpenClaw config: $CONFIG_FILE"
check_json_value "$CONFIG_FILE" "channels.qqbot.xeonTts is enabled" 'cfg.channels?.qqbot?.xeonTts?.enabled === true'
check_json_value "$CONFIG_FILE" "channels.qqbot.xeonTts.baseUrl points to 9002" 'cfg.channels?.qqbot?.xeonTts?.baseUrl === "http://127.0.0.1:9002"'
else
fail "OpenClaw config not found: $CONFIG_FILE"
fi
log_step "Checking systemd status"
if systemctl --user is-enabled xeontts-tts.service >/dev/null 2>&1; then pass "xeontts-tts.service is enabled"; else fail "xeontts-tts.service is not enabled"; fi
if systemctl --user is-enabled xeontts-node.service >/dev/null 2>&1; then pass "xeontts-node.service is enabled"; else fail "xeontts-node.service is not enabled"; fi
if [[ "$FAIL_COUNT" -gt 0 ]]; then
log_fail "Self-check failed, FAIL=$FAIL_COUNT"
exit 1
fi
log_ok "Self-check passed"
FILE:server.js
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');
const os = require('node:os');
const { execFile } = require('node:child_process');
const formidable = require('formidable');
const SKILL_DIR = __dirname;
const CONFIG_PATH = path.join(SKILL_DIR, 'config.json');
const DEFAULT_CONFIG = {
port: 9002,
flaskTtsUrl: 'http://127.0.0.1:5002/api/tts/synthesize',
flaskHealthUrl: 'http://127.0.0.1:5002/api/health',
cloneModel: 'qwen3_tts_0.6b_base_openvino',
cloneMode: 'voice_clone_xvector',
customModel: 'qwen3_tts_0.6b_custom_openvino',
defaultSpeaker: 'Vivian',
defaultLanguage: 'Chinese',
minReferenceDurationSec: 3,
maxReferenceDurationSec: 5,
maxCloneOutputSeconds: 20,
maxCustomOutputSeconds: 30,
estimatedCharsPerSecond: 4,
fileRetentionDays: 7,
outputDir: './outputs',
referencesDir: './references',
runtimeDir: './runtime',
sessionStateFile: './runtime/session_state.json',
openclawSession: 'default',
};
function loadConfig() {
if (!fs.existsSync(CONFIG_PATH)) {
return normalizeConfig({});
}
try {
const raw = JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf8'));
return normalizeConfig(raw || {});
} catch (error) {
console.warn('[xeontts] failed to parse config.json, using defaults:', error.message);
return normalizeConfig({});
}
}
function normalizeConfig(input) {
const config = { ...DEFAULT_CONFIG, ...(input || {}) };
const retentionDays = Number(config.fileRetentionDays);
config.fileRetentionDays = Number.isFinite(retentionDays) && retentionDays >= 0 ? retentionDays : DEFAULT_CONFIG.fileRetentionDays;
config.outputDir = resolveLocalPath(config.outputDir);
config.referencesDir = resolveLocalPath(config.referencesDir);
config.runtimeDir = resolveLocalPath(config.runtimeDir);
config.sessionStateFile = resolveLocalPath(config.sessionStateFile);
return config;
}
function resolveLocalPath(value) {
if (!value) {
return value;
}
if (value.startsWith('~/')) {
return path.join(os.homedir(), value.slice(2));
}
if (path.isAbsolute(value)) {
return value;
}
return path.join(SKILL_DIR, value);
}
const config = loadConfig();
ensureRuntimeDirs();
cleanupManagedFiles();
function ensureRuntimeDirs() {
for (const dirPath of [config.outputDir, config.referencesDir, config.runtimeDir, path.dirname(config.sessionStateFile)]) {
fs.mkdirSync(dirPath, { recursive: true });
}
if (!fs.existsSync(config.sessionStateFile)) {
fs.writeFileSync(config.sessionStateFile, '{}\n');
}
}
function getRetentionNotice() {
if (Number(config.fileRetentionDays) <= 0) {
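return 'Automatic cleanup of stored files is disabled.';
}
return `Reference audio and generated results are kept for ${config.fileRetentionDays} days by default and then cleaned up automatically.`;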
}
function removeEmptyDirsRecursively(dirPath, stopDir) {
if (!dirPath || dirPath === stopDir) {
return;
}
try {
const entries = fs.readdirSync(dirPath);
if (entries.length > 0) {
return;
}
fs.rmdirSync(dirPath);
removeEmptyDirsRecursively(path.dirname(dirPath), stopDir);
} catch {
// ignore cleanup errors
}
}
function cleanupExpiredFiles(rootDir) {
if (Number(config.fileRetentionDays) <= 0) {
return 0;
}
const cutoffMs = Date.now() - Number(config.fileRetentionDays) * 24 * 60 * 60 * 1000;
let removedCount = 0;
function walk(currentDir) {
const entries = fs.readdirSync(currentDir, { withFileTypes: true });
for (const entry of entries) {
const fullPath = path.join(currentDir, entry.name);
if (entry.isDirectory()) {
walk(fullPath);
removeEmptyDirsRecursively(fullPath, rootDir);
continue;
}
try {
const stat = fs.statSync(fullPath);
if (stat.mtimeMs < cutoffMs) {
fs.rmSync(fullPath, { force: true });
removedCount += 1;
}
} catch {
// ignore cleanup errors
}
}
}
if (fs.existsSync(rootDir)) {
walk(rootDir);
}
return removedCount;
}
function cleanupManagedFiles() {
const removedReferences = cleanupExpiredFiles(config.referencesDir);
const removedOutputs = cleanupExpiredFiles(config.outputDir);
if (removedReferences > 0 || removedOutputs > 0) {
console.log(`[xeontts] auto-cleaned expired files: references=${removedReferences}, outputs=${removedOutputs}, retentionDays=${config.fileRetentionDays}`);
}
}
function loadState() {
try {
return JSON.parse(fs.readFileSync(config.sessionStateFile, 'utf8')) || {};
} catch {
return {};
}
}
function saveState(state) {
fs.writeFileSync(config.sessionStateFile, `${JSON.stringify(state, null, 2)}\n`);
}
function firstValue(value) {
if (Array.isArray(value)) {
return value[0];
}
return value;
}
function buildSessionKey(sessionId, userId) {
return `${String(sessionId || config.openclawSession || 'default').trim()}::${String(userId || 'anonymous').trim()}`;
}
function getSessionState(sessionId, userId) {
const state = loadState();
const key = buildSessionKey(sessionId, userId);
if (!state[key]) {
state[key] = {
stage: 'idle',
sessionId: sessionId || config.openclawSession || 'default',
userId: userId || 'anonymous',
referenceAudioPath: null,
referenceDurationSec: null,
lastOutputPath: null,
updatedAt: new Date().toISOString(),
};
}
return { state, key, session: state[key] };
}
function writeSessionState(sessionId, userId, updater) {
const current = getSessionState(sessionId, userId);
updater(current.session);
current.session.updatedAt = new Date().toISOString();
saveState(current.state);
return current.session;
}
function listPendingReferenceSessions(state) {
return Object.entries(state || {})
.filter(([, session]) => session?.stage === 'awaiting_reference_audio')
.map(([key, session]) => ({ key, session }));
}
function resolvePendingReferenceSession(sessionId, userId) {
const state = loadState();
const pending = listPendingReferenceSessions(state);
const normalizedSessionId = String(sessionId || '').trim();
const normalizedUserId = String(userId || '').trim();
if (normalizedSessionId && normalizedUserId) {
const exactKey = buildSessionKey(normalizedSessionId, normalizedUserId);
const exact = pending.find((item) => item.key === exactKey);
if (exact) {
return { matched: true, sessionId: exact.session.sessionId, userId: exact.session.userId, state, key: exact.key, session: exact.session, reason: 'exact_match' };
}
}
if (normalizedSessionId) {
const sameSession = pending.filter((item) => item.session.sessionId === normalizedSessionId);
if (sameSession.length === 1) {
const match = sameSession[0];
return { matched: true, sessionId: match.session.sessionId, userId: match.session.userId, state, key: match.key, session: match.session, reason: 'session_match' };
}
if (sameSession.length > 1) {
return { matched: false, state, reason: 'ambiguous_session_match', pendingCount: sameSession.length };
}
}
if (normalizedUserId) {
const sameUser = pending.filter((item) => item.session.userId === normalizedUserId);
if (sameUser.length === 1) {
const match = sameUser[0];
return { matched: true, sessionId: match.session.sessionId, userId: match.session.userId, state, key: match.key, session: match.session, reason: 'user_match' };
}
if (sameUser.length > 1) {
return { matched: false, state, reason: 'ambiguous_user_match', pendingCount: sameUser.length };
}
}
if (pending.length === 1) {
const match = pending[0];
return { matched: true, sessionId: match.session.sessionId, userId: match.session.userId, state, key: match.key, session: match.session, reason: 'single_pending_session' };
}
if (pending.length > 1) {
return { matched: false, state, reason: 'ambiguous_pending_sessions', pendingCount: pending.length };
}
return { matched: false, state, reason: 'no_pending_clone_session', pendingCount: 0 };
}
function persistReferenceAudioForSession(sessionId, userId, sourcePath, originalFilename, durationSec) {
const sessionKey = buildSessionKey(sessionId, userId).replace(/[^a-zA-Z0-9_-]+/g, '_');
const fallbackExt = path.extname(sourcePath || '.wav') || '.wav';
const targetPath = path.join(config.referencesDir, `${sessionKey}${path.extname(originalFilename || '') || fallbackExt}`);
fs.mkdirSync(path.dirname(targetPath), { recursive: true });
fs.copyFileSync(sourcePath, targetPath);
writeSessionState(sessionId, userId, (session) => {
session.stage = 'awaiting_clone_text';
session.referenceAudioPath = targetPath;
session.referenceDurationSec = durationSec;
});
cleanupManagedFiles();
return targetPath;
}
function sendJson(res, statusCode, payload) {
res.writeHead(statusCode, { 'Content-Type': 'application/json; charset=utf-8' });
res.end(JSON.stringify(payload));
}
function readJsonBody(req) {
return new Promise((resolve, reject) => {
const chunks = [];
req.on('data', (chunk) => chunks.push(chunk));
req.on('end', () => {
try {
const text = Buffer.concat(chunks).toString('utf8');
resolve(text ? JSON.parse(text) : {});
} catch (error) {
reject(error);
}
});
req.on('error', reject);
});
}
function detectIntent(text, session) {
const normalized = String(text || '').trim();
if (!normalized) {
return 'unknown';
}
const lower = normalized.toLowerCase();
if (/(asr|转写|识别语音|语音转文字|speech[- ]?to[- ]?text|transcribe)/i.test(normalized)) {
return 'ignore_asr';
}
if (/(克隆音色|克隆声音|复制我的声音|voice clone|clone voice)/i.test(normalized)) {
return 'clone_start';
}
if (session?.stage === 'awaiting_clone_text') {
return 'clone_text';
}
if (/(生成语音|朗读|播报|tts|text to speech|语气|风格|声音说)/i.test(normalized)) {
return 'custom_speak';
}
if (/^用.+(语气|风格).+(说|朗读|播报)/.test(normalized)) {
return 'custom_speak';
}
return 'unknown';
}
function parseStyleAndContent(text) {
const normalized = String(text || '').trim();
const styleMatch = normalized.match(/用(.{1,12}?)(?:的)?(?:语气|风格|声音)/);
const style = styleMatch ? styleMatch[1].trim() : '普通';
const contentPatterns = [
/(?:说|朗读|播报|念出|生成)(?:一段话|这段话|下面这段话|以下内容)?[::]?\s*(.+)$/,
/(?:请|帮我)?(?:用.{0,12}?(?:语气|风格|声音))[::]?\s*(.+)$/,
/(?:生成语音|生成音频)[::]?\s*(.+)$/,
];
for (const pattern of contentPatterns) {
const match = normalized.match(pattern);
if (match && match[1]) {
return { style, content: match[1].trim() };
}
}
return { style, content: normalized };
}
function extractRequestedDurationSeconds(text) {
const normalized = String(text || '').trim();
const minuteMatch = normalized.match(/(\d+(?:\.\d+)?)\s*(分钟|分)/);
if (minuteMatch) {
return Number(minuteMatch[1]) * 60;
}
const secondMatch = normalized.match(/(\d+(?:\.\d+)?)\s*秒/);
if (secondMatch) {
return Number(secondMatch[1]);
}
return null;
}
function estimateOutputDurationSeconds(text) {
const chars = Array.from(String(text || '').replace(/\s+/g, '')).length;
if (!chars) {
return 0;
}
return chars / Number(config.estimatedCharsPerSecond || 4);
}
function validateRequestedOutput(text, maxSeconds, label) {
const requested = extractRequestedDurationSeconds(text);
if (requested && requested > maxSeconds) {
throw new Error(`${label} request exceeds the duration limit; the current maximum is about ${maxSeconds} seconds`);
}
const estimated = estimateOutputDurationSeconds(text);
if (estimated > maxSeconds) {
throw new Error(`label内容预计时长约 estimated.toFixed(1) 秒,超过当前上限 maxSeconds 秒,请拆分后重试`);
}
return { requestedSeconds: requested, estimatedSeconds: estimated };
}
function getMimeType(filePath, contentType) {
if (contentType && String(contentType).startsWith('audio/')) {
return contentType;
}
const ext = path.extname(filePath).toLowerCase();
switch (ext) {
case '.wav':
return 'audio/wav';
case '.mp3':
return 'audio/mpeg';
case '.m4a':
return 'audio/mp4';
case '.ogg':
case '.opus':
return 'audio/ogg';
default:
return 'application/octet-stream';
}
}
function readWavDurationSec(filePath) {
const stat = fs.statSync(filePath);
const fd = fs.openSync(filePath, 'r');
try {
const riffHeader = Buffer.alloc(12);
fs.readSync(fd, riffHeader, 0, 12, 0);
if (riffHeader.toString('ascii', 0, 4) !== 'RIFF' || riffHeader.toString('ascii', 8, 12) !== 'WAVE') {
return null;
}
let offset = 12;
let sampleRate = null;
let channels = null;
let bitsPerSample = null;
let dataSize = null;
while (offset + 8 <= stat.size) {
const chunkHeader = Buffer.alloc(8);
const bytesRead = fs.readSync(fd, chunkHeader, 0, 8, offset);
if (bytesRead < 8) {
break;
}
const chunkId = chunkHeader.toString('ascii', 0, 4);
const chunkSize = chunkHeader.readUInt32LE(4);
const chunkDataOffset = offset + 8;
if (chunkId === 'fmt ') {
const fmtBuffer = Buffer.alloc(Math.min(chunkSize, 32));
fs.readSync(fd, fmtBuffer, 0, fmtBuffer.length, chunkDataOffset);
channels = fmtBuffer.readUInt16LE(2);
sampleRate = fmtBuffer.readUInt32LE(4);
bitsPerSample = fmtBuffer.readUInt16LE(14);
} else if (chunkId === 'data') {
dataSize = chunkSize;
}
if (sampleRate && channels && bitsPerSample && dataSize) {
break;
}
offset = chunkDataOffset + chunkSize + (chunkSize % 2);
}
if (!sampleRate || !channels || !bitsPerSample || !dataSize) {
return null;
}
const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
if (!bytesPerSecond) {
return null;
}
return dataSize / bytesPerSecond;
} finally {
fs.closeSync(fd);
}
}
function getAudioDurationSec(filePath) {
// WAV-header fallback used when ffprobe is missing or returns unusable output.
// readWavDurationSec can throw on truncated or malformed files; catching here keeps
// the error from escaping the execFile callback and crashing the process.
const wavFallback = () => {
try {
const sec = readWavDurationSec(filePath);
return Number.isFinite(sec) && sec > 0 ? sec : null;
} catch {
return null;
}
};
return new Promise((resolve, reject) => {
execFile(
'ffprobe',
['-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', filePath],
(error, stdout) => {
const duration = error ? NaN : Number(String(stdout || '').trim());
if (Number.isFinite(duration) && duration > 0) {
resolve(duration);
return;
}
const wavDuration = wavFallback();
if (wavDuration) {
console.warn(error ? '[xeontts] ffprobe 不可用,已回退为 WAV 头时长解析' : '[xeontts] ffprobe 输出无效,已回退为 WAV 头时长解析');
resolve(wavDuration);
return;
}
reject(new Error(error ? '未找到 ffprobe,且无法通过 WAV 头解析音频时长,请先安装可用的 ffmpeg/ffprobe' : '无法读取参考音频时长,请上传常见音频格式'));
},
);
});
}
function uniqueOutputPath(prefix) {
const date = new Date().toISOString().slice(0, 10);
const dirPath = path.join(config.outputDir, date);
fs.mkdirSync(dirPath, { recursive: true });
return path.join(dirPath, `${prefix}_${Date.now()}.wav`);
}
async function callFlaskTtsJson(payload) {
const url = new URL(config.flaskTtsUrl);
url.searchParams.set('response_format', 'json');
const response = await fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
const data = await response.json().catch(() => ({}));
if (!response.ok || data.success === false) {
throw new Error(data.error || `TTS 服务请求失败 (${response.status})`);
}
return data;
}
async function synthesizeCustomVoice(options) {
const text = String(options.text || '').trim();
if (!text) {
throw new Error('缺少要生成的文本');
}
const timing = validateRequestedOutput(text, Number(config.maxCustomOutputSeconds || 30), '自定义音色');
const style = String(options.style || '普通').trim() || '普通';
const payload = {
text,
model: config.customModel,
tts_model: config.customModel,
tts_mode: 'custom_voice',
language: options.language || config.defaultLanguage,
speaker_id: options.speakerId || config.defaultSpeaker,
instruct_text: style === '普通' ? '' : `请用${style}的语气朗读这段话。`,
};
const result = await callFlaskTtsJson(payload);
const outputPath = uniqueOutputPath('custom');
fs.writeFileSync(outputPath, Buffer.from(result.audio_base64, 'base64'));
cleanupManagedFiles();
return {
mode: 'custom_voice',
outputPath,
sampleRate: result.sample_rate,
estimatedSeconds: timing.estimatedSeconds,
style,
};
}
async function synthesizeClonedVoice(options) {
const text = String(options.text || '').trim();
if (!text) {
throw new Error('缺少要生成的文本');
}
if (!options.referenceAudioPath || !fs.existsSync(options.referenceAudioPath)) {
throw new Error('当前会话没有可用的参考音频,请先上传 3 到 5 秒参考音频');
}
const timing = validateRequestedOutput(text, Number(config.maxCloneOutputSeconds || 20), '音色克隆');
const form = new FormData();
form.append('text', text);
form.append('model', config.cloneModel);
form.append('tts_model', config.cloneModel);
form.append('tts_mode', config.cloneMode);
form.append('language', options.language || config.defaultLanguage);
form.append('x_vector_only_mode', String(config.cloneMode === 'voice_clone_xvector'));
const audioBuffer = fs.readFileSync(options.referenceAudioPath);
form.append('prompt_audio', new Blob([audioBuffer], { type: getMimeType(options.referenceAudioPath) }), path.basename(options.referenceAudioPath));
if (config.cloneMode !== 'voice_clone_xvector' && options.referenceText) {
form.append('ref_text', options.referenceText);
}
const url = new URL(config.flaskTtsUrl);
url.searchParams.set('response_format', 'json');
const response = await fetch(url, { method: 'POST', body: form });
const data = await response.json().catch(() => ({}));
if (!response.ok || data.success === false) {
throw new Error(data.error || `音色克隆请求失败 (${response.status})`);
}
const outputPath = uniqueOutputPath('clone');
fs.writeFileSync(outputPath, Buffer.from(data.audio_base64, 'base64'));
cleanupManagedFiles();
return {
mode: 'voice_clone',
outputPath,
sampleRate: data.sample_rate,
estimatedSeconds: timing.estimatedSeconds,
};
}
function parseMultipart(req) {
const form = new formidable.IncomingForm({
uploadDir: config.runtimeDir,
keepExtensions: true,
multiples: false,
});
return new Promise((resolve, reject) => {
form.parse(req, (error, fields, files) => {
if (error) {
reject(error);
return;
}
resolve({ fields, files });
});
});
}
function getUploadedFile(files) {
const candidate = files.file || files.audio || files.prompt_audio || null;
if (Array.isArray(candidate)) {
return candidate[0] || null;
}
return candidate;
}
async function handleWorkflowMessage(body) {
const sessionId = body.sessionId || config.openclawSession;
const userId = body.userId || 'anonymous';
const text = String(body.text || '').trim();
const snapshot = getSessionState(sessionId, userId);
const intent = detectIntent(text, snapshot.session);
if (intent === 'ignore_asr') {
return {
success: true,
action: 'ignore',
reason: 'detected_asr_request',
message: '当前请求是语音识别/转写意图,应交给 xeon_asr,不走 xeon_tts。',
};
}
if (intent === 'clone_start') {
writeSessionState(sessionId, userId, (session) => {
session.stage = 'awaiting_reference_audio';
session.referenceAudioPath = null;
session.referenceDurationSec = null;
});
return {
success: true,
action: 'ask_reference_audio',
message: `请上传一段 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒的干净人声参考音频,我会用 Base 模型为你克隆音色。${getRetentionNotice()}`,
};
}
if (snapshot.session.stage === 'awaiting_reference_audio') {
return {
success: true,
action: 'waiting_reference_audio',
message: `当前正在进行音色克隆,请先上传一段 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒参考音频。`,
};
}
if (intent === 'clone_text') {
const result = await synthesizeClonedVoice({
text,
referenceAudioPath: snapshot.session.referenceAudioPath,
language: body.language || config.defaultLanguage,
referenceText: body.referenceText || '',
});
writeSessionState(sessionId, userId, (session) => {
session.stage = 'idle';
session.lastOutputPath = result.outputPath;
});
return {
success: true,
action: 'synthesized_clone_voice',
message: `音色克隆完成,音频已落盘: ${result.outputPath}。${getRetentionNotice()}`,
outputPath: result.outputPath,
estimatedSeconds: result.estimatedSeconds,
};
}
if (intent === 'custom_speak') {
const parsed = parseStyleAndContent(text);
const result = await synthesizeCustomVoice({
text: body.content || parsed.content,
style: body.style || parsed.style,
language: body.language || config.defaultLanguage,
speakerId: body.speakerId || config.defaultSpeaker,
});
writeSessionState(sessionId, userId, (session) => {
session.stage = 'idle';
session.lastOutputPath = result.outputPath;
});
return {
success: true,
action: 'synthesized_custom_voice',
message: `语音生成完成,音频已落盘: ${result.outputPath}。${getRetentionNotice()}`,
outputPath: result.outputPath,
style: result.style,
estimatedSeconds: result.estimatedSeconds,
};
}
return {
success: true,
action: 'noop',
message: '未识别到 TTS/音色克隆意图。若你要克隆音色,请明确说“我要克隆音色”;若要生成语音,请明确说“用某种语气朗读这段话”。',
};
}
const server = http.createServer(async (req, res) => {
try {
if (req.method === 'GET' && req.url === '/health') {
return sendJson(res, 200, { status: 'ok', port: config.port });
}
if (req.method === 'GET' && req.url.startsWith('/api/session/state')) {
const url = new URL(req.url, 'http://127.0.0.1');
const sessionId = url.searchParams.get('sessionId') || config.openclawSession;
const userId = url.searchParams.get('userId') || 'anonymous';
const current = getSessionState(sessionId, userId).session;
return sendJson(res, 200, { success: true, session: current });
}
if (req.method === 'POST' && req.url === '/api/workflow/message') {
const body = await readJsonBody(req);
const result = await handleWorkflowMessage(body || {});
return sendJson(res, 200, result);
}
if (req.method === 'POST' && req.url === '/api/workflow/reference-audio') {
const parsed = await parseMultipart(req);
const file = getUploadedFile(parsed.files);
const sessionId = firstValue(parsed.fields.sessionId) || config.openclawSession;
const userId = firstValue(parsed.fields.userId) || 'anonymous';
if (!file?.filepath) {
return sendJson(res, 400, { success: false, error: 'missing audio file' });
}
const durationSec = await getAudioDurationSec(file.filepath);
if (durationSec < Number(config.minReferenceDurationSec || 3) || durationSec > Number(config.maxReferenceDurationSec || 5)) {
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 400, {
success: false,
error: `参考音频时长为 ${durationSec.toFixed(2)} 秒,不在 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒范围内`,
});
}
const targetPath = persistReferenceAudioForSession(sessionId, userId, file.filepath, file.originalFilename || file.filepath, durationSec);
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
action: 'reference_audio_saved',
durationSec,
referenceAudioPath: targetPath,
message: `参考音频已接收,请继续发送你希望用该音色朗读的文本。${getRetentionNotice()}`,
});
}
if (req.method === 'POST' && req.url === '/api/workflow/asr-audio-intake') {
const parsed = await parseMultipart(req);
const file = getUploadedFile(parsed.files);
if (!file?.filepath) {
return sendJson(res, 400, { success: false, consumed: false, error: 'missing audio file' });
}
const sessionId = firstValue(parsed.fields.sessionId) || req.headers['x-openclaw-session'] || req.headers['x-session-id'] || config.openclawSession;
const userId = firstValue(parsed.fields.userId) || firstValue(parsed.fields.senderId) || req.headers['x-user-id'] || req.headers['x-sender-id'] || '';
const matched = resolvePendingReferenceSession(sessionId, userId);
if (!matched.matched) {
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
consumed: false,
reason: matched.reason,
pendingCount: matched.pendingCount || 0,
message: '当前没有等待参考音频的音色克隆会话。',
});
}
const durationSec = await getAudioDurationSec(file.filepath);
if (durationSec < Number(config.minReferenceDurationSec || 3) || durationSec > Number(config.maxReferenceDurationSec || 5)) {
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
consumed: true,
accepted: false,
sessionId: matched.sessionId,
userId: matched.userId,
reason: 'invalid_reference_duration',
durationSec,
message: `参考音频时长为 ${durationSec.toFixed(2)} 秒,不在 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒范围内,请重新上传。${getRetentionNotice()}`,
transcriptText: `【TTS参考音频未接收】参考音频时长为 ${durationSec.toFixed(2)} 秒,不在 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒范围内,请提醒用户重新上传 3 到 5 秒的干净人声。`,
});
}
const targetPath = persistReferenceAudioForSession(
matched.sessionId,
matched.userId,
file.filepath,
file.originalFilename || file.filepath,
durationSec,
);
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
consumed: true,
accepted: true,
action: 'reference_audio_saved_via_asr',
reason: matched.reason,
sessionId: matched.sessionId,
userId: matched.userId,
durationSec,
referenceAudioPath: targetPath,
message: `参考音频已接收,请继续发送你希望用该音色朗读的文本。${getRetentionNotice()}`,
transcriptText: '【TTS参考音频已接收】当前音色克隆会话的参考音频已经保存,请直接让用户继续发送要合成的文本,不要再要求重复上传,也不要说文件已被 ASR 清理。',
});
}
if (req.method === 'POST' && req.url === '/api/tts/custom-speak') {
const body = await readJsonBody(req);
const result = await synthesizeCustomVoice({
text: body.text,
style: body.style || '普通',
language: body.language || config.defaultLanguage,
speakerId: body.speakerId || config.defaultSpeaker,
});
return sendJson(res, 200, { success: true, ...result });
}
if (req.method === 'POST' && req.url === '/api/tts/clone-speak') {
const body = await readJsonBody(req);
const result = await synthesizeClonedVoice({
text: body.text,
referenceAudioPath: body.referenceAudioPath,
language: body.language || config.defaultLanguage,
referenceText: body.referenceText || '',
});
return sendJson(res, 200, { success: true, ...result });
}
return sendJson(res, 404, { success: false, error: 'not found' });
} catch (error) {
console.error('[xeontts] request failed:', error);
return sendJson(res, 500, { success: false, error: error.message || String(error) });
}
});
server.listen(config.port, '0.0.0.0', async () => {
let ttsHealth = 'unknown';
try {
const response = await fetch(config.flaskHealthUrl);
ttsHealth = response.ok ? 'ok' : `http_${response.status}`;
} catch (error) {
ttsHealth = `unreachable: ${error.message}`;
}
console.log(`[xeontts] workflow gateway listening on http://0.0.0.0:${config.port}`);
console.log(`[xeontts] upstream tts health: ${ttsHealth}`);
});
FILE:setup_env.sh
#!/usr/bin/env bash
set -euo pipefail
export HF_ENDPOINT=https://hf-mirror.com
export HF_HUB_ENABLE_HF_TRANSFER=0
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SKILL_DIR"
BASE_MODEL_PATH="${BASE_MODEL_PATH:-~/model/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8}"
CUSTOM_MODEL_PATH="${CUSTOM_MODEL_PATH:-~/model/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8}"
BASE_CHECKPOINT_PATH="${BASE_CHECKPOINT_PATH:-}"
BASE_MODEL_REPO="${BASE_MODEL_REPO:-aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8}"
CUSTOM_MODEL_REPO="${CUSTOM_MODEL_REPO:-aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8}"
BASE_CHECKPOINT_REPO="${BASE_CHECKPOINT_REPO:-}"
TTS_PIP_SPEC="${XDP_TTS_PIP_SPEC:-xdp-tts-service}"
FORCE=0
SKIP_DEPS=0
usage() {
cat <<EOF
用法: $0 [--force] [--skip-deps]
环境变量覆盖:
XDP_TTS_PIP_SPEC Python 包安装源,默认 xdp-tts-service
BASE_MODEL_PATH Base OV 模型目录
CUSTOM_MODEL_PATH Custom OV 模型目录
BASE_CHECKPOINT_PATH 可选:Base 原始 checkpoint 目录(旧导出兼容)
BASE_MODEL_REPO Base OV 模型 HF 仓库名
CUSTOM_MODEL_REPO Custom OV 模型 HF 仓库名
BASE_CHECKPOINT_REPO 可选:Base checkpoint HF 仓库名(旧导出兼容)
EOF
}
while [[ $# -gt 0 ]]; do
case "$1" in
--force) FORCE=1; shift ;;
--skip-deps) SKIP_DEPS=1; shift ;;
-h|--help) usage; exit 0 ;;
*) log_error "未知参数: $1"; usage; exit 1 ;;
esac
done
expand_path() {
local value="$1"
if [[ "$value" == ~* ]]; then
value="${value/#\~/$HOME}"
fi
printf '%s\n' "$value"
}
detect_os() {
if [[ -f /etc/os-release ]]; then
. /etc/os-release
printf '%s\n' "$ID"
return 0
fi
printf '%s\n' unknown
}
check_sudo() {
if [[ "$EUID" -eq 0 ]]; then
SUDO=""
elif command -v sudo >/dev/null 2>&1; then
SUDO="sudo"
else
SUDO=""
fi
}
install_system_deps() {
[[ "$SKIP_DEPS" -eq 1 ]] && return 0
check_sudo
local os_id
os_id="$(detect_os)"
case "$os_id" in
ubuntu|debian)
log_step "安装系统依赖 (Debian/Ubuntu)"
$SUDO apt-get update -qq >/dev/null 2>&1 || true
$SUDO apt-get install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg >/dev/null 2>&1 || \
$SUDO apt-get install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg
;;
centos|rhel|fedora|rocky|almalinux|ol|alibabacloud|alios)
log_step "安装系统依赖 (RHEL/CentOS)"
local pkg_mgr="yum"
command -v dnf >/dev/null 2>&1 && pkg_mgr="dnf"
$SUDO "$pkg_mgr" install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg which >/dev/null 2>&1 || \
$SUDO "$pkg_mgr" install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg which
;;
*)
log_warn "未知系统,跳过系统依赖自动安装"
;;
esac
}
setup_miniconda() {
log_step "准备 Miniconda Python 3.10"
local conda_dir="$HOME/miniconda3"
local conda_url="https://repo.anaconda.com/miniconda/Miniconda3-py310_23.11.0-2-Linux-x86_64.sh"
if [[ "$FORCE" -eq 1 && -d "$conda_dir" ]]; then
rm -rf "$conda_dir"
fi
if [[ ! -d "$conda_dir" ]]; then
wget --timeout=120 -q "$conda_url" -O /tmp/miniconda.sh || curl -fsSL --connect-timeout 120 "$conda_url" -o /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p "$conda_dir" >/dev/null 2>&1
rm -f /tmp/miniconda.sh
fi
PYTHON_CMD="$conda_dir/bin/python"
[[ -x "$PYTHON_CMD" ]] || { log_error "Miniconda 安装失败"; exit 1; }
log_info "Python 就绪: $($PYTHON_CMD --version 2>&1)"
}
setup_venv() {
if [[ "$FORCE" -eq 1 && -d venv ]]; then
rm -rf venv
fi
if [[ ! -d venv ]]; then
log_step "创建虚拟环境"
"$PYTHON_CMD" -m venv venv
fi
source venv/bin/activate
pip install -q --upgrade pip
}
install_python_packages() {
log_step "安装 Python TTS 服务包"
if ! pip install -q --upgrade "$TTS_PIP_SPEC"; then
log_error "安装失败: $TTS_PIP_SPEC"
log_error "如果包尚未发布,请设置 XDP_TTS_PIP_SPEC=/path/to/xdp_tts_service.whl 后重试"
exit 1
fi
log_info "已安装: $TTS_PIP_SPEC"
}
verify_python_runtime() {
log_step "校验 Python TTS 运行时"
[[ -x venv/bin/xdp-tts-service ]] || {
log_error "安装后仍未生成 venv/bin/xdp-tts-service"
log_error "请检查 XDP_TTS_PIP_SPEC 是否指向包含 console entry point 的有效包"
exit 1
}
venv/bin/python <<'PYEOF'
import importlib.util
import sys
required = [
'xdp_tts_service',
'qwen_tts',
]
missing = []
for name in required:
if importlib.util.find_spec(name) is None:
missing.append(name)
if missing:
print('missing python modules:', ', '.join(missing), file=sys.stderr)
sys.exit(1)
PYEOF
}
copy_node_config() {
if [[ -f config.json && "$FORCE" -ne 1 ]]; then
log_info "config.json 已存在,跳过"
return 0
fi
cp config.example.json config.json
log_info "已生成 config.json"
}
generate_tts_config() {
if [[ -f tts_config.json && "$FORCE" -ne 1 ]]; then
log_info "tts_config.json 已存在,跳过"
return 0
fi
if [[ -x venv/bin/xdp-tts-init-config ]]; then
venv/bin/xdp-tts-init-config --output ./tts_config.json >/dev/null
else
cp tts_config.example.json tts_config.json
fi
log_info "已生成 tts_config.json"
}
resolve_hf_cli() {
export PATH="$HOME/miniconda3/bin:$HOME/.local/bin:$PATH"
if command -v hf >/dev/null 2>&1; then
printf '%s\n' "$(command -v hf)"
return 0
fi
if command -v huggingface-cli >/dev/null 2>&1; then
printf '%s\n' "$(command -v huggingface-cli)"
return 0
fi
if pip install -q 'huggingface_hub[cli]' >/dev/null 2>&1 || pip install -q huggingface_hub >/dev/null 2>&1; then
export PATH="$HOME/miniconda3/bin:$HOME/.local/bin:$PATH"
if command -v hf >/dev/null 2>&1; then
printf '%s\n' "$(command -v hf)"
return 0
fi
if command -v huggingface-cli >/dev/null 2>&1; then
printf '%s\n' "$(command -v huggingface-cli)"
return 0
fi
fi
return 1
}
download_model_if_missing() {
local repo_id="$1"
local target_path
[[ -n "$repo_id" ]] || return 0
[[ -n "${2:-}" ]] || return 0
target_path="$(expand_path "$2")"
if [[ -d "$target_path" ]] && [[ -n "$(ls -A "$target_path" 2>/dev/null || true)" ]]; then
log_info "模型目录已存在: $target_path"
return 0
fi
local hf_cli
if ! hf_cli="$(resolve_hf_cli)"; then
log_warn "未找到 Hugging Face CLI,跳过自动下载: $repo_id"
return 0
fi
mkdir -p "$target_path"
log_step "尝试下载模型: $repo_id -> $target_path"
if [[ "$(basename "$hf_cli")" == "hf" ]]; then
"$hf_cli" download "$repo_id" --local-dir "$target_path" || log_warn "下载失败,请后续手工补齐: $repo_id"
else
"$hf_cli" download "$repo_id" --local-dir "$target_path" --local-dir-use-symlinks False || log_warn "下载失败,请后续手工补齐: $repo_id"
fi
}
update_tts_model_paths() {
local base_model_abs custom_model_abs base_ckpt_abs
base_model_abs="$(expand_path "$BASE_MODEL_PATH")"
custom_model_abs="$(expand_path "$CUSTOM_MODEL_PATH")"
base_ckpt_abs=""
if [[ -n "$BASE_CHECKPOINT_PATH" ]]; then
base_ckpt_abs="$(expand_path "$BASE_CHECKPOINT_PATH")"
fi
log_step "写入 TTS 模型路径到 tts_config.json"
venv/bin/python <<PYEOF
import json
from pathlib import Path
config_path = Path("tts_config.json")
config = json.loads(config_path.read_text(encoding="utf-8"))
legacy_base = config.pop("qwen3_tts_base", None) or {}
legacy_custom = config.pop("qwen3_tts_custom", None) or {}
base_cfg = config.setdefault("qwen3_tts_0.6b_base_openvino", {})
custom_cfg = config.setdefault("qwen3_tts_0.6b_custom_openvino", {})
if legacy_base and not base_cfg.get("model_dir"):
base_cfg["model_dir"] = legacy_base.get("model") or legacy_base.get("model_dir") or ""
if legacy_custom and not custom_cfg.get("model_dir"):
custom_cfg["model_dir"] = legacy_custom.get("model") or legacy_custom.get("model_dir") or ""
base_cfg["model_dir"] = r"$base_model_abs"
base_cfg.setdefault("label", "Qwen3-TTS-0.6B-Base(OpenVINO)")
base_cfg.setdefault("model_type", "Qwen3_TTS_OpenVINO")
base_cfg.setdefault("tts_model_type", "voice_clone")
base_cfg.setdefault("force_cpu", False)
base_cfg.setdefault("default_mode", "voice_clone_xvector")
base_cfg.setdefault("modes", ["voice_clone", "voice_clone_xvector"])
base_cfg.setdefault("device", "CPU")
base_cfg.setdefault("default_language", "Chinese")
base_cfg.setdefault("prompt_text", "")
base_cfg.setdefault("prompt_audio", "")
custom_cfg["model_dir"] = r"$custom_model_abs"
custom_cfg.setdefault("label", "Qwen3-TTS-0.6B-CustomVoice(OpenVINO)")
custom_cfg.setdefault("model_type", "Qwen3_TTS_OpenVINO")
custom_cfg.setdefault("tts_model_type", "custom_voice")
custom_cfg.setdefault("force_cpu", False)
custom_cfg.setdefault("default_mode", "custom_voice")
custom_cfg.setdefault("modes", ["custom_voice"])
custom_cfg.setdefault("device", "CPU")
custom_cfg.setdefault("default_language", "Chinese")
custom_cfg.setdefault("default_speaker", "Vivian")
custom_cfg.setdefault("speakers", ["Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric", "Ryan", "Aiden", "Ono_Anna", "Sohee"])
base_ckpt = r"$base_ckpt_abs".strip()
if base_ckpt:
base_cfg["checkpoint_path"] = base_ckpt
else:
base_cfg.pop("checkpoint_path", None)
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
PYEOF
}
repair_base_checkpoint_hint() {
local base_model_abs hint_file resolved_hint
base_model_abs="$(expand_path "$BASE_MODEL_PATH")"
hint_file="$base_model_abs/checkpoint_path.txt"
[[ -f "$hint_file" ]] || return 0
resolved_hint="$(tr -d '\r' < "$hint_file" | head -n 1 | xargs)"
if [[ -z "$resolved_hint" || ! -e "$(expand_path "$resolved_hint")" ]]; then
printf '%s\n' "$base_model_abs" > "$hint_file"
log_info "已修复 Base 模型 checkpoint_path.txt: $hint_file"
fi
}
main() {
echo "========================================"
echo " Xeon TTS Skill 环境准备"
echo "========================================"
install_system_deps
setup_miniconda
setup_venv
install_python_packages
verify_python_runtime
copy_node_config
generate_tts_config
download_model_if_missing "$BASE_MODEL_REPO" "$BASE_MODEL_PATH"
download_model_if_missing "$CUSTOM_MODEL_REPO" "$CUSTOM_MODEL_PATH"
download_model_if_missing "$BASE_CHECKPOINT_REPO" "$BASE_CHECKPOINT_PATH"
update_tts_model_paths
repair_base_checkpoint_hint
mkdir -p runtime outputs references
log_info "Xeon TTS 环境准备完成"
}
main "$@"
FILE:start_all.sh
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
echo "========================================"
echo " 启动 Xeon TTS 服务"
echo "========================================"
[ ! -d node_modules ] && npm install
log_step "启动 Python TTS 服务 (5002)"
if ! lsof -Pi :5002 -sTCP:LISTEN -t >/dev/null 2>&1; then
./start_tts_service.sh
sleep 2
fi
log_step "启动 Node TTS 工作流网关 (9002)"
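# Stop only the gateway listening on 9002 so other Node server.js processes on the host are left alone.
OLD_PID_9002=$(lsof -Pi :9002 -sTCP:LISTEN -t 2>/dev/null | head -1 || true)
if [[ -n "$OLD_PID_9002" ]]; then
kill "$OLD_PID_9002" 2>/dev/null || true
fi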
sleep 1
(setsid node server.js >> skill.log 2>&1 </dev/null &)
sleep 3
PID_9002=$(lsof -Pi :9002 -sTCP:LISTEN -t 2>/dev/null | head -1 || true)
if [[ -z "$PID_9002" ]]; then
log_warn "未检测到 9002 监听进程,请查看 skill.log"
exit 1
fi
echo "$PID_9002" > skill.pid
log_info "Node 网关已启动 (PID: $PID_9002)"
if command -v openclaw >/dev/null 2>&1; then
log_step "重启 OpenClaw gateway"
openclaw gateway restart || log_warn "gateway restart 失败,请手工执行"
fi
log_info "完成。健康检查: curl http://127.0.0.1:5002/api/health && curl http://127.0.0.1:9002/health"
FILE:start_tts_service.sh
#!/usr/bin/env bash
set -euo pipefail
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
PID_FILE="$SCRIPT_DIR/tts.pid"
LOG_FILE="$SCRIPT_DIR/tts.log"
PORT=5002
START_BIN="$SCRIPT_DIR/venv/bin/xdp-tts-service"
[[ -x "$START_BIN" ]] || { log_error "未找到 $START_BIN,请先运行 bash setup_env.sh"; exit 1; }
[[ -f "$SCRIPT_DIR/tts_config.json" ]] || { log_error "未找到 tts_config.json"; exit 1; }
if lsof -Pi :$PORT -sTCP:LISTEN -t >/dev/null 2>&1; then
log_error "端口 $PORT 已被占用"
exit 1
fi
if [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
log_error "TTS 服务已在运行 (PID: $(cat "$PID_FILE"))"
exit 1
fi
log_step "启动 TTS Flask 服务"
XDP_TTS_CONFIG="$SCRIPT_DIR/tts_config.json" nohup "$START_BIN" --host 127.0.0.1 --port $PORT --config "$SCRIPT_DIR/tts_config.json" > "$LOG_FILE" 2>&1 &
echo $! > "$PID_FILE"
sleep 3
if curl -fsS http://127.0.0.1:$PORT/api/health >/dev/null 2>&1; then
log_info "TTS 服务启动成功: http://127.0.0.1:$PORT/api/health"
else
log_warn "TTS 服务已启动,但健康检查暂未通过,请查看 $LOG_FILE"
fi
FILE:stop_tts.sh
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
if [[ -f tts.pid ]]; then
PID="$(cat tts.pid)"
kill "$PID" 2>/dev/null || kill -9 "$PID" 2>/dev/null || true
rm -f tts.pid
fi
PID_9002=$(lsof -Pi :9002 -sTCP:LISTEN -t 2>/dev/null | head -1 || true)
if [[ -n "$PID_9002" ]]; then
kill "$PID_9002" 2>/dev/null || kill -9 "$PID_9002" 2>/dev/null || true
fi
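# Also stop the gateway PID recorded by start_all.sh, leaving other Node server.js processes alone.
if [[ -f skill.pid ]]; then
kill "$(cat skill.pid)" 2>/dev/null || true
rm -f skill.pid
fi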
echo "Xeon TTS 服务已停止"
FILE:tts_config.example.json
{
"qwen3_tts_0.6b_base_openvino": {
"label": "Qwen3-TTS-0.6B-Base(OpenVINO)",
"model_dir": "~/model/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8",
"model_type": "Qwen3_TTS_OpenVINO",
"tts_model_type": "voice_clone",
"force_cpu": false,
"default_mode": "voice_clone_xvector",
"modes": ["voice_clone", "voice_clone_xvector"],
"device": "CPU",
"default_language": "Chinese",
"prompt_text": "",
"prompt_audio": ""
},
"qwen3_tts_0.6b_custom_openvino": {
"label": "Qwen3-TTS-0.6B-CustomVoice(OpenVINO)",
"model_dir": "~/model/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8",
"model_type": "Qwen3_TTS_OpenVINO",
"tts_model_type": "custom_voice",
"force_cpu": false,
"default_mode": "custom_voice",
"modes": ["custom_voice"],
"device": "CPU",
"default_language": "Chinese",
"default_speaker": "Vivian",
"speakers": ["Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric", "Ryan", "Aiden", "Ono_Anna", "Sohee"]
}
}
FILE:_meta.json
{
"ownerId": "kn70s411jhsq1ert53ad4s7v8d82nqmh",
"slug": "xeontts",
"version": "1.0.0",
"publishedAt": 1773792000000
}
Local TTS skill using OpenVINO Qwen3-TTS for voice cloning and tone-styled speech synthesis, supporting QQBOT workflows with strict audio-length limits and file retention.
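# Xeon TTS
A local speech-synthesis skill built on the OpenVINO Qwen3-TTS Base/Custom models, intended for OpenClaw's QQBOT workflow.
## Goals
- Install two local services: Flask TTS on port 5002 and the Node TTS workflow on port 9002
- Automatically configure the target machine's own OpenClaw config, writing only `channels.qqbot.xeonTts`
- Coexist with xeonasr without overwriting `tools.media.audio` or `channels.qqbot.stt`
- Support two workflows: voice cloning and TTS in a specified tone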
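## When to invoke xeontts
Use xeontts only in these cases:
- The user explicitly asks to clone a voice (“克隆音色”, “克隆声音”, “复制我的声音”)
- The user asks to read aloud, announce, or generate speech in a particular tone (“用某种语气朗读/播报/生成语音”)
- The user wants audio generated to a local file rather than a transcription
Never route these requests to xeontts:
- Speech recognition
- Speech-to-text
- Dictation
- STT / ASR
These requests must go to xeonasr so the two skills do not compete for the same task.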
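As a rough illustration of this routing rule, the sketch below checks ASR keywords before TTS keywords, so a transcription request can never fall through to xeontts. The keyword lists and the `routeSkill` function are hypothetical simplifications of the phrases above; the shipped gateway uses its own `detectIntent` regexes in `server.js`.

```javascript
// Hypothetical keyword lists distilled from the rules above; not exhaustive.
const ASR_KEYWORDS = ['识别语音', '语音转文字', '听写', 'STT', 'ASR'];
const TTS_KEYWORDS = ['克隆音色', '克隆声音', '复制我的声音', '朗读', '播报', '生成语音'];

// Returns which skill should handle the message: 'xeonasr', 'xeontts', or 'none'.
function routeSkill(text) {
  const normalized = String(text || '').trim();
  const upper = normalized.toUpperCase();
  // Check ASR intents first so transcription requests are never sent to xeontts.
  if (ASR_KEYWORDS.some((kw) => upper.includes(kw))) {
    return 'xeonasr';
  }
  if (TTS_KEYWORDS.some((kw) => normalized.includes(kw))) {
    return 'xeontts';
  }
  return 'none';
}

console.log(routeSkill('帮我把这段语音转文字')); // xeonasr
console.log(routeSkill('我要克隆音色'));         // xeontts
```

Ordering matters more than the exact keywords: checking ASR first resolves messages that mention both speech and text in favor of xeonasr.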
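## OpenClaw / QQBOT usage rules
### Rule 1: Voice cloning always takes two steps
When the user says “我要克隆音色” (I want to clone my voice):
1. Immediately switch the current session to the clone flow
2. Reply asking the user to upload a 3 to 5 second reference clip
3. Do not start synthesis before the reference audio arrives
4. If xeonasr is installed on the machine, voice messages from QQBOT reach ASR first; ASR should then forward the audio to xeontts instead of transcribing it as usual
5. Validate the clip's duration as soon as it arrives
6. If the clip is shorter than 3 seconds or longer than 5 seconds, reject it and ask the user to upload again
7. Only after validation passes, ask the user for the text to read
8. Generate the audio with the Base model and save it to disk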
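Steps 1 to 8 amount to a small state machine. The sketch below assumes the stage names `idle`, `awaiting_reference_audio`, and `awaiting_clone_text` that `server.js` stores in `session.stage`; the function `nextCloneState`, the event shapes, and the reply codes are invented for illustration. The ASR handoff in step 4 only changes how the clip arrives, so it is not modeled.

```javascript
// Sketch of the two-step clone flow. Stage names match the session.stage values
// used in server.js; event types and reply codes are hypothetical.
const MIN_REF_SEC = 3;
const MAX_REF_SEC = 5;

function nextCloneState(state, event) {
  switch (event.type) {
    case 'clone_request':
      // Steps 1-2: enter the clone flow and ask for reference audio.
      return { stage: 'awaiting_reference_audio', reply: 'ask_reference_audio' };
    case 'reference_audio':
      if (state.stage !== 'awaiting_reference_audio') {
        return { ...state, reply: 'noop' };
      }
      // Steps 5-6: reject clips outside the 3-5 s window and stay in this stage.
      if (event.durationSec < MIN_REF_SEC || event.durationSec > MAX_REF_SEC) {
        return { stage: 'awaiting_reference_audio', reply: 'reupload_reference_audio' };
      }
      // Step 7: clip accepted; wait for the text to read.
      return { stage: 'awaiting_clone_text', reply: 'ask_clone_text' };
    case 'text':
      if (state.stage === 'awaiting_reference_audio') {
        // Step 3: never synthesize before a valid reference clip arrives.
        return { ...state, reply: 'waiting_reference_audio' };
      }
      if (state.stage === 'awaiting_clone_text') {
        // Step 8: synthesize with the Base model, then reset the session.
        return { stage: 'idle', reply: 'synthesize_with_base_model' };
      }
      return { ...state, reply: 'noop' };
    default:
      return { ...state, reply: 'noop' };
  }
}

let session = nextCloneState({ stage: 'idle' }, { type: 'clone_request' });
session = nextCloneState(session, { type: 'reference_audio', durationSec: 6.2 });
console.log(session.reply); // reupload_reference_audio
```

Keeping the transition pure (state in, state out) makes it easy to persist the returned object as the session record between QQBOT messages.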
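### Rule 2: Tone-specified generation uses the Custom model by default
When the user says “用开心的语气朗读……” (read this in a happy tone…) or “生成语音……” (generate speech…):
1. Parse whether the user specified a tone
2. If not, default to `普通` (neutral)
3. Generate the audio with the Custom model
4. Save the result to the local `outputs/` directory
5. Reply with the file path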
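A simplified version of steps 1 and 2 is sketched below. The tone regex is the one `parseStyleAndContent` in `server.js` uses; treating everything after the first ASCII or full-width colon as the content is a simplifying assumption of this hypothetical `parseStyle` helper, not the gateway's exact behavior.

```javascript
// Extracts the requested tone (defaulting to 普通) and the text to speak.
function parseStyle(text) {
  const normalized = String(text || '').trim();
  // Same tone pattern as server.js: 用<tone>(的)语气/风格/声音
  const match = normalized.match(/用(.{1,12}?)(?:的)?(?:语气|风格|声音)/);
  const style = match ? match[1].trim() : '普通';
  // Assumption: everything after the first ASCII or full-width colon is the content.
  const colon = normalized.search(/[::]/);
  const content = colon >= 0 ? normalized.slice(colon + 1).trim() : normalized;
  return { style, content };
}

console.log(parseStyle('用开心的语气朗读:今天天气很好')); // { style: '开心', content: '今天天气很好' }
console.log(parseStyle('生成语音:你好'));                  // { style: '普通', content: '你好' }
```

The lazy quantifier `{1,12}?` matters: it stops the tone at the shortest span before the optional `的`, so `用开心的语气` yields `开心` rather than `开心的`.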
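### Rule 3: Enforce duration limits
- Reference audio: 3 to 5 seconds
- Base clone output: at most about 20 seconds
- Custom output: at most about 30 seconds
If the user explicitly asks for a longer duration, or the estimate based on text length clearly exceeds the limit, tell the user to split the content instead of submitting the inference blindly.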
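The gateway estimates output length from character count. With `estimatedCharsPerSecond` at 4 (the value in `config.example.json`), the 20 s Base limit works out to about 80 characters and the 30 s Custom limit to about 120. A minimal sketch of that check follows; `checkOutputLength` is a hypothetical name, and the gateway's additional parsing of explicit durations such as “30秒” is not modeled.

```javascript
const CHARS_PER_SECOND = 4;                    // estimatedCharsPerSecond in config.example.json
const MAX_SECONDS = { clone: 20, custom: 30 }; // maxCloneOutputSeconds / maxCustomOutputSeconds

// Counts code points with whitespace removed, so each Chinese character counts once.
function estimateSeconds(text) {
  return Array.from(String(text || '').replace(/\s+/g, '')).length / CHARS_PER_SECOND;
}

// ok=false means the caller should ask the user to split the text instead of
// submitting the inference.
function checkOutputLength(text, mode) {
  const limit = MAX_SECONDS[mode];
  const estimated = estimateSeconds(text);
  return { ok: estimated <= limit, estimated, limit };
}

const hundredChars = '字'.repeat(100);
console.log(checkOutputLength(hundredChars, 'clone'));  // { ok: false, estimated: 25, limit: 20 }
console.log(checkOutputLength(hundredChars, 'custom')); // { ok: true, estimated: 25, limit: 30 }
```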
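### Rule 4: Always state the file retention period
- Reference audio and generated results are kept for 7 days by default
- After that, the system automatically removes old files from `references/` and `outputs/`
- Tell the user this explicitly both when asking for reference audio and when reporting that generation is complete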
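The actual cleanup is done by `cleanupManagedFiles` in `server.js`, which is not shown here. A hypothetical equivalent that deletes files older than `fileRetentionDays` (7 in `config.example.json`) could look like this; using the file's modification time as its age is an assumption of the sketch.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const RETENTION_DAYS = 7; // fileRetentionDays in config.example.json
const DAY_MS = 24 * 60 * 60 * 1000;

// True when a file last modified at mtimeMs is older than the retention window.
function isExpired(mtimeMs, nowMs, retentionDays = RETENTION_DAYS) {
  return nowMs - mtimeMs > retentionDays * DAY_MS;
}

// Recursively deletes expired files under dir and returns the removed paths.
function purgeExpired(dir, nowMs = Date.now()) {
  const removed = [];
  if (!fs.existsSync(dir)) return removed;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // outputs/ is split into per-date subdirectories, so walk into them.
      removed.push(...purgeExpired(full, nowMs));
    } else if (entry.isFile() && isExpired(fs.statSync(full).mtimeMs, nowMs)) {
      fs.rmSync(full, { force: true });
      removed.push(full);
    }
  }
  return removed;
}

// Example: for (const dir of ['references', 'outputs']) purgeExpired(dir);
```

Passing `nowMs` in explicitly keeps the age check deterministic, which is convenient when testing the retention rule.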
## Installation
When OpenClaw, QQBOT, or another agent is asked to install this skill, follow these steps:
1. Install the skill
```bash
clawhub install xeontts
cd "$HOME/.openclaw/workspace/skills/xeontts"
```
2. Run the install script
```bash
bash install.sh
```
3. The install script then:
- Creates a Python 3.10 environment
- Installs `xdp-tts-service` from PyPI
- Generates `config.json` and `tts_config.json`
- Tries to download the Base / Custom OpenVINO models
- Downloads the Base checkpoint only when the legacy compatibility parameters are explicitly provided
- Configures OpenClaw's `channels.qqbot.xeonTts`
- Starts the services on 5002 and 9002
- Registers user-level systemd services
- Runs `self_check.sh`
Default model repositories:
- `aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8`
- `aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8`
## Runtime ports
| Service | Port | Role |
|------|------|------|
| Flask TTS | 5002 | Runs the actual TTS inference |
| Node Workflow | 9002 | Parses QQBOT tasks, maintains session state, validates audio/text duration |
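Both services can be probed from Node 18 or later, which ships a global `fetch`. The URLs come from `config.example.json`; the `checkServices` helper and its injectable `fetchImpl` parameter are illustrative, and the `http_<status>` / `unreachable:` strings mirror what `server.js` logs at startup.

```javascript
// Health endpoints from config.example.json and the common commands below.
const HEALTH_URLS = {
  flaskTts: 'http://127.0.0.1:5002/api/health',
  workflow: 'http://127.0.0.1:9002/health',
};

// Probes each service; fetchImpl is injectable so the check can be exercised offline.
async function checkServices(fetchImpl = globalThis.fetch) {
  const results = {};
  for (const [name, url] of Object.entries(HEALTH_URLS)) {
    try {
      const res = await fetchImpl(url);
      results[name] = res.ok ? 'ok' : `http_${res.status}`;
    } catch (error) {
      results[name] = `unreachable: ${error.message}`;
    }
  }
  return results;
}
```

When both services are up, `await checkServices()` should resolve to `{ flaskTts: 'ok', workflow: 'ok' }`.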
## OpenClaw configuration conventions
xeontts writes only the following config block:
```json
{
"channels": {
"qqbot": {
"xeonTts": {
"enabled": true,
"baseUrl": "http://127.0.0.1:9002",
"cloneModel": "qwen3_tts_0.6b_base_openvino",
"customModel": "qwen3_tts_0.6b_custom_openvino"
}
}
}
}
```
This means it:
- Does not overwrite the existing `channels.qqbot.stt`
- Does not touch `tools.media.audio`
- Does not compete with xeonasr for the same STT pipeline
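`configure_openclaw_integration.sh` performs this write with an embedded Node script. The function below is a hypothetical sketch of the same idea, not that script's code. It returns a copy of the config with only `channels.qqbot.xeonTts` set, so sibling keys such as `channels.qqbot.stt` and `tools.media.audio` survive unchanged. The sample `before` values are made up, and `structuredClone` requires Node 17 or later.

```javascript
// Returns a new config with channels.qqbot.xeonTts merged in; the input is not mutated.
function mergeXeonTtsConfig(config, xeonTts) {
  const next = structuredClone(config);
  next.channels ??= {};
  next.channels.qqbot ??= {};
  // Only this one key is written; every other channel and tool setting is preserved.
  next.channels.qqbot.xeonTts = { ...(next.channels.qqbot.xeonTts || {}), ...xeonTts };
  return next;
}

// Made-up existing config for illustration.
const before = {
  tools: { media: { audio: { enabled: true } } },
  channels: { qqbot: { stt: { provider: 'xeonasr' } } },
};
const after = mergeXeonTtsConfig(before, { enabled: true, baseUrl: 'http://127.0.0.1:9002' });
console.log(JSON.stringify(after.channels.qqbot));
```

Merging into any existing `xeonTts` object, rather than replacing it, keeps user-added fields such as `cloneModel` intact across reinstalls.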
## Common commands
```bash
cd "$HOME/.openclaw/workspace/skills/xeontts"
bash start_all.sh
bash stop_tts.sh
bash self_check.sh
curl http://127.0.0.1:5002/api/health
curl http://127.0.0.1:9002/health
```
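## Key endpoints
- `POST /api/workflow/message`
  - Purpose: decide from the user's message whether it is a clone or a custom TTS request, or ask for the missing reference audio
- `POST /api/workflow/reference-audio`
  - Purpose: upload reference audio; it is stored only if it passes the 3 to 5 second check
- `POST /api/tts/custom-speak`
  - Purpose: call the Custom model directly to generate speech
- `POST /api/tts/clone-speak`
  - Purpose: call the Base model directly for voice cloning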
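For a direct call to the Custom endpoint, the JSON body fields `text`, `style`, `language`, and `speakerId` are the ones the `/api/tts/custom-speak` handler in `server.js` reads, and a successful response includes `outputPath`. The helper names below are hypothetical, and the sketch assumes Node 18+ for the global `fetch`.

```javascript
// Builds the fetch() arguments for POST /api/tts/custom-speak on the 9002 gateway.
function buildCustomSpeakRequest(baseUrl, { text, style = '普通', language, speakerId } = {}) {
  const body = { text, style };
  // Omit optional fields so the gateway falls back to its configured defaults.
  if (language) body.language = language;
  if (speakerId) body.speakerId = speakerId;
  return {
    url: new URL('/api/tts/custom-speak', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the path of the WAV file the gateway wrote under outputs/.
async function customSpeak(baseUrl, options) {
  const { url, init } = buildCustomSpeakRequest(baseUrl, options);
  const res = await fetch(url, init);
  const data = await res.json().catch(() => ({}));
  if (!res.ok || data.success === false) {
    throw new Error(data.error || `HTTP ${res.status}`);
  }
  return data.outputPath;
}
```

A typical call would be `await customSpeak('http://127.0.0.1:9002', { text: '你好', style: '开心' })`. Splitting request construction from sending keeps the payload shape testable without a running server.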
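## Troubleshooting
- If `5002` is unreachable, check `tts.log` first
- If `9002` is unreachable, check `skill.log` first
- If reference audio keeps being rejected, first confirm that a working `ffprobe` is installed; for WAV reference audio, the current version can also fall back to validating duration without `ffprobe`
- If the user's intent is transcription, do not use xeontts
- If the Base model errors out, first ask the user for a cleaner 3 to 5 second reference clip
- The default release only requires `Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8`; the original Base checkpoint is no longer required by default
- Set `BASE_CHECKPOINT_PATH` only when an older exported model is missing the processor or speech tokenizer weights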
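The WAV fallback works because a PCM WAV file declares its format in the `fmt ` chunk, so duration = data bytes / (sample rate × channels × bits per sample / 8). For a 16 kHz mono 16-bit clip that is 32,000 bytes per second, so a 4 s reference clip carries 128,000 data bytes. The sketch below writes a canonical 44-byte header and reads it back. `makeSilentWav` and `wavDurationSec` are illustrative helpers that follow the same chunk walk as `readWavDurationSec` in `server.js`, but they operate on an in-memory buffer.

```javascript
// Builds a canonical 44-byte PCM WAV header followed by silent samples.
function makeSilentWav(seconds, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  const dataSize = Math.round(seconds * byteRate);
  const buf = Buffer.alloc(44 + dataSize);
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataSize, 4);
  buf.write('WAVEfmt ', 8, 'ascii');
  buf.writeUInt32LE(16, 16);                             // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20);                              // audio format 1 = PCM
  buf.writeUInt16LE(channels, 22);
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(byteRate, 28);
  buf.writeUInt16LE(channels * (bitsPerSample / 8), 32); // block align
  buf.writeUInt16LE(bitsPerSample, 34);
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataSize, 40);
  return buf;
}

// Walks the RIFF chunks and returns the duration in seconds, or null if unparseable.
function wavDurationSec(buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    return null;
  }
  let offset = 12;
  let fmt = null;
  let dataSize = null;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ' && size >= 16 && body + 16 <= buf.length) {
      fmt = {
        channels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      dataSize = size;
    }
    if (fmt && dataSize !== null) break;
    offset = body + size + (size % 2); // chunks are padded to even sizes
  }
  if (!fmt || dataSize === null) return null;
  const bytesPerSecond = fmt.sampleRate * fmt.channels * (fmt.bitsPerSample / 8);
  return bytesPerSecond ? dataSize / bytesPerSecond : null;
}

console.log(wavDurationSec(makeSilentWav(4))); // 4
```

Unlike `ffprobe`, this only covers uncompressed WAV, which is why the gateway still asks for `ffprobe` for other formats.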
FILE:.clawhub.json
{
"name": "xeon-tts",
"version": "1.0.0",
"description": "基于 OpenVINO Qwen3-TTS Base/Custom 模型的本地语音合成技能。一键安装脚本自动创建 Python 环境、安装 xdp-tts-service、生成配置,并为 OpenClaw 的 QQBOT 写入独立 TTS 工作流配置。",
"author": "aurora2035",
"license": "MIT",
"tags": ["xeon", "tts", "voice-clone", "qqbot", "openvino", "qwen3", "openclaw", "local", "audio"],
"main": "server.js",
"repository": "",
"homepage": "",
"config": {
"port": 9002,
"flaskTtsUrl": "http://127.0.0.1:5002/api/tts/synthesize",
"cloneModel": "qwen3_tts_0.6b_base_openvino",
"customModel": "qwen3_tts_0.6b_custom_openvino"
},
"scripts": {
"install:all": "bash ./install.sh",
"init": "xdp-tts-init-config --output ./tts_config.json",
"start": "node server.js",
"start:tts": "xdp-tts-service --host 127.0.0.1 --port 5002 --config ./tts_config.json"
},
"engines": {
"node": ">=18.0.0",
"python": "3.10"
}
}
FILE:config.example.json
{
"port": 9002,
"flaskTtsUrl": "http://127.0.0.1:5002/api/tts/synthesize",
"flaskHealthUrl": "http://127.0.0.1:5002/api/health",
"cloneModel": "qwen3_tts_0.6b_base_openvino",
"cloneMode": "voice_clone_xvector",
"customModel": "qwen3_tts_0.6b_custom_openvino",
"defaultSpeaker": "Vivian",
"defaultLanguage": "Chinese",
"minReferenceDurationSec": 3,
"maxReferenceDurationSec": 5,
"maxCloneOutputSeconds": 20,
"maxCustomOutputSeconds": 30,
"estimatedCharsPerSecond": 4,
"fileRetentionDays": 7,
"outputDir": "./outputs",
"referencesDir": "./references",
"runtimeDir": "./runtime",
"sessionStateFile": "./runtime/session_state.json",
"openclawSession": "default"
}
FILE:configure_openclaw_integration.sh
#!/usr/bin/env bash
set -euo pipefail
log() {
printf '[xeontts-config] %s\n' "$*"
}
fail() {
printf '[xeontts-config] ERROR: %s\n' "$*" >&2
exit 1
}
timestamp() {
date +%Y%m%d-%H%M%S
}
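# Each value can be overridden from the environment; otherwise the default applies.
OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
CONFIG_FILE="${CONFIG_FILE:-$OPENCLAW_HOME/openclaw.json}"
RUN_ID="$(timestamp)"
QQBOT_TTS_BASE_URL="${QQBOT_TTS_BASE_URL:-http://127.0.0.1:9002}"
QQBOT_TTS_HEALTH_URL="${QQBOT_TTS_HEALTH_URL:-http://127.0.0.1:9002/health}"
OUTPUT_DIR="${OUTPUT_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/outputs}"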
command -v node >/dev/null 2>&1 || fail "missing required command: node"
[[ -f "$CONFIG_FILE" ]] || fail "OpenClaw config not found: $CONFIG_FILE"
export CONFIG_FILE RUN_ID QQBOT_TTS_BASE_URL QQBOT_TTS_HEALTH_URL OUTPUT_DIR
node <<'NODE'
const fs = require('node:fs');
const {
CONFIG_FILE,
RUN_ID,
QQBOT_TTS_BASE_URL,
QQBOT_TTS_HEALTH_URL,
OUTPUT_DIR,
} = process.env;
const backupPath = `${CONFIG_FILE}.bak.${RUN_ID}`;
if (!fs.existsSync(backupPath)) {
fs.copyFileSync(CONFIG_FILE, backupPath);
}
const raw = fs.readFileSync(CONFIG_FILE, 'utf8');
const config = JSON.parse(raw);
config.channels = config.channels || {};
config.channels.qqbot = config.channels.qqbot || {};
config.channels.qqbot.xeonTts = {
...(config.channels.qqbot.xeonTts || {}),
enabled: true,
baseUrl: QQBOT_TTS_BASE_URL,
healthUrl: QQBOT_TTS_HEALTH_URL,
outputDir: OUTPUT_DIR,
cloneModel: 'qwen3_tts_0.6b_base_openvino',
customModel: 'qwen3_tts_0.6b_custom_openvino',
minReferenceDurationSec: 3,
maxReferenceDurationSec: 5,
maxCloneOutputSeconds: 20,
maxCustomOutputSeconds: 30,
modeRouting: {
cloneIntentKeywords: ['克隆音色', '克隆声音', 'voice clone'],
customIntentKeywords: ['生成语音', '朗读', '播报', 'tts'],
asrGuardKeywords: ['转写', '识别语音', 'speech to text', 'asr'],
},
};
if (config.skills && typeof config.skills === 'object' && 'xeontts' in config.skills) {
delete config.skills.xeontts;
if (Object.keys(config.skills).length === 0) {
delete config.skills;
}
}
fs.writeFileSync(CONFIG_FILE, `${JSON.stringify(config, null, 2)}\n`);
console.log('updated openclaw config with xeonTts block');
NODE
log "已写入 OpenClaw QQBOT TTS 配置: $CONFIG_FILE"
log "注意:xeontts 不会修改 tools.media.audio 或 channels.qqbot.stt,因此不会和 xeonasr 产生端口/配置冲突"
FILE:install.sh
#!/usr/bin/env bash
set -euo pipefail
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SKILL_DIR"
SKIP_START=0
SETUP_ARGS=()
while [[ $# -gt 0 ]]; do
case "$1" in
--skip-start) SKIP_START=1; shift ;;
--force|--skip-deps) SETUP_ARGS+=("$1"); shift ;;
*) log_error "未知参数: $1"; exit 1 ;;
esac
done
command -v node >/dev/null 2>&1 || { log_error "需要 Node.js 18+"; exit 1; }
command -v npm >/dev/null 2>&1 || { log_error "需要 npm"; exit 1; }
log_step "安装 Python 环境与模型配置"
bash "$SKILL_DIR/setup_env.sh" "${SETUP_ARGS[@]}"
log_step "安装 Node 依赖"
npm install
log_step "配置 OpenClaw QQBOT TTS 集成"
bash "$SKILL_DIR/configure_openclaw_integration.sh"
log_step "安装开机自启服务"
bash "$SKILL_DIR/install_systemd_services.sh"
if [[ "$SKIP_START" -ne 1 ]]; then
log_step "启动本地服务"
bash "$SKILL_DIR/start_all.sh"
else
log_warn "已跳过自动启动,可手工执行 ./start_all.sh"
fi
log_step "执行自检"
bash "$SKILL_DIR/self_check.sh"
log_info "Xeon TTS 安装完成"
FILE:install_systemd_services.sh
#!/usr/bin/env bash
set -euo pipefail
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SYSTEMD_USER_DIR="${SYSTEMD_USER_DIR:-$HOME/.config/systemd/user}"
TTS_UNIT_NAME="xeontts-tts.service"
NODE_UNIT_NAME="xeontts-node.service"
TTS_UNIT_PATH="$SYSTEMD_USER_DIR/$TTS_UNIT_NAME"
NODE_UNIT_PATH="$SYSTEMD_USER_DIR/$NODE_UNIT_NAME"
NODE_BIN="$(command -v node || true)"
TTS_BIN="$SKILL_DIR/venv/bin/xdp-tts-service"
[[ -n "$NODE_BIN" ]] || { echo "未找到 node" >&2; exit 1; }
[[ -x "$TTS_BIN" ]] || { echo "未找到 $TTS_BIN" >&2; exit 1; }
[[ -f "$SKILL_DIR/tts_config.json" ]] || { echo "未找到 tts_config.json" >&2; exit 1; }
[[ -f "$SKILL_DIR/config.json" ]] || { echo "未找到 config.json" >&2; exit 1; }
mkdir -p "$SYSTEMD_USER_DIR"
cat > "$TTS_UNIT_PATH" <<EOF
[Unit]
Description=Xeon TTS Flask Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
WorkingDirectory=$SKILL_DIR
ExecStart=$TTS_BIN --host 127.0.0.1 --port 5002 --config $SKILL_DIR/tts_config.json
Restart=always
RestartSec=3
Environment=HOME=$HOME
Environment=PATH=$HOME/.local/bin:$HOME/.npm-global/bin:$HOME/bin:/usr/local/bin:/usr/bin:/bin
Environment=XDP_TTS_CONFIG=$SKILL_DIR/tts_config.json
[Install]
WantedBy=default.target
EOF
cat > "$NODE_UNIT_PATH" <<EOF
[Unit]
Description=Xeon TTS Workflow Gateway
After=network-online.target $TTS_UNIT_NAME
Wants=network-online.target $TTS_UNIT_NAME
[Service]
Type=simple
WorkingDirectory=$SKILL_DIR
ExecStart=$NODE_BIN $SKILL_DIR/server.js
Restart=always
RestartSec=3
Environment=HOME=$HOME
Environment=PATH=$HOME/.local/bin:$HOME/.npm-global/bin:$HOME/bin:/usr/local/bin:/usr/bin:/bin
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now "$TTS_UNIT_NAME"
systemctl --user enable --now "$NODE_UNIT_NAME"
echo "xeontts 开机自启已启用"
FILE:package-lock.json
{
"name": "xeon-tts",
"version": "1.0.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "xeon-tts",
"version": "1.0.0",
"license": "MIT",
"dependencies": {
"formidable": "^3.5.4"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/@noble/hashes": {
"version": "1.8.0",
"resolved": "https://registry.npmjs.org/@noble/hashes/-/hashes-1.8.0.tgz",
"integrity": "sha512-jCs9ldd7NwzpgXDIf6P3+NrHh9/sD6CQdxHyjQI+h/6rDNo88ypBxxz45UDuZHz9r3tNz7N/VInSVoVdtXEI4A==",
"license": "MIT",
"engines": {
"node": "^14.21.3 || >=16"
},
"funding": {
"url": "https://paulmillr.com/funding/"
}
},
"node_modules/@paralleldrive/cuid2": {
"version": "2.3.1",
"resolved": "https://registry.npmjs.org/@paralleldrive/cuid2/-/cuid2-2.3.1.tgz",
"integrity": "sha512-XO7cAxhnTZl0Yggq6jOgjiOHhbgcO4NqFqwSmQpjK3b6TEE6Uj/jfSk6wzYyemh3+I0sHirKSetjQwn5cZktFw==",
"license": "MIT",
"dependencies": {
"@noble/hashes": "^1.1.5"
}
},
"node_modules/asap": {
"version": "2.0.6",
"resolved": "https://registry.npmjs.org/asap/-/asap-2.0.6.tgz",
"integrity": "sha512-BSHWgDSAiKs50o2Re8ppvp3seVHXSRM44cdSsT9FfNEUUZLOGWVCsiWaRPWM1Znn+mqZ1OfVZ3z3DWEzSp7hRA==",
"license": "MIT"
},
"node_modules/dezalgo": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/dezalgo/-/dezalgo-1.0.4.tgz",
"integrity": "sha512-rXSP0bf+5n0Qonsb+SVVfNfIsimO4HEtmnIpPHY8Q1UCzKlQrDMfdobr8nJOOsRgWCyMRqeSBQzmWUMq7zvVig==",
"license": "ISC",
"dependencies": {
"asap": "^2.0.0",
"wrappy": "1"
}
},
"node_modules/formidable": {
"version": "3.5.4",
"resolved": "https://registry.npmjs.org/formidable/-/formidable-3.5.4.tgz",
"integrity": "sha512-YikH+7CUTOtP44ZTnUhR7Ic2UASBPOqmaRkRKxRbywPTe5VxF7RRCck4af9wutiZ/QKM5nME9Bie2fFaPz5Gug==",
"license": "MIT",
"dependencies": {
"@paralleldrive/cuid2": "^2.2.2",
"dezalgo": "^1.0.4",
"once": "^1.4.0"
},
"engines": {
"node": ">=14.0.0"
},
"funding": {
"url": "https://ko-fi.com/tunnckoCore/commissions"
}
},
"node_modules/once": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz",
"integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==",
"license": "ISC",
"dependencies": {
"wrappy": "1"
}
},
"node_modules/wrappy": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz",
"integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==",
"license": "ISC"
}
}
}
FILE:package.json
{
"name": "xeon-tts",
"version": "1.0.0",
"description": "Xeon TTS local speech synthesis skill: voice cloning and styled TTS workflows for OpenClaw QQBOT",
"main": "server.js",
"scripts": {
"install:all": "bash ./install.sh",
"self-check": "bash ./self_check.sh",
"start": "node server.js",
"start:tts": "xdp-tts-service --host 127.0.0.1 --port 5002 --config ./tts_config.json",
"dev": "node --watch server.js"
},
"keywords": ["xeon", "tts", "voice-clone", "speech", "qqbot", "openclaw", "openvino", "qwen3"],
"author": "aurora2035",
"license": "MIT",
"engines": {
"node": ">=18.0.0"
},
"dependencies": {
"formidable": "^3.5.4"
}
}
FILE:README.md
# Xeon TTS Skill
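A local speech synthesis skill based on the OpenVINO Qwen3-TTS Base/Custom models. It provides two workflows for OpenClaw's QQBOT:
- Voice cloning: the user first states that they want to clone a voice, then uploads 3 to 5 seconds of reference audio; the Base model then generates the target speech
- Styled TTS: the user directly asks to "read a passage in a certain tone", and the system calls the Custom model to generate the audio and saves it to disk
This skill deliberately does not occupy xeon_asr's port or its global audio configuration:
- Flask TTS service: 5002
- Node workflow gateway: 9002
- OpenClaw configuration written to: `channels.qqbot.xeonTts`
It does not overwrite `tools.media.audio` and does not modify `channels.qqbot.stt`, so it can coexist with an installed xeonasr.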
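## Architecture
Two-service architecture:
| Service | Port | Type | Role |
|------|------|------|------|
| Flask TTS | 5002 | Python | Loads the Base/Custom OpenVINO TTS models and runs inference |
| TTS Workflow | 9002 | Node.js | Maintains QQBOT session state, validates durations, saves outputs, routes clone/custom requests |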
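## Models and capabilities
The default configuration uses two models:
- Base: `qwen3_tts_0.6b_base_openvino`, for voice cloning
- Custom: `qwen3_tts_0.6b_custom_openvino`, for regular TTS and tone-specific generation
Default duration limits:
- Reference audio: must be between 3 and 5 seconds
- Base clone output: at most about 20 seconds
- Custom output: at most about 30 seconds
- Reference audio and generated results: kept for 7 days by default, then cleaned up automatically
If the user's text explicitly asks for a duration above the limit, or the duration estimated from the text length (about 4 characters per second by default, set by `estimatedCharsPerSecond`) exceeds it, the skill rejects the request and asks the user to split the content.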
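The length check can be sketched as follows: whitespace is stripped, the remaining characters are counted, and the count is divided by `estimatedCharsPerSecond`, then compared with the 20 s (clone) or 30 s (custom) cap. This mirrors `estimateOutputDurationSeconds` in `server.js`, but `checkOutputLength` is a hypothetical name, and the real service additionally parses explicit requests such as "2 分钟" (2 minutes).

```javascript
// Default caps in seconds, from the configuration above.
const LIMITS = { clone: 20, custom: 30 };

// Estimated duration = non-whitespace characters / characters per second.
function estimateSeconds(text, charsPerSecond = 4) {
  // Array.from splits by code point, so each Chinese character counts once.
  const chars = Array.from(String(text).replace(/\s+/g, '')).length;
  return chars / charsPerSecond;
}

// Hypothetical helper: does the text fit under the cap for `mode` ('clone' | 'custom')?
function checkOutputLength(text, mode, charsPerSecond = 4) {
  const estimated = estimateSeconds(text, charsPerSecond);
  const max = LIMITS[mode];
  return { estimated, max, ok: estimated <= max };
}
```

With the defaults, a 100-character text is estimated at 25 seconds: too long for a clone request, but within the Custom limit.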
## Quick start
### 1. Install the skill
```bash
clawhub install xeontts
cd "$HOME/.openclaw/workspace/skills/xeontts"
bash install.sh
```
If you are running from a source directory:
```bash
cd /path/to/xeon_tts
bash install.sh
```
### 2. Model download
By default, the install script tries to download the models from these two Hugging Face repositories:
```text
aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8
aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8
```
To switch to other repositories later, override these environment variables before installing:
```bash
export BASE_MODEL_REPO=your-org/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8
export CUSTOM_MODEL_REPO=your-org/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8
bash install.sh
```
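How such an override is typically resolved, sketched in Node.js: a non-empty environment variable wins, otherwise the default repository is used. `install.sh` delegates the model download to `setup_env.sh`, which is not shown here, so this only illustrates the assumed precedence rule; `resolveModelRepos` is hypothetical.

```javascript
// Default repositories named in this README.
const DEFAULT_REPOS = {
  base: 'aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8',
  custom: 'aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8',
};

// Hypothetical resolver: non-empty BASE_MODEL_REPO / CUSTOM_MODEL_REPO override the defaults.
function resolveModelRepos(env = process.env) {
  const pick = (value, fallback) => (value && value.trim() ? value.trim() : fallback);
  return {
    base: pick(env.BASE_MODEL_REPO, DEFAULT_REPOS.base),
    custom: pick(env.CUSTOM_MODEL_REPO, DEFAULT_REPOS.custom),
  };
}
```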
`xdp-tts-service` is now published on PyPI, so by default the install script simply runs:
```bash
pip install xdp-tts-service
```
Override the install source only when you need to test a local wheel or a private package:
```bash
export XDP_TTS_PIP_SPEC=/absolute/path/to/xdp_tts_service-0.1.0-py3-none-any.whl
bash install.sh
```
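### 3. Does the Base model still need the original checkpoint?
If you upload a `Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8` exported with the latest conversion script, and the export directory already contains the processor files and a `speech_tokenizer/` subdirectory, you no longer need to upload the original Base checkpoint separately.
`BASE_CHECKPOINT_PATH` is still needed in only two cases:
- You uploaded a Base OV directory from an earlier export that lacks the processor or speech tokenizer weights
- You later find that Base voice cloning on some machines still needs the old fallback tokenizer path
In other words, under the current release plan, uploading only `Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8` is a reasonable default.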
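A rough pre-flight check for the two cases above might look like this. Only the `speech_tokenizer/` subdirectory is named in this README; the processor file names must be passed in by the caller, and the `preprocessor_config.json` used in the test is an assumption based on common Hugging Face export layouts, not something this project documents. `needsBaseCheckpoint` is hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical check: true when the OV export looks incomplete, i.e. when
// BASE_CHECKPOINT_PATH would probably be needed. `processorFiles` is supplied
// by the caller because the exact file names are not documented here.
function needsBaseCheckpoint(exportDir, processorFiles) {
  const tokenizerDir = path.join(exportDir, 'speech_tokenizer');
  const hasTokenizer = fs.existsSync(tokenizerDir) && fs.statSync(tokenizerDir).isDirectory();
  const hasProcessor = processorFiles.every((name) => fs.existsSync(path.join(exportDir, name)));
  return !(hasTokenizer && hasProcessor);
}
```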
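## OpenClaw / QQBOT workflows
### 1. Voice cloning
The user says in QQBOT:
- `我要克隆音色` ("I want to clone a voice")
- `帮我克隆我的声音` ("Help me clone my voice")
The workflow runs as follows:
1. xeontts recognizes a clone task and does not route it to xeonasr
2. The bot replies: please upload 3 to 5 seconds of reference audio
3. If xeonasr is also installed, QQBOT voice messages reach port 9001 first; ASR recognizes that the current session is in the voice-cloning flow and forwards the audio to xeontts
4. xeontts validates the duration with `ffprobe` when available; if no working `ffprobe` is present, it falls back to reading the WAV header for WAV files
5. If the duration is out of range, an error is returned immediately
6. If it is in range, the reference audio is saved to the local `references/` directory
7. The bot then asks the user for the text to read
8. The Base model generates the audio, and the wav file is written to `outputs/`
The system tells the user explicitly that reference audio and generated results are kept for only 7 days by default and are then cleaned up automatically.
The key point: ASR still deletes its own temporary files, but it hands the reference audio to the TTS side for storage before deleting them, so voice cloning no longer fails because ASR cleaned up the reference audio first.
Note: the `voice_clone_xvector` path is used by default because it is more tolerant of imperfect reference audio; it still runs on the Base model.
If you later find that an older exported Base model cannot complete voice cloning on the target machine, supplying the original Base checkpoint is enough; this does not affect the current open-source default setup.
About `ffprobe`:
- `ffprobe` is the media-probing tool in the FFmpeg toolchain; it reads metadata such as audio duration, codec, and sample rate
- It is preferred here because it handles many formats (mp3, m4a, wav, and others) more reliably
- The current source already includes a WAV fallback that works without `ffprobe`, so it is not a hard dependency for WAV input; if you want the release package to support more reference-audio formats out of the box, installing FFmpeg on the target machine is still recommended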
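The clone flow is a small per-session state machine. `server.js` stores a `stage` per session (`idle`, `awaiting_reference_audio`, `awaiting_clone_text`); the sketch below models the transitions from the steps above as a pure function. The event names and the `nextStage` helper are hypothetical, and the return to `idle` after synthesis is an assumption, because the corresponding code is not shown here.

```javascript
const MIN_REF_SEC = 3;
const MAX_REF_SEC = 5;

// Hypothetical transition function for the clone workflow.
// Returns the next stage plus a short English summary of the bot's reply.
function nextStage(stage, event) {
  switch (event.type) {
    case 'clone_request': // e.g. "我要克隆音色"
      return { stage: 'awaiting_reference_audio', reply: 'upload 3 to 5 s of reference audio' };
    case 'reference_audio':
      if (stage !== 'awaiting_reference_audio') {
        return { stage, reply: 'no clone session is waiting for audio' };
      }
      if (event.durationSec < MIN_REF_SEC || event.durationSec > MAX_REF_SEC) {
        // Rejected audio leaves the session waiting for a new upload.
        return { stage, reply: 'duration out of range, please re-upload' };
      }
      return { stage: 'awaiting_clone_text', reply: 'send the text to read' };
    case 'text':
      if (stage === 'awaiting_clone_text') {
        // Assumption: the session goes back to idle once synthesis is triggered.
        return { stage: 'idle', reply: 'synthesize with the Base model' };
      }
      return { stage, reply: 'not in the clone flow' };
    default:
      return { stage, reply: 'ignored' };
  }
}
```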
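### 2. Tone-specific generation
The user says in QQBOT:
- `用开心的语气朗读:今天是个好日子` ("Read in a happy tone: today is a good day")
- `生成语音:请提醒我下午三点开会` ("Generate speech: remind me about the 3 p.m. meeting")
The workflow runs as follows:
1. If the user names a tone, it is turned into `instruct_text`
2. If no tone is named, the default `普通` (neutral) tone is used
3. The Custom model is called to generate the audio
4. The audio is saved to `outputs/`
5. The bot replies with the saved path and states explicitly that files are kept for only 7 days by default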
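Steps 1 and 2 can be sketched as follows. The tone regular expression and the instruction template mirror `parseStyleAndContent` and `synthesizeCustomVoice` in `server.js`, but this is a trimmed-down illustration with only two content patterns, and `buildCustomPayload` is a hypothetical name. The colon class accepts both the full-width `:` and the ASCII `:`.

```javascript
// Tone: "用<tone>(的)语气/风格/声音" -> <tone>; otherwise 普通 (neutral).
function parseStyle(text) {
  const match = String(text).match(/用(.{1,12}?)(?:的)?(?:语气|风格|声音)/);
  return match ? match[1].trim() : '普通';
}

// Content: text after "生成语音/生成音频" or after a reading verb; falls back to the whole text.
function parseContent(text) {
  const normalized = String(text).trim();
  const patterns = [
    /(?:生成语音|生成音频)[::]?\s*(.+)$/, // checked first so "语音" is not captured as content
    /(?:说|朗读|播报|念出)[::]?\s*(.+)$/,
  ];
  for (const pattern of patterns) {
    const match = normalized.match(pattern);
    if (match) return match[1].trim();
  }
  return normalized;
}

// Hypothetical payload builder; field names follow synthesizeCustomVoice in server.js.
function buildCustomPayload(text) {
  const style = parseStyle(text);
  return {
    text: parseContent(text),
    tts_mode: 'custom_voice',
    // The neutral tone sends an empty instruction, as server.js does.
    instruct_text: style === '普通' ? '' : `请用${style}的语气朗读这段话。`,
  };
}
```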
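## File cleanup policy
Automatic age-based cleanup is enabled by default:
- Reference audio under `references/` is kept for 7 days by default
- Generated results under `outputs/` are kept for 7 days by default
- Files older than the retention period (judged by modification time) are removed automatically when the service starts and after each newly saved file; setting `fileRetentionDays` to 0 disables cleanup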
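Both the age test and the recursive walk can be sketched as follows. It mirrors `cleanupExpiredFiles` in `server.js` (modification time compared against now minus the retention period; 0 disables cleanup) but only lists expired files instead of deleting them; `isExpired` and `listExpiredFiles` are hypothetical names.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const DAY_MS = 24 * 60 * 60 * 1000;

// True when a file last modified at `mtimeMs` is past the retention period.
// A retention of 0 or less disables cleanup.
function isExpired(mtimeMs, retentionDays, nowMs = Date.now()) {
  if (!(retentionDays > 0)) return false;
  return mtimeMs < nowMs - retentionDays * DAY_MS;
}

// Recursively collect expired files under rootDir (nothing is deleted).
function listExpiredFiles(rootDir, retentionDays, nowMs = Date.now()) {
  const expired = [];
  if (!fs.existsSync(rootDir)) return expired;
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      expired.push(...listExpiredFiles(fullPath, retentionDays, nowMs));
    } else if (isExpired(fs.statSync(fullPath).mtimeMs, retentionDays, nowMs)) {
      expired.push(fullPath);
    }
  }
  return expired;
}
```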
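The install script first generates the local `config.json` from `config.example.json`, and it already includes `fileRetentionDays` by default.
To change the retention period, edit `fileRetentionDays` in the generated local `config.json`. A complete example: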
```json
{
"port": 9002,
"flaskTtsUrl": "http://127.0.0.1:5002/api/tts/synthesize",
"flaskHealthUrl": "http://127.0.0.1:5002/api/health",
"cloneModel": "qwen3_tts_0.6b_base_openvino",
"cloneMode": "voice_clone_xvector",
"customModel": "qwen3_tts_0.6b_custom_openvino",
"defaultSpeaker": "Vivian",
"defaultLanguage": "Chinese",
"minReferenceDurationSec": 3,
"maxReferenceDurationSec": 5,
"maxCloneOutputSeconds": 20,
"maxCustomOutputSeconds": 30,
"estimatedCharsPerSecond": 4,
"fileRetentionDays": 7,
"outputDir": "./outputs",
"referencesDir": "./references",
"runtimeDir": "./runtime",
"sessionStateFile": "./runtime/session_state.json",
"openclawSession": "default"
}
```
## Management commands
```bash
cd "$HOME/.openclaw/workspace/skills/xeontts"
bash start_all.sh
bash stop_tts.sh
bash self_check.sh
curl http://127.0.0.1:5002/api/health
curl http://127.0.0.1:9002/health
```
## Main endpoints
### Workflow intent entry point
```bash
curl -X POST http://127.0.0.1:9002/api/workflow/message \
-H 'Content-Type: application/json' \
-d '{"sessionId":"default","userId":"qq-1001","text":"我要克隆音色"}'
```
### Upload reference audio
```bash
curl -X POST http://127.0.0.1:9002/api/workflow/reference-audio \
-F sessionId=default \
-F userId=qq-1001 \
-F [email protected]
```
### Continue with the text to clone
```bash
curl -X POST http://127.0.0.1:9002/api/workflow/message \
-H 'Content-Type: application/json' \
-d '{"sessionId":"default","userId":"qq-1001","text":"请用我的音色说,今天下班早点休息。"}'
```
### Custom-tone generation
```bash
curl -X POST http://127.0.0.1:9002/api/tts/custom-speak \
-H 'Content-Type: application/json' \
-d '{"text":"欢迎使用 Xeon TTS","style":"开心","speakerId":"Vivian"}'
```
## Directory structure
```text
xeon_tts/
├── install.sh
├── setup_env.sh
├── configure_openclaw_integration.sh
├── install_systemd_services.sh
├── start_tts_service.sh
├── start_all.sh
├── stop_tts.sh
├── self_check.sh
├── server.js
├── config.example.json
├── tts_config.example.json
├── SKILL.md
└── README.md
```
## Release cleanup
Do not commit these local runtime artifacts when open-sourcing:
- `venv/`
- `node_modules/`
- `runtime/`
- `outputs/`
- `references/`
- `config.json`
- `tts_config.json`
- `*.log`
- `*.pid`
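## Notes
- This skill handles only TTS and voice cloning, not transcription
- If the user says "识别语音" (recognize speech) or "转文字" (convert to text), hand the request to xeonasr
- If no working `ffprobe` is available, WAV reference audio can still be validated through the built-in fallback; to reliably support more reference-audio formats, installing `ffmpeg` is still recommended
- For the Base model, clean, single-speaker reference audio with little background noise is still recommended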
FILE:self_check.sh
#!/usr/bin/env bash
set -euo pipefail
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
log_ok() { echo -e "${GREEN}[PASS]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_fail() { echo -e "${RED}[FAIL]${NC} $1"; }
OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
CONFIG_FILE="${CONFIG_FILE:-$OPENCLAW_HOME/openclaw.json}"
NODE_CONFIG_FILE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/config.json"
TTS_CONFIG_FILE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/tts_config.json"
FAIL_COUNT=0
pass() { log_ok "$1"; }
fail() { log_fail "$1"; FAIL_COUNT=$((FAIL_COUNT + 1)); }
check_http_health() {
local name="$1"
local url="$2"
if curl -fsS "$url" >/dev/null 2>&1; then
pass "$name 健康检查通过: $url"
else
fail "$name 健康检查失败: $url"
fi
}
check_json_value() {
local file_path="$1"
local label="$2"
local expression="$3"
if node -e '
const fs = require("node:fs");
const cfg = JSON.parse(fs.readFileSync(process.argv[1], "utf8"));
const fn = new Function("cfg", `return (${process.argv[2]});`);
if (!fn(cfg)) process.exit(1);
' "$file_path" "$expression"; then
pass "$label"
else
fail "$label"
fi
}
log_step "检查本地服务"
check_http_health "Flask TTS" "http://127.0.0.1:5002/api/health"
check_http_health "Node TTS Workflow" "http://127.0.0.1:9002/health"
log_step "检查本地配置文件"
[[ -f "$NODE_CONFIG_FILE" ]] && pass "找到 config.json" || fail "缺少 config.json"
[[ -f "$TTS_CONFIG_FILE" ]] && pass "找到 tts_config.json" || fail "缺少 tts_config.json"
if [[ -f "$NODE_CONFIG_FILE" ]]; then
check_json_value "$NODE_CONFIG_FILE" "Node 网关监听 9002" 'cfg.port === 9002'
check_json_value "$NODE_CONFIG_FILE" "Node 网关转发到 Flask 5002" 'cfg.flaskTtsUrl === "http://127.0.0.1:5002/api/tts/synthesize"'
fi
if [[ -f "$TTS_CONFIG_FILE" ]]; then
check_json_value "$TTS_CONFIG_FILE" "Base 模型已配置" 'Boolean(cfg.qwen3_tts_0_6b_base_openvino || cfg["qwen3_tts_0.6b_base_openvino"])'
check_json_value "$TTS_CONFIG_FILE" "Custom 模型已配置" 'Boolean(cfg.qwen3_tts_0_6b_custom_openvino || cfg["qwen3_tts_0.6b_custom_openvino"])'
fi
log_step "检查 OpenClaw 配置"
if [[ -f "$CONFIG_FILE" ]]; then
pass "找到 OpenClaw 配置: $CONFIG_FILE"
check_json_value "$CONFIG_FILE" "channels.qqbot.xeonTts 已开启" 'cfg.channels?.qqbot?.xeonTts?.enabled === true'
check_json_value "$CONFIG_FILE" "channels.qqbot.xeonTts.baseUrl 指向 9002" 'cfg.channels?.qqbot?.xeonTts?.baseUrl === "http://127.0.0.1:9002"'
else
fail "未找到 OpenClaw 配置: $CONFIG_FILE"
fi
log_step "检查 systemd 状态"
if systemctl --user is-enabled xeontts-tts.service >/dev/null 2>&1; then pass "xeontts-tts.service 已启用"; else fail "xeontts-tts.service 未启用"; fi
if systemctl --user is-enabled xeontts-node.service >/dev/null 2>&1; then pass "xeontts-node.service 已启用"; else fail "xeontts-node.service 未启用"; fi
if [[ "$FAIL_COUNT" -gt 0 ]]; then
log_fail "自检失败,FAIL=$FAIL_COUNT"
exit 1
fi
log_ok "自检通过"
FILE:server.js
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');
const os = require('node:os');
const { execFile } = require('node:child_process');
const formidable = require('formidable');
const SKILL_DIR = __dirname;
const CONFIG_PATH = path.join(SKILL_DIR, 'config.json');
const DEFAULT_CONFIG = {
port: 9002,
flaskTtsUrl: 'http://127.0.0.1:5002/api/tts/synthesize',
flaskHealthUrl: 'http://127.0.0.1:5002/api/health',
cloneModel: 'qwen3_tts_0.6b_base_openvino',
cloneMode: 'voice_clone_xvector',
customModel: 'qwen3_tts_0.6b_custom_openvino',
defaultSpeaker: 'Vivian',
defaultLanguage: 'Chinese',
minReferenceDurationSec: 3,
maxReferenceDurationSec: 5,
maxCloneOutputSeconds: 20,
maxCustomOutputSeconds: 30,
estimatedCharsPerSecond: 4,
fileRetentionDays: 7,
outputDir: './outputs',
referencesDir: './references',
runtimeDir: './runtime',
sessionStateFile: './runtime/session_state.json',
openclawSession: 'default',
};
function loadConfig() {
if (!fs.existsSync(CONFIG_PATH)) {
return normalizeConfig({});
}
try {
const raw = JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf8'));
return normalizeConfig(raw || {});
} catch (error) {
console.warn('[xeontts] failed to parse config.json, using defaults:', error.message);
return normalizeConfig({});
}
}
function normalizeConfig(input) {
const config = { ...DEFAULT_CONFIG, ...(input || {}) };
const retentionDays = Number(config.fileRetentionDays);
config.fileRetentionDays = Number.isFinite(retentionDays) && retentionDays >= 0 ? retentionDays : DEFAULT_CONFIG.fileRetentionDays;
config.outputDir = resolveLocalPath(config.outputDir);
config.referencesDir = resolveLocalPath(config.referencesDir);
config.runtimeDir = resolveLocalPath(config.runtimeDir);
config.sessionStateFile = resolveLocalPath(config.sessionStateFile);
return config;
}
function resolveLocalPath(value) {
if (!value) {
return value;
}
if (value.startsWith('~/')) {
return path.join(os.homedir(), value.slice(2));
}
if (path.isAbsolute(value)) {
return value;
}
return path.join(SKILL_DIR, value);
}
const config = loadConfig();
ensureRuntimeDirs();
cleanupManagedFiles();
function ensureRuntimeDirs() {
for (const dirPath of [config.outputDir, config.referencesDir, config.runtimeDir, path.dirname(config.sessionStateFile)]) {
fs.mkdirSync(dirPath, { recursive: true });
}
if (!fs.existsSync(config.sessionStateFile)) {
fs.writeFileSync(config.sessionStateFile, '{}\n');
}
}
function getRetentionNotice() {
if (Number(config.fileRetentionDays) <= 0) {
return '文件保留期未启用自动清理。';
}
return `参考音频和生成结果默认只保留 ${config.fileRetentionDays} 天,之后会自动清理。`;
}
function removeEmptyDirsRecursively(dirPath, stopDir) {
if (!dirPath || dirPath === stopDir) {
return;
}
try {
const entries = fs.readdirSync(dirPath);
if (entries.length > 0) {
return;
}
fs.rmdirSync(dirPath);
removeEmptyDirsRecursively(path.dirname(dirPath), stopDir);
} catch {
// ignore cleanup errors
}
}
function cleanupExpiredFiles(rootDir) {
if (Number(config.fileRetentionDays) <= 0) {
return 0;
}
const cutoffMs = Date.now() - Number(config.fileRetentionDays) * 24 * 60 * 60 * 1000;
let removedCount = 0;
function walk(currentDir) {
const entries = fs.readdirSync(currentDir, { withFileTypes: true });
for (const entry of entries) {
const fullPath = path.join(currentDir, entry.name);
if (entry.isDirectory()) {
walk(fullPath);
removeEmptyDirsRecursively(fullPath, rootDir);
continue;
}
try {
const stat = fs.statSync(fullPath);
if (stat.mtimeMs < cutoffMs) {
fs.rmSync(fullPath, { force: true });
removedCount += 1;
}
} catch {
// ignore cleanup errors
}
}
}
if (fs.existsSync(rootDir)) {
walk(rootDir);
}
return removedCount;
}
function cleanupManagedFiles() {
const removedReferences = cleanupExpiredFiles(config.referencesDir);
const removedOutputs = cleanupExpiredFiles(config.outputDir);
if (removedReferences > 0 || removedOutputs > 0) {
console.log(`[xeontts] auto-cleaned expired files: references=${removedReferences}, outputs=${removedOutputs}, retentionDays=${config.fileRetentionDays}`);
}
}
function loadState() {
try {
return JSON.parse(fs.readFileSync(config.sessionStateFile, 'utf8')) || {};
} catch {
return {};
}
}
function saveState(state) {
fs.writeFileSync(config.sessionStateFile, `${JSON.stringify(state, null, 2)}\n`);
}
function firstValue(value) {
if (Array.isArray(value)) {
return value[0];
}
return value;
}
function buildSessionKey(sessionId, userId) {
return `${String(sessionId || config.openclawSession || 'default').trim()}::${String(userId || 'anonymous').trim()}`;
}
function getSessionState(sessionId, userId) {
const state = loadState();
const key = buildSessionKey(sessionId, userId);
if (!state[key]) {
state[key] = {
stage: 'idle',
sessionId: sessionId || config.openclawSession || 'default',
userId: userId || 'anonymous',
referenceAudioPath: null,
referenceDurationSec: null,
lastOutputPath: null,
updatedAt: new Date().toISOString(),
};
}
return { state, key, session: state[key] };
}
function writeSessionState(sessionId, userId, updater) {
const current = getSessionState(sessionId, userId);
updater(current.session);
current.session.updatedAt = new Date().toISOString();
saveState(current.state);
return current.session;
}
function listPendingReferenceSessions(state) {
return Object.entries(state || {})
.filter(([, session]) => session?.stage === 'awaiting_reference_audio')
.map(([key, session]) => ({ key, session }));
}
function resolvePendingReferenceSession(sessionId, userId) {
const state = loadState();
const pending = listPendingReferenceSessions(state);
const normalizedSessionId = String(sessionId || '').trim();
const normalizedUserId = String(userId || '').trim();
if (normalizedSessionId && normalizedUserId) {
const exactKey = buildSessionKey(normalizedSessionId, normalizedUserId);
const exact = pending.find((item) => item.key === exactKey);
if (exact) {
return { matched: true, sessionId: exact.session.sessionId, userId: exact.session.userId, state, key: exact.key, session: exact.session, reason: 'exact_match' };
}
}
if (normalizedSessionId) {
const sameSession = pending.filter((item) => item.session.sessionId === normalizedSessionId);
if (sameSession.length === 1) {
const match = sameSession[0];
return { matched: true, sessionId: match.session.sessionId, userId: match.session.userId, state, key: match.key, session: match.session, reason: 'session_match' };
}
if (sameSession.length > 1) {
return { matched: false, state, reason: 'ambiguous_session_match', pendingCount: sameSession.length };
}
}
if (normalizedUserId) {
const sameUser = pending.filter((item) => item.session.userId === normalizedUserId);
if (sameUser.length === 1) {
const match = sameUser[0];
return { matched: true, sessionId: match.session.sessionId, userId: match.session.userId, state, key: match.key, session: match.session, reason: 'user_match' };
}
if (sameUser.length > 1) {
return { matched: false, state, reason: 'ambiguous_user_match', pendingCount: sameUser.length };
}
}
if (pending.length === 1) {
const match = pending[0];
return { matched: true, sessionId: match.session.sessionId, userId: match.session.userId, state, key: match.key, session: match.session, reason: 'single_pending_session' };
}
if (pending.length > 1) {
return { matched: false, state, reason: 'ambiguous_pending_sessions', pendingCount: pending.length };
}
return { matched: false, state, reason: 'no_pending_clone_session', pendingCount: 0 };
}
function persistReferenceAudioForSession(sessionId, userId, sourcePath, originalFilename, durationSec) {
const sessionKey = buildSessionKey(sessionId, userId).replace(/[^a-zA-Z0-9_-]+/g, '_');
const fallbackExt = path.extname(sourcePath || '.wav') || '.wav';
const targetPath = path.join(config.referencesDir, `${sessionKey}${path.extname(originalFilename || '') || fallbackExt}`);
fs.mkdirSync(path.dirname(targetPath), { recursive: true });
fs.copyFileSync(sourcePath, targetPath);
writeSessionState(sessionId, userId, (session) => {
session.stage = 'awaiting_clone_text';
session.referenceAudioPath = targetPath;
session.referenceDurationSec = durationSec;
});
cleanupManagedFiles();
return targetPath;
}
function sendJson(res, statusCode, payload) {
res.writeHead(statusCode, { 'Content-Type': 'application/json; charset=utf-8' });
res.end(JSON.stringify(payload));
}
function readJsonBody(req) {
return new Promise((resolve, reject) => {
const chunks = [];
req.on('data', (chunk) => chunks.push(chunk));
req.on('end', () => {
try {
const text = Buffer.concat(chunks).toString('utf8');
resolve(text ? JSON.parse(text) : {});
} catch (error) {
reject(error);
}
});
req.on('error', reject);
});
}
function detectIntent(text, session) {
const normalized = String(text || '').trim();
if (!normalized) {
return 'unknown';
}
const lower = normalized.toLowerCase();
if (/(asr|转写|识别语音|语音转文字|speech[- ]?to[- ]?text|transcribe)/i.test(normalized)) {
return 'ignore_asr';
}
if (/(克隆音色|克隆声音|复制我的声音|voice clone|clone voice)/i.test(normalized)) {
return 'clone_start';
}
if (session?.stage === 'awaiting_clone_text') {
return 'clone_text';
}
if (/(生成语音|朗读|播报|tts|text to speech|语气|风格|声音说)/i.test(normalized)) {
return 'custom_speak';
}
if (/^用.+(语气|风格).+(说|朗读|播报)/.test(normalized)) {
return 'custom_speak';
}
return 'unknown';
}
function parseStyleAndContent(text) {
const normalized = String(text || '').trim();
const styleMatch = normalized.match(/用(.{1,12}?)(?:的)?(?:语气|风格|声音)/);
const style = styleMatch ? styleMatch[1].trim() : '普通';
const contentPatterns = [
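// Match "生成语音/生成音频" first; otherwise the generic "生成" verb below would capture "语音..." as content.
/(?:生成语音|生成音频)[::]?\s*(.+)$/,
/(?:说|朗读|播报|念出|生成)(?:一段话|这段话|下面这段话|以下内容)?[::]?\s*(.+)$/,
/(?:请|帮我)?(?:用.{0,12}?(?:语气|风格|声音))[::]?\s*(.+)$/,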
];
for (const pattern of contentPatterns) {
const match = normalized.match(pattern);
if (match && match[1]) {
return { style, content: match[1].trim() };
}
}
return { style, content: normalized };
}
function extractRequestedDurationSeconds(text) {
const normalized = String(text || '').trim();
const minuteMatch = normalized.match(/(\d+(?:\.\d+)?)\s*(分钟|分)/);
if (minuteMatch) {
return Number(minuteMatch[1]) * 60;
}
const secondMatch = normalized.match(/(\d+(?:\.\d+)?)\s*秒/);
if (secondMatch) {
return Number(secondMatch[1]);
}
return null;
}
function estimateOutputDurationSeconds(text) {
const chars = Array.from(String(text || '').replace(/\s+/g, '')).length;
if (!chars) {
return 0;
}
return chars / Number(config.estimatedCharsPerSecond || 4);
}
function validateRequestedOutput(text, maxSeconds, label) {
const requested = extractRequestedDurationSeconds(text);
if (requested && requested > maxSeconds) {
throw new Error(`${label}请求时长超过上限,当前最大支持约 ${maxSeconds} 秒`);
}
const estimated = estimateOutputDurationSeconds(text);
if (estimated > maxSeconds) {
throw new Error(`${label}内容预计时长约 ${estimated.toFixed(1)} 秒,超过当前上限 ${maxSeconds} 秒,请拆分后重试`);
}
return { requestedSeconds: requested, estimatedSeconds: estimated };
}
function getMimeType(filePath, contentType) {
if (contentType && String(contentType).startsWith('audio/')) {
return contentType;
}
const ext = path.extname(filePath).toLowerCase();
switch (ext) {
case '.wav':
return 'audio/wav';
case '.mp3':
return 'audio/mpeg';
case '.m4a':
return 'audio/mp4';
case '.ogg':
case '.opus':
return 'audio/ogg';
default:
return 'application/octet-stream';
}
}
function readWavDurationSec(filePath) {
const stat = fs.statSync(filePath);
const fd = fs.openSync(filePath, 'r');
try {
const riffHeader = Buffer.alloc(12);
fs.readSync(fd, riffHeader, 0, 12, 0);
if (riffHeader.toString('ascii', 0, 4) !== 'RIFF' || riffHeader.toString('ascii', 8, 12) !== 'WAVE') {
return null;
}
let offset = 12;
let sampleRate = null;
let channels = null;
let bitsPerSample = null;
let dataSize = null;
while (offset + 8 <= stat.size) {
const chunkHeader = Buffer.alloc(8);
const bytesRead = fs.readSync(fd, chunkHeader, 0, 8, offset);
if (bytesRead < 8) {
break;
}
const chunkId = chunkHeader.toString('ascii', 0, 4);
const chunkSize = chunkHeader.readUInt32LE(4);
const chunkDataOffset = offset + 8;
if (chunkId === 'fmt ') {
const fmtBuffer = Buffer.alloc(Math.min(chunkSize, 32));
fs.readSync(fd, fmtBuffer, 0, fmtBuffer.length, chunkDataOffset);
channels = fmtBuffer.readUInt16LE(2);
sampleRate = fmtBuffer.readUInt32LE(4);
bitsPerSample = fmtBuffer.readUInt16LE(14);
} else if (chunkId === 'data') {
dataSize = chunkSize;
}
if (sampleRate && channels && bitsPerSample && dataSize) {
break;
}
offset = chunkDataOffset + chunkSize + (chunkSize % 2);
}
if (!sampleRate || !channels || !bitsPerSample || !dataSize) {
return null;
}
const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
if (!bytesPerSecond) {
return null;
}
return dataSize / bytesPerSecond;
} finally {
fs.closeSync(fd);
}
}
function getAudioDurationSec(filePath) {
return new Promise((resolve, reject) => {
execFile(
'ffprobe',
['-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', filePath],
(error, stdout) => {
if (error) {
const wavDuration = readWavDurationSec(filePath);
if (wavDuration && Number.isFinite(wavDuration) && wavDuration > 0) {
console.warn('[xeontts] ffprobe 不可用,已回退为 WAV 头时长解析');
resolve(wavDuration);
return;
}
reject(new Error('未找到 ffprobe,且无法通过 WAV 头解析音频时长,请先安装可用的 ffmpeg/ffprobe'));
return;
}
const duration = Number(String(stdout || '').trim());
if (!Number.isFinite(duration) || duration <= 0) {
const wavDuration = readWavDurationSec(filePath);
if (wavDuration && Number.isFinite(wavDuration) && wavDuration > 0) {
console.warn('[xeontts] ffprobe 输出无效,已回退为 WAV 头时长解析');
resolve(wavDuration);
return;
}
reject(new Error('无法读取参考音频时长,请上传常见音频格式'));
return;
}
resolve(duration);
},
);
});
}
function uniqueOutputPath(prefix) {
const date = new Date().toISOString().slice(0, 10);
const dirPath = path.join(config.outputDir, date);
fs.mkdirSync(dirPath, { recursive: true });
return path.join(dirPath, `${prefix}_${Date.now()}.wav`);
}
async function callFlaskTtsJson(payload) {
const url = new URL(config.flaskTtsUrl);
url.searchParams.set('response_format', 'json');
const response = await fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
const data = await response.json().catch(() => ({}));
if (!response.ok || data.success === false) {
throw new Error(data.error || `TTS 服务请求失败 (${response.status})`);
}
return data;
}
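// The two synthesis helpers below differ mainly in how they call the Flask
// service on port 5002:
//   - synthesizeCustomVoice posts JSON (tts_mode "custom_voice") through
//     callFlaskTtsJson. A style of '普通' sends an empty instruct_text. Any other
//     style is wrapped into an instruction, so the style '开心' becomes
//     '请用开心的语气朗读这段话。'.
//   - synthesizeClonedVoice posts multipart/form-data so that the reference clip
//     can be attached as prompt_audio. ref_text is sent only when cloneMode is
//     not 'voice_clone_xvector'.
// Both helpers append ?response_format=json and decode audio_base64. They write
// a .wav file under outputDir/<YYYY-MM-DD>/ via uniqueOutputPath(), then call
// cleanupManagedFiles().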
async function synthesizeCustomVoice(options) {
const text = String(options.text || '').trim();
if (!text) {
throw new Error('缺少要生成的文本');
}
const timing = validateRequestedOutput(text, Number(config.maxCustomOutputSeconds || 30), '自定义音色');
const style = String(options.style || '普通').trim() || '普通';
const payload = {
text,
model: config.customModel,
tts_model: config.customModel,
tts_mode: 'custom_voice',
language: options.language || config.defaultLanguage,
speaker_id: options.speakerId || config.defaultSpeaker,
instruct_text: style === '普通' ? '' : `请用${style}的语气朗读这段话。`,
};
const result = await callFlaskTtsJson(payload);
const outputPath = uniqueOutputPath('custom');
fs.writeFileSync(outputPath, Buffer.from(result.audio_base64, 'base64'));
cleanupManagedFiles();
return {
mode: 'custom_voice',
outputPath,
sampleRate: result.sample_rate,
estimatedSeconds: timing.estimatedSeconds,
style,
};
}
async function synthesizeClonedVoice(options) {
const text = String(options.text || '').trim();
if (!text) {
throw new Error('缺少要生成的文本');
}
if (!options.referenceAudioPath || !fs.existsSync(options.referenceAudioPath)) {
throw new Error('当前会话没有可用的参考音频,请先上传 3 到 5 秒参考音频');
}
const timing = validateRequestedOutput(text, Number(config.maxCloneOutputSeconds || 20), '音色克隆');
const form = new FormData();
form.append('text', text);
form.append('model', config.cloneModel);
form.append('tts_model', config.cloneModel);
form.append('tts_mode', config.cloneMode);
form.append('language', options.language || config.defaultLanguage);
form.append('x_vector_only_mode', String(config.cloneMode === 'voice_clone_xvector'));
const audioBuffer = fs.readFileSync(options.referenceAudioPath);
form.append('prompt_audio', new Blob([audioBuffer], { type: getMimeType(options.referenceAudioPath) }), path.basename(options.referenceAudioPath));
if (config.cloneMode !== 'voice_clone_xvector' && options.referenceText) {
form.append('ref_text', options.referenceText);
}
const url = new URL(config.flaskTtsUrl);
url.searchParams.set('response_format', 'json');
const response = await fetch(url, { method: 'POST', body: form });
const data = await response.json().catch(() => ({}));
if (!response.ok || data.success === false) {
throw new Error(data.error || `音色克隆请求失败 (${response.status})`);
}
const outputPath = uniqueOutputPath('clone');
fs.writeFileSync(outputPath, Buffer.from(data.audio_base64, 'base64'));
cleanupManagedFiles();
return {
mode: 'voice_clone',
outputPath,
sampleRate: data.sample_rate,
estimatedSeconds: timing.estimatedSeconds,
};
}
function parseMultipart(req) {
const form = new formidable.IncomingForm({
uploadDir: config.runtimeDir,
keepExtensions: true,
multiples: false,
});
return new Promise((resolve, reject) => {
form.parse(req, (error, fields, files) => {
if (error) {
reject(error);
return;
}
resolve({ fields, files });
});
});
}
function getUploadedFile(files) {
const candidate = files.file || files.audio || files.prompt_audio || null;
if (Array.isArray(candidate)) {
return candidate[0] || null;
}
return candidate;
}
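// Illustrative calls to the workflow entry point below. These assume the gateway
// listens locally on its default port 9002. sessionId and userId together select
// the per-user session state. The exact phrases that are recognised depend on
// detectIntent(), which is defined elsewhere in this file.
//
//   # Start a clone session. The reply asks for a 3-5 s reference clip.
//   curl -s -X POST http://127.0.0.1:9002/api/workflow/message \
//     -H 'Content-Type: application/json' \
//     -d '{"sessionId":"demo","userId":"u1","text":"我要克隆音色"}'
//
//   # Styled TTS. Optional body.style / body.content override the values that
//   # parseStyleAndContent() extracts from text.
//   curl -s -X POST http://127.0.0.1:9002/api/workflow/message \
//     -H 'Content-Type: application/json' \
//     -d '{"sessionId":"demo","userId":"u1","text":"用开心的语气朗读这段话:今天天气很好"}'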
async function handleWorkflowMessage(body) {
const sessionId = body.sessionId || config.openclawSession;
const userId = body.userId || 'anonymous';
const text = String(body.text || '').trim();
const snapshot = getSessionState(sessionId, userId);
const intent = detectIntent(text, snapshot.session);
if (intent === 'ignore_asr') {
return {
success: true,
action: 'ignore',
reason: 'detected_asr_request',
message: '当前请求是语音识别/转写意图,应交给 xeon_asr,不走 xeon_tts。',
};
}
if (intent === 'clone_start') {
writeSessionState(sessionId, userId, (session) => {
session.stage = 'awaiting_reference_audio';
session.referenceAudioPath = null;
session.referenceDurationSec = null;
});
return {
success: true,
action: 'ask_reference_audio',
message: `请上传一段 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒的干净人声参考音频,我会用 Base 模型为你克隆音色。${getRetentionNotice()}`,
};
}
if (snapshot.session.stage === 'awaiting_reference_audio') {
return {
success: true,
action: 'waiting_reference_audio',
message: `当前正在进行音色克隆,请先上传一段 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒参考音频。`,
};
}
if (intent === 'clone_text') {
const result = await synthesizeClonedVoice({
text,
referenceAudioPath: snapshot.session.referenceAudioPath,
language: body.language || config.defaultLanguage,
referenceText: body.referenceText || '',
});
writeSessionState(sessionId, userId, (session) => {
session.stage = 'idle';
session.lastOutputPath = result.outputPath;
});
return {
success: true,
action: 'synthesized_clone_voice',
message: `音色克隆完成,音频已落盘: ${result.outputPath}。${getRetentionNotice()}`,
outputPath: result.outputPath,
estimatedSeconds: result.estimatedSeconds,
};
}
if (intent === 'custom_speak') {
const parsed = parseStyleAndContent(text);
const result = await synthesizeCustomVoice({
text: body.content || parsed.content,
style: body.style || parsed.style,
language: body.language || config.defaultLanguage,
speakerId: body.speakerId || config.defaultSpeaker,
});
writeSessionState(sessionId, userId, (session) => {
session.stage = 'idle';
session.lastOutputPath = result.outputPath;
});
return {
success: true,
action: 'synthesized_custom_voice',
message: `语音生成完成,音频已落盘: ${result.outputPath}。${getRetentionNotice()}`,
outputPath: result.outputPath,
style: result.style,
estimatedSeconds: result.estimatedSeconds,
};
}
return {
success: true,
action: 'noop',
message: '未识别到 TTS/音色克隆意图。若你要克隆音色,请明确说“我要克隆音色”;若要生成语音,请明确说“用某种语气朗读这段话”。',
};
}
const server = http.createServer(async (req, res) => {
try {
if (req.method === 'GET' && req.url === '/health') {
return sendJson(res, 200, { status: 'ok', port: config.port });
}
if (req.method === 'GET' && req.url.startsWith('/api/session/state')) {
const url = new URL(req.url, 'http://127.0.0.1');
const sessionId = url.searchParams.get('sessionId') || config.openclawSession;
const userId = url.searchParams.get('userId') || 'anonymous';
const current = getSessionState(sessionId, userId).session;
return sendJson(res, 200, { success: true, session: current });
}
if (req.method === 'POST' && req.url === '/api/workflow/message') {
const body = await readJsonBody(req);
const result = await handleWorkflowMessage(body || {});
return sendJson(res, 200, result);
}
if (req.method === 'POST' && req.url === '/api/workflow/reference-audio') {
const parsed = await parseMultipart(req);
const file = getUploadedFile(parsed.files);
const sessionId = firstValue(parsed.fields.sessionId) || config.openclawSession;
const userId = firstValue(parsed.fields.userId) || 'anonymous';
if (!file?.filepath) {
return sendJson(res, 400, { success: false, error: 'missing audio file' });
}
const durationSec = await getAudioDurationSec(file.filepath);
if (durationSec < Number(config.minReferenceDurationSec || 3) || durationSec > Number(config.maxReferenceDurationSec || 5)) {
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 400, {
success: false,
error: `参考音频时长为 ${durationSec.toFixed(2)} 秒,不在 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒范围内`,
});
}
const targetPath = persistReferenceAudioForSession(sessionId, userId, file.filepath, file.originalFilename || file.filepath, durationSec);
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
action: 'reference_audio_saved',
durationSec,
referenceAudioPath: targetPath,
message: `参考音频已接收,请继续发送你希望用该音色朗读的文本。${getRetentionNotice()}`,
});
}
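// /api/workflow/asr-audio-intake is the hand-off point for xeonasr. When a QQBOT
// voice message reaches ASR first, ASR can forward the file here.
//   - `consumed: false` means no clone session is waiting for a reference clip,
//     so the caller should transcribe the audio as usual.
//   - `consumed: true` means xeontts handled the clip. transcriptText then carries
//     a note for the agent instead of a transcription.
// Illustrative call, assuming the default port 9002 and a local file name:
//
//   curl -s http://127.0.0.1:9002/api/workflow/asr-audio-intake \
//     -F file=@reference.wav -F sessionId=demo -F userId=u1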
if (req.method === 'POST' && req.url === '/api/workflow/asr-audio-intake') {
const parsed = await parseMultipart(req);
const file = getUploadedFile(parsed.files);
if (!file?.filepath) {
return sendJson(res, 400, { success: false, consumed: false, error: 'missing audio file' });
}
const sessionId = firstValue(parsed.fields.sessionId) || req.headers['x-openclaw-session'] || req.headers['x-session-id'] || config.openclawSession;
const userId = firstValue(parsed.fields.userId) || firstValue(parsed.fields.senderId) || req.headers['x-user-id'] || req.headers['x-sender-id'] || '';
const matched = resolvePendingReferenceSession(sessionId, userId);
if (!matched.matched) {
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
consumed: false,
reason: matched.reason,
pendingCount: matched.pendingCount || 0,
message: '当前没有等待参考音频的音色克隆会话。',
});
}
const durationSec = await getAudioDurationSec(file.filepath);
if (durationSec < Number(config.minReferenceDurationSec || 3) || durationSec > Number(config.maxReferenceDurationSec || 5)) {
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
consumed: true,
accepted: false,
sessionId: matched.sessionId,
userId: matched.userId,
reason: 'invalid_reference_duration',
durationSec,
message: `参考音频时长为 ${durationSec.toFixed(2)} 秒,不在 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒范围内,请重新上传。${getRetentionNotice()}`,
transcriptText: `【TTS参考音频未接收】参考音频时长为 ${durationSec.toFixed(2)} 秒,不在 ${config.minReferenceDurationSec} 到 ${config.maxReferenceDurationSec} 秒范围内,请提醒用户重新上传 3 到 5 秒的干净人声。`,
});
}
const targetPath = persistReferenceAudioForSession(
matched.sessionId,
matched.userId,
file.filepath,
file.originalFilename || file.filepath,
durationSec,
);
fs.rmSync(file.filepath, { force: true });
return sendJson(res, 200, {
success: true,
consumed: true,
accepted: true,
action: 'reference_audio_saved_via_asr',
reason: matched.reason,
sessionId: matched.sessionId,
userId: matched.userId,
durationSec,
referenceAudioPath: targetPath,
message: `参考音频已接收,请继续发送你希望用该音色朗读的文本。${getRetentionNotice()}`,
transcriptText: '【TTS参考音频已接收】当前音色克隆会话的参考音频已经保存,请直接让用户继续发送要合成的文本,不要再要求重复上传,也不要说文件已被 ASR 清理。',
});
}
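// The direct synthesis endpoints below bypass the session state machine.
// Illustrative calls, assuming the default port 9002. The referenceAudioPath
// value is a placeholder for an existing reference clip on this machine.
//
//   curl -s -X POST http://127.0.0.1:9002/api/tts/custom-speak \
//     -H 'Content-Type: application/json' \
//     -d '{"text":"你好","style":"开心"}'
//
//   curl -s -X POST http://127.0.0.1:9002/api/tts/clone-speak \
//     -H 'Content-Type: application/json' \
//     -d '{"text":"你好","referenceAudioPath":"/path/to/reference.wav"}'
//
// Both endpoints reply with { success: true, mode, outputPath, sampleRate,
// estimatedSeconds }. The custom-speak reply also includes style.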
if (req.method === 'POST' && req.url === '/api/tts/custom-speak') {
const body = await readJsonBody(req);
const result = await synthesizeCustomVoice({
text: body.text,
style: body.style || '普通',
language: body.language || config.defaultLanguage,
speakerId: body.speakerId || config.defaultSpeaker,
});
return sendJson(res, 200, { success: true, ...result });
}
if (req.method === 'POST' && req.url === '/api/tts/clone-speak') {
const body = await readJsonBody(req);
const result = await synthesizeClonedVoice({
text: body.text,
referenceAudioPath: body.referenceAudioPath,
language: body.language || config.defaultLanguage,
referenceText: body.referenceText || '',
});
return sendJson(res, 200, { success: true, ...result });
}
return sendJson(res, 404, { success: false, error: 'not found' });
} catch (error) {
console.error('[xeontts] request failed:', error);
return sendJson(res, 500, { success: false, error: error.message || String(error) });
}
});
server.listen(config.port, '0.0.0.0', async () => {
let ttsHealth = 'unknown';
try {
const response = await fetch(config.flaskHealthUrl);
ttsHealth = response.ok ? 'ok' : `http_${response.status}`;
} catch (error) {
ttsHealth = `unreachable: ${error.message}`;
}
console.log(`[xeontts] workflow gateway listening on http://0.0.0.0:${config.port}`);
console.log(`[xeontts] upstream tts health: ${ttsHealth}`);
});
FILE:setup_env.sh
#!/usr/bin/env bash
set -euo pipefail
export HF_ENDPOINT=https://hf-mirror.com
export HF_HUB_ENABLE_HF_TRANSFER=0
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SKILL_DIR"
BASE_MODEL_PATH="${BASE_MODEL_PATH:-~/model/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8}"
CUSTOM_MODEL_PATH="${CUSTOM_MODEL_PATH:-~/model/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8}"
BASE_CHECKPOINT_PATH="${BASE_CHECKPOINT_PATH:-}"
BASE_MODEL_REPO="${BASE_MODEL_REPO:-aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8}"
CUSTOM_MODEL_REPO="${CUSTOM_MODEL_REPO:-aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8}"
BASE_CHECKPOINT_REPO="${BASE_CHECKPOINT_REPO:-}"
TTS_PIP_SPEC="${XDP_TTS_PIP_SPEC:-xdp-tts-service}"
FORCE=0
SKIP_DEPS=0
usage() {
cat <<EOF
用法: $0 [--force] [--skip-deps]
环境变量覆盖:
XDP_TTS_PIP_SPEC Python 包安装源,默认 xdp-tts-service
BASE_MODEL_PATH Base OV 模型目录
CUSTOM_MODEL_PATH Custom OV 模型目录
BASE_CHECKPOINT_PATH 可选:Base 原始 checkpoint 目录(旧导出兼容)
BASE_MODEL_REPO Base OV 模型 HF 仓库名
CUSTOM_MODEL_REPO Custom OV 模型 HF 仓库名
BASE_CHECKPOINT_REPO 可选:Base checkpoint HF 仓库名(旧导出兼容)
EOF
}
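# Example (illustrative paths): install from a local wheel, keep the models on a
# data disk, and rebuild the venv, config.json and tts_config.json from scratch:
#   XDP_TTS_PIP_SPEC=/path/to/xdp_tts_service.whl \
#   BASE_MODEL_PATH=/data/model/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8 \
#   CUSTOM_MODEL_PATH=/data/model/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8 \
#   bash setup_env.sh --force
# Note that --force also deletes and reinstalls ~/miniconda3 (see setup_miniconda).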
while [[ $# -gt 0 ]]; do
case "$1" in
--force) FORCE=1; shift ;;
--skip-deps) SKIP_DEPS=1; shift ;;
-h|--help) usage; exit 0 ;;
*) log_error "未知参数: $1"; usage; exit 1 ;;
esac
done
expand_path() {
local value="$1"
if [[ "$value" == ~* ]]; then
value="value/#\~/$HOME"
fi
printf '%s\n' "$value"
}
detect_os() {
if [[ -f /etc/os-release ]]; then
. /etc/os-release
printf '%s\n' "$ID"
return 0
fi
printf '%s\n' unknown
}
check_sudo() {
if [[ "$EUID" -eq 0 ]]; then
SUDO=""
elif command -v sudo >/dev/null 2>&1; then
SUDO="sudo"
else
SUDO=""
fi
}
install_system_deps() {
[[ "$SKIP_DEPS" -eq 1 ]] && return 0
check_sudo
local os_id
os_id="$(detect_os)"
case "$os_id" in
ubuntu|debian)
log_step "安装系统依赖 (Debian/Ubuntu)"
$SUDO apt-get update -qq >/dev/null 2>&1 || true
$SUDO apt-get install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg >/dev/null 2>&1 || \
$SUDO apt-get install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg
;;
centos|rhel|fedora|rocky|almalinux|ol|alibabacloud|alios)
log_step "安装系统依赖 (RHEL/CentOS)"
local pkg_mgr="yum"
command -v dnf >/dev/null 2>&1 && pkg_mgr="dnf"
$SUDO "$pkg_mgr" install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg which >/dev/null 2>&1 || \
$SUDO "$pkg_mgr" install -y wget curl git lsof net-tools unzip bzip2 ca-certificates ffmpeg which
;;
*)
log_warn "未知系统,跳过系统依赖自动安装"
;;
esac
}
setup_miniconda() {
log_step "准备 Miniconda Python 3.10"
local conda_dir="$HOME/miniconda3"
local conda_url="https://repo.anaconda.com/miniconda/Miniconda3-py310_23.11.0-2-Linux-x86_64.sh"
if [[ "$FORCE" -eq 1 && -d "$conda_dir" ]]; then
rm -rf "$conda_dir"
fi
if [[ ! -d "$conda_dir" ]]; then
wget --timeout=120 -q "$conda_url" -O /tmp/miniconda.sh || curl -fsSL --connect-timeout 120 "$conda_url" -o /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p "$conda_dir" >/dev/null 2>&1
rm -f /tmp/miniconda.sh
fi
PYTHON_CMD="$conda_dir/bin/python"
[[ -x "$PYTHON_CMD" ]] || { log_error "Miniconda 安装失败"; exit 1; }
log_info "Python 就绪: $($PYTHON_CMD --version 2>&1)"
}
setup_venv() {
if [[ "$FORCE" -eq 1 && -d venv ]]; then
rm -rf venv
fi
if [[ ! -d venv ]]; then
log_step "创建虚拟环境"
"$PYTHON_CMD" -m venv venv
fi
source venv/bin/activate
pip install -q --upgrade pip
}
install_python_packages() {
log_step "安装 Python TTS 服务包"
if ! pip install -q --upgrade "$TTS_PIP_SPEC"; then
log_error "安装失败: $TTS_PIP_SPEC"
log_error "如果包尚未发布,请设置 XDP_TTS_PIP_SPEC=/path/to/xdp_tts_service.whl 后重试"
exit 1
fi
log_info "已安装: $TTS_PIP_SPEC"
}
verify_python_runtime() {
log_step "校验 Python TTS 运行时"
[[ -x venv/bin/xdp-tts-service ]] || {
log_error "安装后仍未生成 venv/bin/xdp-tts-service"
log_error "请检查 XDP_TTS_PIP_SPEC 是否指向包含 console entry point 的有效包"
exit 1
}
venv/bin/python <<'PYEOF'
import importlib.util
import sys
required = [
'xdp_tts_service',
'qwen_tts',
]
missing = []
for name in required:
if importlib.util.find_spec(name) is None:
missing.append(name)
if missing:
print('missing python modules:', ', '.join(missing), file=sys.stderr)
sys.exit(1)
PYEOF
}
copy_node_config() {
if [[ -f config.json && "$FORCE" -ne 1 ]]; then
log_info "config.json 已存在,跳过"
return 0
fi
cp config.example.json config.json
log_info "已生成 config.json"
}
generate_tts_config() {
if [[ -f tts_config.json && "$FORCE" -ne 1 ]]; then
log_info "tts_config.json 已存在,跳过"
return 0
fi
if [[ -x venv/bin/xdp-tts-init-config ]]; then
venv/bin/xdp-tts-init-config --output ./tts_config.json >/dev/null
else
cp tts_config.example.json tts_config.json
fi
log_info "已生成 tts_config.json"
}
resolve_hf_cli() {
export PATH="$HOME/miniconda3/bin:$HOME/.local/bin:$PATH"
if command -v hf >/dev/null 2>&1; then
printf '%s\n' "$(command -v hf)"
return 0
fi
if command -v huggingface-cli >/dev/null 2>&1; then
printf '%s\n' "$(command -v huggingface-cli)"
return 0
fi
if pip install -q 'huggingface_hub[cli]' >/dev/null 2>&1 || pip install -q huggingface_hub >/dev/null 2>&1; then
export PATH="$HOME/miniconda3/bin:$HOME/.local/bin:$PATH"
if command -v hf >/dev/null 2>&1; then
printf '%s\n' "$(command -v hf)"
return 0
fi
if command -v huggingface-cli >/dev/null 2>&1; then
printf '%s\n' "$(command -v huggingface-cli)"
return 0
fi
fi
return 1
}
download_model_if_missing() {
local repo_id="$1"
local target_path
[[ -n "$repo_id" ]] || return 0
[[ -n "-" ]] || return 0
target_path="$(expand_path "$2")"
if [[ -d "$target_path" ]] && [[ -n "$(ls -A "$target_path" 2>/dev/null || true)" ]]; then
log_info "模型目录已存在: $target_path"
return 0
fi
local hf_cli
if ! hf_cli="$(resolve_hf_cli)"; then
log_warn "未找到 Hugging Face CLI,跳过自动下载: $repo_id"
return 0
fi
mkdir -p "$target_path"
log_step "尝试下载模型: $repo_id -> $target_path"
if [[ "$(basename "$hf_cli")" == "hf" ]]; then
"$hf_cli" download "$repo_id" --local-dir "$target_path" || log_warn "下载失败,请后续手工补齐: $repo_id"
else
"$hf_cli" download "$repo_id" --local-dir "$target_path" --local-dir-use-symlinks False || log_warn "下载失败,请后续手工补齐: $repo_id"
fi
}
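# update_tts_model_paths (below) rewrites tts_config.json in place:
#   - The legacy keys "qwen3_tts_base" and "qwen3_tts_custom" are dropped. Their
#     model path is copied over first, but the next step overwrites it anyway.
#   - model_dir of "qwen3_tts_0.6b_base_openvino" and
#     "qwen3_tts_0.6b_custom_openvino" is always set to the expanded
#     BASE_MODEL_PATH / CUSTOM_MODEL_PATH.
#   - Every other field uses setdefault(), so values edited by hand survive a
#     re-run.
#   - checkpoint_path is written only when BASE_CHECKPOINT_PATH is non-empty.
#     Otherwise it is removed.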
update_tts_model_paths() {
local base_model_abs custom_model_abs base_ckpt_abs
base_model_abs="$(expand_path "$BASE_MODEL_PATH")"
custom_model_abs="$(expand_path "$CUSTOM_MODEL_PATH")"
base_ckpt_abs=""
if [[ -n "$BASE_CHECKPOINT_PATH" ]]; then
base_ckpt_abs="$(expand_path "$BASE_CHECKPOINT_PATH")"
fi
log_step "写入 TTS 模型路径到 tts_config.json"
venv/bin/python <<PYEOF
import json
from pathlib import Path
config_path = Path("tts_config.json")
config = json.loads(config_path.read_text(encoding="utf-8"))
legacy_base = config.pop("qwen3_tts_base", None) or {}
legacy_custom = config.pop("qwen3_tts_custom", None) or {}
base_cfg = config.setdefault("qwen3_tts_0.6b_base_openvino", {})
custom_cfg = config.setdefault("qwen3_tts_0.6b_custom_openvino", {})
if legacy_base and not base_cfg.get("model_dir"):
base_cfg["model_dir"] = legacy_base.get("model") or legacy_base.get("model_dir") or ""
if legacy_custom and not custom_cfg.get("model_dir"):
custom_cfg["model_dir"] = legacy_custom.get("model") or legacy_custom.get("model_dir") or ""
base_cfg["model_dir"] = r"$base_model_abs"
base_cfg.setdefault("label", "Qwen3-TTS-0.6B-Base(OpenVINO)")
base_cfg.setdefault("model_type", "Qwen3_TTS_OpenVINO")
base_cfg.setdefault("tts_model_type", "voice_clone")
base_cfg.setdefault("force_cpu", False)
base_cfg.setdefault("default_mode", "voice_clone_xvector")
base_cfg.setdefault("modes", ["voice_clone", "voice_clone_xvector"])
base_cfg.setdefault("device", "CPU")
base_cfg.setdefault("default_language", "Chinese")
base_cfg.setdefault("prompt_text", "")
base_cfg.setdefault("prompt_audio", "")
custom_cfg["model_dir"] = r"$custom_model_abs"
custom_cfg.setdefault("label", "Qwen3-TTS-0.6B-CustomVoice(OpenVINO)")
custom_cfg.setdefault("model_type", "Qwen3_TTS_OpenVINO")
custom_cfg.setdefault("tts_model_type", "custom_voice")
custom_cfg.setdefault("force_cpu", False)
custom_cfg.setdefault("default_mode", "custom_voice")
custom_cfg.setdefault("modes", ["custom_voice"])
custom_cfg.setdefault("device", "CPU")
custom_cfg.setdefault("default_language", "Chinese")
custom_cfg.setdefault("default_speaker", "Vivian")
custom_cfg.setdefault("speakers", ["Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric", "Ryan", "Aiden", "Ono_Anna", "Sohee"])
base_ckpt = r"$base_ckpt_abs".strip()
if base_ckpt:
base_cfg["checkpoint_path"] = base_ckpt
else:
base_cfg.pop("checkpoint_path", None)
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
PYEOF
}
repair_base_checkpoint_hint() {
local base_model_abs hint_file resolved_hint
base_model_abs="$(expand_path "$BASE_MODEL_PATH")"
hint_file="$base_model_abs/checkpoint_path.txt"
[[ -f "$hint_file" ]] || return 0
resolved_hint="$(tr -d '\r' < "$hint_file" | head -n 1 | xargs)"
if [[ -z "$resolved_hint" || ! -e "$(expand_path "$resolved_hint")" ]]; then
printf '%s\n' "$base_model_abs" > "$hint_file"
log_info "已修复 Base 模型 checkpoint_path.txt: $hint_file"
fi
}
main() {
echo "========================================"
echo " Xeon TTS Skill 环境准备"
echo "========================================"
install_system_deps
setup_miniconda
setup_venv
install_python_packages
verify_python_runtime
copy_node_config
generate_tts_config
download_model_if_missing "$BASE_MODEL_REPO" "$BASE_MODEL_PATH"
download_model_if_missing "$CUSTOM_MODEL_REPO" "$CUSTOM_MODEL_PATH"
download_model_if_missing "$BASE_CHECKPOINT_REPO" "$BASE_CHECKPOINT_PATH"
update_tts_model_paths
repair_base_checkpoint_hint
mkdir -p runtime outputs references
log_info "Xeon TTS 环境准备完成"
}
main "$@"
FILE:start_all.sh
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
echo "========================================"
echo " 启动 Xeon TTS 服务"
echo "========================================"
[ ! -d node_modules ] && npm install
log_step "启动 Python TTS 服务 (5002)"
if ! lsof -Pi :5002 -sTCP:LISTEN -t >/dev/null 2>&1; then
./start_tts_service.sh
sleep 2
fi
log_step "启动 Node TTS 工作流网关 (9002)"
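# Stop only the previous gateway bound to 9002. A broad pkill -f "node.*server.js"
# would also kill any other Node service started as `node server.js` on this host,
# such as another skill's gateway.
OLD_PID_9002=$(lsof -Pi :9002 -sTCP:LISTEN -t 2>/dev/null | head -1 || true)
if [[ -n "$OLD_PID_9002" ]]; then
kill "$OLD_PID_9002" 2>/dev/null || true
fi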
sleep 1
(setsid node server.js >> skill.log 2>&1 </dev/null &)
sleep 3
PID_9002=$(lsof -Pi :9002 -sTCP:LISTEN -t 2>/dev/null | head -1 || true)
if [[ -z "$PID_9002" ]]; then
log_warn "未检测到 9002 监听进程,请查看 skill.log"
exit 1
fi
echo "$PID_9002" > skill.pid
log_info "Node 网关已启动 (PID: $PID_9002)"
if command -v openclaw >/dev/null 2>&1; then
log_step "重启 OpenClaw gateway"
openclaw gateway restart || log_warn "gateway restart 失败,请手工执行"
fi
log_info "完成。健康检查: curl http://127.0.0.1:5002/api/health && curl http://127.0.0.1:9002/health"
FILE:start_tts_service.sh
#!/usr/bin/env bash
set -euo pipefail
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
log_step() { echo -e "${BLUE}[STEP]${NC} $1"; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
PID_FILE="$SCRIPT_DIR/tts.pid"
LOG_FILE="$SCRIPT_DIR/tts.log"
PORT=5002
START_BIN="$SCRIPT_DIR/venv/bin/xdp-tts-service"
[[ -x "$START_BIN" ]] || { log_error "未找到 $START_BIN,请先运行 bash setup_env.sh"; exit 1; }
[[ -f "$SCRIPT_DIR/tts_config.json" ]] || { log_error "未找到 tts_config.json"; exit 1; }
if lsof -Pi :$PORT -sTCP:LISTEN -t >/dev/null 2>&1; then
log_error "端口 $PORT 已被占用"
exit 1
fi
if [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
log_error "TTS 服务已在运行 (PID: $(cat "$PID_FILE"))"
exit 1
fi
log_step "启动 TTS Flask 服务"
XDP_TTS_CONFIG="$SCRIPT_DIR/tts_config.json" nohup "$START_BIN" --host 127.0.0.1 --port $PORT --config "$SCRIPT_DIR/tts_config.json" > "$LOG_FILE" 2>&1 &
echo $! > "$PID_FILE"
sleep 3
if curl -fsS http://127.0.0.1:$PORT/api/health >/dev/null 2>&1; then
log_info "TTS 服务启动成功: http://127.0.0.1:$PORT/api/health"
else
log_warn "TTS 服务已启动,但健康检查暂未通过,请查看 $LOG_FILE"
fi
FILE:stop_tts.sh
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
if [[ -f tts.pid ]]; then
PID="$(cat tts.pid)"
kill "$PID" 2>/dev/null || kill -9 "$PID" 2>/dev/null || true
rm -f tts.pid
fi
PID_9002=$(lsof -Pi :9002 -sTCP:LISTEN -t 2>/dev/null | head -1 || true)
if [[ -n "$PID_9002" ]]; then
kill "$PID_9002" 2>/dev/null || kill -9 "$PID_9002" 2>/dev/null || true
fi
echo "Xeon TTS 服务已停止"
FILE:tts_config.example.json
{
"qwen3_tts_0.6b_base_openvino": {
"label": "Qwen3-TTS-0.6B-Base(OpenVINO)",
"model_dir": "~/model/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8",
"model_type": "Qwen3_TTS_OpenVINO",
"tts_model_type": "voice_clone",
"force_cpu": false,
"default_mode": "voice_clone_xvector",
"modes": ["voice_clone", "voice_clone_xvector"],
"device": "CPU",
"default_language": "Chinese",
"prompt_text": "",
"prompt_audio": ""
},
"qwen3_tts_0.6b_custom_openvino": {
"label": "Qwen3-TTS-0.6B-CustomVoice(OpenVINO)",
"model_dir": "~/model/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8",
"model_type": "Qwen3_TTS_OpenVINO",
"tts_model_type": "custom_voice",
"force_cpu": false,
"default_mode": "custom_voice",
"modes": ["custom_voice"],
"device": "CPU",
"default_language": "Chinese",
"default_speaker": "Vivian",
"speakers": ["Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric", "Ryan", "Aiden", "Ono_Anna", "Sohee"]
}
}
FILE:_meta.json
{
"ownerId": "kn70s411jhsq1ert53ad4s7v8d82nqmh",
"slug": "xeontts",
"version": "1.0.0",
"publishedAt": 1773792000000
}