@clawhub-lotfinity-b3295cdaa0
---
name: camofox-browser-selkies-test
description: Use the Selkies test browser stack for desktop-stream/browser-in-desktop checks, Selkies-specific debugging, and viewport/mobile-context experiments. Trigger this when Lotfi mentions Selkies, the test browser, the test image, browser-in-desktop behavior, or the services on `http://127.0.0.1:9378` / `http://127.0.0.1:3003`.
---
Use the Selkies test browser stack.
Default targets:
- camofox-browser API: `http://127.0.0.1:9378`
- Selkies UI: `http://127.0.0.1:3003`
- Docker container: `camofox-selkies-test`
- Current pushed image reference: `medtouadmin/camofox-browser:selkies-beta`
Use this skill for:
- Selkies stream/websocket debugging
- browser-in-desktop validation
- viewport/mobile-context experiments tied to the Selkies test stack
- reproducing issues specific to the test image/container
Hard rules:
- Treat this as separate from the main camofox-browser service.
- If the task is about the shared remote browser over tailnet CDP, use `browser-cdp-tailnet` instead.
- If the task is about the default local service on `127.0.0.1:9377`, use `camofox-browser-main` instead.
---
name: camofox-browser-main
description: Use the main local camofox-browser service for standard browser automation in this workspace. Trigger this when Lotfi asks for the main/local camofox browser, the default camofox browser, or work against the service on `http://127.0.0.1:9377`.
---
Use the main local camofox-browser service.
Default target:
- API base URL: `http://127.0.0.1:9377`
- Docker container: `peaceful_kare`
Workflow:
1. Check `/health`.
2. Open or reuse a tab.
3. Wait/snapshot.
4. Act.
5. Snapshot again after state-changing actions.
Hard rules:
- Always send `userId`.
- Prefer `sessionKey` when creating or reusing tabs.
- Re-snapshot after click, type, press, or navigation.
- If the user names another browser target explicitly, stop and use that specific skill instead.
Use this as the default local camofox-browser target unless Lotfi explicitly asks for Selkies/test or the detached tailnet browser.
---
name: browser-cdp-tailnet
description: Use the detached shared Chromium browser exposed over the tailnet CDP endpoint. Trigger this when Lotfi asks for the detached browser, shared browser, remote CDP browser, tailnet browser, or a browser reachable at `http://100.101.184.33:9223` / `ws://100.101.184.33:9223/...`.
---
Use the shared remote Chromium/CDP browser over the tailnet.
Default target:
- CDP base URL: `http://100.101.184.33:9223`
- Browser WS endpoint: `ws://100.101.184.33:9223/devtools/browser/3fbb2459-85c5-40b5-8d50-6f3c596cf8d5`
Preferred connection method:
- `chromium.connectOverCDP("http://100.101.184.33:9223")`
Hard rules:
- Prefer the HTTP CDP base URL over hardcoding the raw WS endpoint when your client supports it.
- If `/json/version` reports `ws://localhost/...`, replace `localhost` with `100.101.184.33:9223`.
- Verify with a small probe before claiming it works.
Known-good checks already observed on this machine:
- `/json/version` responded on `http://100.101.184.33:9223`
- CDP WebSocket handshake succeeded
- `Browser.getVersion` succeeded
- live navigation to YouTube succeeded
Use this skill instead of local browser skills when the browser should be shared across agents or reached remotely over the tailnet.
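The `localhost` rewrite rule and the "small probe" check can be sketched in Python. This is a minimal illustration rather than part of the skill: `rewrite_ws_url` and `probe_ws_url` are hypothetical helper names, and the probe assumes `/json/version` returns JSON with a `webSocketDebuggerUrl` field, as Chromium's DevTools HTTP endpoint normally does.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit

CDP_BASE = "http://100.101.184.33:9223"  # CDP base URL from this skill
TAILNET_HOST = "100.101.184.33:9223"     # host:port to substitute for localhost


def rewrite_ws_url(ws_url, host=TAILNET_HOST):
    """Replace a localhost authority in a DevTools WS URL with the tailnet host."""
    parts = urlsplit(ws_url)
    if parts.hostname == "localhost":
        parts = parts._replace(netloc=host)
    return urlunsplit(parts)


def probe_ws_url(base=CDP_BASE, timeout=10):
    """Fetch /json/version and return a WS URL reachable over the tailnet."""
    with urllib.request.urlopen(base + "/json/version", timeout=timeout) as resp:
        info = json.load(resp)
    return rewrite_ws_url(info["webSocketDebuggerUrl"])
```

Calling `probe_ws_url()` needs network access to the tailnet host; `rewrite_ws_url` is pure and can be checked offline.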
---
name: camofox-browser-control
description: Control a standalone camofox-browser server over its REST API, especially when a local or remote service is already running on port 9377. Use for opening tabs, navigating, snapshotting pages, clicking refs, typing into forms, pressing keys, scrolling, exporting storage state, importing cookies, or debugging browser automation against camofox-browser/Camoufox behavior.
---
Use the standalone camofox-browser server directly over HTTP.
Default assumptions for this workspace:
- Base URL: `http://127.0.0.1:9377`
- The service is already running.
- `userId` is mandatory on nearly every useful request.
- `sessionKey` (or legacy `listItemId`) groups tabs; default to `default`.
## Golden workflow
1. Check `/health`.
2. Create a tab with `/tabs`.
3. Call `/tabs/:tabId/wait`.
4. Call `/tabs/:tabId/snapshot` and read refs.
5. Act with `/click`, `/type`, `/press`, `/scroll`, or `/navigate`.
6. Snapshot again after any state-changing action.
Prefer this loop over HTML scraping.
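The loop above can be sketched in plain Python. This is a minimal sketch under assumptions: the function names are hypothetical, the request bodies follow the shapes in `references/api-cheatsheet.md`, and the `send` parameter exists only so the flow can be exercised without a live server.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:9377"  # default base URL in this workspace


def http_send(method, path, params=None, body=None, timeout=40):
    """Send one JSON request to the camofox-browser server."""
    url = BASE + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read().decode()
    return json.loads(raw) if raw else {}


def open_and_snapshot(user_id, url, session_key="default", send=http_send):
    """Steps 1-4: health check, create tab, wait, snapshot."""
    send("GET", "/health")
    tab = send("POST", "/tabs",
               body={"userId": user_id, "sessionKey": session_key, "url": url})
    tab_id = tab["tabId"]
    send("POST", f"/tabs/{tab_id}/wait",
         body={"userId": user_id, "timeout": 10000, "waitForNetwork": False})
    snap = send("GET", f"/tabs/{tab_id}/snapshot", params={"userId": user_id})
    return tab_id, snap


def click_and_resnapshot(user_id, tab_id, ref, send=http_send):
    """Steps 5-6: act on a ref, then take a fresh snapshot to verify the result."""
    send("POST", f"/tabs/{tab_id}/click", body={"userId": user_id, "ref": ref})
    return send("GET", f"/tabs/{tab_id}/snapshot", params={"userId": user_id})
```

Keeping each step as a short call with a snapshot in between mirrors the advice to verify after every state-changing action.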
## Hard rules
- Always send `userId`.
- Prefer `POST /tabs` with `sessionKey` for raw server use.
- Re-snapshot after click, type, press, or navigation.
- If a field ignores `fill`, retry with `type` using `mode: "keyboard"`.
- If `/tabs` returns an empty list, check whether `userId` was omitted.
- Use direct navigation when the target URL is known; do not over-click through search results if a stable URL exists.
- Use VNC/manual login for MFA, CAPTCHAs, or brittle auth flows, then reuse storage state or persistence.
## Minimal endpoint map
Read `references/api-cheatsheet.md` when you need request/response shapes.
Most-used endpoints:
- `GET /health`
- `POST /tabs`
- `GET /tabs?userId=...`
- `POST /tabs/:tabId/wait`
- `GET /tabs/:tabId/snapshot?userId=...`
- `POST /tabs/:tabId/click`
- `POST /tabs/:tabId/type`
- `POST /tabs/:tabId/press`
- `POST /tabs/:tabId/scroll`
- `POST /tabs/:tabId/navigate`
- `POST /tabs/:tabId/evaluate`
- `POST /sessions/:userId/cookies`
- `GET /sessions/:userId/storage_state`
## Recommended helper script
Use `scripts/camofox.py` instead of rewriting raw HTTP every time.
Examples:
```bash
python3 skills/camofox-browser-control/scripts/camofox.py health
python3 skills/camofox-browser-control/scripts/camofox.py open --user lotfi --session default --url https://github.com
python3 skills/camofox-browser-control/scripts/camofox.py snapshot --user lotfi --tab <tabId>
python3 skills/camofox-browser-control/scripts/camofox.py click --user lotfi --tab <tabId> --ref e17
python3 skills/camofox-browser-control/scripts/camofox.py type --user lotfi --tab <tabId> --ref e2 --text 'hello' --mode fill
python3 skills/camofox-browser-control/scripts/camofox.py type --user lotfi --tab <tabId> --text '97304' --mode keyboard --submit
python3 skills/camofox-browser-control/scripts/camofox.py navigate --user lotfi --tab <tabId> --url https://example.com
```
## Known quirks
- `GET /tabs` without `userId` can misleadingly show no tabs even when tabs exist.
- Refs go stale after page changes. Snapshot again instead of reusing old refs blindly.
- `click` already retries normal click, force click, and mouse sequence; success does not guarantee the frontend changed the state you expect, so verify with a fresh snapshot.
- Some sites accept direct URL navigation more reliably than UI clicking.
- Some frontend inputs require true keyboard events. Use `mode: "keyboard"` plus `--submit` when `fill` does not trigger app logic.
- Large multi-step chained calls are more fragile than short calls with verification between them.
## Login strategy
For normal forms:
- open → wait → snapshot → type → click/submit → snapshot
For stubborn auth:
- use VNC/noVNC login
- export `storage_state`
- rely on persistence or restore state on later runs
For cookie bootstrap:
- import Netscape cookies through `/sessions/:userId/cookies`
- requires `CAMOFOX_API_KEY`
## Escape hatch
Use `/tabs/:tabId/evaluate` only when refs/typing/clicking are insufficient. Keep expressions small and targeted.
## Local note for this machine
The current host already has a live server on `127.0.0.1:9377`, with VNC/noVNC exposed by the container. Treat that as the default target unless the task says otherwise.
FILE:references/api-cheatsheet.md
# camofox-browser API cheatsheet
Base URL defaults to `http://127.0.0.1:9377`.
## Health
`GET /health`
Returns service/browser status.
## Open tab
`POST /tabs`
```json
{
  "userId": "lotfi",
  "sessionKey": "default",
  "url": "https://example.com"
}
```
Returns:
```json
{
  "tabId": "uuid",
  "url": "https://example.com"
}
```
Notes:
- `userId` and `sessionKey` are required here.
- `POST /tabs/open` exists too, but uses `listItemId`; reserve that for compatibility cases.
## List tabs
`GET /tabs?userId=lotfi`
Returns:
```json
{
  "running": true,
  "tabs": [
    {
      "targetId": "uuid",
      "tabId": "uuid",
      "url": "https://example.com",
      "title": "Example",
      "listItemId": "default"
    }
  ]
}
```
Important: without `userId`, this may look empty.
## Wait
`POST /tabs/:tabId/wait`
```json
{
  "userId": "lotfi",
  "timeout": 10000,
  "waitForNetwork": false
}
```
Returns `{ "ok": true, "ready": true }`.
## Snapshot
`GET /tabs/:tabId/snapshot?userId=lotfi&format=text`
Returns:
```json
{
  "url": "https://example.com",
  "snapshot": "- button \"Search\" [e1] ...",
  "refsCount": 12,
  "truncated": false,
  "totalChars": 1234,
  "hasMore": false,
  "nextOffset": null
}
```
Notes:
- Refs like `e1`, `e2` come from the snapshot.
- Snapshot again after page changes.
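A short sketch of reading refs out of a text snapshot and following `hasMore`/`nextOffset` when a snapshot is split across pages. The helper names are hypothetical, the `[eN]` pattern is inferred from the sample response above, and the sketch assumes the endpoint accepts an `offset` query parameter (the bundled helper script sends one) and that pages can simply be concatenated.

```python
import re

# Refs appear in text snapshots as bracketed ids, e.g. [e1], [e17].
REF_PATTERN = re.compile(r"\[(e\d+)\]")


def extract_refs(snapshot_text):
    """Return ref ids such as 'e1' in the order they appear in a text snapshot."""
    return REF_PATTERN.findall(snapshot_text)


def read_full_snapshot(fetch_page, max_pages=20):
    """Concatenate snapshot pages by following hasMore/nextOffset.

    fetch_page(offset) is a hypothetical callable returning one snapshot
    response dict for the given offset.
    """
    chunks, offset = [], 0
    for _ in range(max_pages):
        page = fetch_page(offset)
        chunks.append(page.get("snapshot", ""))
        if not page.get("hasMore") or page.get("nextOffset") is None:
            break
        offset = page["nextOffset"]
    return "".join(chunks)
```

The `max_pages` cap is a defensive choice so a misbehaving server cannot keep the loop running indefinitely.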
## Click
`POST /tabs/:tabId/click`
By ref:
```json
{ "userId": "lotfi", "ref": "e17" }
```
By selector:
```json
{ "userId": "lotfi", "selector": "button.submit" }
```
Returns:
```json
{
  "ok": true,
  "url": "https://example.com/next",
  "refsAvailable": true
}
```
## Type
`POST /tabs/:tabId/type`
Fill mode:
```json
{
  "userId": "lotfi",
  "ref": "e2",
  "text": "hello",
  "mode": "fill"
}
```
Keyboard mode:
```json
{
  "userId": "lotfi",
  "text": "97304",
  "mode": "keyboard",
  "delay": 120,
  "submit": true
}
```
Notes:
- `fill` requires `ref` or `selector`.
- `keyboard` can type into the current focus.
- Use keyboard mode for reactive or stubborn inputs.
## Press
`POST /tabs/:tabId/press`
```json
{ "userId": "lotfi", "key": "Enter" }
```
## Scroll
`POST /tabs/:tabId/scroll`
```json
{ "userId": "lotfi", "direction": "down", "amount": 500 }
```
## Navigate existing tab
`POST /tabs/:tabId/navigate`
```json
{ "userId": "lotfi", "url": "https://chatgpt.com" }
```
Returns:
```json
{
  "ok": true,
  "tabId": "uuid",
  "url": "https://chatgpt.com/",
  "refsAvailable": true
}
```
## Evaluate
`POST /tabs/:tabId/evaluate`
```json
{
  "userId": "lotfi",
  "expression": "(() => document.title)()"
}
```
Returns:
```json
{ "ok": true, "result": "Telegram" }
```
Use sparingly.
## Cookie import
`POST /sessions/:userId/cookies`
Requires `Authorization: Bearer <CAMOFOX_API_KEY>`.
Body contains Playwright-style cookies.
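If the cookies start out in a Netscape `cookies.txt` export, a conversion along these lines may help. This is a sketch under assumptions: `netscape_to_playwright` is a hypothetical helper, the field names follow Playwright's cookie objects, and the exact request body wrapper this endpoint expects is not specified here, so the sketch stops at producing the cookie list.

```python
def netscape_to_playwright(text):
    """Convert Netscape cookies.txt content into Playwright-style cookie dicts."""
    cookies = []
    for line in text.splitlines():
        # curl and browsers mark HttpOnly cookies with a "#HttpOnly_" prefix.
        http_only = line.startswith("#HttpOnly_")
        if http_only:
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a well-formed cookie row
        domain, _subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "name": name,
            "value": value,
            "domain": domain,
            "path": path,
            # Playwright uses -1 for session cookies.
            "expires": int(expires) if expires.isdigit() and int(expires) > 0 else -1,
            "httpOnly": http_only,
            "secure": secure.upper() == "TRUE",
        })
    return cookies
```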
## Storage state export
`GET /sessions/:userId/storage_state`
Requires `Authorization: Bearer <CAMOFOX_API_KEY>` unless loopback/non-production allowances apply.
Useful after VNC/manual login.
## VNC notes
Common ports from the VNC plugin:
- `5900` VNC
- `6080` noVNC web UI
Typical flow:
1. open login page
2. complete login visually in noVNC
3. export storage state
4. reuse state later
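Step 3 of the flow above (export storage state) might look like this in Python. It is a sketch, not the server's documented client: the helper names are hypothetical, and reading the key from a `CAMOFOX_API_KEY` environment variable on the client side is an assumption.

```python
import json
import os
import urllib.request

BASE = "http://127.0.0.1:9377"  # default base URL in this workspace


def storage_state_request(user_id, api_key=None, base=BASE):
    """Build the GET request for a user's storage state, adding bearer auth if a key is known."""
    headers = {}
    api_key = api_key or os.environ.get("CAMOFOX_API_KEY")  # assumed env var name
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base}/sessions/{user_id}/storage_state", headers=headers, method="GET"
    )


def export_storage_state(user_id, out_path, timeout=40):
    """Fetch the storage state and save it as JSON for reuse on later runs."""
    with urllib.request.urlopen(storage_state_request(user_id), timeout=timeout) as resp:
        state = json.load(resp)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    return out_path
```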
FILE:scripts/camofox.py
#!/usr/bin/env python3
"""Minimal command-line helper for the camofox-browser REST API."""
import argparse
import json
import sys
import urllib.parse
import urllib.request

DEFAULT_BASE = "http://127.0.0.1:9377"


def request(base, method, path, params=None, body=None, headers=None, timeout=40):
    """Send one HTTP request and return the decoded JSON response."""
    url = base.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = None
    req_headers = {"Content-Type": "application/json"}
    if headers:
        req_headers.update(headers)
    if body is not None:
        data = json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, headers=req_headers, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read().decode()
        return json.loads(raw) if raw else {"ok": True}


def add_common(parser):
    """Add the options shared by every subcommand."""
    parser.add_argument("--base", default=DEFAULT_BASE)
    parser.add_argument("--timeout", type=int, default=40)


def main():
    ap = argparse.ArgumentParser(description="Minimal camofox-browser REST helper")
    sub = ap.add_subparsers(dest="cmd", required=True)

    p = sub.add_parser("health")
    add_common(p)

    p = sub.add_parser("open")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--session", default="default")
    p.add_argument("--url", required=True)

    p = sub.add_parser("list")
    add_common(p)
    p.add_argument("--user", required=True)

    p = sub.add_parser("wait")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--wait-for-network", action="store_true")
    p.add_argument("--ms", type=int, default=10000)

    p = sub.add_parser("snapshot")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--format", default="text")
    p.add_argument("--offset", type=int, default=0)

    p = sub.add_parser("click")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--ref")
    p.add_argument("--selector")

    p = sub.add_parser("type")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--ref")
    p.add_argument("--selector")
    p.add_argument("--text", required=True)
    p.add_argument("--mode", choices=["fill", "keyboard"], default="fill")
    p.add_argument("--delay", type=int, default=30)
    p.add_argument("--submit", action="store_true")

    p = sub.add_parser("press")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--key", required=True)

    p = sub.add_parser("scroll")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--direction", default="down")
    p.add_argument("--amount", type=int, default=500)

    p = sub.add_parser("navigate")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--url", required=True)

    p = sub.add_parser("evaluate")
    add_common(p)
    p.add_argument("--user", required=True)
    p.add_argument("--tab", required=True)
    p.add_argument("--expression", required=True)

    args = ap.parse_args()

    # Dispatch the subcommand to the matching REST call.
    try:
        if args.cmd == "health":
            out = request(args.base, "GET", "/health", timeout=args.timeout)
        elif args.cmd == "open":
            out = request(args.base, "POST", "/tabs", body={"userId": args.user, "sessionKey": args.session, "url": args.url}, timeout=args.timeout)
        elif args.cmd == "list":
            out = request(args.base, "GET", "/tabs", params={"userId": args.user}, timeout=args.timeout)
        elif args.cmd == "wait":
            out = request(args.base, "POST", f"/tabs/{args.tab}/wait", body={"userId": args.user, "timeout": args.ms, "waitForNetwork": args.wait_for_network}, timeout=args.timeout)
        elif args.cmd == "snapshot":
            out = request(args.base, "GET", f"/tabs/{args.tab}/snapshot", params={"userId": args.user, "format": args.format, "offset": args.offset}, timeout=args.timeout)
        elif args.cmd == "click":
            body = {"userId": args.user}
            if args.ref:
                body["ref"] = args.ref
            if args.selector:
                body["selector"] = args.selector
            out = request(args.base, "POST", f"/tabs/{args.tab}/click", body=body, timeout=args.timeout)
        elif args.cmd == "type":
            body = {
                "userId": args.user,
                "text": args.text,
                "mode": args.mode,
                "delay": args.delay,
                "submit": args.submit,
            }
            if args.ref:
                body["ref"] = args.ref
            if args.selector:
                body["selector"] = args.selector
            out = request(args.base, "POST", f"/tabs/{args.tab}/type", body=body, timeout=args.timeout)
        elif args.cmd == "press":
            out = request(args.base, "POST", f"/tabs/{args.tab}/press", body={"userId": args.user, "key": args.key}, timeout=args.timeout)
        elif args.cmd == "scroll":
            out = request(args.base, "POST", f"/tabs/{args.tab}/scroll", body={"userId": args.user, "direction": args.direction, "amount": args.amount}, timeout=args.timeout)
        elif args.cmd == "navigate":
            out = request(args.base, "POST", f"/tabs/{args.tab}/navigate", body={"userId": args.user, "url": args.url}, timeout=args.timeout)
        elif args.cmd == "evaluate":
            out = request(args.base, "POST", f"/tabs/{args.tab}/evaluate", body={"userId": args.user, "expression": args.expression}, timeout=args.timeout)
        else:
            raise ValueError(f"unknown command: {args.cmd}")
    except Exception as e:
        # Report any failure as JSON so callers can parse it, then exit non-zero.
        print(json.dumps({"ok": False, "error": str(e)}, ensure_ascii=False, indent=2))
        sys.exit(1)
    print(json.dumps(out, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()
---
name: waha-onboarding
description: Onboard a new user to WhatsApp via WAHA. Greet them, collect and sanitize their phone number, create/start a WAHA session, request and share a pairing code, verify authentication, and then offer next actions (recent chats, contacts, specific chat).
---
# WAHA Onboarding Skill
Use this skill when a user wants to connect their WhatsApp account through WAHA.
## Onboarding flow
### 1) Collect phone number
Ask for the user’s WhatsApp number including country code.
Example prompt:
"👋 I can connect your WhatsApp. Send your phone number with country code (digits only if possible), for example `905380546393`."
### 2) Sanitize number and derive session name
- Strip all non-digit characters from the input.
- Use sanitized value as `<phonenumber>`.
- Session name format: `user-<phonenumber>`.
### 3) Create and start WAHA session
Run:
```bash
waha-cli waha-create-session --name user-<phonenumber>
sleep 5
waha-cli waha-start-session --session user-<phonenumber>
```
### 4) Request pairing code
Run:
```bash
sleep 5
waha-cli waha-request-pairing-code --session user-<phonenumber> --phone-number <phonenumber>
```
### 5) Share pairing instructions
Send the returned code and tell user:
1. Open WhatsApp → **Linked Devices**
2. Tap **Link a Device**
3. Tap **Link with phone number instead**
4. Enter the pairing code
### 6) Verify authentication after user confirms
Run:
```bash
waha-cli waha-check-auth-status --session user-<phonenumber>
```
- If status is `WORKING`: onboarding succeeded.
- Otherwise: run fallback.
### 7) Confirm success and offer next actions
Offer:
- recent conversations
- contacts
- messages from a specific chat
## Fallback (if not WORKING)
Restart and issue a fresh code:
```bash
waha-cli waha-start-session --session user-<phonenumber>
sleep 8
waha-cli waha-request-pairing-code --session user-<phonenumber> --phone-number <phonenumber>
```
Then ask user to retry from WhatsApp Linked Devices.
## Naming and ID conventions
- WAHA session: `user-<phonenumber>`
- Direct chat id convention: `<phonenumber>@c.us`
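The sanitizing step and the naming conventions can be expressed as a small Python sketch. The function names are hypothetical; only the rules themselves (digits only, `user-<phonenumber>`, `<phonenumber>@c.us`) come from this skill.

```python
import re


def sanitize_phone(raw):
    """Strip every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", raw)


def waha_names(raw):
    """Derive the WAHA session name and direct chat id from a raw phone number."""
    number = sanitize_phone(raw)
    if not number:
        raise ValueError("phone number contains no digits")
    return {
        "phone": number,
        "session": f"user-{number}",
        "chat_id": f"{number}@c.us",
    }
```

For example, `waha_names("+90 (538) 054-63-93")` yields the session name `user-905380546393`.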
---
name: gateway-control-ui
description: Step-by-step guide to log into the OpenClaw Control UI, enter gateway token, approve device pairing, and verify connection
metadata:
  openclaw:
    emoji: 🔧
    requires:
      bins: ["openclaw", "cat"]
---
# Gateway Control UI Login & Pairing
Use when a user needs to access the OpenClaw Gateway Control UI, authenticate, pair a device, and confirm connectivity.
## Prerequisites
- OpenClaw gateway running (`openclaw status`)
- Control UI URL configured in `gateway.controlUi.allowedOrigins`
- Gateway token available in config (`/data/.openclaw/openclaw.json`)
- CLI access to approve device pairing (`openclaw devices approve`)
## Steps
### 1. Open Control UI
- URL: use your configured OpenClaw service URL (from `gateway.controlUi.allowedOrigins`)
- HTTP Basic Auth:
  - Username: use your `SERVICE_USER_OPENCLAW` value
  - Password: use your `SERVICE_PASSWORD_OPENCLAW` value
- Optional embedded form:
  - `https://<user>:<pass>@<your-openclaw-domain>/`
### 2. Get Gateway Token
```bash
cat /data/.openclaw/openclaw.json | grep -A 2 '"token"'
```
Token is under:
```json
"gateway": { "auth": { "token": "YOUR_TOKEN_HERE" } }
```
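As an alternative to `grep`, the token can be read by key path. This is a minimal sketch that assumes the config file is strict JSON with the `gateway.auth.token` layout shown above; both function names are hypothetical.

```python
import json

CONFIG_PATH = "/data/.openclaw/openclaw.json"  # path used in this guide


def extract_gateway_token(config):
    """Return gateway.auth.token from an already-parsed config dict."""
    try:
        return config["gateway"]["auth"]["token"]
    except (KeyError, TypeError):
        raise KeyError("gateway.auth.token not found in config") from None


def read_gateway_token(path=CONFIG_PATH):
    """Load the config file and return the gateway token."""
    with open(path, encoding="utf-8") as fh:
        return extract_gateway_token(json.load(fh))
```

Reading by key avoids matching an unrelated `"token"` field elsewhere in the file.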
### 3. Enter Token in UI
- On Overview page, click **Gateway Token** field
- Paste token
- Click **Connect**
### 4. Approve Pairing (CLI)
```bash
openclaw devices list
openclaw devices approve <requestId>
```
### 5. Verify Success
- In Control UI: status shows green “OK” and dashboard loads
- CLI:
```bash
openclaw status --deep
```
Check that `Gateway` reports `reachable` and that the channels show `OK`.
## Troubleshooting
- Page stays on the form? The WebSocket URL field may need to be filled in (it is usually left blank for remote access)
- Pairing fails? Run `openclaw devices list` again to see pending requests
- Token invalid? Check `/data/.openclaw/openclaw.json` for correct value
---
name: cloudflare-whisper-worker
description: Transcribe audio using a deployed Cloudflare Worker Whisper endpoint. Use when converting voice/audio files (wav, mp3, m4a, ogg, webm) to text through the custom /transcribe API, including bearer-token auth, plain-text extraction, and quick CLI transcription workflows.
---
# Cloudflare Whisper Worker
Use this skill to transcribe audio through the deployed Whisper Worker API.
## Endpoint
- Base URL: `https://lotfi-whisper-worker.medtouradmin.workers.dev`
- Route: `POST /transcribe`
- Auth: `Authorization: Bearer <API_TOKEN>`
- Body: raw audio bytes (`--data-binary @file`)
## Required environment variable
Set token once per shell:
```bash
export WHISPER_WORKER_TOKEN="<your_token>"
```
## Transcribe a file (JSON response)
```bash
curl -sS -X POST "https://lotfi-whisper-worker.medtouradmin.workers.dev/transcribe" \
  -H "content-type: audio/wav" \
  -H "authorization: Bearer $WHISPER_WORKER_TOKEN" \
  --data-binary "@audio.wav"
```
## Transcribe and return only text
```bash
curl -sS -X POST "https://lotfi-whisper-worker.medtouradmin.workers.dev/transcribe" \
  -H "content-type: audio/wav" \
  -H "authorization: Bearer $WHISPER_WORKER_TOKEN" \
  --data-binary "@audio.wav" \
  | jq -r '.result.text // .text // .result.response // empty'
```
## Content-Type guide
- WAV: `audio/wav`
- MP3: `audio/mpeg`
- M4A: `audio/mp4`
- OGG/OPUS: `audio/ogg`
- WEBM: `audio/webm`
## Common errors
- `401 Unauthorized`: missing/invalid bearer token
- `400 Empty audio body`: file path wrong or empty file
- `400 Send raw audio...`: invalid content-type header
- `500`: worker/runtime/model error; retry and inspect full JSON
FILE:scripts/transcribe.sh
#!/usr/bin/env bash
set -euo pipefail
if [[ $# -lt 1 ]]; then
  echo "Usage: $0 <audio-file> [url]"
  exit 1
fi
FILE="$1"
# Optional second argument overrides the default endpoint.
URL="${2:-https://lotfi-whisper-worker.medtouradmin.workers.dev/transcribe}"
if [[ ! -f "$FILE" ]]; then
  echo "Error: file not found: $FILE"
  exit 1
fi
if [[ -z "${WHISPER_WORKER_TOKEN:-}" ]]; then
  echo "Error: WHISPER_WORKER_TOKEN is not set"
  echo "Set it with: export WHISPER_WORKER_TOKEN=\"<token>\""
  exit 1
fi
# Pick the content type from the lowercased file extension.
ext="${FILE##*.}"
ext="$(echo "$ext" | tr '[:upper:]' '[:lower:]')"
case "$ext" in
  wav) ctype="audio/wav" ;;
  mp3) ctype="audio/mpeg" ;;
  m4a) ctype="audio/mp4" ;;
  ogg|opus) ctype="audio/ogg" ;;
  webm) ctype="audio/webm" ;;
  *) ctype="application/octet-stream" ;;
esac
curl -sS -X POST "$URL" \
  -H "content-type: $ctype" \
  -H "authorization: Bearer $WHISPER_WORKER_TOKEN" \
  --data-binary "@$FILE" \
  | jq -r '.result.text // .text // .result.response // empty'
---
name: whisper-tailnet-api
description: Consume the shared Whisper speech-to-text API over Tailnet at http://100.92.116.99:8765 using OpenAI-compatible audio transcription endpoint (/v1/audio/transcriptions). Use when an agent needs remote transcription checks, request examples, language hints, timing tests, or troubleshooting response/output.
---
# Whisper STT API over Tailnet (OpenAI-compatible)
Use this guide to call the shared Whisper server.
## Endpoint
- **Base URL:** `http://100.92.116.99:8765`
- **Health:** `GET /health`
- **Transcribe:** `POST /v1/audio/transcriptions` (raw binary body)
## Quick health check
```bash
curl -sS http://100.92.116.99:8765/health
```
## Transcribe audio (recommended)
```bash
curl -sS -X POST \
  --data-binary @/path/to/audio.wav \
  "http://100.92.116.99:8765/v1/audio/transcriptions?ext=.wav"
```
## Time the request
```bash
time curl -sS -X POST \
  --data-binary @/path/to/audio.wav \
  "http://100.92.116.99:8765/v1/audio/transcriptions?ext=.wav"
```
## Notes
- Prefer this OpenAI-compatible route over `/transcribe` on this host.
- Pass file type via `ext` query (example: `.wav`, `.mp3`, `.m4a`).
- Use `language` query when known to improve accuracy.
## Expected response shape
```json
{
  "text": "transcribed text...",
  "model": "turbo"
}
```
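The same call can be sketched in Python, deriving `ext` from the file name and adding `language` when it is known. The helper names are hypothetical, and the sketch assumes the response carries a `text` field as in the expected shape above.

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

BASE = "http://100.92.116.99:8765"  # shared Whisper server from this guide


def transcription_url(audio_path, language=None, base=BASE):
    """Build the transcription URL with ext (and optional language) query parameters."""
    query = {"ext": os.path.splitext(audio_path)[1].lower()}
    if language:
        query["language"] = language
    return f"{base}/v1/audio/transcriptions?{urlencode(query)}"


def transcribe(audio_path, language=None, timeout=120):
    """POST the raw audio bytes and return the transcribed text."""
    with open(audio_path, "rb") as fh:
        data = fh.read()
    req = urllib.request.Request(
        transcription_url(audio_path, language), data=data, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["text"]
```

`transcribe("/path/to/audio.wav", language="en")` needs tailnet access to the server; `transcription_url` can be checked offline.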
---
name: local-openedai-tts
description: Configure an OpenClaw instance to use a local OpenAI-compatible TTS backend (for example openedai-speech) with cloned voices. Use when users ask to wire local TTS, set OpenClaw to use local speech synthesis, verify voice/model mapping, generate test clips, troubleshoot wrong voice/model selection, or expose the local TTS endpoint to LAN/Tailscale.
---
# Local OpenAI-Compatible TTS for OpenClaw
Configure OpenClaw to send TTS requests to a local OpenAI-compatible endpoint, then verify end-to-end delivery.
## Quick workflow
1. Set OpenAI base URL to local endpoint.
2. Configure OpenClaw messages TTS provider/model/voice.
3. Verify TTS config with `openclaw config get`.
4. Generate a direct API sample clip to confirm voice mapping.
5. Send sample via channel plugin (Telegram/WhatsApp/etc.) if requested.
6. If remote access is requested, expose the TTS service port (not necessarily the OpenClaw gateway).
## 1) Configure OpenClaw to use local TTS backend
Use CLI config commands only.
```bash
openclaw config set env.vars.OPENAI_BASE_URL http://127.0.0.1:19000/v1
openclaw config set messages.tts.provider openai
openclaw config set messages.tts.openai.model tts-1-hd
openclaw config set messages.tts.openai.voice me
```
Verify:
```bash
openclaw config get env.vars.OPENAI_BASE_URL
openclaw config get messages.tts
```
## 2) Verify cloned voice exists on backend
If using openedai-speech + XTTS voice mapping, cloned voices are commonly available only on `tts-1-hd`.
Check voice map inside container:
```bash
sudo docker exec openedai-speech sh -lc 'sed -n "1,220p" /app/config/voice_to_speaker.yaml'
```
If `voice: me` fails with `KeyError`, check whether:
- wrong model is used (`tts-1` instead of `tts-1-hd`), or
- voice key missing from `voice_to_speaker.yaml`.
## 3) Generate a deterministic test clip (direct API)
Use direct POST to validate backend behavior independent of chat surface rendering.
```bash
curl -sS -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "tts-1-hd",
    "voice": "me",
    "input": "Quick cloned voice check.",
    "speed": 1.25,
    "response_format": "mp3"
  }' \
  --output /tmp/clone-test.mp3
file /tmp/clone-test.mp3
```
Expected: MP3 audio file (not JSON error text).
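The "audio, not JSON error text" check can also be scripted. This is a heuristic sketch with hypothetical function names: it looks only at the first bytes of the file (an ID3 tag or an MPEG frame sync for MP3, a leading brace or bracket for JSON) and does not validate the whole clip.

```python
def looks_like_mp3(data):
    """Heuristic: True if the bytes start with an ID3 tag or an MPEG frame sync."""
    if data[:3] == b"ID3":
        return True
    # An MPEG audio frame starts with 11 set bits: 0xFF, then the top 3 bits of the next byte.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def classify_clip(head):
    """Classify the first bytes of a response as 'audio', 'json-error', or 'unknown'."""
    if looks_like_mp3(head):
        return "audio"
    if head.lstrip()[:1] in (b"{", b"["):
        return "json-error"
    return "unknown"


def check_clip(path="/tmp/clone-test.mp3"):
    """Read the start of a generated clip and classify it."""
    with open(path, "rb") as fh:
        return classify_clip(fh.read(512))
```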
## 4) Important limitation: speed pinning in OpenClaw config
`messages.tts.openai.speed` may be rejected by current OpenClaw schema. If so:
- keep model/voice in OpenClaw config,
- set speed per request when generating clips directly,
- or enforce speed with a local proxy layer in front of backend.
Do not claim speed is globally pinned unless schema accepts it.
## 5) Expose service correctly (LAN/Tailscale)
Distinguish between:
- **OpenClaw gateway exposure** (`gateway.bind`, `gateway.tailscale.*`), and
- **TTS backend exposure** (container/service port such as `19000` or `8880`).
If user asks to expose local TTS only, do not change gateway bind/mode unless explicitly requested.
For TTS backend reachability:
1. Confirm listener/bind:
```bash
ss -ltnp | grep ':19000\|:8880'
```
2. If bound to `127.0.0.1`, rebind service/container to `0.0.0.0` or tailnet interface.
3. Restrict access via firewall/Tailscale ACLs.
## 6) Channel delivery troubleshooting
If webchat does not play voice attachments:
- send as regular file attachment to supported channel (e.g., Telegram),
- verify target id explicitly,
- confirm local file still exists before sending.
If file missing, regenerate clip and resend.
## Command safety
- Use `openclaw config set/get` (never edit `openclaw.json` directly).
- Avoid unrelated gateway changes when task is strictly TTS service exposure.
- For external sends, use channel tools and explicit target ids.