@clawhub-shiftshen-92a6829a2b
Use a text model such as gpt-5.4 with the image_generation tool over an OpenAI-compatible /v1/responses endpoint, matching the CPA blog example. Do not call...
---
name: cpa-gpt-image-2
description: Use a text model such as gpt-5.4 with the image_generation tool over an OpenAI-compatible /v1/responses endpoint, matching the CPA blog example. Do not call gpt-image-2 as a direct model on gateways that do not expose it.
---
# cpa-gpt-image-2
Use this skill when image generation should go through the **CPA blog pattern**: a normal text model invokes the `image_generation` tool over a compatible `/v1/responses` endpoint.
## What this skill does
- sends a request to an OpenAI-compatible `/v1/responses` endpoint
- matches the CPA blog example request shape closely
- uses the `image_generation` tool
- defaults to model `gpt-5.4`
- prefers **non-streaming** mode by default for simpler and more stable parsing
- can switch to streaming mode when needed
- automatically retries short image rate-limit responses
- automatically retries transient `tools: []` text fallbacks from the gateway
- keeps credentials in environment variables, never in the skill files
## Important rule
Do **not** treat `gpt-image-2` as the direct model on gateways that do not expose that model.
Correct pattern:
- use a normal model such as `gpt-5.4`
- pass `tools: [{"type": "image_generation", "output_format": "png"}]`
- let the gateway/tool layer decide whether image generation is available
Wrong pattern for this gateway:
- directly calling model `gpt-image-2` when the provider does not publish that model
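The correct pattern can be sketched as a small request-body builder. This is a minimal illustration, not the bundled script: `build_payload` is a hypothetical helper name, and the guard that rejects `gpt-image-2` is an assumption added here for gateways that do not publish that model.

```python
import json


def build_payload(prompt, model="gpt-5.4", output_format="png", stream=False):
    """Build a /v1/responses body that lets a text model call image_generation."""
    if model == "gpt-image-2":
        # Assumption for this sketch: the gateway does not expose gpt-image-2,
        # so a direct call is refused before any request is sent.
        raise ValueError("use a text model plus the image_generation tool")
    return {
        "model": model,
        "input": prompt,
        "tools": [{"type": "image_generation", "output_format": output_format}],
        "instructions": "you are a helpful assistant",
        "tool_choice": "auto",
        "stream": stream,
        "store": False,
    }


print(json.dumps(build_payload("a cute squirrel"), indent=2))
```

Keeping the model name and the tool entry separate is what lets the gateway decide whether image generation is available.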
## Default environment resolution
The script resolves each setting from the first source that is set, in this order.
Base URL:
- `IMAGE_GEN_BASE_URL`
- `OTCBOT_BASE_URL`
- `CPA_BASE_URL`
- `OPENAI_BASE_URL`
- fallback to OpenClaw `models.json` otcbot provider baseUrl
API key:
- `IMAGE_GEN_KEY`
- `OTCBOT_API_KEY`
- `CPA_API_KEY`
- `OPENAI_API_KEY`
- fallback to OpenClaw `models.json` otcbot provider apiKey
Model default:
- `IMAGE_GEN_MODEL`
- `OTCBOT_IMAGE_MODEL`
- `CPA_MODEL`
- fallback to current OpenClaw image/default model
- final fallback: `gpt-5.4`
Optional:
- `IMAGE_GEN_OUTPUT_FORMAT` — default `png`
- `CPA_SESSION_ID` — session id header value, default `test-session`
- `CPA_USER_AGENT` — custom user-agent header
- `CPA_VERSION` — request header `version`, default `0.122.0`
- `CPA_ORIGINATOR` — request header `originator`, default `codex_cli_rs`
The script calls:
- `${BASE_URL%/}/v1/responses` (or `${BASE_URL%/}/responses` when the base URL already ends in `/v1`)
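The resolution order above amounts to "first non-empty variable wins, then normalize the URL". A minimal sketch follows, assuming the variable lists shown in this section; `first_set` and `responses_url` are illustrative names, and the `models.json` fallback is reduced to a plain `fallback` argument.

```python
import os

# Precedence lists copied from the section above.
BASE_URL_VARS = ("IMAGE_GEN_BASE_URL", "OTCBOT_BASE_URL", "CPA_BASE_URL", "OPENAI_BASE_URL")
API_KEY_VARS = ("IMAGE_GEN_KEY", "OTCBOT_API_KEY", "CPA_API_KEY", "OPENAI_API_KEY")


def first_set(names, env=None, fallback=""):
    """Return the first non-empty variable in `names`, else `fallback`."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value:
            return value
    return fallback


def responses_url(base_url):
    """Strip trailing slashes, then add /v1 only when the base URL lacks it."""
    base = base_url.rstrip("/")
    return f"{base}/responses" if base.endswith("/v1") else f"{base}/v1/responses"


# Hypothetical environment used only for this demonstration.
demo_env = {"CPA_BASE_URL": "http://gateway.example/v1/", "OPENAI_BASE_URL": "http://other.example"}
print(responses_url(first_set(BASE_URL_VARS, env=demo_env)))
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.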
## Default execution path
Use the bundled script:
```bash
python3 skills/cpa-gpt-image-2/scripts/generate_image.py \
--prompt "画一只可爱的松鼠" \
--output /tmp/squirrel.png \
--model gpt-5.4
```
Recommended env contract:
```bash
export IMAGE_GEN_BASE_URL='http://192.168.10.8:8317/v1'
export IMAGE_GEN_KEY='sk-xxxx'
export IMAGE_GEN_MODEL='gpt-5.4'
```
Override model when needed:
```bash
python3 skills/cpa-gpt-image-2/scripts/generate_image.py \
--prompt "a cinematic fox detective in Bangkok neon rain" \
--output /tmp/fox.png \
--model gpt-5.4 \
--format png
```
## Expected behavior
The script:
1. reads credentials from env or OpenClaw otcbot defaults
2. POSTs to `/v1/responses`
3. sends codex-style headers: `user-agent`, `version`, `originator`, `session_id`
4. requests the `image_generation` tool, defaulting to `stream: false`
5. parses normal JSON, or SSE `data:` payloads when streaming is enabled
6. auto-retries short `rate_limit_exceeded` image responses when the server provides a retry delay
7. auto-retries transient gateway fallbacks where the tool list comes back empty and the response degrades to text
8. extracts the first base64 image from the response
9. writes the file to the requested output path
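Steps 6 and 7 use two different wait rules. The sketch below restates them in isolation, using the constants from the bundled script (1000 ms default, 200 ms floor, 0.8 s plus 0.4 s per attempt); the `kind` labels and the function name are invented for this example.

```python
import re

RETRY_HINT = re.compile(r"Please try again in (\d+)ms")


def retry_delay_seconds(kind, attempt, message=""):
    """Seconds to wait before the next attempt.

    kind: "rate_limit" or "tools_empty" (labels invented for this sketch).
    attempt: zero-based index of the attempt that just failed.
    """
    if kind == "rate_limit":
        match = RETRY_HINT.search(message)
        hinted_ms = int(match.group(1)) if match else 0
        delay_ms = hinted_ms or 1000  # fall back to 1 s when no hint is present
        return max(delay_ms, 200) / 1000.0  # never wait less than 200 ms
    if kind == "tools_empty":
        return 0.8 + attempt * 0.4  # linear backoff for the empty-tools fallback
    raise ValueError(f"unknown retry kind: {kind}")


for attempt in range(3):
    print(attempt, retry_delay_seconds("tools_empty", attempt))
```

The rate-limit branch trusts the server's hint, while the empty-tools branch has no hint to read and therefore backs off on its own schedule.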
## Fallback curl patterns
Preferred non-streaming version:
```bash
curl --location "$IMAGE_GEN_BASE_URL/responses" \
--header "Authorization: Bearer $IMAGE_GEN_KEY" \
--header "Content-Type: application/json" \
--data '{
"model": "gpt-5.4",
"input": "画一只可爱的松鼠",
"tools": [
{
"type": "image_generation",
"output_format": "png"
}
],
"instructions": "you are a helpful assistant",
"tool_choice": "auto",
"stream": false,
"store": false
}'
```
Streaming version when needed:
```bash
curl --location "$IMAGE_GEN_BASE_URL/responses" \
--header "Authorization: Bearer $IMAGE_GEN_KEY" \
--header "user-agent: ${CPA_USER_AGENT:-codex-tui/0.122.0 (Manjaro 26.1.0-pre; x86_64) vscode/3.0.12 (codex-tui; 0.122.0)}" \
--header "version: ${CPA_VERSION:-0.122.0}" \
--header "originator: ${CPA_ORIGINATOR:-codex_cli_rs}" \
--header "session_id: ${CPA_SESSION_ID:-test-session}" \
--header 'accept: text/event-stream' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-5.4",
"input": "画一只可爱的松鼠",
"tools": [
{
"type": "image_generation",
"output_format": "png"
}
],
"instructions": "you are a helpful assistant",
"tool_choice": "auto",
"stream": true,
"store": false
}'
```
## Notes
- Prefer the bundled script for repeatability.
- Do not hardcode live keys, base URLs, or session ids into workspace docs.
- This skill intentionally mirrors the CPA blog example request shape as closely as practical.
- On this gateway, prefer `gpt-5.4` plus `image_generation` tool instead of direct `gpt-image-2` model calls.
- For the known local otcbot endpoint, prefer setting `OTCBOT_BASE_URL` and `OTCBOT_API_KEY` explicitly when testing.
- If the endpoint returns provider-specific SSE events, extend the SSE parser instead of changing the whole request shape.
- If the user asks to send the generated file back in the current chat, use the normal file-delivery flow after generation.
FILE:_meta.json
{
"slug": "cpa-gpt-image-2",
"version": "0.1.0",
"localOnly": true,
"createdAt": 1777258500000
}
FILE:scripts/generate_image.py
#!/usr/bin/env python3
import argparse
import base64
import json
import os
import re
import sys
import time
import urllib.error
import urllib.request
from pathlib import Path
def fail(msg: str, code: int = 1):
    """Print an error to stderr and exit with the given status code."""
    print(msg, file=sys.stderr)
    sys.exit(code)
def find_image_b64(obj):
    """Depth-first search for the first base64 image payload in a parsed response."""
    if isinstance(obj, dict):
        if isinstance(obj.get("b64_json"), str):
            return obj["b64_json"]
        if obj.get("type") == "image_generation_call":
            result = obj.get("result")
            if isinstance(result, str):
                return result
        for value in obj.values():
            hit = find_image_b64(value)
            if hit:
                return hit
    elif isinstance(obj, list):
        for item in obj:
            hit = find_image_b64(item)
            if hit:
                return hit
    return None
def load_json_if_exists(path: Path):
    """Return the parsed JSON file, or None if it is missing or unreadable."""
    try:
        if path.exists():
            return json.loads(path.read_text())
    except Exception:
        return None
    return None
def load_openclaw_provider_defaults():
    """Read the otcbot provider's baseUrl and apiKey from OpenClaw's models.json."""
    status_path = Path.home() / "openclaw/agents/main/agent/models.json"
    data = load_json_if_exists(status_path) or {}
    providers = data.get("providers") or {}
    otcbot = providers.get("otcbot") or {}
    return {
        "base_url": (otcbot.get("baseUrl") or "").rstrip("/"),
        "api_key": otcbot.get("apiKey") or "",
    }
def infer_default_model():
    # Documented precedence: IMAGE_GEN_MODEL, OTCBOT_IMAGE_MODEL, CPA_MODEL,
    # then the current OpenClaw image/default model, then gpt-5.4.
    env_model = os.getenv("IMAGE_GEN_MODEL") or os.getenv("OTCBOT_IMAGE_MODEL") or os.getenv("CPA_MODEL")
    if env_model:
        return env_model
    status_json = os.popen("openclaw models status --json 2>/dev/null").read().strip()
    if status_json:
        try:
            status = json.loads(status_json)
            image_model = status.get("imageModel")
            if image_model and "/" in image_model:
                # Drop the "provider/" prefix and keep the bare model name.
                return image_model.split("/", 1)[1]
            default_model = status.get("resolvedDefault") or status.get("defaultModel")
            if default_model and "/" in default_model:
                return default_model.split("/", 1)[1]
        except Exception:
            pass
    return "gpt-5.4"
def parse_sse_events(raw: str):
    """Collect SSE `data:` payloads (one event per blank-line-separated block) and parse each as JSON."""
    payloads = []
    buf = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data = line[5:].strip()
            if data == "[DONE]":
                continue
            buf.append(data)
        elif not line.strip() and buf:
            payloads.append("\n".join(buf))
            buf = []
    if buf:
        payloads.append("\n".join(buf))
    parsed = []
    for item in payloads:
        try:
            parsed.append(json.loads(item))
        except Exception:
            continue
    return parsed
def extract_retry_delay_ms(raw: str, parsed):
    """Return the server-suggested retry delay in milliseconds, or None if absent."""
    text = raw or ""
    m = re.search(r"Please try again in (\d+)ms", text)
    if m:
        return int(m.group(1))
    if isinstance(parsed, list):
        for item in parsed:
            err = item.get("error") if isinstance(item, dict) else None
            if isinstance(err, dict):
                msg = err.get("message", "")
                m = re.search(r"Please try again in (\d+)ms", msg)
                if m:
                    return int(m.group(1))
    elif isinstance(parsed, dict):
        err = parsed.get("error")
        if isinstance(err, dict):
            msg = err.get("message", "")
            m = re.search(r"Please try again in (\d+)ms", msg)
            if m:
                return int(m.group(1))
    return None
def is_rate_limit_error(raw: str, parsed):
    """Detect a rate_limit_exceeded error in the raw body or in parsed events."""
    text = raw or ""
    if "rate_limit_exceeded" in text:
        return True
    if isinstance(parsed, list):
        for item in parsed:
            if isinstance(item, dict) and item.get("type") == "error":
                err = item.get("error") or {}
                if err.get("code") == "rate_limit_exceeded":
                    return True
    elif isinstance(parsed, dict):
        err = parsed.get("error") or {}
        if err.get("code") == "rate_limit_exceeded":
            return True
    return False
def has_tools_empty_fallback(raw: str, parsed):
    """Detect the gateway fallback where tools comes back empty and the reply degrades to text."""
    text = raw or ""
    if '"tools":[]' in text and 'response.output_text.delta' in text:
        return True
    if isinstance(parsed, list):
        saw_created = False
        saw_text = False
        for item in parsed:
            if not isinstance(item, dict):
                continue
            t = item.get('type')
            if t in ('response.created', 'response.in_progress'):
                response = item.get('response') or {}
                if response.get('tools') == []:
                    saw_created = True
            if t == 'response.output_text.delta':
                saw_text = True
        return saw_created and saw_text
    return False
parser = argparse.ArgumentParser(description="Generate an image through an otcbot/OpenAI-compatible Responses API")
parser.add_argument("--prompt", required=True, help="Image prompt")
parser.add_argument("--output", required=True, help="Output image path")
parser.add_argument("--model", default=infer_default_model())
parser.add_argument("--format", default=os.getenv("IMAGE_GEN_OUTPUT_FORMAT", "png"), choices=["png", "jpeg", "webp"])
parser.add_argument("--instructions", default="you are a helpful assistant")
parser.add_argument("--session-id", default=os.getenv("CPA_SESSION_ID", "test-session"))
parser.add_argument("--stream", dest="stream", action="store_true", default=False)
parser.add_argument("--no-stream", dest="stream", action="store_false")
parser.add_argument("--retries", type=int, default=4)
args = parser.parse_args()
provider_defaults = load_openclaw_provider_defaults()
base_url = (os.getenv("IMAGE_GEN_BASE_URL") or os.getenv("OTCBOT_BASE_URL") or os.getenv("CPA_BASE_URL") or os.getenv("OPENAI_BASE_URL") or provider_defaults["base_url"] or "").rstrip("/")
api_key = os.getenv("IMAGE_GEN_KEY") or os.getenv("OTCBOT_API_KEY") or os.getenv("CPA_API_KEY") or os.getenv("OPENAI_API_KEY") or provider_defaults["api_key"] or ""
user_agent = os.getenv("CPA_USER_AGENT", "codex-tui/0.122.0 (Manjaro 26.1.0-pre; x86_64) vscode/3.0.12 (codex-tui; 0.122.0)")
version = os.getenv("CPA_VERSION", "0.122.0")
originator = os.getenv("CPA_ORIGINATOR", "codex_cli_rs")
if not base_url:
    fail("Missing IMAGE_GEN_BASE_URL / OTCBOT_BASE_URL / CPA_BASE_URL / OPENAI_BASE_URL and no otcbot baseUrl found in models.json")
if not api_key:
    fail("Missing IMAGE_GEN_KEY / OTCBOT_API_KEY / CPA_API_KEY / OPENAI_API_KEY and no otcbot apiKey found in models.json")
url = f"{base_url}/responses" if base_url.endswith("/v1") else f"{base_url}/v1/responses"
payload = {
    "model": args.model,
    "input": args.prompt,
    "tools": [
        {
            "type": "image_generation",
            "output_format": args.format,
        }
    ],
    "instructions": args.instructions,
    "tool_choice": "auto",
    "stream": args.stream,
    "store": False,
}
last_raw = ""
last_parsed = None
for attempt in range(args.retries + 1):
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "user-agent": user_agent,
            "version": version,
            "originator": originator,
            "session_id": args.session_id,
            "accept": "text/event-stream" if args.stream else "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            raw = resp.read().decode("utf-8", errors="replace")
            status = resp.status
    except urllib.error.HTTPError as e:
        body = e.read().decode("utf-8", errors="replace")
        fail(f"HTTP {e.code}\n{body}")
    except Exception as e:
        fail(f"Request failed: {e}")
    if status < 200 or status >= 300:
        fail(f"Unexpected HTTP status: {status}\n{raw}")
    # Streaming responses are parsed as SSE events; anything else as plain JSON.
    parsed = None
    if args.stream:
        events = parse_sse_events(raw)
        if events:
            parsed = events
    if parsed is None:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            fail(f"Response was not valid JSON or SSE JSON:\n{raw[:4000]}")
    b64 = find_image_b64(parsed)
    if b64:
        out = Path(args.output)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(base64.b64decode(b64))
        print(str(out))
        sys.exit(0)
    last_raw = raw
    last_parsed = parsed
    # No image yet: retry on a rate limit or on the empty-tools text fallback.
    if attempt < args.retries and is_rate_limit_error(raw, parsed):
        delay_ms = extract_retry_delay_ms(raw, parsed) or 1000
        time.sleep(max(delay_ms, 200) / 1000.0)
        continue
    if attempt < args.retries and has_tools_empty_fallback(raw, parsed):
        time.sleep(0.8 + attempt * 0.4)
        continue
    break
fail("No base64 image found in response. Inspect raw payload.\n" + last_raw[:4000])
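Generates AI short-video scripts, shot breakdowns, visual prompts, subtitle copy, and prompts that can go straight into video-generation testing. Use it to turn a topic quickly into an executable video plan. It is especially suited to short-video topic selection, storyboard scripts, AI video prompts, Yijia/Sora/Grok text-to-video testing, image-to-video testing, and structured planning of viral short videos.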
---
name: ai-video-script
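description: Generates AI short-video scripts, shot breakdowns, visual prompts, subtitle copy, and prompts that can go straight into video-generation testing. Use it to turn a topic quickly into an executable video plan. It is especially suited to short-video topic selection, storyboard scripts, AI video prompts, Yijia/Sora/Grok text-to-video testing, image-to-video testing, and structured planning of viral short videos.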
---
# AI Video Script Generator
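Turn a vague topic into an **executable video script package**:
- Title and positioning
- Suggested duration
- Storyboard table
- Per-shot prompts
- Voice-over and subtitle copy
- A master video prompt ready for Yijia testing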
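## Output goals
The default output has 5 parts:
1. **Video overview**
   - Title
   - Target audience
   - Style
   - Suggested duration
2. **Storyboard table**
   - Time range
   - Visuals
   - Camera movement
   - Mood / pacing
3. **Per-shot prompts**
   - Main subject
   - Scene and environment
   - Lighting and camera language
   - Style keywords
4. **Copy layer**
   - Opening hook
   - Voice-over
   - Subtitles
   - Closing CTA
5. **Generation test prompt**
   - One consolidated prompt suitable for feeding directly into Yijia/Sora/Grok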
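## Recommended workflow
### Scenario A: script first
When the user gives only a topic, produce the output in this order:
1. Define the video's positioning
2. Give 3-5 shots
3. Give a prompt for each shot
4. Give subtitles and voice-over
5. Finish with one master prompt that **can be used to generate the video directly**
### Scenario B: serving AI video generation directly
If the goal is to test text-to-video or image-to-video directly:
- Prioritize **8-second / 10-second / 15-second** versions
- Keep visual descriptions concrete
- Avoid too many shots
- Emphasize subject, action, scene, camera, lighting, style, and image quality
- End with a **compressed single-paragraph prompt** that can be pasted straight into a video generator
### Scenario C: adapting to short-video platforms
Preferred short-video structure:
- 0-2 s: hook
- 2-6 s: core action / contrast
- 6-8 s or 6-10 s: result / twist / memorable moment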
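The three-beat structure above can be written down as data. The sketch below is only an illustration: `beat_plan` is a hypothetical helper, and it accepts only the 8-second and 10-second totals because those are the only end points the structure names.

```python
def beat_plan(total_seconds):
    """Return (label, start, end) tuples for the hook / core / payoff beats."""
    if total_seconds not in (8, 10):
        # The structure above only defines a payoff ending at 8 s or 10 s.
        raise ValueError("beat structure is defined for 8 s and 10 s clips only")
    return [
        ("hook", 0, 2),
        ("core action / contrast", 2, 6),
        ("result / twist / memorable moment", 6, total_seconds),
    ]


for label, start, end in beat_plan(8):
    print(f"{start}-{end} s: {label}")
```

A 15-second version would need its own beat boundaries, which this document does not specify.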
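## Writing for Yijia
When the target is Yijia testing, follow these rules first:
- Keep only one core subject per video
- Describe the action clearly; do not run several events at once
- Keep the scene as simple as possible
- Choose only 1-2 camera movements
- Use no more than 6 style keywords
- Avoid long abstract explanations
Recommended master prompt structure:
```text
subject + action + scene + camera movement + lighting/mood + style + image quality
```
Example:
```text
An orange cat sits by a window after the rain and slowly turns its head toward the camera, a steaming cup of coffee on the table, slow push-in, warm indoor evening light, healing, delicate, cinematic, vertical, HD
```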
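The slot structure and the two countable limits (1-2 camera movements, at most 6 style keywords) can be checked mechanically before a prompt is sent. This is a minimal sketch under stated assumptions: `compose_prompt`, its parameter names, the comma separator, and the default `quality` tuple are all choices made for this example, not part of Yijia.

```python
def compose_prompt(subject, action, scene, camera_moves, lighting, styles,
                   quality=("vertical", "HD")):
    """Join the seven slots into one comma-separated prompt, enforcing the limits."""
    if not 1 <= len(camera_moves) <= 2:
        raise ValueError("pick 1-2 camera movements")
    if len(styles) > 6:
        raise ValueError("use at most 6 style keywords")
    parts = [
        f"{subject} {action}",      # subject + action
        scene,                      # scene
        ", ".join(camera_moves),    # camera movement
        lighting,                   # lighting / mood
        ", ".join(styles),          # style
        ", ".join(quality),         # image quality
    ]
    return ", ".join(part.strip() for part in parts if part.strip())


print(compose_prompt(
    "An orange cat",
    "slowly turns its head toward the camera",
    "by a window after the rain",
    ["slow push-in"],
    "warm indoor evening light",
    ["healing", "delicate", "cinematic"],
))
```

Rules such as "one core subject" and "single scene" still need human judgment; only the countable limits are enforced here.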
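## Suggested output template
```markdown
# Title
## Overview
- Duration: 8 s / 10 s / 15 s
- Style:
- Audience:
- Goal:
## Storyboard
### Shot 1
- Time: 0-2 s
- Visuals:
- Camera:
- Prompt:
- Voice-over / subtitles:
### Shot 2
...
## Full voice-over
## Subtitle copy
## Yijia test master prompt
## Optional variants
- Light-hearted version
- Cinematic version
- Viral-hook version
```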
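## Quick templates
### 1) Cute, healing pets
```text
Cute subject, simple action, warm scene, slow camera, soft lighting, healing mood
```
### 2) Comic twist
```text
Normal first half, a contrasting action in the second half, a memorable ending, fast pacing, clear visuals
```
### 3) Stress relief and immersion
```text
Subject performs a repetitive action, close-up, slow pacing, strong detail and texture, pronounced ASMR atmosphere
```
## Reference
To feed a script directly into the creator test workflow, read:
- `references/yijia-test-workflow.md`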
FILE:_meta.json
{
"ownerId": "kn702anzfa5av16gsqhas6bg0n81qbq6",
"slug": "ai-video-script",
"version": "1.0.0",
"publishedAt": 1771853066761
}
FILE:references/yijia-test-workflow.md
# Yijia Test Workflow Reference
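## Purpose
Compress the output of a script skill into a single prompt that can be used directly for Yijia text-to-video testing.
## Recommended test rules
- Test from `temp-workflow` first so that the production workflow is not polluted
- Duration: prefer 8 seconds
- Resolution: prefer vertical (`竖屏`)
- Model: prefer `grok-imagine-1.0-video` or `grok-video`
- Category: start with a test category, for example `测试视频` ("test video")
- Keep the prompt to a single subject, a single scene, a single mood, and a single main action
## Prompt compression
Compress the original script into one line:
```text
subject + core action + scene + camera movement + lighting + style + image quality
```
### Example 1: cute pet
```text
A corgi trots toward the camera across the living-room rug, wiggling its rear, with a red slipper in its mouth, low-angle camera with slight tracking, warm afternoon sun coming in from the window, light-hearted and funny, slice of life, vertical, HD
```
### Example 2: stress relief
```text
Clear glass beads drop slowly into a shallow dish of water and raise fine ripples, macro close-up, slight push-in, soft studio lighting, quiet, healing, immersive, vertical, HD
```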
## Creator test command
```bash
python3 /Users/shift/.openclaw/workspace-video-creator/scripts/workflow_guard.py
python3 /Users/shift/.openclaw/workspace-video-creator/scripts/enforce_project_layout.py
bash /Users/shift/.openclaw/workspace-video-creator/scripts/yijia_generate.sh \
--project "temp-workflow" \
--category "测试视频" \
--language zh \
--target "tiktok/shiftshen" \
--prompt "一只柯基在客厅地毯上摇着屁股小跑向镜头,嘴里叼着一只红色拖鞋,镜头低机位轻微跟拍,午后暖阳从窗边照进来,轻松搞笑,生活感,竖屏,高清" \
--duration "8秒" \
--resolution "竖屏" \
--mode text2video \
--channel "yijia" \
--model "grok-video"
```
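## Expected artifacts
- `projects/temp-workflow/outputs/*.mp4`
- `projects/temp-workflow/outputs/*.txt`
- `projects/temp-workflow/progress/yijia_create_*.json`
## Notes
- If the environment has no Yijia executable or API key configured, the test will stall at the generation stage
- To use the prompt in a production workflow, rewrite the test prompt so that it satisfies that workflow's category/language constraints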
FILE:test.md
---
name: ai-video-script
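description: AI video script generator. Given a topic or keywords from the user, it generates a complete video script with shot descriptions, visual prompts, and voice-over copy. Intended for short-video creators, AI video makers, and content marketers. Trigger words are video script (视频脚本), storyboard (分镜), AI video (AI视频), short-video copy (短视频文案), and video planning (视频策划).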
---
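# Test: AI video script generator
## Test case 1: product introduction
**Input**: Generate a 30-second video script for a smartwatch
**Expected output**: a complete script with storyboard, prompts, and copy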
---
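## Actual test
Smartwatch, 30-second product introduction script:
```markdown
# Smartwatch - Your Health Steward
## Overview
- Duration: 30 s
- Style: tech, stylish, everyday
- Goal: highlight the health-monitoring features and create a desire to buy
## Storyboard
### Scene 1 (0-5 s): Opening
- **Visuals**: close-up of the smartwatch as the screen lights up
- **Camera**: macro close-up, slow push-in
- **Prompt**: Smart watch close-up, screen lighting up, blue glow,
product photography, minimalist background --ar 16:9
- **Voice-over**: Your health deserves better protection.
- **Subtitle**: Smartwatch
### Scene 2 (5-15 s): Feature showcase
- **Visuals**: split screen showing the heart-rate, blood-oxygen, and sleep-monitoring interfaces
- **Camera**: quick cuts
- **Prompt**: Split screen showing heart rate, blood oxygen, sleep monitoring
interfaces, clean UI design, tech aesthetic --ar 16:9
- **Voice-over**: 24-hour heart-rate monitoring, blood-oxygen tracking, sleep analysis. Every data point is a barometer of your health.
- **Subtitle**: Heart-rate monitoring | Blood-oxygen tracking | Sleep analysis
### Scene 3 (15-25 s): Usage scenarios
- **Visuals**: quick cuts between running, office work, and sleeping
- **Camera**: tracking shot + freeze frame
- **Prompt**: Multiple scenes: person running outdoors, working at office,
sleeping peacefully, wearing smart watch, lifestyle photography --ar 16:9
- **Voice-over**: While you exercise, while you work, while you rest. It is always with you, recording every step of your progress.
- **Subtitle**: Around-the-clock protection
### Scene 4 (25-30 s): Closing
- **Visuals**: the watch syncs data with a phone and shows a health report
- **Camera**: close-up + pull back to reveal the brand
- **Prompt**: Smart watch syncing with smartphone, health report on screen,
modern tech style, lifestyle setting --ar 16:9
- **Voice-over**: Smartwatch: health management made simpler. Tap the link below to start your health journey.
- **Subtitle**: Smartwatch | [Brand LOGO] | Learn more
```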
## Test results
✅ Script structure is complete
✅ Storyboard is clear
✅ Prompts are usable
✅ Copy reads smoothly
✅ Fits the 30-second duration
---
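## Value assessment
This skill can help:
- Short-video creators produce scripts quickly
- AI video makers get usable prompts
- Marketers produce content in batches
**Monetization potential**:
- Pricing: ¥47/month (about ¥1.5 per use)
- Target users: short-video creators, AI video makers
- 30 sales per month = ¥1,410; my 30% share = ¥423 ✅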
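The figures above can be checked with a few lines of arithmetic. This is a worked check only; the function names are invented here, and the implied number of uses per month is derived from the "about ¥1.5 per use" estimate rather than stated in the source.

```python
def monthly_revenue(price_yuan, units_sold):
    """Gross monthly revenue in yuan."""
    return price_yuan * units_sold


def share_of(revenue_yuan, percent):
    """Portion of the revenue kept at the given percentage share."""
    return revenue_yuan * percent / 100


revenue = monthly_revenue(47, 30)   # 47 * 30
my_cut = share_of(revenue, 30)      # 30% of the gross
implied_uses = 47 / 1.5             # uses per month implied by about 1.5 yuan per use

print(revenue, my_cut, round(implied_uses, 1))
```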
---
_Test passed, ready to publish_