@clawhub-atmosphere16happy-85982e9525
---
name: baby-photo-book
description: Generate professional baby photo books from photos with intelligent layout. Automatically organizes photos by baby's age stages, applies smart layouts to minimize whitespace, and outputs print-ready PDF. Use when user wants to create baby photo albums, generate photo books from baby pictures, or automatically layout photos for printing.
---
# Baby Photo Book Generator
Generate professional baby photo books with intelligent layout optimization.
## Features
- **Automatic Age Grouping**: Organizes photos by baby's growth stages (newborn, infant, toddler)
- **Smart Layout**: Minimizes whitespace while preserving photo aspect ratios
- **Multiple Layout Patterns**: Supports 1-4 photos per page with optimal arrangements
- **Print-Ready Output**: Generates A4 PDF suitable for professional printing
## Usage
### Basic Usage
```bash
python scripts/generate_photo_book.py <photo_folder> --name "Baby Name" --birth YYYY-MM-DD
```
### Parameters
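- `photo_folder`: Path to the folder containing baby photos. Subfolders are searched too, and `.jpg`, `.jpeg`, `.png`, and `.bmp` files are used
- `--name`, `-n`: Baby's name for the cover (default: `Baby`)
- `--birth`, `-b`: Baby's birth date in `YYYY-MM-DD` format. Optional; without it, photos are grouped by calendar month instead of by age stage
- `--output`, `-o`: Output PDF filename (default: `baby_photo_book.pdf`)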
### Example
```bash
python scripts/generate_photo_book.py ~/baby_photos --name "小明" --birth 2023-06-15
```
## Layout Algorithm
The skill uses an intelligent layout engine that:
1. **Analyzes photo aspect ratios** to determine optimal placement
2. **Calculates candidate layouts** for each page (side-by-side vs. stacked for two photos, left/right vs. top/bottom split for three)
3. **Selects the candidate with the least whitespace**
4. **Preserves photo proportions** without cropping
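A minimal, dependency-free sketch of steps 1-4 for the two-photo case, written to mirror the arithmetic in the script's `_layout_2`: each arrangement is scaled to fit the usable area without cropping, scored by the area it leaves empty, and the lower score wins. The helper names (`fit`, `whitespace`, `choose_two`) and the 20 × 28 usable area in the usage lines are illustrative assumptions, not part of the script.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height)


def fit(aspect: float, box_w: float, box_h: float) -> Size:
    """Largest size with width / height == aspect that fits in the box (no cropping)."""
    if aspect > box_w / box_h:
        return box_w, box_w / aspect  # photo is wider than the box: width limits
    return box_h * aspect, box_h  # otherwise height limits


def whitespace(area_w: float, area_h: float, sizes: List[Size]) -> float:
    """Part of the usable area not covered by any photo."""
    return area_w * area_h - sum(w * h for w, h in sizes)


def choose_two(a1: float, a2: float, area_w: float, area_h: float, gap: float) -> str:
    """Compare side-by-side and stacked layouts; return the one with less whitespace."""
    # Side by side: one shared height, shrunk if the combined width would overflow.
    h = min(area_h, (area_w - gap) / (a1 + a2))
    side = whitespace(area_w, area_h, [(a1 * h, h), (a2 * h, h)])
    # Stacked: one shared width, shrunk if the combined height would overflow.
    w = min(area_w, (area_h - gap) / (1 / a1 + 1 / a2))
    stacked = whitespace(area_w, area_h, [(w, w / a1), (w, w / a2)])
    return "side-by-side" if side <= stacked else "stacked"


print(fit(1.5, 20, 28))                   # width-limited: width is 20
print(choose_two(1.5, 1.5, 20, 28, 0.2))  # stacked
print(choose_two(0.5, 0.5, 20, 28, 0.2))  # side-by-side
```

Two landscape photos fill a portrait page better when stacked, while two portrait photos fill it better side by side, which is why both candidates are scored rather than fixing one arrangement.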
### Layout Patterns
| Photos | Layout Strategy |
|--------|-----------------|
| 1 | Full page, maximized to fill available space |
| 2 | Side-by-side or stacked, whichever minimizes whitespace |
| 3 | Left-1-Right-2 (most portrait photo on the left) or Top-1-Bottom-2 (most landscape photo on top), whichever minimizes whitespace |
| 4 | 2×2 adaptive grid with aspect-ratio-aware cell sizing |
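For three photos, the script first decides which photo gets the large slot in each candidate; both candidates are then scored for whitespace as in the two-photo case. A sketch of just the role assignment, using a hypothetical function name and width/height ratios as input:

```python
from typing import List, Tuple


def three_photo_roles(aspects: List[float]) -> Tuple[int, List[int], int, List[int]]:
    """Indices used by the two 3-photo candidates, given width/height ratios.

    Left-1-Right-2: the most portrait photo (smallest ratio) fills the left half.
    Top-1-Bottom-2: the most landscape photo (largest ratio) fills the top half.
    """
    left = min(range(3), key=lambda i: aspects[i])
    top = max(range(3), key=lambda i: aspects[i])
    right = [i for i in range(3) if i != left]  # stacked in the right half
    bottom = [i for i in range(3) if i != top]  # side by side in the bottom half
    return left, right, top, bottom


# A 3:2 landscape, a 3:4 portrait and a square photo
print(three_photo_roles([1.5, 0.75, 1.0]))  # (1, [0, 2], 0, [1, 2])
```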
## Age Stages
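When `--birth` is given, photos are grouped by the baby's age when each photo was taken. The capture date comes from EXIF (falling back to the file's modification time), and age is counted in whole calendar months. Photos outside 0-36 months go into an "Other" chapter: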
- **Newborn** (0-1 month)
- **Early Infant** (1-3 months)
- **Mid Infant** (3-6 months)
- **Late Infant** (6-9 months)
- **Crawling** (9-12 months)
- **Early Toddler** (1-1.5 years)
- **Mid Toddler** (1.5-2 years)
- **Late Toddler** (2-3 years)
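The script counts age in whole calendar months and ignores the day of the month, so a photo taken on 1 July of a baby born on 15 June already counts as one month old. A standalone sketch of that rule; the `STAGES`, `age_in_months`, and `stage_for` names are illustrative (the script keeps its stage table in `baby_stages`):

```python
from datetime import date

# (stage, first month inclusive, end month exclusive), from the list above
STAGES = [
    ("Newborn", 0, 1), ("Early Infant", 1, 3), ("Mid Infant", 3, 6),
    ("Late Infant", 6, 9), ("Crawling", 9, 12), ("Early Toddler", 12, 18),
    ("Mid Toddler", 18, 24), ("Late Toddler", 24, 36),
]


def age_in_months(birth: date, taken: date) -> int:
    """Whole calendar months between the dates; the day of the month is ignored."""
    return (taken.year - birth.year) * 12 + (taken.month - birth.month)


def stage_for(months: int) -> str:
    """Stage whose [start, end) month range contains `months`, else "Other"."""
    for name, start, end in STAGES:
        if start <= months < end:
            return name
    return "Other"  # taken before birth, or at 36 months or later


birth = date(2023, 6, 15)
print(stage_for(age_in_months(birth, date(2024, 1, 2))))  # Late Infant (7 months)
print(stage_for(age_in_months(birth, date(2023, 7, 1))))  # Early Infant (1 month)
```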
## Dependencies
- Python 3.8+
- Pillow (image processing)
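- ReportLab (PDF generation)
- A CJK font for Chinese text such as the baby's name: the script only looks for SimHei, SimSun, or Microsoft YaHei in `C:/Windows/Fonts/` and otherwise falls back to Helvetica, which cannot render Chinese characters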
Install dependencies:
```bash
pip install Pillow reportlab
```
## Output
Generates a PDF with:
- Cover page with baby name
- Chapter pages for each age stage
- Photo pages with up to four photos each, in date order, using the layouts above
- A date caption under each photo, plus the age in months when `--birth` is given
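Because `generate()` emits a cover, then one chapter page per non-empty stage followed by photo pages holding up to four photos, the page count can be estimated before printing. A small sketch with a hypothetical helper name:

```python
import math
from typing import Dict


def page_count(photos_per_chapter: Dict[str, int]) -> int:
    """Cover + (chapter page + one page per 4 photos) for each non-empty chapter."""
    pages = 1  # cover
    for n in photos_per_chapter.values():
        if n > 0:
            pages += 1 + math.ceil(n / 4)
    return pages


print(page_count({"Newborn (0-1 month)": 9, "Early Infant (1-3 months)": 4}))  # 7
```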
FILE:scripts/generate_photo_book.py
#!/usr/bin/env python3
"""
Baby Photo Book Generator
Generate professional baby photo books with intelligent layout optimization.
"""
import os
import sys
from datetime import datetime
from pathlib import Path
from collections import defaultdict
from PIL import Image, ExifTags
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.lib.units import cm
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.lib.colors import HexColor
import argparse


class PhotoLayoutEngine:
    """Intelligent layout engine that minimizes whitespace"""

    def __init__(self, page_width, page_height):
        self.page_w = page_width
        self.page_h = page_height
        self.margin = 0.3 * cm
        self.gap = 0.2 * cm
        # Usable area inside the page margins
        self.avail_w = page_width - 2 * self.margin
        self.avail_h = page_height - 2 * self.margin

    def get_aspect(self, photo_path):
        """Return width / height of an image, or 1.0 if it cannot be read."""
        try:
            with Image.open(photo_path) as img:
                return img.width / img.height
        except Exception:
            return 1.0

    def calculate_layout(self, photos):
        """Calculate optimal layout to minimize whitespace"""
        n = len(photos)
        if n == 0:
            return []
        aspects = [self.get_aspect(p['path']) for p in photos]
        if n == 1:
            return self._layout_1(photos[0], aspects[0])
        elif n == 2:
            return self._layout_2(photos, aspects)
        elif n == 3:
            return self._layout_3(photos, aspects)
        else:
            return self._layout_4(photos, aspects)

    def _layout_1(self, photo, aspect):
        """Single photo - maximize to fill page"""
        if aspect > self.avail_w / self.avail_h:
            # Wider than the usable area: width is the limiting dimension
            w = self.avail_w
            h = w / aspect
        else:
            h = self.avail_h
            w = h * aspect
        x = self.margin + (self.avail_w - w) / 2
        y = self.margin + (self.avail_h - h) / 2
        return [(photo, (x, y, x + w, y + h))]

    def _layout_2(self, photos, aspects):
        """Two photos - side by side or stacked"""
        a1, a2 = aspects[0], aspects[1]
        # Try horizontal arrangement: both photos share one height
        h = self.avail_h
        w1 = h * a1
        w2 = h * a2
        if w1 + w2 + self.gap <= self.avail_w:
            total_w = w1 + w2 + self.gap
            start_x = self.margin + (self.avail_w - total_w) / 2
        else:
            # Too wide: scale both photos down to fit the available width
            scale = (self.avail_w - self.gap) / (w1 + w2)
            w1 *= scale
            w2 *= scale
            h *= scale
            start_x = self.margin
        blank1 = self.avail_w * self.avail_h - (w1 * h + w2 * h)

        # Try vertical arrangement: both photos share one width
        w = self.avail_w
        h1 = w / a1
        h2 = w / a2
        if h1 + h2 + self.gap <= self.avail_h:
            total_h = h1 + h2 + self.gap
            start_y = self.margin + (self.avail_h - total_h) / 2
        else:
            # Too tall: scale both photos down to fit the available height
            scale = (self.avail_h - self.gap) / (h1 + h2)
            h1 *= scale
            h2 *= scale
            w *= scale
            start_y = self.margin
        blank2 = self.avail_w * self.avail_h - (w * h1 + w * h2)

        # Keep whichever arrangement leaves less empty area
        if blank1 <= blank2:
            y = self.margin + (self.avail_h - h) / 2
            return [
                (photos[0], (start_x, y, start_x + w1, y + h)),
                (photos[1], (start_x + w1 + self.gap, y, start_x + w1 + self.gap + w2, y + h)),
            ]
        else:
            x = self.margin + (self.avail_w - w) / 2
            # ReportLab's origin is bottom-left, so the first photo goes on top
            return [
                (photos[0], (x, start_y + h2 + self.gap, x + w, start_y + h2 + self.gap + h1)),
                (photos[1], (x, start_y, x + w, start_y + h2)),
            ]

    def _layout_3(self, photos, aspects):
        """Three photos - intelligent split"""
        # Option 1: left-right split. The most portrait photo (smallest
        # width/height ratio) takes the left half; the other two stack on the right.
        left_idx = min(range(3), key=lambda i: aspects[i])
        right_idxs = [i for i in range(3) if i != left_idx]
        left_w = self.avail_w * 0.5 - self.gap / 2
        right_w = self.avail_w * 0.5 - self.gap / 2
        h_left = min(self.avail_h, left_w / aspects[left_idx])
        w_left = h_left * aspects[left_idx]
        right_h = (self.avail_h - self.gap) / 2
        w_r1 = min(right_w, right_h * aspects[right_idxs[0]])
        h_r1 = w_r1 / aspects[right_idxs[0]]
        w_r2 = min(right_w, right_h * aspects[right_idxs[1]])
        h_r2 = w_r2 / aspects[right_idxs[1]]
        used_area1 = w_left * h_left + w_r1 * h_r1 + w_r2 * h_r2
        blank1 = self.avail_w * self.avail_h - used_area1

        # Option 2: top-bottom split. The most landscape photo (largest
        # width/height ratio) takes the top half; the other two sit side by side below.
        top_idx = max(range(3), key=lambda i: aspects[i])
        bottom_idxs = [i for i in range(3) if i != top_idx]
        top_h = self.avail_h * 0.5 - self.gap / 2
        bottom_h = self.avail_h * 0.5 - self.gap / 2
        w_top = min(self.avail_w, top_h * aspects[top_idx])
        h_top = w_top / aspects[top_idx]
        bottom_w = (self.avail_w - self.gap) / 2
        h_b1 = min(bottom_h, bottom_w / aspects[bottom_idxs[0]])
        w_b1 = h_b1 * aspects[bottom_idxs[0]]
        h_b2 = min(bottom_h, bottom_w / aspects[bottom_idxs[1]])
        w_b2 = h_b2 * aspects[bottom_idxs[1]]
        # Area covered by the three photos (width * height of each)
        used_area2 = w_top * h_top + w_b1 * h_b1 + w_b2 * h_b2
        blank2 = self.avail_w * self.avail_h - used_area2

        if blank1 <= blank2:
            y_left = self.margin + (self.avail_h - h_left) / 2
            x_left = self.margin + (left_w - w_left) / 2
            x_right = self.margin + left_w + self.gap + (right_w - max(w_r1, w_r2)) / 2
            y_r1 = self.margin + right_h + self.gap + (right_h - h_r1) / 2
            y_r2 = self.margin + (right_h - h_r2) / 2
            result = [None] * 3
            result[left_idx] = (photos[left_idx], (x_left, y_left, x_left + w_left, y_left + h_left))
            result[right_idxs[0]] = (photos[right_idxs[0]], (x_right, y_r1, x_right + w_r1, y_r1 + h_r1))
            result[right_idxs[1]] = (photos[right_idxs[1]], (x_right, y_r2, x_right + w_r2, y_r2 + h_r2))
            return result
        else:
            x_top = self.margin + (self.avail_w - w_top) / 2
            y_top = self.margin + self.avail_h - h_top - (top_h - h_top) / 2
            y_bottom = self.margin + (bottom_h - max(h_b1, h_b2)) / 2
            x_b1 = self.margin + (bottom_w - w_b1) / 2
            x_b2 = self.margin + bottom_w + self.gap + (bottom_w - w_b2) / 2
            result = [None] * 3
            result[top_idx] = (photos[top_idx], (x_top, y_top, x_top + w_top, y_top + h_top))
            result[bottom_idxs[0]] = (photos[bottom_idxs[0]], (x_b1, y_bottom, x_b1 + w_b1, y_bottom + h_b1))
            result[bottom_idxs[1]] = (photos[bottom_idxs[1]], (x_b2, y_bottom, x_b2 + w_b2, y_bottom + h_b2))
            return result

    def _layout_4(self, photos, aspects):
        """Four photos - 2x2 adaptive grid"""
        base_w = (self.avail_w - self.gap) / 2
        base_h = (self.avail_h - self.gap) / 2
        positions = []
        for i, (photo, aspect) in enumerate(zip(photos, aspects)):
            col = i % 2
            row = i // 2
            # Bias the target box toward the photo's orientation
            if aspect > 1.5:
                cell_w = base_w * 1.1
                cell_h = base_h * 0.9
            elif aspect < 0.7:
                cell_w = base_w * 0.9
                cell_h = base_h * 1.1
            else:
                cell_w = base_w
                cell_h = base_h
            if aspect > cell_w / cell_h:
                w = cell_w
                h = w / aspect
            else:
                h = cell_h
                w = h * aspect
            # Never exceed the grid cell, so photos cannot overlap or leave the page
            shrink = min(1.0, base_w / w, base_h / h)
            w *= shrink
            h *= shrink
            base_x = self.margin + col * (base_w + self.gap)
            base_y = self.margin + (1 - row) * (base_h + self.gap)
            x = base_x + (base_w - w) / 2
            y = base_y + (base_h - h) / 2
            positions.append((photo, (x, y, x + w, y + h)))
        return positions


class BabyPhotoBookGenerator:
    """Baby photo book generator"""

    def __init__(self, output_path="baby_photo_book.pdf"):
        self.output_path = output_path
        self.page_width, self.page_height = A4
        self.register_fonts()
        # (stage name, start month inclusive, end month exclusive, label)
        self.baby_stages = [
            ("Newborn", 0, 1, "0-1 month"),
            ("Early Infant", 1, 3, "1-3 months"),
            ("Mid Infant", 3, 6, "3-6 months"),
            ("Late Infant", 6, 9, "6-9 months"),
            ("Crawling", 9, 12, "9-12 months"),
            ("Early Toddler", 12, 18, "1-1.5 years"),
            ("Mid Toddler", 18, 24, "1.5-2 years"),
            ("Late Toddler", 24, 36, "2-3 years"),
        ]
        self.layout_engine = PhotoLayoutEngine(self.page_width, self.page_height)

    def register_fonts(self):
        """Register a CJK-capable font if one is found, else use Helvetica.

        Only the Windows font paths below are checked; elsewhere the built-in
        Helvetica is used, which cannot render Chinese characters.
        """
        font_paths = [
            "C:/Windows/Fonts/simhei.ttf",
            "C:/Windows/Fonts/simsun.ttc",
            "C:/Windows/Fonts/msyh.ttc",
        ]
        for font_path in font_paths:
            if os.path.exists(font_path):
                try:
                    pdfmetrics.registerFont(TTFont('Chinese', font_path))
                    self.font_name = 'Chinese'
                    return
                except Exception:
                    continue
        self.font_name = 'Helvetica'

    def get_photo_date(self, image_path):
        """Return the EXIF capture date, falling back to the file's modification time."""
        try:
            with Image.open(image_path) as image:
                # _getexif() exists for JPEG (and recent Pillow PNG) but not every format
                getexif = getattr(image, "_getexif", None)
                exif = getexif() if getexif else None
            if exif:
                # DateTimeOriginal, DateTimeDigitized, DateTime
                for tag in (0x9003, 0x9004, 0x0132):
                    if tag in exif:
                        try:
                            return datetime.strptime(exif[tag], "%Y:%m:%d %H:%M:%S")
                        except (TypeError, ValueError):
                            continue
        except Exception:
            pass  # unreadable image or EXIF block: use the file time instead
        try:
            return datetime.fromtimestamp(os.path.getmtime(image_path))
        except OSError:
            return datetime.now()

    def organize_photos(self, photo_paths, birth_date=None):
        """Group photos by age stage (with a birth date) or by month (without)."""
        stage_photos = defaultdict(list)
        for photo_path in photo_paths:
            photo_date = self.get_photo_date(photo_path)
            if birth_date:
                # Whole calendar months between birth and photo; the day is ignored
                age_months = ((photo_date.year - birth_date.year) * 12
                              + (photo_date.month - birth_date.month))
                stage_name = None
                for name, start, end, range_str in self.baby_stages:
                    if start <= age_months < end:
                        stage_name = f"{name} ({range_str})"
                        break
                key = stage_name or "Other"
                stage_photos[key].append({
                    'path': photo_path, 'date': photo_date, 'age_months': age_months
                })
            else:
                key = photo_date.strftime("%Y-%m")
                stage_photos[key].append({
                    'path': photo_path, 'date': photo_date, 'age_months': None
                })
        return stage_photos

    def create_cover(self, c, baby_name):
        c.setFillColor(HexColor("#FFF0F5"))
        c.rect(0, 0, self.page_width, self.page_height, fill=1, stroke=0)
        c.setFillColor(HexColor("#8B4513"))
        c.setFont(self.font_name, 40)
        c.drawCentredString(self.page_width / 2, self.page_height * 0.6, f"{baby_name}'s Photo Book")
        c.setFont(self.font_name, 12)
        c.setFillColor(HexColor("#A0522D"))
        c.drawCentredString(self.page_width / 2, self.page_height * 0.05,
                            datetime.now().strftime("%Y-%m"))

    def create_chapter(self, c, title, count):
        c.setFillColor(HexColor("#FFF5EE"))
        c.rect(0, 0, self.page_width, self.page_height, fill=1, stroke=0)
        c.setFillColor(HexColor("#8B4513"))
        c.setFont(self.font_name, 28)
        c.drawCentredString(self.page_width / 2, self.page_height * 0.52, title)
        c.setFont(self.font_name, 10)
        c.setFillColor(HexColor("#A0522D"))
        c.drawCentredString(self.page_width / 2, self.page_height * 0.45, f"{count} photos")

    def draw_photo_page(self, c, layout_result):
        for photo_info, coords in layout_result:
            x1, y1, x2, y2 = coords
            try:
                c.drawImage(photo_info['path'], x1, y1, width=x2 - x1, height=y2 - y1)
                date_str = photo_info['date'].strftime("%Y-%m-%d")
                # age_months is 0 for newborn photos, so compare against None
                if photo_info['age_months'] is not None:
                    date_str += f" ({photo_info['age_months']}m)"
                c.setFont(self.font_name, 6)
                c.setFillColor(HexColor("#666666"))
                # Caption just below the photo
                c.drawCentredString((x1 + x2) / 2, y1 - 6, date_str)
            except Exception as e:
                print(f"[WARN] Cannot draw: {e}")

    def generate(self, photo_folder, baby_name="Baby", birth_date_str=None):
        photo_paths = []
        for file_path in Path(photo_folder).rglob('*'):
            if file_path.suffix.lower() in {'.jpg', '.jpeg', '.png', '.bmp'}:
                photo_paths.append(str(file_path))
        if not photo_paths:
            print("[ERROR] No photos found!")
            return False
        print(f"[OK] Found {len(photo_paths)} photos")
        birth_date = None
        if birth_date_str:
            try:
                birth_date = datetime.strptime(birth_date_str, "%Y-%m-%d")
            except ValueError:
                print(f"[WARN] Invalid birth date '{birth_date_str}' (expected YYYY-MM-DD); "
                      "grouping by month instead")
        stage_photos = self.organize_photos(photo_paths, birth_date)
        # Age stages first, in order; other keys ("Other" or YYYY-MM) follow, sorted by name
        sorted_stages = sorted(stage_photos.items(),
                               key=lambda x: (self._get_stage_order(x[0]), x[0]))
        c = canvas.Canvas(self.output_path, pagesize=A4)
        self.create_cover(c, baby_name)
        c.showPage()
        for stage_name, photos in sorted_stages:
            if not photos:
                continue
            print(f"Processing: {stage_name} ({len(photos)} photos)")
            self.create_chapter(c, stage_name, len(photos))
            c.showPage()
            photos.sort(key=lambda x: x['date'])
            # At most four photos per page, in date order
            for i in range(0, len(photos), 4):
                page_photos = photos[i:i + 4]
                layout_result = self.layout_engine.calculate_layout(page_photos)
                self.draw_photo_page(c, layout_result)
                c.showPage()
        c.save()
        print(f"[SUCCESS] Generated: {self.output_path}")
        return True

    def _get_stage_order(self, stage_name):
        """Index of the age stage named in stage_name, or 999 if none matches."""
        for i, (name, _start, _end, _range_str) in enumerate(self.baby_stages):
            if name in stage_name:
                return i
        return 999


def main():
    parser = argparse.ArgumentParser(description='Baby Photo Book Generator')
    parser.add_argument('photo_folder', help='Path to photo folder')
    parser.add_argument('--output', '-o', default='baby_photo_book.pdf', help='Output PDF path')
    parser.add_argument('--name', '-n', default='Baby', help="Baby's name for the cover")
    parser.add_argument('--birth', '-b', help='Birth date: YYYY-MM-DD')
    args = parser.parse_args()
    generator = BabyPhotoBookGenerator(args.output)
    ok = generator.generate(args.photo_folder, args.name, args.birth)
    sys.exit(0 if ok else 1)


if __name__ == "__main__":
    main()
---
name: "doubao-web"
description: "Use Playwright to host a browser and call Doubao Web's image generation function. Call this skill when the user requests to draw or generate an image using Doubao. (使用 Playwright 托管浏览器的方式,调用豆包 Web 端生图功能。当用户要求使用豆包画图、生成图片时调用此技能。)"
instructions: |
1. Call this skill when the user requests to draw or generate an image, specifically mentioning Doubao or not specifying a particular tool. (当用户请求画图、生成图片,并指明使用豆包或未指定特定工具时,请调用此技能。)
2. Extract the core subject, style, and scene from the user's description to use as the prompt. (提取用户描述中的核心主体、风格和场景作为 prompt。)
3. If the user specifies an image aspect ratio (e.g., avatar, wallpaper, 16:9, etc.), automatically match and add the `--ratio=<value>` parameter. (如果用户指定了图片比例,如头像、壁纸、16:9等,请自动匹配并添加 `--ratio=<value>` 参数。)
4. If the user specifies an image save path, use the `--output=<path>` parameter. (如果用户指定了图片保存路径,请使用 `--output=<path>` 参数。)
5. By default, execute the command in headless mode in the background: `npx ts-node /Users/pengjianfang/skills/doubao-web-image/scripts/main.ts "user's prompt" [optional parameters]`. (默认使用后台无头模式执行命令:`npx ts-node /Users/pengjianfang/skills/doubao-web-image/scripts/main.ts "用户的prompt" [可选参数]`。)
6. After execution, show the user the path of the generated image or confirm successful generation. (执行完毕后,向用户展示生成的图片路径或确认生成成功。)
---
# Doubao Web Image Generator
This skill uses Playwright to drive a real browser session on Doubao Web, so requests are made and signed by the page itself and the tool never has to reproduce the `a_bogus` request signature used by the site's risk control. (这个项目/技能通过 Playwright 自动化控制浏览器的方式,直接利用豆包 Web 端的真实环境进行图片生成,从而完美避开 `a_bogus` 签名风控问题。)
## Features (功能)
- Auto-save login status to `~/.doubao-web-session` (自动保存登录状态在 `~/.doubao-web-session`)
- Send image generation Prompts in a real browser environment (在真实浏览器环境中发送生图 Prompt)
- Intercept and parse SSE stream responses to get the watermark-free original image URL (拦截并解析 SSE 流式响应,获取无水印原图 URL)
## How to Run (如何运行)
```bash
# Default headless mode (silent background run) and original image quality, saving to generated.png
# 默认使用无头模式 (后台静默运行) 和 获取原图画质,并默认保存为 generated.png
npx ts-node scripts/main.ts "A cyberpunk style cat (一只赛博朋克风格的猫咪)"
# Specify image save path (--output or --image)
# 指定图片保存路径 (--output 或 --image)
npx ts-node scripts/main.ts "A cyberpunk style cat" --output="./my_cyber_cat.png"
# Specify image quality (--quality=preview or --quality=original)
# preview fetches faster, original attempts to get high-res quality (default)
# 指定图片画质 (--quality=preview 或 --quality=original)
# preview 抓取速度更快,original 尝试获取大图画质 (默认)
npx ts-node scripts/main.ts "A cyberpunk style cat" --quality=preview --output="./preview_cat.png"
# For the first run or when login is required, you must use the --ui parameter to show the browser window
# 首次运行或需要登录时,必须使用 --ui 参数显示浏览器窗口
npx ts-node scripts/main.ts "Test" --ui
```
### Command Line Arguments (命令行参数说明)
| Parameter (参数) | Description (说明) | Default (默认值) |
|------|------|--------|
| `prompt` | (Required) Prompt for generating the image / (必填) 生成图片的提示词 | `A cute golden retriever (一只可爱的金毛犬)` |
| `--output=<path>` / `--image=<path>` | Local path to save the downloaded image / 图片下载保存的本地路径 | `./generated.png` |
| `--quality=<value>` | Image quality requirement: `preview` or `original` (High-res) / 图片画质要求,可选 `preview` (预览图) 或 `original` (高清原图) | `original` |
| `--ratio=<value>` | Image aspect ratio selection. Supported: `1:1` (Square avatar), `2:3` (Social media selfie), `3:4` (Classic photo), `4:3` (Article illustration), `9:16` (Mobile wallpaper portrait), `16:9` (Desktop wallpaper landscape) / 图片比例选择,支持:`1:1` (正方形头像), `2:3` (社交媒体自拍), `3:4` (经典比例拍照), `4:3` (文章配图插画), `9:16` (手机壁纸人像), `16:9` (桌面壁纸风景) | |
| `--ui` | Show browser interface (must be used for first login) / 显示浏览器界面(首次登录时必须使用) | Background silent run (后台静默运行) |
| `--help`, `-h` | Show help menu / 显示帮助菜单 | |
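Step 3 of the skill instructions leaves the mapping from wording such as "avatar" or "wallpaper" to a `--ratio` value up to the agent. One possible keyword mapping, built from the scenarios in the table above, is sketched below in Python. The keyword list, the `ratio_flag` and `build_command` names, and the relative script path are assumptions; only the six ratio values and the `--ratio=<value>` syntax come from this document.

```python
import re
from typing import List, Optional

SUPPORTED = {"1:1", "2:3", "3:4", "4:3", "9:16", "16:9"}

# Hypothetical keyword hints derived from the scenario column; specific phrases first.
HINTS = [
    ("avatar", "1:1"), ("selfie", "2:3"), ("illustration", "4:3"),
    ("phone wallpaper", "9:16"), ("mobile wallpaper", "9:16"),
    ("desktop wallpaper", "16:9"), ("wallpaper", "16:9"),
]


def ratio_flag(request: str) -> Optional[str]:
    """Return a --ratio=<value> argument for the request, or None if nothing matches."""
    text = request.lower()
    explicit = re.search(r"\b(\d{1,2}:\d{1,2})\b", text)
    if explicit and explicit.group(1) in SUPPORTED:
        return f"--ratio={explicit.group(1)}"
    for keyword, ratio in HINTS:
        if keyword in text:
            return f"--ratio={ratio}"
    return None


def build_command(prompt: str, request: str) -> List[str]:
    """Argument list for the CLI (no shell quoting needed when passed as a list)."""
    args = ["npx", "ts-node", "scripts/main.ts", prompt]
    flag = ratio_flag(request)
    if flag:
        args.append(flag)
    return args


print(build_command("A cyberpunk style cat", "a phone wallpaper with a cyberpunk cat"))
```

An explicit ratio in the request wins over keywords, and unsupported ratios such as `21:9` fall through to the keyword hints rather than producing an invalid flag.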
## Technical Principle (技术原理)
1. **Browser Hosting (浏览器托管)**: Use Playwright to launch a real Chromium browser, loading the user data directory. (利用 Playwright 启动一个真实的 Chromium 浏览器,加载用户数据目录。)
2. **UI Automation (UI 自动化)**: Locate the input box, fill in `帮我生成图片:{prompt}` ("Help me generate an image: {prompt}"), and simulate pressing Enter. (定位输入框,自动填入 `帮我生成图片:{prompt}` 并模拟回车。)
3. **Network Interception (网络拦截)**: Listen for the response to the POST request sent to `/samantha/chat/completion` and read the complete SSE data stream. (监听 `/samantha/chat/completion` 的 POST 请求响应,获取完整的 SSE 数据流。)
4. **Data Parsing (数据解析)**: Extract the `image_ori` (original image) URL from the response stream with a regular expression. (使用正则匹配响应流中的 `image_ori` 的 URL。)
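Steps 3 and 4 can be illustrated with a Python sketch (the project's own client is TypeScript). It assumes standard SSE framing, where each event is one or more `data:` lines ended by a blank line, and assumes the URL appears as `"image_ori": {..., "url": "<url>"}`, possibly inside an escaped JSON string. The payload shape is an assumption, not something this document specifies, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed shape: {"image_ori": {..., "url": "<url>", ...}}
IMAGE_ORI_URL = re.compile(r'"image_ori"\s*:\s*\{[^{}]*?"url"\s*:\s*"([^"]+)"')


def sse_payloads(stream: str) -> List[str]:
    """Split an SSE body into event payloads by joining each event's `data:` lines."""
    events: List[str] = []
    current: List[str] = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            value = line[5:]
            # Per the SSE format, a single leading space after the colon is dropped
            current.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and current:  # a blank line ends the event
            events.append("\n".join(current))
            current = []
    if current:
        events.append("\n".join(current))
    return events


def image_ori_urls(stream: str) -> List[str]:
    """Collect every image_ori URL found in the stream's event payloads."""
    urls: List[str] = []
    for payload in sse_payloads(stream):
        # Nested JSON often arrives as an escaped string; undo \" and \/ first
        text = payload.replace('\\"', '"').replace("\\/", "/")
        urls.extend(m.group(1) for m in IMAGE_ORI_URL.finditer(text))
    return urls


body = 'data: {"image_ori": {"url": "https://example.com/a.png"}}\n\ndata: [DONE]\n\n'
print(image_ori_urls(body))  # ['https://example.com/a.png']
```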
## Directory Structure (目录结构)
- `scripts/doubao-webapi/client.ts` - Core Playwright client logic (核心 Playwright 客户端逻辑)
- `scripts/main.ts` - Command line entry file (命令行入口文件)
- `package.json` - Project dependencies (项目依赖)
FILE:package-lock.json
{
"name": "doubao-web-image",
"version": "1.0.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "doubao-web-image",
"version": "1.0.0",
"license": "MIT",
"dependencies": {
"playwright": "^1.58.2",
"uuid": "^13.0.0"
},
"bin": {
"doubao-image": "scripts/main.ts"
},
"devDependencies": {
"@types/node": "^25.5.0",
"@types/uuid": "^10.0.0",
"ts-node": "^10.9.2",
"typescript": "^6.0.2"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/@cspotcode/source-map-support": {
"version": "0.8.1",
"resolved": "https://registry.npmmirror.com/@cspotcode/source-map-support/-/source-map-support-0.8.1.tgz",
"integrity": "sha512-IchNf6dN4tHoMFIn/7OE8LWZ19Y6q/67Bmf6vnGREv8RSbBVb9LPJxEcnwrcwX6ixSvaiGoomAUvu4YSxXrVgw==",
"dev": true,
"license": "MIT",
"dependencies": {
"@jridgewell/trace-mapping": "0.3.9"
},
"engines": {
"node": ">=12"
}
},
"node_modules/@jridgewell/resolve-uri": {
"version": "3.1.2",
"resolved": "https://registry.npmmirror.com/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz",
"integrity": "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/@jridgewell/sourcemap-codec": {
"version": "1.5.5",
"resolved": "https://registry.npmmirror.com/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz",
"integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==",
"dev": true,
"license": "MIT"
},
"node_modules/@jridgewell/trace-mapping": {
"version": "0.3.9",
"resolved": "https://registry.npmmirror.com/@jridgewell/trace-mapping/-/trace-mapping-0.3.9.tgz",
"integrity": "sha512-3Belt6tdc8bPgAtbcmdtNJlirVoTmEb5e2gC94PnkwEW9jI6CAHUeoG85tjWP5WquqfavoMtMwiG4P926ZKKuQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@jridgewell/resolve-uri": "^3.0.3",
"@jridgewell/sourcemap-codec": "^1.4.10"
}
},
"node_modules/@tsconfig/node10": {
"version": "1.0.12",
"resolved": "https://registry.npmmirror.com/@tsconfig/node10/-/node10-1.0.12.tgz",
"integrity": "sha512-UCYBaeFvM11aU2y3YPZ//O5Rhj+xKyzy7mvcIoAjASbigy8mHMryP5cK7dgjlz2hWxh1g5pLw084E0a/wlUSFQ==",
"dev": true,
"license": "MIT"
},
"node_modules/@tsconfig/node12": {
"version": "1.0.11",
"resolved": "https://registry.npmmirror.com/@tsconfig/node12/-/node12-1.0.11.tgz",
"integrity": "sha512-cqefuRsh12pWyGsIoBKJA9luFu3mRxCA+ORZvA4ktLSzIuCUtWVxGIuXigEwO5/ywWFMZ2QEGKWvkZG1zDMTag==",
"dev": true,
"license": "MIT"
},
"node_modules/@tsconfig/node14": {
"version": "1.0.3",
"resolved": "https://registry.npmmirror.com/@tsconfig/node14/-/node14-1.0.3.tgz",
"integrity": "sha512-ysT8mhdixWK6Hw3i1V2AeRqZ5WfXg1G43mqoYlM2nc6388Fq5jcXyr5mRsqViLx/GJYdoL0bfXD8nmF+Zn/Iow==",
"dev": true,
"license": "MIT"
},
"node_modules/@tsconfig/node16": {
"version": "1.0.4",
"resolved": "https://registry.npmmirror.com/@tsconfig/node16/-/node16-1.0.4.tgz",
"integrity": "sha512-vxhUy4J8lyeyinH7Azl1pdd43GJhZH/tP2weN8TntQblOY+A0XbT8DJk1/oCPuOOyg/Ja757rG0CgHcWC8OfMA==",
"dev": true,
"license": "MIT"
},
"node_modules/@types/node": {
"version": "25.5.0",
"resolved": "https://registry.npmmirror.com/@types/node/-/node-25.5.0.tgz",
"integrity": "sha512-jp2P3tQMSxWugkCUKLRPVUpGaL5MVFwF8RDuSRztfwgN1wmqJeMSbKlnEtQqU8UrhTmzEmZdu2I6v2dpp7XIxw==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.18.0"
}
},
"node_modules/@types/uuid": {
"version": "10.0.0",
"resolved": "https://registry.npmmirror.com/@types/uuid/-/uuid-10.0.0.tgz",
"integrity": "sha512-7gqG38EyHgyP1S+7+xomFtL+ZNHcKv6DwNaCZmJmo1vgMugyF3TCnXVg4t1uk89mLNwnLtnY3TpOpCOyp1/xHQ==",
"dev": true,
"license": "MIT"
},
"node_modules/acorn": {
"version": "8.16.0",
"resolved": "https://registry.npmmirror.com/acorn/-/acorn-8.16.0.tgz",
"integrity": "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw==",
"dev": true,
"license": "MIT",
"bin": {
"acorn": "bin/acorn"
},
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/acorn-walk": {
"version": "8.3.5",
"resolved": "https://registry.npmmirror.com/acorn-walk/-/acorn-walk-8.3.5.tgz",
"integrity": "sha512-HEHNfbars9v4pgpW6SO1KSPkfoS0xVOM/9UzkJltjlsHZmJasxg8aXkuZa7SMf8vKGIBhpUsPluQSqhJFCqebw==",
"dev": true,
"license": "MIT",
"dependencies": {
"acorn": "^8.11.0"
},
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/arg": {
"version": "4.1.3",
"resolved": "https://registry.npmmirror.com/arg/-/arg-4.1.3.tgz",
"integrity": "sha512-58S9QDqG0Xx27YwPSt9fJxivjYl432YCwfDMfZ+71RAqUrZef7LrKQZ3LHLOwCS4FLNBplP533Zx895SeOCHvA==",
"dev": true,
"license": "MIT"
},
"node_modules/create-require": {
"version": "1.1.1",
"resolved": "https://registry.npmmirror.com/create-require/-/create-require-1.1.1.tgz",
"integrity": "sha512-dcKFX3jn0MpIaXjisoRvexIJVEKzaq7z2rZKxf+MSr9TkdmHmsU4m2lcLojrj/FHl8mk5VxMmYA+ftRkP/3oKQ==",
"dev": true,
"license": "MIT"
},
"node_modules/diff": {
"version": "4.0.4",
"resolved": "https://registry.npmmirror.com/diff/-/diff-4.0.4.tgz",
"integrity": "sha512-X07nttJQkwkfKfvTPG/KSnE2OMdcUCao6+eXF3wmnIQRn2aPAHH3VxDbDOdegkd6JbPsXqShpvEOHfAT+nCNwQ==",
"dev": true,
"license": "BSD-3-Clause",
"engines": {
"node": ">=0.3.1"
}
},
"node_modules/fsevents": {
"version": "2.3.2",
"resolved": "https://registry.npmmirror.com/fsevents/-/fsevents-2.3.2.tgz",
"integrity": "sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA==",
"hasInstallScript": true,
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": "^8.16.0 || ^10.6.0 || >=11.0.0"
}
},
"node_modules/make-error": {
"version": "1.3.6",
"resolved": "https://registry.npmmirror.com/make-error/-/make-error-1.3.6.tgz",
"integrity": "sha512-s8UhlNe7vPKomQhC1qFelMokr/Sc3AgNbso3n74mVPA5LTZwkB9NlXf4XPamLxJE8h0gh73rM94xvwRT2CVInw==",
"dev": true,
"license": "ISC"
},
"node_modules/playwright": {
"version": "1.58.2",
"resolved": "https://registry.npmmirror.com/playwright/-/playwright-1.58.2.tgz",
"integrity": "sha512-vA30H8Nvkq/cPBnNw4Q8TWz1EJyqgpuinBcHET0YVJVFldr8JDNiU9LaWAE1KqSkRYazuaBhTpB5ZzShOezQ6A==",
"license": "Apache-2.0",
"dependencies": {
"playwright-core": "1.58.2"
},
"bin": {
"playwright": "cli.js"
},
"engines": {
"node": ">=18"
},
"optionalDependencies": {
"fsevents": "2.3.2"
}
},
"node_modules/playwright-core": {
"version": "1.58.2",
"resolved": "https://registry.npmmirror.com/playwright-core/-/playwright-core-1.58.2.tgz",
"integrity": "sha512-yZkEtftgwS8CsfYo7nm0KE8jsvm6i/PTgVtB8DL726wNf6H2IMsDuxCpJj59KDaxCtSnrWan2AeDqM7JBaultg==",
"license": "Apache-2.0",
"bin": {
"playwright-core": "cli.js"
},
"engines": {
"node": ">=18"
}
},
"node_modules/ts-node": {
"version": "10.9.2",
"resolved": "https://registry.npmmirror.com/ts-node/-/ts-node-10.9.2.tgz",
"integrity": "sha512-f0FFpIdcHgn8zcPSbf1dRevwt047YMnaiJM3u2w2RewrB+fob/zePZcrOyQoLMMO7aBIddLcQIEK5dYjkLnGrQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@cspotcode/source-map-support": "^0.8.0",
"@tsconfig/node10": "^1.0.7",
"@tsconfig/node12": "^1.0.7",
"@tsconfig/node14": "^1.0.0",
"@tsconfig/node16": "^1.0.2",
"acorn": "^8.4.1",
"acorn-walk": "^8.1.1",
"arg": "^4.1.0",
"create-require": "^1.1.0",
"diff": "^4.0.1",
"make-error": "^1.1.1",
"v8-compile-cache-lib": "^3.0.1",
"yn": "3.1.1"
},
"bin": {
"ts-node": "dist/bin.js",
"ts-node-cwd": "dist/bin-cwd.js",
"ts-node-esm": "dist/bin-esm.js",
"ts-node-script": "dist/bin-script.js",
"ts-node-transpile-only": "dist/bin-transpile.js",
"ts-script": "dist/bin-script-deprecated.js"
},
"peerDependencies": {
"@swc/core": ">=1.2.50",
"@swc/wasm": ">=1.2.50",
"@types/node": "*",
"typescript": ">=2.7"
},
"peerDependenciesMeta": {
"@swc/core": {
"optional": true
},
"@swc/wasm": {
"optional": true
}
}
},
"node_modules/typescript": {
"version": "6.0.2",
"resolved": "https://registry.npmmirror.com/typescript/-/typescript-6.0.2.tgz",
"integrity": "sha512-bGdAIrZ0wiGDo5l8c++HWtbaNCWTS4UTv7RaTH/ThVIgjkveJt83m74bBHMJkuCbslY8ixgLBVZJIOiQlQTjfQ==",
"dev": true,
"license": "Apache-2.0",
"bin": {
"tsc": "bin/tsc",
"tsserver": "bin/tsserver"
},
"engines": {
"node": ">=14.17"
}
},
"node_modules/undici-types": {
"version": "7.18.2",
"resolved": "https://registry.npmmirror.com/undici-types/-/undici-types-7.18.2.tgz",
"integrity": "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w==",
"dev": true,
"license": "MIT"
},
"node_modules/uuid": {
"version": "13.0.0",
"resolved": "https://registry.npmmirror.com/uuid/-/uuid-13.0.0.tgz",
"integrity": "sha512-XQegIaBTVUjSHliKqcnFqYypAd4S+WCYt5NIeRs6w/UAry7z8Y9j5ZwRRL4kzq9U3sD6v+85er9FvkEaBpji2w==",
"funding": [
"https://github.com/sponsors/broofa",
"https://github.com/sponsors/ctavan"
],
"license": "MIT",
"bin": {
"uuid": "dist-node/bin/uuid"
}
},
"node_modules/v8-compile-cache-lib": {
"version": "3.0.1",
"resolved": "https://registry.npmmirror.com/v8-compile-cache-lib/-/v8-compile-cache-lib-3.0.1.tgz",
"integrity": "sha512-wa7YjyUGfNZngI/vtK0UHAN+lgDCxBPCylVXGp0zu59Fz5aiGtNXaq3DhIov063MorB+VfufLh3JlF2KdTK3xg==",
"dev": true,
"license": "MIT"
},
"node_modules/yn": {
"version": "3.1.1",
"resolved": "https://registry.npmmirror.com/yn/-/yn-3.1.1.tgz",
"integrity": "sha512-Ux4ygGWsu2c7isFWe8Yu1YluJmqVhxqK2cLXNQA5AcC3QfbGNpM7fu0Y8b/z16pXLnFxZYvWhd3fhBY9DLmC6Q==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6"
}
}
}
}
FILE:package.json
{
"name": "doubao-web-image",
"version": "1.0.0",
"description": "基于 Playwright 的豆包 (Doubao) Web 端网页自动化生图工具",
"main": "scripts/main.ts",
"bin": {
"doubao-image": "scripts/main.ts"
},
"scripts": {
"start": "ts-node scripts/main.ts",
"test": "echo \"Error: no test specified\" && exit 1"
},
"repository": {
"type": "git",
"url": "git+https://github.com/pjf6568/doubao-web-image.git"
},
"keywords": [
"doubao",
"playwright",
"automation",
"image-generation"
],
"author": "pjf6568",
"license": "MIT",
"type": "commonjs",
"engines": {
"node": ">=18.0.0"
},
"dependencies": {
"playwright": "^1.58.2",
"uuid": "^13.0.0"
},
"devDependencies": {
"@types/node": "^25.5.0",
"@types/uuid": "^10.0.0",
"ts-node": "^10.9.2",
"typescript": "^6.0.2"
}
}
FILE:README.md
# Doubao Web Image Generation CLI
基于 Playwright 的豆包 (Doubao) Web 端网页自动化生图工具。
A Playwright-based web automation tool for generating images using Doubao Web.
🔗 **GitHub Repository:** [pjf6568/doubao-web-image](https://github.com/pjf6568/doubao-web-image)
## ⚠️ 免责声明 / Disclaimer
**本项目仅供编程学习、Playwright 自动化测试研究和技术交流使用。**
**This project is for programming learning, Playwright automation testing research, and technical exchange only.**
- 本项目并非豆包官方产品,与字节跳动公司无任何关联。 / This project is not an official Doubao product and is not affiliated with ByteDance.
- 使用本项目产生的任何后果由使用者本人承担。 / The user bears all consequences arising from the use of this project.
- **请勿将本项目用于任何非法、侵权、恶意刷量或商业牟利的场景。** / **Do not use this project for any illegal, infringing, malicious traffic generation, or commercial profit-making purposes.**
- 若因使用本工具导致账号封禁、功能受限或其他损失,作者不承担任何责任。 / The author bears no responsibility for account bans, functional restrictions, or other losses caused by using this tool.
- 请自觉遵守相关平台的用户服务协议及生成内容规范。 / Please comply with the relevant platform's user service agreement and content generation rules.
---
## 🌟 特性 / Features
- 🤖 **免 API Key (No API Key Required)**:通过 Playwright 模拟浏览器操作,直接复用网页版登录状态。 / Simulates browser operations via Playwright, directly reusing the web version's login status.
- 🖼️ **高清大图下载 (High-Res Image Download)**:自动拦截原生下载链接,获取 >4MB 的无损高分辨率原图,而非缩略图。 / Automatically intercepts native download links to obtain lossless high-resolution original images (>4MB) instead of thumbnails.
- 📏 **比例控制 (Aspect Ratio Control)**:支持通过自然语言参数自动拼接控制图片长宽比(如 `16:9`, `1:1`)。 / Controls the image aspect ratio (e.g., `16:9`, `1:1`) by automatically appending a natural-language ratio parameter to the request.
- 🛡️ **验证码自动降级 (Auto-fallback for CAPTCHA)**:默认无头 (Headless) 模式运行,遇到风控拦截时自动弹窗切换到 UI 模式供人工验证。 / Runs in headless mode by default; when a risk-control check blocks the run, it automatically switches to UI mode and opens a browser window so you can complete the verification manually.
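The README does not show how this fallback is implemented. One way to structure it, sketched in Python, assumes a hypothetical `run_once(prompt, headless)` runner that returns the saved image path and raises a hypothetical `CaptchaRequired` exception when a verification page appears; the retry then simply repeats the run with a visible window.

```python
from typing import Callable, List


class CaptchaRequired(Exception):
    """Hypothetical signal that the site is showing a verification challenge."""


def generate_with_fallback(prompt: str, run_once: Callable[[str, bool], str]) -> str:
    """Run headless first; if verification is required, retry once with a visible window."""
    try:
        return run_once(prompt, True)  # headless=True
    except CaptchaRequired:
        print("Verification required, reopening with a visible browser window")
        return run_once(prompt, False)  # headless=False: the user completes the check


# Fake runner for illustration: headless attempts always hit a CAPTCHA.
calls: List[bool] = []


def fake_run(prompt: str, headless: bool) -> str:
    calls.append(headless)
    if headless:
        raise CaptchaRequired()
    return "generated.png"


print(generate_with_fallback("a cat", fake_run))  # generated.png
```

Retrying only once keeps a persistent block from looping; reusing the same saved session directory for both attempts would let the manual verification carry over to later headless runs.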

## 📦 安装与配置 / Installation & Setup
确保你已经安装了 Node.js (建议 v18+) 和 npm。
Ensure you have Node.js (v18+ recommended) and npm installed.
### 方式一:直接全局安装(推荐) / Method 1: Global Installation (Recommended)
你可以直接通过 npm 从 GitHub 全局安装此工具,安装后可以在任意目录直接使用 `doubao-image` 命令:
You can install this tool globally directly from GitHub via npm. After installation, you can use the `doubao-image` command in any directory:
```bash
npm install -g git+https://github.com/pjf6568/doubao-web-image.git
```
*(Note: on the first run you may need to execute `npx playwright install chromium` to install the browser binary.)*
### Method 2: Clone and run locally
```bash
# 1. Clone the project
git clone https://github.com/pjf6568/doubao-web-image.git
cd doubao-web-image
# 2. Install dependencies
npm install
# 3. Install the Playwright browser binary
npx playwright install chromium
```
## 🚀 Usage
If you installed the tool **globally**, replace every `npx ts-node scripts/main.ts` below with the shorter `doubao-image` command.
### 1. First use (manual login required)
The tool needs your login cookies, so the first run **must include the `--ui` flag** to open a visible browser:
```bash
npx ts-node scripts/main.ts "画一只可爱的猫咪 (Draw a cute cat)" --ui
```
*Log in with your phone number and verification code in the browser window that opens; the script then detects the chat input box and continues generating the image. The login state is saved locally in `~/.doubao-web-session`, so you will not need to log in again.*
### 2. Everyday generation (headless mode)
Once you are logged in, images are generated and downloaded silently in the background:
```bash
npx ts-node scripts/main.ts "一只带有未来科技感的机器狗 (A robotic dog with a futuristic tech vibe)"
```
### 3. Advanced parameters
- `--quality=<value>`: Image quality, either `preview` (preview image) or `original` (full-size original, the default).
- `--ratio=<value>`: Aspect ratio. Supported ratios and suggested uses:
  - `1:1`: square avatar
  - `2:3`: social-media selfie
  - `3:4`: classic photo
  - `4:3`: article illustration
  - `9:16`: mobile wallpaper, portrait
  - `16:9`: desktop wallpaper, landscape
- `--output=<path>` or `--image=<path>`: Where to save the downloaded image (default: `generated.png` in the current directory). The space-separated form `--output ./file.png` also works.
- `--ui`: Force the browser window to show (for logging in again or solving a CAPTCHA manually).
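As far as `client.ts` shows, `--ratio` is not sent to any API parameter: the value is appended to the prompt as natural language, and the result is prefixed with a fixed request phrase before being typed into the chat box. The Python sketch below rebuilds that message; `compose_chat_message` is a hypothetical name, the strings are copied as they appear in this copy of `client.ts`, and the allow-list is an extra assumption (the CLI itself forwards any `--ratio` string unchanged, and its `--help` text even mentions `1024x1024`).

```python
from typing import Optional

# Ratios documented in the list above. Rejecting anything else is an
# assumption of this sketch; the real CLI passes any --ratio value through.
DOCUMENTED_RATIOS = {"1:1", "2:3", "3:4", "4:3", "9:16", "16:9"}


def compose_chat_message(prompt: str, ratio: Optional[str] = None) -> str:
    """Rebuild the text that client.ts types into the Doubao chat box."""
    if ratio and ratio not in DOCUMENTED_RATIOS:
        raise ValueError(f"undocumented ratio: {ratio}")
    # "图片比例" means "image aspect ratio"; "帮我生成图片:" means "generate an image for me:"
    final_prompt = f"{prompt},图片比例 {ratio}" if ratio else prompt
    return f"帮我生成图片:{final_prompt}"
```

For example, `compose_chat_message("星空下的赛博朋克城市", "9:16")` returns `帮我生成图片:星空下的赛博朋克城市,图片比例 9:16`, which is why the ratio only takes effect if the model honours the hint.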
## 🤖 Using it as an AI skill (for AI assistants)
The project ships with a `SKILL.md`, which makes it a good fit as an extension skill for large-model AI assistants such as Trae.
1. **Import the skill**: The AI assistant only needs to read `SKILL.md` in the project root.
2. **Natural-language interaction**: Users can simply tell the assistant "帮我用豆包画一张赛博朋克猫咪的手机壁纸" ("Use Doubao to draw me a cyberpunk cat phone wallpaper").
3. **Automatic execution**: Following the instructions in `SKILL.md`, the assistant extracts `prompt="赛博朋克猫咪"` (cyberpunk cat) and `--ratio=9:16`, runs the command silently in the background, and returns the generated image to the user.
**Comprehensive example:**
```bash
# Prompt: "星空下的赛博朋克城市" = "a cyberpunk city under a starry sky"
npx ts-node scripts/main.ts "星空下的赛博朋克城市" --ratio="9:16" --quality=original --output=./city_wallpaper.png
```
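How an assistant maps a request onto CLI arguments depends entirely on the instructions in `SKILL.md`. Purely as an illustration, here is a Python sketch with a hypothetical keyword table derived from the ratio list above and a helper that assembles the argument list; `SCENE_TO_RATIO`, `guess_ratio`, and `build_command` are invented names, not part of the project.

```python
from typing import List, Optional

# Hypothetical keyword -> ratio table derived from the --ratio list above.
# Order matters: the first keyword found in the request wins.
SCENE_TO_RATIO = {
    "头像": "1:1", "avatar": "1:1",                # avatar
    "自拍": "2:3", "selfie": "2:3",                # selfie
    "插画": "4:3", "illustration": "4:3",          # article illustration
    "手机壁纸": "9:16", "mobile wallpaper": "9:16",  # mobile wallpaper
    "phone wallpaper": "9:16",
    "桌面壁纸": "16:9", "desktop wallpaper": "16:9",  # desktop wallpaper
}


def guess_ratio(request: str) -> Optional[str]:
    """Return the ratio of the first keyword found in the request, if any."""
    text = request.lower()
    for keyword, ratio in SCENE_TO_RATIO.items():
        if keyword in text:
            return ratio
    return None


def build_command(prompt: str, ratio: Optional[str] = None,
                  output: Optional[str] = None) -> List[str]:
    """Assemble an argv list for the globally installed doubao-image command."""
    cmd = ["doubao-image", prompt]
    if ratio:
        cmd.append(f"--ratio={ratio}")
    if output:
        cmd.append(f"--output={output}")
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` without `shell=True` sidesteps quoting problems with prompts that contain spaces or Chinese punctuation.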
## 🐛 FAQ
- **Q: The script reports "未能获取到图片,可能触发了人机验证" ("Failed to get the image; a CAPTCHA may have been triggered"). What should I do?**
  A: The script retries automatically. If risk control blocks it in headless mode, it closes the browser and restarts in UI mode, giving you 120 seconds to complete the slider or click-to-select CAPTCHA in the browser window that opens.
- **Q: The generated image is only a few hundred KB?**
  A: Make sure you did not pass `--quality=preview`. By default the script clicks the thumbnail to open the full-size viewer and looks for the download button, which yields the lossless `image_pre_watermark`-level original (usually >1MB).
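If the download button cannot be intercepted, `client.ts` falls back to scanning image URLs in this order: an `image_pre_watermark` URL wins outright; otherwise the last `flow-imagex-sign` URL without `downsize`, `web-operation`, or `avatar` in it; otherwise the thumbnail. A Python sketch of that selection over plain `src` strings, where `pick_best_image_url` and its `fallback` parameter are hypothetical and only the substrings come from the source:

```python
from typing import Iterable, Optional


def pick_best_image_url(srcs: Iterable[Optional[str]],
                        fallback: Optional[str] = None) -> Optional[str]:
    """Mirror the URL preference order used in client.ts (sketch)."""
    best = None
    for src in srcs:
        # Same filter as the img[src*="flow-imagex-sign"] locator
        if not src or "flow-imagex-sign" not in src:
            continue
        if "image_pre_watermark" in src:
            return src  # the full-resolution variant wins immediately
        if not any(tag in src for tag in ("downsize", "web-operation", "avatar")):
            best = src  # like client.ts, the last clean candidate is kept
    return best if best is not None else fallback
```

A `downsize` URL is what usually produces the few-hundred-KB files described in the FAQ.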
---
### 💬 Contact & Communication
Follow my WeChat Official Account to get in touch and discuss!
<img src="./2.jpg" width="300" alt="WeChat Official Account" />
FILE:scripts/doubao-webapi/client.ts
import { chromium, BrowserContext, Page } from 'playwright';
import * as path from 'path';
import * as os from 'os';
import * as fs from 'fs';
import * as https from 'https';

export class DoubaoClient {
  private context: BrowserContext | null = null;
  private page: Page | null = null;
  private userDataDir: string;

  constructor() {
    // Store user data in a local folder so the login state persists between runs
    this.userDataDir = path.join(os.homedir(), '.doubao-web-session');
    if (!fs.existsSync(this.userDataDir)) {
      fs.mkdirSync(this.userDataDir, { recursive: true });
    }
  }
  /**
   * Initializes the Playwright browser context.
   * @param headless - Whether to run the browser in headless mode.
   *                   Use false the first time so you can log in manually.
   */
  async init(headless: boolean = false) {
    console.log(`[DoubaoClient] Initializing Playwright (headless: ${headless})...`);
    console.log(`[DoubaoClient] User data directory: ${this.userDataDir}`);
    this.context = await chromium.launchPersistentContext(this.userDataDir, {
      headless,
      viewport: { width: 1280, height: 800 },
      // Override the User-Agent to drop "HeadlessChrome", which ByteDance's WAF detects easily
      userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
      // Stealth arguments that hide common automation markers
      args: [
        '--disable-blink-features=AutomationControlled',
        '--disable-infobars'
      ]
    });
    const pages = this.context.pages();
    this.page = pages.length > 0 ? pages[0] : (await this.context.newPage());
    console.log('[DoubaoClient] Navigating to Doubao chat...');
    if (!this.page) throw new Error("Failed to create page");
    await this.page.goto('https://www.doubao.com/chat/', { waitUntil: 'domcontentloaded' });
    // Give the page a few seconds to load before checking the login status
    await this.page.waitForTimeout(3000);
    // Simple check for a login page or a visible login entry point
    const url = this.page.url();
    const title = await this.page.title();
    console.log(`[DoubaoClient-Debug] Current page URL: ${url}`);
    console.log(`[DoubaoClient-Debug] Current page title: ${title}`);
    const userAgent = await this.page.evaluate(() => navigator.userAgent);
    console.log(`[DoubaoClient-Debug] Current User-Agent: ${userAgent}`);
    // "登录/注册" is the site's "Log in / Sign up" text, so the selector stays in Chinese
    const loginTextVisible = await this.page.locator('text="登录/注册"').isVisible().catch(() => false);
    const hasLoginModal = url.includes('login') || loginTextVisible;
    if (hasLoginModal) {
      console.log('\n=============================================');
      console.log('❗️ Doubao login required ❗️');
      if (headless) {
        console.error('⚠️ Running in headless mode; manual login is not possible.');
        console.error('👉 Run the command with --ui for the first login, e.g.: npx ts-node scripts/main.ts "test" --ui');
        console.log('=============================================\n');
        throw new Error("Login required but running in headless mode.");
      }
      console.log('Please log in in the browser window that opened (phone number / verification code, etc.).');
      console.log('Once you are logged in, the program detects it and continues automatically.');
      console.log('=============================================\n');
      await this.page.screenshot({ path: 'debug-login-state.png' });
      console.log('[DoubaoClient-Debug] Saved a screenshot of the current page to debug-login-state.png');
      // Wait indefinitely until the chat input textarea appears
      console.log('[DoubaoClient] Waiting for the user to log in...');
      await this.page.waitForSelector('textarea', { timeout: 0 });
      console.log('[DoubaoClient] Chat input detected, login succeeded! Continuing.');
    } else {
      console.log('[DoubaoClient] Existing login session detected.');
    }
  }
  /**
   * Generates an image by automating the Doubao web UI.
   * @param options - Prompt, quality, optional aspect ratio, and polling timeout in ms
   * @returns The generated image URL, or null on failure
   */
  async generateImage(options: { prompt: string, quality?: 'preview' | 'original', ratio?: string, timeout?: number }): Promise<string | null> {
    if (!this.page) throw new Error('Client not initialized. Call init() first.');
    const { prompt, quality = 'original', ratio, timeout = 60000 } = options;
    // Append the ratio as a natural-language hint ("图片比例" = "image aspect ratio")
    const finalPrompt = ratio ? `${prompt},图片比例 ${ratio}` : prompt;
    console.log(`[DoubaoClient] Sending image request: ${finalPrompt} (quality: ${quality})`);
    try {
      // Find the chat input
      const inputLocator = this.page.locator('textarea').first();
      await inputLocator.waitFor({ state: 'visible', timeout: 10000 });
      // Clear existing text, then type the request ("帮我生成图片:" = "generate an image for me:")
      await inputLocator.fill('');
      await inputLocator.fill(`帮我生成图片:${finalPrompt}`);
      await this.page.waitForTimeout(500); // short pause
      // Count the generated images already on the page
      const beforeCount = await this.page.locator('img[src*="flow-imagex-sign"]').count();
      console.log(`[DoubaoClient-Debug] Images on the page before sending: ${beforeCount}`);
      // Press Enter to send
      await inputLocator.press('Enter');
      console.log('[DoubaoClient] Request sent, waiting for the image (usually 10-30 seconds)...');
      // Poll the DOM until a new image appears
      let targetUrl: string | null = null;
      let targetImgElement: any = null;
      const startTime = Date.now();
      let pollCount = 0;
      while (Date.now() - startTime < timeout) {
        await this.page.waitForTimeout(2000); // check every 2 seconds
        pollCount++;
        const currentCount = await this.page.locator('img[src*="flow-imagex-sign"]').count();
        console.log(`[DoubaoClient-Debug] Poll #${pollCount}, current image count: ${currentCount}`);
        if (currentCount > beforeCount) {
          const imgLocators = await this.page.locator('img[src*="flow-imagex-sign"]').all();
          targetImgElement = imgLocators[imgLocators.length - 1];
          // Let the image finish loading so we don't grab a stale thumbnail URL
          await this.page.waitForTimeout(3000);
          targetUrl = await targetImgElement.getAttribute('src');
          break;
        }
      }
      if (targetUrl) {
        if (quality === 'original' && targetImgElement) {
          console.log(`[DoubaoClient] Thumbnail ready, trying to fetch the original full-size image...`);
          // Click the thumbnail to open the full-size image viewer
          await targetImgElement.click();
          await this.page.waitForTimeout(3000); // wait for the viewer to load
          // Look for the download button and intercept the download event
          try {
            // Start listening for the download event before clicking so it cannot be missed
            const downloadPromise = this.page.waitForEvent('download', { timeout: 5000 }).catch(() => null);
            // Candidate icons; the trailing bare `svg` matches every SVG on the page, in document order
            const buttons = await this.page.locator('div[role="dialog"] svg, div[class*="image-viewer"] svg, svg').all();
            let clicked = false;
            for (const svg of buttons) {
              const html = await svg.evaluate(node => node.outerHTML);
              // Heuristics for the download icon: Doubao's icon path contains "M19.207 12.707..."
              if (html.includes('download') || html.includes('下载') || html.includes('M19.207 12.707') || html.includes('M2 19C2')) {
                try {
                  // Walk up to the nearest ancestor that is a <button>, role="button", or <div>
                  const parentBtn = await svg.evaluateHandle(node => {
                    let el = node.parentElement;
                    while (el && el.tagName !== 'BUTTON' && el.getAttribute('role') !== 'button' && el.tagName !== 'DIV') {
                      el = el.parentElement;
                    }
                    return el || node;
                  });
                  await parentBtn.asElement()?.click();
                  clicked = true;
                  console.log(`[DoubaoClient] Clicked a candidate download icon...`);
                  break;
                } catch (e) {
                  // Ignore and try the next candidate
                }
              }
            }
            // Fallback: click a visible element whose text is "下载" ("Download")
            if (!clicked) {
              const textEls = await this.page.locator('text="下载"').all();
              for (const el of textEls) {
                if (await el.isVisible()) {
                  await el.click();
                  clicked = true;
                  console.log(`[DoubaoClient] Clicked an element labelled "下载" (Download)...`);
                  break;
                }
              }
            }
            if (clicked) {
              console.log(`[DoubaoClient] Waiting for the download event or the full-size image...`);
              const download = await downloadPromise;
              if (download) {
                const downloadUrl = download.url();
                console.log(`[DoubaoClient] Intercepted the native download URL: ${downloadUrl}`);
                return downloadUrl;
              } else {
                console.log(`[DoubaoClient] No download event intercepted, extracting the full-size image from the page instead...`);
                // Give the DOM a moment to swap in the high-res image
                await this.page.waitForTimeout(2000);
              }
            } else {
              console.log(`[DoubaoClient] No download button found, trying to extract the lossless URL...`);
            }
          } catch (e) {
            console.log(`[DoubaoClient] Intercepting the download failed: ${e}`);
          }
          // Look in the viewer for a new full-size image, ideally without the "downsize" suffix
          const modalImages = await this.page.locator('img[src*="flow-imagex-sign"]').all();
          let bestUrl: string | null = null;
          for (const img of modalImages) {
            const src = await img.getAttribute('src');
            if (src) {
              // Prefer the high-resolution image_pre_watermark variant
              if (src.includes('image_pre_watermark')) {
                console.log(`[DoubaoClient] Got the original full-size URL (image_pre_watermark)`);
                return src;
              }
              // Exclude distractors such as promotional banners and avatars
              if (!src.includes('downsize') && !src.includes('web-operation') && !src.includes('avatar')) {
                bestUrl = src;
              }
            }
          }
          if (bestUrl) {
            console.log(`[DoubaoClient] Got a lossless full-size URL`);
            return bestUrl;
          }
        } else {
          console.log(`[DoubaoClient] Got the preview image URL from the page`);
        }
        return targetUrl;
      } else {
        console.warn('[DoubaoClient] ⚠️ Timed out waiting for the image; no image URL found.');
        await this.page.screenshot({ path: 'debug-timeout.png', fullPage: true });
        const html = await this.page.content();
        fs.writeFileSync('debug-page.html', html);
        console.log('[DoubaoClient-Debug] Saved a timeout screenshot to debug-timeout.png and the page source to debug-page.html');
        return null;
      }
    } catch (error) {
      console.error('[DoubaoClient] Error while generating the image:', error);
      if (this.page) {
        await this.page.screenshot({ path: 'debug-error.png', fullPage: true }).catch(() => {});
        console.log('[DoubaoClient-Debug] Saved an error screenshot to debug-error.png');
      }
      return null;
    }
  }
  /**
   * Closes the browser context.
   */
  async close() {
    if (this.context) {
      await this.context.close();
      console.log('[DoubaoClient] Browser closed.');
    }
  }

  /**
   * Downloads an image from a URL to a local file.
   * Note: HTTP redirects (3xx) are not followed and are treated as failures.
   * @param url - The image URL
   * @param destPath - The local file path to save the image to
   * @returns A promise that resolves to the saved file path, or null on failure
   */
  static async downloadImage(url: string, destPath: string): Promise<string | null> {
    return new Promise((resolve) => {
      console.log(`[DoubaoClient] Downloading image to: ${destPath}`);
      // Ensure the target directory exists
      const dir = path.dirname(destPath);
      if (!fs.existsSync(dir)) {
        fs.mkdirSync(dir, { recursive: true });
      }
      const file = fs.createWriteStream(destPath);
      https.get(url, (response) => {
        if (response.statusCode !== 200) {
          console.error(`[DoubaoClient] Download failed, HTTP status: ${response.statusCode}`);
          response.resume(); // drain the unused response so the socket is released
          file.close();
          fs.unlink(destPath, () => {}); // delete the empty file
          resolve(null);
          return;
        }
        response.pipe(file);
        file.on('finish', () => {
          file.close();
          resolve(destPath);
        });
      }).on('error', (err) => {
        console.error(`[DoubaoClient] Download error: ${err.message}`);
        file.close();
        fs.unlink(destPath, () => {}); // delete the partial file
        resolve(null);
      });
    });
  }
}
FILE:scripts/main.ts
#!/usr/bin/env ts-node
import { DoubaoClient } from './doubao-webapi/client';
import * as path from 'path';

async function main() {
  // Parse command-line arguments
  const args = process.argv.slice(2);
  // Help menu
  if (args.includes('--help') || args.includes('-h') || args.length === 0) {
    console.log(`
Doubao Web API Image Generation
Usage:
  npx ts-node scripts/main.ts "Your prompt here" [options]
Options:
  --ui                Show browser window (required for first login)
  --quality=<value>   Image quality: 'preview' or 'original' (default: original)
  --ratio=<value>     Image ratio/resolution (e.g., '16:9', '1:1', '1024x1024')
  --output=<path>     Path to save the generated image (e.g., ./my_cat.png).
                      If not specified, defaults to 'generated.png' in current directory.
  --image=<path>      Alias for --output
  --help, -h          Show this help menu
`);
    process.exit(0);
  }
  // Headless by default unless the user explicitly passes --ui
  const uiFlag = args.includes('--ui');
  const headlessFlag = !uiFlag;
  // Everything after the first "=", so values that themselves contain "=" survive
  const valueOf = (arg: string) => arg.slice(arg.indexOf('=') + 1);
  // Quality flag (--quality=preview or --quality=original)
  let quality: 'preview' | 'original' = 'original';
  const qualityArg = args.find(arg => arg.startsWith('--quality='));
  if (qualityArg) {
    const val = valueOf(qualityArg);
    if (val === 'preview' || val === 'original') {
      quality = val;
    }
  }
  // Ratio/resolution flag
  let ratio: string | undefined = undefined;
  const ratioArg = args.find(arg => arg.startsWith('--ratio='));
  if (ratioArg) {
    ratio = valueOf(ratioArg);
  }
  // Output path
  let outputPath = path.resolve(process.cwd(), 'generated.png');
  const outputArg = args.find(arg => arg.startsWith('--output=') || arg.startsWith('--image='));
  if (outputArg) {
    const val = valueOf(outputArg);
    if (val && val.trim().length > 0) {
      outputPath = path.resolve(process.cwd(), val.trim());
    }
  } else {
    // Also accept the space-separated form, e.g. "--output ./file.png"
    const outIndex = args.findIndex(arg => arg === '--output' || arg === '--image');
    if (outIndex !== -1 && outIndex + 1 < args.length && !args[outIndex + 1].startsWith('-')) {
      outputPath = path.resolve(process.cwd(), args[outIndex + 1].trim());
    }
  }
  // The prompt is every argument that is neither an option nor the value of a
  // space-separated --output/--image. Use each element's own index (not indexOf),
  // so a word that appears twice is checked against the right predecessor.
  const promptParts = args.filter((arg, i) => !arg.startsWith('-') && args[i - 1] !== '--output' && args[i - 1] !== '--image');
  // Default prompt "一只可爱的金毛犬" = "a cute golden retriever"
  const prompt = promptParts.join(' ').trim() || '一只可爱的金毛犬';
  let client = new DoubaoClient();
  let imageUrl: string | null = null;
  let needsUiRetry = false;
  try {
    console.log('--- Starting the Doubao image client ---');
    // First attempt
    await client.init(headlessFlag);
    console.log(`\nTask: generate image "${prompt}" (quality: ${quality}${ratio ? `, ratio: ${ratio}` : ''})`);
    imageUrl = await client.generateImage({ prompt, quality, ratio });
    if (!imageUrl) {
      if (headlessFlag) {
        console.log('\n⚠️ Failed to get the image; a CAPTCHA may have been triggered or the network timed out.');
        needsUiRetry = true;
      } else {
        console.log('\n❌ Failed: could not get the image URL.');
      }
    }
  } catch (error) {
    console.error('\n❌ Fatal error:', error);
    if (headlessFlag) {
      needsUiRetry = true;
    }
  } finally {
    await client.close();
  }
  if (needsUiRetry) {
    console.log('\n=============================================');
    console.log('🔄 Restarting automatically in UI mode for manual verification...');
    console.log('💡 If a CAPTCHA appears, complete it in the browser window that opens.');
    console.log('=============================================\n');
    client = new DoubaoClient();
    try {
      await client.init(false); // force UI mode
      console.log(`\nTask (retry): generate image "${prompt}" (quality: ${quality}${ratio ? `, ratio: ${ratio}` : ''})`);
      // Give the user more time (120 seconds) to solve the CAPTCHA by hand
      imageUrl = await client.generateImage({ prompt, quality, ratio, timeout: 120000 });
      if (!imageUrl) {
        console.log('\n❌ Retry failed: still could not get the image URL.');
      }
    } catch (e) {
      console.error('\n❌ Error during the UI-mode retry:', e);
    } finally {
      await client.close();
    }
  }
  if (imageUrl) {
    console.log('\n✅ Success!');
    console.log('Image URL:', imageUrl);
    // Download the image
    const savedPath = await DoubaoClient.downloadImage(imageUrl, outputPath);
    if (savedPath) {
      console.log(`💾 Image saved to: ${savedPath}`);
    } else {
      console.error('❌ Image download failed');
    }
  }
}

// Report unexpected rejections instead of leaving an unhandled promise
main().catch((err) => {
  console.error(err);
  process.exit(1);
});
FILE:tsconfig.json
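{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "strict": true,
    "skipLibCheck": true,
    "types": ["node"],
    "outDir": "./dist"
  },
  "include": ["scripts/**/*"]
}
AI video prompt-building expert. Uses a "first/last keyframe images + video" workflow and stitches multiple 5-second clips into long videos (30 s / 60 s). Keyframe images are generated first, then the video prompts, so that consecutive segments join seamlessly. Optimized for the Jimeng (即梦) platform, with fully Chinese prompt output.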
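---
name: ai-video-prompt
description: AI video prompt-building expert. Uses a "first/last keyframe images + video" workflow and stitches multiple 5-second clips into long videos (30 s / 60 s). Keyframe images are generated first, then the video prompts, so that consecutive segments join seamlessly. Optimized for the Jimeng (即梦) platform, with fully Chinese prompt output.
---
# AI Video Prompt Builder
## Overview
This skill uses a **"keyframe images + multi-segment video stitching"** workflow to produce long videos (30 s, 60 s, or longer). It first generates keyframe images, then a series of 5-second clips between them, so that consecutive segments join seamlessly.
**Core workflow:**
1. Plan the total duration and the number of segments (5 seconds each)
2. Generate the keyframe image prompts (N+1 images, where N is the number of segments)
3. Generate a 5-second video prompt for each segment (based on its two adjacent keyframes)
4. **Key: adjacent segments must share the same keyframe**
5. Stitch the segments into the complete long video
**Core capabilities:**
- Multi-segment stitching plan
- Consistency guarantees at keyframe hand-offs
- Prop/state continuity checks
- Detailed descriptions that prevent physically implausible results
- Optimized for Jimeng
- Fully Chinese prompt output
**Supported platforms:**
- Jimeng AI (即梦, Seedance): recommended
- Kling AI (可灵)
- Runway Gen-3/4
- OpenAI Sora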
---
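## How multi-segment stitching works
### Keyframe structure (example: 30 seconds = 6 segments)
```
Keyframe A (0s)
↓
Keyframe B (5s) = last frame of segment 1 = first frame of segment 2
↓
Keyframe C (10s) = last frame of segment 2 = first frame of segment 3
↓
Keyframe D (15s) = last frame of segment 3 = first frame of segment 4
↓
Keyframe E (20s) = last frame of segment 4 = first frame of segment 5
↓
Keyframe F (25s) = last frame of segment 5 = first frame of segment 6
↓
Keyframe G (30s) = last frame of segment 6
```
**Key rules:**
- 30-second video = 6 segments × 5 s = **7 keyframes**
- 60-second video = 12 segments × 5 s = **13 keyframes**
- **Adjacent segments must share the same keyframe**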
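The counting rule (N segments need N + 1 keyframes, and segment k runs from keyframe k to keyframe k + 1) can be written down directly. A minimal Python sketch; `plan_segments`, `Segment`, and the single-letter labels are assumptions that simply mirror the A to G naming used in this document:

```python
from dataclasses import dataclass
from typing import List

SEGMENT_SECONDS = 5


@dataclass
class Segment:
    number: int       # 1-based segment number
    first_frame: str  # keyframe shared with the previous segment
    last_frame: str   # keyframe shared with the next segment
    start_s: int
    end_s: int


def plan_segments(total_seconds: int) -> List[Segment]:
    """Split a video into 5-second segments joined by N + 1 shared keyframes."""
    if total_seconds <= 0 or total_seconds % SEGMENT_SECONDS:
        raise ValueError("duration must be a positive multiple of 5 seconds")
    n = total_seconds // SEGMENT_SECONDS
    if n + 1 > 26:
        raise ValueError("single-letter labels A-Z cover at most 25 segments")
    labels = [chr(ord("A") + k) for k in range(n + 1)]  # N + 1 keyframes
    return [
        Segment(k + 1, labels[k], labels[k + 1],
                k * SEGMENT_SECONDS, (k + 1) * SEGMENT_SECONDS)
        for k in range(n)
    ]
```

Because each segment stores labels rather than copies of the prompts, segment N's last frame and segment N+1's first frame are the same keyframe by construction.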
---
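## Workflow
### Stage 1: Plan the video structure
**Decide:**
- Total duration (30 s / 60 s / other)
- Number of segments (total duration ÷ 5 s)
- Start and end state of each segment
- Key emotional turning points
### Stage 2: Generate keyframe image prompts
**Every keyframe must include:**
1. **Subject**: detailed appearance + clothing (identical in every keyframe)
2. **Props**: explicit prop state and position
3. **Expression**: a specific facial expression
4. **Pose**: body posture
5. **Environment**: scene + lighting
**Key requirements:**
- The subject description is **exactly the same** in every keyframe
- Prop state changes follow a **logical sequence**
- Differences between adjacent keyframes stay **plausible and controlled**
### Stage 3: Generate the video prompt for each segment
**Structure of each segment:**
```
5-second video: first-frame state → change in between → last-frame state
```
**Requirements:**
- Reference the first-frame image
- Describe the complete change within the 5 seconds
- Make sure the change leads logically to the last-frame state
### Stage 4: Hand-off consistency check
**Must check:**
- [ ] Last frame of segment N = first frame of segment N+1 (identical prompt)
- [ ] Prop state is continuous (toy: present → dropped → gone, with a logical cause)
- [ ] Subject appearance is identical
- [ ] Clothing is identical
- [ ] Environment and lighting change plausibly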
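The first four checklist items can be automated as plain string checks when each segment's first-frame and last-frame prompts are written out separately (lighting plausibility still needs a human eye). A Python sketch; `check_handoffs` and the idea of passing "locked" phrases (appearance, outfit) are assumptions for illustration:

```python
from typing import List, Tuple


def check_handoffs(segments: List[Tuple[str, str]], *locked: str) -> List[str]:
    """segments[i] is (first-frame prompt, last-frame prompt) of segment i + 1.

    Flags (1) any locked phrase (appearance, outfit, ...) missing from a frame and
    (2) any hand-off where segment N's last frame differs from N+1's first frame.
    """
    problems = []
    for n, (first, last) in enumerate(segments, start=1):
        for which, text in (("first", first), ("last", last)):
            for phrase in locked:
                if phrase not in text:
                    problems.append(f"segment {n} {which} frame lacks {phrase!r}")
    for n in range(1, len(segments)):
        if segments[n - 1][1] != segments[n][0]:
            problems.append(f"segment {n} last frame != segment {n + 1} first frame")
    return problems
```

An exact string comparison is deliberately strict: the rule above asks for the identical prompt at every shared keyframe, so even a one-word difference is reported.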
---
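## Keyframe hand-off rules (error prevention)
### Wrong example ❌
**Last frame B of segment 1 (5s):**
```
...手里抓着彩色玩具...
```
**First frame B of segment 2 (5s):**
```
...手里无玩具... ← Wrong! The prop is inconsistent ("no toy in hand" vs. "holding a colorful toy")
```
### Correct example ✅
**Last frame B of segment 1 (5s):**
```
...手里抓着彩色玩具...
```
**First frame B of segment 2 (5s):**
```
...手里抓着彩色玩具... ← Correct! Identical
```
**Last frame C of segment 2 (10s):**
```
...手里玩具掉落... ← The toy drops: the change happens within this segment
```
### Prop state timeline
**Record the prop state at every keyframe explicitly:**
| Time | Keyframe | Prop state | Reason for the change |
|------|----------|------------|-----------------------|
| 0s | A | No toy | Initial state |
| 5s | B | Toy in hand | Given by mom |
| 10s | C | Toy dropped | Dropped it when nervous |
| 15s | D | No toy | Mom put it away |
| 20s | E | No toy | Unchanged |
| 25s | F | No toy | Unchanged |
| 30s | G | No toy | Unchanged |
**Rules:**
- Prop changes happen only **within a segment** (during video generation)
- At a shared keyframe, **the prop state must be identical** on both sides
- Every change needs a **plausible cause** (nervous → drops it, rather than vanishing into thin air)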
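The timeline rules can be checked mechanically once the allowed state changes are written down. In the Python sketch below, `ALLOWED_CHANGES` is an assumption read off the vaccination timeline above (a different story needs its own table), and the English state labels and `check_prop_timeline` are hypothetical names:

```python
from typing import List, Tuple

# Changes allowed *inside* one segment, read off the example timeline above.
# This table is an assumption for the vaccination story only.
ALLOWED_CHANGES = {
    ("none", "in_hand"),     # mom hands the baby a toy
    ("in_hand", "dropped"),  # the baby drops it when nervous
    ("dropped", "none"),     # mom puts it away
}


def check_prop_timeline(states: List[str]) -> List[Tuple[int, str, str]]:
    """states[k] is the prop state at keyframe k (A, B, C, ...).

    Returns (segment number, before, after) for every change that is not allowed.
    Keeping the same state across a segment is always fine.
    """
    problems = []
    for k in range(len(states) - 1):
        before, after = states[k], states[k + 1]
        if before != after and (before, after) not in ALLOWED_CHANGES:
            problems.append((k + 1, before, after))
    return problems
```

Because each keyframe carries exactly one state, the "identical on both sides" rule holds automatically; only the within-segment changes need validating.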
---
## Detailed description rules
### Subject description (identical in all keyframes)
```
一个可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,
明亮好奇的大眼睛,穿着白色连体衣
```
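**Never change across keyframes:**
- ❌ Face shape
- ❌ Hairstyle
- ❌ Clothing color
- ❌ Age
### Prop description (explicit state and position)
**Standard format** ("in hand: [state] toy ([location])"):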
```
手里[状态]玩具([位置])
```
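**Examples:**
- `手里抓着彩色玩具(玩具在手中)`: holding a colorful toy (toy in hand)
- `手里玩具掉落(玩具不在手中,掉在腿上)`: the toy has dropped (no longer in hand, lying on the lap)
- `手里无玩具(玩具在妈妈包里)`: no toy in hand (toy is in mom's bag)
### Expression description (may change)
**Standard format** ("expression: [specific look]"):
```
表情[具体神态]
```
**Examples:**
- `表情平静,嘴角带着微笑`: calm, with a slight smile
- `表情开始紧张,眉头微皱`: starting to get nervous, brow slightly furrowed
- `表情害怕,眼睛睁大`: scared, eyes wide open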
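One way to honour "subject fixed, expression and props free" is to lock the subject and outfit in constants and let only the per-frame fields vary. A Python sketch under these assumptions: the split into `SUBJECT`/`OUTFIT` and the `FrameState` fields is invented for illustration, the text values come from the examples above, and `keyframe_prompt` simply fills the two standard formats.

```python
from dataclasses import dataclass

# Locked once and reused verbatim in every keyframe (values from the example above)
SUBJECT = "可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,明亮好奇的大眼睛"
OUTFIT = "穿着白色连体衣"


@dataclass
class FrameState:
    """Fields that are allowed to change from one keyframe to the next."""
    shot: str           # e.g. 中景固定镜头 (medium static shot)
    pose: str           # e.g. 坐在妈妈腿上 (sitting on mom's lap)
    expression: str     # fills 表情[具体神态]
    prop_state: str     # fills 手里[状态]玩具
    prop_location: str  # fills ([位置])
    environment: str    # scene + lighting


def keyframe_prompt(state: FrameState, first: bool = False) -> str:
    # Every keyframe after the first marks the subject as "同一个" (the same one)
    subject = SUBJECT if first else f"同一个{SUBJECT}"
    parts = [
        state.shot,
        subject,
        OUTFIT,
        state.pose,
        f"表情{state.expression}",
        f"手里{state.prop_state}玩具({state.prop_location})",
        state.environment,
        "8K",
    ]
    return ",".join(parts)
```

Since the subject text never passes through per-frame input, face shape, hair, clothing color, and age cannot drift between keyframes.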
---
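## Full example: 30-second baby vaccination
### Storyline plan
| Segment | Time | Scene | Emotion | Prop change |
|---|------|------|------|---------|
| 1 | 0-5s | Arriving at the hospital | Curious | None → toy |
| 2 | 5-10s | Waiting | Curious → nervous | Toy → dropped |
| 3 | 10-15s | Sees the nurse | Nervous → scared | Dropped → none |
| 4 | 15-20s | Moment of the injection | Scared → shocked → bawling | None |
| 5 | 20-25s | Crying hard | In pain | None |
| 6 | 25-30s | Mom soothes the baby | Pain → calm | None |
### 7 keyframe prompts
#### A (0s): first frame of segment 1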
```
中景固定镜头,可爱的周岁宝宝,圆脸粉嫩,黑发稀疏,
明亮好奇的大眼睛,穿白色连体衣,被妈妈抱着走进医院大门,
表情好奇东张西望,手里无玩具,医院明亮大厅,自然光,温馨,8K
```
#### B (5s): last frame of segment 1 = first frame of segment 2
```
中景固定镜头,同个宝宝,圆脸粉嫩,黑发稀疏,
明亮好奇的大眼睛,穿白色连体衣,被妈妈抱着坐在候诊区椅子上,
表情好奇观察周围环境,手里抓着彩色玩具(玩具在手中),
医院候诊区,其他小朋友在远处,自然光,8K
```
#### C (10s): last frame of segment 2 = first frame of segment 3
```
中景固定镜头,同个宝宝,圆脸粉嫩,黑发稀疏,
明亮好奇的大眼睛,穿白色连体衣,坐在妈妈腿上,
表情紧张眉头微皱,看向诊室方向,手里玩具掉落(玩具不在手中,掉在腿上),
医院候诊区,自然光,8K
```
#### D (15s): last frame of segment 3 = first frame of segment 4
```
特写固定镜头,同个宝宝,圆脸粉嫩,黑发稀疏,
明亮的大眼睛,穿白色连体衣,表情害怕眼睛睁大嘴巴微张,
看到护士拿着针管,身体向后缩,手里无玩具(玩具在妈妈包里),
诊室环境,clinical light,8K
```
#### E (20s): last frame of segment 4 = first frame of segment 5
```
特写固定镜头,同个宝宝,圆脸变红,黑发微乱,
明亮的大眼睛紧闭流泪,穿白色连体衣,表情痛苦嘴巴大张哭泣,
眼泪流下脸颊,小拳头紧握挥舞,手里无玩具,
诊室环境,clinical light,8K
```
#### F (25s): last frame of segment 5 = first frame of segment 6
```
中景固定镜头,同个宝宝,圆脸仍然泛红,黑发微乱,
明亮的大眼睛流泪,穿白色连体衣,坐在妈妈腿上,
表情痛苦但开始减弱,眼泪减少,小拳头松开,
妈妈手轻拍背部安抚,手里无玩具,
诊室环境,clinical light转柔和光,8K
```
#### G (30s): last frame of segment 6
```
中景固定镜头,同个宝宝,圆脸恢复粉嫩,黑发整齐,
明亮好奇的大眼睛,穿白色连体衣,靠在妈妈怀里,
表情委屈但平静,小手放松,手里无玩具,
妈妈温柔安抚,柔和光,8K
```
### 6 segment video prompts
#### Segment 1: 0-5 s (arriving at the hospital)
```
5秒视频:宝宝被妈妈抱着走进医院,好奇地东张西望,
转头看医院环境,手指向彩色装饰,妈妈给宝宝玩具,
宝宝坐在妈妈腿上玩玩具,镜头跟随移动,
自然光,真实运动,平滑过渡
```
#### Segment 2: 5-10 s (waiting)
```
5秒视频:宝宝玩着玩具,突然听到叫号声表情变化,
转头看向诊室方向,眉头微皱开始紧张,
小手抓紧妈妈衣服,玩具从手中掉落,
镜头缓慢推进到宝宝脸部,自然光,真实情绪变化,平滑过渡
```
#### Segment 3: 10-15 s (seeing the nurse)
```
5秒视频:宝宝表情紧张看向诊室,护士拿着针管走来,
宝宝表情从紧张变害怕,眼睛睁大身体向后缩,
小手推开,表情惊恐,妈妈收起掉落的玩具,
镜头切换到宝宝视角看护士,自然光,真实恐惧反应,平滑过渡
```
#### Segment 4: 15-20 s (the moment of the injection)
```
5秒视频:护士消毒宝宝手臂,宝宝表情害怕,
针扎入瞬间宝宝表情从害怕变震惊,眼睛睁大嘴巴张开无声喘息,
然后立即爆发出大声哭泣,眼泪涌出,
特写镜头捕捉瞬间表情变化,clinical light,慢动作,平滑过渡
```
#### Segment 5: 20-25 s (crying hard)
```
5秒视频:宝宝大声哭泣眼泪流下,小拳头紧握挥舞身体扭动,
妈妈伸手轻拍宝宝背部安抚,宝宝哭声仍然很大,
镜头从特写拉远到中景,clinical light,真实哭泣,平滑过渡
```
#### Segment 6: 25-30 s (mom soothes the baby)
```
5秒视频:妈妈轻拍宝宝背部温柔安抚,宝宝哭声逐渐变小变成抽泣,
身体逐渐放松靠在妈妈怀里,表情从痛苦变委屈平静,
眼泪停止,呼吸平稳,镜头缓慢拉远,
柔和光,温馨氛围,真实情绪平复,平滑过渡
```
---
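## Hand-off consistency checklist
### Keyframe comparison
| Check | Last frame B of segment 1 | First frame B of segment 2 | Status |
|--------|-----------|-----------|------|
| Subject appearance | Round face, sparse black hair | Round face, sparse black hair | ✅ Match |
| Clothing | White onesie | White onesie | ✅ Match |
| Prop | Toy in hand | Toy in hand | ✅ Match |
| Expression | Curious | Curious | ✅ Match |

| Check | Last frame C of segment 2 | First frame C of segment 3 | Status |
|--------|-----------|-----------|------|
| Subject appearance | Round face, sparse black hair | Round face, sparse black hair | ✅ Match |
| Clothing | White onesie | White onesie | ✅ Match |
| Prop | Toy dropped | Toy dropped | ✅ Match |
| Expression | Nervous | Nervous | ✅ Match |
### Prop timeline check
| Time | Keyframe | Prop state | Logic |
|------|--------|---------|------|
| 0s | A | No toy | Initial |
| 5s | B | Toy in hand | Given by mom ✅ |
| 10s | C | Dropped | Dropped it when nervous ✅ |
| 15s | D | None | Mom put it away ✅ |
| 20s | E | None | Unchanged ✅ |
| 25s | F | None | Unchanged ✅ |
| 30s | G | None | Unchanged ✅ |
---
## Steps on the Jimeng platform
### Step 1: Generate the 7 keyframe images
Generate them in order: A → B → C → D → E → F → G
**Notes:**
- Check every image carefully
- Make sure the subject's appearance is identical
- Make sure the prop states match the timeline
- Make sure adjacent keyframes can be joined
### Step 2: Generate the 6 five-second clips
| Segment | First frame | Last frame | Time range |
|---|------|------|---------|
| 1 | A | B | 0-5s |
| 2 | B | C | 5-10s |
| 3 | C | D | 10-15s |
| 4 | D | E | 15-20s |
| 5 | E | F | 20-25s |
| 6 | F | G | 25-30s |
### Step 3: Stitch the clips
In Jianying (剪映, CapCut) or Premiere Pro:
1. Import the 6 clips in order
2. Check the joins (B → C → D → E → F)
3. Make sure the transitions are seamless
4. Export the complete 30-second video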
---
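## Common mistakes and fixes
### Mistake 1: Inconsistent prop state
❌ **Wrong:**
```
Last frame B of segment 1: holding a toy
First frame B of segment 2: no toy in hand  ← contradiction!
```
✅ **Fix:**
```
Last frame B of segment 1: holding a toy
First frame B of segment 2: holding a toy  ← consistent!
Last frame C of segment 2: toy dropped     ← the change happens within the segment
```
### Mistake 2: Subject appearance changes
❌ **Wrong:**
```
Keyframe A: round face, sparse black hair
Keyframe B: oval face, long hair  ← changed!
```
✅ **Fix:**
```
Keyframe A: round face, sparse black hair
Keyframe B: round face, sparse black hair  ← identical
```
### Mistake 3: Clothing color changes
❌ **Wrong:**
```
Keyframe A: white onesie
Keyframe B: blue onesie  ← changed!
```
✅ **Fix:**
```
Keyframe A: white onesie
Keyframe B: white onesie  ← identical
```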
---
## Negative constraints (shared by every segment)
```
无变形,无扭曲,无多余肢体,面部稳定,角色一致,
无闪烁,无突然变色,运动平滑,真实表情,自然眼泪,
符合物理规律,宝宝外貌一致,服装一致,道具逻辑连贯
```
---
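**Version:** 4.0
**Last updated:** 2026-03-16
**Changes:**
- Added the multi-segment stitching workflow
- Added keyframe hand-off consistency rules
- Added the prop state timeline check
- Added common mistakes with fixes
**Based on:** combined analysis from ChatGPT and Qwen (千问), plus real-world usage feedback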
FILE:examples/keyframe_examples.md
# First/Last-Frame Workflow Examples
## Example 1: Baby vaccination (emotional contrast)
### Start frame (0 s)
```
中景固定镜头,一个可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,
明亮好奇的大眼睛,穿着白色连体衣,坐在妈妈腿上,表情平静,
嘴角带着微笑,小手抓着彩色玩具,在明亮的儿科诊室里,
柔和自然窗光,温馨氛围,照片级真实,8K,超高清细节
```
### End frame (5 s)
```
中景固定镜头,同一个可爱的周岁宝宝,圆脸,粉嫩脸颊(因哭泣而变红),
稀疏柔软的黑发(略微凌乱),明亮的大眼睛(紧闭流泪),
穿着白色连体衣,坐在妈妈腿上,表情痛苦,嘴巴大张哭泣,
眼泪流下脸颊,小拳头紧握,在明亮的儿科诊室里,
柔和自然窗光,情绪氛围,照片级真实,8K,超高清细节
```
### 5-second video prompt
```
[引用开始帧图片],宝宝表情从平静好奇突然转变为震惊,
眼睛睁大,嘴巴张开无声喘息(第1-2秒)。
然后护士注射疫苗后立即爆发出大声哭泣,眼泪涌出流下脸颊,
小拳头紧握挥舞,脸因情绪而变红(第3-4秒)。
妈妈的手轻轻安抚宝宝(第4-5秒)。
镜头缓慢推进到宝宝脸部,捕捉情绪转变全过程。
自然光,真实运动,平滑过渡,照片级真实,8K
```
### Negative constraints
```
无变形,无扭曲,无多余肢体,面部稳定,角色一致,
无闪烁,无突然变色,运动平滑,真实哭泣表情,
自然眼泪,无漂浮物体,符合物理规律
```
---
## Example 2: Cyberpunk girl (alert reaction)
### Start frame (0 s)
```
中景固定镜头,一位银发赛博朋克女孩,20岁左右,精致五官,
冷峻表情,穿着发光霓虹夹克,黑色紧身裤,机械义肢手臂,
自信地行走在雨夜东京街头,霓虹灯招牌在潮湿地面反射,
细雨飘落,粉蓝色霓虹光,赛博朋克美学,电影级灯光,8K
```
### End frame (5 s)
```
中景固定镜头,同一位银发赛博朋克女孩,20岁左右,精致五官,
警觉表情(眉头紧锁),穿着发光霓虹夹克,黑色紧身裤,机械义肢手臂,
突然停下,手摸耳机,环顾四周,在雨夜东京街头,
霓虹灯闪烁,蒸汽从通风口升起,赛博朋克美学,电影级灯光,8K
```
### 5-second video prompt
```
[引用开始帧图片],女孩自信行走中突然停下,
表情从自信变为警觉,眉头紧锁(第1-2秒)。
手摸耳机,头转向左侧,环顾四周(第3-4秒)。
背景霓虹灯闪烁,蒸汽升起(第4-5秒)。
镜头环绕拍摄,捕捉警觉反应全过程。
电影级灯光,真实物理,赛博朋克风格,8K
```
### Negative constraints
```
无变形,无扭曲,面部稳定,角色一致,
无闪烁,机械义肢稳定,霓虹灯不扭曲,
符合物理规律,无空间扭曲
```
---
## Example 3: Woman in classical Chinese dress (dance)
### Start frame (0 s)
```
中景固定镜头,一位身着飘逸白色汉服的优雅女子,长发及腰,
精致妆容,手持折扇,站在古典园林中,表情恬静,
樱花树环绕,柔和午后阳光,金色光线穿透花瓣,
唯美诗意氛围,照片级真实,8K,超高清细节
```
### End frame (5 s)
```
中景固定镜头,同一位身着飘逸白色汉服的优雅女子,长发及腰(随风飘动),
精致妆容,手持折扇(展开),在古典园林中翩翩起舞,
表情愉悦,嘴角微笑,樱花瓣飘落,金色阳光,
唯美诗意氛围,照片级真实,8K,超高清细节
```
### 5-second video prompt
```
[引用开始帧图片],女子缓缓展开折扇,表情从恬静变为愉悦(第1-2秒)。
然后开始翩翩起舞,长袖飘逸,裙摆飞扬(第3-4秒)。
樱花瓣缓缓飘落,随风起舞(第4-5秒)。
镜头缓慢环绕,捕捉舞姿和花瓣。
金色阳光,真实布料物理,唯美诗意,8K
```
### Negative constraints
```
无变形,无扭曲,面部稳定,服装一致,
无漂浮头发,真实布料解算,自然花瓣飘落,
符合物理规律
```
---
## Example 4: Making coffee (product showcase)
### Start frame (0 s)
```
特写固定镜头,一杯新鲜萃取的浓缩咖啡,深棕色油脂,
白色陶瓷杯,放在木质吧台上,蒸汽缓缓升起,
温暖咖啡店环境,柔和侧光,温馨氛围,
产品摄影风格,8K,超高清细节
```
### End frame (5 s)
```
特写固定镜头,同一杯咖啡,深棕色油脂,
白色陶瓷杯,放在木质吧台上,
拿铁艺术图案形成(心形叶子),蒸汽升起,
温暖咖啡店环境,柔和侧光,
产品摄影风格,8K,超高清细节
```
### 5-second video prompt
```
[引用开始帧图片],咖啡师手持奶泡壶,
开始缓慢倒入蒸奶(第1-2秒)。
奶泡与咖啡融合,形成白色漩涡(第3-4秒)。
最后拉出心形叶子图案(第4-5秒)。
镜头固定,慢动作捕捉拉花过程。
柔和侧光,真实流体物理,产品摄影,8K
```
### Negative constraints
```
无变形,无扭曲,杯子稳定,液体物理真实,
无闪烁,无突然变色,运动平滑,
符合流体动力学
```
---
## Example 5: Cat at play (animals)
### Start frame (0 s)
```
中景固定镜头,一只毛茸茸的橘猫,明亮绿眼睛,白色爪子,
坐在花园草地上,表情好奇,耳朵竖起,
盯着空中的蝴蝶,背景是五颜六色的花朵,
温暖晨光,柔和氛围,照片级真实,8K,超高清细节
```
### End frame (5 s)
```
中景固定镜头,同一只毛茸茸的橘猫,明亮绿眼睛,白色爪子,
前爪离地,身体腾空,扑向蝴蝶,表情专注,
在花园草地上,背景花朵摇曳,
温暖晨光,动态氛围,照片级真实,8K,超高清细节
```
### 5-second video prompt
```
[引用开始帧图片],猫咪身体压低,尾巴摇摆,
准备扑击(第1-2秒)。
然后猛地跃起,前爪伸展,扑向蝴蝶(第3-4秒)。
蝴蝶飞走,猫咪落地(第4-5秒)。
镜头跟随猫咪运动,慢动作捕捉扑击瞬间。
温暖晨光,真实动物物理,照片级真实,8K
```
### Negative constraints
```
无变形,无扭曲,面部稳定,毛发一致,
无闪烁,无漂浮毛发,真实动物运动,
符合物理规律
```
---
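## Workflow summary
### Step 1: Generate the start frame
Use the start-frame prompt to generate the image at 0 s.
### Step 2: Generate the end frame
Use the end-frame prompt to generate the image at 5 s.
### Step 3: Check consistency
- Is the subject's appearance consistent?
- Is the clothing consistent?
- Is the environment consistent?
- Are light and shadow consistent?
### Step 4: Generate the video
On Jimeng, Kling, or a similar platform:
1. Upload the start-frame image as the first frame
2. Upload the end-frame image as the last frame
3. Enter the video prompt
4. Generate a 5-second video
### Step 5: Refine
If the result is not good enough:
- Add more detail to the subject description
- Adjust how the actions are distributed over the 5 seconds
- Add negative constraints
- Regenerate
---
**Tip:** Every example follows the "detailed descriptions prevent deformation" principle, so the generated images and videos stay physically plausible.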
FILE:examples/prompt_examples.md
# AI Video Prompt Examples
## Characters
### Cyberpunk girl
**Basic:**
```
A cyberpunk girl walking in rainy Tokyo street at night, neon lights, cinematic
```
**Standard:**
```
Medium tracking shot, a young cyberpunk girl with silver hair and neon jacket,
walking confidently through a rainy Tokyo street at night, neon signs reflecting
on wet pavement, light rain falling, cinematic lighting, cyberpunk aesthetic,
4k, photorealistic
```
**Director's version:**
```
Cinematic low-angle tracking shot following a cyberpunk girl with silver hair
and glowing neon jacket walking through a rainy Tokyo street at night. Her jacket
reflects the wet pavement. Raindrops splash dynamically as her boots hit the ground.
Steam rises from street vents. Volumetric fog interacts with pink and blue neon
signs. Shot on 35mm lens, f/1.8, shallow depth of field, motion blur on background,
high contrast, photorealistic, 8k, film grain, color graded
```
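The three tiers differ mainly in how many components are filled in: the basic prompt has subject, action, setting, and a style word; the standard one adds shot type and technical tags; the director's version goes further with lens, physics, and grading details. A hypothetical Python helper that assembles a prompt from such components (the component split and `build_video_prompt` are assumptions, not rules from the source):

```python
from typing import Optional


def build_video_prompt(subject: str, action: str, setting: str,
                       shot: Optional[str] = None,
                       lighting: Optional[str] = None,
                       style: Optional[str] = None,
                       technical: Optional[str] = None) -> str:
    """Join the non-empty components into one comma-separated prompt."""
    core = " ".join(p for p in (subject, action, setting) if p)
    parts = [shot, core, lighting, style, technical]
    return ", ".join(p for p in parts if p)
```

Upgrading from basic to standard then means passing `shot` and `technical` rather than rewriting the sentence by hand.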
---
## Image-to-Video Examples
### Example 1: Portrait
**Input image:** portrait of a girl
**Prompt:**
```
The character slowly turns her head to the left, hair gently flowing with the
movement. Her eyes blink naturally. Subtle smile forms. Soft ambient light
creates gentle shadows on her face. Background remains static. 4k, photorealistic.
```
### Example 2: Landscape photo
**Input image:** mountain landscape
**Prompt:**
```
Clouds slowly drift across the mountain peaks. Sunlight gradually shifts creating
dynamic shadows on the slopes. Birds fly across the frame. Grass in foreground
sways gently in breeze. Time-lapse atmosphere. Cinematic, 4k.
```
### Example 3: Product photo
**Input image:** wristwatch product shot
**Prompt:**
```
Camera slowly orbits around the watch. Light reflects off the crystal face creating
subtle glints. Second hand moves smoothly. Shadow shifts as light source changes
angle. Premium product photography style, 4k.
```
---
## Negative Prompt Examples
### General negative constraints
```
no morphing, no distortion, no extra limbs, stable face, consistent character,
no flickering, no watermark, no text, no blur, sharp focus
```
### Character-specific negative constraints
```
no extra fingers, no deformed hands, stable face, consistent outfit,
no sudden color changes, smooth motion
```
### Scene-specific negative constraints
```
no sudden lighting changes, stable background, no floating objects,
realistic physics, consistent perspective
```
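In practice the general constraints are combined with one or more of the specialised lists, and they overlap (`stable face` appears in both the general and the character list). A small Python sketch for merging them without duplicates; `merge_negatives` is a hypothetical helper, not something the source provides:

```python
from typing import List


def merge_negatives(*groups: str) -> str:
    """Combine negative-prompt strings, dropping duplicate terms but keeping order."""
    merged: List[str] = []
    for group in groups:
        # Accept both ASCII and full-width commas, and ignore line breaks
        for term in group.replace("，", ",").replace("\n", " ").split(","):
            term = term.strip()
            if term and term not in merged:
                merged.append(term)
    return ", ".join(merged)
```

The same helper works for the Chinese constraint lists in the keyframe examples, since it splits on either comma style.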
FILE:scripts/prompt_builder.py
#!/usr/bin/env python3
"""
AI Video Prompt Builder Script - multi-segment stitching edition
Helper script for building AI video prompts; supports long videos (30 s / 60 s)

Usage:
    python prompt_builder.py --interactive
    python prompt_builder.py --duration 30 --segments 6 --lang zh
"""
import argparse
import json
from typing import Dict, List, Optional, Tuple


def build_keyframe_prompt(segment_num: int, total_segments: int,
                          subject: str, appearance: str, clothing: str,
                          expression: str, pose: str, prop: str,
                          environment: str, lighting: str) -> str:
    """
    Build the prompt for one keyframe (the prompt text itself stays in Chinese).
    """
    time_point = segment_num * 5  # seconds; not included in the returned prompt
    # Label the frame as first, intermediate, or last (also not included in the prompt)
    if segment_num == 0:
        frame_type = "首帧"  # first frame
    elif segment_num == total_segments:
        frame_type = "尾帧"  # last frame
    else:
        frame_type = f"关键帧{chr(65 + segment_num)}"  # keyframe B, C, D...
    return (
        f"中景固定镜头,{subject},{appearance},"
        f"穿{clothing},{expression},{pose},"
        f"{prop},在{environment},"
        f"{lighting},8K"
    )


def build_segment_video_prompt(segment_num: int, start_desc: str,
                               end_desc: str, transition: str) -> str:
    """
    Build the 5-second video prompt for one segment.
    """
    start_time = segment_num * 5  # informational only
    end_time = start_time + 5
    return (
        f"5秒视频:{start_desc}(第1-2秒)。"
        f"{transition}(第3-4秒)。"
        f"{end_desc}(第4-5秒)。"
        f"镜头跟随移动,自然光,真实运动,平滑过渡"
    )
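

def check_consistency(prev_frame: str, next_frame: str) -> List[str]:
    """
    Check consistency between two adjacent keyframes.
    """
    issues = []
    # Subject continuity: later keyframes must mark the subject as "同个"/"同一个" ("the same")
    if "同个" not in next_frame and "同一个" not in next_frame:
        issues.append("Missing the '同个' (same subject) marker; the subject may be inconsistent")
    return issues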


def generate_negative_prompt() -> str:
    """
    Build the negative-constraint prompt.
    """
    return (
        "无变形,无扭曲,无多余肢体,面部稳定,角色一致,"
        "无闪烁,无突然变色,运动平滑,真实表情,"
        "符合物理规律,外貌一致,服装一致,道具逻辑连贯"
    )
def interactive_builder_long_video():
"""
交互式长视频Prompt构建
"""
print("=" * 70)
print("AI视频Prompt构建助手 - 多段视频拼接版")
print("=" * 70)
print("\n本工具支持生成长视频(30秒/60秒/更长)")
print("通过多段5秒视频拼接实现")
print("=" * 70)
# 基本信息
duration = int(input("\n1. 目标总时长(秒,建议30或60): ") or "30")
num_segments = duration // 5
num_keyframes = num_segments + 1
print(f"\n将生成:{num_segments}段 × 5秒 = {duration}秒")
print(f"需要:{num_keyframes}张关键帧图片")
    # Subject info
    print("\n【主体信息】")
    subject = input("2. 主体身份 (如: 可爱的周岁宝宝): ")
    appearance = input("3. 外貌细节 (如: 圆脸,粉嫩脸颊,黑发稀疏,大眼睛): ")
    clothing = input("4. 服装 (如: 白色连体衣): ")
    # Collect start/end expression, prop state and transition for each segment
    segments = []
    print(f"\n【为每段收集信息,共{num_segments}段】")
    for i in range(num_segments):
        print(f"\n--- 第{i+1}段:{i*5}-{(i+1)*5}秒 ---")
        expression_start = input(" 开始表情: ")
        expression_end = input(" 结束表情: ")
        prop_state = input(" 道具状态: ")
        transition = input(" 变化过程: ")
        segments.append({
            "start_time": i * 5,
            "end_time": (i + 1) * 5,
            "expression_start": expression_start,
            "expression_end": expression_end,
            "prop_state": prop_state,
            "transition": transition
        })
    # Generate keyframe prompts: keyframe i is the last frame of segment i-1 and
    # the first frame of segment i. Environment and lighting are hard-coded to
    # the hospital scenario.
    print("\n" + "=" * 70)
    print("生成的关键帧Prompt")
    print("=" * 70)
    keyframes = []
    for i in range(num_keyframes):
        if i == 0:
            # First frame
            frame = build_keyframe_prompt(
                i, num_segments, subject, appearance, clothing,
                segments[i]["expression_start"], "初始姿势", segments[i]["prop_state"],
                "医院环境", "自然光"
            )
        elif i == num_segments:
            # Last frame; also marked "同个" so the final transition passes
            # check_consistency like the intermediate ones
            frame = build_keyframe_prompt(
                i, num_segments, f"同个{subject}", appearance, clothing,
                segments[i-1]["expression_end"], "结束姿势", segments[i-1]["prop_state"],
                "医院环境", "柔和光"
            )
        else:
            # Intermediate (linking) frame
            frame = build_keyframe_prompt(
                i, num_segments, f"同个{subject}", appearance, clothing,
                segments[i-1]["expression_end"], "过渡姿势", segments[i-1]["prop_state"],
                "医院环境", "自然光"
            )
        keyframes.append(frame)
        frame_label = chr(65 + i)
        print(f"\n【关键帧{frame_label}({i*5}秒)】")
        print(frame)
    # Generate one video prompt per segment; segment i runs from keyframe i to i+1
    print("\n" + "=" * 70)
    print("生成的视频Prompt")
    print("=" * 70)
    for i, seg in enumerate(segments):
        video_prompt = build_segment_video_prompt(
            i,
            f"表情{seg['expression_start']}",
            f"表情{seg['expression_end']}",
            seg['transition']
        )
        start_label = chr(65 + i)
        end_label = chr(65 + i + 1)
        print(f"\n【第{i+1}段:{seg['start_time']}-{seg['end_time']}秒】")
        print(f"首帧:{start_label} | 尾帧:{end_label}")
        print(video_prompt)
    # Consistency check on every pair of adjacent keyframes
    print("\n" + "=" * 70)
    print("衔接一致性检查")
    print("=" * 70)
    all_passed = True
    for i in range(len(keyframes) - 1):
        issues = check_consistency(keyframes[i], keyframes[i+1])
        start_label = chr(65 + i)
        end_label = chr(65 + i + 1)
        if issues:
            print(f"\n⚠️ {start_label} → {end_label}:")
            for issue in issues:
                print(f" - {issue}")
            all_passed = False
        else:
            print(f"\n✅ {start_label} → {end_label}: 检查通过")
    if all_passed:
        print("\n✅ 所有衔接点检查通过!")
    else:
        print("\n⚠️ 请修正上述问题后再生成")
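    # Negative constraints
    print("\n【负面约束】")
    print(generate_negative_prompt())
    # Next steps on the Jimeng (即梦) platform: generate the keyframe images,
    # then the 5-second clips, then stitch them in 剪映 (Jianying) or Premiere Pro
    print("\n" + "=" * 70)
    print("在即梦平台的操作步骤")
    print("=" * 70)
    print(f"\n1. 生成{num_keyframes}张关键帧图片(按A→B→C...顺序)")
    print(f"2. 生成{num_segments}段5秒视频")
    print("3. 使用剪映/PR拼接成完整视频")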
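# Sketch (hypothetical helper): the stitching arithmetic behind
# interactive_builder_long_video, without the prompts. A video of D seconds is
# cut into N = D // 5 segments of 5 s each, which need N + 1 keyframes because
# each inner keyframe is both the last frame of one segment and the first frame
# of the next. This version assumes durations are rounded down to a multiple of
# 5 with a floor of one segment; letter labels only cover up to 26 keyframes.
def plan_long_video(duration: int, seg_len: int = 5) -> dict:
    num_segments = max(1, duration // seg_len)
    return {
        "duration": num_segments * seg_len,  # effective length after rounding
        "num_segments": num_segments,
        "num_keyframes": num_segments + 1,
        "labels": [chr(65 + i) for i in range(num_segments + 1)],
    }

# Usage: plan_long_video(30) gives 6 segments and 7 keyframes labelled A to G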
def quick_build_30s_example():
    """
    Quickly generate the complete prompt plan for the 30-second
    "baby gets a vaccination" example.
    """
    print("=" * 70)
    print("30秒宝宝打疫苗 - 完整Prompt方案")
    print("=" * 70)
    result = {
        # "关键帧": keyframe prompts A-G, one every 5 seconds
        "关键帧": {
            "A(0s)": "中景固定镜头,可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,明亮好奇的大眼睛,穿白色连体衣,被妈妈抱着走进医院大门,表情好奇东张西望,手里无玩具,医院明亮大厅,自然光,温馨,8K",
            "B(5s)": "中景固定镜头,同个可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,明亮好奇的大眼睛,穿白色连体衣,被妈妈抱着坐在候诊区椅子上,表情好奇观察周围环境,手里抓着彩色玩具(玩具在手中),医院候诊区,其他小朋友在远处,自然光,8K",
            "C(10s)": "中景固定镜头,同个可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,明亮好奇的大眼睛,穿白色连体衣,坐在妈妈腿上,表情紧张眉头微皱,看向诊室方向,手里玩具掉落(玩具不在手中,掉在腿上),医院候诊区,自然光,8K",
            "D(15s)": "特写固定镜头,同个可爱的周岁宝宝,圆脸,粉嫩脸颊,稀疏柔软的黑发,明亮的大眼睛,穿白色连体衣,表情害怕眼睛睁大嘴巴微张,看到护士拿着针管,身体向后缩,手里无玩具(玩具在妈妈包里),诊室环境,clinical light,8K",
            "E(20s)": "特写固定镜头,同个可爱的周岁宝宝,圆脸变红,黑发微乱,明亮的大眼睛紧闭流泪,穿白色连体衣,表情痛苦嘴巴大张哭泣,眼泪流下脸颊,小拳头紧握挥舞,手里无玩具,诊室环境,clinical light,8K",
            "F(25s)": "中景固定镜头,同个可爱的周岁宝宝,圆脸仍然泛红,黑发微乱,明亮的大眼睛流泪,穿白色连体衣,坐在妈妈腿上,表情痛苦但开始减弱,眼泪减少,小拳头松开,妈妈手轻拍背部安抚,手里无玩具,诊室环境,clinical light转柔和光,8K",
            "G(30s)": "中景固定镜头,同个可爱的周岁宝宝,圆脸恢复粉嫩,黑发整齐,明亮好奇的大眼睛,穿白色连体衣,靠在妈妈怀里,表情委屈但平静,小手放松,手里无玩具,妈妈温柔安抚,柔和光,8K"
        },
        # "视频段": one prompt per 5-second segment with its first/last keyframe
        "视频段": {
            "第1段(0-5s)": {
                "首帧": "A", "尾帧": "B",
                "prompt": "5秒视频:宝宝被妈妈抱着走进医院,好奇地东张西望,转头看医院环境,手指向彩色装饰,妈妈给宝宝玩具,宝宝坐在妈妈腿上玩玩具,镜头跟随移动,自然光,真实运动,平滑过渡"
            },
            "第2段(5-10s)": {
                "首帧": "B", "尾帧": "C",
                "prompt": "5秒视频:宝宝玩着玩具,突然听到叫号声表情变化,转头看向诊室方向,眉头微皱开始紧张,小手抓紧妈妈衣服,玩具从手中掉落,镜头缓慢推进到宝宝脸部,自然光,真实情绪变化,平滑过渡"
            },
            "第3段(10-15s)": {
                "首帧": "C", "尾帧":