zihao chu

@clawhub-sir1st-inc-bbc3d5b6f5
Wuli Skill
---
name: wuli
description: Generate AI images and videos with 17+ active models via Wuli.art open platform API. Use when creating images from text prompts, editing images with one or more references, generating videos from text, animating a first frame, generating first-last-frame videos, or creating auto-video tasks from images and videos. Covers text-to-image, image-to-image, text-to-video, first-frame video, first-last-frame video, auto-video, prompt optimization, uploads, and no-watermark downloads.
version: 1.0.6
author: sir1st
homepage: https://wuli.art
repository: https://github.com/alibaba-wuli/wuli-skill
requires:
  env:
    - WULI_API_TOKEN
primaryEnv: WULI_API_TOKEN
metadata: {"clawdbot":{"emoji":"🎨","primaryEnv":"WULI_API_TOKEN","requires":{"anyBins":["python3"],"env":["WULI_API_TOKEN"]},"os":["linux","darwin","win32"]}}
tags:
  - ai
  - image-generation
  - video-generation
  - text-to-image
  - text-to-video
  - image-to-video
  - image-editing
  - art
  - creative
  - wuli
triggers:
  - generate image
  - generate video
  - text to image
  - text to video
  - image to video
  - AI art
  - edit image
  - create artwork
  - animate image
  - wuli
---

# Wuli Platform — AI Image & Video Generation

Generate AI images and videos via the [Wuli.art](https://wuli.art) open platform API. Supports text-to-image, multi-reference image editing, text-to-video, first-frame image-to-video, first-last-frame video, auto-video, and sound control on supported video models across 17+ active models including Qwen Image, Seedream, 通义万相, 可灵, Seedance, and MiniMax Hailuo, with automatic uploads and no-watermark downloads when available.

## When to Use

- User wants to generate images from a text description
- User wants to edit or transform one or more existing images with AI
- User wants to create video from a text prompt
- User wants to animate a static image into a video
- User wants to generate a video from a first frame and last frame
- User wants to generate a video from reference images and/or a reference video
- User needs high-resolution (up to 4K) AI artwork
- User wants batch image generation
- User needs to choose between multiple AI models for best results

## Setup

```bash
export WULI_API_TOKEN="wuli-your-token-here"
```

Get your token: log in to [wuli.art](https://wuli.art), click the **API entry** at the bottom-left corner.

No additional dependencies — uses only Python 3 standard library.

## Quick Start

```bash
# Generate an image (simplest usage)
python3 skill.py --action image-gen --prompt "a serene mountain lake at sunrise"

# Generate a video
python3 skill.py --action txt2video --prompt "waves crashing on a golden beach at sunset"

# First-last-frame video
python3 skill.py --action flf2video --prompt "camera pushes through the scene"   --image_path ./start.jpg --end_image_path ./end.jpg

# Auto-video from a reference video
python3 skill.py --action auto-video --prompt "preserve the motion, make it cinematic"   --video_path ./input.mp4 --model "通义万相 2.6"
```

## Complete Command Reference

```bash
python3 skill.py --action <action> --prompt "description" [options]
```

### Actions

| Action | Description | Reference Inputs |
|---|---|---|
| `image-gen` | Text to image | None |
| `image-edit` | Edit or transform one or more reference images | `--image_url` or `--image_path` |
| `txt2video` | Text to video | None |
| `image2video` | Animate one image into a video | Exactly one `--image_*` |
| `flf2video` | First-last-frame video generation | Start frame via `--image_*`, end frame via `--end_image_*` |
| `auto-video` | Auto-video from reference images and/or videos | `--image_*`, `--video_*`, or both |

### Parameters

| Parameter | Default | Description |
|---|---|---|
| `--prompt` | *(required)* | Generation prompt, max 2000 chars |
| `--model` | auto-selected | Model name (see Model Selection Guide) |
| `--aspect_ratio` | 1:1 (image) / 16:9 (video) | Aspect ratio |
| `--resolution` | 2K (image) / 720P (video) | Output resolution. Model-dependent values include `2K`, `3K`, `4K`, `480P`, `720P`, `768P`, `1080P` |
| `--n` | 1 | Number of images to generate (1-4, image-gen only) |
| `--image_url` | — | Reference image URL(s). Supports comma-separated multiple URLs |
| `--image_path` | — | Local reference image path(s). Supports comma-separated multiple files |
| `--end_image_url` | — | End-frame image URL for `flf2video` |
| `--end_image_path` | — | End-frame local image path for `flf2video` |
| `--video_url` | — | Reference video URL(s) for `auto-video`, comma-separated |
| `--video_path` | — | Local reference video path(s) for `auto-video`, comma-separated |
| `--duration` | 5 | Video length in seconds |
| `--negative_prompt` | — | Exclude unwanted elements |
| `--sound` | backend default | Enable sound for supported video models. If omitted, backend default behavior is used |
| `--no-sound` | — | Disable sound for supported video models |
| `--optimize` | true | Prompt optimization is enabled by default and recommended for most prompts |
| `--no-optimize` | — | Disable prompt optimization when you need fully raw prompt behavior |

### Input Rules

- `image-edit` accepts one or more reference images
- `image2video` requires exactly one reference image
- `flf2video` requires exactly two images: start frame + end frame
- `auto-video` accepts reference images, reference videos, or both
- `--sound` / `--no-sound` only affect video tasks, and only take effect on models that support sound control
- Local files and remote URLs are auto-uploaded to Wuli OSS before submission
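
The input rules above can be checked locally before anything is uploaded. The sketch below is a hypothetical helper, not the code in `skill.py`: the name `validate_inputs` and its argument shapes are assumptions, and the `image-edit` message is invented for illustration, while the other messages mirror the errors listed under Troubleshooting.

```python
def validate_inputs(action, images=(), end_images=(), videos=()):
    """Return an error message for an invalid input combination, or None if valid.

    Hypothetical helper: `images` holds the --image_* values, `end_images` the
    --end_image_* values, and `videos` the --video_* values.
    """
    images, end_images, videos = list(images), list(end_images), list(videos)
    if action == "image-edit" and not images:
        # Assumed wording; the source only says image-edit accepts one or more images.
        return "image-edit requires at least one reference image"
    if action == "image2video" and len(images) != 1:
        return "image2video requires exactly one reference image"
    if action == "flf2video" and len(images) + len(end_images) != 2:
        return "flf2video requires exactly two reference images"
    if action == "auto-video" and not (images or videos):
        return "auto-video requires at least one reference image or video"
    return None
```

Returning a message instead of raising keeps the check easy to reuse from a CLI, where the caller decides how to report the problem.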

### Aspect Ratios

| Ratio | Use Case |
|---|---|
| `1:1` | Square — social media posts, avatars, icons |
| `16:9` | Widescreen — videos, desktop wallpapers, presentations |
| `9:16` | Vertical — phone wallpapers, stories, reels |
| `4:3` | Classic — photos, prints |
| `3:2` | Photography — DSLR-style landscape shots |
| `21:9` | Ultra-wide — cinematic banners (image only on supported models) |

## Model Selection Guide

### Image Models — Which to Choose

| Model | Best For | Resolution | Ref Images | Cost |
|---|---|---|---|---|
| 通义万相 2.7 | Newest Wan image model, up to 9 ref images | 2K, 4K | 9 | 1 credit (2K) / 3 credits (4K) |
| **Qwen Image 2.0** *(default)* | General purpose, fast, versatile | 2K, 4K | 4 | 1 credit |
| Qwen Image Turbo | Quick drafts, iterations | 2K, 4K | 4 | 1 credit |
| Seedream 5.0 Lite | Higher detail at lower cost than premium Seedream | 2K, 3K | 8 | 4 credits |
| Seedream 4.5 | Photorealism, high-fidelity detail | 2K, 4K | 8 | 4 credits |
| Seedream 4.0 | Photorealism with broad API-side support | 2K, 4K | 8 | 4 credits |

**Recommendations:**
- **Fastest & cheapest**: Qwen Image Turbo
- **Best all-rounder**: Qwen Image 2.0
- **Best detail at mid tier**: Seedream 5.0 Lite
- **Best quality for photos**: Seedream 4.5
- **Most ref images (up to 9)**: 通义万相 2.7
- **Need 4K**: 通义万相 2.7, Qwen Image 2.0, Qwen Image Turbo, Seedream 4.5, or Seedream 4.0
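
To compare models by cost before a batch run, the table above can be turned into a small estimate. This is a sketch only: the lookup is copied from the table and may drift from current platform pricing, the name `estimate_image_credits` is hypothetical, and it assumes that a flat cost such as "1 credit" applies per image at every listed resolution.

```python
# Per-image credit costs copied from the image model table; treat as a snapshot.
CREDITS_PER_IMAGE = {
    "通义万相 2.7": {"2K": 1, "4K": 3},
    "Qwen Image 2.0": {"2K": 1, "4K": 1},
    "Qwen Image Turbo": {"2K": 1, "4K": 1},
    "Seedream 5.0 Lite": {"2K": 4, "3K": 4},
    "Seedream 4.5": {"2K": 4, "4K": 4},
    "Seedream 4.0": {"2K": 4, "4K": 4},
}


def estimate_image_credits(model, resolution="2K", n=1):
    """Estimate total credits for generating n images (assumes cost scales with n)."""
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    try:
        per_image = CREDITS_PER_IMAGE[model][resolution]
    except KeyError:
        raise ValueError(f"unknown model/resolution: {model} at {resolution}") from None
    return per_image * n
```

For example, four 4K images from Seedream 4.5 would be estimated at 16 credits under these assumptions.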

### Video Models — Which to Choose

| Model | Best For | Resolution | Duration | Key Modes |
|---|---|---|---|---|
| **通义万相 2.2 Turbo** *(default)* | Quick videos, low cost | 720P | 5s | TXT, FF |
| 通义万相 2.7 | Newest Wan video model with stronger narrative | 720P-1080P | 5-15s (AUTO 5-10) | TXT, FF, FLF, AUTO |
| 通义万相 2.6 Flash | Fast image-to-video | 720P-1080P | 5-15s | FF |
| 通义万相 2.6 | Best all-rounder with auto-video | 720P-1080P | 5-15s | TXT, FF, AUTO |
| Happy Horse 1.0 | Long-duration TXT/FF/AUTO video, premium quality | 720P-1080P | 5-15s | TXT, FF, AUTO |
| 可灵 3.0 Omni | Richest multi-input video workflow | 720P-1080P | 5-15s | TXT, FF, FLF, AUTO |
| 可灵 O1 | Premium omni video quality | 720P-1080P | 5-10s | TXT, FF, FLF, AUTO |
| 可灵 3.0 | High-quality first-last-frame video | 720P-1080P | 5-15s | TXT, FF, FLF |
| 可灵 2.6 | 1080P-focused Kling generation | 1080P | 5-10s | TXT, FF, FLF |
| 可灵 2.5 Turbo | Lower-cost Kling first-last-frame workflows | 1080P | 5-10s | TXT, FF, FLF |
| Seedance 1.5 Pro | Motion-heavy videos, up to 12s | 480P-720P | 5-12s | TXT, FF, FLF |
| Seedance 1.0 Pro | Broad resolution coverage | 480P-1080P | 5-10s | TXT, FF, FLF |
| MiniMax Hailuo 2.3 | Cinematic text-to-video and first-frame video generation | 768P-1080P | 6-10s | TXT, FF |
| MiniMax Hailuo 2.3 Fast | Faster image-to-video | 768P-1080P | 6-10s | FF |

**Recommendations:**
- **Fastest & cheapest**: 通义万相 2.2 Turbo
- **Best all-rounder**: 通义万相 2.7 or 通义万相 2.6
- **Best multi-input / auto-video**: 可灵 3.0 Omni or 通义万相 2.7
- **Best premium quality**: Happy Horse 1.0, 可灵 O1, or 可灵 3.0
- **Best for first-last-frame**: 通义万相 2.7, 可灵 3.0, 可灵 2.6, or Seedance 1.5 Pro
- **Best for 1080P-only workflows**: 可灵 2.6 or 可灵 2.5 Turbo

## Examples

### Text to Image

```bash
# Simple generation
python3 skill.py --action image-gen --prompt "anime girl with blue hair in a garden"

# Batch generate 4 images to pick the best
python3 skill.py --action image-gen --prompt "cyberpunk cityscape at night" --n 4

# Photorealistic with premium model
python3 skill.py --action image-gen --prompt "photorealistic mountain landscape, golden hour"   --model "Seedream 4.5" --resolution 4K --aspect_ratio 16:9

# Higher-detail model
python3 skill.py --action image-gen --prompt "luxury product photography, studio lighting"   --model "Seedream 5.0 Lite" --resolution 3K

# Prompt optimization is enabled by default
python3 skill.py --action image-gen --prompt "a cat"

# Disable optimization only when you need exact raw prompting
python3 skill.py --action image-gen --prompt "a cat" --no-optimize
```

### Image Editing

```bash
# Edit a local image
python3 skill.py --action image-edit --prompt "add sunglasses and a hat"   --image_path ./photo.jpg

# Edit with multiple reference images
python3 skill.py --action image-edit --prompt "merge these references into one unified brand illustration"   --image_path "./ref1.jpg,./ref2.jpg"

# Edit a remote image
python3 skill.py --action image-edit --prompt "change background to sunset beach"   --image_url "https://example.com/photo.jpg"

# Style transfer
python3 skill.py --action image-edit --prompt "convert to oil painting style"   --image_path ./landscape.jpg --model "Seedream 4.5"
```

### Text to Video

```bash
# Simple video
python3 skill.py --action txt2video --prompt "waves crashing on a golden beach at sunset"

# Longer duration with higher quality
python3 skill.py --action txt2video --prompt "a cat playing piano in a jazz club"   --model "通义万相 2.6" --duration 10 --resolution 1080P

# Explicitly enable sound on a supported model
python3 skill.py --action txt2video --prompt "a singer performing on a neon stage"   --model "通义万相 2.6" --duration 10 --sound

# Cinematic quality
python3 skill.py --action txt2video --prompt "slow-motion rain drops on a window"   --model "可灵 O1" --resolution 1080P --aspect_ratio 16:9

# Disable sound when you want silent output or a lower-credit tier on some models
python3 skill.py --action txt2video --prompt "slow aerial shot over a mountain lake"   --model "可灵 3.0 Omni" --duration 10 --no-sound
```

### Image to Video (Animate)

```bash
# Animate a landscape photo
python3 skill.py --action image2video --prompt "slow zoom in with gentle wind blowing"   --image_path ./landscape.jpg --duration 5

# Animate from URL with premium model
python3 skill.py --action image2video --prompt "character turns head and smiles"   --image_url "https://example.com/portrait.jpg" --model "可灵 O1"
```

### First-Last-Frame Video

```bash
# Use start and end frame to control motion
python3 skill.py --action flf2video --prompt "a smooth cinematic transition between these two shots"   --image_path ./start.jpg --end_image_path ./end.jpg --model "可灵 3.0"

# Mix local and remote references
python3 skill.py --action flf2video --prompt "the flower blooms naturally from frame one to frame two"   --image_path ./flower_closed.jpg   --end_image_url "https://example.com/flower_open.jpg"
```

### Auto-Video

```bash
# Auto-video from a reference video
python3 skill.py --action auto-video --prompt "keep the action, upgrade it to a cinematic sci-fi look"   --video_path ./input.mp4 --model "通义万相 2.6" --duration 10 --sound

# Auto-video from multiple reference images
python3 skill.py --action auto-video --prompt "turn these keyframes into a coherent product reveal video"   --image_path "./frame1.jpg,./frame2.jpg,./frame3.jpg"   --model "可灵 3.0 Omni" --resolution 1080P

# Auto-video with both images and video
python3 skill.py --action auto-video --prompt "preserve the motion, but restyle everything as an anime sequence"   --image_url "https://example.com/style_ref.jpg"   --video_url "https://example.com/source.mp4"   --model "可灵 O1"
```

## Workflow

```
1. (Optional) Upload reference images/videos
   --image_path ./photo.jpg             → auto-uploaded to cloud storage
   --image_url https://...              → auto-downloaded and re-uploaded
   --video_path ./clip.mp4              → auto-uploaded to cloud storage
   --image_path "./a.jpg,./b.jpg"       → uploads multiple image refs

2. Submit generation task
   → Returns recordId for tracking

3. Auto-poll for completion
   → Images: polls every 5s (up to 5 min)
   → Videos: polls every 10s (up to 20 min)

4. Auto-download results
   → Fetches no-watermark version when available
   → Saves to current directory
   → Auto-opens on macOS/Linux/Windows
```
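
Step 3 of the workflow can be sketched as a generic polling loop. This is an illustration under stated assumptions rather than the logic in `skill.py`: `poll_until_done` and `fetch_status` are hypothetical names, only `SUCCEED` and `REVIEW_FAILED` are status values that appear in this document, and the `"TIMEOUT"` return value here is produced client-side when the time budget runs out.

```python
import time


def poll_until_done(fetch_status, interval_s, timeout_s,
                    terminal=("SUCCEED", "REVIEW_FAILED"), sleep=time.sleep):
    """Call fetch_status() every interval_s seconds until a terminal status or timeout.

    fetch_status is any zero-argument callable returning the current status string,
    for example a wrapper around the task query endpoint.
    """
    waited = 0
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout_s:
            return "TIMEOUT"  # client-side budget exhausted
        sleep(interval_s)
        waited += interval_s
```

With the intervals listed above, an image task would use `interval_s=5, timeout_s=300` and a video task `interval_s=10, timeout_s=1200`. Passing `sleep` as a parameter keeps the loop testable without real waiting.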

## Troubleshooting

- **"WULI_API_TOKEN not set"**: Run `export WULI_API_TOKEN="wuli-your-token"`. Get your token from [wuli.art](https://wuli.art) bottom-left corner → API. The token is sent as `Authorization: Bearer <token>`.
- **"HTTP 401"**: Token is invalid or expired. Regenerate it on the wuli.art platform.
- **"HTTP 429"**: Rate limited. Wait a few seconds and retry.
- **"Error code 2001"**: Insufficient credits. Top up at [wuli.art](https://wuli.art).
- **"REVIEW_FAILED"**: Content moderation rejected the prompt. Rephrase to avoid sensitive content.
- **"TIMEOUT"**: Generation took too long. Try a faster model or shorter duration.
- **"image2video requires exactly one reference image"**: Use only one `--image_*` input for `image2video`.
- **"flf2video requires exactly two reference images"**: Pass a start frame with `--image_*` and an end frame with `--end_image_*`, or provide exactly two image references total.
- **"auto-video requires at least one reference image or video"**: Provide `--image_*`, `--video_*`, or both.
- **Media upload fails**: Supported image formats are `jpg`, `jpeg`, `png`, `webp`. Supported video formats are `mp4`, `mov`, `avi`, `webm`.
- **Sound flag seems ignored**: `--sound` / `--no-sound` only affect video models that expose sound control. Some models or modes may keep backend default behavior.
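
The "wait a few seconds and retry" advice for HTTP 429 can be sketched as a small wrapper with exponential backoff. The function name `with_retry`, the attempt count, and the delay schedule are assumptions chosen for illustration; the document does not specify how `skill.py` retries, if at all.

```python
import time
import urllib.error


def with_retry(call, attempts=3, base_delay_s=2.0, sleep=time.sleep):
    """Run call(); on HTTP 429 wait base_delay_s * 2**attempt and try again.

    Any other HTTP error (for example 401) is re-raised immediately, as is a
    429 on the final attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))
```

Here `call` would typically be a zero-argument function that performs one `urllib.request.urlopen` request and returns the parsed response.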

## Tips

- Prompt optimization is enabled by default and is recommended for most prompts, especially short or vague ones.
- Use `--no-optimize` only when you need the model to follow your raw prompt as directly as possible.
- Start with the default models (Qwen Image 2.0 / 通义万相 2.2 Turbo) when you want the cheapest and simplest path.
- Generate multiple images with `--n 4` and pick the best.
- Use `--negative_prompt` to exclude unwanted elements, e.g. `--negative_prompt "blurry, low quality, watermark"`.
- For video, start with 5-second duration to preview, then re-generate at longer duration once you're happy with the style.
- If a video model supports sound control, `--no-sound` can be useful for silent exports and may reduce credits on some models.
- For `flf2video`, keep the start and end frames visually consistent for smoother motion.
- For `auto-video`, start with one video or 2-3 images before scaling up to more references.
- All results are auto-downloaded without watermarks when available.

For complete API documentation including current model tables and upload flow, see [references/【呜哩Wuli】开放平台 API 文档.md](references/%E3%80%90%E5%91%9C%E5%93%A9Wuli%E3%80%91%E5%BC%80%E6%94%BE%E5%B9%B3%E5%8F%B0%20API%20%E6%96%87%E6%A1%A3.md).

FILE:README.md
# wuli-skill

Claude Code Skill for the Wuli (呜哩) open platform.

## Documentation

- [Wuli (呜哩) Open Platform API Documentation](./references/【呜哩Wuli】开放平台%20API%20文档.md)
- [SKILL.md](./SKILL.md): skill usage guide

FILE:manifest.json
{
  "name": "wuli",
  "version": "1.0.7",
  "description": "Generate AI images and videos with 17+ active models via Wuli.art open platform API. Supports text-to-image, multi-reference image editing, text-to-video, first-frame image-to-video, first-last-frame video, auto-video from images and videos, prompt optimization, sound control for supported video models, uploads, and no-watermark downloads.",
  "author": "sir1st",
  "license": "MIT-0",
  "homepage": "https://wuli.art",
  "repository": "https://github.com/alibaba-wuli/wuli-skill",
  "main": "skill.py",
  "tags": [
    "ai",
    "image-generation",
    "video-generation",
    "text-to-image",
    "text-to-video",
    "image-to-video",
    "image-editing",
    "art",
    "creative",
    "wuli"
  ],
  "env": {
    "WULI_API_TOKEN": {
      "description": "Wuli.art API token for authentication. Get it from https://wuli.art (bottom-left corner → API).",
      "required": true,
      "secret": true
    }
  },
  "metadata": {
    "clawdbot": {
      "emoji": "🎨",
      "requires": {
        "anyBins": ["python3"]
      },
      "os": ["linux", "darwin", "win32"],
      "privacy": "This skill uploads local files and remote images/videos to Wuli.art OSS storage for processing. Do not use with sensitive or private files. Uploaded content is transmitted to https://platform.wuli.art and the pre-signed OSS upload URL returned by that API."
    }
  },
  "arguments": {
    "action": {
      "description": "Action type: image-gen, image-edit, txt2video, image2video, flf2video, auto-video",
      "required": true
    },
    "prompt": {
      "description": "Text prompt for generation (max 2000 chars)",
      "required": true
    },
    "model": {
      "description": "Model name (default: Qwen Image 2.0 for images, 通义万相 2.2 Turbo for videos)",
      "default": ""
    },
    "aspect_ratio": {
      "description": "Aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 21:9, 9:21)",
      "default": "1:1"
    },
    "resolution": {
      "description": "Resolution (images: 2K, 3K, 4K depending on model; videos: 480P, 720P, 768P, 1080P depending on model)",
      "default": "2K"
    },
    "n": {
      "description": "Number of images to generate (1-4, image only)",
      "default": "1"
    },
    "image_url": {
      "description": "Reference image URL(s). Supports comma-separated multiple values for image-edit and auto-video; also used by image2video and flf2video"
    },
    "image_path": {
      "description": "Local image path(s), auto-uploaded. Supports comma-separated multiple values for image-edit and auto-video; also used by image2video and flf2video"
    },
    "end_image_url": {
      "description": "End-frame image URL for flf2video"
    },
    "end_image_path": {
      "description": "End-frame local image path for flf2video"
    },
    "video_url": {
      "description": "Reference video URL(s) for auto-video, comma-separated"
    },
    "video_path": {
      "description": "Local reference video path(s) for auto-video, comma-separated"
    },
    "duration": {
      "description": "Video duration in seconds (video only)",
      "default": "5"
    },
    "negative_prompt": {
      "description": "Negative prompt to exclude unwanted elements"
    },
    "sound": {
      "description": "Enable sound for supported video models. If omitted, backend default behavior is used (currently true).",
      "default": ""
    },
    "no_sound": {
      "description": "Disable sound for video output when the model supports this parameter.",
      "default": "false"
    },
    "optimize": {
      "description": "Enable prompt optimization. Recommended for most prompts and enabled by default.",
      "default": "true"
    },
    "no_optimize": {
      "description": "Disable prompt optimization when you need the raw prompt sent as-is.",
      "default": "false"
    }
  }
}

FILE:references/【呜哩Wuli】开放平台 API 文档.md
# Wuli (呜哩) Open Platform API Documentation

## Published Skill (refer to the code in the skill for API calls)

[https://github.com/alibaba-wuli/wuli-skill](https://github.com/alibaba-wuli/wuli-skill#)

[https://clawhub.ai/sir1st-inc/wuli](https://clawhub.ai/sir1st-inc/wuli)

[Wuli (呜哩) Official Skill User Guide](https://alidocs.dingtalk.com/i/nodes/dQPGYqjpJYZnRbNYCoYejQOP8akx1Z5N?utm_scene=team_space)

> Markdown version: [references/【呜哩Wuli】开放平台 API 文档.md](https://github.com/alibaba-wuli/wuli-skill/blob/main/references/%E3%80%90%E5%91%9C%E5%93%A9Wuli%E3%80%91%E5%BC%80%E6%94%BE%E5%B9%B3%E5%8F%B0%20API%20%E6%96%87%E6%A1%A3.md)

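## Overview

The Wuli open platform exposes AI capabilities such as image and video generation through an API, with authentication via an API token.

*   **Service URL**: https://platform.wuli.art
*   **Authentication**: every platform endpoint authenticates through the `Authorization: Bearer <API Token>` request header
*   **Credit consumption**: API calls consume credits as normal and are marked with the prefix "API调用-" in the credit history. Note: free membership benefits apply only on the wuli.art web app and do not cover API calls
*   **Data isolation**: tasks submitted through the API do not appear in the web app's history or asset library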

---

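## Authentication

### Getting an API Token

Log in to [wuli.art](https://wuli.art), open the "API Open Platform" entry in the bottom-left corner, and view or reset your access token.

> After a reset, the old token becomes invalid immediately.

### Authentication Method

In every platform API request, pass the token in a request header: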

```plaintext
Authorization: Bearer wuli-a1b2c3d4e5f6...
```
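
A minimal sketch of building this header in Python from the `WULI_API_TOKEN` environment variable used by the skill. The helper name `auth_headers` is hypothetical, and the error text simply mirrors the "WULI_API_TOKEN not set" message from Troubleshooting.

```python
import os


def auth_headers(env=None):
    """Return the Authorization header dict, reading the token from env."""
    env = os.environ if env is None else env
    token = env.get("WULI_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("WULI_API_TOKEN not set")
    return {"Authorization": f"Bearer {token}"}
```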
---

## Common Response Format

All endpoint responses follow this format:

```plaintext
{
  "success": true,
  "code": 200,
  "msg": "成功",
  "data": { ... },
  "requestId": "xxx"
}
```

| Field | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the request succeeded |
| code | int | Status code; 200 means success |
| msg | string | Error message |
| data | object | Response data |
| requestId | string | Request tracing ID |
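
Because every endpoint shares this envelope, a client can unwrap it in one place. The sketch below assumes a failed call is signalled by `success` not being `true` or `code` not being 200; the names `WuliApiError` and `unwrap` are hypothetical and not part of the published skill.

```python
import json


class WuliApiError(Exception):
    """Raised when the response envelope reports a failure."""

    def __init__(self, code, msg, request_id):
        super().__init__(f"code={code} msg={msg} requestId={request_id}")
        self.code = code
        self.msg = msg
        self.request_id = request_id


def unwrap(body):
    """Parse a JSON response body and return its `data` field, or raise WuliApiError."""
    payload = json.loads(body)
    if payload.get("success") is not True or payload.get("code") != 200:
        raise WuliApiError(payload.get("code"), payload.get("msg"), payload.get("requestId"))
    return payload.get("data")
```

Keeping `requestId` on the exception makes it easier to quote the tracing ID when reporting a failed call.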

---

## Endpoints

### 1. Submit an Image/Video Generation Task

```plaintext
POST /api/v1/platform/predict/submit
```

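Submits an image or video generation task. Tasks run asynchronously; after submitting, poll the query endpoint for the result.

#### Request Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | `Bearer <API Token>` |
| Content-Type | Yes | application/json |

#### Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| modelName | string | Yes | Model name; see the [available models list](#%E5%8F%AF%E7%94%A8%E6%A8%A1%E5%9E%8B) below |
| prompt | string | Yes | Prompt, up to 2000 characters |
| mediaType | string | No | Media type: `IMAGE` or `VIDEO`; inferred from the model if omitted |
| predictType | string | No | Generation type; inferred automatically if omitted. See [generation types](#%E7%94%9F%E6%88%90%E7%B1%BB%E5%9E%8B) |
| aspectRatio | string | Yes | Aspect ratio, e.g. `1:1`, `16:9`, `9:16` |
| resolution | string | Yes | Resolution, e.g. `2K`, `4K` (images) or `720P`, `1080P` (videos) |
| n | int | No | Number of outputs, 1-4, default 1 |
| inputImageList | array | No | Reference image list, used for image-to-image, first-frame image-to-video, first-last-frame image-to-video, and auto-video mode |
| inputVideoList | array | No | Reference video list, used for auto-video mode |
| videoTotalSeconds | int | No | Video duration in seconds; only applies to video models, default 5 |
| sound | boolean | No | Whether to enable sound; only applies to video tasks. If omitted, the backend default applies (currently true). Some models or modes may ignore this field |
| negativePrompt | string | No | Negative prompt |
| seed | int | No | Random seed, default -1 (random) |
| optimizePrompt | boolean | No | Whether to optimize the prompt; default true. Recommended, especially for short or vague prompts |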

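**Format of each element in inputImageList / inputVideoList:**

| Parameter | Type | Description |
| --- | --- | --- |
| imageUrl | string | URL of the image or video (must be obtained through the upload endpoint). `inputVideoList` uses the same `imageUrl` field name |

> Keeping `optimizePrompt: true` is recommended for most cases; it can noticeably improve results for short, colloquial, or incomplete prompts.

> `sound` only takes effect on video models that support sound control. In some auto-video modes with a reference video, `sound` may mean "keep the original video's audio" rather than "generate new audio"; the actual model implementation governs.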
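
The parameter table above maps onto a request body fairly directly. The following sketch builds one in Python using only field names from the table; the function name `build_submit_payload` and its keyword arguments are hypothetical, and leaving unset optional fields out of the body is a design choice made here so the backend defaults apply.

```python
def build_submit_payload(model_name, prompt, aspect_ratio, resolution, *,
                         n=None, image_urls=(), video_urls=(),
                         video_total_seconds=None, sound=None,
                         negative_prompt=None, optimize_prompt=None):
    """Build the JSON body for POST /api/v1/platform/predict/submit."""
    if len(prompt) > 2000:
        raise ValueError("prompt exceeds 2000 characters")
    if n is not None and not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    payload = {
        "modelName": model_name,
        "prompt": prompt,
        "aspectRatio": aspect_ratio,
        "resolution": resolution,
    }
    optional = {
        "n": n,
        "videoTotalSeconds": video_total_seconds,
        "sound": sound,
        "negativePrompt": negative_prompt,
        "optimizePrompt": optimize_prompt,
    }
    # Omit unset fields so the backend applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    if image_urls:
        payload["inputImageList"] = [{"imageUrl": u} for u in image_urls]
    if video_urls:
        # Videos also use the "imageUrl" key, per the element format table.
        payload["inputVideoList"] = [{"imageUrl": u} for u in video_urls]
    return payload
```

Note that `sound=False` is kept in the body while `sound=None` is dropped, which preserves the difference between "disable sound" and "use the backend default".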

#### Request Examples

**Text to image:**

```plaintext
{
  "modelName": "Qwen Image Turbo",
  "prompt": "A cat in a spacesuit strolling on the moon, with the blue Earth in the background",
  "mediaType": "IMAGE",
  "aspectRatio": "1:1",
  "resolution": "2K",
  "n": 4,
  "optimizePrompt": true
}
```

**Image to image:**

```plaintext
{
  "modelName": "Qwen Image 2.0",
  "prompt": "Turn this photo into a watercolor painting",
  "mediaType": "IMAGE",
  "predictType": "REF_2_IMG",
  "aspectRatio": "16:9",
  "resolution": "2K",
  "n": 2,
  "inputImageList": [
    { "imageUrl": "https://your-uploaded-image-url.jpg" }
  ]
}
```

**Text to video:**

```plaintext
{
  "modelName": "通义万相 2.2 Turbo",
  "prompt": "Waves lapping a golden beach as the sun sets",
  "mediaType": "VIDEO",
  "aspectRatio": "16:9",
  "resolution": "720P",
  "videoTotalSeconds": 5,
  "sound": true
}
```

**Image to video (first frame):**

```plaintext
{
  "modelName": "通义万相 2.6 Flash",
  "prompt": "Make the flowers in the frame bloom slowly",
  "mediaType": "VIDEO",
  "predictType": "FF_2_VIDEO",
  "aspectRatio": "16:9",
  "resolution": "720P",
  "videoTotalSeconds": 5,
  "sound": true,
  "inputImageList": [
    { "imageUrl": "https://your-uploaded-image-url.jpg" }
  ]
}
```

**Image to video (first and last frame):**

```plaintext
{
  "modelName": "可灵 3.0",
  "prompt": "Move the camera smoothly from the first frame to the last frame",
  "mediaType": "VIDEO",
  "predictType": "FLF_2_VIDEO",
  "aspectRatio": "16:9",
  "resolution": "1080P",
  "videoTotalSeconds": 5,
  "inputImageList": [
    { "imageUrl": "https://your-uploaded-start-frame.jpg" },
    { "imageUrl": "https://your-uploaded-end-frame.jpg" }
  ]
}
```

**Auto-video mode (image + video references):**

```plaintext
{
  "modelName": "可灵 3.0 Omni",
  "prompt": "Keep the original motion rhythm while restyling it as cinematic cyberpunk",
  "mediaType": "VIDEO",
  "predictType": "AUTO_VIDEO",
  "aspectRatio": "16:9",
  "resolution": "1080P",
  "videoTotalSeconds": 10,
  "sound": false,
  "inputImageList": [
    { "imageUrl": "https://your-uploaded-style-reference.jpg" }
  ],
  "inputVideoList": [
    { "imageUrl": "https://your-uploaded-source-video.mp4" }
  ]
}
```

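#### Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| recordId | string | Task record ID, used for later queries |
| credit | object | Credit consumption details |
| credit.modelGroup | string | Model group name |
| credit.previousFreeUsage | int | Remaining free uses before this call |
| credit.currentFreeUsage | int | Remaining free uses after this call |

#### Response Example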

```plaintext
{
  "success": true,
  "code": 200,
  "data": {
    "recordId": "01JWXYZ...",
    "credit": {
      "modelGroup": "IMAGE_DEFAULT",
      "previousFreeUsage": 10,
      "currentFreeUsage": 6
    }
  }
}
```
---

### 2. Query Task Status

```plaintext
GET /api/v1/platform/predict/query?recordId={recordId}
```

Queries task status and generation results by `recordId`. Poll at 2-5 second intervals until the status is terminal.

#### Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| recordId | string | Yes | Record ID returned when the task was submitted |

#### Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| recordId | string | Record ID |
| recordStatus | string | Overall task status; see [task statuses](#%E4%BB%BB%E5%8A%A1%E7%8A%B6%E6%80%81) |
| gmtCreate | string | Creation time |
| mediaType | string | `IMAGE` or `VIDEO` |
| modelInfo | object | Model information |
| modelInfo.modelName | string | Model name |
| genInfo | object | Generation parameters |
| genInfo.prompt | string | Prompt |
| genInfo.predictType | string | Generation type |
| genInfo.aspectRatio | string | Aspect ratio |
| genInfo.resolution | string | Resolution |
| genInfo.width | int | Width (pixels) |
| genInfo.height | int | Height (pixels) |
| genInfo.videoTotalSeconds | int | Video duration (seconds) |
| genInfo.sound | boolean | Whether sound is enabled |
| results | array | List of generation results |
| results.taskId | string | Subtask ID |
| results.imageId | string | Resource ID |
| results.imageUrl | string | Result image/video URL (watermarked) |
| results.status | string | Subtask status |
| results.progress | int | Progress percentage |
| results.errorMsg | string | Error message |
| results.star | int | Favorite status |

#### Response Example

```plaintext
{
  "success": true,
  "code": 200,
  "data": {
    "recordId": "01JWXYZ...",
    "recordStatus": "SUCCEED",
    "gmtCreate": "2026-03-11T10:30:00.000+08:00",
    "mediaType": "IMAGE",
    "modelInfo": {
      "modelName": "Qwen Image Turbo"
    },
    "genInfo": {
      "prompt": "A cat in a spacesuit strolling on the moon",
      "predictType": "TXT_2_IMG",
      "aspectRatio": "1:1",
      "resolution": "2K",
      "width": 1024,
      "height": 1024,
      "optimizePrompt": true
    },
    "results": [
      {
        "taskId": "01JWABC...",
        "imageId": "01JWDEF...",
        "imageUrl": "https://cdn.wuli.art/result/xxx.png",
        "status": "SUCCEED",
        "progress": 100,
        "star": 0
      }
    ]
  }
}
```
---

### 3. Get a No-Watermark Image/Video

```plaintext
POST /api/v1/platform/predict/noWatermarkImage
```

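Returns the URL of the no-watermark version of a generation result.

#### Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| taskId | string | No | Subtask ID |
| resourceId | string | No | Resource ID |
| resourceIdList | array | No | List of resource IDs (batch fetch) |

> At least one of the three parameters must be provided.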
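
The "at least one of three" rule can be enforced client-side before the request is sent. A minimal sketch, assuming a hypothetical helper named `no_watermark_body`; the JSON field names are the ones from the table above.

```python
def no_watermark_body(task_id=None, resource_id=None, resource_ids=None):
    """Build the request body for the no-watermark endpoint.

    Raises ValueError when none of taskId, resourceId, resourceIdList is given.
    """
    body = {}
    if task_id:
        body["taskId"] = task_id
    if resource_id:
        body["resourceId"] = resource_id
    if resource_ids:
        body["resourceIdList"] = list(resource_ids)
    if not body:
        raise ValueError("pass at least one of taskId, resourceId, resourceIdList")
    return body
```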

#### Request Example

```plaintext
{
  "taskId": "01JWABC..."
}
```

Or fetch in batch:

```plaintext
{
  "resourceIdList": ["01JWDEF...", "01JWGHI..."]
}
```

#### Response Example

```plaintext
{
  "success": true,
  "code": 200,
  "data": {
    "url": "https://cdn.wuli.art/result/xxx_nowatermark.png",
    "urlList": [
      "https://cdn.wuli.art/result/xxx1_nowatermark.png",
      "https://cdn.wuli.art/result/xxx2_nowatermark.png"
    ]
  }
}
```
---

### 4. Get a Pre-Signed Upload URL

```plaintext
GET /api/v1/platform/image/getUploadUrl?filename={filename}
```

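Returns an OSS pre-signed upload URL for uploading reference images or videos. After a successful upload, strip the signature parameters from `uploadUrl` and use the resulting public URL as the `imageUrl` value in `inputImageList` / `inputVideoList`.

#### Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | File name including extension, e.g. `photo.jpg`, `clip.mp4` |

Supported image formats: `jpg`, `jpeg`, `png`, `webp`. Supported video formats: `mp4`, `mov`, `avi`, `webm`.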
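
A file's extension can be checked against these lists before requesting an upload URL. The helper name `media_kind` is hypothetical, and treating extensions as case-insensitive is an assumption; the document does not say how the server handles upper-case extensions.

```python
import os

IMAGE_EXTS = {"jpg", "jpeg", "png", "webp"}
VIDEO_EXTS = {"mp4", "mov", "avi", "webm"}


def media_kind(filename):
    """Return "image" or "video" for a supported file name, else raise ValueError."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported media format: {filename!r}")
```

Failing early on an unsupported extension avoids spending an upload round trip on a file the generation endpoint would not accept.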

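#### Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| uploadUrl | string | Pre-signed upload URL; upload the file with the `PUT` method. Valid for 1 hour |
| objectName | string | Object name of the file (for reference only) |

#### Response Example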

```plaintext
{
  "success": true,
  "code": 200,
  "data": {
    "uploadUrl": "https://oss.aliyuncs.com/wuli/xxx?signature-params...",
    "objectName": "upload/2026/03/11/abc123.jpg"
  }
}
```

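#### Usage Flow

1.  Call this endpoint to obtain `uploadUrl` and `objectName`
2.  Upload the file's binary data to `uploadUrl` with the `PUT` method; the request header must set `Content-Type: application/octet-stream`
3.  Strip the query parameters (the `?Expires=...` part) from `uploadUrl` and pass the resulting base URL to the generation endpoint as `inputImageList[].imageUrl` or `inputVideoList[].imageUrl`

#### Uploading a File to the Pre-Signed URL

Once you have `uploadUrl`, upload the file content with an HTTP `PUT` request. The concrete steps are shown in the examples that follow.

**curl example (uploading a local image):**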

```bash
# 1. Get a pre-signed upload URL
UPLOAD_RESP=$(curl -s -H "Authorization: Bearer $API_TOKEN" \
  "https://platform.wuli.art/api/v1/platform/image/getUploadUrl?filename=photo.jpg")

# Quote the variable so the JSON is passed to jq unmodified
UPLOAD_URL=$(echo "$UPLOAD_RESP" | jq -r '.data.uploadUrl')

# 2. Upload the file with PUT (Content-Type must be application/octet-stream)
curl -X PUT \
  -H "Content-Type: application/octet-stream" \
  --data-binary @photo.jpg \
  "$UPLOAD_URL"

# 3. Strip the signature parameters to get the public URL for later generation tasks
PUBLIC_URL=$(echo "$UPLOAD_URL" | cut -d'?' -f1)
echo "Public URL: $PUBLIC_URL"
```

**curl example (uploading a local video):**

```bash
UPLOAD_RESP=$(curl -s -H "Authorization: Bearer $API_TOKEN" \
  "https://platform.wuli.art/api/v1/platform/image/getUploadUrl?filename=clip.mp4")

UPLOAD_URL=$(echo "$UPLOAD_RESP" | jq -r '.data.uploadUrl')

curl -X PUT \
  -H "Content-Type: application/octet-stream" \
  --data-binary @clip.mp4 \
  "$UPLOAD_URL"

PUBLIC_URL=$(echo "$UPLOAD_URL" | cut -d'?' -f1)
```

**Python example (uploading a local file):**

```python
import urllib.parse
import urllib.request
import json
from pathlib import Path

API_BASE = "https://platform.wuli.art/api/v1/platform"
TOKEN = "wuli-a1b2c3d4e5f6..."

def upload_file(file_path, token):
    path = Path(file_path)
    filename = path.name
    encoded_filename = urllib.parse.quote(filename)

    # Step 1: get a pre-signed upload URL
    req = urllib.request.Request(
        f"{API_BASE}/image/getUploadUrl?filename={encoded_filename}"
    )
    req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())

    upload_url = result["data"]["uploadUrl"]

    # Step 2: upload the file with PUT (Content-Type must be application/octet-stream)
    file_data = path.read_bytes()
    put_req = urllib.request.Request(upload_url, data=file_data, method="PUT")
    put_req.add_header("Content-Type", "application/octet-stream")
    with urllib.request.urlopen(put_req, timeout=120) as _:
        pass

    # Step 3: strip the query parameters to get the public URL
    parsed = urllib.parse.urlparse(upload_url)
    public_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    return public_url

# Upload an image for image-to-image
image_url = upload_file("photo.jpg", TOKEN)
# Upload a video for video-to-video
video_url = upload_file("clip.mp4", TOKEN)
```

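**Python example (downloading a remote image and re-uploading it to OSS):**

If the reference image comes from a third-party URL, download it first and then upload it to Wuli OSS: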

```python
def upload_remote_image(image_url, token):
    # Relies on the imports and API_BASE defined in the previous example.
    # Step 1: Download the remote image
    req = urllib.request.Request(image_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        image_data = resp.read()

    # Infer the file extension from the URL path (defaults to .jpg)
    url_path = urllib.parse.urlparse(image_url).path
    ext = ".jpg"
    if "." in url_path.split("/")[-1]:
        ext = "." + url_path.split("/")[-1].rsplit(".", 1)[-1].lower()

    filename = f"upload{ext}"
    encoded_filename = urllib.parse.quote(filename)

    # Step 2: Get a presigned upload URL
    req = urllib.request.Request(
        f"{API_BASE}/image/getUploadUrl?filename={encoded_filename}"
    )
    req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())

    upload_url = result["data"]["uploadUrl"]

    # Step 3: PUT the data to OSS
    put_req = urllib.request.Request(upload_url, data=image_data, method="PUT")
    put_req.add_header("Content-Type", "application/octet-stream")
    with urllib.request.urlopen(put_req, timeout=120) as _:
        pass

    # Step 4: Strip the query parameters to get the public URL
    parsed = urllib.parse.urlparse(upload_url)
    public_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    return public_url
```
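> **Notes:**

*   When uploading, `Content-Type` must be `application/octet-stream`; do not use `multipart/form-data` or the file's actual MIME type.

*   `uploadUrl` is valid for 1 hour, so upload promptly.

*   After the upload completes, strip the `?` and the signature parameters that follow it from `uploadUrl`. Only the resulting base URL can be used as `inputImageList[].imageUrl` or `inputVideoList[].imageUrl`.

*   Supported image formats: `jpg`, `jpeg`, `png`, `webp`. Supported video formats: `mp4`, `mov`, `avi`, `webm`.

*   Third-party image or video URLs cannot be used directly in generation tasks; upload the media to Wuli OSS first to obtain a public URL.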
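
The format and URL rules in the notes above can be checked locally before any request is sent. The following is a minimal sketch, not part of the skill itself: the helper names `media_kind` and `public_url_from_upload_url` are hypothetical, and the example host in the usage note is made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit

# Formats listed in the notes above.
IMAGE_EXTS = {"jpg", "jpeg", "png", "webp"}
VIDEO_EXTS = {"mp4", "mov", "avi", "webm"}


def media_kind(filename):
    """Return "image" or "video" for a supported file name, else None."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return None


def public_url_from_upload_url(upload_url):
    """Drop the query string (the signature) and any fragment from a presigned URL."""
    parts = urlsplit(upload_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For instance, `public_url_from_upload_url("https://oss.example.com/u/a.jpg?Signature=abc")` keeps only the scheme, host, and path, which is the same result the `cut -d'?' -f1` step produces in the curl examples.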
    

---

## Available Models

### Image Generation Models

| Model name (modelName) | Supported generation types | Supported resolutions | Max outputs per request | Max reference images | Credits per image |
| --- | --- | --- | --- | --- | --- |
| 通义万相 2.7 | TXT_2_IMG, REF_2_IMG | 2K, 4K | 4 | 9 | 2K = 1, 4K = 3 |
| Qwen Image 2.0 | TXT_2_IMG, REF_2_IMG | 2K, 4K | 4 | 4 | 1 |
| Qwen Image Turbo | TXT_2_IMG, REF_2_IMG | 2K, 4K | 4 | 4 | 1 |
| Seedream 5.0 Lite | TXT_2_IMG, REF_2_IMG | 2K, 3K | 4 | 8 | 4 |
| Seedream 4.5 | TXT_2_IMG, REF_2_IMG | 2K, 4K | 4 | 8 | 4 |
| Seedream 4.0 | TXT_2_IMG, REF_2_IMG | 1K, 2K, 4K | 4 | 8 | 4 |

> These are the image models currently recommended for integration by default. The image models `Qwen Image 25.08`, `Qwen Image 25.11`, `Qwen Image 25.12`, and `通义万相 2.6` still have compatibility entries in the configuration but are not default integration models.

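#### Image Aspect Ratios

| aspectRatio | Description |
| --- | --- |
| 1:1 | Square |
| 4:3 | Landscape 4:3 |
| 3:2 | Landscape 3:2 |
| 16:9 | Widescreen landscape |
| 21:9 | Ultra-wide landscape |
| 3:4 | Portrait 3:4 |
| 2:3 | Portrait 2:3 |
| 9:16 | Vertical portrait |
| 9:21 | Ultra-tall portrait |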
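
The image model limits and aspect ratios above lend themselves to a client-side pre-check. This is a rough sketch under the assumption that the tables are current; `IMAGE_MODELS`, `IMAGE_ASPECT_RATIOS`, and `check_image_request` are hypothetical names, and the server remains the authority on what it accepts.

```python
# modelName: (resolutions, max outputs per request, max reference images),
# copied from the image model table above.
IMAGE_MODELS = {
    "通义万相 2.7": ({"2K", "4K"}, 4, 9),
    "Qwen Image 2.0": ({"2K", "4K"}, 4, 4),
    "Qwen Image Turbo": ({"2K", "4K"}, 4, 4),
    "Seedream 5.0 Lite": ({"2K", "3K"}, 4, 8),
    "Seedream 4.5": ({"2K", "4K"}, 4, 8),
    "Seedream 4.0": ({"1K", "2K", "4K"}, 4, 8),
}

IMAGE_ASPECT_RATIOS = {"1:1", "4:3", "3:2", "16:9", "21:9", "3:4", "2:3", "9:16", "9:21"}


def check_image_request(model, resolution, n=1, ref_count=0, aspect_ratio="1:1"):
    """Return a list of problems; an empty list means the request fits the tables."""
    if model not in IMAGE_MODELS:
        return [f"unknown model: {model}"]
    resolutions, max_n, max_refs = IMAGE_MODELS[model]
    problems = []
    if resolution not in resolutions:
        problems.append(f"{model} does not list resolution {resolution}")
    if not 1 <= n <= max_n:
        problems.append(f"n must be between 1 and {max_n}")
    if ref_count > max_refs:
        problems.append(f"at most {max_refs} reference images")
    if aspect_ratio not in IMAGE_ASPECT_RATIOS:
        problems.append(f"unsupported aspectRatio: {aspect_ratio}")
    return problems
```

Returning a list rather than raising lets a caller report every mismatch at once before spending credits on a request that would be rejected.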

---

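### Video Generation Models

| Model name (modelName) | Supported generation types | Supported resolutions | Supported durations (s) | Max reference images | Max reference videos | Sound toggle supported | Default sound behavior |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 通义万相 2.7 | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO, AUTO_VIDEO | 720P, 1080P | 5, 10, 15 (AUTO_VIDEO: 5, 10 only) | 2 (AUTO_VIDEO: 5) | 3 (AUTO_VIDEO) | No | Depends on what the model actually returns |
| Happy Horse 1.0 | TXT_2_VIDEO, FF_2_VIDEO, AUTO_VIDEO | 720P, 1080P | 5, 10, 15 | 1 (AUTO_VIDEO: 9) | 1 (AUTO_VIDEO) | No | Depends on what the model actually returns |
| 通义万相 2.2 Turbo | TXT_2_VIDEO, FF_2_VIDEO | 720P | 5 | 1 | 0 | No | No audio output |
| 通义万相 2.6 Flash | FF_2_VIDEO | 720P, 1080P | 5, 10, 15 | 1 | 0 | No (always on in the current implementation) | Audio included by default |
| 通义万相 2.6 | TXT_2_VIDEO, FF_2_VIDEO, AUTO_VIDEO | 720P, 1080P | 5, 10, 15 (AUTO_VIDEO: 5, 10) | 2 (AUTO_VIDEO: 5) | 3 (AUTO_VIDEO) | Partial | `AUTO_VIDEO` is controllable; `FF_2_VIDEO` currently always includes audio; other modes depend on what the model actually returns |
| 可灵 3.0 Omni | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO, AUTO_VIDEO | 720P, 1080P | 5, 10, 15 | 7 | 1 | Yes | On by default without a reference video; with a reference video, the original audio is kept by default |
| 可灵 O1 | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO, AUTO_VIDEO | 720P, 1080P | 5, 10 | 7 | 1 | Yes | On by default without a reference video; with a reference video, the original audio is kept by default |
| 可灵 3.0 | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO | 720P, 1080P | 5, 10, 15 | 2 | 0 | Yes | On by default |
| 可灵 2.6 | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO | 1080P | 5, 10 | 2 | 0 | Yes | On by default |
| 可灵 2.5 Turbo | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO | 1080P | 5, 10 | 2 | 0 | Yes | On by default |
| Seedance 1.5 Pro | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO | 480P, 720P | 5, 10, 12 | 2 | 0 | Yes | On by default |
| Seedance 1.0 Pro | TXT_2_VIDEO, FF_2_VIDEO, FLF_2_VIDEO | 480P, 720P, 1080P | 5, 10 | 2 | 0 | Yes | On by default |
| MiniMax Hailuo 2.3 | TXT_2_VIDEO, FF_2_VIDEO | 768P, 1080P | 6, 10 | 1 | 0 | No | No audio output |
| MiniMax Hailuo 2.3 Fast | FF_2_VIDEO | 768P, 1080P | 6, 10 | 1 | 0 | No | No audio output |

> These are the video models currently recommended for integration by default. Hidden models such as `智能模型` and `Wan 2.2 Turbo` keep compatibility entries in the configuration but are not default integration models.

> "Sound toggle supported" indicates whether the `sound` field in the API request takes part in the backend conversion logic for that model/mode. It does not indicate whether the web UI already exposes a matching control.

> When `sound` is omitted, the backend currently treats it as `true`. Some models/modes, however, always include audio, never include audio, or, when a reference video is supplied, interpret the field as "keep the original video's audio".

> For models that support audio control, turning `sound` off may select a lower credit tier; in auto-video mode with a reference video, a video-editing credit tier may also apply.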
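
One way to reflect these sound rules on the client is to send `sound` only when the caller made an explicit choice and to know in advance whether the model is expected to honor it. The sketch below assumes the table above is current; `SOUND_SWITCH_MODELS`, `sound_switch_honored`, and `apply_sound` are hypothetical names.

```python
# Models whose "Sound toggle supported" column reads "Yes" in the table above.
SOUND_SWITCH_MODELS = {
    "可灵 3.0 Omni", "可灵 O1", "可灵 3.0", "可灵 2.6", "可灵 2.5 Turbo",
    "Seedance 1.5 Pro", "Seedance 1.0 Pro",
}


def sound_switch_honored(model, predict_type):
    """Return True when the table says `sound` is controllable for this model/mode."""
    if model == "通义万相 2.6":
        # Listed as partial support: only AUTO_VIDEO is controllable.
        return predict_type == "AUTO_VIDEO"
    return model in SOUND_SWITCH_MODELS


def apply_sound(body, sound):
    """Return a copy of the request body with `sound` set only for an explicit choice."""
    out = dict(body)
    if sound is not None:
        out["sound"] = bool(sound)
    return out
```

Leaving the field out when `sound` is `None` defers to the backend default described in the notes, which matches how `skill.py` builds its request body.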

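#### Video Credit Costs

Video credit costs vary by model, resolution, duration, and generation type. The tables below give reference values for the current model configuration: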

**通义万相 2.7 (TXT_2_VIDEO / FF_2_VIDEO / FLF_2_VIDEO)**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 40 | 80 | 120 |
| 1080P | 60 | 120 | 180 |

**通义万相 2.7 (AUTO_VIDEO)**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 720P | 40 | 80 |
| 1080P | 60 | 120 |

**Happy Horse 1.0**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 60 | 120 | 180 |
| 1080P | 100 | 200 | 300 |

**通义万相 2.2 Turbo**

| Resolution | 5 s |
| --- | --- |
| 720P | 20 |

**通义万相 2.6 Flash**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 20 | 40 | 60 |
| 1080P | 40 | 80 | 120 |

**通义万相 2.6 (TXT_2_VIDEO / FF_2_VIDEO)**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 40 | 80 | 120 |
| 1080P | 60 | 120 | 180 |

**通义万相 2.6 (AUTO_VIDEO)**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 720P | 40 | 80 |
| 1080P | 60 | 120 |

**可灵 3.0 Omni (TXT_2_VIDEO / FF_2_VIDEO / FLF_2_VIDEO)**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 60 | 120 | 180 |
| 1080P | 80 | 160 | 240 |

**可灵 3.0 Omni (AUTO_VIDEO)**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 80 | 160 | 180 |
| 1080P | 100 | 200 | 240 |

**可灵 O1 (TXT_2_VIDEO / FF_2_VIDEO / FLF_2_VIDEO)**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 720P | 40 | 80 |
| 1080P | 60 | 100 |

**可灵 O1 (AUTO_VIDEO)**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 720P | 60 | 120 |
| 1080P | 80 | 160 |

**可灵 3.0**

| Resolution | 5 s | 10 s | 15 s |
| --- | --- | --- | --- |
| 720P | 80 | 160 | 240 |
| 1080P | 100 | 200 | 300 |

**可灵 2.6**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 1080P | 60 | 120 |

**可灵 2.5 Turbo**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 1080P | 40 | 80 |

**Seedance 1.5 Pro**

| Resolution | 5 s | 10 s | 12 s |
| --- | --- | --- | --- |
| 480P | 20 | 40 | 60 |
| 720P | 40 | 80 | 100 |

**Seedance 1.0 Pro**

| Resolution | 5 s | 10 s |
| --- | --- | --- |
| 480P | 20 | 40 |
| 720P | 40 | 80 |
| 1080P | 80 | 160 |

**MiniMax Hailuo 2.3**

| Resolution | 6 s | 10 s |
| --- | --- | --- |
| 768P | 40 | 80 |
| 1080P | 60 | 100 |

**MiniMax Hailuo 2.3 Fast**

| Resolution | 6 s | 10 s |
| --- | --- | --- |
| 768P | 20 | 40 |
| 1080P | 40 | 40 |
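
The credit tables can be encoded as a lookup keyed by model and resolution, which makes it easy to estimate a cost before submitting. The sketch below copies only three rows for brevity; `VIDEO_CREDITS` and `estimate_credits` are hypothetical names, and the result is an estimate because options such as turning `sound` off may select a different tier.

```python
# (modelName, resolution) -> {duration in seconds: credits}, copied from the tables above.
VIDEO_CREDITS = {
    ("Seedance 1.5 Pro", "480P"): {5: 20, 10: 40, 12: 60},
    ("Seedance 1.5 Pro", "720P"): {5: 40, 10: 80, 12: 100},
    ("可灵 2.6", "1080P"): {5: 60, 10: 120},
}


def estimate_credits(model, resolution, seconds):
    """Return the listed credit cost, or None when the combination is not in the table."""
    return VIDEO_CREDITS.get((model, resolution), {}).get(seconds)
```

A `None` result doubles as a hint that the model does not list that resolution or duration, so the caller can stop before the request is sent.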

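#### Video Aspect Ratios

| aspectRatio | Description |
| --- | --- |
| 1:1 | Square |
| 4:3 | Landscape 4:3 |
| 3:4 | Portrait 3:4 |
| 16:9 | Widescreen landscape |
| 9:16 | Vertical portrait |

> Supported aspect ratios may differ between models; the actual model configuration is authoritative.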

---

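## Generation Types

| predictType | Description | Use case |
| --- | --- | --- |
| TXT_2_IMG | Text-to-image | Generate images from a text prompt only |
| REF_2_IMG | Image-to-image | Generate images from one or more reference images plus a prompt |
| TXT_2_VIDEO | Text-to-video | Generate a video from a text prompt only |
| FF_2_VIDEO | Image-to-video (first frame) | Generate a video from a single reference image plus a prompt |
| FLF_2_VIDEO | Image-to-video (first and last frame) | Generate a video from two reference images (first and last frame) plus a prompt |
| AUTO_VIDEO | Auto-video mode | Generate a video from reference images, reference videos, or a mix of both, plus a prompt |

> Capabilities from earlier configurations, such as `MULTI_IMG_2_VIDEO` and `VIDEO_2_VIDEO`, are all covered by the `AUTO_VIDEO` mode in the current public model configuration.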
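
As an illustration of how the video `predictType` values relate to the inputs, the following heuristic picks a type from the number of reference images and videos. It is only a sketch: `pick_video_predict_type` is a hypothetical helper, and it assumes two images always mean first and last frame, although `AUTO_VIDEO` also accepts multiple images.

```python
def pick_video_predict_type(image_count, video_count=0):
    """Suggest a video predictType from the number of reference inputs."""
    if video_count > 0 or image_count > 2:
        # Reference videos or more than two images only fit the auto-video mode.
        return "AUTO_VIDEO"
    if image_count == 2:
        return "FLF_2_VIDEO"
    if image_count == 1:
        return "FF_2_VIDEO"
    return "TXT_2_VIDEO"
```

The chosen type still has to be supported by the target model, so it is worth checking it against the video model table before submitting.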

---

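## Task Status

| Status (status) | Description | Terminal |
| --- | --- | --- |
| INITIALIZING | Initializing | No |
| OPTIMIZING | Optimizing the prompt | No |
| PENDING | Waiting in the queue | No |
| PROCESSING | Generating | No |
| SUCCEED | Generation succeeded | Yes |
| FAILED | Generation failed | Yes |
| REVIEWFAILED | Rejected by content review | Yes |
| TIMEOUT | Task timed out | Yes |
| CANCELLED | Cancelled | Yes |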
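
A client usually only needs to know whether a status means "keep polling", "done", or "failed". The sketch below groups the statuses from the table accordingly; `classify_status` is a hypothetical helper. Because the table spells the review status `REVIEWFAILED` while `skill.py` checks for `REVIEW_FAILED`, the sketch ignores underscores so that either spelling is treated as a failure.

```python
_RUNNING = {"INITIALIZING", "OPTIMIZING", "PENDING", "PROCESSING"}
_FAILED = {"FAILED", "REVIEWFAILED", "TIMEOUT", "CANCELLED"}


def classify_status(status):
    """Map a task status to "running", "succeeded", "failed", or "unknown"."""
    # Normalize case and drop underscores so REVIEW_FAILED matches REVIEWFAILED.
    key = (status or "").upper().replace("_", "")
    if key == "SUCCEED":
        return "succeeded"
    if key in _FAILED:
        return "failed"
    if key in _RUNNING:
        return "running"
    return "unknown"
```

Treating unrecognized values as `"unknown"` rather than as a failure leaves the caller free to keep polling until its own timeout if new statuses are introduced.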

---

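## Error Codes

| code | Description |
| --- | --- |
| 200 | Success |
| 401 | Unauthenticated: the token is invalid or missing |
| 403 | Forbidden |
| 429 | Too many requests |
| 1001 | Invalid parameters |
| 2001 | Insufficient credit balance |
| 5000 | Internal server error |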
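
The codes above can be turned into a small lookup for error messages and retry decisions. This is a sketch only: `describe_error` and `should_retry` are hypothetical helpers, and treating 429 and 5000 as retryable is an assumption about which failures are transient, not something the API documents.

```python
ERROR_CODES = {
    200: "success",
    401: "unauthenticated: token invalid or missing",
    403: "forbidden",
    429: "too many requests",
    1001: "invalid parameters",
    2001: "insufficient credit balance",
    5000: "internal server error",
}

# Assumption: rate limiting and internal errors may succeed on a later attempt;
# the remaining errors need the caller to change the request or account state.
RETRYABLE_CODES = frozenset({429, 5000})


def describe_error(code):
    """Return the documented description, or a placeholder for unlisted codes."""
    return ERROR_CODES.get(code, f"unknown code {code}")


def should_retry(code):
    """Return True when retrying after a pause is a reasonable reaction."""
    return code in RETRYABLE_CODES
```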

---

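## Typical Call Flow

```plaintext
1. Upload reference media (if needed)
   GET /api/v1/platform/image/getUploadUrl?filename=ref.jpg
   → PUT the file to the returned uploadUrl (with the header Content-Type: application/octet-stream)
   → Strip the signature parameters from uploadUrl and use the resulting public URL as the image/video reference

2. Submit the generation task
   POST /api/v1/platform/predict/submit
   → Returns recordId

3. Poll the task status (recommended interval: 2-5 seconds)
   GET /api/v1/platform/predict/query?recordId=xxx
   → Until recordStatus reaches a terminal state (SUCCEED / FAILED / ...)

4. Fetch the no-watermark result (optional)
   POST /api/v1/platform/predict/noWatermarkImage
   → Returns the no-watermark URL
```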
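
Step 3 of the flow can be isolated into a small polling helper. The sketch below is one possible shape, not the skill's implementation: `poll_record` is a hypothetical function, `query` stands for any zero-argument callable that returns the parsed JSON of `predict/query`, and the default interval and attempt limit are illustrative values.

```python
import time

# Terminal statuses as listed in the task status table.
TERMINAL_STATUSES = frozenset({"SUCCEED", "FAILED", "REVIEWFAILED", "TIMEOUT", "CANCELLED"})


def poll_record(query, interval=3.0, max_attempts=100, sleep=time.sleep):
    """Call query() until recordStatus is terminal or the attempts run out."""
    for _ in range(max_attempts):
        resp = query()
        status = (resp.get("data") or {}).get("recordStatus")
        if status in TERMINAL_STATUSES:
            return resp
        # Not finished yet: wait before asking again.
        sleep(interval)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")
```

Passing `query` and `sleep` in as parameters keeps the helper free of network code, so it can be exercised with canned responses before it is wired to the real endpoint.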
FILE:skill.py
#!/usr/bin/env python3
"""
Wuli Platform - Unified AI Image/Video Generation Skill
Uses WULI_API_TOKEN env var for Bearer token authentication via the open platform API.
"""

import argparse
import json
import os
import platform
import subprocess
import sys
import time
import urllib.request
import urllib.error
import urllib.parse
from pathlib import Path

API_BASE = "https://platform.wuli.art/api/v1/platform"

CONTENT_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".avi": "video/x-msvideo",
    ".webm": "video/webm",
}

ACTION_DEFAULTS = {
    "image-gen": {
        "media_type": "IMAGE",
        "predict_type": "TXT_2_IMG",
        "model": "Qwen Image 2.0",
        "aspect_ratio": "1:1",
        "resolution": "2K",
        "needs_image": False,
    },
    "image-edit": {
        "media_type": "IMAGE",
        "predict_type": "REF_2_IMG",
        "model": "Qwen Image 2.0",
        "aspect_ratio": "1:1",
        "resolution": "2K",
        "needs_image": True,
    },
    "txt2video": {
        "media_type": "VIDEO",
        "predict_type": "TXT_2_VIDEO",
        "model": "通义万相 2.2 Turbo",
        "aspect_ratio": "16:9",
        "resolution": "720P",
        "needs_image": False,
    },
    "image2video": {
        "media_type": "VIDEO",
        "predict_type": "FF_2_VIDEO",
        "model": "通义万相 2.2 Turbo",
        "aspect_ratio": "16:9",
        "resolution": "720P",
        "needs_image": True,
    },
    "flf2video": {
        "media_type": "VIDEO",
        "predict_type": "FLF_2_VIDEO",
        "model": "可灵 3.0",
        "aspect_ratio": "16:9",
        "resolution": "720P",
        "needs_image": True,
    },
    "auto-video": {
        "media_type": "VIDEO",
        "predict_type": "AUTO_VIDEO",
        "model": "通义万相 2.6",
        "aspect_ratio": "16:9",
        "resolution": "720P",
        "needs_image": False,
    },
}


def api_request(method, url, token, data=None, content_type="application/json"):
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    body = None
    if data is not None:
        if content_type == "application/json":
            body = json.dumps(data).encode("utf-8")
            headers["Content-Type"] = "application/json"
        else:
            body = data
            headers["Content-Type"] = content_type

    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        err_body = e.read().decode("utf-8", errors="replace")
        print(f"Error: HTTP {e.code} - {err_body}", file=sys.stderr)
        sys.exit(1)
    except urllib.error.URLError as e:
        print(f"Error: {e.reason}", file=sys.stderr)
        sys.exit(1)


def upload_file(file_path, token):
    path = Path(file_path)
    if not path.is_file():
        print(f"Error: File not found: {file_path}", file=sys.stderr)
        sys.exit(1)

    filename = path.name
    print(f"Uploading file: {filename} ...")

    encoded_filename = urllib.parse.quote(filename)
    resp = api_request("GET", f"{API_BASE}/image/getUploadUrl?filename={encoded_filename}", token)
    if not resp.get("success"):
        print(f"Error: Failed to get upload URL: {json.dumps(resp, ensure_ascii=False)}", file=sys.stderr)
        sys.exit(1)

    upload_url = resp["data"]["uploadUrl"]
    # Build public URL from upload_url (strip query params)
    parsed = urllib.parse.urlparse(upload_url)
    public_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

    file_data = path.read_bytes()
    put_req = urllib.request.Request(upload_url, data=file_data, method="PUT")
    put_req.add_header("Content-Type", "application/octet-stream")
    with urllib.request.urlopen(put_req, timeout=120) as _:
        pass

    print(f"Upload complete: {public_url}")
    return public_url


def upload_url_media(media_url, token):
    """Download a remote media file and re-upload it to OSS."""
    print(f"Downloading remote media: {media_url} ...")
    req = urllib.request.Request(media_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        media_data = resp.read()
        ct = resp.headers.get("Content-Type", "")

    ext = ".jpg"
    for suffix, mime in CONTENT_TYPES.items():
        if mime in ct:
            ext = suffix
            break

    url_path = urllib.parse.urlparse(media_url).path
    if "." in url_path.split("/")[-1]:
        ext = "." + url_path.split("/")[-1].rsplit(".", 1)[-1].lower()

    filename = f"upload{ext}"
    encoded_filename = urllib.parse.quote(filename)
    print(f"Re-uploading to OSS as {filename} ...")

    resp = api_request("GET", f"{API_BASE}/image/getUploadUrl?filename={encoded_filename}", token)
    if not resp.get("success"):
        print(f"Error: Failed to get upload URL: {json.dumps(resp, ensure_ascii=False)}", file=sys.stderr)
        sys.exit(1)

    upload_url = resp["data"]["uploadUrl"]

    # Build public URL from upload_url (strip query params)
    parsed = urllib.parse.urlparse(upload_url)
    public_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

    put_req = urllib.request.Request(upload_url, data=media_data, method="PUT")
    put_req.add_header("Content-Type", "application/octet-stream")
    with urllib.request.urlopen(put_req, timeout=120) as _:
        pass

    print(f"Upload complete: {public_url}")
    return public_url


def parse_list_arg(value):
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def upload_inputs(file_paths, remote_urls, token):
    uploaded = []
    for file_path in file_paths:
        uploaded.append(upload_file(file_path, token))
    for remote_url in remote_urls:
        uploaded.append(upload_url_media(remote_url, token))
    return uploaded


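def get_no_watermark_urls(task_ids, token):
    """Fetch no-watermark URLs for given task IDs (best effort)."""
    urls = {}
    for task_id in task_ids:
        try:
            resp = api_request("POST", f"{API_BASE}/predict/noWatermarkImage", token,
                               data={"taskId": task_id})
            if resp.get("success") and (resp.get("data") or {}).get("url"):
                urls[task_id] = resp["data"]["url"]
        except (Exception, SystemExit):
            # api_request() calls sys.exit() on HTTP/network errors, which raises
            # SystemExit rather than Exception. Swallow it here so the caller can
            # fall back to the watermarked URL instead of aborting.
            pass
    return urls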


def download_file(url, filename):
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req, timeout=120) as resp:
        Path(filename).write_bytes(resp.read())


def open_file(filepath):
    """Open a local file with the OS default viewer after download."""
    system = platform.system()
    try:
        if system == "Darwin":
            subprocess.Popen(["open", filepath])
        elif system == "Windows":
            os.startfile(filepath)
        elif system == "Linux":
            subprocess.Popen(["xdg-open", filepath])
    except Exception:
        pass


def main():
    parser = argparse.ArgumentParser(description="Wuli Platform - AI Image/Video Generation")
    parser.add_argument("--action", required=True, choices=ACTION_DEFAULTS.keys(),
                        help="Action: image-gen, image-edit, txt2video, image2video, flf2video, auto-video")
    parser.add_argument("--prompt", required=True, help="Generation prompt (max 2000 chars)")
    parser.add_argument("--model", default=None, help="Model name")
    parser.add_argument("--aspect_ratio", default=None, help="Aspect ratio (e.g. 1:1, 16:9)")
    parser.add_argument("--resolution", default=None, help="Resolution (e.g. 2K, 4K, 720P, 1080P)")
    parser.add_argument("--n", type=int, default=1, help="Number of images (1-4, image only)")
    parser.add_argument("--image_url", default=None,
                        help="Reference image URL(s), comma-separated for multiple images")
    parser.add_argument("--image_path", default=None,
                        help="Local image path(s), comma-separated for multiple files")
    parser.add_argument("--end_image_url", default=None,
                        help="End-frame image URL for flf2video")
    parser.add_argument("--end_image_path", default=None,
                        help="End-frame local image path for flf2video")
    parser.add_argument("--video_url", default=None,
                        help="Reference video URL(s), comma-separated for auto-video")
    parser.add_argument("--video_path", default=None,
                        help="Local video path(s), comma-separated for auto-video")
    parser.add_argument("--duration", type=int, default=None, help="Video duration in seconds")
    parser.add_argument("--negative_prompt", default=None, help="Negative prompt")
    parser.set_defaults(optimize=True)
    parser.add_argument("--optimize", action="store_true", dest="optimize",
                        help="Enable prompt optimization (default)")
    parser.add_argument("--no-optimize", action="store_false", dest="optimize",
                        help="Disable prompt optimization")
    parser.set_defaults(sound=None)
    parser.add_argument("--sound", action="store_true", dest="sound",
                        help="Enable sound for supported video models")
    parser.add_argument("--no-sound", action="store_false", dest="sound",
                        help="Disable sound for video output")
    args = parser.parse_args()

    token = os.environ.get("WULI_API_TOKEN")
    if not token:
        print("Error: WULI_API_TOKEN environment variable is not set\n"
              "Get your API token from https://wuli.art (bottom-left corner -> API 开放平台, the API open platform)\n"
              "Then set it:\n"
              '  export WULI_API_TOKEN="wuli-your-token-here"', file=sys.stderr)
        sys.exit(1)

    cfg = ACTION_DEFAULTS[args.action]
    model = args.model or cfg["model"]
    aspect_ratio = args.aspect_ratio or cfg["aspect_ratio"]
    resolution = args.resolution or cfg["resolution"]
    media_type = cfg["media_type"]
    predict_type = cfg["predict_type"]

    image_paths = parse_list_arg(args.image_path)
    image_urls = parse_list_arg(args.image_url)
    end_image_paths = parse_list_arg(args.end_image_path)
    end_image_urls = parse_list_arg(args.end_image_url)
    video_paths = parse_list_arg(args.video_path)
    video_urls = parse_list_arg(args.video_url)

    if cfg["needs_image"] and not (image_paths or image_urls or end_image_paths or end_image_urls):
        print(f"Error: image input is required for {args.action}", file=sys.stderr)
        sys.exit(1)

    # Handle media inputs: always upload to OSS (local file or remote URL)
    uploaded_images = upload_inputs(image_paths, image_urls, token)
    uploaded_end_images = upload_inputs(end_image_paths, end_image_urls, token)
    uploaded_videos = upload_inputs(video_paths, video_urls, token)

    input_image_urls = uploaded_images + uploaded_end_images
    input_video_urls = uploaded_videos

    if args.action == "image-edit":
        if not input_image_urls:
            print("Error: image-edit requires at least one reference image", file=sys.stderr)
            sys.exit(1)
        if input_video_urls:
            print("Error: image-edit does not accept video input", file=sys.stderr)
            sys.exit(1)
    elif args.action == "image2video":
        if len(input_image_urls) != 1:
            print("Error: image2video requires exactly one reference image", file=sys.stderr)
            sys.exit(1)
        if input_video_urls:
            print("Error: image2video does not accept video input", file=sys.stderr)
            sys.exit(1)
    elif args.action == "flf2video":
        if len(input_image_urls) != 2:
            print("Error: flf2video requires exactly two reference images (start and end frame)", file=sys.stderr)
            sys.exit(1)
        if input_video_urls:
            print("Error: flf2video does not accept video input", file=sys.stderr)
            sys.exit(1)
    elif args.action == "auto-video":
        if not input_image_urls and not input_video_urls:
            print("Error: auto-video requires at least one reference image or video", file=sys.stderr)
            sys.exit(1)

    input_image_list = [{"imageUrl": url} for url in input_image_urls]
    input_video_list = [{"imageUrl": url} for url in input_video_urls]

    # Build request
    body = {
        "modelName": model,
        "mediaType": media_type,
        "predictType": predict_type,
        "prompt": args.prompt,
        "aspectRatio": aspect_ratio,
        "resolution": resolution,
        "n": args.n if media_type == "IMAGE" else 1,
        "optimizePrompt": args.optimize,
        "inputImageList": input_image_list,
        "inputVideoList": input_video_list,
    }

    duration = args.duration
    if media_type == "VIDEO":
        body["videoTotalSeconds"] = duration if duration else 5
        if args.sound is not None:
            body["sound"] = args.sound
    if args.negative_prompt:
        body["negativePrompt"] = args.negative_prompt

    print(f"\n=== Wuli Platform: {args.action} ===")
    print(f"Model:  {model}")
    print(f"Prompt: {args.prompt}")
    print(f"Optimize Prompt: {body['optimizePrompt']}")
    if media_type == "VIDEO":
        print(f"Duration: {body.get('videoTotalSeconds', 5)}s")
        if args.sound is None:
            print("Sound: auto (backend default)")
        else:
            print(f"Sound: {args.sound}")
    print(f"Aspect: {aspect_ratio}  Resolution: {resolution}")
    if input_image_list:
        print(f"Image refs: {len(input_image_list)}")
    if input_video_list:
        print(f"Video refs: {len(input_video_list)}")
    print("\nSubmitting request...")

    # Submit
    resp = api_request("POST", f"{API_BASE}/predict/submit", token, data=body)
    if not resp.get("success"):
        print(f"Error: Submit failed - {json.dumps(resp, ensure_ascii=False)}", file=sys.stderr)
        sys.exit(1)

    record_id = resp["data"]["recordId"]
    print(f"Record ID: {record_id}")
    print("Waiting for generation...")

    # Poll
    poll_interval = 10 if media_type == "VIDEO" else 5
    max_attempts = 120 if media_type == "VIDEO" else 60

    for attempt in range(1, max_attempts + 1):
        time.sleep(poll_interval)

        query_resp = api_request("GET", f"{API_BASE}/predict/query?recordId={record_id}", token)
        status = query_resp.get("data", {}).get("recordStatus", "UNKNOWN")

        if status == "SUCCEED":
            print("Generation completed!\n")
            timestamp = time.strftime("%Y%m%d_%H%M%S")
            results = query_resp["data"].get("results", [])

            # Fetch no-watermark URLs
            task_ids = [item["taskId"] for item in results if item.get("taskId")]
            print("Fetching no-watermark URLs...")
            nw_urls = get_no_watermark_urls(task_ids, token)

            if media_type == "IMAGE":
                downloaded = []
                for i, item in enumerate(results, 1):
                    task_id = item.get("taskId")
                    url = nw_urls.get(task_id) or item.get("imageUrl")
                    if url:
                        filename = f"wuli_image_{timestamp}_{i}.png"
                        src = "no-watermark" if task_id in nw_urls else "watermarked"
                        print(f"Downloading ({src}): {filename}")
                        download_file(url, filename)
                        downloaded.append(filename)
                print(f"\nDownloaded {len(downloaded)} image(s) to current directory")
                for f in downloaded:
                    open_file(f)
            else:
                task_id = results[0].get("taskId") if results else None
                url = nw_urls.get(task_id) or (results[0].get("imageUrl") if results else None)
                if url:
                    filename = f"wuli_video_{timestamp}.mp4"
                    src = "no-watermark" if task_id in nw_urls else "watermarked"
                    print(f"Downloading ({src}): {filename}")
                    download_file(url, filename)
                    print("Video downloaded to current directory")
                    open_file(filename)
            return

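        # The task status table spells the review status REVIEWFAILED; accept both spellings.
        if status in ("FAILED", "REVIEW_FAILED", "REVIEWFAILED", "TIMEOUT", "CANCELLED"):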
            print(f"Generation {status}")
            for item in query_resp.get("data", {}).get("results", []):
                err = item.get("errorMsg")
                if err:
                    print(f"  Task {item.get('taskId')}: {err}")
            sys.exit(1)

        elapsed = attempt * poll_interval
        print(f"Status: {status} ({elapsed}s elapsed)")

    print(f"Timeout: Generation took too long (>{max_attempts * poll_interval}s)")
    sys.exit(1)


if __name__ == "__main__":
    main()