@clawhub-yoimiya66-1a1e1f2b59
---
name: ernie-image-visual-promptsmith
description: Generate ERNIE-Image-Turbo images through Baidu AI Studio and craft ERNIE-Image prompts for posters, comics, infographics, ecommerce images, UI-style visuals, bilingual text rendering, structured layouts, negative prompts, generation settings, and use_pe decisions. Requires a user-provided AI Studio API key and is not an official Baidu skill.
metadata:
  openclaw:
    emoji: "\U0001F3A8"
    skillKey: "ernie-image-visual-promptsmith"
    homepage: "https://aistudio.baidu.com/account/accessToken"
    requires:
      env:
        - BAIDU_AISTUDIO_API_KEY
      anyBins:
        - python3
        - python
        - py
    primaryEnv: BAIDU_AISTUDIO_API_KEY
---
# ERNIE-Image Visual Promptsmith
Use this community skill to craft ERNIE-Image prompts and generate images through the AI Studio ERNIE-Image-Turbo endpoint. It is not official Baidu or ERNIE-Image software.
## Decide the Mode
- Generate immediately when the user asks to generate, draw, create, make an image, or uses equivalent Chinese generation wording.
- Return prompt-only guidance when the user asks to optimize, rewrite, improve, or review a prompt.
- Ask one concise question only if an exact visible text string, language, or required aspect ratio is missing and guessing would likely break the result.
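
The mode rules above can be sketched as a small classifier. This is only an illustration: the function name `decide_mode` and both keyword tuples are hypothetical, cover English wording only, and would need the Chinese wording your users actually type.

```python
import re

# Hypothetical keyword lists based on the verbs named in the rules above.
PROMPT_ONLY_WORDS = ("optimize", "rewrite", "improve", "review")
GENERATE_WORDS = ("generate", "draw", "create", "make an image")


def _mentions(words: tuple, text: str) -> bool:
    """True when any keyword appears in text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(word)}\b", text) for word in words)


def decide_mode(request: str) -> str:
    """Return "prompt-only", "generate", or "unclear" for a raw user request."""
    text = request.lower()
    # Review wording is checked first, so "improve this prompt" never calls the API.
    if _mentions(PROMPT_ONLY_WORDS, text):
        return "prompt-only"
    if _mentions(GENERATE_WORDS, text):
        return "generate"
    return "unclear"
```

An `"unclear"` result leaves the decision to the caller, for example to ask the one concise question described above.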
## API Endpoint
- Base: `https://aistudio.baidu.com/llm/lmapi/v3`
- Submit: `POST /images/generations`
- Full URL: `https://aistudio.baidu.com/llm/lmapi/v3/images/generations`
- Auth header: `Authorization: bearer <BAIDU_AISTUDIO_API_KEY>`
- Platform header: `X-Client-Platform: aistudio`
## API Key
- Required environment variable: `BAIDU_AISTUDIO_API_KEY`
- Get a key: `https://aistudio.baidu.com/account/accessToken`
- If the key is missing, do not call the API. Tell the user to set `BAIDU_AISTUDIO_API_KEY`.
## Triggers
- Chinese-language triggers: the Chinese equivalents of `ERNIE image: <prompt>`, `Wenxin image: <prompt>`, and `generate image: <prompt>`, or any other Chinese wording that asks for image generation.
- English examples: `ernie image: <prompt>`, `generate image: <prompt>`, `create image: <prompt>`.
- Treat text after the colon as the raw user prompt, improve it, choose a preset, then generate.
- If the user asks to optimize, rewrite, improve, or review a prompt, return prompt-only guidance and do not call the API.
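
A minimal sketch of the trigger parsing described above, assuming the prefixes are matched case-insensitively and that a full-width colon should be accepted as well. The function name `split_trigger` and the exact prefix list are illustrative, not part of the bundled script.

```python
import re

# Prefixes taken from the examples above; extend the alternation as needed.
_TRIGGER = re.compile(
    r"^\s*(ernie image|wenxin image|generate image|create image)\s*[:：]\s*(.+)$",
    re.IGNORECASE | re.DOTALL,
)


def split_trigger(message: str):
    """Return the raw prompt after a trigger prefix, or None if nothing matches."""
    match = _TRIGGER.match(message)
    if match is None:
        return None
    prompt = match.group(2).strip()
    # An empty prompt after the colon is treated as no trigger.
    return prompt or None
```

The returned text is still the raw user prompt; it should be improved and paired with a preset before generation.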
## Prompt Workflow
1. Classify the image style: photorealistic, anime/manga, text-in-image, concept art, abstract/artistic, layout/composition, poster, ecommerce, infographic, comic/storyboard, UI screenshot style, or character-consistent visual.
2. Preserve immutable constraints: exact in-image text, language, subject count, character identity, spatial relationships, size, style, and forbidden elements.
3. Build the core prompt in five parts: subject -> action/context -> style -> lighting -> quality.
4. For layout-sensitive requests, append composition -> exact text -> spatial placement.
5. Keep in-image writing short when possible. Turn paragraphs into titles, labels, badges, or numbered lines.
6. For text rendering, put exact wording in quotes and specify placement, font weight, alignment, color, background contrast, and whitespace.
7. Choose a preset from `auto`, `text-poster`, `infographic`, `comic`, `product`, `ui`, `photo`, `concept`, or `abstract`.
8. Before generation, state:
```markdown
Final Prompt: <prompt>
Preset: <preset>
use_pe: <true or false>
Size: <size>
Reason: <why these settings fit ERNIE-Image>
```
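
If the summary in step 8 is produced programmatically, a helper along these lines keeps the field order stable. The name `render_plan` is hypothetical; the field labels are the ones listed in step 8.

```python
def render_plan(prompt: str, preset: str, use_pe: bool, size: str, reason: str) -> str:
    """Format the pre-generation summary with the fields from step 8, in order."""
    return "\n".join(
        [
            f"Final Prompt: {prompt}",
            f"Preset: {preset}",
            # Lowercase booleans match the JSON payload spelling.
            f"use_pe: {'true' if use_pe else 'false'}",
            f"Size: {size}",
            f"Reason: {reason}",
        ]
    )
```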
## Generation Workflow
Use the bundled Python script. Prefer `python3`; on Windows use `python` or `py` if needed.
```bash
python3 {baseDir}/scripts/generate.py --prompt "<FINAL_PROMPT>" --preset <preset>
```
For exact text, bilingual labels, UI, flowcharts, signs, comics, or already detailed prompts, pass `--no-use-pe`.
```bash
python3 {baseDir}/scripts/generate.py --prompt "<FINAL_PROMPT>" --preset text-poster --no-use-pe
```
The script prints `IMAGE_URL:<url>` for URL responses and `MEDIA:<absolute_path>` for each saved image. Return the saved media path to the user.
If `BAIDU_AISTUDIO_API_KEY` is missing, tell the user to get a key from `https://aistudio.baidu.com/account/accessToken` and set `BAIDU_AISTUDIO_API_KEY`.
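
When the script is driven from other Python code, its output lines can be collected as sketched below. Both helper names (`run_generate`, `parse_script_output`) are hypothetical, and the sketch assumes Python 3.9 or later and that the script writes one `IMAGE_URL:` or `MEDIA:` entry per line, as described above.

```python
import subprocess
import sys


def run_generate(script_path: str, prompt: str, extra_args: list[str]) -> str:
    """Run the bundled script without a shell and return its stdout."""
    # A list argv avoids shell quoting problems with user-written prompts.
    command = [sys.executable, script_path, "--prompt", prompt, *extra_args]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def parse_script_output(stdout: str) -> dict[str, list[str]]:
    """Collect IMAGE_URL: and MEDIA: values from the script's stdout."""
    found: dict[str, list[str]] = {"IMAGE_URL": [], "MEDIA": []}
    for line in stdout.splitlines():
        # Split on the first colon only, so URLs and Windows paths stay intact.
        key, sep, value = line.partition(":")
        if sep and key in found and value:
            found[key].append(value.strip())
    return found
```

The `MEDIA` entries are the saved paths to return to the user; the `IMAGE_URL` entries may expire.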
## Submit Payload
```json
{
  "model": "ERNIE-Image-Turbo",
  "prompt": "<FINAL_PROMPT>",
  "n": 1,
  "response_format": "url",
  "size": "1024x1024",
  "seed": 42,
  "use_pe": true,
  "num_inference_steps": 8,
  "guidance_scale": 1.0
}
```
## Download and Output
- `response_format=url` returns image URLs in `data[]`; the script prints `IMAGE_URL:<url>`.
- The script downloads each URL immediately and saves the image locally.
- The script prints `MEDIA:<absolute_path>` for OpenClaw/ClawHub auto-attach.
- URLs may expire; the local file remains available after download.
- Output names are generated as `ernie-image-<timestamp>-<index>.<ext>`.
- Do not pass user-controlled filenames to shell commands.
## Defaults
- Model: `ERNIE-Image-Turbo`
- Preset: `auto`
- Count: `1`
- Response format: `url`
- Seed: `42`
- `text-poster`, `infographic`, `comic`, `product`, and `ui` presets default to `use_pe=false`.
- `photo`, `concept`, and `abstract` presets default to `use_pe=true`.
## Negative Prompt Rules
- Do not list `text`, `letters`, `typography`, `Chinese text`, or `English text` as exclusions when the user wants readable writing; they suppress the very text being requested.
- Prefer precise negatives: distorted text, misspelled words, duplicated letters, unreadable typography, warped layout, cropped title, low contrast, blurry details, inconsistent panels, artifacts.
- The API does not expose a separate negative prompt field in this skill. Express exclusions as natural language constraints inside the prompt, such as "avoid cluttered background" or "no visible watermark".
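
Because exclusions travel inside the prompt, they can be appended as "avoid ..." clauses. The sketch below is one way to do that; the helper name `add_exclusions` and the `wants_text` flag are assumptions for illustration, and the blocked terms are the ones named in the first rule.

```python
# Terms that would suppress readable writing; taken from the first rule above.
TEXT_SUPPRESSING = {"text", "letters", "typography", "chinese text", "english text"}


def add_exclusions(prompt: str, exclusions: list, wants_text: bool) -> str:
    """Append exclusions to the prompt as natural-language "avoid ..." clauses."""
    kept = [
        item.strip()
        for item in exclusions
        if item.strip()
        and not (wants_text and item.strip().lower() in TEXT_SUPPRESSING)
    ]
    if not kept:
        return prompt
    clauses = ", ".join(f"avoid {item}" for item in kept)
    # Drop trailing punctuation so the clauses join onto the last sentence.
    return f"{prompt.rstrip(' .,')}, {clauses}."
```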
## Retry Strategy
- Text errors: reduce the amount of visible text, quote exact words once, add stronger placement and contrast, then use `--no-use-pe`.
- Layout errors: simplify object count, name each region, use grid/split-screen/foreground/background terms, then keep the same seed.
- Weak style: add camera/lens, art movement, medium, color temperature, material texture, and lighting direction.
- Cluttered image: remove secondary elements, add negative space, use "avoid cluttered background", and switch to a simpler preset if needed.
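
The CLI side of the retry rules can be sketched as follows. The function name `retry_args` and the failure labels `"text"` and `"layout"` are hypothetical; prompt rewrites (shorter text, named regions, added style terms) still have to be done separately.

```python
def retry_args(failure: str, args: list) -> list:
    """Return adjusted CLI arguments for one retry; prompt edits are left to the caller."""
    adjusted = list(args)
    if failure == "text":
        # Text errors: disable the prompt enhancer so quoted wording is sent unchanged.
        adjusted = [arg for arg in adjusted if arg != "--use-pe"]
        if "--no-use-pe" not in adjusted:
            adjusted.append("--no-use-pe")
    elif failure == "layout":
        # Layout errors: pin the seed (42 is the script default) so only the prompt changes.
        if "--seed" not in adjusted:
            adjusted += ["--seed", "42"]
    return adjusted
```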
## References
- Read `references/api.md` for parameters, command examples, and endpoint mapping.
- Read `references/prompt-architecture.md` for ERNIE-Image prompt templates.
- Read `references/examples.md` for acceptance-style examples.
FILE:scripts/generate.py
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import base64
import datetime as dt
import json
import os
from pathlib import Path
import sys
import urllib.error
import urllib.request
from urllib.parse import urlparse
API_URL = "https://aistudio.baidu.com/llm/lmapi/v3/images/generations"
KEY_URL = "https://aistudio.baidu.com/account/accessToken"
ENV_NAME = "BAIDU_AISTUDIO_API_KEY"

SIZES = (
    "1024x1024",
    "1376x768",
    "1264x848",
    "1200x896",
    "896x1200",
    "848x1264",
    "768x1376",
)
MODELS = ("ERNIE-Image-Turbo", "ERNIE-Image")
PRESETS = (
    "auto",
    "text-poster",
    "infographic",
    "comic",
    "product",
    "ui",
    "photo",
    "concept",
    "abstract",
)
PRESET_SETTINGS = {
    "text-poster": {
        "size": "896x1200",
        "use_pe": False,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "infographic": {
        "size": "1376x768",
        "use_pe": False,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "comic": {
        "size": "1024x1024",
        "use_pe": False,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "product": {
        "size": "1024x1024",
        "use_pe": False,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "ui": {
        "size": "768x1376",
        "use_pe": False,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "photo": {
        "size": "1024x1024",
        "use_pe": True,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "concept": {
        "size": "1376x768",
        "use_pe": True,
        "steps": 8,
        "guidance_scale": 1.0,
    },
    "abstract": {
        "size": "896x1200",
        "use_pe": True,
        "steps": 8,
        "guidance_scale": 1.0,
    },
}
FALLBACK_SETTINGS = {
    "size": "1024x1024",
    "use_pe": True,
    "steps": 8,
    "guidance_scale": 1.0,
}


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Generate images with ERNIE-Image through Baidu AI Studio."
    )
    parser.add_argument("--prompt", required=True, help="Text-to-image prompt.")
    parser.add_argument("--model", default="ERNIE-Image-Turbo", choices=MODELS)
    parser.add_argument(
        "--preset",
        default="auto",
        choices=PRESETS,
        help="Scene preset that chooses size, use_pe, steps, and guidance defaults.",
    )
    parser.add_argument("--n", type=int, default=1, choices=(1, 2, 3, 4))
    parser.add_argument(
        "--response-format", default="url", choices=("url", "b64_json")
    )
    # None means "not given", so resolve_settings can fall back to the preset value.
    parser.add_argument("--size", default=None, choices=SIZES)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--steps", type=int, default=None)
    parser.add_argument("--guidance-scale", type=float, default=None)
    # --use-pe and --no-use-pe share one destination; it stays None when neither is passed.
    pe_group = parser.add_mutually_exclusive_group()
    pe_group.add_argument("--use-pe", dest="use_pe", action="store_true", default=None)
    pe_group.add_argument("--no-use-pe", dest="use_pe", action="store_false")
    parser.add_argument("--out-dir", default=".", help="Directory for saved images.")
    parser.add_argument(
        "--dry-run", action="store_true", help="Print request JSON and exit."
    )
    return parser.parse_args(argv)


def infer_preset(prompt: str) -> str:
    text = prompt.lower()
    # Checked in declaration order: the first preset with any keyword hit wins.
    checks = (
        (
            "infographic",
            (
                "infographic",
                "flowchart",
                "diagram",
                "timeline",
                "process",
                "chart",
                "流程图",
                "信息图",
                "步骤",
            ),
        ),
        (
            "comic",
            (
                "comic",
                "manga",
                "storyboard",
                "panel",
                "four-panel",
                "四格",
                "漫画",
                "分镜",
            ),
        ),
        (
            "ui",
            (
                "ui",
                "screenshot",
                "app screen",
                "dashboard",
                "interface",
                "启动页",
                "界面",
                "截图",
            ),
        ),
        (
            "product",
            (
                "product",
                "ecommerce",
                "hero image",
                "commercial shot",
                "产品",
                "电商",
                "主图",
            ),
        ),
        (
            "text-poster",
            (
                "exact text",
                "heading",
                "title",
                "poster",
                "banner",
                "label",
                "sign",
                "typography",
                "文字",
                "标题",
                "海报",
                "横幅",
                "说明牌",
            ),
        ),
        (
            "abstract",
            (
                "abstract",
                "bauhaus",
                "geometric",
                "surreal",
                "artistic",
                "抽象",
                "艺术",
            ),
        ),
        (
            "concept",
            (
                "concept art",
                "sci-fi",
                "fantasy",
                "worldbuilding",
                "environment design",
                "概念图",
                "科幻",
                "奇幻",
            ),
        ),
        (
            "photo",
            (
                "photorealistic",
                "photo",
                "photograph",
                "portrait",
                "camera",
                "lens",
                "摄影",
                "照片",
                "写实",
            ),
        ),
    )
    # Plain substring matching: short keywords such as "ui" or "sign" also match
    # inside longer words (for example "build" or "design").
    for preset, keywords in checks:
        if any(keyword in text for keyword in keywords):
            return preset
    # No keyword matched: fall back to the photo preset.
    return "photo"


def resolve_settings(args: argparse.Namespace) -> dict:
    preset = infer_preset(args.prompt) if args.preset == "auto" else args.preset
    settings = dict(FALLBACK_SETTINGS)
    settings.update(PRESET_SETTINGS.get(preset, {}))
    # The non-Turbo model gets higher defaults, but only when --steps was not passed.
    if args.model == "ERNIE-Image" and args.steps is None:
        settings["steps"] = 50
        settings["guidance_scale"] = 4.0
    # Explicit CLI values override preset and model defaults.
    if args.size is not None:
        settings["size"] = args.size
    if args.steps is not None:
        settings["steps"] = args.steps
    if args.guidance_scale is not None:
        settings["guidance_scale"] = args.guidance_scale
    if args.use_pe is not None:
        settings["use_pe"] = args.use_pe
    settings["preset"] = preset
    return settings


def build_payload(args: argparse.Namespace) -> dict:
    settings = resolve_settings(args)
    return {
        "model": args.model,
        "prompt": args.prompt,
        "n": args.n,
        "response_format": args.response_format,
        "size": settings["size"],
        "seed": args.seed,
        "use_pe": settings["use_pe"],
        "num_inference_steps": settings["steps"],
        "guidance_scale": settings["guidance_scale"],
    }


def request_generation(api_key: str, payload: dict) -> dict:
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Authorization": f"bearer {api_key}",
        "Content-Type": "application/json",
        "X-Client-Platform": "aistudio",
        "Accept": "application/json",
    }
    req = urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            raw = resp.read()
    except urllib.error.HTTPError as exc:
        error_body = exc.read().decode("utf-8", "replace")
        raise RuntimeError(f"HTTP {exc.code} {exc.reason}: {error_body}") from None
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Network error: {exc.reason}") from None
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError:
        raise RuntimeError(f"Non-JSON response: {raw[:300]!r}") from None
    if "error" in parsed:
        raise RuntimeError(f"API error: {parsed['error']}")
    return parsed


def ensure_out_dir(path: str) -> Path:
    out_dir = Path(path).expanduser().resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


def timestamp_name(index: int, suffix: str = ".png") -> str:
    stamp = dt.datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    return f"ernie-image-{stamp}-{index}{suffix}"


def extension_from_url(url: str) -> str:
    suffix = Path(urlparse(url).path).suffix.lower()
    if suffix in {".png", ".jpg", ".jpeg", ".webp"}:
        return suffix
    return ".png"


def download_url(url: str, out_path: Path) -> None:
    req = urllib.request.Request(
        url, headers={"User-Agent": "OpenClaw ERNIE-Image Skill"}
    )
    try:
        with urllib.request.urlopen(req, timeout=180) as resp:
            data = resp.read()
    except Exception as exc:
        raise RuntimeError(f"Failed to fetch generated image: {exc}") from None
    out_path.write_bytes(data)


def save_b64(data: str, out_path: Path) -> None:
    try:
        out_path.write_bytes(base64.b64decode(data))
    except Exception as exc:
        raise RuntimeError(f"Failed to decode b64_json image: {exc}") from None


def response_items(response: dict) -> list[dict]:
    data = response.get("data")
    if not isinstance(data, list) or not data:
        raise RuntimeError(f"Response did not include image data: {response}")
    items = []
    for item in data:
        if isinstance(item, dict):
            items.append(item)
        else:
            raise RuntimeError(f"Unexpected image item: {item!r}")
    return items


def save_outputs(response: dict, response_format: str, out_dir: Path) -> None:
    for index, item in enumerate(response_items(response), start=1):
        if response_format == "url":
            url = item.get("url")
            if not url:
                raise RuntimeError(f"Image item missing URL: {item}")
            print(f"IMAGE_URL:{url}")
            out_path = out_dir / timestamp_name(index, extension_from_url(url))
            download_url(url, out_path)
        else:
            b64 = item.get("b64_json")
            if not b64:
                raise RuntimeError(f"Image item missing b64_json: {item}")
            out_path = out_dir / timestamp_name(index, ".png")
            save_b64(b64, out_path)
        print(f"MEDIA:{out_path.resolve()}")


def main(argv: list[str]) -> int:
    args = parse_args(argv)
    payload = build_payload(args)
    if args.dry_run:
        print(json.dumps(payload, ensure_ascii=False, indent=2))
        return 0
    api_key = os.getenv(ENV_NAME, "").strip()
    if not api_key:
        print(
            f"Error: set {ENV_NAME} before generating. Get a key from {KEY_URL}.",
            file=sys.stderr,
        )
        return 2
    out_dir = ensure_out_dir(args.out_dir)
    response = request_generation(api_key, payload)
    save_outputs(response, args.response_format, out_dir)
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main(sys.argv[1:]))
    except Exception as exc:
        print(f"Error: {exc}", file=sys.stderr)
        raise SystemExit(1)
FILE:references/api.md
# AI Studio ERNIE-Image API
This skill uses the AI Studio OpenAI-compatible image endpoint through the bundled zero-dependency script.
## API Key
- Environment variable: `BAIDU_AISTUDIO_API_KEY`
- Key page: `https://aistudio.baidu.com/account/accessToken`
- Missing key behavior: do not call the API; ask the user to set the env var.
## API Endpoint
- Base: `https://aistudio.baidu.com/llm/lmapi/v3`
- Submit: `POST /images/generations`
- Full URL: `https://aistudio.baidu.com/llm/lmapi/v3/images/generations`
- Headers:
  - `Authorization: bearer <BAIDU_AISTUDIO_API_KEY>`
  - `Content-Type: application/json`
  - `X-Client-Platform: aistudio`
## Step 1 - Submit Generation
The ERNIE-Image endpoint returns `data[]` synchronously. There is no task id and no polling step in this skill.
Default payload:
```json
{
  "model": "ERNIE-Image-Turbo",
  "prompt": "<FINAL_PROMPT>",
  "n": 1,
  "response_format": "url",
  "size": "1024x1024",
  "seed": 42,
  "use_pe": true,
  "num_inference_steps": 8,
  "guidance_scale": 1.0
}
```
Supported values:
| Field | Values |
|---|---|
| `model` | `ERNIE-Image-Turbo`, `ERNIE-Image` |
| `n` | `1`, `2`, `3`, `4` |
| `response_format` | `url`, `b64_json` |
| `size` | `1024x1024`, `1376x768`, `1264x848`, `1200x896`, `896x1200`, `848x1264`, `768x1376` |
| `seed` | integer |
| `use_pe` | boolean |
| `num_inference_steps` | integer |
| `guidance_scale` | number |
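
A payload can be checked against the table above before it is sent. The sketch below is not part of the bundled script; `payload_errors` and `ALLOWED` are hypothetical names, and the allowed values are copied from the table.

```python
# Allowed values copied from the "Supported values" table.
ALLOWED = {
    "model": ("ERNIE-Image-Turbo", "ERNIE-Image"),
    "n": (1, 2, 3, 4),
    "response_format": ("url", "b64_json"),
    "size": (
        "1024x1024", "1376x768", "1264x848", "1200x896",
        "896x1200", "848x1264", "768x1376",
    ),
}


def payload_errors(payload: dict) -> list:
    """Return human-readable problems; an empty list means the payload looks valid."""
    errors = []
    for field, allowed in ALLOWED.items():
        if payload.get(field) not in allowed:
            errors.append(f"{field}: {payload.get(field)!r} is not supported")
    # bool is a subclass of int in Python, so reject it explicitly for integer fields.
    for field in ("seed", "num_inference_steps"):
        value = payload.get(field)
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{field}: expected an integer")
    if not isinstance(payload.get("use_pe"), bool):
        errors.append("use_pe: expected a boolean")
    scale = payload.get("guidance_scale")
    if isinstance(scale, bool) or not isinstance(scale, (int, float)):
        errors.append("guidance_scale: expected a number")
    return errors
```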
## Step 2 - Save Result
- For `response_format=url`, read `data[i].url`, print `IMAGE_URL:<url>`, download it immediately, and save a local image.
- For `response_format=b64_json`, read `data[i].b64_json`, decode it, and save a local PNG.
- Local filenames are generated by the script as `ernie-image-<timestamp>-<index>.<ext>`.
- The script never uses a user-controlled output filename in shell commands.
## Step 3 - Output Media
Print:
```text
MEDIA:<absolute_path>
```
OpenClaw/ClawHub can auto-attach this file. URL outputs may expire, but the downloaded local file remains available.
## Presets
The bundled script adds a `--preset` layer before sending API fields. User-supplied CLI values override preset defaults.
| Preset | Size | use_pe | Steps | Guidance | Use for |
|---|---:|---:|---:|---:|---|
| `auto` | inferred | inferred | inferred | inferred | Infer from prompt keywords; falls back to `photo` when none match |
| `text-poster` | `896x1200` | false | 8 | 1.0 | posters, banners, signage, exact text |
| `infographic` | `1376x768` | false | 8 | 1.0 | flowcharts, diagrams, process graphics |
| `comic` | `1024x1024` | false | 8 | 1.0 | comic panels, storyboards, speech bubbles |
| `product` | `1024x1024` | false | 8 | 1.0 | ecommerce hero images and labeled product shots |
| `ui` | `768x1376` | false | 8 | 1.0 | app screens, dashboards, UI mockups |
| `photo` | `1024x1024` | true | 8 | 1.0 | photorealistic scenes and product photography |
| `concept` | `1376x768` | true | 8 | 1.0 | sci-fi, fantasy, environments, worldbuilding |
| `abstract` | `896x1200` | true | 8 | 1.0 | abstract posters and artistic compositions |
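If `--model ERNIE-Image` is used and `--steps` is not provided, the script raises the defaults to 50 steps and guidance scale 4.0. Passing `--steps` skips both changes, so guidance stays at the preset value unless `--guidance-scale` is also set.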
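
The `auto` preset in the bundled script tests each keyword with a plain substring check, so short keywords such as `ui` also match inside words like `building`. If that proves too loose, a word-boundary variant could look like the sketch below. `keyword_hit` is a hypothetical helper, not something the script defines, and it assumes the prompt has already been lowercased.

```python
import re


def keyword_hit(keyword: str, text: str) -> bool:
    """Match ASCII keywords on word boundaries; match other keywords as substrings."""
    if keyword.isascii():
        # Lookarounds stop "ui" from matching inside "building" or "fruit".
        pattern = rf"(?<![a-z0-9]){re.escape(keyword)}(?![a-z0-9])"
        return re.search(pattern, text) is not None
    # Chinese keywords have no word separators, so substring matching is kept.
    return keyword in text
```

Swapping this in for the `keyword in text` test would change which preset some prompts receive, so compare results with `--dry-run` first.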
## Bundled Script
Default generation:
```bash
python3 {baseDir}/scripts/generate.py --prompt "一只可爱的猫咪坐在窗台上" --preset auto
```
Precision text generation:
```bash
python3 {baseDir}/scripts/generate.py --prompt "A spring sale poster with exact text \"Spring Sale 50% OFF\" centered in the image" --preset text-poster --no-use-pe
```
Dry-run request:
```bash
python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --dry-run
```
## Python SDK Shape
```python
import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["BAIDU_AISTUDIO_API_KEY"],
    base_url="https://aistudio.baidu.com/llm/lmapi/v3",
    default_headers={"X-Client-Platform": "aistudio"},
)
img = client.images.generate(
    model="ERNIE-Image-Turbo",
    prompt="一只可爱的猫咪坐在窗台上",
    n=1,
    response_format="url",
    size="1024x1024",
    # Fields outside the standard OpenAI image parameters go in extra_body.
    extra_body={
        "seed": 42,
        "use_pe": True,
        "num_inference_steps": 8,
        "guidance_scale": 1.0,
    },
)
print(img.data[0].url)
```
## curl Shape
```bash
curl --location "https://aistudio.baidu.com/llm/lmapi/v3/images/generations" \
  --header "Authorization: bearer $BAIDU_AISTUDIO_API_KEY" \
  --header "Content-Type: application/json" \
  --header "X-Client-Platform: aistudio" \
  --data '{
    "model": "ERNIE-Image-Turbo",
    "prompt": "一只可爱的猫咪坐在窗台上",
    "n": 1,
    "response_format": "url",
    "size": "1024x1024",
    "seed": 42,
    "use_pe": true,
    "num_inference_steps": 8,
    "guidance_scale": 1.0
  }'
```
## Prompt Enhancer and Exclusions
- Use `use_pe=true` to expand short creative prompts.
- Use `use_pe=false` when exact text, labels, flowchart order, UI text, multi-panel order, or character consistency matters.
- This API call does not send a separate `negative_prompt` field. Put exclusions into the positive prompt as natural language constraints, such as `avoid cluttered background`, `no visible watermark`, or `keep the title uncropped`.
FILE:references/examples.md
# Examples
Use these examples to calibrate prompt quality and generation settings. They are template-style examples adapted for this skill; do not treat them as official wording.
## Style Category Examples
### Photorealistic
Input: `拍一个商业咖啡杯产品图`
Prompt: Photorealistic close-up commercial product photograph of a matte ceramic coffee mug on a light oak table, cup centered with handle visible, soft morning sunlight from the left, gentle shadow on the right, subtle background blur, visible ceramic texture, clean ecommerce styling, sharp detail, realistic reflections, avoid cluttered background.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset photo`
### Anime & Manga
Input: `画一个90年代动画风的图书馆少女`
Prompt: 1990s anime-style illustration of a cheerful teenage librarian character arranging books in a sunlit library, short chestnut hair, round glasses, blue cardigan, expressive eyes, clean ink linework, warm watercolor tones, soft afternoon window light, consistent character proportions, detailed shelves in the background.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset abstract --size 896x1200`
### Text in Image
Input: `产品横幅,文字 NEW ARRIVAL`
Prompt: Minimalist product banner with exact text "NEW ARRIVAL" centered at the top in bold clean sans-serif typography, matte white background, a single rose-colored skincare jar centered below the title, strong contrast, generous whitespace, professional ecommerce photography, avoid extra text overlays.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset text-poster`
### Concept Art
Input: `科幻城市概念图`
Prompt: Cinematic concept art of a futuristic coastal city built around a glowing vertical energy tower, small aircraft in the sky for scale, foreground observation deck with tiny silhouettes, layered bridges in the midground, ocean and storm clouds in the background, volumetric blue rim light, misty atmosphere, detailed production design, coherent spatial layout.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset concept`
### Abstract & Artistic
Input: `抽象海报,主题是时间流动`
Prompt: Abstract artistic poster about the flow of time, Bauhaus-inspired geometric composition, overlapping circles and diagonal lines suggesting motion, deep navy and warm amber palette with silver accents, flat screen-print texture, balanced negative space, clean modern gallery-poster finish, avoid cluttered background.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset abstract`
### Layout & Composition
Input: `左右对比图,过去和未来`
Prompt: Split-screen conceptual poster comparing past and future, left half shows an old analog clock and cracked stone texture, right half shows a clean glowing city skyline, vertical divider at the center, equal visual weight on both sides, strong left-right contrast, centered symmetry, title space at the top, balanced negative space, cinematic lighting.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset concept`
## Acceptance Examples
### 1. Coffee Poster
Input: `做一张咖啡海报`
Prompt: Create a warm modern coffee poster. Place a ceramic cup of latte in the center on a wooden table, with visible steam rising upward. Add the exact title "Fresh Coffee" at the top center in large cream-colored serif lettering, and add the smaller text "Start your morning right" along the bottom. Use a deep espresso and cream color palette, soft morning window light, shallow depth of field, clean margins, readable typography, and a premium cafe advertising style.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset text-poster --no-use-pe`
### 2. Exact English Sale Poster
Input: `生成海报,必须写 Spring Sale 50% OFF`
Prompt: Create a fresh spring retail sale poster. Use a bright white and soft green background with flowers around the edges and clean open space in the center. Show the exact text "Spring Sale 50% OFF" in large readable lettering at the center, with "Spring Sale" above "50% OFF". Use crisp typography, strong contrast, soft daylight, balanced margins, and a modern retail poster style.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset text-poster`
### 3. Bilingual Flowchart
Input: `一张中英文双语流程图,标题是 The Coffee Making Process`
Prompt: Create a clean bilingual coffee-making flowchart infographic. Put the exact title "The Coffee Making Process" at the top center. Arrange five steps left to right with simple icons and exact labels: "1. Grind / 研磨", "2. Heat Water / 烧水", "3. Brew / 冲煮", "4. Pour / 倒入", "5. Enjoy / 享用". Use a cream background, dark brown text, aligned arrows, consistent spacing, readable typography, high contrast, and clean vector-style illustrations.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset infographic`
### 4. Four-Panel Comic
Input: `四格漫画,一个机器人学会做饭`
Prompt: Create a four-panel comic in a 2x2 grid with clean borders and consistent character design. Panel 1: a small friendly silver robot opens a cookbook in a bright kitchen, speech bubble "I can learn this." Panel 2: the robot carefully chops vegetables with a focused expression, speech bubble "Step one: be careful." Panel 3: the robot stirs a soup pot while steam forms a heart shape, speech bubble "It smells good!" Panel 4: the robot serves a colorful meal to a smiling human friend, speech bubble "Dinner is ready." Keep the robot's body shape, face screen, colors, and apron consistent across all panels.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset comic`
### 5. Ecommerce Product Image
Input: `电商主图,白色无线耳机,突出降噪和长续航`
Prompt: Create a premium ecommerce hero image for white wireless earbuds. Place the earbuds and charging case in the center on a soft light-gray background, with glossy reflections and rim lighting. Add two small feature callouts around the product with exact text: "主动降噪" and "36小时续航". Use clean spacing, accurate product shape, realistic materials, sharp edges, subtle shadows, and a high-end technology advertising style.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset product`
### 6. App Launch Screen
Input: `APP启动页,名字是 MindGarden`
Prompt: Create a polished mobile app launch screen for a wellness app. Use a vertical 9:16 layout with a calm illustrated garden at dawn. Put the exact app name "MindGarden" centered in the upper third in clean rounded typography. Place a small leaf logo above the name and the tagline "Grow a calmer day" near the bottom. Use soft green, white, and warm sunlight, spacious composition, crisp UI-style text, and aligned mobile-screen proportions.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset ui`
### 7. Text-Dense Sign
Input: `做一张说明牌,标题 Safety Rules,内容三条:Wear goggles, Keep hands clear, Stop before cleaning`
Prompt: Create a front-facing safety instruction board with a white background and dark navy text. Put the exact title "Safety Rules" at the top in large bold lettering. Below it, show three numbered lines with exact text: "1. Wear goggles", "2. Keep hands clear", "3. Stop before cleaning". Use simple safety icons beside each line, strong contrast, consistent line spacing, and a clean industrial signage style.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset text-poster --size 1200x896`
### 8. Detailed Prompt Minimal Rewrite
Input: `A detailed fantasy city at sunset, floating bridges, blue roofs, orange sky, two airships, cinematic lighting, wide angle, no people, highly detailed`
Prompt: Create a wide-angle cinematic fantasy city at sunset. Show blue-roofed towers connected by floating bridges, with exactly two airships in the orange sky. Keep the city highly detailed, with warm rim lighting, atmospheric depth, no people, and a grand establishing-shot composition.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<prompt>" --preset concept --no-use-pe`
## Repair Examples
### Text Is Misspelled
Problem: the generated poster misspelled "Spring Sale 50% OFF".
Repair Prompt: Create a simple spring retail poster with only one visible text string: "Spring Sale 50% OFF". Place that exact text in the center, bold sans-serif, dark green letters on a plain white background, high contrast, large margins, no other text, avoid distorted letters.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<repair prompt>" --preset text-poster --seed 42`
### Comic Panels Bleed Together
Problem: a four-panel comic merged scenes or mixed dialogue.
Repair Prompt: Create a strict 2x2 four-panel comic grid with thick black borders and clear separation. Panel 1 only: robot reads cookbook, speech bubble "I can learn this." Panel 2 only: robot chops vegetables, speech bubble "Step one: be careful." Panel 3 only: robot stirs soup, speech bubble "It smells good!" Panel 4 only: robot serves dinner, speech bubble "Dinner is ready." Keep one consistent silver robot in every panel, avoid overlapping panels.
Command: `python3 {baseDir}/scripts/generate.py --prompt "<repair prompt>" --preset comic --seed 42`
FILE:references/prompt-architecture.md
# Prompt Architecture
Use this reference to convert vague requests into ERNIE-Image-ready prompts. It adapts public ERNIE Image prompt-guide patterns into this skill's API workflow. Preserve exact visible text unchanged.
## Core Prompt Formula
Use the five-part base formula for most prompts:
1. Subject: who or what appears.
2. Action or context: what is happening and where.
3. Style: photography, anime, poster, UI, vector, oil paint, concept art, etc.
4. Lighting: direction, intensity, color temperature, mood.
5. Quality: detail level, lens or medium, sharpness, depth, finish.
For structured design tasks, append:
6. Composition: grid, rule of thirds, split screen, centered symmetry, foreground/background, negative space.
7. Exact text: quoted strings only, preferably short phrases or labels.
8. Spatial placement: title location, label position, relative item size, margins, contrast.
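
The formula can be applied mechanically once each part is written. The sketch below joins the parts in the order listed above and quotes the exact text; `build_prompt` and its parameter names are illustrative only.

```python
def build_prompt(
    subject: str,
    context: str = "",
    style: str = "",
    lighting: str = "",
    quality: str = "",
    composition: str = "",
    exact_text: str = "",
    placement: str = "",
) -> str:
    """Join the five base parts, then the three optional layout parts, in order."""
    ordered = [subject, context, style, lighting, quality, composition]
    if exact_text:
        # Quote the wording so it is passed through unchanged.
        ordered.append(f'exact text "{exact_text}"')
    ordered.append(placement)
    # Empty parts are skipped so short requests still read naturally.
    return ", ".join(part.strip() for part in ordered if part.strip())
```

For example, `build_prompt("Retail poster", exact_text="Spring Sale 50% OFF", placement="centered")` yields a prompt with the quoted title followed by its placement.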
## Style Categories
### Photorealistic
Use for product shots, portraits, architecture, food, interiors, and documentary scenes. Include camera/lens, lighting source, material texture, depth of field, subject placement, and realistic shadows.
Template: `Photorealistic [shot type] of [subject] in [environment], [camera/lens/framing], [light direction and color], [materials/textures], [background depth], sharp detail, realistic shadows, natural proportions.`
### Anime & Manga
Use for anime characters, manga panels, fantasy illustration, stylized scenes, and comic storytelling. Specify era or visual language, linework, color/grayscale, facial features, hair, outfit, and panel framing.
Template: `[anime/manga style] illustration of [character] [action/context], [hair/eyes/outfit/distinctive features], [background], [linework/color treatment], [lighting], consistent character design.`
### Text in Image
Use for posters, product banners, cards, covers, signage, infographic labels, and UI-like graphics. Keep text short where possible; long paragraphs should become a title plus labels or numbered lines. Put exact text in quotes.
Template: `[design type] with exact text "[text]" at [position], [font weight/style], [text color], [background contrast], [alignment], [surrounding whitespace], readable typography, clean layout.`
### Concept Art
Use for sci-fi, fantasy, game environments, creatures, vehicles, maps, and cinematic worldbuilding. Specify the main focal element, scale cues, foreground/midground/background, lighting effects, atmosphere, and production-design style.
Template: `Cinematic concept art of [subject/world], [foreground element], [midground], [background], [scale cues], [lighting effect], [atmosphere], detailed production design, coherent spatial layout.`
### Abstract & Artistic
Use for fine art, generative visuals, posters, album-art style images, geometric compositions, and expressive aesthetics. Specify movement, medium, dominant/accent colors, texture, temperature, and surface treatment.
Template: `[art movement/medium] composition of [theme], [dominant colors] with [accent colors], [texture/surface], [shape language], [mood], balanced negative space, high visual coherence.`
### Layout & Composition
Use for banners, ads, comparison graphics, editorial layouts, multi-product scenes, split-screen designs, and structured design mockups. Specify item count, relative size, alignment, spacing, and reading order.
Template: `[layout type] with [number] elements arranged in [grid/split/rule-of-thirds], [main subject] at [position], [secondary elements] at [position], [spacing], [negative space], [reading order], clean alignment.`
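The bracketed templates in this section can also be filled programmatically. The following is a minimal sketch, not part of this skill's scripts: `fill_template` and the example slot values are hypothetical, and it assumes slots are filled strictly left to right.

```python
import re
from typing import Sequence

# Matches one bracketed slot such as [subject] or [grid/split/rule-of-thirds].
SLOT = re.compile(r"\[[^\]]+\]")


def fill_template(template: str, values: Sequence[str]) -> str:
    """Replace bracketed slots, left to right, with the given values."""
    slots = SLOT.findall(template)
    if len(slots) != len(values):
        raise ValueError(
            f"template has {len(slots)} slots but {len(values)} values were given"
        )
    remaining = iter(values)
    return SLOT.sub(lambda _match: next(remaining), template)


# The Layout & Composition template, copied from this section.
LAYOUT_TEMPLATE = (
    "[layout type] with [number] elements arranged in [grid/split/rule-of-thirds], "
    "[main subject] at [position], [secondary elements] at [position], [spacing], "
    "[negative space], [reading order], clean alignment."
)

prompt = fill_template(LAYOUT_TEMPLATE, [
    "Comparison banner", "3", "a three-column grid",
    "a large ceramic teapot", "center",
    "two smaller matching cups", "left and right",
    "equal gaps between items", "generous negative space above the products",
    "left-to-right reading order",
])
print(prompt)
```

Failing on a slot-count mismatch is deliberate: an unfilled `[placeholder]` left in a prompt may be rendered as literal text in the image.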
## `use_pe` Strategy
Use `--use-pe` for short creative ideas where added detail helps: simple animals, mood images, broad concept art, atmospheric scenes, style exploration, or prompts missing lighting/material/composition.
Use `--no-use-pe` for exact text, bilingual labels, signs, flowcharts, UI text, strict poster layout, multi-panel comics, character consistency, or long prompts that already specify composition and style.
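The two rules above can be approximated with a small heuristic. This is only a sketch under stated assumptions: `suggest_use_pe_flag`, the keyword list, and the 60-word threshold for a "long" prompt are all hypothetical choices, not behavior of the ERNIE-Image API or of this skill's scripts.

```python
import re

# Exact text is expected inside straight or curly double quotes.
QUOTED = re.compile(r'"[^"]+"|\u201c[^\u201d]+\u201d')

# Assumed keywords that signal strict text or layout requirements.
STRICT = re.compile(
    r"\b(panels?|comic|storyboard|flowchart|bilingual|labels?|signs?|signage|ui)\b",
    re.IGNORECASE,
)


def suggest_use_pe_flag(prompt: str, long_prompt_words: int = 60) -> str:
    """Return '--use-pe' or '--no-use-pe' for a prompt (heuristic only)."""
    if QUOTED.search(prompt):
        return "--no-use-pe"  # exact text should not be rewritten
    if STRICT.search(prompt):
        return "--no-use-pe"  # strict layout, panels, or UI text
    if len(prompt.split()) >= long_prompt_words:
        return "--no-use-pe"  # long prompt, assumed to be fully specified
    return "--use-pe"  # short creative idea that benefits from added detail


print(suggest_use_pe_flag("a sleepy red panda in morning fog"))
print(suggest_use_pe_flag('Poster with exact text "OPEN 24H" at the top'))
```

The whitespace word count is a poor proxy for Chinese prompts, which are not space-delimited, so treat the length check as English-only.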
## ERNIE-Image Quality Gates
- Exact text must be inside quotes.
- Keep each visible text string to roughly 8-10 words or fewer when possible. Split long copy into short labels or numbered lines.
- Multi-panel images must describe every panel separately, in order.
- Multi-object scenes must specify object count, relative position, relative size, and spacing.
- Product shots must specify product shape, material, camera angle, background, shadow, and label placement.
- UI or poster prompts must name title area, content area, call-to-action area, margins, and contrast.
- Exact text tasks should not depend on prompt enhancement. Use `--no-use-pe`.
- If the prompt contains both exact text and rich style, keep text/layout clauses early and style clauses late.
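Some of these gates can be checked mechanically before a prompt is submitted. The sketch below covers only the quoted-text length gate and the "text clauses early" gate. `check_text_gates` is a hypothetical helper, and treating "early" as "the first quoted string starts in the first half of the prompt" is an assumption made for illustration.

```python
import re
from typing import List

# Exact text is expected inside straight double quotes.
QUOTED = re.compile(r'"([^"]+)"')


def check_text_gates(prompt: str, max_words: int = 10) -> List[str]:
    """Return warnings for quoted text that is too long or placed late."""
    warnings: List[str] = []
    quoted = list(QUOTED.finditer(prompt))
    for match in quoted:
        word_count = len(match.group(1).split())
        if word_count > max_words:
            warnings.append(
                f"quoted text has {word_count} words (limit {max_words}); "
                "split it into short labels or numbered lines"
            )
    # Assumed reading of "early": the first quote starts in the first half.
    if quoted and quoted[0].start() > len(prompt) / 2:
        warnings.append(
            "first exact text appears late; move text/layout clauses earlier"
        )
    return warnings


for warning in check_text_gates(
    'Dreamy watercolor landscape with soft morning light, title "Hello"'
):
    print(warning)
```

An empty list means only that these two checks passed; the remaining gates (panel descriptions, object counts, product details) still need a manual read.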
## Failure Diagnosis
| Failure | Fix |
|---|---|
| Misspelled text | Shorten text, quote it once, specify location and high contrast, use `--no-use-pe`. |
| Missing text | Move exact text earlier in the prompt and remove competing labels. |
| Layout drift | Use grid, split-screen, top/bottom, left/right, foreground/background, and relative size terms. |
| Character inconsistency | Repeat hair, outfit, colors, accessories, and distinctive features in each panel or variant. |
| Product deformation | Simplify scene, describe material and silhouette, remove unrelated props. |
| Style conflict | Choose one primary style and make the other a minor texture or accent. |
| Cluttered result | Reduce object count, add negative space, and say "avoid cluttered background". |
| Weak cinematic quality | Add light direction, lens/camera, atmosphere, texture, and depth cues. |
## Common Mistakes and Fixes
- Too vague: add subject, context, style, lighting, and composition.
- Conflicting styles: choose one primary style; use the second style only as a texture or accent.
- Text overload: reduce paragraphs to short phrases, labels, or numbered lines.
- Missing spatial context: specify foreground, background, left/right/top/bottom, spacing, and relative scale.
- Weak text rendering: quote the exact text, place it explicitly, and specify high contrast.
- Overcrowded layout: reduce object count and add negative space.
## Task Templates
### Poster
Create a [poster type] for [topic/product/event]. Place [main subject] at [position]. Add the exact title "[title]" at [position] with [font style, weight, color, size, contrast]. Add exact supporting text "[subtitle]" at [position]. Use [style], [palette], [lighting], clear hierarchy, readable typography, generous margins, and a balanced poster layout.
### Ecommerce Image
Create a product hero image for [product]. Put the product at [center/left/right] with [camera angle]. Show [features] as clean callouts with exact labels: "[label 1]", "[label 2]", "[label 3]". Use an uncluttered background, accurate product shape, realistic materials, sharp edges, controlled shadows, and high-end commercial lighting.
### Infographic or Flowchart
Create a clean infographic titled "[exact title]" at the top. Arrange [number] steps [left-to-right/top-to-bottom/radial]. Each step contains a simple icon, a numbered marker, and exact label text: "[step 1]", "[step 2]", "[step 3]". Use aligned connectors, consistent spacing, high contrast, and readable bilingual or single-language typography.
### Comic or Storyboard
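Create a [number]-panel comic/storyboard in a clear grid. Panel 1: [scene], [character action], [facial expression], [camera framing], exact dialogue "[dialogue]". Panel 2: [scene], [character action], [facial expression], [camera framing], exact dialogue "[dialogue]". [Repeat for each remaining panel, in order.] Keep character design, clothing, colors, scale, and panel order consistent. Use readable speech bubbles and clean panel borders.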
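Because every panel has to be described separately and in order, it can help to assemble the prompt from structured panel data. This is a minimal sketch: the `Panel` dataclass, `build_comic_prompt`, and the sample story are hypothetical and only illustrate the template. The character description is repeated in every panel, following the character-consistency fix in Failure Diagnosis.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Panel:
    scene: str
    action: str
    expression: str
    framing: str
    dialogue: str


def build_comic_prompt(character: str, panels: Sequence[Panel]) -> str:
    """Build one prompt that describes each panel separately, in order."""
    if not panels:
        raise ValueError("at least one panel is required")
    parts = [f"Create a {len(panels)}-panel comic in a clear grid."]
    for index, panel in enumerate(panels, start=1):
        # Repeat the character description so the design stays consistent.
        parts.append(
            f"Panel {index}: {character} {panel.action} in {panel.scene}, "
            f"{panel.expression}, {panel.framing}, "
            f'speech bubble with exact text "{panel.dialogue}".'
        )
    parts.append(
        "Keep character design, clothing, colors, scale, and panel order "
        "consistent. Use readable speech bubbles and clean panel borders."
    )
    return " ".join(parts)


comic_prompt = build_comic_prompt(
    "a girl with short silver hair and a yellow raincoat",
    [
        Panel("a rainy street", "opens an umbrella", "surprised face",
              "medium shot", "It's raining!"),
        Panel("a small cafe", "holds a warm cup", "relaxed smile",
              "close-up", "Much better."),
    ],
)
print(comic_prompt)
```

Since the result contains exact dialogue and a strict panel layout, it would pair with `--no-use-pe` under the strategy described earlier.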
### UI Screenshot Style
Create a high-fidelity UI screenshot style image of [app/page]. Use [device/window/frame], [navigation], [main content], [controls], and exact UI text: "[text]". Keep crisp typography, realistic interface density, aligned components, clean spacing, and no decorative clutter.