@clawhub-gpt-ceb270e993
GPT-4o Image Generation & Editing Skill - Create, edit, transform, and analyze images using GPT-4o native image-2 API. Supports text-to-image, inpainting, ou...
---
name: image-2
version: 1.1.0
description: "GPT-4o Image Generation & Editing Skill - Create, edit, transform, and analyze images using GPT-4o native image-2 API. Supports text-to-image, inpainting, outpainting, style transfer, background removal, and intelligent image analysis. Ideal for marketing, product photos, illustrations, UI mockups, and visual content creation."
metadata:
  openclaw:
    emoji: "🎨"
    homepage: "https://clawhub.ai/gpt/image-2"
    always: false
    skillKey: "image-2"
    requires:
      env:
        - OPENAI_API_KEY
    primaryEnv: OPENAI_API_KEY
    install:
      - kind: node
        package: openai
        bins: []
---
# Image-2 Skill
> Create, edit, transform, and analyze images with GPT-4o's native image generation API
## When to Use This Skill
Use this skill whenever the user needs to:
- **Generate images** from text descriptions ("画一张...", "生成图片...", "create an image of...")
- **Edit existing images** with natural language ("把背景去掉", "add a sunset", "换成蓝色")
- **Create variations** of an image ("生成几个变体", "make 4 variations")
- **Analyze/describe images** ("这张图是什么", "describe this image", "提取文字")
- **Remove backgrounds** ("去除背景", "remove background")
- **Style transfer** ("变成水彩风格", "make it look like Van Gogh")
- **Create marketing visuals** ("设计海报", "make a social media post")
- **Product photography** ("产品图", "product shot on white background")
- **UI/UX mockups** ("界面设计", "app mockup", "website screenshot")
## Core Workflows
### Workflow 1: Text-to-Image Generation
When the user describes an image they want to create:
1. **Enhance the prompt** — Automatically add quality boosters:
- Append professional photography/art terms based on context
- Add lighting, composition, and mood details if not specified
- Specify output format and dimensions if needed
2. **Call the API** — Use `generateImage()` with the enhanced prompt:
```javascript
const result = await generateImage(enhancedPrompt, { size, quality, style });
```
3. **Save and present** — Download the image to the project directory and show the user:
- Save to `./generated-images/` by default
- Return the file path and a brief description
### Workflow 2: Image Editing
When the user wants to modify an existing image:
1. **Locate the source image** — Find the image file path from the conversation context
2. **Parse the edit intent** — Understand what changes the user wants
3. **Call the edit API** — Use `editImage()` with the source and instruction:
```javascript
const result = await editImage(imagePath, editInstruction, { maskPath });
```
4. **Present the result** — Show the edited image and describe what changed
### Workflow 3: Image Analysis
When the user asks about an image:
1. **Get the image** — From file path or URL
2. **Analyze with GPT-4o Vision** — Use `describeImage()`:
```javascript
const result = await describeImage(imageSource, question);
```
3. **Report findings** — Present the analysis in a structured format
### Workflow 4: Batch Generation
When the user needs multiple images:
1. **Parse the batch request** — Understand variations needed
2. **Generate in parallel** — Call `generateImage()` for each variant
3. **Organize results** — Save with descriptive filenames
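The batch steps above can be sketched as follows. This is a minimal illustration, not the skill's implementation: `runInBatches`, `fakeGenerate`, and `slugify` are hypothetical names, and `fakeGenerate` stands in for `generateImage()` so the example runs without an API key.

```javascript
// Hypothetical helper: run `worker` over `items`, at most `concurrency` at a time.
async function runInBatches(items, worker, concurrency = 2) {
  const results = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);
    // Items within a batch run in parallel; batches run one after another.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Hypothetical helper: turn a prompt into a descriptive, filesystem-safe name.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Stand-in for generateImage(); it only builds the path a real call might save to.
async function fakeGenerate(prompt) {
  return { success: true, localPath: `./generated-images/${slugify(prompt)}.png` };
}
```

Limiting each batch to a small number of parallel calls keeps the request rate predictable, which is one way to stay under rate limits.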
## Prompt Enhancement Rules
When generating images, automatically enhance the user's prompt:
### Quality Boosters (always append unless user specifies quality)
```
professional quality, high resolution, sharp details
```
### Context-Based Additions
| User Intent | Auto-Add |
|-------------|----------|
| Product photo | "studio lighting, clean background, commercial photography" |
| Portrait | "professional portrait photography, natural lighting" |
| Social media | "eye-catching, vibrant colors, modern design" |
| Illustration | "detailed illustration, professional artist quality" |
| Logo/branding | "clean vector style, scalable, minimal details" |
| Architecture | "architectural visualization, realistic rendering" |
| Food | "appetizing, food styling, professional food photography" |
| UI mockup | "clean design, modern interface, pixel-perfect" |
### Size Recommendations
| Use Case | Recommended Size |
|----------|-----------------|
| Social media post | `1024x1024` (square) |
| Story/vertical | `1024x1792` |
| Banner/landscape | `1792x1024` |
| Product listing | `1024x1024` |
| Presentation | `1792x1024` |
| Wallpaper | `1792x1024` |
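One way to apply the table above in code is a plain lookup with a square fallback. The use-case keys (`social-post`, `story`, and so on) and the function name `recommendSize` are assumptions made for this sketch; only the size strings come from the table.

```javascript
// Sizes copied from the table; the keys are hypothetical identifiers.
const SIZE_BY_USE_CASE = new Map([
  ['social-post', '1024x1024'],
  ['story', '1024x1792'],
  ['banner', '1792x1024'],
  ['product-listing', '1024x1024'],
  ['presentation', '1792x1024'],
  ['wallpaper', '1792x1024'],
]);

// Return the recommended size, falling back to the square default.
function recommendSize(useCase) {
  return SIZE_BY_USE_CASE.get(useCase) || '1024x1024';
}

console.log(recommendSize('story'));
```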
## Style Presets
Quick style references for common requests:
| Preset Name | Style Description |
|-------------|-------------------|
| `product` | Clean white background, studio lighting, commercial photography |
| `lifestyle` | Natural setting, warm lighting, aspirational mood |
| `minimalist` | Simple composition, negative space, clean lines |
| `vintage` | Retro color grading, film grain, nostalgic mood |
| `futuristic` | Neon accents, dark background, sci-fi aesthetic |
| `watercolor` | Soft edges, pastel palette, artistic brush strokes |
| `3d-render` | Octane render, realistic materials, dramatic lighting |
| `anime` | Japanese animation style, vibrant, expressive |
| `sketch` | Pencil drawing, hand-drawn, artistic |
| `flat-design` | Vector style, bold colors, geometric shapes |
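A preset can be treated as a style suffix appended to the user's prompt. The sketch below assumes that approach; `STYLE_PRESETS` and `applyPreset` are hypothetical names, and only three presets from the table are included to keep it short.

```javascript
// Preset descriptions taken from the table above (lowercased for prompt use).
const STYLE_PRESETS = new Map([
  ['product', 'clean white background, studio lighting, commercial photography'],
  ['watercolor', 'soft edges, pastel palette, artistic brush strokes'],
  ['sketch', 'pencil drawing, hand-drawn, artistic'],
]);

// Append the preset's description; unknown presets leave the prompt unchanged.
function applyPreset(prompt, presetName) {
  const style = STYLE_PRESETS.get(presetName);
  return style ? `${prompt}, ${style}` : prompt;
}

console.log(applyPreset('a lighthouse at dawn', 'sketch'));
```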
## API Reference
### `generateImage(prompt, options)`
Generate a new image from text description.
**Parameters:**
- `prompt` (string) — Image description (auto-enhanced by this skill)
- `options` (object):
- `size` — `1024x1024` | `1024x1792` | `1792x1024` (default: `1024x1024`)
- `quality` — `standard` | `hd` (default: `standard`)
- `style` — `vivid` | `natural` (default: `vivid`)
- `model` — `gpt-image-2` | `dall-e-3` (default: `gpt-image-2`)
- `saveTo` — File path to save the image (default: `./generated-images/`)
**Returns:** `{ success, url, localPath, revisedPrompt, enhancedPrompt }` on success, or `{ success: false, error, enhancedPrompt }` on failure
### `editImage(imagePath, prompt, options)`
Edit an existing image with natural language instructions.
**Parameters:**
- `imagePath` (string) — Path to the source image
- `prompt` (string) — Edit instruction
- `options` (object):
- `maskPath` — Path to mask image (white = edit area, black = keep)
- `size` — Output size
- `model` — `gpt-image-2` | `dall-e-3` (default: `gpt-image-2`)
**Returns:** `{ success, url, localPath }`
### `generateVariations(imagePath, options)`
Generate creative variations of an existing image.
**Parameters:**
- `imagePath` (string) — Path to the source image
- `options` (object):
- `count` — Number of variations 1-4 (default: 2)
- `size` — Output size
**Returns:** `{ success, variations: [{ url, localPath }] }`
### `describeImage(imageSource, question)`
Analyze an image using GPT-4o Vision.
**Parameters:**
- `imageSource` (string) — File path or URL of the image
- `question` (string|null) — Specific question about the image (default: general description)
**Returns:** `{ success, description }`
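When `imageSource` is a local file, the image bytes have to be embedded in the request, typically as a base64 data URL. A minimal sketch of that conversion follows; `toDataUrl` is a hypothetical helper, and the extension-to-MIME mapping is an assumption covering only PNG, WebP, and JPEG.

```javascript
const path = require('path');

// Hypothetical helper: wrap raw image bytes in a data URL for a vision request.
function toDataUrl(buffer, filename) {
  const ext = path.extname(filename).toLowerCase();
  // Assumption: anything that is not .png or .webp is treated as JPEG.
  const mime = ext === '.png' ? 'image/png' : ext === '.webp' ? 'image/webp' : 'image/jpeg';
  return `data:${mime};base64,${buffer.toString('base64')}`;
}

console.log(toDataUrl(Buffer.from('hi'), 'photo.png'));
```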
### `downloadImage(url, savePath)`
Download a generated image to local storage.
**Parameters:**
- `url` (string) — Image URL from generation API
- `savePath` (string|null) — Local file path (default: auto-generated in `./generated-images/`)
**Returns:** `{ success, localPath }`
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| `Invalid API key` | OPENAI_API_KEY not set or invalid | Check environment variable |
| `Content policy violation` | Prompt violates safety guidelines | Rephrase the prompt |
| `Rate limit exceeded` | Too many requests | Wait and retry with backoff |
| `Image too large` | Source image exceeds size limit | Resize to under 4MB |
| `Timeout` | Generation took too long | Simplify prompt or retry |
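The table above could be turned into a small lookup plus an exponential backoff schedule. This is a sketch under assumptions: it supposes the thrown error carries an HTTP `status` and an optional `code` field, and the specific values checked (`401`, `429`, `content_policy_violation`, `ETIMEDOUT`) are illustrative guesses rather than a documented contract. `suggestResolution` and `backoffDelay` are hypothetical names.

```javascript
// Map an error object to the resolution text from the table above.
function suggestResolution(error) {
  if (error.status === 401) return 'Check environment variable';
  if (error.status === 429) return 'Wait and retry with backoff';
  if (error.code === 'content_policy_violation') return 'Rephrase the prompt';
  if (error.code === 'ETIMEDOUT') return 'Simplify prompt or retry';
  return 'Unknown error; inspect error.message';
}

// Delay before retry number `attempt` (0-based), doubling each time.
function backoffDelay(attempt, baseMs = 1000) {
  return baseMs * Math.pow(2, attempt);
}

console.log(suggestResolution({ status: 429 }), backoffDelay(1));
```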
## Best Practices
1. **Always enhance prompts** — Don't pass raw user input directly to the API
2. **Save locally** — Download generated images; URLs expire after 1 hour
3. **Use appropriate sizes** — Match the output size to the use case
4. **Prefer gpt-image-2** — Better quality and text rendering than dall-e-3
5. **Batch thoughtfully** — Generate 2-4 images max per request to avoid rate limits
6. **Describe edits clearly** — Be specific about what to change and where
## Changelog
### v1.1.0
- Added GPT-4o native image generation support (gpt-image-2 model)
- Added automatic prompt enhancement workflow
- Added image download and local save functionality
- Added style presets for quick reference
- Added batch generation workflow
- Improved error handling and documentation
### v1.0.0
- Initial release with DALL-E 3 support
- Basic generate, edit, variations, and describe functions
---
**Tags:** `image-generation` `AI-art` `GPT-4o` `image-2` `gpt-image-2` `visual-creation` `marketing` `product-photos` `illustration` `design` `openai` `dall-e` `image-editing` `background-removal` `style-transfer` `ui-mockup`
FILE:package.json
{
"name": "image-2-skill",
"version": "1.1.0",
"description": "GPT-4o Image Generation Skill - Generate, edit, and transform images using GPT-4o's image-2 API",
"main": "index.js",
"scripts": {
"test": "echo \"Run tests with: npm run test:all\"",
"test:all": "node test/run-tests.js"
},
"keywords": [
"image-generation",
"AI-art",
"GPT-4o",
"image-2",
"visual-creation",
"openai",
"dall-e",
"image-editing"
],
"author": "",
"license": "MIT",
"dependencies": {
"openai": "^4.0.0"
},
"engines": {
"node": ">=16.0.0"
}
}
FILE:README.md
# Image2 - AI Image Generation Skill
<p align="center">
<img src="https://img.shields.io/badge/Version-1.1.0-blue.svg" alt="Version">
<img src="https://img.shields.io/badge/Platform-ClawHub-green.svg" alt="Platform">
<img src="https://img.shields.io/badge/OpenAI-GPT--4o-74aa9c.svg" alt="OpenAI">
</p>
> Transform your ideas into stunning visuals with the power of GPT-4o's image generation API
## 🎯 What is Image2?
Image2 is a powerful AI skill that harnesses OpenAI's GPT-4o image generation capabilities to help you create, edit, and transform images using natural language. No design skills required - just describe what you need!
## ✨ Key Features
### 🖼️ Text-to-Image Generation
Turn words into beautiful images instantly. Perfect for:
- Marketing materials and advertisements
- Social media content
- Product showcases
- Artistic illustrations
- Concept art and storyboards
### ✏️ Smart Image Editing
Edit existing images using natural language commands:
- Remove unwanted objects or backgrounds
- Add new elements seamlessly
- Change colors, lighting, and atmosphere
- Extend images beyond their original boundaries
### 🔄 Multiple Variations
Explore creative possibilities with variations:
- Generate up to 4 variations at once
- Perfect for A/B testing
- Create consistent brand imagery
- Explore different design directions
### 👁️ Intelligent Image Analysis
Understand images with AI-powered analysis:
- Detailed image descriptions
- Text extraction (OCR)
- Object and scene recognition
- Color palette extraction
## 🚀 Quick Start
### 1. Installation
Simply download and activate the Image-2 skill in CodeBuddy.
### 2. Configure API Key
Set your OpenAI API key:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
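To confirm the key is visible to Node.js before generating anything, a quick check along these lines may help. `hasApiKey` is a hypothetical helper written for this example; it only tests that the variable is present and non-empty, not that the key is valid.

```javascript
// Returns true when OPENAI_API_KEY is present and non-empty in `env`.
function hasApiKey(env = process.env) {
  return typeof env.OPENAI_API_KEY === 'string' && env.OPENAI_API_KEY.trim() !== '';
}

if (!hasApiKey()) {
  console.error('OPENAI_API_KEY is not set; export it before using the skill.');
}
```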
### 3. Start Creating!
Describe what you want to generate and watch AI bring your ideas to life.
## 📝 Usage Examples
### Example 1: Product Photography
```
You: Create a hero shot of a luxury watch on a marble surface with dramatic lighting
AI: [Generates professional product photography]
```
### Example 2: Marketing Banner
```
You: Design a Facebook cover photo for a coffee shop grand opening, warm tones, vintage aesthetic
AI: [Creates eye-catching banner design]
```
### Example 3: Custom Illustration
```
You: Create an illustration of a robot reading a book in a cozy library, children's book style
AI: [Produces charming illustration]
```
### Example 4: Edit Existing Photo
```
You: Remove the person in the background and replace with a sunset beach scene
AI: [Seamlessly edits the image]
```
## 🎨 Creative Use Cases
| Category | Use Case | Prompt Example |
|----------|----------|----------------|
| **E-commerce** | Product listings | "Clean white background product photo of handmade ceramic mug" |
| **Marketing** | Social media | "Instagram post announcing weekend sale, bold typography, vibrant colors" |
| **Events** | Invitations | "Elegant wedding invitation with floral border, gold accents, script font" |
| **Branding** | Logo concepts | "Modern tech startup logo, minimalist, blue and white, abstract icon" |
| **Education** | Visual aids | "Educational infographic showing water cycle, colorful, cartoon style" |
| **Gaming** | Character art | "Fantasy warrior character portrait, detailed armor, dramatic pose" |
## 💡 Prompt Engineering Tips
### Structure Your Prompts
For best results, include:
```
[Main Subject] + [Setting/Background] + [Style] + [Mood] + [Technical Specs]
```
### Example Breakdown
```
Prompt: "A sleek laptop on a minimalist wooden desk, morning light streaming through window,
photorealistic product photography, clean and professional mood, soft shadows, 4K quality"
- Main Subject: laptop
- Setting: minimalist wooden desk
- Style: photorealistic product photography
- Mood: clean and professional
- Technical: soft shadows, 4K quality
```
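If prompts are assembled programmatically, the five-part structure can be expressed as a small builder. The function name `buildPrompt` and its field names are assumptions for this sketch; empty parts are skipped so a prompt can omit any section.

```javascript
// Join the non-empty parts in the recommended order, separated by commas.
function buildPrompt({ subject, setting, style, mood, technical }) {
  return [subject, setting, style, mood, technical]
    .filter((part) => typeof part === 'string' && part.trim() !== '')
    .map((part) => part.trim())
    .join(', ');
}

console.log(buildPrompt({
  subject: 'A sleek laptop',
  setting: 'on a minimalist wooden desk',
  style: 'photorealistic product photography',
  mood: 'clean and professional mood',
  technical: 'soft shadows, 4K quality',
}));
```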
### Style Keywords
- **Photorealistic**: "photograph", "photo realistic", "DSLR quality"
- **Digital Art**: "digital painting", "vector art", "2D illustration"
- **Artistic**: "oil painting", "watercolor", "sketch", "pop art"
## ⚙️ Advanced Options
### Image Sizes
- Square: `1024x1024` - Social media, icons
- Portrait: `1024x1792` - Stories, posters
- Landscape: `1792x1024` - Landscapes, banners
### Quality Levels
- Standard: Fast generation
- HD: Enhanced detail and quality
### Output Formats
- PNG (default): Best quality
- JPEG: Smaller file size
- WebP: Web optimized
## 🔒 Security & Privacy
- All API calls use secure HTTPS connections
- Your API key is stored locally and never shared
- Generated images are saved to local storage; the skill sends data only to the OpenAI API
- Compliant with OpenAI's data usage policies
## 📦 Requirements
- OpenAI API key with GPT-4o image generation access
- Internet connection
- Sufficient API credits
## 🐛 Troubleshooting
### Common Issues
**Issue**: "API key not found"
```
Solution: Set OPENAI_API_KEY environment variable
```
**Issue**: "Generation timeout"
```
Solution: Try simpler prompts or wait and retry
```
**Issue**: "Quality not satisfactory"
```
Solution: Add more specific details to your prompt
```
## 🤝 Contributing
Have great prompts or use cases? We welcome contributions!
1. Fork the repository
2. Create your feature branch
3. Share your best prompts and examples
4. Submit a pull request
## 📄 License
MIT License - feel free to use and modify for your projects.
---
<p align="center">
Made with ❤️ for the CodeBuddy Community
</p>
FILE:scripts/image-generator.js
/**
* Image2 Skill - GPT-4o Image Generation & Editing
*
* Supports GPT-4o native image generation (gpt-image-2) and DALL-E 3.
* Includes prompt enhancement, local save, and batch operations.
*/
const OpenAI = require('openai');
const fs = require('fs');
const path = require('path');
const https = require('https');
const http = require('http');
// ─── Configuration ───────────────────────────────────────────────
const DEFAULTS = {
model: 'gpt-image-2',
size: '1024x1024',
quality: 'standard',
style: 'vivid',
saveDir: './generated-images',
maxRetries: 2,
retryDelay: 1000
};
const VALID_SIZES = ['1024x1024', '1024x1792', '1792x1024'];
const VALID_MODELS = ['gpt-image-2', 'dall-e-3'];
// ─── OpenAI Client ───────────────────────────────────────────────
let openai = null;
function getClient() {
if (!openai) {
if (!process.env.OPENAI_API_KEY) {
throw new Error('OPENAI_API_KEY environment variable is not set. Please set it before using image-2.');
}
openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
}
return openai;
}
// ─── Utility Functions ───────────────────────────────────────────
/**
* Ensure the save directory exists
*/
function ensureSaveDir(dir) {
const saveDir = dir || DEFAULTS.saveDir;
if (!fs.existsSync(saveDir)) {
fs.mkdirSync(saveDir, { recursive: true });
}
return saveDir;
}
/**
* Generate a unique filename
*/
function generateFilename(prefix = 'image', ext = 'png') {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
const random = Math.random().toString(36).substring(2, 8);
return `${prefix}_${timestamp}_${random}.${ext}`;
}
/**
* Download image from URL to local file
* @param {string} url - Image URL
* @param {string} savePath - Local file path to save to
* @returns {Promise<string>} - Saved file path
*/
async function downloadImage(url, savePath) {
const dir = path.dirname(savePath);
ensureSaveDir(dir);
return new Promise((resolve, reject) => {
const client = url.startsWith('https') ? https : http;
client.get(url, (response) => {
if (response.statusCode === 301 || response.statusCode === 302) {
return downloadImage(response.headers.location, savePath).then(resolve).catch(reject);
}
if (response.statusCode !== 200) {
return reject(new Error(`Download failed with status ${response.statusCode}`));
}
const stream = fs.createWriteStream(savePath);
response.pipe(stream);
stream.on('finish', () => {
stream.close();
resolve(savePath);
});
stream.on('error', reject);
}).on('error', reject);
});
}
/**
* Convert a local file or URL to base64
* @param {string} source - File path or URL
* @returns {Promise<string>} - Base64 encoded string
*/
async function toBase64(source) {
if (fs.existsSync(source)) {
const buffer = fs.readFileSync(source);
return buffer.toString('base64');
}
// If it's a URL, fetch and convert
const client = source.startsWith('https') ? https : http;
return new Promise((resolve, reject) => {
client.get(source, (response) => {
const chunks = [];
response.on('data', chunk => chunks.push(chunk));
response.on('end', () => {
const buffer = Buffer.concat(chunks);
resolve(buffer.toString('base64'));
});
response.on('error', reject);
}).on('error', reject);
});
}
/**
* Validate and normalize options
*/
function normalizeOptions(options = {}) {
const model = VALID_MODELS.includes(options.model) ? options.model : DEFAULTS.model;
const size = VALID_SIZES.includes(options.size) ? options.size : DEFAULTS.size;
const quality = ['standard', 'hd'].includes(options.quality) ? options.quality : DEFAULTS.quality;
const style = ['vivid', 'natural'].includes(options.style) ? options.style : DEFAULTS.style;
return { model, size, quality, style };
}
/**
* Retry wrapper for API calls
*/
async function withRetry(fn, maxRetries = DEFAULTS.maxRetries) {
let lastError;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
lastError = error;
if (error.status === 429 && attempt < maxRetries) {
const delay = DEFAULTS.retryDelay * Math.pow(2, attempt);
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
throw lastError;
}
// ─── Prompt Enhancement ──────────────────────────────────────────
const QUALITY_BOOSTERS = {
general: 'professional quality, high resolution, sharp details',
product: 'studio lighting, clean background, commercial photography, professional product shot',
portrait: 'professional portrait photography, natural lighting, shallow depth of field',
social: 'eye-catching, vibrant colors, modern design, trending aesthetic',
illustration: 'detailed illustration, professional artist quality, clean lines',
logo: 'clean vector style, scalable, minimal details, professional brand identity',
architecture: 'architectural visualization, realistic rendering, professional quality',
food: 'appetizing, professional food styling, restaurant quality, steam visible',
ui: 'clean design, modern interface, pixel-perfect, professional mockup',
landscape: 'breathtaking scenery, golden hour lighting, ultra detailed, 8K quality',
fashion: 'high fashion editorial, Vogue quality, dramatic composition, professional model',
abstract: 'contemporary art gallery quality, visually striking, sophisticated composition'
};
/**
* Auto-detect the image category from the prompt
*/
function detectCategory(prompt) {
const lower = prompt.toLowerCase();
if (/product|商品|产品|item|goods/.test(lower)) return 'product';
if (/portrait|人像|头像|headshot|person/.test(lower)) return 'portrait';
if (/social|social.media|instagram|post|海报|宣传图/.test(lower)) return 'social';
if (/illustration|插画|drawing|sketch|artwork/.test(lower)) return 'illustration';
if (/logo|商标|brand|品牌/.test(lower)) return 'logo';
if (/architecture|建筑|building|interior|室内|房间/.test(lower)) return 'architecture';
if (/food|美食|dish|餐|cake|drink|beverage/.test(lower)) return 'food';
// Word boundaries keep short tokens from matching inside words such as "fruit" or "apple"
if (/\b(ui|ux|apps?)\b|interface|界面|website|mockup/.test(lower)) return 'ui';
if (/landscape|风景|scenery|mountain|ocean|sunset/.test(lower)) return 'landscape';
if (/fashion|时尚|outfit|clothing|dress|穿搭/.test(lower)) return 'fashion';
if (/abstract|抽象|pattern|texture|gradient/.test(lower)) return 'abstract';
return 'general';
}
/**
* Enhance a user prompt with quality boosters
*/
function enhancePrompt(prompt, category = null) {
const detectedCategory = category || detectCategory(prompt);
const booster = QUALITY_BOOSTERS[detectedCategory] || QUALITY_BOOSTERS.general;
// Don't duplicate if similar terms already exist
const lower = prompt.toLowerCase();
const boosterWords = booster.toLowerCase().split(', ');
const newWords = boosterWords.filter(word => !lower.includes(word.split(' ')[0]));
if (newWords.length === 0) return prompt;
return `${prompt}, ${newWords.join(', ')}`;
}
// ─── Core API Functions ──────────────────────────────────────────
/**
* Generate an image from a text description
* @param {string} prompt - The description of the image to generate
* @param {Object} options - Generation options
* @param {boolean} options.autoEnhance - Whether to auto-enhance the prompt (default: true)
* @returns {Promise<Object>} - { success, url, localPath, revisedPrompt, enhancedPrompt }
*/
async function generateImage(prompt, options = {}) {
const {
autoEnhance = true,
saveTo = null,
category = null
} = options;
const { model, size, quality, style } = normalizeOptions(options);
const enhancedPrompt = autoEnhance ? enhancePrompt(prompt, category) : prompt;
try {
const client = getClient();
const result = await withRetry(async () => {
if (model === 'gpt-image-2') {
// GPT-4o native image generation via chat completions
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: `Generate an image: ${enhancedPrompt}`
}
]
}
],
// Note: GPT-4o image generation parameters may vary
// This uses the chat completions endpoint with image output
});
// Extract image from response
const content = response.choices[0]?.message?.content;
if (typeof content === 'string') {
// If it's text, try to find image URL or base64
const urlMatch = content.match(/https?:\/\/[^\s"')]+/);
if (urlMatch) {
return { url: urlMatch[0], revised_prompt: enhancedPrompt };
}
}
// Fallback to DALL-E if GPT-4o doesn't return an image directly
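// Pass the normalized size, quality, and style so invalid raw options never reach the API
return await generateWithDallE(enhancedPrompt, { model: 'dall-e-3', size, quality, style });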
} else {
return await generateWithDallE(enhancedPrompt, { ...options, model, size, quality, style });
}
});
// Save to local file
let localPath = null;
if (result.url) {
const saveDir = ensureSaveDir(saveTo ? path.dirname(saveTo) : DEFAULTS.saveDir);
const filename = saveTo ? path.basename(saveTo) : generateFilename('gen', 'png');
localPath = path.join(saveDir, filename);
await downloadImage(result.url, localPath);
}
return {
success: true,
url: result.url,
localPath,
revisedPrompt: result.revised_prompt || enhancedPrompt,
enhancedPrompt
};
} catch (error) {
return {
success: false,
error: error.message,
enhancedPrompt
};
}
}
/**
* Internal: Generate with DALL-E API
*/
async function generateWithDallE(prompt, options = {}) {
const { model = 'dall-e-3', size = '1024x1024', quality = 'standard', style = 'vivid' } = options;
const client = getClient();
const response = await client.images.generate({
model,
prompt,
n: 1,
size,
quality,
style
});
return {
url: response.data[0].url,
revised_prompt: response.data[0].revised_prompt
};
}
/**
* Edit an existing image
* @param {string} imagePath - Path or URL to the source image
* @param {string} prompt - Edit instruction
* @param {Object} options - Edit options
* @returns {Promise<Object>} - { success, url, localPath }
*/
async function editImage(imagePath, prompt, options = {}) {
const {
maskPath = null,
saveTo = null
} = options;
const { model, size } = normalizeOptions(options);
try {
const client = getClient();
const result = await withRetry(async () => {
if (model === 'gpt-image-2') {
// GPT-4o native image editing via chat completions with image input
const base64 = await toBase64(imagePath);
const mimeType = imagePath.toLowerCase().endsWith('.png') ? 'image/png' : 'image/jpeg';
const userContent = [
{
type: 'image_url',
image_url: { url: `data:${mimeType};base64,${base64}` }
},
{
type: 'text',
text: `Edit this image: ${prompt}`
}
];
if (maskPath) {
const maskBase64 = await toBase64(maskPath);
const maskMime = maskPath.toLowerCase().endsWith('.png') ? 'image/png' : 'image/jpeg';
userContent.unshift({
type: 'image_url',
image_url: { url: `data:${maskMime};base64,${maskBase64}` }
});
}
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: userContent }]
});
const content = response.choices[0]?.message?.content;
if (typeof content === 'string') {
const urlMatch = content.match(/https?:\/\/[^\s"')]+/);
if (urlMatch) return { url: urlMatch[0] };
}
// Fallback to DALL-E edit
return await editWithDallE(imagePath, prompt, { maskPath, size });
} else {
return await editWithDallE(imagePath, prompt, { maskPath, size });
}
});
let localPath = null;
if (result.url) {
const saveDir = ensureSaveDir(saveTo ? path.dirname(saveTo) : DEFAULTS.saveDir);
const filename = saveTo ? path.basename(saveTo) : generateFilename('edit', 'png');
localPath = path.join(saveDir, filename);
await downloadImage(result.url, localPath);
}
return { success: true, url: result.url, localPath };
} catch (error) {
return { success: false, error: error.message };
}
}
/**
* Internal: Edit with DALL-E API
*/
async function editWithDallE(imagePath, prompt, options = {}) {
const { maskPath = null, size = '1024x1024' } = options;
const client = getClient();
const params = {
// The images.edit endpoint does not accept dall-e-3, so dall-e-2 is used here
model: 'dall-e-2',
image: fs.existsSync(imagePath) ? fs.createReadStream(imagePath) : imagePath,
prompt,
n: 1,
size
};
if (maskPath && fs.existsSync(maskPath)) {
params.mask = fs.createReadStream(maskPath);
}
const response = await client.images.edit(params);
return { url: response.data[0].url };
}
/**
* Generate variations of an image
* @param {string} imagePath - Path to the source image
* @param {Object} options - Variation options
* @returns {Promise<Object>} - { success, variations: [{ url, localPath }] }
*/
async function generateVariations(imagePath, options = {}) {
const {
count = 2,
saveTo = null
} = options;
const { size } = normalizeOptions(options);
const n = Math.min(Math.max(count, 1), 4);
try {
const client = getClient();
if (!fs.existsSync(imagePath)) {
throw new Error(`Image file not found: ${imagePath}`);
}
const response = await withRetry(async () => {
return await client.images.createVariation({
model: 'dall-e-2',
image: fs.createReadStream(imagePath),
n,
size
});
});
const saveDir = ensureSaveDir(saveTo ? path.dirname(saveTo) : DEFAULTS.saveDir);
const variations = [];
for (let i = 0; i < response.data.length; i++) {
const img = response.data[i];
const filename = saveTo
? `${path.basename(saveTo, path.extname(saveTo))}_${i + 1}${path.extname(saveTo)}`
: generateFilename(`var_${i + 1}`, 'png');
const localPath = path.join(saveDir, filename);
await downloadImage(img.url, localPath);
variations.push({ url: img.url, localPath });
}
return { success: true, variations };
} catch (error) {
return { success: false, error: error.message };
}
}
/**
* Describe/analyze an image using GPT-4o Vision
* @param {string} imageSource - File path or URL of the image
* @param {string} question - Specific question about the image (optional)
* @returns {Promise<Object>} - { success, description }
*/
async function describeImage(imageSource, question = null) {
try {
const client = getClient();
// Prepare image content
let imageUrl;
if (fs.existsSync(imageSource)) {
const base64 = await toBase64(imageSource);
const ext = path.extname(imageSource).toLowerCase();
const mime = ext === '.png' ? 'image/png' : ext === '.webp' ? 'image/webp' : 'image/jpeg';
imageUrl = `data:${mime};base64,${base64}`;
} else {
imageUrl = imageSource; // Assume it's a URL
}
const textContent = question || 'Please describe this image in detail, including objects, colors, composition, mood, and any text visible.';
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{ type: 'image_url', image_url: { url: imageUrl } },
{ type: 'text', text: textContent }
]
}
],
max_tokens: 1000
});
return {
success: true,
description: response.choices[0].message.content
};
} catch (error) {
return { success: false, error: error.message };
}
}
// ─── Batch Operations ────────────────────────────────────────────
/**
* Generate multiple images in batch
* @param {Array<string>} prompts - Array of prompt strings
* @param {Object} options - Shared generation options
* @returns {Promise<Object>} - { success, results: [...] }
*/
async function batchGenerate(prompts, options = {}) {
const results = [];
const concurrency = options.concurrency || 2;
// Process in batches to avoid rate limits
for (let i = 0; i < prompts.length; i += concurrency) {
const batch = prompts.slice(i, i + concurrency);
const batchResults = await Promise.all(
batch.map(prompt => generateImage(prompt, options))
);
results.push(...batchResults);
// Small delay between batches
if (i + concurrency < prompts.length) {
await new Promise(resolve => setTimeout(resolve, 500));
}
}
return {
success: results.every(r => r.success),
results,
total: results.length,
succeeded: results.filter(r => r.success).length,
failed: results.filter(r => !r.success).length
};
}
// ─── Exports ─────────────────────────────────────────────────────
module.exports = {
generateImage,
editImage,
generateVariations,
describeImage,
downloadImage,
batchGenerate,
enhancePrompt,
detectCategory,
VALID_SIZES,
VALID_MODELS,
QUALITY_BOOSTERS,
DEFAULTS
};
FILE:examples/prompts-gallery.md
# Image2 Prompt Gallery
A curated collection of effective prompts for different use cases.
## 🛍️ E-Commerce & Product Photography
### Product Showcase
```
Clean white background product photo of [PRODUCT NAME], professional studio lighting,
soft shadows, high-end advertising style, commercial photography
```
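Templates in this gallery use bracketed placeholders such as `[PRODUCT NAME]`. A sketch of filling them programmatically follows; `fillTemplate` is a hypothetical helper, and placeholders without a supplied value are left untouched so gaps stay visible.

```javascript
// Replace each [PLACEHOLDER] with its value from `values`, if one is provided.
function fillTemplate(template, values) {
  return template.replace(/\[([^\]]+)\]/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

const template = 'Clean white background product photo of [PRODUCT NAME], professional studio lighting';
console.log(fillTemplate(template, { 'PRODUCT NAME': 'a handmade ceramic mug' }));
```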
### Lifestyle Product Shot
```
[PRODUCT] arranged artfully on natural wooden surface with complementary props
(greenery, fabric, accessories), natural window light, overhead composition,
lifestyle photography style, Pinterest-worthy aesthetic
```
### Before/After Comparison
```
Split image: left side shows plain [PRODUCT], right side shows same product
styled as luxury item with elegant packaging, velvet background, dramatic lighting
```
## 📱 Social Media Content
### Instagram Post
```
Instagram square post design, [TOPIC/CONCEPT], bold modern typography,
vibrant color palette with [COLOR SCHEME], clean layout, brand logo watermark,
trending aesthetic, high engagement potential
```
### Story/TikTok Vertical
```
Vertical video thumbnail style image, [CONCEPT], eye-catching, attention-grabbing,
trending visual style, text overlay space, social media optimized
```
### Facebook Cover
```
Facebook cover photo dimensions, [BRAND/TOPIC], panoramic composition,
professional design, brand colors, clear visual hierarchy
```
## 🎨 Artistic & Creative
### Digital Illustration
```
Detailed digital illustration of [SUBJECT], [STYLE: e.g. anime, comic book,
concept art, children's book], vibrant colors, expressive characters,
professional quality, [MOOD: whimsical, dramatic, peaceful, etc.]
```
### Portrait Art
```
[STYLE: realistic, stylized, minimalist, etc.] portrait of [SUBJECT],
dramatic [LIGHTING TYPE] lighting, [BACKGROUND SETTING], emotional expression,
professional artist quality, [ASPECT RATIO: 3:4 for portrait]
```
### Landscape/Environment
```
Breathtaking landscape of [SCENE], golden hour lighting, [WEATHER/ATMOSPHERE],
[STYLE: photorealistic, painterly, cinematic, etc.], ultra detailed, 8K quality
```
## 🏢 Business & Professional
### Presentation Background
```
Modern abstract background for business presentation, [COLOR SCHEME],
subtle geometric patterns, professional and clean, suitable for text overlay,
gradient tones, contemporary corporate design
```
### Infographic Style
```
Clean infographic illustration showing [TOPIC], flat design style,
modern color palette, icons and graphics, data visualization elements,
educational and visually appealing
```
### Logo Concepts
```
[STYLE: minimalist, modern, vintage, playful, etc.] logo design for [BRAND NAME],
[INDUSTRY TYPE], simple vector style, memorable icon, color palette suggestions,
professional brand identity design
```
## 🏠 Interior & Architecture
### Room Design
```
Modern [ROOM TYPE] interior design, [STYLE: Scandinavian, industrial, bohemian, etc.],
warm ambient lighting, plants and decor, realistic rendering, architectural visualization
```
### Architecture Visualization
```
Exterior view of modern [BUILDING TYPE], architectural photography style,
golden sunset lighting, landscaping, [ENVIRONMENT: urban, coastal, mountain, etc.],
professional real estate rendering
```
## 👗 Fashion & Beauty
### Fashion Editorial
```
Fashion editorial photograph of [SUBJECT] wearing [CLOTHING DESCRIPTION],
[SETTING/LOCATION], [LIGHTING STYLE], high fashion magazine quality,
Vogue editorial style, dramatic composition
```
### Beauty Product
```
Close-up beauty photography of [PRODUCT], glass packaging catching light,
luxurious and elegant mood, rose gold and marble textures,
beauty campaign style, soft romantic lighting
```
## 🍔 Food & Beverage
### Food Photography
```
Gourmet [FOOD ITEM] photography, overhead angle, [STYLE: dark moody, bright fresh,
rustic farmhouse, modern minimalist], appetizing, professional food styling,
restaurant quality, steam rising
```
### Beverage/Cocktail
```
Craft [BEVERAGE TYPE] in elegant glass, garnished with [GARNISH],
neon sign background, cocktail bar atmosphere, dramatic lighting,
lifestyle food photography, inviting and stylish
```
## 🎮 Gaming & Entertainment
### Character Design
```
Fantasy/Sci-fi character concept art of [DESCRIPTION], detailed armor/outfit,
dramatic pose, [BACKSTORY CONTEXT], digital painting style,
ZBrush/Blender quality, turnaround sheet style, professional concept art
```
### Game Environment
```
Video game environment concept art of [LOCATION], [GENRE: fantasy, sci-fi, etc.],
moody atmospheric lighting, [TIME OF DAY], detailed world building,
Blizzard/Naughty Dog quality concept art
```
### Album/Book Cover
```
Album cover design for [GENRE/MOOD], [ARTIST/TOPIC], bold typography,
[STYLE: retro, modern, abstract, photographic], professional music industry quality,
impactful and memorable composition
```
## 🌟 Abstract & Experimental
### Abstract Art
```
Abstract digital art composition, fluid shapes, [COLOR PALETTE],
organic flowing forms, contemporary art gallery quality,
[MOOD: energetic, calm, mysterious, etc.]
```
### Surreal/Conceptual
```
Surrealist digital artwork, dreamlike [SCENE], floating [OBJECTS],
impossible architecture, [COLOR TREATMENT: muted, vibrant, monochromatic],
hyper-detailed, Salvador Dali meets modern digital art
```
---
## 💡 Pro Tips for Better Results
### 1. Start with Style Keywords
Always begin with your desired art style:
- "Photorealistic", "Digital painting", "3D render", "Vector art"
- "Oil painting", "Watercolor", "Pencil sketch"
- "Anime style", "Comic book", "Storybook illustration"
### 2. Add Lighting Details
Lighting can make or break an image:
- "Cinematic lighting", "Golden hour", "Neon glow"
- "Soft diffused light", "Dramatic rim light"
- "Studio lighting", "Natural window light"
### 3. Specify Composition
Guide the viewer's eye:
- "Close-up portrait", "Wide establishing shot"
- "Overhead view", "45-degree angle"
- "Rule of thirds composition", "Centered composition"
### 4. Include Technical Quality
Request higher quality:
- "8K resolution", "Ultra detailed", "High fidelity"
- "Professional photography", "Award-winning quality"
- "Masterpiece", "Portfolio quality"
### 5. Set the Mood
Emotional tone affects everything:
- "Warm and inviting", "Dark and mysterious"
- "Energetic and vibrant", "Calm and peaceful"
- "Nostalgic and vintage", "Futuristic and sleek"
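
The five tips above amount to assembling a prompt from separate descriptors, with the style first. A minimal sketch of that idea follows. `buildPrompt` and its option names are hypothetical and are not part of this skill's exported API.

```javascript
// Hypothetical helper (not exported by this skill): joins the descriptor
// types from the tips above into one comma-separated prompt, style first.
function buildPrompt({ style, subject, lighting, composition, quality, mood }) {
  if (typeof subject !== "string" || subject.trim() === "") {
    throw new Error("subject is required");
  }
  const lead = style ? `${style.trim()} of ${subject.trim()}` : subject.trim();
  return [lead, lighting, composition, quality, mood]
    // Drop any descriptor the caller left out or left blank.
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join(", ");
}

const prompt = buildPrompt({
  style: "Digital painting",
  subject: "a lighthouse on a rocky coast",
  lighting: "golden hour",
  composition: "wide establishing shot",
  quality: "ultra detailed",
  mood: "calm and peaceful",
});

console.log(prompt);
// Digital painting of a lighthouse on a rocky coast, golden hour, wide establishing shot, ultra detailed, calm and peaceful
```

Omitted descriptors are skipped, so the same helper works for a bare subject plus a mood. The resulting string is intended to be used as the prompt text for generation.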
---
*Share your best prompts and help others create amazing images!*
FILE:examples/quick-starts.md
# Image-2 Skill - Quick Start Templates
## 📸 Product Photography Templates
### Template 1: Hero Product Shot
```
Professional product photography of [PRODUCT NAME] on [SURFACE TYPE],
studio lighting with softbox setup, clean white or neutral gray background,
sharp focus on product details, commercial advertising quality, [ASPECT: 4:5 for Instagram]
```
### Template 2: Lifestyle Product
```
[PRODUCT NAME] being used in [LIFESTYLE SCENE], natural ambient lighting,
lifestyle photography, authentic and aspirational mood, editorial style,
[ENVIRONMENT: kitchen, office, outdoor, etc.]
```
### Template 3: Comparison Shot
```
Split composition: left side [PRODUCT] in plain packaging, right side same
product in premium [BRAND] gift box with ribbon, soft gray background,
professional product photography, luxury feel
```
## 🎨 Social Media Templates
### Template 4: Sale Announcement
```
Social media graphic for [BRAND] [SALE TYPE] sale, bold typography reading "[TEXT]",
explosion graphic of sale tags and confetti, [BRAND COLOR SCHEME], energetic
and urgent mood, [PLATFORM: Instagram/Facebook/Twitter] optimized dimensions
```
### Template 5: Quote Card
```
Inspirational quote card, "[QUOTE TEXT]", attributed to [AUTHOR],
elegant typography, [VISUAL STYLE: minimalist/bohemian/professional],
[BACKGROUND: soft gradient/texture/image], social media optimized
```
### Template 6: Event Poster
```
Event poster for [EVENT NAME], [DATE and TIME], [LOCATION],
bold graphic design, [THEME/COLOR SCHEME], performer/event imagery,
professional poster layout, [SIZE FORMAT: A4/Facebook Cover/Instagram Story]
```
## 🌐 Web Design Templates
### Template 7: Hero Banner
```
Website hero section background, [BRAND/TOPIC] visual, [MOOD: professional/warm/modern],
space for headline text overlay, [COLOR PALETTE], [STYLE: photography/illustration/abstract],
1920x1080 web banner dimensions
```
### Template 8: About Us Page Image
```
Team/business/About page hero image, [DESCRIPTION OF SCENE],
professional corporate photography style, warm and approachable mood,
natural lighting, modern office or relevant environment setting
```
### Template 9: Blog Featured Image
```
Blog article featured image for "[TOPIC]", modern editorial illustration style,
[COLOR PALETTE], includes elements related to [TOPIC],
space for title overlay, 16:9 aspect ratio
```
## 🎭 Portrait & Character Templates
### Template 10: Professional Headshot
```
Professional headshot photograph of [DESCRIPTION], corporate portrait style,
[BACKGROUND: solid color/outdoor/natural], natural or studio lighting,
friendly and confident expression, high resolution, [ASPECT: 4:5 for LinkedIn]
```
### Template 11: Character Concept
```
[GENRE: fantasy/sci-fi/realistic] character concept art of [CHARACTER DESCRIPTION],
[POSE: action/portrait/three-quarter], detailed costume and prop design,
[STYLE: concept art/digital painting/comic], [BACKSTORY ELEMENT],
turnaround sheet format if full body
```
### Template 12: Family Portrait
```
Elegant family portrait, [NUMBER] family members, [COMPOSITION: standing/seated],
[SETTING: studio/natural outdoor], [STYLE: traditional/formal/casual],
matching coordinated outfits, warm and timeless mood
```
## 🏠 Real Estate & Interior Templates
### Template 13: Property Listing
```
Real estate photography of [PROPERTY TYPE], [NUMBER] bedrooms, [LOCATION],
bright and airy atmosphere, golden hour exterior shot, clean and clutter-free,
professional real estate photography, HDR quality
```
### Template 14: Interior Design
```
Interior design inspiration photo of [ROOM TYPE], [STYLE: modern/rustic/minimalist],
[COLOR PALETTE], [KEY FURNITURE/DESIGN ELEMENTS], natural light streaming in,
Pinterest-worthy aesthetic, interior design magazine quality
```
### Template 15: Floor Plan Overlay
```
Architectural rendering of [PROPERTY TYPE], modern [STYLE] design,
bird's-eye floor plan view overlaid on exterior photo,
blueprint aesthetic, professional real estate visualization
```
## 🍔 Food & Restaurant Templates
### Template 16: Restaurant Menu Photo
```
Gourmet [FOOD ITEM] plated on [DISH TYPE], overhead shot,
[STYLE: dark and moody/bright and fresh/rustic farmhouse],
restaurant quality food photography, steam rising, garnishes visible
```
### Template 17: Beverage Showcase
```
Premium [BEVERAGE TYPE] in [GLASSWARE], cocktail style photography,
garnished with [GARNISH], [BACKDROP: marble/mirror/neon],
bar atmosphere, dramatic lighting, lifestyle food photography
```
### Template 18: Restaurant Interior
```
Restaurant interior photography, [CUISINE TYPE] restaurant,
warm ambient lighting, [STYLE: intimate/cozy/busy vibrant],
guests enjoying a meal (blurred), professional hospitality photography
```
## 💼 Business & Corporate Templates
### Template 19: Team Photo
```
Corporate team photo, [NUMBER] team members, [INDUSTRY/BRAND] setting,
[STYLE: formal/casual/creative], [LOCATION: office/outdoor/studio],
natural professional lighting, [POSE: standing/sitting/mixed]
```
### Template 20: Business Card Design
```
Modern business card design for [NAME], [TITLE], [COMPANY],
clean minimalist layout, [BRAND COLORS], [LOGO if applicable],
elegant typography, business card dimensions, print-ready quality
```
### Template 21: Presentation Template
```
Corporate presentation slide background, [TOPIC] theme,
professional gradient or pattern, space for text and charts,
modern business aesthetic, [BRAND COLORS], 16:9 aspect ratio
```
---
## 🚀 How to Use These Templates
1. **Copy** the template that matches your needs
2. **Customize** the bracketed [CONTENT] with your specifics
3. **Enhance** with additional details for better results
4. **Test** and refine based on outputs
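
The copy-and-customize steps above can also be scripted. The sketch below assumes placeholders always take the form `[KEY]` or `[KEY: hint]`, as in the templates in this file. `fillTemplate` is a hypothetical helper, not one of the skill's exported functions.

```javascript
// Hypothetical helper (not exported by this skill).
// Replaces [KEY] and [KEY: hint] placeholders with entries from `values`
// and reports which keys had no value supplied.
function fillTemplate(template, values) {
  const missing = [];
  const prompt = template.replace(
    /\[([^\]:]+)(?::[^\]]*)?\]/g,
    (match, rawKey) => {
      const key = rawKey.trim();
      if (Object.prototype.hasOwnProperty.call(values, key)) {
        return values[key];
      }
      missing.push(key);
      return match; // leave the unfilled placeholder visible
    }
  );
  return { prompt, missing };
}

const template =
  "Professional product photography of [PRODUCT NAME] on [SURFACE TYPE], " +
  "commercial advertising quality, [ASPECT: 4:5 for Instagram]";

const result = fillTemplate(template, {
  "PRODUCT NAME": "a ceramic pour-over coffee set",
  "SURFACE TYPE": "a light oak table",
});

console.log(result.prompt);
console.log(result.missing); // [ 'ASPECT' ]
```

Leaving unfilled placeholders in place, rather than deleting them, makes it easier to spot what still needs customizing before the prompt is sent.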
## 💡 Tips for Template Customization
- **Be Specific**: Replace all [BRACKETED] content with exact details
- **Add Context**: Include your brand voice and personality
- **Specify Quality**: Add "professional," "high-end," "award-winning"
- **Set Mood**: Include emotional descriptors
- **Reference Style**: Mention preferred art/photo styles
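
As a guard for the "Be Specific" tip, a prompt can be checked for leftover bracketed text before it is used. This is a small sketch. `findUnfilledPlaceholders` is a hypothetical name, and the check assumes that square brackets appear in a prompt only as template placeholders.

```javascript
// Hypothetical helper (not exported by this skill): returns every
// remaining [BRACKETED] placeholder in a prompt, or an empty array.
function findUnfilledPlaceholders(prompt) {
  return prompt.match(/\[[^\]]+\]/g) || [];
}

const draft =
  "Modern business card design for Ada Park, [TITLE], Northwind Studio, " +
  "clean minimalist layout, [BRAND COLORS]";

const leftovers = findUnfilledPlaceholders(draft);
if (leftovers.length > 0) {
  console.log(`Still to customize: ${leftovers.join(", ")}`);
}
// Still to customize: [TITLE], [BRAND COLORS]
```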
---
*Copy these templates and save them for quick image generation!*