@clawhub-jinpeng-bd9911f590
Generate PowerPoint presentations with AI images using Gemini. Each slide is a full-bleed image. Use for creating visual presentations, slideshows, or image-...
---
name: nano-banana-pro-pptx
description: Generate PowerPoint presentations with AI images using Gemini. Each slide is a full-bleed image. Use for creating visual presentations, slideshows, or image-based decks from a topic prompt. Requires --slides N.
metadata:
credentials:
- name: GEMINI_API_KEY
description: Gemini API key (required)
required: true
- name: GEMINI_BASE_URL
description: Custom/proxy API base URL (optional)
required: false
---
# nano-banana-pro-pptx: AI-Generated Image Presentations
Generate a PowerPoint presentation where every slide is a single full-bleed AI-generated image. Gemini handles both the narrative planning and image generation end-to-end.
## Usage
Run the script using absolute path (do NOT cd to skill directory first):
```bash
uv run ~/.openclaw/workspace/skills/nano-banana-pro-pptx/scripts/generate_pptx.py \
--prompt "your presentation topic" \
--slides N \
[--filename "output.pptx"] \
[--resolution 1K|2K|4K] \
[--api-key KEY] \
[--base-url URL]
```
## Arguments
| Argument | Required | Default | Description |
|---|---|---|---|
| `--prompt` | Yes | — | Topic/theme for the presentation |
| `--slides` | Yes | — | Number of slides (1–50) |
| `--filename` | No | Auto-generated slug | Output `.pptx` filename or full path |
| `--resolution` | No | `1K` | `1K` (1024px), `2K` (2048px), `4K` (4096px) |
| `--api-key` | No | `$GEMINI_API_KEY` | Gemini API key |
| `--base-url` | No | `$GEMINI_BASE_URL` | Custom API base URL |
The script checks for API key in this order:
1. `--api-key` argument (use if user provided key in chat)
2. `GEMINI_API_KEY` environment variable
If neither is available, the script exits with an error message.
The script checks for base URL in this order:
1. `--base-url` argument (use if user provided URL in chat)
2. `GEMINI_BASE_URL` environment variable
If neither is available, the script uses the default Gemini API base URL.
## Instructions
1. Always use the absolute path to the script — never `cd` to the skill directory
2. `--slides` is always required — ask the user if not provided
3. Default to `--resolution 1K` for drafts; suggest `2K` for final output
4. Omit `--filename` to let the script auto-generate a slug from the prompt (do not construct the filename yourself)
5. Confirm the output file path to the user after completion
6. If `GEMINI_API_KEY` is not set in the environment, remind the user to set it or pass `--api-key`
## Workflow
1. Draft at `1K` resolution to verify narrative and composition
2. Regenerate at `2K` or `4K` for final delivery
## Environment Variables
- `GEMINI_API_KEY` — required (or pass `--api-key`)
- `GEMINI_BASE_URL` — optional, for custom/proxy endpoints
FILE:scripts/generate_pptx.py
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "google-genai",
# "python-pptx",
# "Pillow",
# ]
# ///
import argparse
import base64
import io
import json
import os
import re
import sys
import time
from datetime import datetime
from pathlib import Path
from google import genai
from google.genai import types
from PIL import Image, UnidentifiedImageError
from pptx import Presentation
from pptx.util import Emu
def slugify(text: str, max_len: int = 40) -> str:
"""Convert prompt text to a filename-safe slug.
For ASCII text, lowercases and replaces non-alphanumeric runs with hyphens.
For non-ASCII text (e.g. Chinese), keeps the original characters up to max_len,
replacing only filesystem-unsafe characters.
"""
ascii_slug = text.lower()
ascii_slug = re.sub(r"[^a-z0-9]+", "-", ascii_slug)
ascii_slug = ascii_slug.strip("-")[:max_len].rstrip("-")
if ascii_slug:
return ascii_slug
safe = re.sub(r'[\\/:*?"<>|]+', "-", text.strip())
safe = safe[:max_len].rstrip("-").strip()
return safe if safe else "presentation"
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Generate a PowerPoint presentation with AI-generated slide images."
)
parser.add_argument("--prompt", required=True, help="Topic/theme for the presentation")
parser.add_argument("--slides", required=True, type=int, help="Number of slides (1-50)")
parser.add_argument("--filename", default=None, help="Output directory name (default: auto-generated from prompt)")
parser.add_argument(
"--resolution",
choices=["1K", "2K", "4K"],
default="1K",
help="Image resolution (default: 1K)",
)
parser.add_argument("--api-key", default=None, help="Gemini API key")
parser.add_argument("--base-url", default=None, help="Custom Gemini API base URL")
return parser.parse_args()
RESOLUTION_MAP = {"1K": 1024, "2K": 2048, "4K": 4096}
SLIDE_WIDTH_EMU = 9144000
SLIDE_HEIGHT_EMU = 5143500
def resolve_config(args: argparse.Namespace) -> tuple[str, str | None, Path, int]:
"""Resolve API key, base URL, output directory, and target width."""
api_key = args.api_key or os.environ.get("GEMINI_API_KEY")
if not api_key:
print("Error: GEMINI_API_KEY is not set. Pass --api-key or set the environment variable.")
sys.exit(1)
if not (1 <= args.slides <= 50):
print(f"Error: --slides must be between 1 and 50, got {args.slides}.")
sys.exit(1)
base_url = args.base_url or os.environ.get("GEMINI_BASE_URL")
slug = slugify(args.prompt)
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
dir_name = args.filename if args.filename else f"{slug}-{timestamp}"
out_dir = Path(dir_name)
out_dir.mkdir(parents=True, exist_ok=True)
target_width = RESOLUTION_MAP[args.resolution]
return api_key, base_url, out_dir, target_width
def generate_slide_plans(
client: genai.Client,
topic: str,
n: int,
out_dir: Path,
) -> list[dict]:
"""Phase 1: Generate N slide plans (title + bullet points) via gemini-3.1-pro-preview."""
system_instruction = (
f"You are a professional presentation designer. Generate a {n}-slide PowerPoint outline "
f"for the topic: '{topic}'.\n"
f"Return ONLY a valid JSON array with exactly {n} objects. No markdown fences, no explanation.\n"
'Each object must have:\n'
' "title": short slide title (4-8 words)\n'
' "points": array of 3-5 concise bullet points (each 5-15 words)\n'
'The slides should form a coherent narrative arc from introduction to conclusion.'
)
def _call() -> list[dict]:
response = client.models.generate_content(
model="gemini-3.1-pro-preview",
contents=system_instruction,
)
raw = response.text.strip()
raw = re.sub(r"^```(?:json)?\s*", "", raw)
raw = re.sub(r"\s*```$", "", raw)
parsed = json.loads(raw)
if not isinstance(parsed, list) or len(parsed) != n:
raise ValueError(
f"Expected list of {n} objects, got: {type(parsed).__name__} "
f"len={len(parsed) if isinstance(parsed, list) else '?'}"
)
for item in parsed:
if not isinstance(item, dict) or "title" not in item or "points" not in item:
raise ValueError(f"Each slide must have 'title' and 'points' keys, got: {item}")
return parsed
print(f"Planning {n} slides for: {topic!r}")
try:
slides = _call()
except Exception as e:
print(f"Phase 1 attempt 1 failed: {e}. Retrying...")
try:
slides = _call()
except Exception as e2:
print(f"Phase 1 failed after retry: {e2}")
sys.exit(1)
plans_path = out_dir / "slide-plans.json"
plans_path.write_text(
json.dumps({"topic": topic, "slides": slides}, ensure_ascii=False, indent=2),
encoding="utf-8",
)
print(f"Saved slide plans to: {plans_path}")
return slides
def build_image_prompt(slide: dict, topic: str) -> str:
"""Build a full slide design prompt from a slide plan.
Asks the image model to generate a complete PPT slide with the title and
bullet points visually composed into the design, banana-slides style.
"""
points_text = "\n".join(f"• {p}" for p in slide["points"])
return (
f"Create a professional PowerPoint presentation slide image in 16:9 widescreen format "
f"for a presentation about '{topic}'. "
f"The slide must clearly display the following text content:\n\n"
f"TITLE: {slide['title']}\n\n"
f"KEY POINTS:\n{points_text}\n\n"
f"Design style: visually stunning, modern presentation design. "
f"Use a thematically relevant, high-quality background. "
f"Title text should be large, bold, and prominent. "
f"Bullet points should be clearly readable with high contrast. "
f"Professional layout with clean typography and balanced composition. "
f"The final image should look like a polished, presentation-ready slide."
)
def generate_images(
client: genai.Client,
slides: list[dict],
topic: str,
target_width: int,
out_dir: Path,
) -> list[Path]:
"""Phase 2: Generate one full slide design image per slide plan."""
image_paths = []
n = len(slides)
# Build and save all image prompts before generating
prompts = [
{"slide": i, "title": s["title"], "image_prompt": build_image_prompt(s, topic)}
for i, s in enumerate(slides, 1)
]
(out_dir / "slide-prompts.json").write_text(
json.dumps(prompts, ensure_ascii=False, indent=2), encoding="utf-8"
)
max_retries = 3
retry_delay = 2 # seconds, doubles each retry
interval = 1 # seconds between slides
log_path = out_dir / "image-requests.log"
log_entries = []
for i, slide in enumerate(slides, 1):
print(f"Generating slide {i}/{n}: {slide['title']!r}")
img_prompt = prompts[i - 1]["image_prompt"]
img_bytes = None
for attempt in range(1, max_retries + 1):
t_start = time.time()
started_at = datetime.now().isoformat(timespec="seconds")
status = "ok"
error_msg = None
try:
response = client.models.generate_content(
model="gemini-3-pro-image-preview",
contents=img_prompt,
config=types.GenerateContentConfig(
response_modalities=["IMAGE"],
),
)
raw_data = response.candidates[0].content.parts[0].inline_data.data
img_bytes = raw_data if isinstance(raw_data, bytes) else base64.b64decode(raw_data)
except Exception as e:
status = "error"
error_msg = str(e)
finally:
elapsed = round(time.time() - t_start, 2)
entry = {
"slide": i,
"title": slide["title"],
"attempt": attempt,
"started_at": started_at,
"elapsed_s": elapsed,
"status": status,
}
if error_msg:
entry["error"] = error_msg
log_entries.append(entry)
log_path.write_text(
json.dumps(log_entries, ensure_ascii=False, indent=2), encoding="utf-8"
)
if status == "ok":
break
if attempt < max_retries:
wait = retry_delay * (2 ** (attempt - 1))
print(f" Attempt {attempt} failed: {error_msg}. Retrying in {wait}s...")
time.sleep(wait)
else:
print(f"Error generating image for slide {i} after {max_retries} attempts: {error_msg}")
sys.exit(1)
try:
img = Image.open(io.BytesIO(img_bytes))
except UnidentifiedImageError:
print(f"Error: Could not identify image for slide {i}.")
sys.exit(1)
aspect = img.height / img.width
new_height = int(target_width * aspect)
img = img.resize((target_width, new_height), Image.LANCZOS)
title_slug = slugify(slide["title"], max_len=40)
out_path = out_dir / f"slide-{i:02d}-{title_slug}.png"
img.save(out_path, "PNG")
image_paths.append(out_path)
if i < n:
time.sleep(interval)
return image_paths
def assemble_pptx(image_paths: list[Path], out_dir: Path, slug: str) -> Path:
"""Phase 3: Assemble full-bleed slide images into a PPTX."""
prs = Presentation()
prs.slide_width = Emu(SLIDE_WIDTH_EMU)
prs.slide_height = Emu(SLIDE_HEIGHT_EMU)
blank_layout = prs.slide_layouts[6]
for img_path in image_paths:
slide = prs.slides.add_slide(blank_layout)
slide.shapes.add_picture(
str(img_path),
left=Emu(0),
top=Emu(0),
width=Emu(SLIDE_WIDTH_EMU),
height=Emu(SLIDE_HEIGHT_EMU),
)
output_path = out_dir / f"{slug}.pptx"
prs.save(output_path)
return output_path
if __name__ == "__main__":
args = parse_args()
api_key, base_url, out_dir, target_width = resolve_config(args)
slug = slugify(args.prompt)
http_options = {"base_url": base_url} if base_url else {}
client = genai.Client(api_key=api_key, http_options=http_options)
slides = generate_slide_plans(client, args.prompt, args.slides, out_dir)
image_paths = generate_images(client, slides, args.prompt, target_width, out_dir)
print("Assembling presentation...")
pptx_path = assemble_pptx(image_paths, out_dir, slug)
print(f"Done! Output folder: {out_dir.resolve()}")
print(f" Presentation: {pptx_path.name}")
print(f" Images: {len(image_paths)} slides")
print(f" Plans: slide-plans.json")
print(f" Image prompts: slide-prompts.json")