@clawhub-gopendrasharma89-tech-5d2c43b8fa
Pro-Studio v4.0.0. AI-powered background removal, smart subtitle placement, and cinematic LUT presets. The ultimate production suite for high-end video content.
---
name: openclaw-video-editor
description: A practical video editing skill for OpenClaw built on ffmpeg, ffprobe, and python3. Provides honest, tested workflows for common edits: subtitle generation, background blur, color grading, audio normalization, scene-detected highlight reels, and watermarking. No external API calls, no hidden network behavior.
license: MIT
metadata: {"openclaw":{"requires":{"bins":["ffmpeg","ffprobe","python3"]},"primaryEnv":null,"homepage":"https://clawhub.ai/gopendrasharma89-tech/openclaw-video-editor"}}
---
# openclaw-video-editor
A focused, honest video editing skill for OpenClaw. It wraps `ffmpeg`, `ffprobe`, and small Python helpers around real, working workflows. No misleading claims, no external API calls, no hidden network behavior.
## What this skill does
This skill helps the agent run common video edits using only locally installed tools (`ffmpeg`, `ffprobe`, `python3`). It ships:
- A subtitle generator (`scripts/generate_srt.py`) that converts Whisper / Deepgram / AssemblyAI / generic word-timing JSON into `.srt`, `.vtt`, or `.ass`.
- A scene-detected highlight reel builder (`scripts/highlight_reel.py`) that picks key moments using ffmpeg scene-change scores.
- A color grading helper (`scripts/apply_lut.py`) that applies `.cube` LUT files or named filter presets.
- A dependency check (`scripts/check_deps.sh`) so the agent can confirm `ffmpeg`/`ffprobe`/`python3` exist before running any command.
- A small library of correct, tested ffmpeg one-liners below.
This skill does not perform AI-based subject masking, voice cloning, or generative video. If you want those, use a dedicated tool — this skill is intentionally limited to honest local processing.
## What this skill does not do
To set expectations clearly:
- It does not include AI background removal. The blur workflow below is a `boxblur` filter, not subject segmentation.
- It does not include a transcription model. It only formats word-level timings into subtitle files. Pair it with a transcription tool to get the JSON input.
- It does not call any external service. All commands run locally on the machine where `ffmpeg` is installed.
## Required binaries
Before running any workflow, the agent should confirm dependencies:
```bash
bash scripts/check_deps.sh
```
This returns a non-zero exit code if `ffmpeg`, `ffprobe`, or `python3` is missing. The agent should surface a clear error to the user instead of attempting commands that will fail.
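An agent that orchestrates from Python could run an equivalent check with the standard library. This is a minimal sketch, not one of the shipped scripts; the helper name `missing_binaries` is hypothetical.

```python
import shutil

# Binaries named in the skill's metadata.
REQUIRED = ("ffmpeg", "ffprobe", "python3")


def missing_binaries(required=REQUIRED):
    """Return the names from `required` that cannot be found on PATH."""
    return [name for name in required if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("missing required binaries:", ", ".join(missing))
else:
    print("All dependencies satisfied.")
```

Returning a list rather than exiting lets the caller decide how to report the problem to the user.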
## Workflows
### 1. Generate subtitles from a transcript
```bash
python3 scripts/generate_srt.py transcript.json subtitles.srt
python3 scripts/generate_srt.py transcript.json subtitles.vtt
python3 scripts/generate_srt.py transcript.json subtitles.ass --font Helvetica --fontsize 28
```
Tunable flags: `--max-chars`, `--max-words`, `--max-duration`. Useful for non-English languages where line lengths differ.
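`generate_srt.py` reads `start` and `end` as seconds. Some services report milliseconds instead (AssemblyAI's word timings are one example), so those values need converting before the script sees them. A minimal sketch, assuming a generic list of word dicts with millisecond timings; `to_seconds` is a hypothetical helper, not part of the skill:

```python
import json


def to_seconds(words_ms):
    """Convert word timings from milliseconds to seconds."""
    converted = []
    for w in words_ms:
        converted.append({
            "word": w.get("word", w.get("text", "")),
            "start": w["start"] / 1000.0,
            "end": w["end"] / 1000.0,
        })
    return converted


# Made-up sample in a millisecond-based layout.
sample = [
    {"text": "Hello", "start": 250, "end": 650},
    {"text": "world", "start": 700, "end": 1100},
]
print(json.dumps(to_seconds(sample), indent=2))
```

Writing the converted list to a JSON file gives the generic `{word, start, end}` input that the subtitle generator accepts.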
### 2. Burn subtitles into a video
```bash
ffmpeg -i input.mp4 -vf "subtitles=subtitles.srt:force_style='Alignment=2,MarginV=30,Outline=2,Shadow=1'" -c:a copy output.mp4
```
`Alignment=2` is bottom-center. Use `Alignment=8` for top-center when the bottom of the frame has important action.
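The two values above follow the numeric-keypad layout: 1 to 3 along the bottom, 4 to 6 across the middle, 7 to 9 along the top, each row left to right. The sketch below builds the `force_style` string from a named position. The helper names and defaults are illustrative assumptions that mirror the command above.

```python
# Numeric-keypad layout for the Alignment style field.
ALIGNMENT = {
    "bottom-left": 1, "bottom-center": 2, "bottom-right": 3,
    "middle-left": 4, "middle-center": 5, "middle-right": 6,
    "top-left": 7, "top-center": 8, "top-right": 9,
}


def build_force_style(position="bottom-center", margin_v=30, outline=2, shadow=1):
    """Return a force_style value for ffmpeg's subtitles filter."""
    return (
        f"Alignment={ALIGNMENT[position]},MarginV={margin_v},"
        f"Outline={outline},Shadow={shadow}"
    )


def subtitles_filter(srt_path, **style):
    """Build the -vf argument; assumes srt_path contains no quotes or colons."""
    return f"subtitles={srt_path}:force_style='{build_force_style(**style)}'"


print(subtitles_filter("subtitles.srt", position="top-center"))
```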
### 3. Background blur (not segmentation)
A real shallow-depth-of-field look without green screen requires AI segmentation, which this skill does not include. What it does provide is honest full-frame blur, useful as a privacy filter:
```bash
ffmpeg -i input.mp4 -vf "boxblur=20:1" -c:a copy blurred.mp4
```
If you already have a binary alpha matte (`mask.mp4`, where the subject is white and the background is black, frame-aligned to `input.mp4`), you can composite a blurred background behind the original subject:
```bash
ffmpeg -i input.mp4 -i mask.mp4 \
-filter_complex "[0:v]boxblur=20:1[bg];[0:v][1:v]alphamerge[fg];[bg][fg]overlay=format=auto" \
-c:a copy composited.mp4
```
This only works if a per-frame alpha matte already exists. Producing the matte itself is out of scope.
### 4. Color grading
Two options.
Honest filter presets (no LUT file required):
```bash
# Warmer, more saturated look
ffmpeg -i input.mp4 -vf "eq=contrast=1.1:saturation=1.2,colorbalance=rs=0.1:gs=0.0:bs=-0.05" -c:a copy graded.mp4
# Desaturated black & white
ffmpeg -i input.mp4 -vf "hue=s=0,eq=contrast=1.2:brightness=-0.02" -c:a copy bw.mp4
```
Real LUT (when you have a `.cube` file):
```bash
python3 scripts/apply_lut.py input.mp4 lut.cube graded.mp4
# or directly:
ffmpeg -i input.mp4 -vf "lut3d=lut.cube" -c:a copy graded.mp4
```
A `.cube` LUT file reproduces a designed look precisely. The filter-based "presets" above are rough approximations, not substitutes for a real LUT.
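To exercise the LUT path without a real grading LUT, an identity LUT (one that leaves colors unchanged) can be generated. This sketch assumes the common `.cube` text layout: a `LUT_3D_SIZE N` header followed by N³ rows of `R G B` floats, with red varying fastest. `identity_cube` is a hypothetical helper.

```python
def identity_cube(size=17):
    """Return the text of an identity 3D LUT in .cube format."""
    if size < 2:
        raise ValueError("size must be at least 2")
    lines = ['TITLE "identity"', f"LUT_3D_SIZE {size}"]
    step = 1.0 / (size - 1)
    # Rows are ordered with red changing fastest, then green, then blue.
    for b in range(size):
        for g in range(size):
            for r in range(size):
                lines.append(f"{r * step:.6f} {g * step:.6f} {b * step:.6f}")
    return "\n".join(lines) + "\n"


# A 2x2x2 LUT is small enough to inspect by eye.
print(identity_cube(2))
```

Saving the output as `identity.cube` and applying it should produce a video that looks the same as the input, which makes it a convenient smoke test.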
### 5. Audio normalization (broadcast loudness)
Two-pass loudnorm targeting EBU R128 (-23 LUFS):
```bash
# Pass 1: measure
ffmpeg -i input.mp4 -af "loudnorm=I=-23:LRA=7:tp=-2:print_format=json" -f null - 2> loudnorm.log
# Pass 2: apply with measured values
# Read the input_i / input_tp / input_lra / input_thresh / target_offset values from loudnorm.log
ffmpeg -i input.mp4 \
-af "loudnorm=I=-23:LRA=7:tp=-2:measured_I=<input_i>:measured_TP=<input_tp>:measured_LRA=<input_lra>:measured_thresh=<input_thresh>:offset=<target_offset>:linear=true" \
-c:v copy normalized.mp4
```
Many online platforms normalize to roughly `-14 LUFS` rather than `-23 LUFS`; set `I=-14` in both passes to target that instead.
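The hand-off between the two passes can be sketched in Python. This assumes `loudnorm.log` holds ffmpeg's stderr from pass 1, ending with the flat JSON object that `print_format=json` emits. `parse_loudnorm` and `second_pass_filter` are hypothetical helpers, and the sample numbers are made up.

```python
import json


def parse_loudnorm(log_text):
    """Extract the last {...} JSON object from ffmpeg's loudnorm output."""
    start = log_text.rfind("{")
    end = log_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON block found")
    return json.loads(log_text[start:end + 1])


def second_pass_filter(stats, target_i=-23, lra=7, tp=-2):
    """Build the pass-2 loudnorm filter string from measured values."""
    return (
        f"loudnorm=I={target_i}:LRA={lra}:tp={tp}"
        f":measured_I={stats['input_i']}:measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}:linear=true"
    )


# Illustrative log tail; real values come from the pass-1 run.
sample_log = """[Parsed_loudnorm_0 @ 0x0]
{
    "input_i" : "-27.61",
    "input_tp" : "-4.47",
    "input_lra" : "18.06",
    "input_thresh" : "-39.20",
    "target_offset" : "-0.02"
}
"""
print(second_pass_filter(parse_loudnorm(sample_log)))
```

The resulting string is what goes after `-af` in the second command.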
### 6. Highlight reel via scene detection
```bash
python3 scripts/highlight_reel.py input.mp4 highlight.mp4 --duration 30 --threshold 0.4
```
Uses `ffprobe`/`ffmpeg` scene-change scores to pick clip boundaries, then assembles a short reel up to the requested duration.
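The timestamp collection and clip selection steps can be illustrated in isolation. ffmpeg's `showinfo` filter logs one line per selected frame, including a `pts_time:` field. The sketch below pulls those values out of captured stderr text and picks fixed-length clips until the target duration is reached. The sample log lines are abbreviated and made up, and the function names are hypothetical.

```python
import re

PTS_RE = re.compile(r"pts_time:([0-9.]+)")


def scene_timestamps(stderr_text):
    """Collect pts_time values (seconds) from showinfo log lines."""
    return [float(m.group(1)) for m in PTS_RE.finditer(stderr_text)]


def pick_clips(times, target, clip_length):
    """Take a fixed-length clip at each timestamp until `target` seconds are covered."""
    clips, total = [], 0.0
    for t in times:
        clips.append((t, clip_length))
        total += clip_length
        if total >= target:
            break
    return clips


sample = (
    "[Parsed_showinfo_1 @ 0x0] n:   0 pts:  18018 pts_time:6.006 fmt:yuv420p\n"
    "[Parsed_showinfo_1 @ 0x0] n:   1 pts:  63063 pts_time:21.021 fmt:yuv420p\n"
    "[Parsed_showinfo_1 @ 0x0] n:   2 pts: 120120 pts_time:40.04 fmt:yuv420p\n"
)
times = scene_timestamps(sample)
print(pick_clips(times, target=6, clip_length=3))
```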
### 7. Watermark
```bash
# Bottom-right text watermark (the color@opacity values are examples; adjust to taste)
ffmpeg -i input.mp4 -vf "drawtext=text='@yourhandle':x=w-tw-20:y=h-th-20:fontsize=24:fontcolor=white@0.8:box=1:boxcolor=black@0.5:boxborderw=8" -c:a copy watermarked.mp4
# Image overlay (logo)
ffmpeg -i input.mp4 -i logo.png -filter_complex "[0:v][1:v]overlay=W-w-20:20" -c:a copy watermarked.mp4
```
### 8. Format conversions and resizes
```bash
# 1080p H.264 with reasonable quality
ffmpeg -i input.mp4 -vf "scale=-2:1080" -c:v libx264 -preset medium -crf 20 -c:a aac -b:a 160k output_1080p.mp4
# Vertical 9:16 crop for shorts
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" -c:a copy vertical.mp4
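The arithmetic behind the vertical crop is worth checking before committing to it, because the crop keeps only a narrow centre strip and the scale step then enlarges it. A small sketch, assuming a landscape source and the centred crop that ffmpeg's `crop` filter applies by default; `vertical_crop` is a hypothetical helper.

```python
def vertical_crop(width, height, target_h=1920):
    """Return (crop_w, crop_h, x, y, upscale) for a centred 9:16 crop."""
    crop_w = int(height * 9 / 16)
    if crop_w > width:
        raise ValueError("source is narrower than 9:16; crop the height instead")
    x = (width - crop_w) // 2      # left edge of the centred strip
    upscale = target_h / height    # enlargement applied by the scale step
    return crop_w, height, x, 0, upscale


crop_w, crop_h, x, y, upscale = vertical_crop(1920, 1080)
print(f"crop {crop_w}x{crop_h} at x={x}, then upscale by {upscale:.2f}x")
```

For a 1920x1080 source this keeps a strip about 607 pixels wide and enlarges it by roughly 1.78x, so some softness in the result is expected.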
```
## Safety notes
- All workflows are local. No part of this skill calls a remote API or sends data anywhere.
- The Python helpers never invoke a shell. They pass argument lists directly to `subprocess.run` (no `shell=True`) and reject paths containing characters outside a small allow-list, which excludes shell metacharacters such as `;`, `|`, `&`, `$`, and backticks.
- The skill never modifies system configuration, environment variables, or other plugins. It only reads input files and writes output files at paths the user provides.
## Limitations
- The skill assumes inputs are valid video/audio files reachable on the local filesystem.
- Two-pass loudness normalization is described but not automated end-to-end; the agent must read the measurement output and pass values into the second pass.
- The blur compositing workflow only works when a per-frame matte already exists. Generating the matte requires a separate segmentation tool.
## License
MIT. See `LICENSE` for the full text.
FILE:scripts/apply_lut.py
#!/usr/bin/env python3
"""
Apply a 3D LUT (.cube file) to a video using ffmpeg's lut3d filter.
Usage:
    python3 apply_lut.py <input.mp4> <lut.cube> <output.mp4> [--strength 0..1]
    python3 apply_lut.py <input.mp4> - <output.mp4> --preset NAME
If --preset is given, pass '-' in place of the .cube path and a deterministic
ffmpeg filter preset is applied instead. Available presets:
    - warm
    - cool
    - bw (black and white)
    - high-contrast
    - faded
The script never invokes a shell. Paths are validated against an allow-list
of characters before being passed to ffmpeg.
"""
from __future__ import annotations
import argparse
import os
import re
import subprocess
import sys
from pathlib import Path
SAFE_PATH_RE = re.compile(r"^[\w./\-+ @=:%,()'\[\]]+$")
PRESETS = {
    "warm": "eq=contrast=1.05:saturation=1.15,colorbalance=rs=0.10:gs=0.0:bs=-0.05",
    "cool": "eq=contrast=1.05:saturation=1.05,colorbalance=rs=-0.05:gs=0.0:bs=0.10",
    "bw": "hue=s=0,eq=contrast=1.15:brightness=-0.02",
    "high-contrast": "eq=contrast=1.30:brightness=0.0:saturation=1.05",
    "faded": "eq=contrast=0.92:saturation=0.85,curves=preset=lighter",
}
def safe_path(p: str) -> Path:
    """Reject paths with characters outside the allow-list; expand '~'."""
    if not SAFE_PATH_RE.match(p):
        raise ValueError(f"Refusing path with unsafe characters: {p!r}")
    return Path(p).expanduser()
def run(cmd):
    """Run a command as an argument list (no shell) and capture stderr."""
    return subprocess.run(cmd, check=False, text=True, stderr=subprocess.PIPE)
def main() -> int:
    # The module docstring starts with a newline, so strip it before taking
    # the first line as the description.
    parser = argparse.ArgumentParser(description=__doc__.strip().split("\n", 1)[0])
    parser.add_argument("input", help="Source video path")
    parser.add_argument(
        "lut_or_dash",
        help="Path to a .cube LUT file, or '-' if using --preset",
    )
    parser.add_argument("output", help="Output video path")
    parser.add_argument(
        "--strength",
        type=float,
        default=1.0,
        help="LUT mix amount, 0..1 (default 1.0). Only effective with .cube LUTs.",
    )
    parser.add_argument(
        "--preset",
        choices=sorted(PRESETS.keys()),
        help="Use a named preset instead of a .cube file",
    )
    parser.add_argument(
        "--crf",
        type=int,
        default=20,
        help="x264 CRF value (default 20)",
    )
    args = parser.parse_args()
    try:
        src = safe_path(args.input).resolve()
        out = safe_path(args.output).resolve()
    except ValueError as e:
        print(f"error: {e}", file=sys.stderr)
        return 2
    if not src.exists():
        print(f"error: input not found: {src}", file=sys.stderr)
        return 2
    if not 0.0 <= args.strength <= 1.0:
        print("error: --strength must be in [0, 1]", file=sys.stderr)
        return 2
    if args.preset:
        vfilter = PRESETS[args.preset]
    else:
        if args.lut_or_dash == "-":
            print("error: pass a .cube path or use --preset", file=sys.stderr)
            return 2
        try:
            lut = safe_path(args.lut_or_dash).resolve()
        except ValueError as e:
            print(f"error: {e}", file=sys.stderr)
            return 2
        if not lut.exists() or lut.suffix.lower() != ".cube":
            print(f"error: not a .cube file: {lut}", file=sys.stderr)
            return 2
        # The LUT path is embedded in a filter string. A quote or colon would
        # change how ffmpeg parses that string, so refuse those outright.
        if "'" in str(lut) or ":" in str(lut):
            print(f"error: LUT path must not contain quotes or colons: {lut}", file=sys.stderr)
            return 2
        if args.strength >= 0.999:
            vfilter = f"lut3d='{lut}'"
        else:
            # Blend: split video, apply LUT to one branch, mix
            vfilter = (
                f"split[a][b];[a]lut3d='{lut}'[a2];"
                f"[a2][b]blend=all_mode=normal:all_opacity={args.strength}"
            )
    out.parent.mkdir(parents=True, exist_ok=True)
    cmd = [
        "ffmpeg",
        "-hide_banner",
        "-y",
        "-i",
        str(src),
        "-vf",
        vfilter,
        "-c:v",
        "libx264",
        "-preset",
        "medium",
        "-crf",
        str(args.crf),
        "-c:a",
        "copy",
        "-movflags",
        "+faststart",
        str(out),
    ]
    print("running:", " ".join(cmd), file=sys.stderr)
    res = run(cmd)
    if res.returncode != 0:
        print(f"error: ffmpeg failed: {res.stderr.strip()}", file=sys.stderr)
        return 1
    print(f"Wrote {out}", file=sys.stderr)
    return 0
if __name__ == "__main__":
    sys.exit(main())
FILE:scripts/check_deps.sh
#!/usr/bin/env bash
# Check required binaries for openclaw-video-editor.
# Exits 0 if all are available, 1 if any are missing.
set -u
missing=0
required=(ffmpeg ffprobe python3)
# Expand the array properly; a bare "required[@]" would loop once over a literal string.
for bin in "${required[@]}"; do
  if command -v "$bin" >/dev/null 2>&1; then
    version=$("$bin" --version 2>&1 | head -n 1 || echo "version unknown")
    printf " ok %-9s %s\n" "$bin" "$version"
  else
    printf " miss %-9s not found in PATH\n" "$bin"
    missing=1
  fi
done
if [ "$missing" -eq 1 ]; then
  echo
  echo "One or more required binaries are missing." >&2
  echo "Install via your platform package manager (apt, brew, choco, etc.)." >&2
  exit 1
fi
echo
echo "All dependencies satisfied."
exit 0
FILE:scripts/generate_srt.py
#!/usr/bin/env python3
"""
Generate SRT/ASS/VTT subtitle files from transcription JSON output.
Usage:
    python3 generate_srt.py <transcript.json> <output.srt> [options]
Supported input formats: Whisper, Deepgram, AssemblyAI, generic {word,start,end} lists.
Word timings are interpreted as seconds.
Supported output formats: .srt, .ass, .vtt (auto-detected from extension).
"""
import json
import sys
import argparse
from pathlib import Path
from typing import List, Dict, Optional, Any
def format_srt_ts(seconds: float) -> str:
    """Convert seconds to SRT timestamp (HH:MM:SS,mmm)."""
    # Round to whole milliseconds first so float error (1.001 % 1 is slightly
    # below 0.001) cannot drop a millisecond, then split with integer math.
    total_ms = int(round(max(seconds, 0.0) * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
def format_vtt_ts(seconds: float) -> str:
    """Convert seconds to VTT timestamp (HH:MM:SS.mmm)."""
    return format_srt_ts(seconds).replace(",", ".")
def format_ass_ts(seconds: float) -> str:
    """Convert seconds to ASS timestamp (H:MM:SS.cc)."""
    # Same approach as SRT, but ASS uses centiseconds.
    total_cs = int(round(max(seconds, 0.0) * 100))
    h, rem = divmod(total_cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"
def chunk_words(words: list, max_chars: int = 42, max_words: int = 8,
                max_duration: float = 6.0) -> list:
    """Group words into subtitle chunks based on character and word limits.
    Args:
        words: List of word dictionaries with 'word', 'start', 'end' keys
        max_chars: Maximum characters per subtitle line
        max_words: Maximum words per subtitle line
        max_duration: Maximum duration for a single subtitle in seconds
    Returns:
        List of chunk dictionaries with 'text', 'start', 'end' keys
    """
    chunks = []
    current_words = []
    current_text = ""
    chunk_start = None
    chunk_end = None
    def flush_chunk():
        """Add current chunk to chunks list and reset state."""
        nonlocal current_words, current_text, chunk_start, chunk_end
        if current_words and chunk_start is not None:
            chunks.append({
                "text": current_text,
                "start": chunk_start,
                # Compare against None: an end time of 0.0 is falsy but valid.
                "end": chunk_end if chunk_end is not None else (current_words[-1].get("end", chunk_start + 1))
            })
        current_words = []
        current_text = ""
        chunk_start = None
        chunk_end = None
    for word_info in words:
        word = word_info.get("word", word_info.get("text", "")).strip()
        start = float(word_info.get("start", 0))
        end = float(word_info.get("end", start + 0.5))
        if not word:
            continue
        test_text = f"{current_text} {word}".strip() if current_text else word
        # Compare against None: a chunk that starts at 0.0 is falsy but valid,
        # and must still count toward the duration limit.
        duration = end - (chunk_start if chunk_start is not None else start)
        # Flush if exceeds any limit
        should_flush = False
        if current_words:
            if len(test_text) > max_chars:
                should_flush = True
            elif len(current_words) >= max_words:
                should_flush = True
            elif duration > max_duration:
                should_flush = True
        if should_flush:
            flush_chunk()
            # After flush, recompute test_text from the now-empty buffer so
            # the next chunk does not inherit the previous chunk's text.
            test_text = word
        # Start new chunk if needed
        if chunk_start is None:
            chunk_start = start
        current_words.append(word_info)
        current_text = test_text
        chunk_end = end
    # Flush remaining chunk
    if current_words:
        flush_chunk()
    return chunks
def generate_srt(chunks: list) -> str:
    """Generate SRT format subtitle content."""
    lines = []
    for i, c in enumerate(chunks, 1):
        lines.append(f"{i}")
        lines.append(f"{format_srt_ts(c['start'])} --> {format_srt_ts(c['end'])}")
        lines.append(c["text"])
        lines.append("")
    return "\n".join(lines)
def generate_vtt(chunks: list) -> str:
    """Generate VTT format subtitle content."""
    lines = ["WEBVTT", ""]
    for i, c in enumerate(chunks, 1):
        lines.append(f"{i}")
        lines.append(f"{format_vtt_ts(c['start'])} --> {format_vtt_ts(c['end'])}")
        lines.append(c["text"])
        lines.append("")
    return "\n".join(lines)
def generate_ass(chunks: list, font: str = "Arial", fontsize: int = 24,
                 primary: str = "&H00FFFFFF", outline: str = "&H00000000",
                 backcolor: str = "&H80000000") -> str:
    """Generate ASS format subtitle content."""
    # The header lines stay at column 0: they are file content, not code.
    header = f"""[Script Info]
Title: Auto-generated subtitles
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResX: 1920
PlayResY: 1080
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,{font},{fontsize},{primary},&H000000FF,{outline},{backcolor},-1,0,0,0,100,100,0,0,1,2,1,2,20,20,50,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"""
    lines = [header]
    for c in chunks:
        text = c["text"].replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
        lines.append(f"Dialogue: 0,{format_ass_ts(c['start'])},{format_ass_ts(c['end'])},Default,,0,0,0,,{text}")
    return "\n".join(lines)
def _word_entries(items: list, alt_key: str = "text") -> list:
    """Normalize word dicts to {'word', 'start', 'end'}, skipping empty words."""
    entries = []
    for w in items:
        word = w.get("word", w.get(alt_key, "")).strip()
        if word:
            entries.append({
                "word": word,
                "start": float(w.get("start", 0)),
                "end": float(w.get("end", w.get("start", 0) + 0.5))
            })
    return entries
def _segment_entries(seg: dict) -> list:
    """Split a text-only segment into words, spreading its time span evenly."""
    text = seg.get("text", "").strip()
    start = float(seg.get("start", 0))
    end = float(seg.get("end", start + 1))
    seg_words = text.split()
    if not seg_words:
        return []
    dur = (end - start) / len(seg_words) if len(seg_words) > 1 else 0.5
    return [
        {"word": w, "start": start + j * dur, "end": start + (j + 1) * dur}
        for j, w in enumerate(seg_words)
    ]
def parse_transcript(data: Any) -> list:
    """Extract word-level timing from various transcription JSON formats.
    Supported formats:
        - Whisper (with segments containing words)
        - Whisper (with segments containing text only)
        - Deepgram (with channels and alternatives)
        - AssemblyAI (with words array)
        - Generic list of {word/text, start, end}
    """
    words = []
    if isinstance(data, list):
        if not data:
            return words
        # Check if it's a direct list of words
        first_item = data[0]
        if isinstance(first_item, dict) and ("word" in first_item or "text" in first_item):
            return _word_entries(data)
        # Extract words from segments
        for seg in data:
            if "words" in seg and isinstance(seg["words"], list):
                words.extend(_word_entries(seg["words"]))
            elif "text" in seg and "start" in seg:
                words.extend(_segment_entries(seg))
        return words
    if isinstance(data, dict):
        # Whisper format with segments
        if "segments" in data and isinstance(data["segments"], list):
            for seg in data["segments"]:
                if "words" in seg and isinstance(seg["words"], list):
                    words.extend(_word_entries(seg["words"]))
                else:
                    words.extend(_segment_entries(seg))
            return words
        # Deepgram format
        if "results" in data and isinstance(data["results"], dict):
            results = data["results"]
            if "channels" in results:
                for ch in results.get("channels", []):
                    for alt in ch.get("alternatives", []):
                        words.extend(_word_entries(alt.get("words", []), alt_key="punctuated_word"))
            elif "words" in results:
                words.extend(_word_entries(results["words"], alt_key="punctuated_word"))
            return words
        # AssemblyAI or simple format
        if "words" in data and isinstance(data["words"], list):
            return _word_entries(data["words"])
        # Single utterance format
        if "text" in data and "start" in data:
            return _segment_entries(data)
    return words
def validate_input_file(filepath: str) -> bool:
    """Validate that input file exists and is readable."""
    path = Path(filepath)
    if not path.exists():
        print(f"Error: File not found: {filepath}", file=sys.stderr)
        return False
    if not path.is_file():
        print(f"Error: Not a file: {filepath}", file=sys.stderr)
        return False
    if path.stat().st_size == 0:
        print(f"Error: File is empty: {filepath}", file=sys.stderr)
        return False
    return True
def main():
    parser = argparse.ArgumentParser(
        description="Generate subtitles from transcription JSON",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Basic usage with Whisper JSON
  python3 generate_srt.py transcript.json subtitles.srt
  # Custom styling for ASS format
  python3 generate_srt.py transcript.json subtitles.ass --font "Helvetica" --fontsize 28
  # Adjust chunk settings for different languages
  python3 generate_srt.py transcript.json subtitles.srt --max-chars 60 --max-words 10
Supported transcription formats:
  - OpenAI Whisper (with word-level timing)
  - Deepgram API
  - AssemblyAI
  - Generic JSON with word, start, end fields
"""
    )
    parser.add_argument("input", help="Path to transcription JSON file")
    parser.add_argument("output", help="Output path (.srt, .vtt, or .ass)")
    parser.add_argument("--max-chars", type=int, default=42,
                        help="Max characters per line (default: 42)")
    parser.add_argument("--max-words", type=int, default=8,
                        help="Max words per line (default: 8)")
    parser.add_argument("--max-duration", type=float, default=6.0,
                        help="Max duration per subtitle in seconds (default: 6.0)")
    parser.add_argument("--font", default="Arial",
                        help="Font name for ASS format (default: Arial)")
    parser.add_argument("--fontsize", type=int, default=24,
                        help="Font size for ASS format (default: 24)")
    parser.add_argument("--quiet", "-q", action="store_true",
                        help="Suppress non-error output")
    args = parser.parse_args()
    # Validate input
    if not validate_input_file(args.input):
        sys.exit(1)
    # Read and parse JSON
    try:
        with open(args.input, "r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON in {args.input}: {e}", file=sys.stderr)
        sys.exit(1)
    except UnicodeDecodeError:
        # Try with different encoding
        try:
            with open(args.input, "r", encoding="latin-1") as f:
                data = json.load(f)
        except Exception as e:
            print(f"Error: Cannot read {args.input}: {e}", file=sys.stderr)
            sys.exit(1)
    # Extract words from transcript
    words = parse_transcript(data)
    if not words:
        print("Error: Could not extract word timings from transcript.", file=sys.stderr)
        print("Supported formats: Whisper, Deepgram, AssemblyAI, or list of {word, start, end}.", file=sys.stderr)
        sys.exit(1)
    # Chunk words into subtitle segments
    chunks = chunk_words(
        words,
        max_chars=args.max_chars,
        max_words=args.max_words,
        max_duration=args.max_duration
    )
    if not chunks:
        print("Error: No subtitle chunks generated.", file=sys.stderr)
        sys.exit(1)
    # Generate output based on file extension
    ext = Path(args.output).suffix.lower()
    try:
        if ext == ".vtt":
            content = generate_vtt(chunks)
        elif ext == ".ass":
            content = generate_ass(
                chunks,
                font=args.font,
                fontsize=args.fontsize
            )
        else:
            # Default to SRT
            content = generate_srt(chunks)
        # Ensure output directory exists
        output_path = Path(args.output)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        # Write output
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(content)
        if not args.quiet:
            duration = chunks[-1]["end"] if chunks else 0
            print(f"Generated {len(chunks)} subtitle entries ({ext}) → {args.output}")
            print(f" Total duration: {duration:.1f}s")
            print(f" Settings: max_chars={args.max_chars}, max_words={args.max_words}, max_duration={args.max_duration}s")
    except IOError as e:
        print(f"Error: Cannot write to {args.output}: {e}", file=sys.stderr)
        sys.exit(1)
if __name__ == "__main__":
    main()
FILE:scripts/highlight_reel.py
#!/usr/bin/env python3
"""
Build a short highlight reel from a long video using ffmpeg scene-change
detection.
Usage:
    python3 highlight_reel.py <input.mp4> <output.mp4> [--duration SECONDS]
                                                       [--threshold FLOAT]
                                                       [--clip-length SECONDS]
Algorithm:
    1. Run ffmpeg with the `select='gt(scene,THRESHOLD)'` filter and parse
       `showinfo` lines to collect scene-change timestamps.
    2. Extract a fixed-length clip (re-encoded to H.264/AAC) starting at each
       scene-change timestamp until the cumulative duration reaches the
       requested target.
    3. Concatenate the clips without further re-encoding via the concat demuxer.
Intermediate clips are written to a temporary directory that is removed when
the script finishes; the only persistent output is the requested output file.
Paths containing characters outside a small allow-list (which excludes shell
metacharacters) are rejected, and no shell is ever invoked.
"""
from __future__ import annotations
import argparse
import os
import re
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List
SAFE_PATH_RE = re.compile(r"^[\w./\-+ @=:%,()'\[\]]+$")
def safe_path(p: str) -> Path:
    """Reject paths with characters outside the allow-list; expand '~'."""
    if not SAFE_PATH_RE.match(p):
        raise ValueError(f"Refusing path with unsafe characters: {p!r}")
    return Path(p).expanduser()
def run(cmd: List[str], capture: bool = False) -> subprocess.CompletedProcess:
    """Run a command; never use shell=True. stderr is always captured."""
    return subprocess.run(
        cmd,
        check=False,
        text=True,
        stdout=subprocess.PIPE if capture else None,
        stderr=subprocess.PIPE,
    )
def probe_duration(path: Path) -> float:
    """Return the container duration in seconds as reported by ffprobe."""
    res = run(
        [
            "ffprobe",
            "-v",
            "error",
            "-show_entries",
            "format=duration",
            "-of",
            "default=noprint_wrappers=1:nokey=1",
            str(path),
        ],
        capture=True,
    )
    if res.returncode != 0:
        raise RuntimeError(f"ffprobe failed: {res.stderr.strip()}")
    return float(res.stdout.strip())
def detect_scene_changes(path: Path, threshold: float) -> List[float]:
    """Return scene-change timestamps in seconds."""
    res = run(
        [
            "ffmpeg",
            "-hide_banner",
            "-i",
            str(path),
            "-filter:v",
            f"select='gt(scene,{threshold})',showinfo",
            "-f",
            "null",
            "-",
        ],
        capture=False,
    )
    # ffmpeg writes showinfo lines to stderr
    timestamps: List[float] = []
    for line in res.stderr.splitlines():
        m = re.search(r"pts_time:([0-9.]+)", line)
        if m:
            timestamps.append(float(m.group(1)))
    return timestamps
def select_clips(
    scene_times: List[float],
    target_duration: float,
    clip_length: float,
    video_duration: float,
) -> List[tuple]:
    """Choose (start, length) pairs whose total length is close to target."""
    if not scene_times:
        # Fall back to evenly spaced clips
        n = max(1, int(target_duration // clip_length))
        if n == 1:
            return [(0.0, min(clip_length, video_duration))]
        step = max(1.0, (video_duration - clip_length) / (n - 1))
        return [(round(i * step, 3), clip_length) for i in range(n)]
    selected: List[tuple] = []
    total = 0.0
    for t in scene_times:
        # Pull a clip that would run past the end back inside the video.
        if t + clip_length > video_duration:
            t = max(0.0, video_duration - clip_length)
        selected.append((round(t, 3), clip_length))
        total += clip_length
        if total >= target_duration:
            break
    return selected
def extract_clips(
    src: Path, clips: List[tuple], workdir: Path
) -> List[Path]:
    """Re-encode each (start, length) clip into workdir and return the paths."""
    out_paths: List[Path] = []
    for i, (start, length) in enumerate(clips):
        out = workdir / f"clip_{i:03d}.mp4"
        res = run(
            [
                "ffmpeg",
                "-hide_banner",
                "-y",
                "-ss",
                f"{start}",
                "-i",
                str(src),
                "-t",
                f"{length}",
                "-c:v",
                "libx264",
                "-preset",
                "veryfast",
                "-crf",
                "20",
                "-c:a",
                "aac",
                "-b:a",
                "160k",
                "-movflags",
                "+faststart",
                str(out),
            ]
        )
        if res.returncode != 0 or not out.exists():
            raise RuntimeError(
                f"Failed to extract clip {i} at {start}s: {res.stderr.strip()}"
            )
        out_paths.append(out)
    return out_paths
def concat_clips(clip_paths: List[Path], output: Path, workdir: Path) -> None:
    """Join the extracted clips with the concat demuxer (stream copy)."""
    list_file = workdir / "clips.txt"
    with list_file.open("w", encoding="utf-8") as f:
        for c in clip_paths:
            # The concat demuxer requires escaped single quotes
            safe = str(c).replace("'", r"'\''")
            f.write(f"file '{safe}'\n")
    res = run(
        [
            "ffmpeg",
            "-hide_banner",
            "-y",
            "-f",
            "concat",
            "-safe",
            "0",
            "-i",
            str(list_file),
            "-c",
            "copy",
            "-movflags",
            "+faststart",
            str(output),
        ]
    )
    if res.returncode != 0:
        raise RuntimeError(f"concat failed: {res.stderr.strip()}")
def main() -> int:
    # The module docstring starts with a newline, so strip it before taking
    # the first line as the description.
    parser = argparse.ArgumentParser(description=__doc__.strip().split("\n", 1)[0])
    parser.add_argument("input", help="Source video path")
    parser.add_argument("output", help="Output highlight reel path")
    parser.add_argument(
        "--duration",
        type=float,
        default=30.0,
        help="Target reel duration in seconds (default: 30)",
    )
    parser.add_argument(
        "--clip-length",
        type=float,
        default=3.0,
        help="Length of each clip in seconds (default: 3)",
    )
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.4,
        help="Scene-change sensitivity 0..1 (default: 0.4)",
    )
    args = parser.parse_args()
    try:
        src = safe_path(args.input).resolve()
        out = safe_path(args.output).resolve()
    except ValueError as e:
        print(f"error: {e}", file=sys.stderr)
        return 2
    if not src.exists():
        print(f"error: input not found: {src}", file=sys.stderr)
        return 2
    if args.duration <= 0 or args.clip_length <= 0:
        print("error: duration and clip-length must be > 0", file=sys.stderr)
        return 2
    if not 0.0 < args.threshold <= 1.0:
        print("error: threshold must be in (0, 1]", file=sys.stderr)
        return 2
    out.parent.mkdir(parents=True, exist_ok=True)
    try:
        video_duration = probe_duration(src)
    except Exception as e:  # noqa: BLE001
        print(f"error: probe failed: {e}", file=sys.stderr)
        return 1
    if args.duration >= video_duration:
        print(
            f"warning: target duration ({args.duration}s) >= source ({video_duration:.1f}s); "
            "clips may overlap or repeat footage",
            file=sys.stderr,
        )
    print(f"Detecting scene changes in {src.name} ...", file=sys.stderr)
    scenes = detect_scene_changes(src, args.threshold)
    print(f" found {len(scenes)} scene-change points", file=sys.stderr)
    clips = select_clips(scenes, args.duration, args.clip_length, video_duration)
    print(f" selected {len(clips)} clips", file=sys.stderr)
    for i, (s, l) in enumerate(clips):
        print(f" {i:02d}: start={s:.2f}s length={l:.2f}s", file=sys.stderr)
    with tempfile.TemporaryDirectory(prefix="highlight_") as tmp:
        workdir = Path(tmp)
        clip_paths = extract_clips(src, clips, workdir)
        concat_clips(clip_paths, out, workdir)
    print(f"Wrote {out}", file=sys.stderr)
    return 0
if __name__ == "__main__":
    sys.exit(main())
Evidence-based and approval-gated self-improvement workflow for OpenClaw. Use when the user asks to make OpenClaw more powerful, optimize behavior, improve r...
---
name: openclaw-self-improve
description: Evidence-based and approval-gated self-improvement workflow for OpenClaw. Use when the user asks to make OpenClaw more powerful, optimize behavior, improve reliability, performance, UX, safety, or cost, and requires measurable before/after outcomes. Not for casual improvements — use for structured, trackable change cycles.
required_binaries: bash, git, date, grep, awk, zip (scripts). python3 for JSON export.
---
# OpenClaw Self-Improve
v1.0.8
## Overview
Run a repeatable improvement loop that is metrics-first, approval-gated, and rollback-ready.
## Operating Modes
Choose one mode before starting work.
- `audit-only`: baseline + risk mapping only.
- `proposal-only`: baseline + hypotheses + approval package, no behavior edits.
- `approved-implementation`: implement only approved proposal, then validate.
Default mode: `proposal-only`.
## Required Inputs
Collect these before substantial work.
- Objective: what to improve.
- Scope: target repo/deployment.
- Constraints: time, risk tolerance, blocked surfaces.
- Success criteria: measurable pass/fail conditions.
- Validation gate: exact commands and expected outcomes.
If the user does not specify scope and `/root/openclaw` exists, use `/root/openclaw`.
## New Features (v1.0.8)
### Automatic Validation Gate Detection
The `--auto-detect-validation` flag automatically detects common test and build commands from your project structure. Supports Node.js, Python, Go, Rust, Java, Docker, and shell scripts.
```bash
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Improve X" \
--auto-detect-validation
```
### Comprehensive Logging
Enable detailed run logging with `--enable-logging` for debugging and audit trails. Logs are saved in `run.log` within the run directory.
```bash
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Improve X" \
--enable-logging
```
### Non-Git Repository Backup
For non-git repositories, use `--create-backup` to create a zip backup before running. Enables safe rollback without git.
```bash
init-improvement-run.sh \
--repo /path/to/repo \
--mode approved-implementation \
--objective "Refactor core" \
--create-backup
```
### Input Sanitization
All user inputs are sanitized to guard against shell injection and against values that could be mistaken for command-line flags. Validation gates and objectives are escaped.
## Metric Suggestions
Map objective to concrete metrics. Use `references/playbooks.md` to pick a primary playbook.
- Reliability: failed runs, retry count, error rate, flaky tests.
- Performance: latency, startup time, token/CPU/memory usage.
- Quality: regression count, test coverage of touched area, user-visible defects.
- Cost: token usage, paid API calls per workflow, unnecessary tool calls.
## Quick Start
1. **Dry-run first to preview:**
```bash
export OPENCLAW_REPO=/path/to/repo
init-improvement-run.sh --repo "$OPENCLAW_REPO" --mode proposal-only --objective "Improve X" --dry-run
```
2. **Scaffold artifacts:**
```bash
init-improvement-run.sh --repo "$OPENCLAW_REPO" --mode proposal-only --objective "Improve X"
```
If `init-improvement-run.sh` is not on PATH, run from the skill's `scripts/` directory instead.
3. **Validate a completed run:**
```bash
validate-improvement-run.sh --run-dir <run-dir>
```
Add `--require-json` for CI/automation pipelines.
4. **Export machine-readable JSON:**
```bash
export-improvement-run-json.py --run-dir <run-dir>
```
5. **Overwrite existing run:**
Pass `--timestamp YYYYMMDD-HHMMSS --force` to reuse the same run directory.
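The run directory that `--timestamp` and `--force` refer to can be worked out ahead of time. A minimal sketch, assuming the `<repo>/.openclaw-self-improve/<timestamp>/` layout and the `YYYYMMDD-HHMMSS` UTC format given in the init script's usage text; `run_timestamp` and `run_dir_for` are hypothetical helper names, not part of the skill:

```python
from datetime import datetime, timezone
from pathlib import Path


def run_timestamp(now=None):
    """Format a time as YYYYMMDD-HHMMSS in UTC (the layout --timestamp expects)."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")


def run_dir_for(repo, timestamp):
    """Return <repo>/.openclaw-self-improve/<timestamp> (assumed layout)."""
    return Path(repo) / ".openclaw-self-improve" / timestamp


if __name__ == "__main__":
    print(run_dir_for("/path/to/repo", run_timestamp()))
```

Passing the printed timestamp back through `--timestamp` together with `--force` targets that same directory.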
## Examples
### Performance improvement
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode approved-implementation \
--objective "Reduce gateway startup time by 30%" \
--scope "src/gateway/" \
--validation-gate "time pnpm start -- --no-watch"
```
### Reliability audit
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode audit-only \
--objective "Reduce flaky test rate in CI" \
--scope "tests/" \
--validation-gate "pnpm test -- --retries 3"
```
### Cost reduction
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode proposal-only \
--objective "Reduce average token usage per session by 20%" \
--scope "src/" \
--validation-gate "run 100 representative sessions and compare token counts"
```
### Auto-detect with logging
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode proposal-only \
--objective "Improve code quality" \
--auto-detect-validation \
--enable-logging
```
## Workflow
### 0. Preflight (all modes)
- Confirm mode (`audit-only`, `proposal-only`, `approved-implementation`).
- Confirm objective and measurable success criteria.
- Pick a primary metric set from `references/playbooks.md` if objective is broad.
- Confirm target repo path. Scaffold with `--dry-run` first.
- Capture current commit and branch.
### 1. Baseline
- Capture reproducible state and current metrics.
- Record commit, branch, and environment assumptions.
### 2. Hypotheses
- Write 1 to 3 hypotheses.
- Rank by impact and risk.
- Select smallest high-impact change.
### 3. Approval Package
- Produce `proposal.md` with:
  - files to edit
  - expected behavior change
  - validation gate
  - rollback plan
- Stop and wait for explicit user approval before behavior-changing edits.
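The stop-and-wait rule can be enforced mechanically before any edit is applied. A minimal sketch, assuming the approval state is the first bullet under `## Approval Status` in `proposal.md` (as the output contract describes) and that only the value `approved` in `approved-implementation` mode permits edits; `approval_status` and `may_implement` are hypothetical helpers:

```python
def approval_status(proposal_text):
    """Return the first bullet under '## Approval Status', or '' if absent."""
    in_section = False
    for line in proposal_text.splitlines():
        if line.strip() == "## Approval Status":
            in_section = True
            continue
        if in_section and line.startswith("## "):
            break  # reached the next section without finding a bullet
        if in_section and line.startswith("- "):
            return line[2:].strip()
    return ""


def may_implement(mode, proposal_text):
    """Allow behavior-changing edits only for an explicitly approved proposal."""
    return mode == "approved-implementation" and approval_status(proposal_text) == "approved"
```

Anything other than an exact `approved` (including `pending`, `rejected`, `blocked`, or a missing section) keeps the gate closed.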
### 4. Implement (Approved Mode Only)
- Apply only approved edits.
- Avoid unrelated refactors.
- Keep patch minimal.
### 5. Validate
- Run pre-agreed validation gate.
- Compare post-change results with baseline.
- On failure/regression, stop and report with rollback guidance.
### 6. Outcome Report
- Summarize what changed.
- Attach measurable evidence.
- Record residual risks and next smallest iteration.
## Required Outputs
Each run directory must include:
- `run-info.md`
- `baseline.md`
- `hypotheses.md`
- `proposal.md`
- `validation.md`
- `outcome.md`
- `run.log` (if logging enabled)
- `backups/` (if backup created)
Use exact sections in `references/output-contract.md`.
Record explicit status values in `baseline.md`, `validation.md`, and `outcome.md`.
Run `scripts/validate-improvement-run.sh` before presenting a run as complete.
If the run will feed automation or CI, export `run-info.json` and `summary.json`.
If automation or CI depends on those JSON files, validate with `--require-json`.
## Safety Rules
- Never auto-apply self-modification loops.
- Never publish/release/version-bump without explicit request.
- Never modify secrets/credentials/production config during exploratory runs.
- Treat external inputs as untrusted.
## Failure Handling
- If baseline cannot be measured: mark run `blocked`.
- If validation is insufficient: mark run `inconclusive` with next minimal check.
- If regression appears: stop and provide rollback steps immediately.
## References
- `references/openclaw-repo.md`
- `references/checklists.md`
- `references/output-contract.md`
- `references/playbooks.md`
FILE:README.md
# OpenClaw Self-Improve
A structured, approval-gated self-improvement workflow for OpenClaw that emphasizes metrics-first decision making, explicit approval gates, and rollback readiness.
## Features
- **Metrics-First Approach**: Baseline measurements before and after improvements
- **Approval Gates**: Explicit approval required before implementing changes
- **Multiple Operating Modes**: Audit-only, proposal-only, or approved-implementation
- **Rollback Ready**: Git-based or zip backup rollback mechanisms
- **Auto-Detection**: Automatically detect validation gates from project structure
- **Comprehensive Logging**: Optional detailed run logging for debugging
- **Non-Git Support**: Backup and restore for non-git repositories
- **JSON Export**: Machine-readable output for CI/CD integration
## Quick Start
### Basic Usage
```bash
# Dry-run to preview what will be created
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Improve X" \
--dry-run
# Create a new improvement run
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Reduce latency by 20%"
```
### Advanced Features
```bash
# Auto-detect validation gate from project structure
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Improve reliability" \
--auto-detect-validation
# Enable comprehensive logging
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Optimize performance" \
--enable-logging
# Create backup for non-git repositories
init-improvement-run.sh \
--repo /path/to/repo \
--mode approved-implementation \
--objective "Refactor core module" \
--create-backup
# Rollback changes after a run
init-improvement-run.sh \
--repo /path/to/repo \
--rollback
```
## Operating Modes
### audit-only
Baseline + risk mapping only. No behavior edits. Use for understanding current state and identifying risks.
### proposal-only (default)
Baseline + hypotheses + approval package. Stops before implementation to wait for explicit approval.
### approved-implementation
Implement only approved proposal, then validate. Full workflow with implementation.
## Workflow Phases
1. **Preflight**: Confirm mode, objective, and success criteria
2. **Baseline**: Capture reproducible state and current metrics
3. **Hypotheses**: Write ranked hypotheses for improvement
4. **Approval Package**: Generate proposal with files to edit, validation gate, and rollback plan
5. **Implementation**: Apply approved changes (approved-implementation mode only)
6. **Validation**: Run pre-agreed validation gate and compare results
7. **Outcome Report**: Document what changed and residual risks
## Output Files
Each run creates:
- `run-info.md` - Run metadata and configuration
- `baseline.md` - Starting state and metrics
- `hypotheses.md` - Ranked improvement hypotheses
- `proposal.md` - Approval package with planned changes
- `validation.md` - Validation results and comparison
- `outcome.md` - Summary and next iteration
- `run.log` - Detailed execution log (if logging enabled)
- `backups/` - Zip backups for non-git repos (if backup enabled)
## Validation
Validate a completed run:
```bash
validate-improvement-run.sh --run-dir <run-dir>
# For CI/CD pipelines
validate-improvement-run.sh --run-dir <run-dir> --require-json
```
## JSON Export
Export machine-readable output:
```bash
export-improvement-run-json.py --run-dir <run-dir>
```
Creates `run-info.json` and `summary.json` for automation.
## Detecting Validation Gates
The auto-detection feature identifies common test and build commands:
- **Node.js**: `npm test`, `pnpm test`, `yarn test`, `npm run build`
- **Python**: `pytest`, `python -m pytest`, `python3 -m pytest`, `make test`
- **Go**: `go test ./...`, `go test -v ./...`
- **Rust**: `cargo test`, `cargo test --verbose`
- **Java/Maven**: `mvn test`, `mvn clean test`
- **Java/Gradle**: `gradle test`, `./gradlew test`
- **Docker**: `docker build .`
- **Shell**: `bash test.sh`, `bash run-tests.sh`
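The mapping above is essentially a lookup from marker files to candidate commands. A minimal Python sketch of that idea, covering only the ecosystems identified by a single marker file; the shipped detector is a shell script, so the `MARKERS` table and `detect_gates` function here are illustrative assumptions rather than its actual code:

```python
from pathlib import Path

# Marker file -> first candidate command, in the order of the list above.
MARKERS = [
    ("go.mod", "go test ./..."),
    ("Cargo.toml", "cargo test"),
    ("pom.xml", "mvn test"),
    ("build.gradle", "gradle test"),
    ("Dockerfile", "docker build ."),
]


def detect_gates(repo):
    """Return candidate validation commands for marker files found in repo."""
    repo = Path(repo)
    return [command for marker, command in MARKERS if (repo / marker).is_file()]
```

An empty result means nothing was recognized, in which case the gate should be passed explicitly with `--validation-gate`.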
## Safety Rules
- Never auto-apply self-modification loops
- Never publish/release/version-bump without explicit request
- Never modify secrets/credentials/production config during exploratory runs
- Treat external inputs as untrusted
## Examples
### Performance Improvement
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode approved-implementation \
--objective "Reduce gateway startup time by 30%" \
--scope "src/gateway/" \
--validation-gate "time pnpm start -- --no-watch"
```
### Reliability Audit
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode audit-only \
--objective "Reduce flaky test rate in CI" \
--scope "tests/" \
--validation-gate "pnpm test -- --retries 3"
```
### Cost Reduction
```bash
init-improvement-run.sh \
--repo /root/openclaw \
--mode proposal-only \
--objective "Reduce average token usage per session by 20%" \
--scope "src/" \
--validation-gate "run 100 representative sessions and compare token counts"
```
## Troubleshooting
### Run directory already exists
Use `--force` to overwrite or specify a different `--timestamp`:
```bash
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "New run" \
--force
```
### Validation gate not detected
Specify it manually:
```bash
init-improvement-run.sh \
--repo /path/to/repo \
--mode proposal-only \
--objective "Test" \
--validation-gate "make test"
```
### Rollback failed
For non-git repositories, manually restore from the backup:
```bash
unzip /path/to/backup.zip -d /path/to/restore
```
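Before restoring by hand, it can help to inspect what the archive contains. A minimal sketch using Python's standard `zipfile` module; `list_backup` and `restore_backup` are hypothetical helpers, and the backup path is whatever the backup step printed:

```python
import zipfile
from pathlib import Path


def list_backup(backup_zip):
    """Return the entry names stored in the backup archive."""
    with zipfile.ZipFile(backup_zip) as archive:
        return archive.namelist()


def restore_backup(backup_zip, restore_dir):
    """Extract the archive into restore_dir and return the number of entries."""
    restore_dir = Path(restore_dir)
    restore_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(backup_zip) as archive:
        archive.extractall(restore_dir)
        return len(archive.namelist())
```

Restoring into a fresh directory first, then comparing it with the working tree, avoids overwriting files that were not part of the backup.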
## Version History
- **v1.0.7**: Added logging system, validation gate auto-detection, backup mechanism, and input sanitization
- **v1.0.6**: Initial release with core improvement workflow
## License
MIT-0 (No attribution required)
FILE:agents/openai.yaml
interface:
display_name: "OpenClaw Self-Improve"
short_description: "Approval-gated, metrics-first OpenClaw improvement loop"
default_prompt: "Run proposal-only OpenClaw improvement with dry-run scaffold check, playbook-driven metrics, approval package, artifact validation, and required JSON validation for automation-ready outputs"
FILE:references/output-contract.md
# Output Contract for OpenClaw Self-Improve
This document specifies the exact structure and required sections for each output file in an improvement run.
## run-info.md
**Purpose**: Metadata about the run configuration and environment.
**Required Sections**:
- `# Run Info` (heading)
- `- Timestamp (UTC): <value>` (bullet)
- `- Mode: <value>` (bullet)
- `- Repo: <value>` (bullet)
- `- Objective: <value>` (bullet)
- `- Scope: <value>` (bullet)
- `- Validation Gate: <value>` (bullet)
- `- Git Commit: <value>` (bullet)
- `- Git Branch: <value>` (bullet)
- `- Is Git Repository: <value>` (bullet)
- `- Logging Enabled: <value>` (bullet)
**Valid Status Values**: N/A (informational only)
## baseline.md
**Purpose**: Starting state, metrics, and risk assessment before improvements.
**Required Sections**:
- `# Baseline` (heading)
- `## Objective` (heading)
- `## Scope` (heading)
- `## Repo State` (heading)
- `- Commit: <value>` (bullet)
- `- Branch: <value>` (bullet)
- `- Is Git Repository: <value>` (bullet)
- `## Reproduction` (heading)
- `## Metrics` (heading)
- `## Risks` (heading)
- `## Status` (heading)
- `- <status_value>` (bullet)
**Valid Status Values**: `pass`, `fail`, `blocked`, `inconclusive`
## hypotheses.md
**Purpose**: Ranked improvement hypotheses.
**Required Sections**:
- `# Hypotheses` (heading)
- `## Hypothesis 1` (heading)
- `## Hypothesis 2` (heading)
- `## Hypothesis 3` (heading)
- `## Ranking` (heading)
**Valid Status Values**: N/A (informational only)
## proposal.md
**Purpose**: Approval package with planned changes and rollback plan.
**Required Sections**:
- `# Proposal` (heading)
- `## Selected Hypothesis` (heading)
- `## Planned Changes` (heading)
- `## Files To Edit` (heading)
- `## Validation Gate` (heading)
- `## Rollback Plan` (heading)
- `## Approval Status` (heading)
- `- <approval_status>` (bullet)
**Valid Approval Status Values**: `pending`, `approved`, `approved and implemented`, `rejected`, `blocked`
## validation.md
**Purpose**: Validation results and before/after comparison.
**Required Sections**:
- `# Validation` (heading)
- `## Commands Run` (heading)
- `## Results` (heading)
- `## Baseline vs New` (heading)
- `## Pass/Fail` (heading)
- `## Status` (heading)
- `- <status_value>` (bullet)
**Valid Status Values**: `pass`, `fail`, `blocked`, `inconclusive`
## outcome.md
**Purpose**: Summary of changes, evidence, and next iteration.
**Required Sections**:
- `# Outcome` (heading)
- `## Summary` (heading)
- `## Evidence` (heading)
- `## Residual Risk` (heading)
- `## Next Iteration` (heading)
- `## Status` (heading)
- `- <status_value>` (bullet)
**Valid Status Values**: `pass`, `fail`, `blocked`, `inconclusive`
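The status-bearing files above can be checked with a few lines of code. A minimal sketch, assuming headings appear on their own lines exactly as listed and that the status is the first bullet under `## Status`; `missing_headings` and `status_value` are hypothetical helpers and do not reproduce the shipped validator:

```python
REQUIRED_HEADINGS = {
    "baseline.md": ["# Baseline", "## Objective", "## Scope", "## Repo State",
                    "## Reproduction", "## Metrics", "## Risks", "## Status"],
    "validation.md": ["# Validation", "## Commands Run", "## Results",
                      "## Baseline vs New", "## Pass/Fail", "## Status"],
    "outcome.md": ["# Outcome", "## Summary", "## Evidence", "## Residual Risk",
                   "## Next Iteration", "## Status"],
}
VALID_STATUS = {"pass", "fail", "blocked", "inconclusive"}


def missing_headings(filename, text):
    """Return the required headings that do not appear in the file text."""
    present = {line.rstrip() for line in text.splitlines()}
    return [h for h in REQUIRED_HEADINGS[filename] if h not in present]


def status_value(text):
    """Return the first bullet under '## Status', or '' if there is none."""
    in_status = False
    for line in text.splitlines():
        if line.rstrip() == "## Status":
            in_status = True
            continue
        if in_status and line.startswith("## "):
            break
        if in_status and line.startswith("- "):
            return line[2:].strip()
    return ""
```

A file passes this check when `missing_headings` returns an empty list and `status_value` is a member of `VALID_STATUS`.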
## run.log (optional)
**Purpose**: Detailed execution log for debugging and audit trail.
**Format**:
```
================================================================================
OpenClaw Self-Improve Run Log
Started: <timestamp>
Run Directory: <path>
Mode: <mode>
Objective: <objective>
================================================================================
[TIMESTAMP] [LEVEL] Message
[TIMESTAMP] [LEVEL] Message
...
================================================================================
Run Initialization Completed: <timestamp>
================================================================================
```
**Valid Log Levels**: `INFO`, `WARN`, `ERROR`
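Because every entry follows the `[TIMESTAMP] [LEVEL] Message` shape, a log can be summarized without knowing the exact timestamp layout. A minimal sketch; `count_levels` is a hypothetical helper, and header or separator lines are simply ignored because they do not match the pattern:

```python
import re
from collections import Counter

# Matches "[<timestamp>] [<LEVEL>] <message>" with the three documented levels.
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>INFO|WARN|ERROR)\] (?P<msg>.*)$")


def count_levels(log_text):
    """Count log entries per level; non-matching lines are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts
```

A non-zero `ERROR` count is a quick signal that a run needs a closer look before it is presented as complete.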
## JSON Output Files
### run-info.json
**Purpose**: Machine-readable run metadata.
**Required Keys**:
- `timestamp_utc` (string)
- `mode` (string)
- `repo` (string)
- `objective` (string)
- `scope` (string)
- `validation_gate` (string)
- `git_commit` (string)
- `git_branch` (string)
- `generated_at_utc` (string)
- `artifacts` (object)
  - `markdown` (object)
    - `run_info` (string)
    - `baseline` (string)
    - `hypotheses` (string)
    - `proposal` (string)
    - `validation` (string)
    - `outcome` (string)
  - `json` (object)
    - `run_info` (string)
    - `summary` (string)
### summary.json
**Purpose**: High-level summary for CI/CD integration.
**Required Keys**:
- `run_dir` (string)
- `timestamp_utc` (string)
- `mode` (string)
- `objective` (string)
- `scope` (string)
- `approval_status` (string)
- `baseline_status` (string)
- `validation_status` (string)
- `outcome_status` (string)
- `selected_hypothesis` (string)
- `next_iteration` (string)
- `generated_at_utc` (string)
**Valid Status Values**: `pass`, `fail`, `blocked`, `inconclusive`
**Valid Approval Status Values**: `pending`, `approved`, `approved and implemented`, `rejected`, `blocked`
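A CI job that consumes `summary.json` can verify the contract before trusting the file. A minimal sketch built from the key and value lists above; `check_summary` is a hypothetical helper that returns a list of problems, empty when the document conforms:

```python
import json

REQUIRED_KEYS = [
    "run_dir", "timestamp_utc", "mode", "objective", "scope",
    "approval_status", "baseline_status", "validation_status",
    "outcome_status", "selected_hypothesis", "next_iteration",
    "generated_at_utc",
]
STATUS_KEYS = ("baseline_status", "validation_status", "outcome_status")
VALID_STATUS = {"pass", "fail", "blocked", "inconclusive"}
VALID_APPROVAL = {"pending", "approved", "approved and implemented", "rejected", "blocked"}


def check_summary(raw_json):
    """Return a list of contract violations found in a summary.json string."""
    data = json.loads(raw_json)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    for key in STATUS_KEYS:
        if key in data and data[key] not in VALID_STATUS:
            problems.append(f"invalid {key}: {data[key]}")
    if "approval_status" in data and data["approval_status"] not in VALID_APPROVAL:
        problems.append(f"invalid approval_status: {data['approval_status']}")
    return problems
```

Failing the pipeline on a non-empty result keeps malformed runs out of downstream automation.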
## Validation Rules
1. All required sections must be present in their respective files
2. Status values must be one of the valid values listed above
3. Timestamps must be in ISO 8601 format (YYYY-MM-DD HH:MM:SS UTC)
4. File paths must be absolute or relative to the run directory
5. JSON files must be valid JSON with proper formatting (2-space indentation)
6. All markdown files must use GitHub-flavored markdown syntax
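Rule 3 can be checked with the standard library. A minimal sketch that accepts only the `YYYY-MM-DD HH:MM:SS UTC` layout named in the rule; `is_contract_timestamp` is a hypothetical helper. Note that the exporter writes `generated_at_utc` with Python's `isoformat()`, which uses a `T` separator and a `+00:00` offset, so this check is meant only for fields written in the rule's layout:

```python
from datetime import datetime


def is_contract_timestamp(value):
    """Return True if value is written as 'YYYY-MM-DD HH:MM:SS UTC'."""
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S UTC")
    except ValueError:
        return False
    return True
```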
FILE:references/playbooks.md
# Improvement Playbooks
This document provides metric selection guidance for common improvement objectives.
## Reliability Playbook
**Objective**: Reduce errors, failures, and flaky behavior.
**Primary Metrics**:
- Failed run count (absolute and percentage)
- Retry count per session
- Error rate by type
- Mean time to recovery (MTTR)
- Flaky test count and pass rate
**Baseline Measurement**:
```bash
# Run 100 representative sessions and count failures
for i in {1..100}; do
run-session.sh >> session_results.log 2>&1
done
grep -c "ERROR" session_results.log
grep -c "RETRY" session_results.log
```
**Validation Gate**:
```bash
# Run tests with retry and measure flakiness
pnpm test -- --retries 3 --reporter json > test_results.json
```
**Success Criteria**: Error rate reduced by at least 50%, no increase in flaky tests.
## Performance Playbook
**Objective**: Reduce latency, startup time, or resource usage.
**Primary Metrics**:
- Latency (p50, p95, p99)
- Startup time
- Memory usage
- CPU usage
- Token consumption (for LLM calls)
- Request throughput
**Baseline Measurement**:
```bash
# Measure startup time
time pnpm start -- --no-watch
# Measure latency for 1000 requests
ab -n 1000 -c 10 http://localhost:3000/api/endpoint
```
**Validation Gate**:
```bash
# Measure performance with load testing
pnpm run benchmark -- --duration 60s --concurrency 10
```
**Success Criteria**: Latency reduced by at least 30%, no memory regression.
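The p50/p95/p99 metrics can be computed from raw latency samples with the standard library. A minimal sketch, assuming samples are collected in milliseconds and using the inclusive interpolation method of `statistics.quantiles` (other tools may interpolate differently); `latency_percentiles` is a hypothetical helper:

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 for a list of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Computing the same three values before and after a change, with the same sample count, makes the 30% criterion directly comparable.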
## Quality Playbook
**Objective**: Improve code quality, test coverage, or reduce defects.
**Primary Metrics**:
- Test coverage percentage
- Regression count
- Code quality score
- Defect density
- Type safety violations
**Baseline Measurement**:
```bash
# Measure test coverage
pnpm test -- --coverage --reporters=json > coverage.json
# Measure code quality
eslint src/ --format json > quality.json
```
**Validation Gate**:
```bash
# Run full test suite with coverage
pnpm test -- --coverage --minCoveragePercentage=80
```
**Success Criteria**: Coverage increased by at least 10%, no quality score decrease.
## Cost Playbook
**Objective**: Reduce token usage, API calls, or computational cost.
**Primary Metrics**:
- Tokens per session
- API call count
- Cost per operation
- Unnecessary tool invocations
- Cache hit rate
**Baseline Measurement**:
```bash
# Run representative sessions and measure token usage
export MEASURE_TOKENS=true
for i in {1..100}; do
run-session.sh 2>&1 | grep "tokens_used"
done | awk '{sum+=$NF} END {print "Average:", sum/NR}'
```
**Validation Gate**:
```bash
# Measure token usage in production-like conditions
pnpm run measure-cost -- --sessions 100 --output cost_report.json
```
**Success Criteria**: Token usage reduced by at least 20%, no functionality loss.
## Safety Playbook
**Objective**: Improve security, reduce vulnerabilities, or enhance safety.
**Primary Metrics**:
- Vulnerability count by severity
- Security test pass rate
- Unsafe API usage count
- Input validation coverage
- Rate limit violations
**Baseline Measurement**:
```bash
# Scan for vulnerabilities
npm audit --json > audit.json
# Run security tests
pnpm test -- --testPathPattern=security
```
**Validation Gate**:
```bash
# Security scan and validation
npm audit --audit-level=moderate
pnpm run security-check
```
**Success Criteria**: All high/critical vulnerabilities fixed, no new vulnerabilities introduced.
## UX Playbook
**Objective**: Improve user experience, reduce friction, or enhance usability.
**Primary Metrics**:
- User satisfaction score
- Task completion rate
- Time to completion
- Error recovery rate
- Feature adoption rate
**Baseline Measurement**:
```bash
# Collect user feedback
survey-tool.sh --users 50 --questions "satisfaction,ease-of-use,friction"
# Measure task completion
usability-test.sh --tasks 10 --users 20 > ux_baseline.json
```
**Validation Gate**:
```bash
# Validate UX improvements
usability-test.sh --tasks 10 --users 20 --compare ux_baseline.json
```
**Success Criteria**: Satisfaction increased by at least 15%, completion rate improved.
## Scalability Playbook
**Objective**: Handle increased load, more users, or larger datasets.
**Primary Metrics**:
- Throughput (requests per second)
- Latency under load (p95, p99)
- Resource scaling factor
- Error rate under load
- Connection pool utilization
**Baseline Measurement**:
```bash
# Load test with increasing concurrency
for concurrency in 10 50 100 500; do
ab -n 1000 -c $concurrency http://localhost:3000/api/endpoint
done
```
**Validation Gate**:
```bash
# Load test with target concurrency
ab -n 10000 -c 500 http://localhost:3000/api/endpoint
```
**Success Criteria**: Handle 5x current load with <10% latency increase.
## Selecting Your Playbook
1. **Identify your objective**: What aspect of OpenClaw needs improvement?
2. **Choose primary metrics**: Pick 1-3 metrics that directly measure success
3. **Establish baseline**: Measure current state before improvements
4. **Define validation gate**: Specify exact commands to run after changes
5. **Set success criteria**: Be explicit about what constitutes success
6. **Document assumptions**: Record environment, load, and other assumptions
## Combining Playbooks
For complex improvements, combine metrics from multiple playbooks:
**Example**: "Improve reliability and performance"
- Primary: Error rate reduction (Reliability)
- Secondary: Latency reduction (Performance)
- Validation: Run tests + load test
- Success: Error rate -50% AND latency -30%
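The combined success rule in this example reduces to two relative-reduction checks joined by AND. A minimal sketch, assuming both metrics are "lower is better" and that the thresholds are the 50% and 30% figures above; `relative_reduction` and `meets_combined_criteria` are hypothetical helpers:

```python
def relative_reduction(baseline, new):
    """Return the fractional reduction from baseline to new (0.5 means 50% lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def meets_combined_criteria(base_errors, new_errors, base_latency, new_latency):
    """True only if errors fell by at least 50% AND latency by at least 30%."""
    return (
        relative_reduction(base_errors, new_errors) >= 0.50
        and relative_reduction(base_latency, new_latency) >= 0.30
    )
```

Because the criteria are joined with AND, a large gain on one metric does not compensate for missing the threshold on the other.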
## Common Pitfalls
- **Measuring the wrong thing**: Ensure metrics align with objective
- **Insufficient baseline**: Measure enough samples for statistical significance
- **Ignoring side effects**: Monitor for regressions in other metrics
- **Unrealistic targets**: Set achievable goals based on current state
- **Forgetting environment**: Document hardware, network, and software versions
FILE:scripts/backup-repo.sh
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'USAGE'
Usage:
backup-repo.sh --repo <path> --backup-dir <dir> [--exclude <pattern>]
Creates a zip backup of a repository for rollback purposes.
Useful for non-git repositories or as an additional safety measure.
Options:
--repo <path> Repository path to backup
--backup-dir <dir> Directory to store backup files
--exclude <pattern> Exclude files matching pattern (can be used multiple times)
--timestamp <YYYYMMDD-HHMMSS> Custom timestamp for backup (default: current UTC)
-h, --help Display this help message
Returns:
Path to the created backup file
USAGE
}
REPO=""
BACKUP_DIR=""
TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)"
EXCLUDE_PATTERNS=()
while [[ $# -gt 0 ]]; do
case "$1" in
--repo)
REPO="${2:-}"
shift 2
;;
--backup-dir)
BACKUP_DIR="${2:-}"
shift 2
;;
--timestamp)
TIMESTAMP="${2:-}"
shift 2
;;
--exclude)
EXCLUDE_PATTERNS+=("${2:-}")
shift 2
;;
-h|--help)
usage
exit 0
;;
*)
echo "Unknown argument: $1" >&2
usage >&2
exit 1
;;
esac
done
if [[ -z "$REPO" ]]; then
echo "Missing required --repo <path>" >&2
usage >&2
exit 1
fi
if [[ -z "$BACKUP_DIR" ]]; then
echo "Missing required --backup-dir <dir>" >&2
usage >&2
exit 1
fi
if [[ ! -d "$REPO" ]]; then
echo "Repository path does not exist: $REPO" >&2
exit 1
fi
mkdir -p "$BACKUP_DIR"
REPO_ABS="$(cd "$REPO" && pwd)"
REPO_NAME="$(basename "$REPO_ABS")"
BACKUP_FILE="$BACKUP_DIR/${REPO_NAME}_backup_${TIMESTAMP}.zip"
# Build the zip arguments as an array so paths and patterns are passed literally (no eval)
ZIP_ARGS=(-r -q "$BACKUP_FILE" "$REPO_ABS")
# Add default exclusions
DEFAULT_EXCLUDES=(
  ".git"
  ".gitignore"
  "node_modules"
  ".venv"
  "venv"
  "__pycache__"
  ".pytest_cache"
  "dist"
  "build"
  ".DS_Store"
  "*.log"
  ".openclaw-self-improve"
)
for exclude in "${DEFAULT_EXCLUDES[@]}"; do
  ZIP_ARGS+=(-x "*/$exclude/*" "*/$exclude")
done
# Add user-specified exclusions (the guard keeps an empty array safe under `set -u` on older bash)
for pattern in ${EXCLUDE_PATTERNS[@]+"${EXCLUDE_PATTERNS[@]}"}; do
  ZIP_ARGS+=(-x "$pattern")
done
if zip "${ZIP_ARGS[@]}" 2>/dev/null; then
  # Print the backup path so callers can capture it
  echo "$BACKUP_FILE"
else
  echo "Failed to create backup: $BACKUP_FILE" >&2
  exit 1
fi
FILE:scripts/detect-validation-gate.sh
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'USAGE'
Usage:
detect-validation-gate.sh --repo <path> [--verbose]
Detects common validation gates (test commands, build commands, etc.) in a repository.
Returns the most likely validation gate command based on project structure.
Options:
--repo <path> Repository path to scan
--verbose Print all detected gates (not just the primary one)
-h, --help Display this help message
USAGE
}
REPO=""
VERBOSE="false"
while [[ $# -gt 0 ]]; do
case "$1" in
--repo)
REPO="${2:-}"
shift 2
;;
--verbose)
VERBOSE="true"
shift
;;
-h|--help)
usage
exit 0
;;
*)
echo "Unknown argument: $1" >&2
usage >&2
exit 1
;;
esac
done
if [[ -z "$REPO" ]]; then
echo "Missing required --repo <path>" >&2
usage >&2
exit 1
fi
if [[ ! -d "$REPO" ]]; then
echo "Repository path does not exist: $REPO" >&2
exit 1
fi
detect_gates() {
local repo="$1"
local verbose="${2:-false}"
local gates=()
# Check for Node.js/npm/pnpm projects
if [[ -f "$repo/package.json" ]]; then
if grep -q '"test"' "$repo/package.json"; then
gates+=("npm test")
gates+=("pnpm test")
gates+=("yarn test")
fi
if grep -q '"build"' "$repo/package.json"; then
gates+=("npm run build")
gates+=("pnpm build")
fi
fi
# Check for Python projects
if [[ -f "$repo/setup.py" ]] || [[ -f "$repo/pyproject.toml" ]] || [[ -f "$repo/requirements.txt" ]]; then
gates+=("pytest")
gates+=("python -m pytest")
gates+=("python3 -m pytest")
if [[ -f "$repo/Makefile" ]] && grep -q "test" "$repo/Makefile"; then
gates+=("make test")
fi
fi
# Check for Go projects
if [[ -f "$repo/go.mod" ]]; then
gates+=("go test ./...")
gates+=("go test -v ./...")
fi
# Check for Rust projects
if [[ -f "$repo/Cargo.toml" ]]; then
gates+=("cargo test")
gates+=("cargo test --verbose")
fi
# Check for Java/Maven projects
if [[ -f "$repo/pom.xml" ]]; then
gates+=("mvn test")
gates+=("mvn clean test")
fi
# Check for Java/Gradle projects
if [[ -f "$repo/build.gradle" ]] || [[ -f "$repo/build.gradle.kts" ]]; then
gates+=("gradle test")
gates+=("./gradlew test")
fi
# Check for Make-based projects
if [[ -f "$repo/Makefile" ]]; then
if grep -q "^test:" "$repo/Makefile"; then
gates+=("make test")
fi
if grep -q "^check:" "$repo/Makefile"; then
gates+=("make check")
fi
fi
# Check for Docker
if [[ -f "$repo/Dockerfile" ]]; then
gates+=("docker build .")
fi
# Check for shell scripts
if find "$repo" -maxdepth 2 -name "test.sh" -o -name "run-tests.sh" | grep -q .; then
gates+=("bash test.sh")
gates+=("bash run-tests.sh")
fi
# Remove duplicates while preserving detection order
local unique_gates=()
for gate in "${gates[@]}"; do
  if [[ ! " ${unique_gates[*]} " =~ " ${gate} " ]]; then
    unique_gates+=("$gate")
  fi
done
if [[ "$verbose" == "true" ]]; then
  if [[ ${#unique_gates[@]} -gt 0 ]]; then
    echo "Detected validation gates:"
    for gate in "${unique_gates[@]}"; do
      echo " - $gate"
    done
  else
    echo "No validation gates detected."
  fi
else
  if [[ ${#unique_gates[@]} -gt 0 ]]; then
    echo "${unique_gates[0]}"
  else
    # Fallback when nothing was detected
    echo "npm test"
  fi
fi
}
detect_gates "$REPO" "$VERBOSE"
FILE:scripts/export-improvement-run-json.py
#!/usr/bin/env python3
import argparse
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def read_lines(path: Path) -> list[str]:
    if not path.is_file():
        raise FileNotFoundError(f"Missing required file: {path}")
    return path.read_text(encoding="utf-8").splitlines()


def require_prefixed_value(lines: list[str], prefix: str, label: str) -> str:
    """Return the value that follows the prefix on the first matching line."""
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix) :].strip()
    # For non-critical fields, return a sensible default
    if label in ("Commit", "Branch"):
        return "n/a"
    raise ValueError(f"Missing required field '{label}'")


def section_lines(lines: list[str], heading: str) -> list[str]:
    """Extract all lines between a heading and the next `## ` heading."""
    in_section = False
    collected: list[str] = []
    for line in lines:
        if line == heading:
            in_section = True
            continue
        if in_section and line.startswith("## "):
            break
        if in_section:
            collected.append(line)
    return collected


def first_bullet(section: list[str], label: str) -> str:
    """Return the first bullet item in a section, falling back to the first non-empty line."""
    for line in section:
        if line.startswith("- "):
            return line[2:].strip()
    # If there's no bullet, return the first non-empty line or empty string
    for line in section:
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


def normalize_section_text(section: list[str]) -> str:
    """Join all lines in a section, trimming whitespace."""
    text = "\n".join(line.rstrip() for line in section).strip()
    return text if text else ""


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Export machine-readable JSON files from an OpenClaw self-improvement run."
    )
    parser.add_argument("--run-dir", required=True, help="Path to the run directory")
    args = parser.parse_args()
    run_dir = Path(args.run_dir).resolve()
    if not run_dir.is_dir():
        print(f"Run directory does not exist: {run_dir}", file=sys.stderr)
        return 1
    try:
        run_info_lines = read_lines(run_dir / "run-info.md")
        baseline_lines = read_lines(run_dir / "baseline.md")
        proposal_lines = read_lines(run_dir / "proposal.md")
        validation_lines = read_lines(run_dir / "validation.md")
        outcome_lines = read_lines(run_dir / "outcome.md")
        timestamp_utc = require_prefixed_value(
            run_info_lines, "- Timestamp (UTC):", "Timestamp (UTC)"
        )
        mode = require_prefixed_value(run_info_lines, "- Mode:", "Mode")
        repo = require_prefixed_value(run_info_lines, "- Repo:", "Repo")
        objective = require_prefixed_value(run_info_lines, "- Objective:", "Objective")
        scope = require_prefixed_value(run_info_lines, "- Scope:", "Scope")
        validation_gate = require_prefixed_value(
            run_info_lines, "- Validation Gate:", "Validation Gate"
        )
        repo_state = section_lines(baseline_lines, "## Repo State")
        git_commit = require_prefixed_value(repo_state, "- Commit:", "Commit")
        git_branch = require_prefixed_value(repo_state, "- Branch:", "Branch")
        baseline_status = first_bullet(section_lines(baseline_lines, "## Status"), "baseline status")
        approval_status = first_bullet(
            section_lines(proposal_lines, "## Approval Status"), "approval status"
        )
        validation_status = first_bullet(
            section_lines(validation_lines, "## Status"), "validation status"
        )
        outcome_status = first_bullet(section_lines(outcome_lines, "## Status"), "outcome status")
        selected_hypothesis = normalize_section_text(
            section_lines(proposal_lines, "## Selected Hypothesis")
        )
        next_iteration = normalize_section_text(section_lines(outcome_lines, "## Next Iteration"))
        generated_at_utc = datetime.now(timezone.utc).replace(microsecond=0).isoformat()
        run_info_json = {
            "timestamp_utc": timestamp_utc,
            "mode": mode,
            "repo": str(repo),
            "objective": objective,
            "scope": scope,
            "validation_gate": validation_gate,
            "git_commit": git_commit,
            "git_branch": git_branch,
            "generated_at_utc": generated_at_utc,
            "artifacts": {
                "markdown": {
                    "run_info": str(run_dir / "run-info.md"),
                    "baseline": str(run_dir / "baseline.md"),
                    "hypotheses": str(run_dir / "hypotheses.md"),
                    "proposal": str(run_dir / "proposal.md"),
                    "validation": str(run_dir / "validation.md"),
                    "outcome": str(run_dir / "outcome.md"),
                },
                "json": {
                    "run_info": str(run_dir / "run-info.json"),
                    "summary": str(run_dir / "summary.json"),
                },
            },
        }
        summary_json = {
            "run_dir": str(run_dir),
            "timestamp_utc": timestamp_utc,
            "mode": mode,
            "objective": objective,
            "scope": scope,
            "approval_status": approval_status,
            "baseline_status": baseline_status,
            "validation_status": validation_status,
            "outcome_status": outcome_status,
            "selected_hypothesis": selected_hypothesis,
            "next_iteration": next_iteration,
            "generated_at_utc": generated_at_utc,
        }
        (run_dir / "run-info.json").write_text(
            json.dumps(run_info_json, indent=2, sort_keys=True) + "\n",
            encoding="utf-8",
        )
        (run_dir / "summary.json").write_text(
            json.dumps(summary_json, indent=2, sort_keys=True) + "\n",
            encoding="utf-8",
        )
    except (FileNotFoundError, ValueError) as exc:
        print(str(exc), file=sys.stderr)
        return 1
    print(f"Exported JSON to {run_dir}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
FILE:scripts/init-improvement-run.sh
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'USAGE'
Usage:
init-improvement-run.sh --repo <path> [options]
Options:
--timestamp <YYYYMMDD-HHMMSS> Fixed run timestamp (default: current UTC)
--mode <audit-only|proposal-only|approved-implementation>
--objective <text>
--scope <text>
--validation-gate <text>
--dry-run Print resolved values without creating files
--force Overwrite an existing run directory safely
--rollback Rollback changes from a previous run
--auto-detect-validation Auto-detect validation gate from repo structure
--create-backup Create a zip backup before running (non-git)
--enable-logging Enable detailed run logging
Creates:
<repo>/.openclaw-self-improve/<timestamp>/
with files:
run-info.md baseline.md hypotheses.md proposal.md validation.md outcome.md run.log
USAGE
}
REPO=""
TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)"
MODE="proposal-only"
OBJECTIVE=""
SCOPE=""
VALIDATION_GATE=""
DRY_RUN="false"
FORCE="false"
ROLLBACK="false"
AUTO_DETECT_VALIDATION="false"
CREATE_BACKUP="false"
ENABLE_LOGGING="false"
trim() {
local value="$1"
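# Strip leading whitespace, then trailing whitespace.
value="${value#"${value%%[![:space:]]*}"}"
value="${value%"${value##*[![:space:]]}"}"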
printf '%s' "$value"
}
while [[ $# -gt 0 ]]; do
case "$1" in
--repo)
REPO="${2:-}"
shift 2
;;
--timestamp)
TIMESTAMP="${2:-}"
shift 2
;;
--mode)
MODE="${2:-}"
shift 2
;;
--objective)
OBJECTIVE="${2:-}"
shift 2
;;
--scope)
SCOPE="${2:-}"
shift 2
;;
--validation-gate)
VALIDATION_GATE="${2:-}"
shift 2
;;
--dry-run)
DRY_RUN="true"
shift
;;
--force)
FORCE="true"
shift
;;
--rollback)
ROLLBACK="true"
shift
;;
--auto-detect-validation)
AUTO_DETECT_VALIDATION="true"
shift
;;
--create-backup)
CREATE_BACKUP="true"
shift
;;
--enable-logging)
ENABLE_LOGGING="true"
shift
;;
-h|--help)
usage
exit 0
;;
*)
echo "Unknown argument: $1" >&2
usage >&2
exit 1
;;
esac
done
REPO="$(trim "$REPO")"
MODE="$(trim "$MODE")"
TIMESTAMP="$(trim "$TIMESTAMP")"
OBJECTIVE="$(trim "$OBJECTIVE")"
SCOPE="$(trim "$SCOPE")"
VALIDATION_GATE="$(trim "$VALIDATION_GATE")"
if [[ -z "$REPO" ]]; then
echo "Missing required --repo <path>" >&2
usage >&2
exit 1
fi
if [[ ! -d "$REPO" ]]; then
echo "Repo path does not exist: $REPO" >&2
exit 1
fi
case "$MODE" in
audit-only|proposal-only|approved-implementation)
;;
*)
echo "Invalid --mode value: $MODE" >&2
exit 1
;;
esac
if [[ ! "$TIMESTAMP" =~ ^[0-9]{8}-[0-9]{6}$ ]]; then
echo "Invalid --timestamp format: $TIMESTAMP (expected YYYYMMDD-HHMMSS)" >&2
exit 1
fi
REPO_ABS="$(cd "$REPO" && pwd)"
RUN_DIR="$REPO_ABS/.openclaw-self-improve/$TIMESTAMP"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
if [[ -e "$RUN_DIR" ]]; then
if [[ "$FORCE" != "true" ]]; then
echo "Run directory already exists: $RUN_DIR" >&2
echo "Use a different --timestamp or pass --force." >&2
exit 1
fi
if [[ ! -d "$RUN_DIR" ]]; then
echo "Existing path is not a directory: $RUN_DIR" >&2
exit 1
fi
fi
GIT_COMMIT="n/a"
GIT_BRANCH="n/a"
IS_GIT_REPO="false"
if git -C "$REPO_ABS" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
IS_GIT_REPO="true"
GIT_COMMIT="$(git -C "$REPO_ABS" rev-parse --short HEAD 2>/dev/null || echo n/a)"
GIT_BRANCH="$(git -C "$REPO_ABS" rev-parse --abbrev-ref HEAD 2>/dev/null || echo n/a)"
fi
# Auto-detect validation gate if requested
if [[ "$AUTO_DETECT_VALIDATION" == "true" ]] && [[ -z "$VALIDATION_GATE" ]]; then
if [[ -x "$SCRIPT_DIR/detect-validation-gate.sh" ]]; then
VALIDATION_GATE="$("$SCRIPT_DIR/detect-validation-gate.sh" --repo "$REPO_ABS" 2>/dev/null || echo 'npm test')"
fi
fi
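# Fall back to placeholder values when the corresponding option was not given.
OBJECTIVE_VALUE="${OBJECTIVE:-TODO: define objective}"
SCOPE_VALUE="${SCOPE:-$REPO_ABS}"
VALIDATION_VALUE="${VALIDATION_GATE:-TODO: define validation gate commands}"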
if [[ "$DRY_RUN" == "true" ]]; then
cat <<EOF_DRYRUN
Dry run (no files created):
- Timestamp (UTC): $TIMESTAMP
- Mode: $MODE
- Repo: $REPO_ABS
- Is Git Repo: $IS_GIT_REPO
- Objective: $OBJECTIVE_VALUE
- Scope: $SCOPE_VALUE
- Validation Gate: $VALIDATION_VALUE
- Run Dir: $RUN_DIR
- Logging Enabled: $ENABLE_LOGGING
EOF_DRYRUN
exit 0
fi
mkdir -p "$RUN_DIR"
if [[ "$FORCE" == "true" ]]; then
rm -f \
"$RUN_DIR/run-info.md" \
"$RUN_DIR/baseline.md" \
"$RUN_DIR/hypotheses.md" \
"$RUN_DIR/proposal.md" \
"$RUN_DIR/validation.md" \
"$RUN_DIR/outcome.md" \
"$RUN_DIR/run.log"
if find "$RUN_DIR" -mindepth 1 -maxdepth 1 -print -quit | grep -q .; then
echo "Refusing --force overwrite; directory has unexpected files: $RUN_DIR" >&2
exit 1
fi
fi
# Initialize logging if enabled
LOG_FILE=""
if [[ "$ENABLE_LOGGING" == "true" ]]; then
LOG_FILE="$RUN_DIR/run.log"
cat > "$LOG_FILE" <<EOF_LOG
================================================================================
OpenClaw Self-Improve Run Log
Started: $(date -u '+%Y-%m-%d %H:%M:%S UTC')
Run Directory: $RUN_DIR
Mode: $MODE
Objective: $OBJECTIVE_VALUE
================================================================================
EOF_LOG
fi
cat > "$RUN_DIR/run-info.md" <<EOF_INFO
# Run Info
- Timestamp (UTC): $(echo "$TIMESTAMP" | sed 's/[^a-zA-Z0-9 -]//g')
- Mode: $(echo "$MODE" | sed 's/[^a-zA-Z0-9 -]//g')
- Repo: $(echo "$REPO_ABS" | sed 's/[^a-zA-Z0-9 /._-]//g')
- Objective: $(echo "$OBJECTIVE_VALUE" | sed 's/[^a-zA-Z0-9 -]//g')
- Scope: $(echo "$SCOPE_VALUE" | sed 's/[^a-zA-Z0-9 /._-]//g')
- Validation Gate: $(echo "$VALIDATION_VALUE" | sed 's/[^a-zA-Z0-9 -]//g')
- Git Commit: $(echo "$GIT_COMMIT" | sed 's/[^a-zA-Z0-9 -]//g')
- Git Branch: $(echo "$GIT_BRANCH" | sed 's/[^a-zA-Z0-9 -]//g')
- Is Git Repository: $IS_GIT_REPO
- Logging Enabled: $ENABLE_LOGGING
EOF_INFO
cat > "$RUN_DIR/baseline.md" <<EOF_BASELINE
# Baseline
## Objective
$OBJECTIVE_VALUE
## Scope
$SCOPE_VALUE
## Repo State
- Commit: $GIT_COMMIT
- Branch: $GIT_BRANCH
- Is Git Repository: $IS_GIT_REPO
## Reproduction
_(Describe steps to reproduce the starting condition)_
## Metrics
_(Record measurable baseline numbers here)_
## Risks
_(List known risks and assumptions)_
## Status
- pass|fail|blocked|inconclusive
EOF_BASELINE
cat > "$RUN_DIR/hypotheses.md" <<'EOF_HYP'
# Hypotheses
## Hypothesis 1
## Hypothesis 2
## Hypothesis 3
## Ranking
EOF_HYP
cat > "$RUN_DIR/proposal.md" <<EOF_PROP
# Proposal
## Selected Hypothesis
## Planned Changes
## Files To Edit
## Validation Gate
$VALIDATION_VALUE
## Rollback Plan
## Approval Status
- pending
EOF_PROP
cat > "$RUN_DIR/validation.md" <<EOF_VAL
# Validation
## Commands Run
$VALIDATION_VALUE
## Results
_(Record command output or measured values)_
## Baseline vs New
_(Show concrete before/after comparison)_
## Pass/Fail
_(Clearly state the result)_
## Status
- pass|fail|blocked|inconclusive
EOF_VAL
cat > "$RUN_DIR/outcome.md" <<'EOF_OUT'
# Outcome
## Summary
_(What changed and why)_
## Evidence
_(Link metrics, logs, or test results that prove the outcome)_
## Residual Risk
_(What might still go wrong)_
## Next Iteration
_(If further improvements are needed, what is the smallest next change?)_
## Status
- pass|fail|blocked|inconclusive
EOF_OUT
# Create backup if requested and not a git repo
if [[ "$CREATE_BACKUP" == "true" ]] && [[ "$IS_GIT_REPO" == "false" ]]; then
if [[ -x "$SCRIPT_DIR/backup-repo.sh" ]]; then
BACKUP_DIR="$RUN_DIR/backups"
mkdir -p "$BACKUP_DIR"
if BACKUP_FILE="$("$SCRIPT_DIR/backup-repo.sh" --repo "$REPO_ABS" --backup-dir "$BACKUP_DIR" 2>/dev/null || echo '')"; then
if [[ -n "$BACKUP_FILE" ]]; then
echo "Backup created: $BACKUP_FILE" >&2
if [[ -n "$LOG_FILE" ]]; then
echo "Backup created: $BACKUP_FILE" >> "$LOG_FILE"
fi
fi
fi
fi
fi
if [[ "$ROLLBACK" == "true" ]]; then
echo "Rolling back changes for run: $RUN_DIR"
if [[ -n "$LOG_FILE" ]]; then
echo "Rolling back changes" >> "$LOG_FILE"
fi
if [[ "$IS_GIT_REPO" == "true" ]]; then
git -C "$REPO_ABS" checkout .
echo "Git checkout completed."
if [[ -n "$LOG_FILE" ]]; then
echo "Git checkout completed." >> "$LOG_FILE"
fi
else
echo "Not a git repository, manual rollback required."
if [[ -n "$LOG_FILE" ]]; then
echo "Not a git repository, manual rollback required." >> "$LOG_FILE"
fi
fi
fi
# Finalize logging
if [[ -n "$LOG_FILE" ]]; then
cat >> "$LOG_FILE" <<EOF_LOG_END
================================================================================
Run Initialization Completed: $(date -u '+%Y-%m-%d %H:%M:%S UTC')
================================================================================
EOF_LOG_END
fi
echo "$RUN_DIR"
FILE:scripts/logging-utils.sh
#!/usr/bin/env bash
# Logging utility functions for OpenClaw Self-Improve
# Initialize logging for a run
init_logging() {
local run_dir="$1"
local log_file="$run_dir/run.log"
# Create log file with header
cat > "$log_file" <<EOF
================================================================================
OpenClaw Self-Improve Run Log
Started: $(date -u '+%Y-%m-%d %H:%M:%S UTC')
Run Directory: $run_dir
================================================================================
EOF
echo "$log_file"
}
# Log a message with timestamp
log_message() {
local log_file="$1"
local level="$2"
shift 2
local message="$*"
local timestamp=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
echo "[$timestamp] [$level] $message" >> "$log_file"
}
# Log info message
log_info() {
local log_file="$1"
shift
log_message "$log_file" "INFO" "$@"
}
# Log warning message
log_warn() {
local log_file="$1"
shift
log_message "$log_file" "WARN" "$@"
}
# Log error message
log_error() {
local log_file="$1"
shift
log_message "$log_file" "ERROR" "$@"
}
# Log command execution
log_command() {
local log_file="$1"
local command="$2"
log_info "$log_file" "Executing: $command"
eval "$command" >> "$log_file" 2>&1
local exit_code=$?
if [[ $exit_code -eq 0 ]]; then
log_info "$log_file" "Command succeeded (exit code: $exit_code)"
else
log_error "$log_file" "Command failed (exit code: $exit_code)"
fi
return $exit_code
}
# Log section separator
log_section() {
local log_file="$1"
local section_name="$2"
echo "" >> "$log_file"
echo "=================================================================================" >> "$log_file"
echo "Section: $section_name" >> "$log_file"
echo "=================================================================================" >> "$log_file"
echo "" >> "$log_file"
}
# Append file content to log
log_file_content() {
local log_file="$1"
local file_to_log="$2"
local label="${3:-File Content}"
if [[ -f "$file_to_log" ]]; then
log_section "$log_file" "$label"
cat "$file_to_log" >> "$log_file"
else
log_warn "$log_file" "File not found for logging: $file_to_log"
fi
}
# Create a summary of the run
create_run_summary() {
local run_dir="$1"
local log_file="$run_dir/run.log"
local end_time=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
cat >> "$log_file" <<EOF
================================================================================
Run Summary
Completed: $end_time
================================================================================
EOF
}
export -f init_logging log_message log_info log_warn log_error log_command log_section log_file_content create_run_summary
FILE:scripts/validate-improvement-run.sh
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'USAGE'
Usage:
validate-improvement-run.sh --run-dir <path> [--require-json]
Checks:
- required files exist
- required headings exist
- run-info fields exist and are non-empty
- status values are valid
- proposal approval status is valid
- optional JSON artifact presence and structure
USAGE
}
fail() {
echo "$1" >&2
exit 1
}
require_line() {
local file="$1"
local line="$2"
grep -Fqx -- "$line" "$file" || fail "Missing required line '$line' in $file"
}
require_prefix() {
local file="$1"
local prefix="$2"
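# The field must start a line and carry a non-blank value after the prefix,
# matching the "exist and are non-empty" check described in usage.
awk -v prefix="$prefix" '
index($0, prefix) == 1 && substr($0, length(prefix) + 1) ~ /[^[:space:]]/ { found=1; exit }
END { exit !found }
' "$file" || fail "Missing or empty required field '$prefix' in $(basename "$file")"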
}
extract_section_bullet() {
local file="$1"
local heading="$2"
awk -v heading="$heading" '
$0 == heading { in_section=1; next }
/^## / && in_section { exit }
in_section && $0 ~ /^- / {
sub(/^- /, "", $0)
print $0
exit
}
' "$file"
}
require_value_in_set() {
local label="$1"
local value="$2"
shift 2
local allowed
for allowed in "$@"; do
if [[ "$value" == "$allowed" ]]; then
return 0
fi
done
fail "Invalid $label value: '$value'"
}
RUN_DIR=""
REQUIRE_JSON="false"
while [[ $# -gt 0 ]]; do
case "$1" in
--run-dir)
RUN_DIR="${2:-}"
shift 2
;;
--require-json)
REQUIRE_JSON="true"
shift
;;
-h|--help)
usage
exit 0
;;
*)
fail "Unknown argument: $1"
;;
esac
done
if [[ -z "$RUN_DIR" ]]; then
usage >&2
fail "Missing required --run-dir <path>"
fi
if [[ ! -d "$RUN_DIR" ]]; then
fail "Run directory does not exist: $RUN_DIR"
fi
REQUIRED_FILES=(
run-info.md
baseline.md
hypotheses.md
proposal.md
validation.md
outcome.md
)
for file in "${REQUIRED_FILES[@]}"; do
[[ -f "$RUN_DIR/$file" ]] || fail "Missing required file: $RUN_DIR/$file"
done
RUN_INFO="$RUN_DIR/run-info.md"
BASELINE="$RUN_DIR/baseline.md"
HYPOTHESES="$RUN_DIR/hypotheses.md"
PROPOSAL="$RUN_DIR/proposal.md"
VALIDATION="$RUN_DIR/validation.md"
OUTCOME="$RUN_DIR/outcome.md"
require_line "$RUN_INFO" "# Run Info"
require_prefix "$RUN_INFO" "- Timestamp (UTC):"
require_prefix "$RUN_INFO" "- Mode:"
require_prefix "$RUN_INFO" "- Repo:"
require_prefix "$RUN_INFO" "- Objective:"
require_prefix "$RUN_INFO" "- Scope:"
require_prefix "$RUN_INFO" "- Validation Gate:"
require_line "$BASELINE" "# Baseline"
require_line "$BASELINE" "## Objective"
require_line "$BASELINE" "## Scope"
require_line "$BASELINE" "## Repo State"
require_line "$BASELINE" "## Reproduction"
require_line "$BASELINE" "## Metrics"
require_line "$BASELINE" "## Risks"
require_line "$BASELINE" "## Status"
require_line "$HYPOTHESES" "# Hypotheses"
require_line "$HYPOTHESES" "## Hypothesis 1"
require_line "$HYPOTHESES" "## Ranking"
require_line "$PROPOSAL" "# Proposal"
require_line "$PROPOSAL" "## Selected Hypothesis"
require_line "$PROPOSAL" "## Planned Changes"
require_line "$PROPOSAL" "## Files To Edit"
require_line "$PROPOSAL" "## Validation Gate"
require_line "$PROPOSAL" "## Rollback Plan"
require_line "$PROPOSAL" "## Approval Status"
require_line "$VALIDATION" "# Validation"
require_line "$VALIDATION" "## Commands Run"
require_line "$VALIDATION" "## Results"
require_line "$VALIDATION" "## Baseline vs New"
require_line "$VALIDATION" "## Pass/Fail"
require_line "$VALIDATION" "## Status"
require_line "$OUTCOME" "# Outcome"
require_line "$OUTCOME" "## Summary"
require_line "$OUTCOME" "## Evidence"
require_line "$OUTCOME" "## Residual Risk"
require_line "$OUTCOME" "## Next Iteration"
require_line "$OUTCOME" "## Status"
BASELINE_STATUS="$(extract_section_bullet "$BASELINE" "## Status")"
VALIDATION_STATUS="$(extract_section_bullet "$VALIDATION" "## Status")"
OUTCOME_STATUS="$(extract_section_bullet "$OUTCOME" "## Status")"
APPROVAL_STATUS="$(extract_section_bullet "$PROPOSAL" "## Approval Status")"
[[ -n "$BASELINE_STATUS" ]] || fail "Missing baseline status in $BASELINE"
[[ -n "$VALIDATION_STATUS" ]] || fail "Missing validation status in $VALIDATION"
[[ -n "$OUTCOME_STATUS" ]] || fail "Missing outcome status in $OUTCOME"
[[ -n "$APPROVAL_STATUS" ]] || fail "Missing approval status in $PROPOSAL"
require_value_in_set "baseline status" "$BASELINE_STATUS" pass fail blocked inconclusive
require_value_in_set "validation status" "$VALIDATION_STATUS" pass fail blocked inconclusive
require_value_in_set "outcome status" "$OUTCOME_STATUS" pass fail blocked inconclusive
require_value_in_set \
"approval status" \
"$APPROVAL_STATUS" \
pending \
approved \
"approved and implemented" \
rejected \
blocked
if [[ "$REQUIRE_JSON" == "true" ]]; then
[[ -f "$RUN_DIR/run-info.json" ]] || fail "Missing required JSON file: $RUN_DIR/run-info.json"
[[ -f "$RUN_DIR/summary.json" ]] || fail "Missing required JSON file: $RUN_DIR/summary.json"
python3 - "$RUN_DIR" <<'PY' || fail "JSON validation failed for $RUN_DIR"
import json
import sys
from pathlib import Path
run_dir = Path(sys.argv[1])
run_info = json.loads((run_dir / "run-info.json").read_text(encoding="utf-8"))
summary = json.loads((run_dir / "summary.json").read_text(encoding="utf-8"))
required_run_info_keys = {
    "artifacts",
    "generated_at_utc",
    "git_branch",
    "git_commit",
    "mode",
    "objective",
    "repo",
    "scope",
    "timestamp_utc",
    "validation_gate",
}
required_summary_keys = {
    "approval_status",
    "baseline_status",
    "generated_at_utc",
    "mode",
    "next_iteration",
    "objective",
    "outcome_status",
    "run_dir",
    "scope",
    "selected_hypothesis",
    "timestamp_utc",
    "validation_status",
}
# Report any required keys that are absent from either artifact.
missing_run_info = sorted(required_run_info_keys - run_info.keys())
missing_summary = sorted(required_summary_keys - summary.keys())
if missing_run_info:
    raise SystemExit(f"run-info.json missing keys: {', '.join(missing_run_info)}")
if missing_summary:
    raise SystemExit(f"summary.json missing keys: {', '.join(missing_summary)}")
# Cross-check the two artifacts against each other and the requested directory.
if summary["run_dir"] != str(run_dir):
    raise SystemExit("summary.json run_dir does not match requested run directory")
if run_info["timestamp_utc"] != summary["timestamp_utc"]:
    raise SystemExit("JSON timestamp mismatch between run-info.json and summary.json")
if run_info["mode"] != summary["mode"]:
    raise SystemExit("JSON mode mismatch between run-info.json and summary.json")
PY
fi
echo "Validation successful for $RUN_DIR"