@clawhub-xaviermary56-a2963f92c2
---
name: php-sql-fixer
version: 1.0.0
description: "Detect SQL injection risks in PHP/Yaf projects and generate parameterized query fix patches. Scans for string concatenation in SQL, unsafe superglobal interpolation, and sprintf-based injection. Outputs annotated findings with before/after fix suggestions. Works with PHP 7.3 and common Yaf DB patterns."
emoji: 💉
user-invocable: true
homepage: https://github.com/XavierMary56/OmniPublish
requires:
  - yaf-php-audit
metadata:
  openclaw:
    requires:
      bins:
        - bash
        - grep
        - php
---
# PHP SQL Fixer
Detect SQL injection risks and generate parameterized query fix patches for PHP/Yaf projects.
## Overview
This skill does two things:
1. **Scan** — find all SQL injection candidates in a PHP project (string concatenation, superglobal interpolation, unsafe sprintf)
2. **Fix** — for each finding, generate the parameterized equivalent and explain the change
Always prefer minimal, targeted fixes. Do not refactor surrounding code. Do not change DB abstraction patterns that already exist in the project.
---
## Workflow
### Step 1 — Run the scanner
```bash
bash "$SKILL_DIR/scripts/scan_sql.sh" <project-root> [output-file]
```
Read the output carefully. The scanner flags candidates, not confirmed vulnerabilities. Some hits may be false positives (e.g. SQL built from constants, not user input).
### Step 2 — Triage findings
For each flagged file:
- Open the file and read the full context around the hit (at least ±10 lines)
- Confirm whether user-controlled input (`$_GET`, `$_POST`, `$_REQUEST`, function params from controllers) reaches the SQL string
- Mark each finding as: **confirmed** / **suspected** / **false positive**
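Pulling the context around a hit can be scripted. The sketch below assumes hits use grep's `path:line:code` format (the scanner calls `grep -RInE`); `show_context` is a hypothetical helper, not one of the skill's scripts.

```bash
# show_context FILE LINE [RADIUS]
# Print LINE plus RADIUS lines on each side (default 10), with line numbers.
# The hit line itself is marked with ">".
show_context() {
  local file="$1" line="$2" radius="${3:-10}"
  local start=$(( line - radius ))
  local end=$(( line + radius ))
  if [ "$start" -lt 1 ]; then start=1; fi
  awk -v s="$start" -v e="$end" -v hit="$line" '
    NR >= s && NR <= e { printf "%s%5d  %s\n", (NR == hit ? ">" : " "), NR, $0 }
    NR > e { exit }
  ' "$file"
}
```

For a hit reported as `application/models/User.php:42:...`, `show_context application/models/User.php 42` prints lines 32 to 52.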
### Step 3 — Generate fix suggestions
```bash
php "$SKILL_DIR/scripts/suggest_fix.php" <file-path>
```
The script outputs annotated before/after for each risky SQL statement in the file.
### Step 4 — Apply fixes
Apply fixes manually or with targeted `Edit` tool calls. Rules:
- Use parameterized queries matching the project's existing DB pattern (PDO, custom model, etc.)
- Do not change method signatures or surrounding business logic
- Add a `// FIXED: sql injection` comment on the line where the fix was applied
- Run `php -l <file>` after every edit to verify syntax
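When a batch touches several files, the syntax check can be looped. This is a minimal sketch that relies only on `php -l`; the `lint_files` name is an assumption.

```bash
# lint_files FILE...
# Run `php -l` on every file, report failures on stderr,
# and return non-zero if any file fails the syntax check.
lint_files() {
  local failed=0 f
  for f in "$@"; do
    if ! php -l "$f" >/dev/null 2>&1; then
      echo "syntax error: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as `lint_files application/models/User.php application/controllers/Pay.php` fits naturally after each edit, before moving on to Step 5.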
### Step 5 — Verify
```bash
# syntax check
docker compose -f /mnt/d/Users/Public/php20250819/docker-php7.3/docker-compose.yml \
exec fpm-server php -l /var/www/html/2026www/<project>/<file>
# re-scan to confirm no remaining hits
bash "$SKILL_DIR/scripts/scan_sql.sh" <project-root>
```
---
## Fix Patterns
See `references/fix-patterns.md` for the complete catalog. Quick reference:
### Pattern 1 — String concatenation
```php
// BEFORE (unsafe)
$sql = "SELECT * FROM users WHERE id = " . $id;
$res = $db->query($sql);
// AFTER (PDO)
$stmt = $db->prepare("SELECT * FROM users WHERE id = ?");
$stmt->execute([$id]);
$res = $stmt->fetchAll();
```
### Pattern 2 — Variable interpolation
```php
// BEFORE (unsafe)
$sql = "SELECT * FROM orders WHERE status = '$status' AND uid = $uid";
// AFTER (PDO named placeholders)
$stmt = $db->prepare("SELECT * FROM orders WHERE status = :status AND uid = :uid");
$stmt->execute([':status' => $status, ':uid' => $uid]);
```
### Pattern 3 — sprintf injection
```php
// BEFORE (unsafe)
$sql = sprintf("SELECT * FROM t WHERE name = '%s'", $name);
// AFTER
$stmt = $db->prepare("SELECT * FROM t WHERE name = ?");
$stmt->execute([$name]);
```
### Pattern 4 — Yaf Model with custom query builder
```php
// BEFORE (unsafe — raw string passed to model)
$this->_model->where("user_id = $uid AND type = '$type'")->find();
// AFTER (use array condition — depends on your Model API)
$this->_model->where(['user_id' => $uid, 'type' => $type])->find();
// OR if model supports raw+bindings:
$this->_model->where("user_id = ? AND type = ?", [$uid, $type])->find();
```
### Pattern 5 — IN clause with array
```php
// BEFORE (unsafe)
$ids = implode(',', $id_arr);
$sql = "SELECT * FROM t WHERE id IN ($ids)";
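// AFTER (PHP 7.3 compatible; skip the query when $id_arr is empty, since "IN ()" is invalid SQL)
$placeholders = implode(',', array_fill(0, count($id_arr), '?'));
$stmt = $db->prepare("SELECT * FROM t WHERE id IN ($placeholders)");
$stmt->execute(array_values($id_arr)); // positional binding needs keys 0..n-1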
```
---
## What NOT to Change
- Do not switch DB abstraction libraries (e.g. from custom Model to bare PDO) unless the whole project already uses PDO
- Do not parameterize column names or table names — these cannot be parameterized; use an allowlist instead
- Do not touch SQL built entirely from constants with no user input
- Do not change surrounding cache logic, error handling, or return values
---
## False Positive Checklist
Before reporting a finding as confirmed SQL injection:
- [ ] Does user-controlled input actually reach this SQL string?
- [ ] Is the value an integer that was already `intval()`-cast earlier?
- [ ] Is the value selected from a fixed allowlist (e.g. column name from a whitelist array)?
- [ ] Is the SQL built from config constants only (no request data)?
If the first answer is "yes" and the other three are "no" → confirmed risk. If the first is "no", or any of the other three is "yes" → suspected or false positive.
---
## Bulk Fix Guidance
When fixing many files across a project:
1. Run `scan_sql.sh` on the whole project, save output to file
2. Sort findings so that controller, callback, and payment paths come first
3. Fix highest-risk files first (payment, callback, login)
4. Re-scan after each batch to track progress
5. Never mix SQL fix commits with unrelated changes
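The ordering in steps 2 and 3 can be sketched as a filter over scanner hits. It assumes hit lines use grep's `path:line:code` format and that high-risk code lives under paths containing `payment`, `callback`, or `login`; the keyword list and the `prioritize_hits` name are illustrative only.

```bash
# prioritize_hits < scan-output
# Emit hits whose path mentions payment/callback/login first, then the rest,
# keeping the original order inside each group. Lines that are not
# "path:line:code" hits (headers, metadata, blanks) are dropped.
prioritize_hits() {
  awk -F: '
    NF < 3 || $2 !~ /^[0-9]+$/ { next }
    tolower($1) ~ /payment|callback|login/ { print; next }
    { rest[++n] = $0 }
    END { for (i = 1; i <= n; i++) print rest[i] }
  '
}
```

Typical use is `prioritize_hits < scan-report.md > todo.txt`, with `scan-report.md` being whatever file the scan output was saved to.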
FILE:references/fix-patterns.md
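# SQL Injection Fix Pattern Handbook
Common SQL injection patterns in PHP 7.3 / Yaf projects and their parameterized fixes.
---
## Rules
1. **Column and table names cannot be parameterized.** Only values can be bound; column and table names must come from an allowlist.
2. **Cast integer parameters with `(int)` first.** Even with parameterized queries, cast integer fields with `(int)` before binding to prevent type confusion.
3. **Leave the DB layer alone.** Fixes add parameterization at the call site only; do not introduce a new DB library.
4. **PHP 7.3 syntax only.** Do not use `match`, named arguments, the nullsafe operator, or other newer features.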
---
## Pattern 1 — String concatenation
The most common and the most dangerous pattern.
```php
// ❌ Unsafe
$sql = "SELECT * FROM users WHERE id = " . $id;
$sql = "SELECT * FROM users WHERE name = '" . $name . "'";
$sql = "UPDATE orders SET status = " . $status . " WHERE id = " . $id;
// ✅ Fixed (PDO)
$stmt = $pdo->prepare("SELECT * FROM users WHERE id = ?");
$stmt->execute([$id]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
$stmt = $pdo->prepare("SELECT * FROM users WHERE name = ?");
$stmt->execute([$name]);
$stmt = $pdo->prepare("UPDATE orders SET status = ? WHERE id = ?");
$stmt->execute([$status, $id]);
```
---
## Pattern 2 — Variable interpolation in double-quoted strings
```php
// ❌ Unsafe
$sql = "SELECT * FROM t WHERE uid = $uid AND type = '$type'";
$sql = "DELETE FROM logs WHERE user_id = $user_id";
// ✅ Fixed
$stmt = $pdo->prepare("SELECT * FROM t WHERE uid = ? AND type = ?");
$stmt->execute([$uid, $type]);
$stmt = $pdo->prepare("DELETE FROM logs WHERE user_id = ?");
$stmt->execute([$user_id]);
```
---
## Pattern 3 — sprintf
```php
// ❌ Unsafe (%s does not escape anything)
$sql = sprintf("SELECT * FROM t WHERE name = '%s' AND role = %d", $name, $role);
// ✅ Fixed
$stmt = $pdo->prepare("SELECT * FROM t WHERE name = ? AND role = ?");
$stmt->execute([$name, (int)$role]);
```
---
## Pattern 4 — Superglobals placed directly into SQL
```php
// ❌ Unsafe (very high risk)
$sql = "SELECT * FROM t WHERE id = " . $_GET['id'];
$sql = "SELECT * FROM t WHERE name = '" . $_POST['name'] . "'";
// ✅ Fixed
$id = (int) ($_GET['id'] ?? 0);
$stmt = $pdo->prepare("SELECT * FROM t WHERE id = ?");
$stmt->execute([$id]);
$name = trim($_POST['name'] ?? '');
$stmt = $pdo->prepare("SELECT * FROM t WHERE name = ?");
$stmt->execute([$name]);
```
---
## Pattern 5 — IN clause
```php
// ❌ Unsafe
$ids = implode(',', $id_arr);
$sql = "SELECT * FROM t WHERE id IN ($ids)";
// ✅ Fixed (PHP 7.3 compatible; skip the query when $id_arr is empty, since "IN ()" is invalid SQL)
$id_arr = array_map('intval', $id_arr); // cast first if the IDs are integers
$placeholders = implode(',', array_fill(0, count($id_arr), '?'));
$stmt = $pdo->prepare("SELECT * FROM t WHERE id IN ($placeholders)");
$stmt->execute(array_values($id_arr)); // positional binding needs keys 0..n-1
```
---
## Pattern 6 — LIKE clause
```php
// ❌ Unsafe
$sql = "SELECT * FROM t WHERE title LIKE '%" . $kw . "%'";
$sql = "SELECT * FROM t WHERE title LIKE '%$kw%'";
// ✅ Fixed (wildcards go inside the bound value)
$stmt = $pdo->prepare("SELECT * FROM t WHERE title LIKE ?");
$stmt->execute(["%$kw%"]);
```
---
## Pattern 7 — ORDER BY (column names cannot be parameterized)
```php
// ❌ Unsafe
$sql = "SELECT * FROM t ORDER BY " . $_GET['sort'];
// ✅ Fixed (allowlist)
$allowed_cols = ['id', 'created_at', 'name', 'status'];
$sort = in_array($_GET['sort'] ?? '', $allowed_cols, true)
    ? $_GET['sort']
    : 'id'; // default
$stmt = $pdo->prepare("SELECT * FROM t ORDER BY $sort DESC");
$stmt->execute([]);
```
---
## Pattern 8 — Custom where() on a Yaf Model
Chained Model calls are common in these projects; the right fix depends on whether the Model supports array conditions.
```php
// ❌ Unsafe
$this->_model->where("uid = $uid AND status = '$status'")->find();
// ✅ Fix A (if the Model supports array conditions)
$this->_model->where(['uid' => $uid, 'status' => $status])->find();
// ✅ Fix B (if the Model supports raw SQL with bindings)
$this->_model->where("uid = ? AND status = ?", [$uid, $status])->find();
// ✅ Fix C (fall back to PDO)
$stmt = $this->_db->prepare("SELECT * FROM {$this->_table} WHERE uid = ? AND status = ?");
$stmt->execute([$uid, $status]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
```
---
## Pattern 9 — Pagination with LIMIT/OFFSET
```php
// ❌ Unsafe
$sql = "SELECT * FROM t LIMIT $limit OFFSET $offset";
// ✅ Fixed (force integers; LIMIT needs no placeholder, but the values must be integers)
$limit = max(1, min(100, (int) $limit)); // clamp to a range
$offset = max(0, (int) $offset);
// Interpolating LIMIT/OFFSET as integer literals is safe because both were cast with (int)
$stmt = $pdo->prepare("SELECT * FROM t LIMIT $limit OFFSET $offset");
$stmt->execute([]);
```
---
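## Common Pitfalls
### Pitfall 1 — addslashes is not enough
```php
// ❌ Unsafe (can be bypassed under some character sets)
$name = addslashes($_POST['name']);
$sql = "SELECT * FROM t WHERE name = '$name'";
// ✅ Use a parameterized query
```
### Pitfall 2 — mysql_real_escape_string is deprecated and removed
```php
// ❌ The mysql_* functions were removed in PHP 7.0
$name = mysql_real_escape_string($name);
```
### Pitfall 3 — intval does not work for string fields
```php
// ❌ Casting a string field to int breaks the logic
$name = intval($_POST['name']); // becomes 0
// ✅ String fields must be parameterized; intval is not an option
```
### Pitfall 4 — Using ? placeholders for column names
```php
// ❌ PDO does not support placeholders for column names
$stmt = $pdo->prepare("SELECT ? FROM t");
$stmt->execute([$col]); // actually runs SELECT 'col_name' FROM t (treated as a string)
// ✅ Use an allowlist for column names
```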
---
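## Verifying a Fix
```bash
# 1. PHP syntax check
php -l <file>
# 2. Syntax check inside the container (Yaf projects)
docker compose -f /mnt/d/Users/Public/php20250819/docker-php7.3/docker-compose.yml \
exec fpm-server php -l /var/www/html/2026www/<project>/<file>
# 3. Re-run the scanner and confirm zero hits
bash scan_sql.sh <project-root>
# 4. API smoke test (verify business logic is intact)
curl -X GET "http://localhost/api/xxx?id=1"
# Injection probe, URL-encoded by curl: expect no extra rows (an empty result or an error is also fine)
curl -G "http://localhost/api/xxx" --data-urlencode "id=1' OR '1'='1"
```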
FILE:scripts/scan_sql.sh
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 1 ]; then
echo "Usage: $0 <project-root> [output-file]" >&2
exit 1
fi
project_root="$1"
output_file="${2:-}"   # optional second argument; empty means print the report to stdout
if [ ! -d "$project_root" ]; then
echo "[ERROR] project root not found: $project_root" >&2
exit 1
fi
project_name="$(basename "$project_root")"
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
# ── Scan helper ───────────────────────────────────────────
scan_pattern() {
local label="$1"
local pattern="$2"
local hits
hits="$(grep -RInE --include='*.php' --include='*.inc' "$pattern" \
"$project_root/application" "$project_root/public" "$project_root/conf" 2>/dev/null \
| sed -n '1,200p' || true)"
if [ -n "$hits" ]; then
echo "=== $label ==="
echo "$hits"
echo
fi
}
{
echo "=== meta ==="
echo "project: $project_name"
echo "path: $project_root"
echo "scanned_at: $(date '+%Y-%m-%d %H:%M:%S %z')"
echo
# 1. String concatenation into SQL (highest risk)
scan_pattern "concat into sql" \
'"(SELECT|INSERT|UPDATE|DELETE|REPLACE)[^"]*"\s*\.\s*\$|"[^"]*WHERE[^"]*"\s*\.\s*\$|"[^"]*SET[^"]*"\s*\.\s*\$'
# 2. Variables interpolated directly into SQL strings
scan_pattern "variable interpolation in sql" \
'"(SELECT|INSERT|UPDATE|DELETE|REPLACE)[^"$]*\$[a-zA-Z_][a-zA-Z0-9_]*'
# 3. Unsafe sprintf formatting
scan_pattern "sprintf sql injection" \
"sprintf\s*\(\s*['\"].*%(s|d|f).*['\"].*\\\$"
# 4. Superglobals concatenated directly into SQL
scan_pattern "superglobal directly in sql" \
'(SELECT|UPDATE|DELETE|INSERT)[^;]*\$_(GET|POST|REQUEST|COOKIE)\['
# 5. query/execute called with a concatenated string variable
scan_pattern "query or execute with variable" \
'->(query|execute)\s*\(\s*\$[a-zA-Z_][a-zA-Z0-9_]*\s*[\),]'
# 6. Variables inserted directly into a WHERE clause (quoted or bare)
scan_pattern "where clause with raw variable" \
"WHERE[^'\"]*['\"][^'\"]*\\\$[a-zA-Z_]|WHERE\s+[a-zA-Z0-9_.]+\s*=\s*\\\$[a-zA-Z_]"
# 7. IN clause built with implode (not parameterized)
scan_pattern "in clause with implode" \
'IN\s*\(\s*\$[a-zA-Z_]|implode\s*\([^)]*\)\s*.*IN'
# 8. LIKE clause concatenation
scan_pattern "like clause concatenation" \
"LIKE\s*['\"][^'\"]*['\"\s]*\.\s*\\\$|LIKE\s*['\"%][^'\"]*\\\$"
# 9. ORDER BY / LIMIT taking external variables (column-name injection)
scan_pattern "order by or limit with variable" \
'(ORDER\s+BY|LIMIT|OFFSET)\s+\$[a-zA-Z_]|(ORDER\s+BY|LIMIT|OFFSET)[^;]*\.\s*\$'
} > "$tmp"
# ── Count hits ────────────────────────────────────────────
count_section() {
local section="$1"
awk -v s="$section" '
$0=="=== "s" ===" { found=1; next }
/^=== / && found { exit }
found && NF { c++ }
END { print c+0 }
' "$tmp"
}
c_concat="$(count_section "concat into sql")"
c_interp="$(count_section "variable interpolation in sql")"
c_sprintf="$(count_section "sprintf sql injection")"
c_super="$(count_section "superglobal directly in sql")"
c_exec="$(count_section "query or execute with variable")"
c_where="$(count_section "where clause with raw variable")"
c_in="$(count_section "in clause with implode")"
c_like="$(count_section "like clause concatenation")"
c_order="$(count_section "order by or limit with variable")"
total=$(( c_concat + c_interp + c_sprintf + c_super + c_exec + c_where + c_in + c_like + c_order ))
risk="low"
if [ "$c_concat" -ge 5 ] || [ "$c_super" -ge 1 ] || [ "$c_sprintf" -ge 3 ]; then
risk="high"
elif [ "$total" -ge 3 ] || [ "$c_concat" -ge 1 ] || [ "$c_interp" -ge 3 ]; then
risk="medium"
fi
# ── Generate report ───────────────────────────────────────
generate_report() {
echo "# SQL Injection Risk Scan Report"
echo
echo "## Overview"
echo "- Project: $project_name"
echo "- Path: $project_root"
echo "- Scanned at: $(date '+%Y-%m-%d %H:%M:%S %z')"
echo "- Risk level: $risk"
echo "- Total hit lines: $total"
echo
echo "## Hit Summary"
echo "| Category | Hits | Risk |"
echo "|------|--------|------|"
echo "| SQL string concatenation | $c_concat | 🔴 High |"
echo "| Variable interpolation in SQL | $c_interp | 🔴 High |"
echo "| sprintf injection | $c_sprintf | 🟠 Medium |"
echo "| Superglobal directly in SQL | $c_super | 🔴 High |"
echo "| query/execute with variable | $c_exec | 🟡 Needs review |"
echo "| WHERE with raw variable | $c_where | 🟠 Medium |"
echo "| IN + implode | $c_in | 🟠 Medium |"
echo "| LIKE concatenation | $c_like | 🟠 Medium |"
echo "| ORDER BY/LIMIT with variable | $c_order | 🟡 Needs review |"
echo
echo "## Suggested Fix Priority"
if [ "$c_super" -ge 1 ]; then
echo "- ⚠️ **Fix immediately**: superglobals concatenated directly into SQL, very high risk"
fi
if [ "$c_concat" -ge 1 ]; then
echo "- 🔴 **High priority**: SQL built by string concatenation; confirm each input source, then parameterize"
fi
if [ "$c_sprintf" -ge 1 ]; then
echo "- 🟠 **Medium priority**: SQL formatted with sprintf; %s is no substitute for parameter binding"
fi
echo
echo "## Detailed Hits"
echo
print_section() {
local section="$1"
awk -v s="$section" '
$0=="=== "s" ===" { found=1; next }
/^=== / && found { exit }
found { print }
' "$tmp"
}
for section in \
"concat into sql" \
"variable interpolation in sql" \
"sprintf sql injection" \
"superglobal directly in sql" \
"query or execute with variable" \
"where clause with raw variable" \
"in clause with implode" \
"like clause concatenation" \
"order by or limit with variable"
do
hits="$(print_section "$section")"
if [ -n "$hits" ]; then
echo "### $section"
echo '```'
echo "$hits"
echo '```'
echo
fi
done
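echo "## Next Steps"
echo "1. For each hit, confirm whether user input reaches the SQL"
echo "2. Run \`php \"\$SKILL_DIR/scripts/suggest_fix.php\" <file>\` to get fix suggestions"
echo "3. See \`references/fix-patterns.md\` and pick the fix pattern that matches the project's DB layer"
echo "4. Re-run this script after fixing to confirm zero hits"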
}
if [ -n "$output_file" ]; then
mkdir -p "$(dirname "$output_file")"
generate_report > "$output_file"
printf '[OK] Report written to: %s\n' "$output_file"
printf '[OK] Risk level: %s | Total hits: %s\n' "$risk" "$total"
else
generate_report
fi
---
name: self-improvement
version: 1.0.0
description: "Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Claude ('No, that's wrong...', 'Actually...'), (3) User requests a capability that doesn't exist, (4) An external API or tool fails, (5) Claude realizes its knowledge is outdated or incorrect, (6) A better approach is discovered for a recurring task. Also review learnings before major tasks."
emoji: 🧠
homepage: https://github.com/XavierMary56/OmniPublish
metadata:
openclaw: {}
---
# Self-Improvement Skill
Log learnings and errors to markdown files for continuous improvement. Coding agents can later process these into fixes, and important learnings get promoted to project memory.
## Quick Reference
| Situation | Action |
|-----------|--------|
| Command/operation fails | Log to `.learnings/ERRORS.md` |
| User corrects you | Log to `.learnings/LEARNINGS.md` with category `correction` |
| User wants missing feature | Log to `.learnings/FEATURE_REQUESTS.md` |
| API/external tool fails | Log to `.learnings/ERRORS.md` with integration details |
| Knowledge was outdated | Log to `.learnings/LEARNINGS.md` with category `knowledge_gap` |
| Found better approach | Log to `.learnings/LEARNINGS.md` with category `best_practice` |
| Simplify/Harden recurring patterns | Log/update `.learnings/LEARNINGS.md` with `Source: simplify-and-harden` and a stable `Pattern-Key` |
| Similar to existing entry | Link with `**See Also**`, consider priority bump |
| Broadly applicable learning | Promote to `CLAUDE.md`, `AGENTS.md`, and/or `.github/copilot-instructions.md` |
| Workflow improvements | Promote to `AGENTS.md` (OpenClaw workspace) |
| Tool gotchas | Promote to `TOOLS.md` (OpenClaw workspace) |
| Behavioral patterns | Promote to `SOUL.md` (OpenClaw workspace) |
## OpenClaw Setup (Recommended)
OpenClaw is the primary platform for this skill. It uses workspace-based prompt injection with automatic skill loading.
### Installation
**Via ClawdHub (recommended):**
```bash
clawdhub install self-improving-agent
```
**Manual:**
```bash
git clone https://github.com/peterskoett/self-improving-agent.git ~/.openclaw/skills/self-improving-agent
```
Remade for OpenClaw from the original repo: https://github.com/pskoett/pskoett-ai-skills (skill source: https://github.com/pskoett/pskoett-ai-skills/tree/main/skills/self-improvement)
### Workspace Structure
OpenClaw injects these files into every session:
```
~/.openclaw/workspace/
├── AGENTS.md # Multi-agent workflows, delegation patterns
├── SOUL.md # Behavioral guidelines, personality, principles
├── TOOLS.md # Tool capabilities, integration gotchas
├── MEMORY.md # Long-term memory (main session only)
├── memory/ # Daily memory files
│ └── YYYY-MM-DD.md
└── .learnings/ # This skill's log files
    ├── LEARNINGS.md
    ├── ERRORS.md
    └── FEATURE_REQUESTS.md
```
### Create Learning Files
```bash
mkdir -p ~/.openclaw/workspace/.learnings
```
Then create the log files (or copy from `assets/`):
- `LEARNINGS.md` — corrections, knowledge gaps, best practices
- `ERRORS.md` — command failures, exceptions
- `FEATURE_REQUESTS.md` — user-requested capabilities
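A minimal sketch for creating the three log files when they do not exist yet. The one-line header written here is an assumption; if the templates in `assets/` are available, copying them is the better choice. `init_learnings` is a hypothetical helper.

```bash
# init_learnings [DIR]
# Create LEARNINGS.md, ERRORS.md and FEATURE_REQUESTS.md with a title line.
# Files that already exist are left untouched.
init_learnings() {
  local dir="${1:-$HOME/.openclaw/workspace/.learnings}"
  local name title
  mkdir -p "$dir"
  for name in LEARNINGS ERRORS FEATURE_REQUESTS; do
    if [ ! -e "$dir/$name.md" ]; then
      title="$(printf '%s' "$name" | tr '_' ' ')"
      printf '# %s\n\n' "$title" > "$dir/$name.md"
    fi
  done
}
```

Running `init_learnings` with no argument targets the OpenClaw workspace path shown above; pass `.learnings` to set up a project-local directory instead.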
### Promotion Targets
When learnings prove broadly applicable, promote them to workspace files:
| Learning Type | Promote To | Example |
|---------------|------------|---------|
| Behavioral patterns | `SOUL.md` | "Be concise, avoid disclaimers" |
| Workflow improvements | `AGENTS.md` | "Spawn sub-agents for long tasks" |
| Tool gotchas | `TOOLS.md` | "Git push needs auth configured first" |
### Inter-Session Communication
OpenClaw provides tools to share learnings across sessions:
- **sessions_list** — View active/recent sessions
- **sessions_history** — Read another session's transcript
- **sessions_send** — Send a learning to another session
- **sessions_spawn** — Spawn a sub-agent for background work
### Optional: Enable Hook
For automatic reminders at session start:
```bash
# Copy hook to OpenClaw hooks directory
cp -r hooks/openclaw ~/.openclaw/hooks/self-improvement
# Enable it
openclaw hooks enable self-improvement
```
See `references/openclaw-integration.md` for complete details.
---
## Generic Setup (Other Agents)
For Claude Code, Codex, Copilot, or other agents, create `.learnings/` in your project:
```bash
mkdir -p .learnings
```
Copy templates from `assets/` or create files with headers.
### Add reference to agent files
Add a reference to `AGENTS.md`, `CLAUDE.md`, or `.github/copilot-instructions.md` to remind yourself to log learnings. This is an alternative to hook-based reminders.
#### Self-Improvement Workflow
When errors or corrections occur:
1. Log to `.learnings/ERRORS.md`, `LEARNINGS.md`, or `FEATURE_REQUESTS.md`
2. Review and promote broadly applicable learnings to:
   - `CLAUDE.md` - project facts and conventions
   - `AGENTS.md` - workflows and automation
   - `.github/copilot-instructions.md` - Copilot context
## Logging Format
### Learning Entry
Append to `.learnings/LEARNINGS.md`:
```markdown
## [LRN-YYYYMMDD-XXX] category
**Logged**: ISO-8601 timestamp
**Priority**: low | medium | high | critical
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config
### Summary
One-line description of what was learned
### Details
Full context: what happened, what was wrong, what's correct
### Suggested Action
Specific fix or improvement to make
### Metadata
- Source: conversation | error | user_feedback
- Related Files: path/to/file.ext
- Tags: tag1, tag2
- See Also: LRN-20250110-001 (if related to existing entry)
- Pattern-Key: simplify.dead_code | harden.input_validation (optional, for recurring-pattern tracking)
- Recurrence-Count: 1 (optional)
- First-Seen: 2025-01-15 (optional)
- Last-Seen: 2025-01-15 (optional)
---
```
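Appending an entry in this format can be scripted. The sketch below fills only the core fields and hardcodes `Source: conversation`; the `log_learning` name and its argument order are assumptions, not part of the skill.

```bash
# log_learning FILE ID CATEGORY PRIORITY AREA SUMMARY
# Append a pending learning entry to FILE with a UTC ISO-8601 timestamp.
log_learning() {
  local file="$1" id="$2" category="$3" priority="$4" area="$5" summary="$6"
  {
    printf '## [%s] %s\n\n' "$id" "$category"
    printf '**Logged**: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    printf '**Priority**: %s\n' "$priority"
    printf '**Status**: pending\n'
    printf '**Area**: %s\n\n' "$area"
    printf '### Summary\n%s\n\n' "$summary"
    printf '### Metadata\n- Source: conversation\n\n---\n'
  } >> "$file"
}
```

Example: `log_learning .learnings/LEARNINGS.md LRN-20250115-001 correction high backend "Use pnpm, not npm"`. The Details and Suggested Action sections still need to be written by hand.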
### Error Entry
Append to `.learnings/ERRORS.md`:
````markdown
## [ERR-YYYYMMDD-XXX] skill_or_command_name
**Logged**: ISO-8601 timestamp
**Priority**: high
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config
### Summary
Brief description of what failed
### Error
```
Actual error message or output
```
### Context
- Command/operation attempted
- Input or parameters used
- Environment details if relevant
### Suggested Fix
If identifiable, what might resolve this
### Metadata
- Reproducible: yes | no | unknown
- Related Files: path/to/file.ext
- See Also: ERR-20250110-001 (if recurring)
---
````
### Feature Request Entry
Append to `.learnings/FEATURE_REQUESTS.md`:
```markdown
## [FEAT-YYYYMMDD-XXX] capability_name
**Logged**: ISO-8601 timestamp
**Priority**: medium
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config
### Requested Capability
What the user wanted to do
### User Context
Why they needed it, what problem they're solving
### Complexity Estimate
simple | medium | complex
### Suggested Implementation
How this could be built, what it might extend
### Metadata
- Frequency: first_time | recurring
- Related Features: existing_feature_name
---
```
## ID Generation
Format: `TYPE-YYYYMMDD-XXX`
- TYPE: `LRN` (learning), `ERR` (error), `FEAT` (feature)
- YYYYMMDD: Current date
- XXX: Sequential number or random 3 chars (e.g., `001`, `A7B`)
Examples: `LRN-20250115-001`, `ERR-20250115-A3F`, `FEAT-20250115-002`
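For the sequential variant, one possible approach is to count the IDs already logged for that type and day. This sketch assumes entry headings follow the `## [TYPE-YYYYMMDD-XXX]` form shown in the logging formats; `next_id` is a hypothetical helper.

```bash
# next_id TYPE FILE [DATE]
# Print the next sequential ID for TYPE on DATE (YYYYMMDD, default: today),
# counting the numeric IDs already present in FILE.
next_id() {
  local type="$1" file="$2" day="${3:-$(date '+%Y%m%d')}"
  local count=0
  if [ -f "$file" ]; then
    count="$(grep -c "^## \[$type-$day-[0-9][0-9][0-9]]" "$file" || true)"
  fi
  printf '%s-%s-%03d\n' "$type" "$day" "$(( count + 1 ))"
}
```

Because the helper counts entries instead of reading the highest number, deleting an entry from the same day can produce a duplicate ID; the random three-character variant avoids that.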
## Resolving Entries
When an issue is fixed, update the entry:
1. Change `**Status**: pending` → `**Status**: resolved`
2. Add resolution block after Metadata:
```markdown
### Resolution
- **Resolved**: 2025-01-16T09:00:00Z
- **Commit/PR**: abc123 or #42
- **Notes**: Brief description of what was done
```
Other status values:
- `in_progress` - Actively being worked on
- `wont_fix` - Decided not to address (add reason in Resolution notes)
- `promoted` - Elevated to CLAUDE.md, AGENTS.md, or .github/copilot-instructions.md
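The status change can be applied with a small awk rewrite. This is a sketch that assumes each entry starts with a `## [ID]` heading and has its `**Status**:` line at the start of a line; `set_status` is a hypothetical helper, and the Resolution block still has to be added separately.

```bash
# set_status FILE ID NEW_STATUS
# Rewrite the **Status** line of the entry whose heading contains [ID].
# Other entries are left unchanged.
set_status() {
  local file="$1" id="$2" status="$3" tmp
  tmp="$(mktemp)"
  awk -v id="[$id]" -v st="$status" '
    /^## \[/ { in_entry = (index($0, id) > 0) }
    in_entry && index($0, "**Status**:") == 1 { print "**Status**: " st; next }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Example: `set_status .learnings/ERRORS.md ERR-20250115-001 resolved`. The file is replaced through a temporary file, so its permissions may reset to the `mktemp` defaults.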
## Promoting to Project Memory
When a learning is broadly applicable (not a one-off fix), promote it to permanent project memory.
### When to Promote
- Learning applies across multiple files/features
- Knowledge any contributor (human or AI) should know
- Prevents recurring mistakes
- Documents project-specific conventions
### Promotion Targets
| Target | What Belongs There |
|--------|-------------------|
| `CLAUDE.md` | Project facts, conventions, gotchas for all Claude interactions |
| `AGENTS.md` | Agent-specific workflows, tool usage patterns, automation rules |
| `.github/copilot-instructions.md` | Project context and conventions for GitHub Copilot |
| `SOUL.md` | Behavioral guidelines, communication style, principles (OpenClaw workspace) |
| `TOOLS.md` | Tool capabilities, usage patterns, integration gotchas (OpenClaw workspace) |
### How to Promote
1. **Distill** the learning into a concise rule or fact
2. **Add** to appropriate section in target file (create file if needed)
3. **Update** original entry:
- Change `**Status**: pending` → `**Status**: promoted`
- Add `**Promoted**: CLAUDE.md`, `AGENTS.md`, or `.github/copilot-instructions.md`
### Promotion Examples
**Learning** (verbose):
> Project uses pnpm workspaces. Attempted `npm install` but failed.
> Lock file is `pnpm-lock.yaml`. Must use `pnpm install`.
**In CLAUDE.md** (concise):
```markdown
## Build & Dependencies
- Package manager: pnpm (not npm) - use `pnpm install`
```
**Learning** (verbose):
> When modifying API endpoints, must regenerate TypeScript client.
> Forgetting this causes type mismatches at runtime.
**In AGENTS.md** (actionable):
```markdown
## After API Changes
1. Regenerate client: `pnpm run generate:api`
2. Check for type errors: `pnpm tsc --noEmit`
```
## Recurring Pattern Detection
If logging something similar to an existing entry:
1. **Search first**: `grep -r "keyword" .learnings/`
2. **Link entries**: Add `**See Also**: ERR-20250110-001` in Metadata
3. **Bump priority** if issue keeps recurring
4. **Consider systemic fix**: Recurring issues often indicate:
- Missing documentation (→ promote to CLAUDE.md or .github/copilot-instructions.md)
- Missing automation (→ add to AGENTS.md)
- Architectural problem (→ create tech debt ticket)
## Simplify & Harden Feed
Use this workflow to ingest recurring patterns from the `simplify-and-harden`
skill and turn them into durable prompt guidance.
### Ingestion Workflow
1. Read `simplify_and_harden.learning_loop.candidates` from the task summary.
2. For each candidate, use `pattern_key` as the stable dedupe key.
3. Search `.learnings/LEARNINGS.md` for an existing entry with that key:
   - `grep -n "Pattern-Key: <pattern_key>" .learnings/LEARNINGS.md`
4. If found:
   - Increment `Recurrence-Count`
   - Update `Last-Seen`
   - Add `See Also` links to related entries/tasks
5. If not found:
   - Create a new `LRN-...` entry
   - Set `Source: simplify-and-harden`
   - Set `Pattern-Key`, `Recurrence-Count: 1`, and `First-Seen`/`Last-Seen`
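The "if found" branch can be sketched as an in-place update. It assumes the metadata lines are written exactly as `- Pattern-Key: <key>`, `- Recurrence-Count: <n>` and `- Last-Seen: <date>`, with the key listed before the other two as in the Learning Entry template; `bump_pattern` is a hypothetical helper.

```bash
# bump_pattern FILE PATTERN_KEY [DATE]
# Increment Recurrence-Count and set Last-Seen (YYYY-MM-DD, default: today)
# for the entry carrying PATTERN_KEY. Returns 1 and leaves FILE unchanged
# when no entry has that key, so the caller can create a new entry instead.
bump_pattern() {
  local file="$1" key="$2" day="${3:-$(date '+%Y-%m-%d')}" tmp
  grep -qxF -- "- Pattern-Key: $key" "$file" 2>/dev/null || return 1
  tmp="$(mktemp)"
  awk -v key="- Pattern-Key: $key" -v day="$day" '
    /^## \[/ { hit = 0 }                      # a new entry starts
    $0 == key { hit = 1 }                     # this entry carries the key
    hit && /^- Recurrence-Count: [0-9]+/ { print "- Recurrence-Count: " ($3 + 1); next }
    hit && /^- Last-Seen: / { print "- Last-Seen: " day; next }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

A caller would typically write `bump_pattern .learnings/LEARNINGS.md "$pattern_key" || create_new_entry`, where `create_new_entry` stands for whatever step 5 looks like in your setup.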
### Promotion Rule (System Prompt Feedback)
Promote recurring patterns into agent context/system prompt files when all are true:
- `Recurrence-Count >= 3`
- Seen across at least 2 distinct tasks
- Occurred within a 30-day window
Promotion targets:
- `CLAUDE.md`
- `AGENTS.md`
- `.github/copilot-instructions.md`
- `SOUL.md` / `TOOLS.md` for OpenClaw workspace-level guidance when applicable
Write promoted rules as short prevention rules (what to do before/while coding),
not long incident write-ups.
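The three promotion conditions can be checked mechanically. This sketch reads "within a 30-day window" as `Last-Seen` minus `First-Seen` being at most 30 days, which is one possible interpretation, and it relies on GNU `date -d`; the `should_promote` name and the task-count argument are assumptions.

```bash
# should_promote COUNT TASKS FIRST_SEEN LAST_SEEN   (dates as YYYY-MM-DD)
# Succeeds when the recurrence count is at least 3, the pattern was seen in
# at least 2 distinct tasks, and the sightings span at most 30 days.
# Returns 2 when a date cannot be parsed (GNU date is required).
should_promote() {
  local count="$1" tasks="$2" first="$3" last="$4"
  local first_s last_s days
  first_s="$(date -u -d "$first" '+%s')" || return 2
  last_s="$(date -u -d "$last" '+%s')" || return 2
  days=$(( (last_s - first_s) / 86400 ))
  [ "$count" -ge 3 ] && [ "$tasks" -ge 2 ] && [ "$days" -le 30 ]
}
```

Example: `should_promote 3 2 2025-01-01 2025-01-20 && echo "promote"`.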
## Periodic Review
Review `.learnings/` at natural breakpoints:
### When to Review
- Before starting a new major task
- After completing a feature
- When working in an area with past learnings
- Weekly during active development
### Quick Status Check
```bash
# Count pending items
grep -h "Status\*\*: pending" .learnings/*.md | wc -l
# List pending high-priority items
grep -h -B5 "Priority\*\*: high" .learnings/*.md | grep "^## \["
# Find learnings for a specific area
grep -l "Area\*\*: backend" .learnings/*.md
```
### Review Actions
- Resolve fixed items
- Promote applicable learnings
- Link related entries
- Escalate recurring issues
## Detection Triggers
Automatically log when you notice:
**Corrections** (→ learning with `correction` category):
- "No, that's not right..."
- "Actually, it should be..."
- "You're wrong about..."
- "That's outdated..."
**Feature Requests** (→ feature request):
- "Can you also..."
- "I wish you could..."
- "Is there a way to..."
- "Why can't you..."
**Knowledge Gaps** (→ learning with `knowledge_gap` category):
- User provides information you didn't know
- Documentation you referenced is outdated
- API behavior differs from your understanding
**Errors** (→ error entry):
- Command returns non-zero exit code
- Exception or stack trace
- Unexpected output or behavior
- Timeout or connection failure
## Priority Guidelines
| Priority | When to Use |
|----------|-------------|
| `critical` | Blocks core functionality, data loss risk, security issue |
| `high` | Significant impact, affects common workflows, recurring issue |
| `medium` | Moderate impact, workaround exists |
| `low` | Minor inconvenience, edge case, nice-to-have |
## Area Tags
Use to filter learnings by codebase region:
| Area | Scope |
|------|-------|
| `frontend` | UI, components, client-side code |
| `backend` | API, services, server-side code |
| `infra` | CI/CD, deployment, Docker, cloud |
| `tests` | Test files, testing utilities, coverage |
| `docs` | Documentation, comments, READMEs |
| `config` | Configuration files, environment, settings |
## Best Practices
1. **Log immediately** - context is freshest right after the issue
2. **Be specific** - future agents need to understand quickly
3. **Include reproduction steps** - especially for errors
4. **Link related files** - makes fixes easier
5. **Suggest concrete fixes** - not just "investigate"
6. **Use consistent categories** - enables filtering
7. **Promote aggressively** - if in doubt, add to CLAUDE.md or .github/copilot-instructions.md
8. **Review regularly** - stale learnings lose value
## Gitignore Options
**Keep learnings local** (per-developer):
```gitignore
.learnings/
```
**Track learnings in repo** (team-wide):
Don't add to .gitignore - learnings become shared knowledge.
**Hybrid** (track templates, ignore entries):
```gitignore
.learnings/*.md
!.learnings/.gitkeep
```
## Hook Integration
Enable automatic reminders through agent hooks. This is **opt-in** - you must explicitly configure hooks.
### Quick Setup (Claude Code / Codex)
Create `.claude/settings.json` in your project:
```json
{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improvement/scripts/activator.sh"
      }]
    }]
  }
}
```
This injects a learning evaluation reminder after each prompt (~50-100 tokens overhead).
### Full Setup (With Error Detection)
```json
{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improvement/scripts/activator.sh"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./skills/self-improvement/scripts/error-detector.sh"
      }]
    }]
  }
}
```
### Available Hook Scripts
| Script | Hook Type | Purpose |
|--------|-----------|---------|
| `scripts/activator.sh` | UserPromptSubmit | Reminds to evaluate learnings after tasks |
| `scripts/error-detector.sh` | PostToolUse (Bash) | Triggers on command errors |
See `references/hooks-setup.md` for detailed configuration and troubleshooting.
## Automatic Skill Extraction
When a learning is valuable enough to become a reusable skill, extract it using the provided helper.
### Skill Extraction Criteria
A learning qualifies for skill extraction when ANY of these apply:
| Criterion | Description |
|-----------|-------------|
| **Recurring** | Has `See Also` links to 2+ similar issues |
| **Verified** | Status is `resolved` with working fix |
| **Non-obvious** | Required actual debugging/investigation to discover |
| **Broadly applicable** | Not project-specific; useful across codebases |
| **User-flagged** | User says "save this as a skill" or similar |
### Extraction Workflow
1. **Identify candidate**: Learning meets extraction criteria
2. **Run helper** (or create manually):
```bash
./skills/self-improvement/scripts/extract-skill.sh skill-name --dry-run
./skills/self-improvement/scripts/extract-skill.sh skill-name
```
3. **Customize SKILL.md**: Fill in template with learning content
4. **Update learning**: Set status to `promoted_to_skill`, add `Skill-Path`
5. **Verify**: Read skill in fresh session to ensure it's self-contained
### Manual Extraction
If you prefer manual creation:
1. Create `skills/<skill-name>/SKILL.md`
2. Use template from `assets/SKILL-TEMPLATE.md`
3. Follow [Agent Skills spec](https://agentskills.io/specification):
- YAML frontmatter with `name` and `description`
- Name must match folder name
- No README.md inside skill folder
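The three requirements listed under step 3 can be verified after manual creation. The following is a minimal sketch, assuming the frontmatter sits between the first two `---` lines and `name:` is written unquoted on one line. `check_skill_layout` is a hypothetical helper, not part of the shipped scripts or of the Agent Skills tooling.

```bash
#!/bin/bash
# Hypothetical checker (not part of the shipped scripts).
# Assumes frontmatter sits between the first two "---" lines and that
# "name:" is written unquoted on a single line.
check_skill_layout() {
  local skill_dir="${1%/}" file frontmatter name
  file="$skill_dir/SKILL.md"

  [ -f "$file" ] || { echo "missing: $file"; return 1; }
  [ "$(head -n 1 "$file")" = "---" ] || { echo "no YAML frontmatter"; return 1; }

  # Collect the lines between the opening and closing "---".
  frontmatter=$(awk 'NR == 1 { next } /^---$/ { exit } { print }' "$file")
  name=$(printf '%s\n' "$frontmatter" | sed -n 's/^name:[[:space:]]*//p' | head -n 1)

  printf '%s\n' "$frontmatter" | grep -q '^description:' \
    || { echo "frontmatter has no description"; return 1; }
  [ "$name" = "$(basename "$skill_dir")" ] \
    || { echo "name '$name' does not match folder '$(basename "$skill_dir")'"; return 1; }
  [ ! -e "$skill_dir/README.md" ] \
    || { echo "README.md should not be inside the skill folder"; return 1; }

  echo "ok: $skill_dir"
}

# Usage:
# check_skill_layout skills/docker-m1-fixes
```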
### Extraction Detection Triggers
Watch for these signals that a learning should become a skill:
**In conversation:**
- "Save this as a skill"
- "I keep running into this"
- "This would be useful for other projects"
- "Remember this pattern"
**In learning entries:**
- Multiple `See Also` links (recurring issue)
- High priority + resolved status
- Category: `best_practice` with broad applicability
- User feedback praising the solution
### Skill Quality Gates
Before extraction, verify:
- [ ] Solution is tested and working
- [ ] Description is clear without original context
- [ ] Code examples are self-contained
- [ ] No project-specific hardcoded values
- [ ] Follows skill naming conventions (lowercase, hyphens)
## Multi-Agent Support
This skill works across different AI coding agents with agent-specific activation.
### Claude Code
**Activation**: Hooks (UserPromptSubmit, PostToolUse)
**Setup**: `.claude/settings.json` with hook configuration
**Detection**: Automatic via hook scripts
### Codex CLI
**Activation**: Hooks (same pattern as Claude Code)
**Setup**: `.codex/settings.json` with hook configuration
**Detection**: Automatic via hook scripts
### GitHub Copilot
**Activation**: Manual (no hook support)
**Setup**: Add to `.github/copilot-instructions.md`:
```markdown
## Self-Improvement
After solving non-obvious issues, consider logging to `.learnings/`:
1. Use format from self-improvement skill
2. Link related entries with See Also
3. Promote high-value learnings to skills
Ask in chat: "Should I log this as a learning?"
```
**Detection**: Manual review at session end
### OpenClaw
**Activation**: Workspace injection + inter-agent messaging
**Setup**: See "OpenClaw Setup" section above
**Detection**: Via session tools and workspace files
### Agent-Agnostic Guidance
Regardless of agent, apply self-improvement when you:
1. **Discover something non-obvious** - solution wasn't immediate
2. **Correct yourself** - initial approach was wrong
3. **Learn project conventions** - discovered undocumented patterns
4. **Hit unexpected errors** - especially if diagnosis was difficult
5. **Find better approaches** - improved on your original solution
### Copilot Chat Integration
For Copilot users, add this to your prompts when relevant:
> After completing this task, evaluate if any learnings should be logged to `.learnings/` using the self-improvement skill format.
Or use quick prompts:
- "Log this to learnings"
- "Create a skill from this solution"
- "Check .learnings/ for related issues"
FILE:assets/LEARNINGS.md
# Learnings
Corrections, insights, and knowledge gaps captured during development.
**Categories**: correction | insight | knowledge_gap | best_practice
**Areas**: frontend | backend | infra | tests | docs | config
**Statuses**: pending | in_progress | resolved | wont_fix | promoted | promoted_to_skill
## Status Definitions
| Status | Meaning |
|--------|---------|
| `pending` | Not yet addressed |
| `in_progress` | Actively being worked on |
| `resolved` | Issue fixed or knowledge integrated |
| `wont_fix` | Decided not to address (reason in Resolution) |
| `promoted` | Elevated to CLAUDE.md, AGENTS.md, or copilot-instructions.md |
| `promoted_to_skill` | Extracted as a reusable skill |
## Skill Extraction Fields
When a learning is promoted to a skill, add these fields:
```markdown
**Status**: promoted_to_skill
**Skill-Path**: skills/skill-name
```
Example:
```markdown
## [LRN-20250115-001] best_practice
**Logged**: 2025-01-15T10:00:00Z
**Priority**: high
**Status**: promoted_to_skill
**Skill-Path**: skills/docker-m1-fixes
**Area**: infra
### Summary
Docker build fails on Apple Silicon due to platform mismatch
...
```
---
FILE:assets/SKILL-TEMPLATE.md
# Skill Template
Template for creating skills extracted from learnings. Copy and customize.
---
## SKILL.md Template
```markdown
---
name: skill-name-here
description: "Concise description of when and why to use this skill. Include trigger conditions."
---
# Skill Name
Brief introduction explaining the problem this skill solves and its origin.
## Quick Reference
| Situation | Action |
|-----------|--------|
| [Trigger 1] | [Action 1] |
| [Trigger 2] | [Action 2] |
## Background
Why this knowledge matters. What problems it prevents. Context from the original learning.
## Solution
### Step-by-Step
1. First step with code or command
2. Second step
3. Verification step
### Code Example
\`\`\`language
// Example code demonstrating the solution
\`\`\`
## Common Variations
- **Variation A**: Description and how to handle
- **Variation B**: Description and how to handle
## Gotchas
- Warning or common mistake #1
- Warning or common mistake #2
## Related
- Link to related documentation
- Link to related skill
## Source
Extracted from learning entry.
- **Learning ID**: LRN-YYYYMMDD-XXX
- **Original Category**: correction | insight | knowledge_gap | best_practice
- **Extraction Date**: YYYY-MM-DD
```
---
## Minimal Template
For simple skills that don't need all sections:
```markdown
---
name: skill-name-here
description: "What this skill does and when to use it."
---
# Skill Name
[Problem statement in one sentence]
## Solution
[Direct solution with code/commands]
## Source
- Learning ID: LRN-YYYYMMDD-XXX
```
---
## Template with Scripts
For skills that include executable helpers:
```markdown
---
name: skill-name-here
description: "What this skill does and when to use it."
---
# Skill Name
[Introduction]
## Quick Reference
| Command | Purpose |
|---------|---------|
| `./scripts/helper.sh` | [What it does] |
| `./scripts/validate.sh` | [What it does] |
## Usage
### Automated (Recommended)
\`\`\`bash
./skills/skill-name/scripts/helper.sh [args]
\`\`\`
### Manual Steps
1. Step one
2. Step two
## Scripts
| Script | Description |
|--------|-------------|
| `scripts/helper.sh` | Main utility |
| `scripts/validate.sh` | Validation checker |
## Source
- Learning ID: LRN-YYYYMMDD-XXX
```
---
## Naming Conventions
- **Skill name**: lowercase, hyphens for spaces
- Good: `docker-m1-fixes`, `api-timeout-patterns`
- Bad: `Docker_M1_Fixes`, `APITimeoutPatterns`
- **Description**: Start with action verb, mention trigger
- Good: "Handles Docker build failures on Apple Silicon. Use when builds fail with platform mismatch."
- Bad: "Docker stuff"
- **Files**:
- `SKILL.md` - Required, main documentation
- `scripts/` - Optional, executable code
- `references/` - Optional, detailed docs
- `assets/` - Optional, templates
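
The skill-name rule can be checked with the same regular expression that `scripts/extract-skill.sh` applies. This is a minimal sketch; the function name `is_valid_skill_name` and the locale pin are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical helper mirroring the name check in scripts/extract-skill.sh.
is_valid_skill_name() {
  # Pin the locale so [a-z] means ASCII lowercase only (an added precaution).
  local LC_ALL=C
  [[ "$1" =~ ^[a-z0-9]+(-[a-z0-9]+)*$ ]]
}

# Usage: classify the good and bad names from the list above.
for name in docker-m1-fixes api-timeout-patterns Docker_M1_Fixes APITimeoutPatterns; do
  if is_valid_skill_name "$name"; then
    echo "ok:  $name"
  else
    echo "bad: $name"
  fi
done
```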
---
## Extraction Checklist
Before creating a skill from a learning:
- [ ] Learning is verified (status: resolved)
- [ ] Solution is broadly applicable (not one-off)
- [ ] Content is complete (has all needed context)
- [ ] Name follows conventions
- [ ] Description is concise but informative
- [ ] Quick Reference table is actionable
- [ ] Code examples are tested
- [ ] Source learning ID is recorded
After creating:
- [ ] Update original learning with `promoted_to_skill` status
- [ ] Add `Skill-Path: skills/skill-name` to learning metadata
- [ ] Test skill by reading it in a fresh session
FILE:hooks/openclaw/handler.js
/**
* Self-Improvement Hook for OpenClaw
*
* Injects a reminder to evaluate learnings during agent bootstrap.
* Fires on agent:bootstrap event before workspace files are injected.
*/
const REMINDER_CONTENT = `
## Self-Improvement Reminder
After completing tasks, evaluate if any learnings should be captured:
**Log when:**
- User corrects you → \`.learnings/LEARNINGS.md\`
- Command/operation fails → \`.learnings/ERRORS.md\`
- User wants missing capability → \`.learnings/FEATURE_REQUESTS.md\`
- You discover your knowledge was wrong → \`.learnings/LEARNINGS.md\`
- You find a better approach → \`.learnings/LEARNINGS.md\`
**Promote when pattern is proven:**
- Behavioral patterns → \`SOUL.md\`
- Workflow improvements → \`AGENTS.md\`
- Tool gotchas → \`TOOLS.md\`
Keep entries simple: date, title, what happened, what to do differently.
`.trim();
const handler = async (event) => {
// Safety checks for event structure
if (!event || typeof event !== 'object') {
return;
}
// Only handle agent:bootstrap events
if (event.type !== 'agent' || event.action !== 'bootstrap') {
return;
}
// Safety check for context
if (!event.context || typeof event.context !== 'object') {
return;
}
// Inject the reminder as a virtual bootstrap file
// Check that bootstrapFiles is an array before pushing
if (Array.isArray(event.context.bootstrapFiles)) {
event.context.bootstrapFiles.push({
path: 'SELF_IMPROVEMENT_REMINDER.md',
content: REMINDER_CONTENT,
virtual: true,
});
}
};
module.exports = handler;
module.exports.default = handler;
FILE:hooks/openclaw/handler.ts
/**
* Self-Improvement Hook for OpenClaw
*
* Injects a reminder to evaluate learnings during agent bootstrap.
* Fires on agent:bootstrap event before workspace files are injected.
*/
import type { HookHandler } from 'openclaw/hooks';
const REMINDER_CONTENT = `## Self-Improvement Reminder
After completing tasks, evaluate if any learnings should be captured:
**Log when:**
- User corrects you → \`.learnings/LEARNINGS.md\`
- Command/operation fails → \`.learnings/ERRORS.md\`
- User wants missing capability → \`.learnings/FEATURE_REQUESTS.md\`
- You discover your knowledge was wrong → \`.learnings/LEARNINGS.md\`
- You find a better approach → \`.learnings/LEARNINGS.md\`
**Promote when pattern is proven:**
- Behavioral patterns → \`SOUL.md\`
- Workflow improvements → \`AGENTS.md\`
- Tool gotchas → \`TOOLS.md\`
Keep entries simple: date, title, what happened, what to do differently.`;
const handler: HookHandler = async (event) => {
// Safety checks for event structure
if (!event || typeof event !== 'object') {
return;
}
// Only handle agent:bootstrap events
if (event.type !== 'agent' || event.action !== 'bootstrap') {
return;
}
// Safety check for context
if (!event.context || typeof event.context !== 'object') {
return;
}
// Skip sub-agent sessions to avoid bootstrap issues
// Sub-agents have sessionKey patterns like "agent:main:subagent:..."
const sessionKey = event.sessionKey || '';
if (sessionKey.includes(':subagent:')) {
return;
}
// Inject the reminder as a virtual bootstrap file
// Check that bootstrapFiles is an array before pushing
if (Array.isArray(event.context.bootstrapFiles)) {
event.context.bootstrapFiles.push({
path: 'SELF_IMPROVEMENT_REMINDER.md',
content: REMINDER_CONTENT,
virtual: true,
});
}
};
export default handler;
FILE:hooks/openclaw/HOOK.md
---
name: self-improvement
description: "Injects self-improvement reminder during agent bootstrap"
metadata: {"openclaw":{"emoji":"🧠","events":["agent:bootstrap"]}}
---
# Self-Improvement Hook
Injects a reminder to evaluate learnings during agent bootstrap.
## What It Does
- Fires on `agent:bootstrap` (before workspace files are injected)
- Adds a reminder block to check `.learnings/` for relevant entries
- Prompts the agent to log corrections, errors, and discoveries
## Configuration
No configuration needed. Enable with:
```bash
openclaw hooks enable self-improvement
```
FILE:references/examples.md
# Entry Examples
Concrete examples of well-formatted entries with all fields.
## Learning: Correction
```markdown
## [LRN-20250115-001] correction
**Logged**: 2025-01-15T10:30:00Z
**Priority**: high
**Status**: pending
**Area**: tests
### Summary
Incorrectly assumed every fixture in this codebase uses pytest's default function scope
### Details
When writing test fixtures, I assumed all fixtures were function-scoped.
User corrected that while function scope is the default, the codebase
convention uses module-scoped fixtures for database connections to
improve test performance.
### Suggested Action
When creating fixtures that involve expensive setup (DB, network),
check existing fixtures for scope patterns before defaulting to function scope.
### Metadata
- Source: user_feedback
- Related Files: tests/conftest.py
- Tags: pytest, testing, fixtures
---
```
## Learning: Knowledge Gap (Resolved)
```markdown
## [LRN-20250115-002] knowledge_gap
**Logged**: 2025-01-15T14:22:00Z
**Priority**: medium
**Status**: resolved
**Area**: config
### Summary
Project uses pnpm not npm for package management
### Details
Attempted to run `npm install` but project uses pnpm workspaces.
Lock file is `pnpm-lock.yaml`, not `package-lock.json`.
### Suggested Action
Check for `pnpm-lock.yaml` or `pnpm-workspace.yaml` before assuming npm.
Use `pnpm install` for this project.
### Metadata
- Source: error
- Related Files: pnpm-lock.yaml, pnpm-workspace.yaml
- Tags: package-manager, pnpm, setup
### Resolution
- **Resolved**: 2025-01-15T14:30:00Z
- **Commit/PR**: N/A - knowledge update
- **Notes**: Added to CLAUDE.md for future reference
---
```
## Learning: Promoted to CLAUDE.md
```markdown
## [LRN-20250115-003] best_practice
**Logged**: 2025-01-15T16:00:00Z
**Priority**: high
**Status**: promoted
**Promoted**: CLAUDE.md
**Area**: backend
### Summary
API responses must include correlation ID from request headers
### Details
All API responses should echo back the X-Correlation-ID header from
the request. This is required for distributed tracing. Responses
without this header break the observability pipeline.
### Suggested Action
Always include correlation ID passthrough in API handlers.
### Metadata
- Source: user_feedback
- Related Files: src/middleware/correlation.ts
- Tags: api, observability, tracing
---
```
## Learning: Promoted to AGENTS.md
```markdown
## [LRN-20250116-001] best_practice
**Logged**: 2025-01-16T09:00:00Z
**Priority**: high
**Status**: promoted
**Promoted**: AGENTS.md
**Area**: backend
### Summary
Must regenerate API client after OpenAPI spec changes
### Details
When modifying API endpoints, the TypeScript client must be regenerated.
Forgetting this causes type mismatches that only appear at runtime.
The generate script also runs validation.
### Suggested Action
Add to agent workflow: after any API changes, run `pnpm run generate:api`.
### Metadata
- Source: error
- Related Files: openapi.yaml, src/client/api.ts
- Tags: api, codegen, typescript
---
```
## Error Entry
```markdown
## [ERR-20250115-A3F] docker_build
**Logged**: 2025-01-15T09:15:00Z
**Priority**: high
**Status**: pending
**Area**: infra
### Summary
Docker build fails on M1 Mac due to platform mismatch
### Error
\`\`\`
error: failed to solve: python:3.11-slim: no match for platform linux/arm64
\`\`\`
### Context
- Command: `docker build -t myapp .`
- Dockerfile uses `FROM python:3.11-slim`
- Running on Apple Silicon (M1/M2)
### Suggested Fix
Add platform flag: `docker build --platform linux/amd64 -t myapp .`
Or update Dockerfile: `FROM --platform=linux/amd64 python:3.11-slim`
### Metadata
- Reproducible: yes
- Related Files: Dockerfile
---
```
## Error Entry: Recurring Issue
```markdown
## [ERR-20250120-B2C] api_timeout
**Logged**: 2025-01-20T11:30:00Z
**Priority**: critical
**Status**: pending
**Area**: backend
### Summary
Third-party payment API timeout during checkout
### Error
\`\`\`
TimeoutError: Request to payments.example.com timed out after 30000ms
\`\`\`
### Context
- Command: POST /api/checkout
- Timeout set to 30s
- Occurs during peak hours (lunch, evening)
### Suggested Fix
Implement retry with exponential backoff. Consider circuit breaker pattern.
### Metadata
- Reproducible: yes (during peak hours)
- Related Files: src/services/payment.ts
- See Also: ERR-20250115-X1Y, ERR-20250118-Z3W
---
```
## Feature Request
```markdown
## [FEAT-20250115-001] export_to_csv
**Logged**: 2025-01-15T16:45:00Z
**Priority**: medium
**Status**: pending
**Area**: backend
### Requested Capability
Export analysis results to CSV format
### User Context
User runs weekly reports and needs to share results with non-technical
stakeholders in Excel. Currently copies output manually.
### Complexity Estimate
simple
### Suggested Implementation
Add `--output csv` flag to the analyze command. Use standard csv module.
Could extend existing `--output json` pattern.
### Metadata
- Frequency: recurring
- Related Features: analyze command, json output
---
```
## Feature Request: Resolved
```markdown
## [FEAT-20250110-002] dark_mode
**Logged**: 2025-01-10T14:00:00Z
**Priority**: low
**Status**: resolved
**Area**: frontend
### Requested Capability
Dark mode support for the dashboard
### User Context
User works late hours and finds the bright interface straining.
Several other users have mentioned this informally.
### Complexity Estimate
medium
### Suggested Implementation
Use CSS variables for colors. Add toggle in user settings.
Consider system preference detection.
### Metadata
- Frequency: recurring
- Related Features: user settings, theme system
### Resolution
- **Resolved**: 2025-01-18T16:00:00Z
- **Commit/PR**: #142
- **Notes**: Implemented with system preference detection and manual toggle
---
```
## Learning: Promoted to Skill
```markdown
## [LRN-20250118-001] best_practice
**Logged**: 2025-01-18T11:00:00Z
**Priority**: high
**Status**: promoted_to_skill
**Skill-Path**: skills/docker-m1-fixes
**Area**: infra
### Summary
Docker build fails on Apple Silicon due to platform mismatch
### Details
When building Docker images on M1/M2 Macs, the build fails because
the base image doesn't have an ARM64 variant. This is a common issue
that affects many developers.
### Suggested Action
Add `--platform linux/amd64` to docker build command, or use
`FROM --platform=linux/amd64` in Dockerfile.
### Metadata
- Source: error
- Related Files: Dockerfile
- Tags: docker, arm64, m1, apple-silicon
- See Also: ERR-20250115-A3F, ERR-20250117-B2D
---
```
## Extracted Skill Example
When the above learning is extracted as a skill, it becomes:
**File**: `skills/docker-m1-fixes/SKILL.md`
```markdown
---
name: docker-m1-fixes
description: "Fixes Docker build failures on Apple Silicon (M1/M2). Use when docker build fails with platform mismatch errors."
---
# Docker M1 Fixes
Solutions for Docker build issues on Apple Silicon Macs.
## Quick Reference
| Error | Fix |
|-------|-----|
| `no match for platform linux/arm64` | Add `--platform linux/amd64` to build |
| Image runs but crashes | Use emulation or find ARM-compatible base |
## The Problem
Many Docker base images don't have ARM64 variants. When building on
Apple Silicon (M1/M2/M3), Docker attempts to pull ARM64 images by
default, causing platform mismatch errors.
## Solutions
### Option 1: Build Flag (Recommended)
Add platform flag to your build command:
\`\`\`bash
docker build --platform linux/amd64 -t myapp .
\`\`\`
### Option 2: Dockerfile Modification
Specify platform in the FROM instruction:
\`\`\`dockerfile
FROM --platform=linux/amd64 python:3.11-slim
\`\`\`
### Option 3: Docker Compose
Add platform to your service:
\`\`\`yaml
services:
app:
platform: linux/amd64
build: .
\`\`\`
## Trade-offs
| Approach | Pros | Cons |
|----------|------|------|
| Build flag | No file changes | Must remember flag |
| Dockerfile | Explicit, versioned | Affects all builds |
| Compose | Convenient for dev | Requires compose |
## Performance Note
Running AMD64 images on ARM64 relies on emulation (QEMU, or Rosetta 2
where Docker Desktop has it enabled). This works for development but
may be slower. For production, find ARM-native alternatives when possible.
## Source
- Learning ID: LRN-20250118-001
- Category: best_practice
- Extraction Date: 2025-01-18
```
FILE:references/hooks-setup.md
# Hook Setup Guide
Configure automatic self-improvement triggers for AI coding agents.
## Overview
Hooks enable proactive learning capture by injecting reminders at key moments:
- **UserPromptSubmit**: Reminder after each prompt to evaluate learnings
- **PostToolUse (Bash)**: Error detection when commands fail
## Claude Code Setup
### Option 1: Project-Level Configuration
Create `.claude/settings.json` in your project root:
```json
{
"hooks": {
"UserPromptSubmit": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "./skills/self-improvement/scripts/activator.sh"
}
]
}
],
"PostToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "./skills/self-improvement/scripts/error-detector.sh"
}
]
}
]
}
}
```
### Option 2: User-Level Configuration
Add to `~/.claude/settings.json` for global activation:
```json
{
"hooks": {
"UserPromptSubmit": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "~/.claude/skills/self-improvement/scripts/activator.sh"
}
]
}
]
}
}
```
### Minimal Setup (Activator Only)
For lower overhead, use only the UserPromptSubmit hook:
```json
{
"hooks": {
"UserPromptSubmit": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "./skills/self-improvement/scripts/activator.sh"
}
]
}
]
}
}
```
## Codex CLI Setup
Codex uses the same hook system as Claude Code. Create `.codex/settings.json`:
```json
{
"hooks": {
"UserPromptSubmit": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "./skills/self-improvement/scripts/activator.sh"
}
]
}
]
}
}
```
## GitHub Copilot Setup
Copilot doesn't support hooks directly. Instead, add guidance to `.github/copilot-instructions.md`:
```markdown
## Self-Improvement
After completing tasks that involved:
- Debugging non-obvious issues
- Discovering workarounds
- Learning project-specific patterns
- Resolving unexpected errors
Consider logging the learning to `.learnings/` using the format from the self-improvement skill.
For high-value learnings that would benefit other sessions, consider skill extraction.
```
## Verification
### Test Activator Hook
1. Enable the hook configuration
2. Start a new Claude Code session
3. Send any prompt
4. Verify you see `<self-improvement-reminder>` in the context
### Test Error Detector Hook
1. Enable PostToolUse hook for Bash
2. Run a command that fails: `ls /nonexistent/path`
3. Verify you see `<error-detected>` reminder
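The detector's check is a plain substring match, so it can also be exercised outside a Claude Code session. The following is a minimal sketch that mirrors the loop in `scripts/error-detector.sh` with a shortened pattern list. Wrapping it as a `contains_error` function that takes the output as an argument is an assumption for illustration; the shipped script reads `CLAUDE_TOOL_OUTPUT` instead.

```bash
#!/bin/bash
# Illustrative sketch of the substring check used by the error detector.
# The pattern list is a shortened subset of the one in error-detector.sh.
contains_error() {
  local output="$1" pattern
  local patterns=("error:" "failed" "command not found" "No such file" "Permission denied")
  for pattern in "${patterns[@]}"; do
    # Case-sensitive substring match, as in the original loop.
    if [[ "$output" == *"$pattern"* ]]; then
      return 0
    fi
  done
  return 1
}

# Usage: feed it sample output from a failing command.
sample="ls: cannot access '/nonexistent/path': No such file or directory"
if contains_error "$sample"; then
  echo "would emit <error-detected> reminder"
fi
```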
### Dry Run Extract Script
```bash
./skills/self-improvement/scripts/extract-skill.sh test-skill --dry-run
```
Expected output shows the skill scaffold that would be created.
## Troubleshooting
### Hook Not Triggering
1. **Check script permissions**: `chmod +x scripts/*.sh`
2. **Verify path**: Use absolute paths or paths relative to project root
3. **Check settings location**: Project vs user-level settings
4. **Restart session**: Hooks are loaded at session start
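The first two checks can be run in one pass. The following is a minimal sketch, assuming the three script names documented in this skill; `check_hook_scripts` is a hypothetical helper, not part of the shipped scripts.

```bash
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# Reports whether each hook script exists and is executable in the given directory.
check_hook_scripts() {
  local dir="${1%/}" status=0 script
  for script in activator.sh error-detector.sh extract-skill.sh; do
    if [ ! -f "$dir/$script" ]; then
      echo "missing: $dir/$script"
      status=1
    elif [ ! -x "$dir/$script" ]; then
      echo "not executable: $dir/$script"
      status=1
    else
      echo "ok: $dir/$script"
    fi
  done
  # Non-zero when at least one script needs attention.
  return "$status"
}

# Usage (run from the project root):
# check_hook_scripts ./skills/self-improvement/scripts || echo "fix the items above, then restart the session"
```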
### Permission Denied
```bash
chmod +x ./skills/self-improvement/scripts/activator.sh
chmod +x ./skills/self-improvement/scripts/error-detector.sh
chmod +x ./skills/self-improvement/scripts/extract-skill.sh
```
### Script Not Found
If using relative paths, ensure you're in the correct directory or use absolute paths:
```json
{
"command": "/absolute/path/to/skills/self-improvement/scripts/activator.sh"
}
```
### Too Much Overhead
If the activator feels intrusive:
1. **Use minimal setup**: Only UserPromptSubmit, skip PostToolUse
2. **Add matcher filter**: Only trigger for certain prompts:
```json
{
"matcher": "fix|debug|error|issue",
"hooks": [...]
}
```
## Hook Output Budget
The activator is designed to be lightweight:
- **Target**: ~50-100 tokens per activation
- **Content**: Structured reminder, not verbose instructions
- **Format**: XML tags for easy parsing
If you need to reduce overhead further, you can edit `activator.sh` to output less text.
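To see roughly where the output stands against the target, a character count gives a quick estimate. The following is a minimal sketch, assuming about 4 characters per token, which is a common rule of thumb for English text and not an actual tokenizer. `estimate_tokens` and `within_budget` are hypothetical helpers.

```bash
#!/bin/bash
# Hypothetical helpers (not part of the shipped scripts).
# Rough estimate only: assumes ~4 characters per token.

# Print an estimated token count for the text on stdin.
estimate_tokens() {
  local chars
  chars=$(wc -c)
  # Round up so short non-empty inputs count as at least one token.
  echo $(( (chars + 3) / 4 ))
}

# Succeed when the text on stdin is estimated at or below the given token budget.
within_budget() {
  [ "$(estimate_tokens)" -le "$1" ]
}

# Usage (run from the project root):
# ./skills/self-improvement/scripts/activator.sh | estimate_tokens
# ./skills/self-improvement/scripts/activator.sh | within_budget 100 || echo "over budget"
```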
## Security Considerations
- Hook scripts run with the same permissions as Claude Code
- Scripts only output text; they don't modify files or run commands
- The error detector reads the `CLAUDE_TOOL_OUTPUT` environment variable
- All scripts are opt-in (you must configure them explicitly)
## Disabling Hooks
To temporarily disable without removing configuration:
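1. **Comment out in settings** (strict JSON has no comment syntax, so this works only if the settings parser tolerates `//` comments):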
```json
{
"hooks": {
// "UserPromptSubmit": [...]
}
}
```
2. **Or delete the settings file**: Hooks won't run without configuration
FILE:references/openclaw-integration.md
# OpenClaw Integration
Complete setup and usage guide for integrating the self-improvement skill with OpenClaw.
## Overview
OpenClaw uses workspace-based prompt injection combined with event-driven hooks. Context is injected from workspace files at session start, and hooks can trigger on lifecycle events.
## Workspace Structure
```
~/.openclaw/
├── workspace/ # Working directory
│ ├── AGENTS.md # Multi-agent coordination patterns
│ ├── SOUL.md # Behavioral guidelines and personality
│ ├── TOOLS.md # Tool capabilities and gotchas
│ ├── MEMORY.md # Long-term memory (main session only)
│ └── memory/ # Daily memory files
│ └── YYYY-MM-DD.md
├── skills/ # Installed skills
│ └── <skill-name>/
│ └── SKILL.md
└── hooks/ # Custom hooks
└── <hook-name>/
├── HOOK.md
└── handler.ts
```
## Quick Setup
### 1. Install the Skill
```bash
clawdhub install self-improving-agent
```
Or copy manually:
```bash
cp -r self-improving-agent ~/.openclaw/skills/
```
### 2. Install the Hook (Optional)
Copy the hook to OpenClaw's hooks directory:
```bash
cp -r hooks/openclaw ~/.openclaw/hooks/self-improvement
```
Enable the hook:
```bash
openclaw hooks enable self-improvement
```
### 3. Create Learning Files
Create the `.learnings/` directory in your workspace:
```bash
mkdir -p ~/.openclaw/workspace/.learnings
```
Or in the skill directory:
```bash
mkdir -p ~/.openclaw/skills/self-improving-agent/.learnings
```
## Injected Prompt Files
### AGENTS.md
Purpose: Multi-agent workflows and delegation patterns.
```markdown
# Agent Coordination
## Delegation Rules
- Use explore agent for open-ended codebase questions
- Spawn sub-agents for long-running tasks
- Use sessions_send for cross-session communication
## Session Handoff
When delegating to another session:
1. Provide full context in the handoff message
2. Include relevant file paths
3. Specify expected output format
```
### SOUL.md
Purpose: Behavioral guidelines and communication style.
```markdown
# Behavioral Guidelines
## Communication Style
- Be direct and concise
- Avoid unnecessary caveats and disclaimers
- Use technical language appropriate to context
## Error Handling
- Admit mistakes promptly
- Provide corrected information immediately
- Log significant errors to learnings
```
### TOOLS.md
Purpose: Tool capabilities, integration gotchas, local configuration.
```markdown
# Tool Knowledge
## Self-Improvement Skill
Log learnings to `.learnings/` for continuous improvement.
## Local Tools
- Document tool-specific gotchas here
- Note authentication requirements
- Track integration quirks
```
## Learning Workflow
### Capturing Learnings
1. **In-session**: Log to `.learnings/` as usual
2. **Cross-session**: Promote to workspace files
### Promotion Decision Tree
```
Is the learning project-specific?
├── Yes → Keep in .learnings/
└── No → Is it behavioral/style-related?
├── Yes → Promote to SOUL.md
└── No → Is it tool-related?
├── Yes → Promote to TOOLS.md
└── No → Promote to AGENTS.md (workflow)
```
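The tree above can be sketched as a small function. `promotion_target` and its two arguments (`yes`/`no` for project-specific, and a kind of `behavioral`, `tool`, or anything else for workflow) are hypothetical names chosen for illustration, not an OpenClaw API.

```bash
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts or of OpenClaw).
# Maps a learning to its destination following the decision tree.
#   $1: "yes" if the learning is project-specific, anything else otherwise
#   $2: "behavioral", "tool", or any other value (treated as workflow)
promotion_target() {
  local project_specific="$1" kind="$2"
  if [ "$project_specific" = "yes" ]; then
    echo ".learnings/"
    return
  fi
  case "$kind" in
    behavioral) echo "SOUL.md" ;;
    tool)       echo "TOOLS.md" ;;
    *)          echo "AGENTS.md" ;;
  esac
}

# Usage: a tool gotcha that applies beyond the current project.
echo "promote to: $(promotion_target no tool)"
```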
### Promotion Format Examples
**From learning:**
> Git push to GitHub fails without auth configured - triggers desktop prompt
**To TOOLS.md:**
```markdown
## Git
- Don't push without confirming auth is configured
- Use `gh auth status` to check GitHub CLI auth
```
## Inter-Agent Communication
OpenClaw provides tools for cross-session communication:
### sessions_list
View active and recent sessions:
```
sessions_list(activeMinutes=30, messageLimit=3)
```
### sessions_history
Read transcript from another session:
```
sessions_history(sessionKey="session-id", limit=50)
```
### sessions_send
Send message to another session:
```
sessions_send(sessionKey="session-id", message="Learning: API requires X-Custom-Header")
```
### sessions_spawn
Spawn a background sub-agent:
```
sessions_spawn(task="Research X and report back", label="research")
```
## Available Hook Events
| Event | When It Fires |
|-------|---------------|
| `agent:bootstrap` | Before workspace files inject |
| `command:new` | When `/new` command issued |
| `command:reset` | When `/reset` command issued |
| `command:stop` | When `/stop` command issued |
| `gateway:startup` | When gateway starts |
## Detection Triggers
### Standard Triggers
- User corrections ("No, that's wrong...")
- Command failures (non-zero exit codes)
- API errors
- Knowledge gaps
### OpenClaw-Specific Triggers
| Trigger | Action |
|---------|--------|
| Tool call error | Log to TOOLS.md with tool name |
| Session handoff confusion | Log to AGENTS.md with delegation pattern |
| Model behavior surprise | Log to SOUL.md with expected vs actual |
| Skill issue | Log to .learnings/ or report upstream |
## Verification
Check hook is registered:
```bash
openclaw hooks list
```
Check skill is loaded:
```bash
openclaw status
```
## Troubleshooting
### Hook not firing
1. Ensure hooks enabled in config
2. Restart gateway after config changes
3. Check gateway logs for errors
### Learnings not persisting
1. Verify `.learnings/` directory exists
2. Check file permissions
3. Ensure workspace path is configured correctly
### Skill not loading
1. Check skill is in skills directory
2. Verify SKILL.md has correct frontmatter
3. Run `openclaw status` to see loaded skills
FILE:scripts/activator.sh
#!/bin/bash
# Self-Improvement Activator Hook
# Triggers on UserPromptSubmit to remind Claude about learning capture
# Keep output minimal (~50-100 tokens) to minimize overhead
set -e
# Output reminder as system context
cat << 'EOF'
<self-improvement-reminder>
After completing this task, evaluate if extractable knowledge emerged:
- Non-obvious solution discovered through investigation?
- Workaround for unexpected behavior?
- Project-specific pattern learned?
- Error required debugging to resolve?
If yes: Log to .learnings/ using the self-improvement skill format.
If high-value (recurring, broadly applicable): Consider skill extraction.
</self-improvement-reminder>
EOF
FILE:scripts/error-detector.sh
#!/bin/bash
# Self-Improvement Error Detector Hook
# Triggers on PostToolUse for Bash to detect command failures
# Reads CLAUDE_TOOL_OUTPUT environment variable
set -e
# Check if tool output indicates an error
# CLAUDE_TOOL_OUTPUT contains the result of the tool execution
OUTPUT="${CLAUDE_TOOL_OUTPUT:-}"
# Patterns indicating errors (matched as case-sensitive substrings, so common case variants are listed explicitly)
ERROR_PATTERNS=(
"error:"
"Error:"
"ERROR:"
"failed"
"FAILED"
"command not found"
"No such file"
"Permission denied"
"fatal:"
"Exception"
"Traceback"
"npm ERR!"
"ModuleNotFoundError"
"SyntaxError"
"TypeError"
"exit code"
"non-zero"
)
# Check if output contains any error pattern
contains_error=false
for pattern in "${ERROR_PATTERNS[@]}"; do
if [[ "$OUTPUT" == *"$pattern"* ]]; then
contains_error=true
break
fi
done
# Only output reminder if error detected
if [ "$contains_error" = true ]; then
cat << 'EOF'
<error-detected>
A command error was detected. Consider logging this to .learnings/ERRORS.md if:
- The error was unexpected or non-obvious
- It required investigation to resolve
- It might recur in similar contexts
- The solution could benefit future sessions
Use the self-improvement skill format: [ERR-YYYYMMDD-XXX]
</error-detected>
EOF
fi
FILE:scripts/extract-skill.sh
#!/bin/bash
# Skill Extraction Helper
# Creates a new skill from a learning entry
# Usage: ./extract-skill.sh <skill-name> [--dry-run]
set -e
# Configuration
SKILLS_DIR="./skills"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
usage() {
cat << EOF
Usage: $(basename "$0") <skill-name> [options]
Create a new skill from a learning entry.
Arguments:
skill-name Name of the skill (lowercase, hyphens for spaces)
Options:
--dry-run Show what would be created without creating files
--output-dir Relative output directory under current path (default: ./skills)
-h, --help Show this help message
Examples:
$(basename "$0") docker-m1-fixes
$(basename "$0") api-timeout-patterns --dry-run
$(basename "$0") pnpm-setup --output-dir ./skills/custom
The skill will be created in: \$SKILLS_DIR/<skill-name>/
EOF
}
log_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1" >&2
}
# Parse arguments
SKILL_NAME=""
DRY_RUN=false
while [[ $# -gt 0 ]]; do
case $1 in
--dry-run)
DRY_RUN=true
shift
;;
--output-dir)
if [ -z "${2:-}" ] || [[ "${2:-}" == -* ]]; then
log_error "--output-dir requires a relative path argument"
usage
exit 1
fi
SKILLS_DIR="$2"
shift 2
;;
-h|--help)
usage
exit 0
;;
-*)
log_error "Unknown option: $1"
usage
exit 1
;;
*)
if [ -z "$SKILL_NAME" ]; then
SKILL_NAME="$1"
else
log_error "Unexpected argument: $1"
usage
exit 1
fi
shift
;;
esac
done
# Validate skill name
if [ -z "$SKILL_NAME" ]; then
log_error "Skill name is required"
usage
exit 1
fi
# Validate skill name format (lowercase, hyphens, no spaces)
if ! [[ "$SKILL_NAME" =~ ^[a-z0-9]+(-[a-z0-9]+)*$ ]]; then
log_error "Invalid skill name format. Use lowercase letters, numbers, and hyphens only."
log_error "Examples: 'docker-fixes', 'api-patterns', 'pnpm-setup'"
exit 1
fi
# Validate output path to avoid writes outside current workspace.
if [[ "$SKILLS_DIR" = /* ]]; then
log_error "Output directory must be a relative path under the current directory."
exit 1
fi
if [[ "$SKILLS_DIR" =~ (^|/)\.\.(/|$) ]]; then
log_error "Output directory cannot include '..' path segments."
exit 1
fi
SKILLS_DIR="${SKILLS_DIR#./}"
SKILLS_DIR="./$SKILLS_DIR"
SKILL_PATH="$SKILLS_DIR/$SKILL_NAME"
# Check if skill already exists
if [ -d "$SKILL_PATH" ] && [ "$DRY_RUN" = false ]; then
log_error "Skill already exists: $SKILL_PATH"
log_error "Use a different name or remove the existing skill first."
exit 1
fi
# Dry run output
if [ "$DRY_RUN" = true ]; then
log_info "Dry run - would create:"
echo " $SKILL_PATH/"
echo " $SKILL_PATH/SKILL.md"
echo ""
echo "Template content would be:"
echo "---"
cat << TEMPLATE
name: $SKILL_NAME
description: "[TODO: Add a concise description of what this skill does and when to use it]"
---
# $(echo "$SKILL_NAME" | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
[TODO: Brief introduction explaining the skill's purpose]
## Quick Reference
| Situation | Action |
|-----------|--------|
| [Trigger condition] | [What to do] |
## Usage
[TODO: Detailed usage instructions]
## Examples
[TODO: Add concrete examples]
## Source Learning
This skill was extracted from a learning entry.
- Learning ID: [TODO: Add original learning ID]
- Original File: .learnings/LEARNINGS.md
TEMPLATE
echo "---"
exit 0
fi
# Create skill directory structure
log_info "Creating skill: $SKILL_NAME"
mkdir -p "$SKILL_PATH"
# Create SKILL.md from template
cat > "$SKILL_PATH/SKILL.md" << TEMPLATE
---
name: $SKILL_NAME
description: "[TODO: Add a concise description of what this skill does and when to use it]"
---
# $(echo "$SKILL_NAME" | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
[TODO: Brief introduction explaining the skill's purpose]
## Quick Reference
| Situation | Action |
|-----------|--------|
| [Trigger condition] | [What to do] |
## Usage
[TODO: Detailed usage instructions]
## Examples
[TODO: Add concrete examples]
## Source Learning
This skill was extracted from a learning entry.
- Learning ID: [TODO: Add original learning ID]
- Original File: .learnings/LEARNINGS.md
TEMPLATE
log_info "Created: $SKILL_PATH/SKILL.md"
# Suggest next steps
echo ""
log_info "Skill scaffold created successfully!"
echo ""
echo "Next steps:"
echo " 1. Edit $SKILL_PATH/SKILL.md"
echo " 2. Fill in the TODO sections with content from your learning"
echo " 3. Add references/ folder if you have detailed documentation"
echo " 4. Add scripts/ folder if you have executable code"
echo " 5. Update the original learning entry with:"
echo " **Status**: promoted_to_skill"
echo " **Skill-Path**: skills/$SKILL_NAME"
FILE:_meta.json
{
"ownerId": "kn70cjr952qdec1nx70zs6wefn7ynq2t",
"slug": "self-improving-agent",
"version": "3.0.5",
"publishedAt": 1773760428300
}
Design and implement automatic community commenting workflows driven by existing site content, with optional support for automatic posting and multi-account...
---
name: community-operations
version: 1.0.0
description: Design and implement automatic community commenting workflows driven by existing site content, with optional support for automatic posting and multi-account execution. Use when building auto-comment systems that read comics, articles, novels, or videos from a database, generate natural-looking comments from that content, rotate across multiple accounts, schedule tasks, and optionally hand off generated comments to moderation before or after submission.
emoji: 💬
homepage: https://github.com/XavierMary56/OmniPublish
metadata:
openclaw:
requires:
bins:
- python3
---
# Community Operations
## Overview
Use this skill to build or operate **content-driven automatic commenting systems**. The primary use case is: read existing site content from the database, generate comments that match the content type, assign those comments to one or more accounts, and publish them with scheduling, rate limits, and optional moderation handoff.
This skill can later expand into automatic posting, but the first priority is **automatic commenting based on existing content**.
## Core boundary
Keep responsibilities separate:
- **This skill owns**:
- content selection for commenting
- comment generation strategy
- account rotation
- scheduling and task orchestration
- publishing execution
- duplicate suppression
- frequency control
- operation logging
- **This skill does not own**:
- moderation policy itself
- ad/contact-risk rules
- final moderation result semantics
If moderation is required, integrate a dedicated moderation skill or service such as `post-content-moderation`.
## First-priority capability
Prioritize this workflow first:
```text
existing content in database
→ extract content summary
→ generate comment candidates
→ choose account
→ optional moderation
→ submit comment
→ log result
```
Target content sources include:
- comics
- articles
- novels
- videos
## Content-driven automatic commenting
### Supported source types
Build a unified comment-input structure even if the original tables differ.
Recommended normalized fields:
- `content_type`
- `content_id`
- `title`
- `summary`
- `author`
- `tags`
- `category_id`
- `topic_id`
- `published_at`
- `extra`
Example:
```json
{
"content_type": "comic",
"content_id": 123,
"title": "标题",
"summary": "摘要",
"author": "作者",
"tags": ["热血", "校园"],
"extra": {}
}
```
## Comment generation rules
Do not use one generic template for every content type. Generate comments differently based on the source.
### Comics
Prefer comments about:
- art style
- character appeal
- plot progression
- update expectations
- favorite scenes
### Articles
Prefer comments about:
- viewpoint response
- agreement/disagreement
- practical reflection
- follow-up questions
- short discussion prompts
### Novels
Prefer comments about:
- writing style
- pacing
- plot setup
- character development
- expectation for later chapters
### Videos
Prefer comments about:
- pacing
- scene quality
- plot reaction
- rewatch value
- performance or presentation
## Comment style strategy
Generate multiple styles instead of repeating one voice.
Recommended style buckets:
- praise / appreciation
- discussion / opinion
- expectation / follow-up
- question / interaction
- light emotional reaction
Avoid repetitive low-quality outputs like:
- `不错啊`
- `写得真好`
- `支持一下`
- `期待更新`
Those can appear occasionally, but should not dominate the comment pool.
## Comment targeting workflow
When asked to build the system, follow this order:
### Step 1. Identify content sources
Clarify:
- which tables or models provide comics, articles, novels, and videos
- which fields are available for summarization
- whether comments target only published content
- whether comments should exclude very old content or low-quality content
### Step 2. Build a normalized content extractor
Create a common extraction layer that maps different content models into one unified comment input object.
### Step 3. Design comment generation
Define:
- generation prompt or template policy
- content-type-specific styles
- duplicate suppression rules
- per-content comment count limits
- unsafe phrase blacklist
- optional moderation checkpoint
### Step 4. Design account strategy
Define:
- account pool source
- account roles
- per-account comment quota
- cooldown period
- account disable conditions
- fallback account selection
### Step 5. Design execution path
Use this chain:
```text
select target content
→ build normalized content input
→ generate comment candidates
→ filter / deduplicate
→ optional moderation
→ choose account
→ submit comment
→ log result
→ retry or cooldown
```
### Step 6. Add controls
Always add:
- idempotency keys
- timeout handling
- retry limits
- duplicate-comment suppression
- per-account frequency caps
- per-content comment caps
- audit logging
- manual disable switch
## Multi-account rules
If multiple accounts are used, always define:
- account id
- account status
- role
- last action time
- cooldown_until
- daily quota
- hourly quota
- failure count
Do not design a system where all accounts can comment indefinitely with no rotation or no cooldown.
## Anti-duplication rules
Always prevent:
- same account posting the same comment repeatedly
- many accounts posting nearly identical comments in a short window
- too many comments on the same content in a short period
- same template being used on multiple content items without variation
Recommended controls:
- similarity threshold checks
- comment hash / normalized hash
- per-content cooloff window
- random delay window
- template variation rules
## Moderation handoff guidance
If generated comments must be checked, choose one mode:
### Mode A. Review before submit
Use when generated comments must not enter the system unless clean.
### Mode B. Submit then audit
Use when the project already supports post-save review states.
### Mode C. Hybrid
Use when comments get basic local filtering before submit and full moderation after submit.
Do not merge moderation policy directly into this skill. Keep moderation as a dependency.
## Logging requirements
At minimum, log:
- task id
- content_type
- content_id
- account id
- generated comment
- generation strategy or style bucket
- moderation mode used
- execution result
- failure reason
- created_at / updated_at
## Future expansion
This skill may later expand to:
- automatic posting
- posting + commenting orchestration
- campaign scheduling
- multi-account publishing waves
But first build **automatic commenting based on existing content** well.
## Recommended reusable resources
### references/
Store:
- content-extractor template
- comment-style matrix
- account rotation template
- queue design template
- rollout checklist
- project-specific schema plans such as `references/51dm-auto-comment-schema-plan.md`
- project-specific implementation plans such as `references/51dm-auto-comment-implementation-v1.md`
- project-specific task breakdowns such as `references/51dm-auto-comment-task-breakdown-v1.md`
- project-specific reuse plans such as `references/51dm-comment-reuse-plan.md`
- human-like comment guidance such as `references/human-like-comment-strategy.md`
### scripts/
Store:
- sample comment generator
- comment signature examples if needed
- scheduler/runner examples
- account rotation pseudocode
## Default recommendation
For the first version, build only these:
- content extractor
- comment generator
- account selector
- comment submitter
- moderation handoff adapter
- operation logger
Do not start with full posting automation unless the user explicitly prioritizes it.
FILE:references/51dm-auto-comment-implementation-v1.md
# Community Operations - 51dm 自动评论第一版实现设计清单
## 一、第一版目标
构建一套面向 51dm 的自动评论系统,支持:
- 读取现有内容
- 根据内容自动生成更像真人的评论
- 通过多账号轮换发布评论
- 与现有审核流衔接
- 控制频率、去重和基础风控
第一版优先覆盖:
- 社区帖子
- 文章
- 漫画
- 视频
---
## 二、推荐模块清单
## 1. 内容选择模块
职责:
- 从可评论内容池中选择目标内容
- 过滤不适合评论的内容
- 输出统一内容结构
建议能力:
- 只选已发布内容
- 排除评论过多内容
- 排除过旧内容
- 排除近期已自动评论过的内容
- 支持按内容类型分批拉取
输出结构建议:
```json
{
"content_type": "post|article|comic|video",
"content_id": 123,
"title": "标题",
"summary": "摘要",
"author": "作者",
"tags": ["标签"],
"extra": {}
}
```
---
## 2. 内容摘要模块
职责:
- 对长文本做摘要化
- 对不同内容类型提炼评论线索
建议:
- 社区帖子:抽取标题 + 正文前若干句
- 文章:抽取标题 + 摘要段
- 漫画:抽取标题 + 描述 + 标签
- 视频:抽取标题 + desc + actors/tags
要求:
- 不要把整段长文直接交给评论生成器
- 摘要长度受控
- 保留内容特征关键词
---
## 3. 评论生成模块
职责:
- 基于内容类型与摘要生成评论候选
- 输出多条不同风格评论
建议输出:
- 每条内容生成 3~8 条候选评论
- 每条评论标记风格:
- praise
- discussion
- question
- expectation
- emotion
要求:
- 评论必须和内容相关
- 不允许明显模板化
- 不允许批量近似改写
- 不允许联系方式 / 引流词 / 广告口吻
---
## 4. 评论筛选与去重模块
职责:
- 删除低质量候选评论
- 删除相似评论
- 删除过短/过假的评论
建议规则:
- 过滤空泛评论
- 过滤重复句式
- 过滤低长度无信息评论
- 同一内容短时禁止重复近似评论
- 同账号历史评论相似度控制
---
## 5. 账号选择模块
职责:
- 从运营账号池中选择可用账号
- 控制每账号评论频率和评论量
建议规则:
- 每账号有日评论上限
- 每账号有小时评论上限
- 每账号有 cooldown
- 失败账号进入短期冷却
- 不同账号使用不同评论风格权重
建议账号人设:
- 轻共鸣型
- 讨论型
- 催更型
- 情绪反应型
- 安静型
---
## 6. 审核适配模块
职责:
- 将自动生成评论接入现有审核系统
推荐模式:
- 优先使用“生成后先审核,再提交”
- 如果项目链路要求先入库,也可走“提交后自动审核”
第一版建议:
- 社区评论优先接现有自动审核链路
- 文章评论也建议复用现有待审状态
- 漫画 / 视频评论根据现有状态字段做适配
---
## 7. 评论提交模块
职责:
- 将评论写入对应评论表或走现有评论接口
- 记录提交结果
推荐优先级:
- 能复用现有业务接口时,优先走现有接口
- 必须直写表时,要严格复用当前字段结构和状态流
不同内容类型对应目标:
- 社区帖子 → `post_comment`
- 文章 → `contents_comment`
- 漫画 → `book_comments`
- 视频 → `mv_comments`
---
## 8. 任务调度模块
职责:
- 周期性跑自动评论任务
- 控制不同内容类型的执行频率
第一版建议:
- cron 驱动即可
- 按内容类型分开执行
- 每次限制处理条数
- 支持 dry-run 模式
建议任务:
- `auto-comment:post`
- `auto-comment:article`
- `auto-comment:comic`
- `auto-comment:video`
- `auto-comment:all`
---
## 9. 日志与审计模块
至少记录:
- task_id
- content_type
- content_id
- account_aff
- candidate_comment
- final_comment
- moderation_mode
- submit_result
- failure_reason
- created_at
建议独立日志表,不要只依赖业务评论表查问题。
---
## 三、第一版实施顺序
### Phase 1
- 社区帖子自动评论
- 文章自动评论
- 基础评论生成
- 基础账号轮换
- 基础审核衔接
### Phase 2
- 漫画自动评论
- 视频自动评论
- 更精细的人设与评论风格
- 更严格的去重和相似度控制
### Phase 3
- 小说自动评论
- 自动发帖能力
- 更复杂的运营策略与调度
---
## 四、第一版不建议做的内容
先不要上来就做:
- 复杂对话链评论
- 智能养号
- 自动发帖联动
- 跨平台多端发布
- 激进的高频评论策略
第一版优先确保:
- 评论像人
- 评论和内容相关
- 不重复
- 不违规
- 不把账号跑废
FILE:references/51dm-auto-comment-schema-plan.md
# Community Operations - 51dm 自动评论表结构与字段规划
## 1. 目标
基于 51dm 现有内容表结构,规划自动评论系统第一版应读取哪些表、哪些字段,以及评论结果应落到哪些表。
第一版目标:
- 针对站内已有内容自动生成评论
- 优先覆盖:漫画 / 文章 / 视频 / 社区帖子
- 小说能力先预留,待确认实际内容表后接入
- 评论仍走项目原有评论表与审核流
---
## 2. 已确认的主要内容表
## 2.1 漫画内容
表 / Model:
- `books` / `BooksModel`
建议读取字段:
- `id`
- `category_id`
- `category_title`
- `name`
- `description`
- `author`
- `tags`
- `chapter_count`
- `comment_count`
- `view_count`
- `rating`
- `status`
- `is_end`
- `type`
- `created_at`
- `updated_at`
推荐评论输入映射:
- `content_type` = `comic`
- `content_id` = `id`
- `title` = `name`
- `summary` = `description`
- `author` = `author`
- `tags` = `tags`
- `category_id` = `category_id`
- `extra.chapter_count` = `chapter_count`
- `extra.is_end` = `is_end`
---
## 2.2 文章内容
表 / Model:
- `contents` / `ContentsModel`
建议读取字段:
- `id`
- `title`
- `text`
- `aff`
- `tags`
- `type`
- `coins`
- `status`
- `comment_num`
- `view_num`
- `like_num`
- `favorite_num`
- `created_at`
- `updated_at`
推荐评论输入映射:
- `content_type` = `article`
- `content_id` = `id`
- `title` = `title`
- `summary` = `text` 的摘要截断
- `author` = `aff`
- `tags` = `tags`
- `extra.type` = `type`
说明:
- `text` 可能较长,建议先做摘要化,不要直接整段喂给评论生成器。
---
## 2.3 视频内容
表 / Model:
- `mv` / `MvModel`
建议读取字段:
- `id`
- `title`
- `second_title`
- `category_id`
- `category_title`
- `actors`
- `tags`
- `duration`
- `desc`
- `status`
- `count_comment`
- `count_play`
- `count_like`
- `created_at`
- `updated_at`
推荐评论输入映射:
- `content_type` = `video`
- `content_id` = `id`
- `title` = `title`
- `summary` = `desc`
- `tags` = `tags`
- `category_id` = `category_id`
- `extra.second_title` = `second_title`
- `extra.actors` = `actors`
- `extra.duration` = `duration`
---
## 2.4 社区帖子
表 / Model:
- `post` / `PostModel`
建议读取字段:
- `id`
- `topic_id`
- `topics`
- `title`
- `content`
- `aff`
- `status`
- `type`
- `post_type`
- `comment_num`
- `like_num`
- `favorite_num`
- `created_at`
- `updated_at`
推荐评论输入映射:
- `content_type` = `post`
- `content_id` = `id`
- `title` = `title`
- `summary` = `content` 摘要
- `author` = `aff`
- `tags` = `topics`
- `topic_id` = `topic_id`
- `extra.post_type` = `post_type`
---
## 2.5 小说内容
当前已见模型:
- `MemberNovelModel`
但从当前已读信息里,还没有确认它是否就是“可公开评论的小说主内容表”。
因此第一版建议:
- 先把小说作为可扩展内容类型保留
- 等确认小说主表、章节表、评论表后再正式接入
---
## 3. 已确认的评论表
## 3.1 漫画评论
表 / Model:
- `book_comments` / `BookCommentsModel`
字段:
- `id`
- `book_id`
- `aff`
- `parent_id`
- `content`
- `reply_count`
- `like_count`
- `status`
- `created_at`
- `updated_at`
说明:
- `status=0` 未审核
- `status=1` 已审核
---
## 3.2 视频评论
表 / Model:
- `mv_comments` / `MvCommentsModel`
字段:
- `id`
- `mv_id`
- `aff`
- `parent_id`
- `content`
- `reply_count`
- `like_count`
- `status`
- `created_at`
- `updated_at`
说明:
- `status=0` 未审核
- `status=1` 已审核
---
## 3.3 文章评论
表 / Model:
- `contents_comment` / `ContentsCommentModel`
字段:
- `id`
- `cid`
- `pid`
- `aff`
- `comment`
- `status`
- `refuse_reason`
- `is_top`
- `is_finished`
- `like_num`
- `created_at`
- `updated_at`
说明:
- `status=0` 待审核
- `status=1` 已通过
- `status=2` 已拒绝
---
## 3.4 社区评论
表 / Model:
- `post_comment` / `PostCommentModel`
字段建议(基于现有项目链路):
- `id`
- `post_id`
- `pid`
- `aff`
- `comment`
- `status`
- `refuse_reason`
- `is_top`
- `is_finished`
- `like_num`
- `created_at`
- `updated_at`
说明:
- 当前社区评论已有待审核 / 通过 / 拒绝状态流
- 适合直接复用现有自动审核链路
---
## 4. 账号来源建议
自动评论账号建议优先复用:
- `members` / `MemberModel`
建议至少读取:
- `aff`
- `nickname`
- `status`(如有)
- `lastvisit` / 最近活跃字段(如有)
- `vip_level`(如有)
- `expired_at`(如有)
- `oauth_type` / `oauth_id`(如业务链路需要)
自动评论系统建议额外维护一张独立的“运营账号配置表”,不要直接在 `members` 里硬塞运营状态。
建议独立配置字段:
- `aff`
- `role`
- `enabled`
- `cooldown_until`
- `daily_comment_limit`
- `hourly_comment_limit`
- `last_comment_at`
- `risk_score`
---
## 5. 第一版建议覆盖范围
优先级建议:
### 第一批接入
1. 社区帖子评论(`post` → `post_comment`)
2. 文章评论(`contents` → `contents_comment`)
3. 漫画评论(`books` → `book_comments`)
4. 视频评论(`mv` → `mv_comments`)
### 第二批接入
5. 小说评论(待确认实际主表 / 评论表后接入)
原因:
- 当前这四类内容和评论表结构已较明确
- 已存在现成评论模型和基础审核逻辑
- 更适合做第一版自动评论落地
FILE:references/51dm-auto-comment-task-breakdown-v1.md
# Community Operations - 51dm 自动评论第一版任务拆分
## 1. 目标
将 51dm 自动评论第一版拆成可执行的研发任务,优先覆盖:
- 社区帖子自动评论
- contents 图文内容自动评论
漫画与视频评论作为第二阶段。
---
## 2. 第一版范围
### 本期纳入
- 社区帖子自动评论
- contents 内容自动评论
- 评论生成
- 基础去重
- 账号轮换
- 审核衔接
- 定时触发
- 日志记录
### 本期不纳入
- 自动发帖
- 漫画自动评论
- 视频自动评论
- article 评论
- 评论对话链
- 智能养号
- 复杂 campaign 调度
---
## 3. 模块任务拆分
## 任务 A:目标内容选择器
### A1. 社区帖子候选选择
目标:
- 从 `post` 中选出可评论内容
建议过滤条件:
- 已发布
- 未删除
- 评论数未过高
- 近期未被自动评论过
- 内容文本长度满足最低要求
输出:
- 统一内容结构
### A2. contents 候选选择
目标:
- 从 `contents` 中选出可评论内容
建议过滤条件:
- `status=OK`
- 评论数未过高
- 文本存在
- 近期未被自动评论过
输出:
- 统一内容结构
---
## 任务 B:内容摘要器
### B1. 社区帖子摘要
- 读取 `title + content`
- 对正文做长度截断
- 提取关键词
### B2. contents 摘要
- 读取 `title + text + tags`
- 对 `text` 做摘要化
- 保留标签与风格信息
输出:
```json
{
"content_type": "post|article",
"content_id": 123,
"title": "标题",
"summary": "摘要",
"tags": [],
"extra": {}
}
```
---
## 任务 C:评论生成器
### C1. 风格设计
至少支持:
- praise
- discussion
- question
- expectation
- emotion
### C2. 社区帖子评论生成
- 基于标题/正文生成 3~5 条候选评论
- 偏向轻共鸣、讨论、提问
### C3. contents 评论生成
- 基于标题/摘要/标签生成 3~5 条候选评论
- 偏向观点反馈、体验反馈、轻讨论
### C4. 质量约束
- 评论必须和内容相关
- 评论长度自然分布
- 不允许明显模板化
- 不允许联系方式、引流词、广告词
---
## 任务 D:评论去重与筛选器
### D1. 候选评论去重
- 同一内容下近似评论去重
- 同一任务批次内去重
### D2. 历史评论相似度控制
- 同账号历史评论相似度控制
- 同内容历史自动评论相似度控制
### D3. 低质量评论过滤
过滤:
- 过短无意义评论
- 空泛评论
- 过度重复句式
---
## 任务 E:账号池与账号选择器
### E1. 运营账号配置
建议新增独立配置来源,至少管理:
- aff
- enabled
- role
- daily_limit
- hourly_limit
- cooldown_until
- last_comment_at
### E2. 账号选择规则
- 轮换选择账号
- 过滤冷却中账号
- 过滤达到限额账号
- 支持按内容类型匹配账号风格
### E3. 账号风格映射
建议角色:
- 轻共鸣型
- 讨论型
- 安静型
- 提问型
---
## 任务 F:审核适配
### F1. 社区评论审核衔接
- 复用 `CommunityService` 现有评论链路
- 评论进入现有待审/审核流
### F2. contents 评论审核衔接
- 复用 `ContentsService::createComment()`
- 评论进入现有待审/审核流
### F3. 生成后预审核(可选)
如需前置保护,可在生成后、提交前调用 moderation。
---
## 任务 G:评论提交器
### G1. 社区帖子评论提交
复用:
- `CommunityService::createPostComment()`
### G2. 社区回复评论提交(第二步再做)
复用:
- `CommunityService::createComComment()`
### G3. contents 评论提交
复用:
- `ContentsService::createComment()`
要求:
- 不第一版直写评论表
- 统一返回提交结果
- 统一记录失败原因
---
## 任务 H:日志与审计
建议新增自动评论日志表或统一日志模型,至少记录:
- task_id
- content_type
- content_id
- account_aff
- generated_comment
- final_comment
- strategy
- submit_result
- failure_reason
- created_at
---
## 任务 I:定时任务 / 命令入口
### I1. 社区自动评论命令
例如:
```bash
php cli auto-comment:post 50
```
### I2. contents 自动评论命令
例如:
```bash
php cli auto-comment:contents 50
```
### I3. 汇总命令
例如:
```bash
php cli auto-comment:all 100
```
### I4. 定时任务建议
- 社区评论:5~10 分钟
- contents 评论:10~15 分钟
---
## 4. 实施阶段建议
## Phase 1
- A1 / A2 目标选择器
- B1 / B2 摘要器
- C1~C4 评论生成器
- D1~D3 去重筛选
- E1 / E2 账号池与选择
## Phase 2
- F1 / F2 审核适配
- G1 / G3 评论提交器
- H 日志记录
## Phase 3
- I1~I4 命令与定时任务
- 观察效果
- 调整评论风格、频率、去重策略
---
## 5. 第一版交付标准
满足以下条件即视为第一版可落地:
- 能从社区和 contents 中选出可评论内容
- 能生成多条与内容相关的评论候选
- 能做基础去重和低质量过滤
- 能按账号池轮换评论
- 能复用现有业务逻辑提交评论
- 能进入现有审核流
- 能记录日志
- 能通过 CLI / cron 周期执行
FILE:references/51dm-comment-reuse-plan.md
# Community Operations - 51dm 各内容类型评论复用方案
## 1. 目标
明确 51dm 在做自动评论时,不同内容类型应优先复用哪一段现有业务逻辑,尽量避免第一版直接写评论表。
原则:
- 优先复用现有业务创建逻辑
- 优先复用现有审核流
- 优先复用现有状态与计数维护
- 尽量不要第一版直写评论表
---
## 2. 社区评论
### 内容链路
- 内容表:`post`
- 评论表:`post_comment`
### 推荐复用点
文件:
- `application/modules/Api/controllers/Community.php`
- `application/library/service/CommunityService.php`
核心方法:
- `CommunityService::createPostComment($member, $id, $content, $cityname)`
- `CommunityService::createComComment($member, $commentId, $content, $cityname)`
### 推荐方案
第一版自动评论直接复用 `CommunityService` 的创建逻辑,不要直接写 `post_comment`。
### 原因
- 已有评论待审核状态
- 已有回复评论逻辑
- 已有现成审核链路
- 已有状态与计数维护语义
- 与当前 post-content-moderation 改造后的链路最一致
### 结论
优先级:**最高**
---
## 3. 图文/文章内容评论(contents)
### 内容链路
- 内容表:`contents`
- 评论表:`contents_comment`
### 推荐复用点
文件:
- `application/modules/Api/controllers/Contents.php`
- `application/library/service/ContentsService.php`
核心方法:
- `ContentsService::createComment($member, $cid, $commentId, $content, $cityname)`
### 推荐方案
第一版自动评论优先复用 `ContentsService::createComment()`,不要直接写 `contents_comment`。
### 原因
- 已有长度限制
- 已有频率控制
- 已有评论次数限制
- 已有关键词/URL/字符过滤
- 已有待审核状态流
- 已有拒绝原因字段
### 结论
优先级:**最高**
---
## 4. 漫画评论
### 内容链路
- 内容表:`books`
- 评论表:`book_comments`
### 当前现有入口
文件:
- `application/modules/Api/controllers/Book.php`
核心方法:
- `BookController::commentAction()`
- `BookController::commentReplyAction()`
### 当前问题
漫画评论创建逻辑目前主要写在 controller 中,还没有单独沉到 service 层。
### 推荐方案
第一版不要直写 `book_comments`。
建议先抽一个轻量 service,例如:
- `BookCommentService`
抽出的逻辑至少包括:
- 校验书籍存在
- 校验评论权限
- 创建评论
- 维护 `books.comment_count`
### 原因
如果自动评论直接自己写表,会复制 controller 逻辑,后续容易分叉。
### 结论
优先级:**中高**
建议:**先抽 service,再接自动评论**
---
## 5. 视频评论
### 内容链路
- 内容表:`mv`
- 评论表:`mv_comments`
### 当前现有入口
文件:
- `application/modules/Api/controllers/Mv.php`
核心方法:
- `MvController::commentAction()`
- `MvController::commentReplyAction()`
### 当前问题
视频评论逻辑也主要在 controller 中,且比漫画多一层事件埋点逻辑。
### 推荐方案
第一版不要直写 `mv_comments`。
建议先抽一个轻量 service,例如:
- `MvCommentService`
抽出的逻辑至少包括:
- 校验视频存在
- 校验评论权限
- 创建评论
- 维护 `mv.count_comment`
- 可选保留事件埋点
### 原因
直写表会绕过现有业务逻辑和埋点逻辑。
### 结论
优先级:**中高**
建议:**先抽 service,再接自动评论**
---
## 6. article 文章评论
### 内容链路
- 内容表:`article`
- 评论表:`comment`
### 当前现有入口
文件:
- `application/modules/Api/controllers/Article.php`
核心方法:
- `ArticleController::commentAction()`
### 当前特点
这条链路额外依赖:
- VIP 校验
- 文章状态校验
- 订阅校验
- 购买校验
### 推荐方案
第一版不优先接。
如果后续要接,也建议先抽 service,再评估自动评论账号如何满足这些权限前置条件。
### 结论
优先级:**较低**
建议:**延后接入**
---
## 7. 第一版推荐接入顺序
### 第一梯队
1. 社区评论(复用 `CommunityService`)
2. 图文/文章内容评论(复用 `ContentsService`)
### 第二梯队
3. 漫画评论(先抽 `BookCommentService`)
4. 视频评论(先抽 `MvCommentService`)
### 第三梯队
5. article 评论(后补)
---
## 8. 研发落地原则
### 直接复用
适用于:
- 社区评论
- contents 评论
### 先抽 service 再复用
适用于:
- 漫画评论
- 视频评论
- article 评论
### 不建议第一版使用
- 自动任务直接写评论表
- 自动任务直接复制 controller 内部逻辑
- 自动任务自己补 comment_count 之类的计数字段
---
## 9. 一句话结论
在 51dm 中做自动评论时:
- **社区 / contents 优先直接复用现有 service**
- **漫画 / 视频先轻量抽 service,再接自动评论**
- **article 评论延后接入**
- **第一版不建议直写评论表**
FILE:references/architecture-template.md
# Community Operations - 架构模板
## 1. 业务目标
- 自动发帖 / 自动评论 / 多账号运营 / 混合
## 2. 账号模型
- account_id
- platform
- role
- status
- cooldown_until
- daily_limit
- hourly_limit
- last_action_at
## 3. 内容模型
- draft_id
- content_type
- title
- body
- medias
- topic_id
- tags
- uniqueness_key
## 4. 任务流
```text
task source
→ select content
→ select account
→ optional moderation
→ publish/comment
→ log result
→ retry/cooldown
```
## 5. 必备控制
- 幂等
- 超时
- 重试上限
- 频控
- 去重
- 审核衔接
- 审计日志
FILE:references/auto-comment-module-design.md
# Community Operations - 自动评论模块设计
## 1. 目标
基于站内已有内容自动生成评论并发布评论。
支持内容类型:
- 漫画
- 文章
- 小说
- 视频
---
## 2. 模块拆分
### A. 内容提取器
负责:
- 从不同内容表读取数据
- 抽取评论所需字段
- 统一输出标准结构
### B. 评论生成器
负责:
- 根据内容类型生成评论
- 输出多个候选评论
- 按风格分类(夸赞 / 讨论 / 提问 / 期待)
### C. 评论策略器
负责:
- 判断该内容是否需要评论
- 决定评论数量
- 决定评论时间
- 决定评论风格
### D. 账号选择器
负责:
- 从账号池选择账号
- 校验账号冷却状态
- 控制账号评论频率
- 失败后切换账号
### E. 评论执行器
负责:
- 调用评论接口
- 记录返回结果
- 失败重试
- 写执行日志
### F. 审核适配层
负责:
- 生成评论后的审核调用
- 发前审或发后审
- 风险评论丢弃或改写
---
## 3. 推荐统一输入结构
```json
{
"content_type": "comic|article|novel|video",
"content_id": 123,
"title": "标题",
"summary": "摘要",
"author": "作者",
"tags": ["标签1", "标签2"],
"category_id": 1,
"topic_id": 2,
"extra": {}
}
```
---
## 4. 内容类型与评论风格
### 漫画
- 画风反馈
- 角色反馈
- 剧情反馈
- 催更
- 对桥段的情绪反应
### 文章
- 观点回应
- 经验补充
- 轻讨论
- 提问互动
- 表达认同或保留意见
### 小说
- 文笔反馈
- 剧情推进
- 人物反馈
- 后续期待
- 章节后续猜测
### 视频
- 节奏反馈
- 片段反应
- 二刷意愿
- 内容评价
- 观感或情绪反应
---
## 5. 更像真人的生成要求
### 评论必须和内容有关系
生成评论时至少挂住一个具体点:
- 标题关键词
- 摘要/简介信息
- 标签
- 风格
- 剧情点
- 角色或作者信息
### 评论风格要分层
建议至少支持:
- 夸赞型
- 讨论型
- 提问型
- 期待型
- 情绪反应型
### 账号语气要有区分
建议至少设计几类账号人设:
- 轻共鸣型
- 讨论型
- 催更型
- 情绪反应型
- 安静型
### 长度要自然分布
建议:
- 少量超短评论
- 多数正常一句评论
- 部分两句展开评论
- 极少量偏长评论
### 避免明显机器感
尽量不要大量使用:
- 不错啊
- 真好看
- 支持一下
- 写得真棒
- 期待更新
这些句子可以偶尔出现,但不能成为主评论池。
---
## 6. 风控点
- 避免固定模板重复
- 避免同账号短时高频评论
- 避免同内容短时多条相似评论
- 避免生成联系方式、引流词、广告词
- 必要时接入 `post-content-moderation`
---
## 7. 第一版建议范围
先做:
- 内容提取
- 评论生成
- 账号轮换
- 评论投放
- 审核衔接
- 日志记录
先不做:
- 自动发帖
- 评论对话链
- 复杂 campaign 调度
- 复杂养号逻辑
FILE:references/human-like-comment-strategy.md
# Community Operations - 更像真人的评论生成策略
## 1. 目标
自动评论不是为了“看起来很多评论”,而是为了:
- 评论和内容有关
- 评论之间不像同一个人写的
- 评论不像固定模板批量灌水
- 评论不容易触发平台或业务方的低质/机器感判断
---
## 2. 先避免最假的 5 种评论
以下模式应尽量少用或禁用:
1. 纯空泛夸赞
- `不错啊`
- `真好看`
- `支持一下`
- `写得真棒`
2. 所有内容都用同一种语气
3. 评论和正文/标题没有关联
4. 同一批账号使用高度相似句式
5. 同一条内容短时间出现多个近似评论
---
## 3. 评论生成必须依赖内容特征
至少从以下信息中提取评论依据:
- 标题
- 摘要
- 标签
- 类别
- 作者
- 剧情/简介
- 视频标题/亮点
不要在缺乏内容特征时硬生成长评论。
如果素材不足,宁可生成一条短而自然的评论,也不要生成长而空的评论。
---
## 4. 按内容类型区分评论风格
### 漫画
适合:
- 角色反馈
- 剧情反馈
- 画风反馈
- 催更
- 对某个桥段的情绪反应
### 文章
适合:
- 观点回应
- 经验补充
- 轻讨论
- 提问互动
- 表达认同或保留意见
### 小说
适合:
- 文笔感受
- 剧情推进感受
- 人设反馈
- 章节期待
- 后续猜测
### 视频
适合:
- 节奏反馈
- 内容反应
- 片段印象
- 二刷意愿
- 轻度推荐语气
---
## 5. 按账号人设区分说话方式
建议账号至少分成几类:
### 轻共鸣型
- 说话短
- 表达真实反应
- 少用夸张词
### 讨论型
- 更容易提问
- 更容易补充观点
- 比较像愿意聊天的人
### 催更型
- 适合漫画、小说
- 关注后续剧情
- 会表达“想看后面”
### 情绪反应型
- 适合视频、剧情类内容
- 用简洁情绪表达
- 不要太戏精化
### 安静型
- 评论更短
- 出场频率低
- 不抢戏
不要让所有账号都用相同的情绪强度和句式长度。
---
## 6. 长度策略
建议自然分布:
- 20%:超短评论
- 50%:正常一句评论
- 25%:两句展开评论
- 5%:偏长的讨论型评论
如果长评论比例太高,会很假。
如果全是超短评论,也会像机器刷量。
---
## 7. 句式变化策略
生成时尽量变化:
- 感叹句
- 陈述句
- 提问句
- 先感受后评价
- 先评价后提问
例如不要连续出现:
- `这个真的不错`
- `这个确实不错`
- `这个看着不错`
这种会非常像模板变体。
---
## 8. 内容关联策略
评论最好能挂住内容中的一个具体点:
- 题材
- 人设
- 节奏
- 剧情点
- 关键词
- 风格感受
如果评论完全脱离内容,只是泛泛而谈,就算语气自然也还是假。
---
## 9. 去重与相似度控制
至少控制:
- 同一内容下近似评论不能连续出现
- 同账号不能重复发相似句式
- 多账号不能在短时间内复制轻微改写版评论
建议使用:
- 归一化后 hash 去重
- 相似度阈值控制
- 同一内容冷却窗口
- 评论池抽样轮换
---
## 10. 审核衔接
在“更像人”的同时,必须避免:
- 联系方式
- 引流词
- 广告口吻
- 明显灌水词
- 过度营销式夸张文案
如果接了 `post-content-moderation`,推荐:
- 生成后先审核,再提交
- 命中风险则改写或丢弃
---
## 11. 第一版建议
第一版先做到:
- 内容相关
- 语气多样
- 账号风格有区分
- 长度分布自然
- 不重复
- 不违规
不要第一版就追求“特别会聊”。
先做到“不像机器人”,已经很有价值。
FILE:references/operation-checklist.md
# Community Operations - Checklist
## 接入前
- [ ] 明确自动发帖还是自动评论还是两者都要
- [ ] 明确账号池来源
- [ ] 明确频控规则
- [ ] 明确审核模式
- [ ] 明确是否需要排队执行
## 设计时
- [ ] 有账号状态字段
- [ ] 有任务状态字段
- [ ] 有失败重试边界
- [ ] 有日志落点
- [ ] 有内容去重策略
- [ ] 有每账号限额
## 上线前
- [ ] 已验证单账号流程
- [ ] 已验证多账号轮换
- [ ] 已验证限频
- [ ] 已验证审核衔接
- [ ] 已验证失败重试
- [ ] 已验证禁用账号不会继续执行
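End-to-end multi-platform posting workbench (single-machine edition). One-command deployment, startup, and operation of OmniPublish V2.0 (FastAPI + Vue 3 + SQLite) with a six-step pipeline: asset selection → AI copywriting → image processing → cover creation → watermarking → multi-platform publishing.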
---
name: omnipublish
version: 2.0.0
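description: "End-to-end multi-platform posting workbench (single-machine edition). One-command deployment, startup, and operation of OmniPublish V2.0 (FastAPI + Vue 3 + SQLite) with a six-step pipeline: asset selection → AI copywriting → image processing → cover creation → watermarking → multi-platform publishing."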
emoji: 🚀
user-invocable: true
homepage: https://github.com/XavierMary56/OmniPublish
metadata:
openclaw:
requires:
bins:
- python3
- node
- npm
anyBins:
- ffmpeg
install:
- kind: brew
package: [email protected]
- kind: brew
package: node
- kind: brew
package: ffmpeg
---
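# OmniPublish V2.0: Single-Machine Workbench
An end-to-end multi-platform posting workbench. The six-step pipeline advances automatically, distributes to multiple platforms in parallel in a single pass, and runs on one machine with zero-configuration SQLite.
---
## Run immediately on activation
When the user invokes /omnipublish, **immediately run the following status check and startup sequence**: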
```bash
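# 1. Check whether the service is running
curl -sf http://127.0.0.1:9527/api/ping
# 2. If it is not running, start it with launcher.py (handles venv / frontend build / config automatically)
python "$SKILL_DIR/launcher.py" start
# 3. Open in the browser (Windows)
start http://127.0.0.1:9527
# macOS:
# open http://127.0.0.1:9527
# Linux:
# xdg-open http://127.0.0.1:9527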
```
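The check-then-start sequence above ends with waiting for `/api/ping` to answer. A minimal Python sketch of such a readiness poll follows. It relies only on the `/api/ping` endpoint named in this document; the helper names `ping` and `wait_until_ready`, the attempt count, and the injectable `probe` and `sleep` parameters are illustrative assumptions, not taken from `launcher.py`.

```python
import time
import urllib.request
from typing import Callable

PING_URL = "http://127.0.0.1:9527/api/ping"  # endpoint documented above


def ping(url: str = PING_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL answers without raising an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except Exception:
        return False


def wait_until_ready(
    probe: Callable[[], bool] = ping,
    attempts: int = 30,          # assumed value, tune to the machine
    delay: float = 1.0,          # seconds between attempts (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:     # no need to sleep after the last attempt
            sleep(delay)
    return False
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server.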
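Once the service is ready, tell the user:
- URL: http://127.0.0.1:9527
- Default account: admin / admin123
---
## Project location
Skill directory: `$SKILL_DIR` (`.claude/skills/omnipublish/`)
Project root: three levels above `$SKILL_DIR` (derived automatically by `launcher.py`)
What launcher.py does:
1. Creates the Python venv (first run only)
2. Installs dependencies with pip install
3. Builds the frontend with npm (first run only, about 1-2 minutes)
4. Starts `python backend/main.py` in the background
5. Waits for `/api/ping` to become ready
---
## Common commands (run with the Bash tool)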
```bash
# Start
python "$SKILL_DIR/launcher.py" start
# Stop
python "$SKILL_DIR/launcher.py" stop
# Restart
python "$SKILL_DIR/launcher.py" restart
# Status
python "$SKILL_DIR/launcher.py" status
# View logs
tail -f "$SKILL_DIR/../../../logs/server.log"
```
---
## Local API (called with the WebFetch tool)
Base URL: `http://127.0.0.1:9527`
```
GET  /api/ping            Health check → {"ok":true}
POST /api/auth/login      Log in {username, password}
GET  /api/tasks           Task list
GET  /api/platforms       Business line list
GET  /api/stats/overview  Statistics
GET  /docs                Swagger interactive docs
```
Login example:
```bash
curl -X POST http://127.0.0.1:9527/api/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"admin123"}'
```
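The same login call can be prepared from Python with only the standard library. This is a sketch under stated assumptions: `build_login_request` and `login` are hypothetical helpers, and because the response format is not described here, the code returns the raw response text rather than parsing a token.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9527"  # base URL documented above


def build_login_request(
    username: str, password: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /api/auth/login request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(username: str, password: str) -> str:
    """Send the request and return the raw response text (format not assumed)."""
    request = build_login_request(username, password)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Separating request construction from sending makes the payload easy to inspect before anything goes over the wire.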
---
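## Tech stack
| Layer | Technology |
|---|------|
| Backend | FastAPI (Python 3.10+) |
| Database | SQLite (aiosqlite, data/omnipub.db) |
| Frontend | Vue 3 + TypeScript + Vite (built automatically on first run) |
| Real-time push | WebSocket (built into FastAPI) |
| Image processing | Pillow + YOLOv8 |
| Video processing | FFmpeg (must be installed locally) |
| AI copywriting | OpenAI / compatible API |
---
## Directory structure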
```
OmniPublish/                      ← project root ($SKILL_DIR/../../../)
├── .claude/skills/omnipublish/
│   ├── SKILL.md                  ← this file
│   └── launcher.py               ← adaptive launcher
├── config.json                   ← runtime config (created automatically on first run)
├── data/omnipub.db               ← SQLite database (created automatically)
├── logs/server.log               ← service log
├── backend/                      ← FastAPI backend
│   ├── main.py
│   ├── database.py               ← SQLite + asyncpg compatibility layer
│   ├── routers/
│   └── services/
└── frontend/
    ├── src/                      ← Vue source
    └── dist/                     ← build output (generated automatically on first run)
```
---
## Pipeline steps
| Step | currentStep | Backend route |
|------|-------------|---------|
| Step 1: Assets & platforms | 0 | POST /api/pipeline |
| Step 2: AI copywriting | 1 | POST /step/2/generate |
| Step 3: Image renaming | 2 | PUT /step/3/confirm |
| Step 4: Cover creation | 3 | POST /step/4/generate |
| Step 5: Watermarking | 4 | GET /step/5/plan, PUT /step/5/confirm |
| Step 6: Upload & publish | 5 | POST /step/6/publish |
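The table pairs a 1-based step number with a 0-based `currentStep` value. A small sketch of that mapping as data follows. `PipelineStep`, `PIPELINE_STEPS`, and `routes_for` are illustrative names, and the routes are copied verbatim from the table, including their differing prefixes.

```python
from typing import NamedTuple


class PipelineStep(NamedTuple):
    step: int                 # 1-based step number shown in the table
    current_step: int         # 0-based currentStep value
    routes: tuple[str, ...]   # backend routes exactly as listed


PIPELINE_STEPS = (
    PipelineStep(1, 0, ("POST /api/pipeline",)),
    PipelineStep(2, 1, ("POST /step/2/generate",)),
    PipelineStep(3, 2, ("PUT /step/3/confirm",)),
    PipelineStep(4, 3, ("POST /step/4/generate",)),
    PipelineStep(5, 4, ("GET /step/5/plan", "PUT /step/5/confirm")),
    PipelineStep(6, 5, ("POST /step/6/publish",)),
)


def routes_for(current_step: int) -> tuple[str, ...]:
    """Look up the backend routes for a 0-based currentStep value."""
    for entry in PIPELINE_STEPS:
        if entry.current_step == current_step:
            return entry.routes
    raise ValueError(f"unknown currentStep: {current_step}")
```

Keeping both numbers in one record avoids off-by-one mistakes when translating between the UI step and `currentStep`.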
---
## Configuration file
`config.json` (created automatically from `config.json.example`):
```jsonc
{
"api_key": "sk-xxx", // LLM API key (for AI copywriting; optional)
"api_base": "https://api.openai.com",
"server": { "port": 9527, "auth_token": "change-me" },
"crypto": { "appkey": "", "aes_key": "", "aes_iv": "" }
}
```
After editing: `python "$SKILL_DIR/launcher.py" restart`
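A sketch of reading the port from `config.json`, assuming the created file is plain JSON (the `//` comment in the example above is treated as annotation only) and falling back to 9527 when the file or key is absent. `read_port` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path

DEFAULT_PORT = 9527  # port used throughout this document


def read_port(config_path: Path) -> int:
    """Return server.port from config.json, or DEFAULT_PORT if it is missing."""
    if not config_path.exists():
        return DEFAULT_PORT
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return int(config.get("server", {}).get("port", DEFAULT_PORT))
```

For example, `read_port(Path("config.json"))` run from the project root reads the value that the `server.port` setting controls.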
---
## Debug commands
```bash
# Tail the live log
tail -f "$SKILL_DIR/../../../logs/server.log"
# Inspect the database
sqlite3 "$SKILL_DIR/../../../data/omnipub.db" ".tables"
# Check FFmpeg
ffmpeg -version
# Reset the admin password
cd "$SKILL_DIR/../../../backend"
python -c "
import asyncio, bcrypt
from database import get_pool
async def reset():
    pool = await get_pool()
    async with pool.acquire() as conn:
        h = bcrypt.hashpw(b'newpass123', bcrypt.gensalt()).decode()
        await conn.execute(\"UPDATE users SET password=? WHERE username='admin'\", h)
    print('Password reset to newpass123')
asyncio.run(reset())
"
```
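When the `sqlite3` command-line tool is not installed, the `.tables` check can be approximated with Python's built-in `sqlite3` module. A sketch, where `list_tables` is an illustrative helper and the database path is supplied by the caller:

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of user tables in a SQLite database file, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Note that `sqlite3.connect` creates an empty file if the path does not exist, so check the path first when diagnosing a missing database.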
---
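## Development conventions
- Python: snake_case functions, PascalCase classes, async/await for I/O, asyncio.to_thread() for long-running operations
- Vue: TypeScript strict mode, all state in Pinia, going back a step does not clear data, call saveDraft() after changes
---
## Common issues
| Issue | Troubleshooting |
|------|------|
| Watermarking fails | Run `ffmpeg -version` to check that FFmpeg is installed |
| AI copywriting errors | Check api_key and api_base in config.json |
| Port already in use | Change server.port in config.json, then restart |
| Blank frontend | `python launcher.py restart` (rebuilds) |
| Corrupted database | Restore from a backup with `cp data/omnipub_backup.db data/omnipub.db` (backups are made with `cp data/omnipub.db data/omnipub_backup.db`) |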
FILE:launcher.py
#!/usr/bin/env python3
"""OmniPublish V2.0 skill launcher.

Derives the project root from the skill directory, runs the full first-time
setup, and starts the service in the background.

Usage:
    python launcher.py          # start (skipped if already running)
    python launcher.py stop     # stop
    python launcher.py status   # status check
    python launcher.py restart  # restart
"""
import os
import sys
import subprocess
import time
import shutil
import signal
from pathlib import Path

# ── Path derivation (works on any machine) ──
# Skill path: {project}/.claude/skills/OmniPublishv2.0/launcher.py
SKILL_DIR = Path(__file__).resolve().parent
PROJECT_DIR = SKILL_DIR.parent.parent.parent  # three levels up = project root
BACKEND_DIR = PROJECT_DIR / "backend"
FRONTEND_DIR = PROJECT_DIR / "frontend"
VENV_DIR = PROJECT_DIR / "venv"
LOG_DIR = PROJECT_DIR / "logs"
LOG_FILE = LOG_DIR / "server.log"
PID_FILE = LOG_DIR / "server.pid"
PORT = 9527  # hard-coded here; changing server.port in config.json does not update it

# Platform detection
IS_WIN = sys.platform == "win32"
PYTHON = VENV_DIR / ("Scripts/python.exe" if IS_WIN else "bin/python3")
PIP = VENV_DIR / ("Scripts/pip.exe" if IS_WIN else "bin/pip")
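# ══════════════════════════════════════
# Utility functions
# ══════════════════════════════════════
def check_server() -> bool:
    """Check whether the service is running."""
    import urllib.request
    try:
        urllib.request.urlopen(f"http://127.0.0.1:{PORT}/api/ping", timeout=2)
        return True
    except Exception:
        return False


def read_pid() -> "int | None":
    """Read the PID file; return None if it is missing or unreadable."""
    # The annotation is quoted so the module also loads on Python < 3.10.
    if PID_FILE.exists():
        try:
            return int(PID_FILE.read_text().strip())
        except Exception:
            pass
    return None


def is_pid_alive(pid: int) -> bool:
    """Probe the process with signal 0 (a POSIX liveness check)."""
    # Caution: on Windows, os.kill() treats 0 as CTRL_C_EVENT, so this probe
    # may not behave as a pure liveness check there.
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False


def run(cmd: list, **kwargs) -> subprocess.CompletedProcess:
    """Echo a command, then run it."""
    print(f" $ {' '.join(str(c) for c in cmd)}")
    return subprocess.run(cmd, **kwargs)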
# ══════════════════════════════════════
# Operations
# ══════════════════════════════════════
def setup_venv():
    """Create the virtual environment (first run only)."""
    if VENV_DIR.exists():
        return
    print("[SETUP] Creating Python virtual environment...")
    run([sys.executable, "-m", "venv", str(VENV_DIR)], check=True)


def install_deps():
    """Install Python dependencies."""
    req = BACKEND_DIR / "requirements.txt"
    print("[SETUP] Checking Python dependencies...")
    run([str(PIP), "install", "-q", "-r", str(req)], check=True)


def build_frontend():
    """Build the Vue frontend (only when dist/ is missing or empty)."""
    dist = FRONTEND_DIR / "dist"
    if dist.exists() and any(dist.iterdir()):
        return  # already built
    print("[BUILD] Building frontend (about 1-2 minutes the first time)...")
    # Check for npm
    npm = shutil.which("npm")
    if not npm:
        print("[ERROR] npm not found; install Node.js: https://nodejs.org/")
        sys.exit(1)
    node_modules = FRONTEND_DIR / "node_modules"
    if not node_modules.exists():
        print("[BUILD] npm install...")
        run([npm, "install", "--legacy-peer-deps"], cwd=str(FRONTEND_DIR), check=True)
    run([npm, "run", "build"], cwd=str(FRONTEND_DIR), check=True)
    print("[BUILD] Frontend build complete")


def ensure_config():
    """Copy the example config on first run."""
    cfg = PROJECT_DIR / "config.json"
    example = PROJECT_DIR / "config.json.example"
    if not cfg.exists() and example.exists():
        shutil.copy(example, cfg)
        print("[SETUP] Created config.json (fill in api_key to use AI copywriting)")
def start():
    """Start the service in the background."""
    if check_server():
        print(f"[OK] OmniPublish is already running → http://127.0.0.1:{PORT}")
        return
    # First-time setup
    setup_venv()
    install_deps()
    build_frontend()
    ensure_config()
    # Make sure the log directory exists
    LOG_DIR.mkdir(exist_ok=True)
    print("[START] Starting OmniPublish in the background...")
    with open(LOG_FILE, "w", encoding="utf-8") as f:
        kwargs = dict(stdout=f, stderr=f, cwd=str(BACKEND_DIR))
        if IS_WIN:
            kwargs["creationflags"] = subprocess.DETACHED_PROCESS | subprocess.CREATE_NEW_PROCESS_GROUP
        proc = subprocess.Popen([str(PYTHON), "main.py"], **kwargs)
    PID_FILE.write_text(str(proc.pid))
    # Wait for readiness (up to 15 seconds)
    print("[START] Waiting for the service to become ready", end="", flush=True)
    for _ in range(15):
        time.sleep(1)
        print(".", end="", flush=True)
        if check_server():
            print()
            print(f"[OK] OmniPublish started (PID {proc.pid})")
            print(f"[OK] URL: http://127.0.0.1:{PORT}")
            print("[OK] Account: admin / admin123")
            print(f"[OK] Log: {LOG_FILE}")
            return
    print()
    print(f"[WARN] Startup timed out; check the log: {LOG_FILE}")
def stop():
    """Stop the service."""
    pid = read_pid()
    if pid and is_pid_alive(pid):
        print(f"[STOP] Stopping OmniPublish (PID {pid})...")
        try:
            if IS_WIN:
                subprocess.run(["taskkill", "/F", "/PID", str(pid)], check=True)
            else:
                os.kill(pid, signal.SIGTERM)
            time.sleep(1)
            print("[OK] Service stopped")
        except Exception as e:
            print(f"[ERROR] Stop failed: {e}")
    else:
        if check_server():
            print("[WARN] Service is running but no PID file was found; stop it manually")
        else:
            print("[OK] Service is not running")
    if PID_FILE.exists():
        PID_FILE.unlink()


def status():
    """Show the service status."""
    running = check_server()
    pid = read_pid()
    print(f" Service: {'running' if running else 'stopped'}")
    if pid:
        alive = is_pid_alive(pid)
        print(f" PID: {pid} ({'alive' if alive else 'exited'})")
    print(f" URL: http://127.0.0.1:{PORT}")
    print(f" Project: {PROJECT_DIR}")
    print(f" Database: {PROJECT_DIR / 'data' / 'omnipub.db'}")
    dist = FRONTEND_DIR / "dist"
    print(f" Frontend build: {'built' if dist.exists() and any(dist.iterdir()) else 'not built'}")
# ══════════════════════════════════════
# Entry point
# ══════════════════════════════════════
if __name__ == "__main__":
    cmd = sys.argv[1] if len(sys.argv) > 1 else "start"
    if cmd == "start":
        start()
    elif cmd == "stop":
        stop()
    elif cmd == "restart":
        stop()
        time.sleep(1)
        start()
    elif cmd == "status":
        status()
    else:
        print("Usage: python launcher.py [start|stop|restart|status]")
        sys.exit(1)
Moderate Telegram groups with a bot by receiving message/webhook events, extracting text/caption/media context, applying anti-advertising and anti-contact po...
---
name: telegram-group-moderation
version: 0.9.0
description: Moderate Telegram groups with a bot by receiving message/webhook events, extracting text/caption/media context, applying anti-advertising and anti-contact policies, and deciding whether to pass, delete, warn, mute, ban, or send for manual review. Use when connecting Telegram groups or channels to an existing moderation system, especially when reusing post-content-moderation as the policy core.
emoji: 📱
homepage: https://github.com/XavierMary56/OmniPublish
requires:
  - post-content-moderation
metadata:
  openclaw:
    requires:
      bins:
        - python3
---
# Telegram Group Moderation
Build Telegram group moderation as an **integration layer**, not as a replacement for your existing moderation policy skill.
Recommended architecture:
- use `post-content-moderation` as the moderation-policy core
- use this skill to receive Telegram updates, normalize Telegram payloads, call the moderation core, and execute Telegram moderation actions
## Core responsibilities
Use this skill for:
- Telegram Bot webhook integration
- Telegram group message normalization
- extracting text, caption, media URL/file metadata, sender info, and chat info
- mapping moderation results into Telegram actions
- enforcing group-specific whitelist / admin-exemption / punishment rules
- logging moderation decisions for audit
Do not bloat this skill with generic moderation policy text that already belongs in `post-content-moderation`.
## Recommended decision flow
1. Receive Telegram update.
2. Detect update type:
   - message
   - edited_message
   - channel_post
   - edited_channel_post
3. Extract moderation input:
   - chat_id
   - message_id
   - user_id
   - username / display name
   - text
   - caption
   - photo / video presence
   - forwarded / reply / sticker / invite-link hints if needed
4. Normalize into a moderation payload.
5. Call moderation core.
6. Map result into Telegram action:
   - `pass` -> no action
   - `reject` -> delete / warn / mute / ban depending on rule set
   - `review` -> flag to admin channel or log queue
7. Persist result and evidence.
## Action mapping
Use clear business mapping. Example:
- `pass` -> allow
- `reject` + high risk -> delete message and warn user
- `reject` + repeated violations -> delete and mute
- `reject` + explicit scam/spam pattern -> delete and ban
- `review` -> forward summary to admin review channel
Keep action policy configurable per group.
## Telegram-specific rule inputs
Add these rule dimensions on top of the generic moderation core:
- allowed chat ids
- admin / moderator user whitelist
- trusted service bots whitelist
- punishment ladder by offense count
- whether edited messages should be re-audited
- whether forwarded posts are allowed
- whether links are fully blocked or only ad-like links are blocked
- whether usernames / bios / display names count as diversion evidence
## Media limitations
Telegram integration often needs more than plain text:
- image moderation may require OCR and QR detection
- video moderation may require frame extraction and subtitle/ASR pipeline
- file_id alone is not enough for real moderation; fetch or proxy media only when policy and privacy requirements allow it
If real media inspection is not implemented, document that clearly and avoid claiming full image/video moderation coverage.
## Security baseline
- validate Telegram webhook authenticity at the integration layer
- verify chat allowlist before processing
- keep bot token and API keys only in environment variables
- rate-limit admin actions and callback retries
- log delete/mute/ban actions with chat_id, user_id, message_id, and moderation reason
- avoid downloading media to unsafe temp paths
- define retention policy for moderated content snapshots
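The first item in the baseline can be sketched as follows. This is a minimal sketch, assuming the webhook was registered through `setWebhook` with a `secret_token`, in which case Telegram sends that value in the `X-Telegram-Bot-Api-Secret-Token` header. The function name `is_authentic_webhook` is hypothetical; the `TELEGRAM_WEBHOOK_SECRET` variable name follows the bundled Go example.

```python
import hmac
import os
from typing import Mapping, Optional

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def is_authentic_webhook(headers: Mapping[str, str], expected: Optional[str] = None) -> bool:
    """Return True only if the request carries the configured secret token."""
    if expected is None:
        # Assumption: the secret is provided through this environment variable.
        expected = os.environ.get("TELEGRAM_WEBHOOK_SECRET", "")
    if not expected:
        # Fail closed: with no secret configured, every request is rejected.
        return False
    # Header names are case-insensitive, so compare them lowercased.
    provided = ""
    for name, value in headers.items():
        if name.lower() == SECRET_HEADER.lower():
            provided = value
            break
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(provided.encode(), expected.encode())
```

This sketch rejects requests when no secret is configured. Whether to fail open during local development instead is a policy choice for the integrator.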
## Bundled references
Read these files as needed:
- `references/architecture.md` for recommended system design
- `references/telegram-event-mapping.md` for Telegram update normalization
- `references/action-policy.md` for pass/reject/review to delete/warn/mute/ban mapping
- `references/php-yaf-integration.md` for PHP 7.3 / Yaf-oriented integration notes
- `references/multi-language-integration.md` for Python, Go, and Java integration guidance
- `references/install-and-usage.zh-CN.md` for practical Chinese installation and configuration guidance
- `references/production-rollout.zh-CN.md` for production rollout boundaries and deployment advice
- `references/http-contract-example.json` for request/response contract example with moderation core
- `references/http-contract-production.zh-CN.md` for production HTTP contract guidance
- `references/http-contract-production-v2.zh-CN.md` for trace_id-aware production contract guidance
- `references/redis-db-offense-store.zh-CN.md` for Redis/DB offense-count design guidance
- `references/db-schema-example.sql` for default DB offense-log schema
- `references/audit-log-schema-example.sql` for audit-log schema
- `references/audit-log-rollout.zh-CN.md` for audit-log rollout guidance
- `references/config-template.env.example` for environment template hints
- `references/release-notes.zh-CN.md` for Chinese release notes
- `references/clawhub-release-copy.zh-CN.md` for Chinese release copy and page wording
## Bundled scripts
Use bundled scripts as starting points, not production-final code:
- `scripts/config.php` for env-driven config layout
- `scripts/telegram_support.php` for shared constants and helpers
- `scripts/telegram_webhook_example.php` for PHP webhook entry example
- `scripts/telegram_action_example.php` for PHP Telegram Bot API action calls
- `scripts/python_telegram_webhook_example.py` for Python webhook/action flow example
- `scripts/go_telegram_webhook_example.go` for Go webhook/action flow example
- `scripts/java_telegram_webhook_example.java` for Java webhook/action flow example
## Packaging guidance
Keep this skill platform-specific and small:
- Telegram ingress and action logic belongs here
- reusable moderation policy belongs in `post-content-moderation`
- if you later add Discord/WhatsApp, create separate integration skills instead of mixing all platforms into one
FILE:references/action-policy.md
# Action Policy
## Recommended base mapping
### pass
- keep message
- optionally log low-risk pass sample
### reject
Choose action by severity and offense count.
Example ladder:
- first offense -> delete + warn
- second offense -> delete + mute 10 minutes
- third offense -> delete + mute 1 day
- repeated or severe -> delete + ban
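The example ladder could be expressed as a small lookup like the one below. This is an illustrative sketch: the function name `choose_reject_actions` and the action labels are hypothetical, and reading "repeated" as "fourth offense or later" is an assumption, not something the ladder specifies.

```python
from typing import List, Optional, Tuple

MUTE_10_MINUTES = 10 * 60
MUTE_1_DAY = 24 * 60 * 60


def choose_reject_actions(offense_count: int, severe: bool = False) -> Tuple[List[str], Optional[int]]:
    """Return (actions, mute_seconds) for a rejected message.

    offense_count is 1 for the first offense inside the counting window.
    """
    if severe or offense_count >= 4:  # assumption: "repeated" means 4th or later
        return ["delete", "ban"], None
    if offense_count == 3:
        return ["delete", "mute"], MUTE_1_DAY
    if offense_count == 2:
        return ["delete", "mute"], MUTE_10_MINUTES
    return ["delete", "warn"], None
```

Returning the action list separately from the mute duration keeps the ladder easy to replace with per-group configuration later.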
### review
- do not silently ignore
- forward summary to admin review channel or queue
- include chat_id, message_id, user_id, short reason, and evidence summary
## High-risk indicators
Consider stronger punishment when you see:
- clear scam or fraud intent
- repeated off-platform contact attempts
- repeated QR / invite / gambling / adult-spam patterns
- bot-like repeated posting across groups
- evasion after prior warnings
## Operator controls
Keep these configurable:
- delete_on_reject
- warn_on_reject
- mute_duration_seconds
- ban_on_high_risk
- admin_review_chat_id
- offense_window_seconds
- offense_thresholds
FILE:references/architecture.md
# Architecture
## Recommended split
Use two layers:
### Layer 1: policy core
`post-content-moderation`
Responsibilities:
- ad / contact / diversion judgment
- whitelist and custom rule interpretation
- structured moderation result
### Layer 2: Telegram integration
`telegram-group-moderation`
Responsibilities:
- webhook event intake
- Telegram payload parsing
- group-specific config
- moderation action execution
- audit log and admin notifications
## Suggested request path
1. Telegram sends update to webhook
2. webhook validates request and allowed chat
3. normalize update into moderation payload
4. call moderation core
5. receive `pass|reject|review`
6. execute Telegram action
7. log result
## Suggested deployment options
### Option A: same service process
One PHP service handles webhook + moderation call + Telegram action.
Use when:
- small to medium traffic
- simple rules
- low ops overhead
### Option B: webhook ingress + worker queue
Webhook only stores updates and returns quickly. Worker handles moderation and Telegram actions.
Use when:
- large groups
- frequent media posts
- retry / rate-limit concerns
- you want better resilience
## Suggested persistence
Store at least:
- chat_id
- message_id
- user_id
- normalized text
- moderation result
- action taken
- offense count snapshot
- created_at
## Suggested risk policy
- text-only violation -> delete + warn
- repeated violation -> mute
- severe spam / scam pattern -> ban
- uncertain media-only suspicion -> review or conservative delete depending on business tolerance
FILE:references/audit-log-rollout.zh-CN.md
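# Audit Log Persistence Guide
## Goal
Give the Telegram moderation integration layer basic logging that is traceable, reviewable after the fact, and usable for manual review.
## What to record
Record at least:
- trace_id
- chat_id
- message_id
- user_id
- username
- audit_status
- risk_level
- action
- reason
- offense_count
- action_result
- created_at
## Why persist logs
If messages are deleted without any log, these problems follow:
- Admins do not know why a message was deleted
- User appeals cannot be checked against a record
- The false-positive rate cannot be analyzed
- You cannot tell which groups or rule types cause the most problems
## action_result values
Distinguish at least:
- `pending`
- `success`
- `failed`
- `skipped`
## Review handling
A review result should be persisted as well as posted to the admin group.
That makes it possible to:
- list pending review records in an admin backend
- add status transitions later
- audit who handled which review
## trace_id
Use trace_id to link:
- the event received by the Telegram webhook
- the request sent to the moderation core
- the result of the Telegram action
- the audit log record
Recommended format: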
```text
tg-<message_id>-<YYYYMMDDHHMMSS>
```
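A builder for this format might look like the sketch below. The function name `build_trace_id` and the use of UTC are assumptions; the bundled `TelegramTraceIdBuilder` may differ.

```python
from datetime import datetime, timezone
from typing import Optional


def build_trace_id(message_id: int, now: Optional[datetime] = None) -> str:
    """Build a trace_id of the form tg-<message_id>-<YYYYMMDDHHMMSS>."""
    if now is None:
        now = datetime.now(timezone.utc)  # assumption: timestamps are in UTC
    return f"tg-{message_id}-{now.strftime('%Y%m%d%H%M%S')}"
```

Telegram message ids are unique only within a chat. If several chats write to one table with a unique index on `trace_id`, it may be worth including `chat_id` in the id as well.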
FILE:references/audit-log-schema-example.sql
CREATE TABLE telegram_moderation_audit_log (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
trace_id VARCHAR(64) NOT NULL,
chat_id BIGINT NOT NULL,
message_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
username VARCHAR(128) NOT NULL DEFAULT '',
audit_status VARCHAR(16) NOT NULL,
risk_level VARCHAR(16) NOT NULL,
action VARCHAR(64) NOT NULL,
reason VARCHAR(255) NOT NULL DEFAULT '',
offense_count INT NOT NULL DEFAULT 0,
action_result VARCHAR(32) NOT NULL DEFAULT 'pending',
created_at DATETIME NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY uniq_trace_id (trace_id),
KEY idx_chat_message (chat_id, message_id),
KEY idx_user_created (user_id, created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
FILE:references/clawhub-release-copy.zh-CN.md
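# ClawHub Page Copy
## Suggested summary
A moderation integration-layer skill for Telegram groups. It receives bot webhook messages, extracts text/image/video context, calls an existing moderation policy core, and maps the result to delete, alert, mute, ban, or manual review actions. It suits teams that already have content moderation rules and want to extend them to Telegram group governance.
## Suggested highlights
- Reuses the existing moderation policy skill instead of rewriting the rules
- Supports Telegram webhook, group message, and edited-message moderation
- Supports delete / warn / mute / ban / review action mapping
- Supports group allowlists, admin exemption, and an escalating punishment ladder
- Fits integration-layer work in PHP 7.3 / Yaf projects
- Includes production integration references, an HTTP contract example, and an env template
## Suggested risk boundaries
- This is a Telegram integration layer, not a complete local multimodal recognition engine
- Inspecting QR codes in images, video subtitles, or diversion in audio requires an additional OCR / ASR / frame-extraction pipeline
- Before launch, confirm bot permissions, group scope, log retention, and data-compliance policy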
FILE:references/db-schema-example.sql
CREATE TABLE telegram_moderation_offense_log (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
chat_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY (id),
KEY idx_chat_user_created (chat_id, user_id, created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
FILE:references/http-contract-example.json
{
"request": {
"platform": "telegram",
"id": 345,
"title": "",
"content": "加V了解一下",
"imgs": [],
"videos": [],
"other": {
"chat_id": -1001234567890,
"user_id": 777,
"username": "spam_user",
"raw_has_photo": false,
"raw_has_video": false,
"forwarded": false
}
},
"response": {
"id": 345,
"audit_status": "reject",
"is_pass": 0,
"risk_level": "high",
"reason": "存在联系方式和明显引流话术",
"hit_rules": ["contact_info", "advertising"],
"hit_fields": ["content"],
"hit_positions": ["content"],
"action": "reject"
}
}
FILE:references/http-contract-production-v2.zh-CN.md
# HTTP Contract (Production v2 Proposal)
## What is new
Adds trace_id to the existing Telegram -> moderation core contract so that logs can be correlated more reliably.
## Suggested request
```json
{
"platform": "telegram",
"scene": "group_message",
"id": 345,
"trace_id": "tg-345-20260320014100",
"title": "",
"content": "加V了解一下",
"imgs": [],
"videos": [],
"other": {
"chat_id": -1001234567890,
"message_id": 345,
"user_id": 777,
"username": "spam_user"
}
}
```
## Suggested response
```json
{
"id": 345,
"trace_id": "tg-345-20260320014100",
"audit_status": "reject",
"is_pass": 0,
"risk_level": "high",
"reason": "存在联系方式和明显引流话术",
"hit_rules": ["contact_info", "advertising"],
"action": "reject"
}
```
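## Principles
- The Telegram integration layer generates the trace_id
- The moderation core should return the trace_id unchanged where possible
- All audit logs, review records, and action results use the same trace_id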
FILE:references/http-contract-production.zh-CN.md
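# HTTP Contract (Production Proposal)
## Goal
Standardize the HTTP request/response structure between the Telegram integration layer and the moderation core, so that projects do not each improvise their own fields.
## Suggested request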
### URL
```text
POST /moderation/telegram/message-audit
```
### Headers
```text
Content-Type: application/json
Authorization: Bearer <token>
X-Request-Id: <trace_id>
```
### Example body
```json
{
"platform": "telegram",
"scene": "group_message",
"id": 345,
"trace_id": "tg-345-20260320004500",
"title": "",
"content": "加V了解一下",
"imgs": [],
"videos": [],
"other": {
"chat_id": -1001234567890,
"chat_type": "supergroup",
"chat_title": "Example Group",
"message_id": 345,
"user_id": 777,
"username": "spam_user",
"display_name": "Promo Bot",
"raw_has_photo": false,
"raw_has_video": false,
"forwarded": false,
"edited": false
}
}
```
## Suggested response
```json
{
"id": 345,
"trace_id": "tg-345-20260320004500",
"audit_status": "reject",
"is_pass": 0,
"risk_level": "high",
"reason": "存在联系方式和明显引流话术",
"hit_rules": ["contact_info", "advertising"],
"hit_fields": ["content"],
"hit_positions": ["content"],
"action": "reject"
}
```
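## Response field requirements
Required at minimum:
- `audit_status`
- `risk_level`
- `reason`
Recommended additions:
- `trace_id`
- `hit_rules`
- `hit_fields`
- `hit_positions`
- `action`
## Contract principles
- The Telegram integration layer does not reinvent moderation rules
- The moderation core only makes the judgment and never calls the Telegram API directly
- The Telegram integration layer decides on delete / warn / mute / ban / review from the moderation result
## Error handling
When the moderation core fails, it should return an explicit error code so that the Telegram integration layer does not mistake the failure for a pass.
For example: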
```json
{
"error": true,
"code": "UPSTREAM_TIMEOUT",
"message": "moderation core timeout"
}
```
The Telegram integration layer then applies its business policy:
- send the message to review
- or block it conservatively
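One way to make sure a failed call is never read as a pass is sketched below. The function name `resolve_audit_status` and the `conservative` flag are hypothetical; `None` stands in for a timeout or transport error.

```python
from typing import Any, Dict, Optional

VALID_STATUSES = {"pass", "reject", "review"}


def resolve_audit_status(response: Optional[Dict[str, Any]], conservative: bool = False) -> str:
    """Map a moderation-core response to pass / reject / review.

    Errors, timeouts (response is None), and malformed bodies never become "pass".
    """
    fallback = "reject" if conservative else "review"
    if not isinstance(response, dict) or response.get("error"):
        return fallback
    status = response.get("audit_status")
    if not isinstance(status, str) or status not in VALID_STATUSES:
        return fallback
    return status
```

Setting `conservative=True` corresponds to blocking conservatively; the default falls back to review.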
FILE:references/install-and-usage.zh-CN.md
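# Installation and Usage Guide
## When to use it
This skill fits the following cases:
- You already have a set of content moderation rules and want to apply them to Telegram groups
- You want a bot to block ads, contact details, and traffic-diversion content in groups automatically
- You want moderation results mapped to delete, warn, mute, ban, or manual review actions
- Your project runs on PHP 7.3 / Yaf and you want an integration layer you can strengthen step by step
This skill is a **Telegram integration layer**; it does not replace the moderation rules themselves.
Use it together with the existing `post-content-moderation` skill.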
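## Recommended architecture
Split the system into two layers:
### Layer 1: moderation policy core
Uses: `post-content-moderation`
Responsible for:
- deciding whether content is advertising
- deciding whether content contains contact details or traffic diversion
- applying allowlists and custom rules
- returning `pass / reject / review`
### Layer 2: Telegram integration layer
Uses: `telegram-group-moderation`
Responsible for:
- receiving Telegram webhooks
- extracting message text, captions, and photo/video presence flags
- building the moderation request
- calling the moderation core
- executing delete / warn / mute / ban / review according to the result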
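## What the current version provides
The current version provides:
- a skeleton for normalizing Telegram updates
- an example webhook entry point
- a delete-message example
- a ban-user example
- a timed-mute example
- an example of notifying the admin group about review results
- a group allowlist
- admin exemption
- a dry-run setting
- a skeleton for calling an external moderation core endpoint
- integration examples in PHP, Python, Go, and Java
## What the current version does not fully cover
Before going live, note that the example scripts still need work to fit your business:
- offense-count storage suited to production (the default file store is a demo)
- image OCR / QR code recognition / video frame extraction / ASR
- mapping to the real fields of your existing moderation service
The basic capabilities already added in this version are:
- webhook secret validation
- sending warn messages
- timed mute
- forwarding review results to the admin group
- offense counting within a time window
- an escalating punishment ladder for the 1st / 2nd / 3rd offense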
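## Configuration
The main configuration lives in:
- `scripts/config.php`
### 1. Telegram settings
#### `telegram.bot_token`
The Telegram bot token.
Source:
- issued by BotFather when you create the bot
Recommendations:
- inject it only through environment variables
- never hard-code it
#### `telegram.api_base`
Base URL of the Telegram Bot API.
Default:
```text
https://api.telegram.org
```
#### `telegram.webhook_secret`
Secret used to authenticate webhook requests.
Recommendations:
- always enable it in production
- validate the secret from the request header or path at the webhook entry point
#### `telegram.allowed_hosts`
Outbound allowlist for Telegram API hosts.
Default:
```php
['api.telegram.org']
```
#### `telegram.allowed_chat_ids`
Allowlist of chat IDs that may be processed.
Recommendations:
- start with test groups only
- extend to production groups once behavior is stable
#### `telegram.admin_review_chat_id`
Chat ID of the admin review / alert group.
Suggested uses:
- push review results to the admin team
- record summaries of high-risk hits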
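### 2. Moderation core settings
#### `moderation_core.mode`
Moderation core mode.
Current default:
```text
external
```
Meaning:
- by default this Telegram skill sends content to an external moderation endpoint
- that endpoint can be your existing moderation service
- or an HTTP service you later wrap around `post-content-moderation`
#### `moderation_core.endpoint`
URL of the moderation core endpoint.
Recommendations:
- use HTTPS
- add the host to the allowlist
- allow only a trusted internal network or fixed business domains
#### `moderation_core.token`
Auth token for the moderation core endpoint.
Recommendations:
- use a Bearer token
- inject it only through environment variables
#### `moderation_core.allowed_hosts`
Allowlist of moderation core hosts.
Recommendations:
- do not go live with it empty
- allow only your production moderation service domain
#### `moderation_core.dry_run`
Whether dry-run is enabled.
When enabled:
- the real moderation path is not called
- the current example returns `review` directly
- useful for testing the Telegram integration and action logic end to end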
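### 3. Action policy settings
#### `policy.delete_on_reject`
Whether to delete the message on reject.
#### `policy.warn_on_reject`
Whether to warn the user on reject.
The current example already sends warn messages.
#### `policy.ban_on_high_risk`
Whether to ban immediately on a high-risk reject.
Recommendations:
- keep it off at first
- turn it on after observing the false-positive rate
#### `policy.mute_seconds`
Mute duration in seconds.
The current example calls `restrictChatMember`, which serves as a basic timed-mute implementation.
#### `policy.offense_window_seconds`
Time window for offense counting.
Meaning:
- only offenses inside this window are counted
- older offenses expire automatically
For example:
- `86400` counts only offenses from the last 24 hours
#### `policy.first_offense_action` / `policy.second_offense_action` / `policy.third_offense_action`
Define the actions of the punishment ladder.
Recommended initial settings:
- 1st offense: `delete_and_warn`
- 2nd offense: `delete_and_mute`
- 3rd offense: `delete_and_ban`
#### `policy.offense_store_driver`
Storage driver for offense counts.
Supported drivers:
- `file`: local JSON demo
- `redis`: minimal working implementation
- `db`: minimal PDO implementation; adapt the SQL to your own database
#### `policy.offense_store_path`
Storage path for offense counts.
The example defaults to a local JSON file, which suits only single-machine demos or integration testing.
In production, replace it with Redis or a database.
#### `policy.offense_store_table`
Suggested offense log table name for the database driver.
Default:
- `telegram_moderation_offense_log`
#### `policy.re_audit_edited_messages`
Whether to re-audit edited messages.
Recommendations:
- keep it on by default
- otherwise a user can post harmless content and then edit it into an ad
#### `policy.admin_user_ids`
User IDs exempt as admins.
Recommendations:
- list only trusted admins
- do not add ordinary bots or ordinary users
#### `policy.trusted_bot_user_ids`
User IDs of trusted service bots.
Recommendations:
- use it to let your own system bots through
- this keeps their messages from being deleted by mistake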
### 4. Redis / DB settings
#### `redis.host` / `redis.port` / `redis.password` / `redis.database`
Connection settings for the Redis offense store.
#### `redis.key_prefix`
Redis key prefix.
Default:
```text
telegram:moderation:v1:offense:
```
Example of a final key:
```text
telegram:moderation:v1:offense:-1001234567890:777
```
#### `db.dsn` / `db.username` / `db.password`
Settings for the database offense store.
The DB driver in the current code provides a minimal working PDO implementation.
For the suggested default table structure, see:
- `references/db-schema-example.sql`
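## Recommended rollout steps
### Step 1: test in a test group first
Configure a single test group:
- put only the test group in `allowed_chat_ids`
- turn on dry-run
- leave ban disabled for now
### Step 2: verify the message path
Verify in particular:
- plain text messages
- photos with captions
- videos with captions
- edited messages
- forwarded messages
- admin messages
- bot messages
### Step 3: verify the action path
Verify in particular (with dry-run off, since dry-run always returns `review`):
- whether reject really deletes the message
- whether high risk bans as expected
- whether admins are correctly exempt
- whether logs let you locate chat_id / user_id / message_id
### Step 4: roll out to production groups gradually
Expand group by group rather than enabling everything at once.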
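## Risk boundaries
At this stage the skill is:
- usable as a Telegram moderation integration-layer skeleton
- a base for further custom development
It should not yet be described as:
- a complete plug-and-play Telegram anti-spam product
- a complete image QR-code moderation system
- a complete video subtitle / audio moderation system
## Suggested next enhancements
Prioritize:
1. moving offense-count storage from the local file demo to Redis / DB
2. configurable exemptions and finer-grained actions for the punishment ladder
3. a more complete recovery strategy for mute / unmute
4. richer evidence summaries when forwarding review results to the admin group
5. image OCR / QR code recognition
6. video subtitles / frame extraction / ASR
7. a real, unified HTTP contract with the existing `post-content-moderation`
If you are preparing a production integration, continue with:
- `references/production-rollout.zh-CN.md`
- `references/http-contract-example.json`
- `references/http-contract-production.zh-CN.md`
- `references/http-contract-production-v2.zh-CN.md`
- `references/redis-db-offense-store.zh-CN.md`
- `references/audit-log-rollout.zh-CN.md`
- `references/audit-log-schema-example.sql`
- `references/config-template.env.example`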
FILE:references/multi-language-integration.md
# Multi-language Integration
Use this reference when the user wants Telegram moderation integration examples beyond PHP.
## Goal
Provide small, readable webhook/action examples in common backend languages so teams can adapt the Telegram integration layer without being forced into PHP.
## Included languages
- Python
- Go
- Java
## Shared design rules
All example scripts should:
- treat this skill as a Telegram integration layer, not a full moderation engine
- validate Telegram webhook secret before processing
- normalize Telegram message updates into a small internal payload
- call an external moderation core endpoint
- map moderation result into Telegram actions such as delete, warn, mute, ban, or review notification
- use environment variables for all secrets and endpoints
- use explicit timeout values
- stay intentionally small and adaptation-friendly
## Important limitation
These examples are integration demos, not production-complete frameworks.
They do not automatically provide:
- offense count persistence
- queue-based retries
- OCR / QR / ASR media inspection
- full admin dashboard
- distributed rate limiting
## Suggested use
- read the language example that matches your stack
- adapt the normalization and action mapping to your business rules
- keep the moderation policy in `post-content-moderation` or your own moderation core
- do not duplicate policy logic across all languages unless required
FILE:references/php-yaf-integration.md
# PHP / Yaf Integration Notes
Use this when integrating Telegram moderation into a PHP 7.3 + Yaf project.
## Suggested placement
- webhook entry -> controller action or dedicated callback entry
- normalization -> library/service layer
- moderation core client -> library/client or service
- Telegram action client -> library/client
- offense tracking -> model layer
## Suggested flow in Yaf
1. webhook controller receives Telegram update JSON
2. validate secret / source / allowed chat
3. normalize update to internal moderation DTO
4. call moderation policy service
5. map result to Telegram action
6. persist moderation log
7. return fast HTTP 200 to Telegram
## Performance advice
- avoid heavy synchronous media downloads in webhook request path
- if media processing is needed, push to queue and return quickly
- rate-limit delete/mute/ban actions
- separate Telegram API failure logs from moderation decision logs
## Security advice
- never hardcode bot token
- restrict webhook route exposure
- verify allowed chat ids before action execution
- keep timeout and connect_timeout explicit for all outbound HTTP calls
FILE:references/production-rollout.zh-CN.md
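# Production Integration Guide
## Goal
This guide covers moving `telegram-group-moderation` from a demo / skeleton integration layer to a more robust production integration.
## Recommended production architecture
Split the system into at least two parts:
### 1. Webhook ingress layer
Responsibilities:
- receive Telegram updates
- validate the secret
- check the chat allowlist
- perform basic message normalization
- return 200 quickly
- avoid heavy work in the request thread
### 2. Moderation and action execution layer
Responsibilities:
- read the normalized pending events
- call the moderation core
- execute delete / warn / mute / ban / review according to the rules
- write offense counts and audit logs
For high-volume groups, do not run all processing synchronously in the webhook thread.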
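## Offense count storage
The PHP version of this skill ships with:
- a local JSON file demo
- a minimal working Redis implementation
- a DB driver (a minimal PDO implementation, according to the release notes)
Production recommendations:
- single instance, low concurrency: a database is enough to start with
- multi-instance deployment: prefer Redis or a database
- do not rely on the local JSON file long term
Reasons:
- instances on different machines do not share offense counts
- a JSON file does not handle highly concurrent writes
- an abnormal process exit makes overwrites and races more likely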
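## Punishment ladder
Keep the default three-step ladder, but make it configurable per group:
- 1st offense: delete + warn
- 2nd offense: delete + mute
- 3rd offense: delete + ban
Also consider adding:
- admin exemption
- service bot exemption
- allowlist policies for specific groups
- skipping the first two steps for high-risk keywords
## Review mechanism
In production, do not return a review status without acting on it.
Do at least one of the following:
- send it to the admin review group
- write it to a review queue table
- write it to the audit table for viewing in an admin backend
A review that nobody receives accomplishes nothing.
## Telegram bot permissions
The bot needs at least:
- permission to delete messages
- permission to restrict members
- permission to ban members
Fuller governance may also need:
- permission to read group messages
- permission to post in the admin review group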
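## Logging
In production, record at least:
- chat_id
- message_id
- user_id
- username
- summary of the normalized content
- audit_status
- risk_level
- action
- offense_count
- reason
- created_at
## Timeout and failure policy
Set on every outbound call:
- a connect timeout
- a request timeout
Distinguish two kinds of failure:
### 1. Moderation core failure
Recommendations:
- high-risk groups: handle conservatively
- ordinary groups: review, or write to the manual review queue
### 2. Telegram API action failure
Recommendations:
- log the error separately
- allow a limited number of retries
- never record an action failure as a moderation pass
## Media handling
If ads in your groups appear mainly in images and videos, you also need:
- image OCR
- QR code recognition
- video frame extraction
- subtitle OCR
- ASR speech-to-text
Otherwise the Telegram integration layer can reliably block only text and captions and must not claim full multimodal moderation.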
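## Rollout order
### Phase 1: test group
- enable dry-run or review
- leave ban off
- observe the false-positive rate
### Phase 2: low-traffic production groups
- enable delete + warn
- enable mute for high risk only, for now
### Phase 3: add ban once stable
- enable ban for repeat offenders or high-risk scams
## Suggested goals for 0.1.1
To call this skill a more complete 0.1.1, it should at least:
- state the boundary between demo and production clearly in the docs
- state the extension directions for offense store / review / action policy clearly in the configuration
- include webhook secret, warn, mute, review, and the punishment ladder in the example code
- make clear in the release copy that "ready for production integration" does not mean "a complete risk-control product"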
FILE:references/redis-db-offense-store.zh-CN.md
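# Redis / DB Offense Store Design
## Goal
Upgrade the current local JSON file offense counter to a Redis / DB solution better suited to production.
## Why upgrade
The local JSON file suits only:
- single-machine demos
- local integration testing
- small-scale tests
It is unsuitable for production because:
- instances cannot share offense counts
- highly concurrent writes conflict easily
- auditing and troubleshooting are harder
- analysis across groups or time ranges is inconvenient
## Redis approach
### When to use it
Suitable for:
- frequent real-time decisions
- multi-instance deployments
- offense windows mostly of the form "last N hours/days"
### Recommended key design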
```text
telegram:moderation:v1:offense:<chat_id>:<user_id>
```
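### Recommended value structure
Option 1: sorted set (recommended)
- score: offense timestamp
- member: timestamp or trace_id (sorted-set members are unique, so a trace_id keeps two offenses in the same second from collapsing into one entry)
Advantages:
- easy to trim by time window
- easy to count offenses in a recent period
### Recommended logic
1. On reject, write the current timestamp
2. Remove entries outside the window
3. Count the entries inside the current window
4. Return offense_count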
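The four steps can be illustrated without a Redis server. The sketch below is an in-memory stand-in, not the bundled PHP implementation. The class name `InMemoryOffenseStore` is hypothetical, and the comments name the Redis commands (`ZADD`, `ZREMRANGEBYSCORE`, `ZCARD`) that a real store would typically use for each step.

```python
import time
from collections import defaultdict
from typing import DefaultDict, List, Optional


class InMemoryOffenseStore:
    """Sliding-window offense counter keyed by chat_id and user_id."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._events: DefaultDict[str, List[float]] = defaultdict(list)

    def record_and_count(self, chat_id: int, user_id: int, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        key = f"telegram:moderation:v1:offense:{chat_id}:{user_id}"
        events = self._events[key]
        # Step 1: record the offense (ZADD in Redis).
        events.append(now)
        # Step 2: drop entries at or before the cutoff (ZREMRANGEBYSCORE).
        cutoff = now - self.window_seconds
        events[:] = [t for t in events if t > cutoff]
        # Steps 3-4: count what remains and return it (ZCARD).
        return len(events)
```

Passing `now` explicitly keeps the window logic testable without waiting for real time to pass.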
### TTL
The TTL can be slightly longer than `offense_window_seconds`, for example:
- window: 24h
- TTL: 30h or 48h
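## DB approach
### When to use it
Suitable for:
- long-term auditing
- viewing punishment history in an admin backend
- complex statistical reports
### Suggested table structure
Example table name: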
```text
telegram_moderation_offense_log
```
Suggested columns:
- `id`
- `chat_id`
- `user_id`
- `message_id`
- `audit_status`
- `risk_level`
- `action`
- `reason`
- `created_at`
### Counting
For real-time decisions, filter by:
- `chat_id`
- `user_id`
- `created_at >= now - offense_window_seconds`
and count the rejects inside the recent window.
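The window count can be sketched as a parameterized query. This example uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the production database. It assumes `created_at` is stored as an integer epoch timestamp (the bundled schema uses `DATETIME`) and that every row in the table is a reject. The function name `count_recent_offenses` is hypothetical.

```python
import sqlite3


def count_recent_offenses(conn, chat_id, user_id, now, window_seconds):
    """Count offense rows for one user in one chat inside the time window."""
    row = conn.execute(
        "SELECT COUNT(*) FROM telegram_moderation_offense_log "
        "WHERE chat_id = ? AND user_id = ? AND created_at >= ?",
        (chat_id, user_id, now - window_seconds),
    ).fetchone()
    return row[0]


# Demo data in an in-memory database (stand-in for the real table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telegram_moderation_offense_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, chat_id INTEGER NOT NULL, "
    "user_id INTEGER NOT NULL, created_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO telegram_moderation_offense_log (chat_id, user_id, created_at) VALUES (?, ?, ?)",
    [(-1001234567890, 777, 1000), (-1001234567890, 777, 90000), (-1001234567890, 778, 90000)],
)
```

Placeholders keep the chat and user ids out of the SQL string. The `(chat_id, user_id, created_at)` index in the example schema matches this filter.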
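## Recommended practice
If you need:
- **high-performance real-time punishment** -> Redis first
- **long-term auditing and reports** -> a DB is required
- **both** -> Redis for real-time decisions, DB for audit persistence
## Current recommendation for this skill
The preferred setup for this skill is:
- Redis for offense_count decisions
- DB for punishment logs and review logs
This is fast and keeps records easy to look up.
## What the skill implements today
The PHP version includes a minimal working Redis offense store:
- records offense timestamps in a Redis zset
- writes the current time on reject
- removes entries outside the window automatically
- returns the offense_count inside the current window
- uses the key prefix plus `chat_id:user_id` as the counting dimension
The DB version also has a minimal working implementation:
- writes the offense log through PDO
- counts offenses in the time window by `chat_id + user_id + created_at`
- uses `telegram_moderation_offense_log` as the default table name
If you do not want the default table structure, adapt the SQL to your own schema.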
FILE:references/release-notes.zh-CN.md
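# Release Notes
## Version 0.1.3
This is the production-integration enhancement release of `telegram-group-moderation`. It remains a **Telegram group moderation integration layer** and does not replace an existing moderation rule system.
The core idea:
- reuse an existing moderation policy skill such as `post-content-moderation`
- let the Telegram integration layer handle webhooks, message normalization, action execution, offense counting, and audit logs
- map moderation results to delete, warn, mute, ban, or manual review actions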
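## Included in this release
### 1. Telegram webhook integration skeleton
Supports the basic receive-and-process flow for Telegram updates and serves as a starting point for integrating with business systems.
### 2. Message normalization
Converts the different Telegram message types into a single moderation input for the moderation core.
Currently covers:
- `message`
- `edited_message`
- `channel_post`
- `edited_channel_post`
### 3. Moderation core call skeleton
Includes the call structure for an external moderation endpoint, so you can connect an existing moderation system.
### 4. Telegram action execution
Currently includes:
- a delete-message example
- a ban-user example
- a timed-mute example
- a warn-message example
- an example of forwarding review results to the admin group
### 5. Multi-language integration examples
Examples are provided for:
- PHP
- Python
- Go
- Java
### 6. Basic webhook authentication
Adds a Telegram webhook secret validation example as a minimum security baseline before production integration.
### 7. Offense count and punishment ladder
Adds:
- offense counting within a time window
- three offense store drivers: file / redis / db
- a minimal working Redis implementation
- a minimal working PDO implementation for DB
- automatic switching of the punishment by 1st / 2nd / 3rd offense
The default ladder is:
- 1st offense: delete + warn
- 2nd offense: delete + mute
- 3rd offense: delete + ban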
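### 8. Production integration references
Newly added:
- production integration guide
- env configuration template
- moderation core HTTP contract example
- production HTTP contract guide
- Redis / DB offense store design notes
- default offense log table schema example
### 9. Audit log persistence and trace_id
Adds:
- `TelegramTraceIdBuilder`
- `TelegramAuditLogStore`
- default audit log table schema example
- trace_id linking webhook / moderation / action / audit log
- persistence of the action result in `action_result`
### 10. Basic security and governance settings
Includes:
- allowlist host validation
- allowed chat ids
- admin exempt
- trusted bot exempt
- dry-run
- timeout control
- offense store driver setting
- audit log table setting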
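## Who this release is for
This release suits teams that:
- already have moderation rules and want to add Telegram group governance quickly
- want a minimal closed loop of webhook + moderation + action execution + audit log first
- want to evolve Telegram group moderation step by step inside a PHP 7.3 / Yaf project
## Not included in this release
This release is not a finished product. The following still need to be added:
- an admin backend for review status transitions
- more audit fields in the offense log
- fuller mute / unmute control
- more complex rules and exemptions for the punishment ladder
- image OCR / QR code recognition
- video frame extraction / subtitle recognition / ASR
- a more complete wrapper for Telegram bot admin actions
## Risk boundaries
Do not treat this release as:
- an out-of-the-box, fully automatic Telegram risk-control system
- a complete multimodal anti-advertising engine
- a commercial moderation bot that can go to production without custom development
More accurately, it is:
- a clear, extensible Telegram moderation integration-layer skeleton / beta that is suited to adaptation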
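## Recommended integration
Use it together with `post-content-moderation`:
- `post-content-moderation` handles moderation rules and structured results
- `telegram-group-moderation` handles Telegram webhooks, message parsing, action execution, offense counting, and audit logs
## Next steps
Continue enhancing in this order:
1. an admin backend for review status transitions
2. more audit fields in the offense log
3. a more complete recovery strategy for mute / unmute
4. more complex rules and exemptions for the punishment ladder
5. the image / video moderation pipeline
6. multi-group, multi-policy, multi-environment configuration last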
FILE:references/telegram-event-mapping.md
# Telegram Event Mapping
## Main update types
Handle at least:
- `message`
- `edited_message`
- `channel_post`
- `edited_channel_post`
## Useful Telegram fields
From message-like payloads, extract when present:
- `chat.id`
- `chat.type`
- `chat.title`
- `message_id`
- `from.id`
- `from.username`
- `from.first_name`
- `from.last_name`
- `date`
- `text`
- `caption`
- `entities`
- `caption_entities`
- `photo`
- `video`
- `document`
- `forward_origin` or legacy forward fields
- `reply_to_message`
## Normalized moderation payload
Example normalized payload:
```json
{
"platform": "telegram",
"chat_id": -1001234567890,
"chat_type": "supergroup",
"chat_title": "Example Group",
"message_id": 345,
"user_id": 777,
"username": "spam_user",
"display_name": "Promo Bot",
"text": "加V了解一下",
"caption": "扫码进群",
"images": [],
"videos": [],
"raw_has_photo": true,
"raw_has_video": false,
"forwarded": false,
"edited": false
}
```
## Normalization rules
- combine `text` and `caption` into the text review surface when policy requires full-message judgment
- preserve original message id and chat id for Telegram action calls
- mark edited messages so you can re-audit them separately
- keep media presence flags even if real media inspection is not enabled
- treat usernames, invite links, and obvious handles as extra evidence when your policy requires it
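The normalization rules above could look roughly like this in Python. It is a sketch under assumptions: the function name `build_review_surface` and both regular expressions are illustrative choices rather than part of the bundled scripts, and the patterns should be tuned to your own policy.

```python
import re

# Assumed patterns for illustration only.
INVITE_LINK_RE = re.compile(r"(?:https?://)?t\.me/(?:\+|joinchat/)[\w-]+", re.IGNORECASE)
HANDLE_RE = re.compile(r"(?<![\w@])@([A-Za-z][A-Za-z0-9_]{4,31})")


def build_review_surface(message):
    """Combine text and caption, and collect extra evidence from a message dict."""
    text = message.get("text") or ""
    caption = message.get("caption") or ""
    surface = "\n".join(part for part in (text, caption) if part)
    return {
        "content": surface,
        # Invite links and @handles are kept as separate evidence lists.
        "invite_links": INVITE_LINK_RE.findall(surface),
        "handles": HANDLE_RE.findall(surface),
        # Media presence flags survive even without real media inspection.
        "raw_has_photo": bool(message.get("photo")),
        "raw_has_video": isinstance(message.get("video"), dict),
    }
```

The lookbehind in `HANDLE_RE` is there so that the domain part of an email address is not reported as a handle.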
FILE:scripts/contracts/README.txt
This directory contains interface-style and example implementations for production offense stores.
Use FileTelegramOffenseStore for local demo, RedisTelegramOffenseStore as a stub for Redis adaptation, and DbTelegramOffenseStore as a stub for DB-backed adaptation.
FILE:scripts/go_telegram_webhook_example.go
package main
import (
"bytes"
"encoding/json"
"fmt"
"net/http"
"os"
"strings"
"time"
)
type updateEnvelope map[string]interface{}
type moderationResult struct {
AuditStatus string `json:"audit_status"`
RiskLevel string `json:"risk_level"`
Reason string `json:"reason"`
}
func env(key, fallback string) string {
value := os.Getenv(key)
if value == "" {
return fallback
}
return value
}
func verifySecret(provided string) bool {
expected := env("TELEGRAM_WEBHOOK_SECRET", "")
if expected == "" {
return true
}
return provided == expected
}
func pickMessage(update updateEnvelope) (string, map[string]interface{}) {
fields := []string{"message", "edited_message", "channel_post", "edited_channel_post"}
for _, field := range fields {
if value, ok := update[field].(map[string]interface{}); ok {
return field, value
}
}
return "", nil
}
func normalize(update updateEnvelope) map[string]interface{} {
updateType, message := pickMessage(update)
if message == nil {
return nil
}
chat, _ := message["chat"].(map[string]interface{})
sender, _ := message["from"].(map[string]interface{})
text, _ := message["text"].(string)
caption, _ := message["caption"].(string)
return map[string]interface{}{
"platform": "telegram",
"update_type": updateType,
"chat_id": chat["id"],
"message_id": message["message_id"],
"user_id": sender["id"],
"username": sender["username"],
"content": strings.TrimSpace(text + "\n" + caption),
"raw_has_photo": message["photo"] != nil,
"raw_has_video": message["video"] != nil,
}
}
func postJSON(url string, payload interface{}, token string, timeout time.Duration, out interface{}) error {
body, err := json.Marshal(payload)
if err != nil {
return err
}
req, err := http.NewRequest("POST", url, bytes.NewReader(body))
if err != nil {
return err
}
req.Header.Set("Content-Type", "application/json")
if token != "" {
req.Header.Set("Authorization", "Bearer "+token)
}
client := &http.Client{Timeout: timeout}
resp, err := client.Do(req)
if err != nil {
return err
}
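defer resp.Body.Close()
// Treat non-2xx responses as errors so failed moderation or Telegram calls are not silently ignored.
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return fmt.Errorf("unexpected HTTP status %d", resp.StatusCode)
}
if out != nil {
return json.NewDecoder(resp.Body).Decode(out)
}
return nil
}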
func moderate(payload map[string]interface{}) (moderationResult, error) {
endpoint := env("MODERATION_CORE_ENDPOINT", "")
token := env("MODERATION_CORE_TOKEN", "")
req := map[string]interface{}{
"platform": "telegram",
"id": payload["message_id"],
"title": "",
"content": payload["content"],
"imgs": []string{},
"videos": []string{},
"other": map[string]interface{}{
"chat_id": payload["chat_id"],
"user_id": payload["user_id"],
"username": payload["username"],
"raw_has_photo": payload["raw_has_photo"],
"raw_has_video": payload["raw_has_video"],
},
}
var result moderationResult
err := postJSON(endpoint, req, token, 15*time.Second, &result)
return result, err
}
func telegramCall(method string, payload interface{}) error {
botToken := env("TELEGRAM_BOT_TOKEN", "")
apiBase := strings.TrimRight(env("TELEGRAM_API_BASE", "https://api.telegram.org"), "/")
url := apiBase + "/bot" + botToken + "/" + method
return postJSON(url, payload, "", 10*time.Second, nil)
}
func main() {
update := updateEnvelope{
"message": map[string]interface{}{
"message_id": 1,
"chat": map[string]interface{}{"id": -100123, "type": "supergroup"},
"from": map[string]interface{}{"id": 777, "username": "spam_user"},
"text": "加V了解一下",
},
}
if !verifySecret(env("TELEGRAM_WEBHOOK_SECRET", "")) {
fmt.Println("invalid webhook secret")
return
}
payload := normalize(update)
if payload == nil {
fmt.Println("unsupported update")
return
}
result, err := moderate(payload)
if err != nil {
panic(err)
}
if result.AuditStatus == "reject" {
_ = telegramCall("deleteMessage", map[string]interface{}{
"chat_id": payload["chat_id"],
"message_id": payload["message_id"],
})
_ = telegramCall("sendMessage", map[string]interface{}{
"chat_id": payload["chat_id"],
"text": env("TELEGRAM_WARN_MESSAGE_TEMPLATE", "请勿发布广告、引流或联系方式内容。"),
"reply_to_message_id": payload["message_id"],
})
}
if result.AuditStatus == "review" {
adminReviewChatID := env("TELEGRAM_ADMIN_REVIEW_CHAT_ID", "")
if adminReviewChatID != "" {
_ = telegramCall("sendMessage", map[string]interface{}{
"chat_id": adminReviewChatID,
"text": fmt.Sprintf("[review]\nchat_id: %v\nmessage_id: %v\nuser_id: %v\nusername: %v\nreason: %s",
payload["chat_id"], payload["message_id"], payload["user_id"], payload["username"], result.Reason),
})
}
}
fmt.Println("done")
}
FILE:scripts/java_telegram_webhook_example.java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
public class JavaTelegramWebhookExample {
private static String env(String key, String fallback) {
String value = System.getenv(key);
return value == null || value.isEmpty() ? fallback : value;
}
private static boolean verifySecret(String provided) {
String expected = env("TELEGRAM_WEBHOOK_SECRET", "");
return expected.isEmpty() || expected.equals(provided);
}
private static String postJson(String targetUrl, String jsonBody, String bearerToken, int timeoutMs) throws Exception {
HttpURLConnection connection = (HttpURLConnection) new URL(targetUrl).openConnection();
connection.setRequestMethod("POST");
connection.setConnectTimeout(timeoutMs);
connection.setReadTimeout(timeoutMs);
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "application/json");
if (bearerToken != null && !bearerToken.isEmpty()) {
connection.setRequestProperty("Authorization", "Bearer " + bearerToken);
}
try (OutputStream os = connection.getOutputStream()) {
os.write(jsonBody.getBytes(StandardCharsets.UTF_8));
}
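int status = connection.getResponseCode();
InputStream inputStream = status >= 200 && status < 300
? connection.getInputStream()
: connection.getErrorStream();
// getErrorStream() returns null when the server sent no error body.
if (inputStream == null) {
return "";
}
try (InputStream in = inputStream) {
// readAllBytes() requires Java 9 or later.
return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}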
}
private static String telegramMethodUrl(String method) {
String apiBase = env("TELEGRAM_API_BASE", "https://api.telegram.org");
String botToken = env("TELEGRAM_BOT_TOKEN", "");
return apiBase.replaceAll("/+$", "") + "/bot" + botToken + "/" + method;
}
public static void main(String[] args) throws Exception {
if (!verifySecret(env("TELEGRAM_WEBHOOK_SECRET", ""))) {
throw new RuntimeException("invalid webhook secret");
}
String moderationEndpoint = env("MODERATION_CORE_ENDPOINT", "");
String moderationToken = env("MODERATION_CORE_TOKEN", "");
String moderationPayload = "{" +
"\"platform\":\"telegram\"," +
"\"id\":1," +
"\"title\":\"\"," +
"\"content\":\"加V了解一下\"," +
"\"imgs\":[]," +
"\"videos\":[]," +
"\"other\":{\"chat_id\":-100123,\"user_id\":777,\"username\":\"spam_user\"}" +
"}";
String moderationResponse = postJson(moderationEndpoint, moderationPayload, moderationToken, 15000);
System.out.println(moderationResponse);
String deletePayload = "{\"chat_id\":-100123,\"message_id\":1}";
postJson(telegramMethodUrl("deleteMessage"), deletePayload, "", 10000);
String warnText = env("TELEGRAM_WARN_MESSAGE_TEMPLATE", "请勿发布广告、引流或联系方式内容。");
String sendMessagePayload = "{" +
"\"chat_id\":-100123," +
"\"text\":\"" + warnText.replace("\\", "\\\\").replace("\"", "\\\"") + "\"," +
"\"reply_to_message_id\":1" +
"}";
postJson(telegramMethodUrl("sendMessage"), sendMessagePayload, "", 10000);
}
}
FILE:scripts/python_telegram_webhook_example.py
import hmac
import json
import os
import time
import urllib.request
import urllib.error

TELEGRAM_WEBHOOK_SECRET = os.getenv("TELEGRAM_WEBHOOK_SECRET", "")
TELEGRAM_BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN", "")
TELEGRAM_API_BASE = os.getenv("TELEGRAM_API_BASE", "https://api.telegram.org")
MODERATION_CORE_ENDPOINT = os.getenv("MODERATION_CORE_ENDPOINT", "")
MODERATION_CORE_TOKEN = os.getenv("MODERATION_CORE_TOKEN", "")
WARN_MESSAGE_TEMPLATE = os.getenv("TELEGRAM_WARN_MESSAGE_TEMPLATE", "请勿发布广告、引流或联系方式内容。")
MUTE_SECONDS = int(os.getenv("TELEGRAM_MUTE_SECONDS", "600"))
ADMIN_REVIEW_CHAT_ID = int(os.getenv("TELEGRAM_ADMIN_REVIEW_CHAT_ID", "0"))


def http_json(method, url, payload, headers=None, timeout=10):
    """Send a JSON request and return the decoded JSON response."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=body, method=method)
    final_headers = headers or {}
    for key, value in final_headers.items():
        request.add_header(key, value)
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def verify_secret(provided):
    # No secret configured: verification is skipped. Set a secret in production.
    if not TELEGRAM_WEBHOOK_SECRET:
        return True
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(
        (provided or "").encode("utf-8"),
        TELEGRAM_WEBHOOK_SECRET.encode("utf-8"),
    )


def pick_message(update):
    """Return (update_type, message) for the first supported message-like field."""
    for field in ["message", "edited_message", "channel_post", "edited_channel_post"]:
        if isinstance(update.get(field), dict):
            return field, update[field]
    return None, None


def normalize(update):
    """Flatten a Telegram update into the payload sent to the moderation core."""
    update_type, message = pick_message(update)
    if not message:
        return None
    chat = message.get("chat") or {}
    sender = message.get("from") or {}
    text = message.get("text", "")
    caption = message.get("caption", "")
    return {
        "platform": "telegram",
        "update_type": update_type,
        "chat_id": chat.get("id", 0),
        "message_id": message.get("message_id", 0),
        "user_id": sender.get("id", 0),
        "username": sender.get("username", ""),
        "content": (text + "\n" + caption).strip(),
        "raw_has_photo": bool(message.get("photo")),
        "raw_has_video": isinstance(message.get("video"), dict),
    }


def moderate(payload):
    """Call the external moderation core and return its JSON result."""
    headers = {}
    if MODERATION_CORE_TOKEN:
        headers["Authorization"] = "Bearer " + MODERATION_CORE_TOKEN
    return http_json("POST", MODERATION_CORE_ENDPOINT, {
        "platform": "telegram",
        "id": payload["message_id"],
        "title": "",
        "content": payload["content"],
        "imgs": [],
        "videos": [],
        "other": {
            "chat_id": payload["chat_id"],
            "user_id": payload["user_id"],
            "username": payload["username"],
            "raw_has_photo": payload["raw_has_photo"],
            "raw_has_video": payload["raw_has_video"],
        }
    }, headers=headers, timeout=15)


def telegram_call(method, payload):
    """Call a Telegram Bot API method with a JSON payload."""
    url = TELEGRAM_API_BASE.rstrip("/") + "/bot" + TELEGRAM_BOT_TOKEN + "/" + method
    return http_json("POST", url, payload, timeout=10)


def send_warning(chat_id, message_id):
    return telegram_call("sendMessage", {
        "chat_id": chat_id,
        "text": WARN_MESSAGE_TEMPLATE,
        "reply_to_message_id": message_id,
    })


def delete_message(chat_id, message_id):
    return telegram_call("deleteMessage", {
        "chat_id": chat_id,
        "message_id": message_id,
    })


def mute_user(chat_id, user_id):
    """Revoke all member permissions for MUTE_SECONDS."""
    return telegram_call("restrictChatMember", {
        "chat_id": chat_id,
        "user_id": user_id,
        "permissions": {
            "can_send_messages": False,
            "can_send_audios": False,
            "can_send_documents": False,
            "can_send_photos": False,
            "can_send_videos": False,
            "can_send_video_notes": False,
            "can_send_voice_notes": False,
            "can_send_polls": False,
            "can_send_other_messages": False,
            "can_add_web_page_previews": False,
            "can_change_info": False,
            "can_invite_users": False,
            "can_pin_messages": False,
            "can_manage_topics": False,
        },
        "until_date": int(time.time()) + MUTE_SECONDS,
    })


def ban_user(chat_id, user_id):
    return telegram_call("banChatMember", {
        "chat_id": chat_id,
        "user_id": user_id,
    })


def notify_review(payload, result, action):
    # Group and supergroup chat ids are negative, so only 0 means "not configured".
    if ADMIN_REVIEW_CHAT_ID == 0:
        return None
    text = "[review]\nchat_id: {chat_id}\nmessage_id: {message_id}\nuser_id: {user_id}\nusername: {username}\naction: {action}\nreason: {reason}".format(
        chat_id=payload["chat_id"],
        message_id=payload["message_id"],
        user_id=payload["user_id"],
        username=payload["username"],
        action=action,
        reason=result.get("reason", ""),
    )
    return telegram_call("sendMessage", {"chat_id": ADMIN_REVIEW_CHAT_ID, "text": text})


def handle_update(update, provided_secret):
    """Verify, normalize, moderate, and act on one webhook update."""
    if not verify_secret(provided_secret):
        return {"ok": False, "error": "invalid webhook secret"}, 403
    payload = normalize(update)
    if not payload:
        return {"ok": True, "skipped": "unsupported update"}, 200
    result = moderate(payload)
    status = result.get("audit_status", "review")
    risk = result.get("risk_level", "medium")
    if status == "pass":
        return {"ok": True, "action": "allow"}, 200
    if status == "reject":
        delete_message(payload["chat_id"], payload["message_id"])
        send_warning(payload["chat_id"], payload["message_id"])
        if risk == "high":
            mute_user(payload["chat_id"], payload["user_id"])
        return {"ok": True, "action": "delete_and_warn"}, 200
    # Anything that is neither pass nor reject goes to the admin review chat.
    notify_review(payload, result, "review")
    return {"ok": True, "action": "review"}, 200


if __name__ == "__main__":
    sample_update = {
        "message": {
            "message_id": 1,
            "chat": {"id": -100123, "type": "supergroup"},
            "from": {"id": 777, "username": "spam_user"},
            "text": "加V了解一下",
        }
    }
    response, status_code = handle_update(sample_update, TELEGRAM_WEBHOOK_SECRET)
    print(status_code)
    print(json.dumps(response, ensure_ascii=False))
Review, rewrite, and moderate user-generated posts across title, body text, images, and videos to block ads and contact information while allowing configurab...
---
name: post-content-moderation
version: 1.0.0
description: Review, rewrite, and moderate user-generated posts across title, body text, images, and videos to block ads and contact information while allowing configurable whitelist exceptions and project-specific custom rules.
emoji: 🛡️
homepage: https://github.com/XavierMary56/OmniPublish
metadata:
  openclaw:
    requires:
      bins:
        - python3
---
# Post Content Moderation
## Overview
Apply a strict moderation workflow to the full post package, not just plain text. Default goal: review title, body content, images, and videos together, then reject content that contains advertising intent or contact information unless the match falls inside an explicit whitelist or a user-provided custom rule.
## Skill maintenance note
If this skill is published to ClawHub, keep the local version in the sibling `VERSION` file in sync with the published version.
Recommended release flow:
1. update skill content
2. bump `VERSION`
3. publish with the same version number
4. keep `VERSION` as the last published version after success
Recommended publish command template:
```bash
clawhub publish /path/to/post-content-moderation \
--slug post-content-moderation \
--name "Post Content Moderation" \
--version $(cat VERSION) \
--changelog "your release note"
```
## Security and capability notice
Before using this skill in production, treat it as a **networked moderation integration**, not a purely local rules engine.
Important boundaries:
- bundled PHP scripts can send moderation payloads to external APIs
- bundled PHP scripts can pull pending content from a remote API and callback results to a remote API
- any post text, comment text, whitelist, custom rules, image URLs, or video URLs included in the payload may leave the local environment
- the bundled media inspector in `scripts/moderation_support.php` is currently a placeholder and does **not** perform real image OCR, QR decoding, frame extraction, speech recognition, or video inspection by itself
- if you claim image/video moderation in production, implement and verify real media preprocessing first
Recommended safety baseline:
- use environment variables for all secrets
- use narrowly scoped allowlisted API hosts only
- keep timeout and fail-close policy explicit
- add dry-run testing before enabling callback writes
- do not send unnecessary user data to external models
- document to operators that media URLs may be exposed to third-party services if passed through
- avoid presenting the bundled PHP scripts as a full local media-audit engine
## Quick operating modes
Choose one mode before moderating:
### 1. Strict full-auto mode
Use when the business requires no human intervention.
Rule:
- clear violation -> `拒绝`
- clear clean content -> `通过`
- ambiguous but risky -> `拒绝`
- media unreadable / model output invalid / upstream failed -> `拒绝`
### 2. Balanced mode
Use when manual review is available.
Rule:
- clear violation -> `拒绝`
- clear clean content -> `通过`
- ambiguous or partially unreadable -> `需人工复核`
## Moderation scope
Always review every available part of a post:
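- Title
- Body text
- Images
- Videos
- Visible text inside images or videos
- Captions, subtitles, sticker text, watermarks, corner badges, QR codes, and diversion cues in avatars or nicknames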
Do not approve a post only because the body text is clean. If any one component violates policy, the whole post should normally be marked `拒绝` or `需人工复核`.
## Field-specific rules
### Title rules
Reject or review titles that contain:
- direct promotion, recruitment, rebate, agency, commission, guaranteed results
- diversion language such as "私聊", "加我", "主页联系", "扫码了解"
- visible contact IDs, domain names, or disguised contact phrases
- exaggerated ad-style hooks whose main purpose is conversion rather than discussion
### Body text rules
Reject or review body text that contains:
- product/service promotion with obvious conversion intent
- calls to contact off-platform or move to private chat
- direct or disguised contact information
- repeated promotional copy, pricing slogans, or enrollment / agency / lead-generation language
### Image rules
Reject or review images that contain:
- QR codes, mini-program codes, payment codes, group invite codes
- contact cards, business cards, chat screenshots, profile screenshots
- watermarks with account IDs, brands plus CTA, or sales copy
- posters with price, discount, recruitment, or diversion wording
- corner text, decorative background text, or hidden overlaid text carrying contact or ad signals
### Video rules
Reject or review videos that contain:
- cover image text with promotion or contact details
- subtitles or spoken content that guide users off-platform
- opening cards, ending cards, or fixed watermarks with contact info
- flash frames showing QR codes, account IDs, phone numbers, or group invitations
- oral CTA such as "想了解私信我" / "主页联系" / "加V备注"
## Default moderation policy
Treat the following as violations unless explicitly whitelisted:
### 1. Advertising / promotion
Block content that obviously promotes:
- products, services, channels, groups, websites, apps, stores, or paid offers
- traffic diversion such as "加我", "私聊了解", "点击链接", "扫码进群", "代理/加盟/返利/推广"
- recruitment, lead collection, account selling, brushing orders, betting, gray-market promotion, or obvious marketing copy
- repeated brand exposure plus calls to action
- image or video overlays that guide users to off-platform contact or purchase
Common ad signals:
- strong call-to-action language
- price, discount, rebate, agency, invitation, commission, guaranteed results
- external platform redirection
- excessive emoji / symbols used for promotion
- intentionally obfuscated promotional wording
- promotional subtitles, end cards, cover text, or watermark copy
### 2. Contact information
Block direct or disguised contact details, including:
- phone or mobile numbers
- WeChat / VX / V / vx / wx variants
- QQ numbers or QQ group numbers
- email addresses
- URLs, domains, short links, QR-code invitations expressed in text or shown in media
- WhatsApp, Line, Discord, Skype, social account IDs
- platform handles or IDs whose clear purpose is off-platform contact
- QR codes, payment codes, contact cards, business cards, profile screenshots, or group invitation screenshots
Also treat as violations when contact information is deliberately obfuscated, for example:
- replacing digits with spaces, symbols, Chinese numerals, or homophones
- mixing letters and punctuation to bypass detection
- replacing keywords with variants such as `v`, `vx`, `wx`, `薇`, `微`, `卫星`, `扣扣`, `邮箱`, `油箱`
- splitting numbers across spaces or punctuation
- using Chinese numerals, homophones, or mixed scripts to hide IDs
- using disguised keywords or profile pointers such as "微❤", "薇", "卫星", "扣扣", "油箱", "点我头像", "看签名"
- showing contact text only in image corners, cover image, video ending card, or subtitle frames
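One way to catch the obfuscation patterns listed above is to normalize text before matching. The sketch below is illustrative only: the keyword tuple, the separator set, and the six-digit threshold are assumptions, and mapping Chinese numerals to digits will produce false positives on ordinary prose, so treat the output as a signal for review rather than a verdict.

```python
import re
import unicodedata

# Assumed lookup tables for illustration; extend them from real samples.
CHINESE_DIGITS = str.maketrans("零〇一二三四五六七八九", "00123456789")
CONTACT_KEYWORDS = ("vx", "wx", "微信", "薇", "卫星", "扣扣", "qq", "邮箱", "油箱")
SEPARATORS_RE = re.compile(r"[\s\-_.,·*|/\\]+")
DIGIT_RUN_RE = re.compile(r"\d{6,}")


def normalize_for_matching(text):
    """Fold full-width forms, lower-case, map Chinese numerals, drop separators."""
    folded = unicodedata.normalize("NFKC", text).lower()
    folded = folded.translate(CHINESE_DIGITS)
    return SEPARATORS_RE.sub("", folded)


def contact_signals(text):
    """Return contact keywords and long digit runs found after normalization."""
    normalized = normalize_for_matching(text)
    return {
        "keywords": [kw for kw in CONTACT_KEYWORDS if kw in normalized],
        "digit_runs": DIGIT_RUN_RE.findall(normalized),
    }
```

Normalizing first keeps the matching rules short: a number split across spaces, hyphens, and Chinese numerals collapses into a single digit run that one pattern can find.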
## Multi-modal review workflow
For each moderation task, follow this order:
1. Review title.
2. Review body text.
3. Review each image for:
   - visible text
   - QR codes
   - watermarks
   - contact cards / screenshots / profile clues
   - promotional posters or pricing posters
4. Review each video for:
   - spoken or subtitled contact info
   - cover image text
   - opening/ending cards
   - watermarks and persistent corner text
   - QR codes or account IDs shown in frames
5. Merge findings across all components.
6. Check whitelist and custom rules.
7. Produce one final result: `通过`, `拒绝`, or `需人工复核`.
## Quick decision tree
Use this shortcut when speed matters:
- any clear contact info in title/body/image/video -> `拒绝`
- any clear ad/diversion language in title/body/image/video -> `拒绝`
- whitelist clearly covers the matched content and scenario -> `通过`
- model cannot inspect a required field:
  - strict full-auto -> `拒绝`
  - balanced mode -> `需人工复核`
- no hit across all fields -> `通过`
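The decision tree can be expressed as a small function. This is a sketch, not the skill's implementation: the hit shape (`rule`, `whitelisted`) and the parameter names are hypothetical, and it assumes whitelist coverage has already been resolved for each hit before the tree runs.

```python
REJECT, PASS, REVIEW = "拒绝", "通过", "需人工复核"


def decide(hits, uninspected_fields, strict_full_auto):
    """Apply the quick decision tree to pre-collected findings.

    hits: list of dicts with a 'rule' key and an optional 'whitelisted' flag.
    uninspected_fields: required fields the model could not inspect.
    """
    blocking = [
        hit for hit in hits
        if hit["rule"] in ("contact_info", "advertising")
        and not hit.get("whitelisted", False)
    ]
    if blocking:
        return REJECT
    if uninspected_fields:
        # Unreadable required fields: the outcome depends on the operating mode.
        return REJECT if strict_full_auto else REVIEW
    return PASS
```

Checking the uninspected fields before returning a pass matters: a post with no hits but an unreadable video should not be approved just because nothing was found in the readable parts.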
## Cross-field judgment rule
Judge the post as a whole.
Examples:
- title is normal but image contains a WeChat ID -> reject
- body is normal but video end card says "扫码领取" -> reject
- image shows a QR code but context is unclear and whitelist is absent -> manual review
- title/body mention a brand normally, but video repeatedly induces purchase or contact -> reject
## Whitelist handling
Whitelist is an exception layer, not a blanket bypass.
Allow content when the whitelist clearly covers the matched item, for example:
- allowed brand names
- allowed official accounts or official domains
- approved merchant names
- approved phrases that would otherwise look promotional
- internal business terms that resemble blocked words
- approved official QR codes or approved official service accounts in a narrowly defined scenario
Apply whitelist with these constraints:
- whitelist only the exact item or exact scenario the user approved
- do not expand a narrow whitelist into a broad exemption
- if promotional intent still exists outside the whitelisted fragment, reject
- if the post contains extra contact details beyond the whitelist scope, reject or send to manual review
- if only one image/video asset is whitelisted, do not automatically whitelist all assets in the same post
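The "exact item, exact scenario" constraint can be illustrated with a strict lookup. The whitelist shape used here, a set of `(evidence, scenario)` tuples, and the function name `apply_whitelist` are assumptions for the sketch; a real deployment would likely need richer matching.

```python
def apply_whitelist(hits, whitelist):
    """Return the hits that are NOT covered by an exact whitelist entry.

    whitelist: set of (evidence, scenario) tuples (hypothetical shape).
    A hit is exempted only when both its evidence and its scenario match.
    """
    remaining = []
    for hit in hits:
        key = (hit["evidence"], hit.get("scenario"))
        if key not in whitelist:
            remaining.append(hit)
    return remaining
```

Because the lookup key includes the scenario, an approved account that appears in a promotional context still counts as a hit, and other contact details in the same post are never exempted by association.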
If the user says they want to define their own rules (for example, "可以自己定义自己的规则"), ask for or accept custom rules in plain language and merge them after the default policy.
## Result priority
Use these priorities:
- clear ad or contact evidence in any field -> `拒绝`
- weak or ambiguous evidence, especially in image/video details -> `需人工复核`
- all fields clean, or only narrowly whitelisted content appears -> `通过`
## Standard output
Prefer this fixed output:
```text
审核结果:通过 / 拒绝 / 需人工复核
风险等级:低 / 中 / 高
审核范围:标题 / 正文 / 图片 / 视频
命中字段:标题 / 正文 / 图片 / 视频
命中位置:标题 / 正文第2段 / 第1张图片右下角 / 视频封面 / 视频00:12字幕 / 视频结尾等
命中规则:广告 / 联系方式 / 白名单例外 / 自定义规则
原因:一句话说明核心依据
处理建议:通过发布 / 删除违规文案 / 替换图片 / 裁剪视频片段 / 转人工复核
改写建议:如需,给出可发布版本
```
### Risk level guidance
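- `高`: clear advertising, clear contact information, a clear QR code, or clear traffic diversion
- `中`: obvious suspicion but incomplete evidence, or room for ambiguity
- `低`: no risk hit; only normal discussion or content explicitly allowed by the whitelist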
## JSON output option
If the user requests structured output, use:
```json
{
"result": "pass|reject|review",
"risk_level": "low|medium|high",
"scope": ["title", "body", "images", "video"],
"hits": [
{
"field": "image",
"position": "image_1_bottom_right",
"rule": "contact_info",
"evidence": "VX: abc123456"
}
],
"reason": "image contains off-platform contact information",
"action": "replace_image"
}
```
## Rewrite behavior
When asked to revise a rejected post:
- remove ad-like calls to action
- remove all contact information and diversion hints
- remove or replace violating images/videos when needed
- preserve the user's main legitimate meaning
- keep wording natural, not robotic
- do not silently keep risky borderline phrases
- if the media itself violates policy, state that the media asset must be removed or replaced
## Custom rule intake
When the user provides extra moderation rules, normalize them into this structure before applying:
```text
审核对象:标题 / 正文 / 图片 / 视频 / 全部
白名单:<allowed words / brands / accounts / domains / QR codes / scenarios>
禁止词:<extra banned words>
禁止场景:<extra banned behaviors>
放行场景:<explicitly allowed contexts>
媒体规则:<special image/video rules>
输出要求:<whether to rewrite, label, explain, or return JSON>
```
If custom rules are incomplete, still perform moderation with defaults and state which defaults were used.
## Media-specific guidance
### Images
Pay extra attention to:
- posters, flyers, screenshots, chat records, contact cards
- QR codes, mini-program codes, payment codes
- watermarks with account IDs or brand recruitment copy
- text hidden in corners, borders, or background decorations
### Videos
Pay extra attention to:
- cover image text
- subtitles and speech transcription
- opening/ending cards
- repeated corner watermarks
- flash frames showing contact info
- spoken diversion like "私信我" or "主页联系"
If a video cannot be fully inspected, do not guess. Return `需人工复核` in balanced mode or `拒绝` in strict full-auto mode, and specify what could not be confirmed.
## Examples
### Example 1: reject because image contains contact info
Input:
```text
标题:今天分享一下我的护肤心得
正文:最近换季,保湿真的很重要。
图片:海报右下角含“VX: abc123456”
```
Output:
```text
审核结果:拒绝
风险等级:高
审核范围:标题、正文、图片
命中字段:图片
命中位置:第1张图片右下角
命中规则:联系方式
原因:图片中存在微信联系方式,构成站外引流。
处理建议:替换图片
改写建议:删除或替换该图片后再发布。
```
### Example 2: reject because video end card is promotional
Input:
```text
标题:兼职经验分享
正文:最近试了几个渠道。
视频:结尾出现“想了解更多,扫码进群”
```
Output:
```text
审核结果:拒绝
风险等级:高
审核范围:标题、正文、视频
命中字段:视频
命中位置:视频结尾
命中规则:广告、联系方式
原因:视频结尾包含明显引流和扫码进群信息。
处理建议:裁剪或替换违规片段
改写建议:移除结尾引流片段并删除相关联系方式。
```
### Example 3: whitelist exception
Input rule:
```text
白名单:官方客服微信 service_official,仅用于售后说明;官方售后二维码,仅限订单售后页面展示
```
Input content:
```text
标题:售后处理说明
正文:如订单异常,请联系官方客服处理。
图片:展示官方售后二维码
```
Output:
```text
审核结果:通过
风险等级:低
审核范围:标题、正文、图片
命中字段:图片
命中位置:第1张图片
命中规则:白名单例外
原因:命中的二维码属于明确白名单,且场景限于官方售后说明。
处理建议:通过发布
```
## References
- Read `references/rule-template.md` when converting user-provided business rules into a reusable moderation policy.
- Read `references/install-and-usage.md` when the user asks how to install, import, distribute, or operate this skill in a workspace.
- Read `references/api-integration.md` when the user wants pull APIs, callback APIs, comment moderation APIs, or a fully automatic no-human-review moderation workflow.
- Read `references/api-spec.md` when the user needs a backend-facing API contract with request fields, response fields, error codes, action enums, and implementation flow.
- Read `references/prompt-templates.md` when the user needs reusable moderation prompts, JSON-only response constraints, or full-auto conservative suffixes.
- Read `references/php-example-notes.md` and use `scripts/php_xai_client_example.php` when the user wants a PHP 7.3 integration example for x.ai moderation, callback handling, or comment moderation.
- Read `references/php-demo-suite.md` and use the bundled scripts when the user wants runnable PHP demos for pull, audit, callback, and comment-review flows.
FILE:references/api-integration.md
# API Integration
Use this reference when the user wants to wire the moderation skill into pull APIs, callback APIs, comment APIs, or a fully automatic review workflow.
## Recommended architecture
Use a simple four-step flow:
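1. Pull pending (unreviewed) post data
2. Call the large model to review title / body / images / videos
3. Generate a structured moderation result
4. Call back the business-side audit result API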
For comments, use a lighter flow:
1. Receive the comment `id + text`
2. Call moderation
3. Return the pass / reject result immediately
## Pull interface for pending posts
Recommended input shape:
```json
[
{
"id": 10001,
"title": "xx",
"imgs": [
"https://cdn.example.com/a.jpg"
],
"videos": [
"https://cdn.example.com/a.mp4"
],
"content": "xxx",
"other": 0
}
]
```
### Field notes
- `id`: required; unique post id
- `title`: optional but should be provided when available
- `imgs`: array of image URLs
- `videos`: array of video URLs
- `content`: main post body
- `other`: business extension field such as source/type/flags
## Callback interface for audit result
Recommended callback payload:
```json
{
"id": 10001,
"audit_status": "pass|reject|review",
"is_pass": 0,
"risk_level": "low|medium|high",
"reason": "图片右下角存在微信联系方式",
"hit_rules": ["contact_info"],
"hit_fields": ["images"],
"hit_positions": ["image_1_bottom_right"],
"action": "replace_image",
"audit_time": "2026-03-18 18:30:00"
}
```
### Minimal callback fields
If the business side only needs the minimum fields, return:
```json
{
"id": 10001,
"is_pass": 0,
"reason": "存在广告或联系方式"
}
```
## Comment moderation interface
Recommended comment input:
```json
{
"id": 90001,
"text": "加V了解一下 abc123456"
}
```
Recommended comment output:
```json
{
"id": 90001,
"audit_status": "reject",
"is_pass": 0,
"risk_level": "high",
"reason": "评论中存在联系方式和引流话术",
"hit_rules": ["contact_info", "advertising"]
}
```
## Automatic audit rule
If the user wants fully automatic moderation with no human action, use this policy:
- `pass` → approve automatically
- `reject` → block automatically
- do not use `review` unless the business explicitly keeps a manual-review lane
If the user requires zero manual operations, convert ambiguous cases into a conservative result:
- high-confidence violation → `reject`
- high-confidence clean content → `pass`
- ambiguous but risky content → `reject`
This is stricter, but it matches the requirement that automatic review needs no manual operation ("自动审核不需要人工操作").
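A conservative conversion like the one described could be sketched as below. The function name `to_full_auto` is hypothetical, and treating unknown or missing statuses as `reject` is an assumption that goes slightly beyond the three cases listed.

```python
def to_full_auto(result):
    """Collapse a three-state result into pass/reject for zero-manual pipelines."""
    converted = dict(result)  # Leave the caller's dict untouched.
    if converted.get("audit_status") != "pass":
        # "review", unknown, or missing statuses are all treated as reject.
        converted["audit_status"] = "reject"
    converted["is_pass"] = 1 if converted["audit_status"] == "pass" else 0
    return converted
```

Deriving `is_pass` from `audit_status` in one place keeps the two fields from drifting apart in callback payloads.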
## Prompting pattern for model call
When sending data to the model, include:
- moderation policy summary
- whitelist rules
- custom business rules
- post fields: title, content, imgs, videos
- required output schema
Recommended result schema:
```json
{
"id": 10001,
"audit_status": "pass|reject",
"is_pass": 1,
"risk_level": "low|medium|high",
"reason": "string",
"hit_rules": ["advertising", "contact_info"],
"hit_fields": ["title", "content", "images", "videos"],
"hit_positions": ["title", "image_1", "video_end_card"],
"action": "publish|reject|replace_image|trim_video"
}
```
## Example x.ai request shape
Do not hardcode real keys in code or docs. Use environment variables.
```bash
curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-d '{
"model": "grok-4-1-fast",
"temperature": 0,
"stream": false,
"messages": [
{
"role": "system",
"content": "You are a strict moderation assistant. Block ads and contact information. Return JSON only."
},
{
"role": "user",
"content": "请审核以下帖子并返回 JSON:标题: xx;正文: xxx;图片: [https://...]; 视频: [https://...]"
}
]
}'
```
## Reliability recommendations
- set request timeout and connect timeout
- use retries only for network/transient failures, not content failures
- keep `temperature` at `0`
- enforce JSON-only output
- log raw request id, post id, result, and reason
- if image/video cannot be fetched, fail closed when full automation is required
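Enforcing JSON-only output and failing closed might look like this on the consuming side. It is a sketch under assumptions: the function name `parse_model_output` and the fallback reason string are illustrative, and only `audit_status` and `risk_level` are validated here.

```python
import json

# Tuples rather than sets, so unhashable values from the model cannot raise.
ALLOWED_STATUS = ("pass", "reject")
ALLOWED_RISK = ("low", "medium", "high")


def parse_model_output(raw_text, post_id):
    """Return a validated result dict, or a fail-closed reject when parsing fails."""
    fallback = {
        "id": post_id,
        "audit_status": "reject",
        "is_pass": 0,
        "risk_level": "high",
        "reason": "model output invalid",
    }
    try:
        data = json.loads(raw_text)
    except (TypeError, ValueError):
        return fallback
    if not isinstance(data, dict):
        return fallback
    if data.get("audit_status") not in ALLOWED_STATUS:
        return fallback
    if data.get("risk_level") not in ALLOWED_RISK:
        return fallback
    # Trust the caller's post id and derive is_pass locally, not from the model.
    data["id"] = post_id
    data["is_pass"] = 1 if data["audit_status"] == "pass" else 0
    return data
```

Any reply that is not a single well-formed JSON object with known enum values ends up as a reject, which is the behavior full automation needs; a balanced-mode pipeline could map the same fallback to manual review instead.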
## Suggested business mapping
### Posts
- `is_pass = 1` => publish successfully, or move to the published state
- `is_pass = 0` => block and store the reason
### Comments
- `is_pass = 1` => comment visible
- `is_pass = 0` => comment hidden/rejected immediately
## Suggested error handling
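If the model call fails:
- full-auto strict mode: reject and record `审核服务失败,已自动拦截` ("moderation service failed; automatically blocked")
- balanced mode: send to manual review
If the media download fails:
- full-auto strict mode: reject and record `媒体读取失败,无法完成审核` ("media could not be read; moderation could not be completed")
- balanced mode: send to manual review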
FILE:references/api-spec.md
# API Spec
Use this reference when the user wants a backend-facing API contract for post moderation, comment moderation, callback integration, and fully automatic review.
## 1. Pending post pull API
Business side returns a list of posts waiting for moderation.
### Response example
```json
{
"code": 0,
"message": "ok",
"data": [
{
"id": 10001,
"title": "副业分享",
"content": "最近发现一个不错的项目",
"imgs": [
"https://cdn.example.com/post/10001-1.jpg"
],
"videos": [
"https://cdn.example.com/post/10001-1.mp4"
],
"other": 0,
"created_at": "2026-03-18 18:00:00"
}
]
}
```
### Field definition
| Field | Type | Required | Description |
|---|---|---:|---|
| code | int | yes | `0` means success |
| message | string | yes | response message |
| data | array | yes | pending moderation list |
| data[].id | int/string | yes | unique post id |
| data[].title | string | no | post title |
| data[].content | string | no | post body |
| data[].imgs | array | no | image url list |
| data[].videos | array | no | video url list |
| data[].other | int/string/object | no | business extension field |
| data[].created_at | string | no | creation time |
### Validation rules
- `id` is mandatory
- at least one of `title`, `content`, `imgs`, `videos` should exist
- media URLs should be directly readable by the moderation service or use signed URLs
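
The validation rules above can be sketched as a small check. This is a minimal Python illustration, not part of the bundled PHP demos; the function name `validate_pending_post` is hypothetical, and the return codes reuse `4002` and `4003` from the error code spec in section 6.

```python
from typing import Any, Dict

# Error codes taken from the error code spec in this reference.
OK = 0
ERR_MISSING_ID = 4002
ERR_EMPTY_CONTENT = 4003

CONTENT_FIELDS = ("title", "content", "imgs", "videos")


def validate_pending_post(post: Dict[str, Any]) -> int:
    """Return 0 when the item is usable, otherwise a 4xxx error code."""
    post_id = post.get("id")
    if post_id is None or post_id == "":
        return ERR_MISSING_ID
    # At least one of title/content/imgs/videos must carry something.
    if not any(post.get(field) for field in CONTENT_FIELDS):
        return ERR_EMPTY_CONTENT
    return OK
```

Whether media URLs are actually readable is not checked here; that needs a real fetch and is left to the moderation service.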
## 2. Moderation callback API
Moderation service calls the business callback after a decision is made.
### Request example
```json
{
"id": 10001,
"audit_status": "reject",
"is_pass": 0,
"risk_level": "high",
"reason": "第1张图片右下角存在微信联系方式",
"hit_rules": ["contact_info"],
"hit_fields": ["images"],
"hit_positions": ["image_1_bottom_right"],
"action": "replace_image",
"model_name": "grok-4-1-fast",
"audit_mode": "auto",
"audit_time": "2026-03-18 18:30:00",
"trace_id": "audit-post-10001-20260318183000"
}
```
### Field definition
| Field | Type | Required | Description |
|---|---|---:|---|
| id | int/string | yes | post id |
| audit_status | string | yes | `pass` / `reject` / `review` |
| is_pass | int | yes | `1` pass, `0` reject |
| risk_level | string | yes | `low` / `medium` / `high` |
| reason | string | yes | core reason |
| hit_rules | array | no | matched rules |
| hit_fields | array | no | matched fields |
| hit_positions | array | no | matched positions |
| action | string | no | suggested action |
| model_name | string | no | model used |
| audit_mode | string | no | `auto` / `manual_review_enabled` |
| audit_time | string | yes | decision time |
| trace_id | string | no | trace id |
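
The spec does not define how `trace_id` is built. Judging only from the request example (`audit-post-10001-20260318183000`), one plausible shape is a fixed prefix, the post id, and a compact timestamp. The Python sketch below assumes exactly that; `build_trace_id` is a hypothetical helper name.

```python
from datetime import datetime


def build_trace_id(post_id, audit_time: datetime) -> str:
    """Assumed format: audit-post-<id>-<YYYYMMDDHHMMSS>, inferred from the example."""
    return "audit-post-{}-{}".format(post_id, audit_time.strftime("%Y%m%d%H%M%S"))
```

Any format works as long as the same id is logged on both sides of the callback.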
### Recommended response
```json
{
"code": 0,
"message": "callback accepted"
}
```
## 3. Minimal callback mode
If the business side only wants the simplest result:
### Request example
```json
{
"id": 10001,
"is_pass": 0,
"reason": "存在广告或联系方式"
}
```
This mode is easy to integrate, but loses detailed audit evidence.
## 4. Comment moderation API
This API is for comment text and short text moderation.
### Request example
```json
{
"id": 90001,
"text": "加V了解一下 abc123456"
}
```
### Response example
```json
{
"code": 0,
"message": "ok",
"data": {
"id": 90001,
"audit_status": "reject",
"is_pass": 0,
"risk_level": "high",
"reason": "评论中存在联系方式和引流话术",
"hit_rules": ["contact_info", "advertising"],
"action": "reject"
}
}
```
### Comment field definition
| Field | Type | Required | Description |
|---|---|---:|---|
| id | int/string | yes | comment id |
| text | string | yes | comment content |
| audit_status | string | yes | `pass` / `reject` (`review` in balanced mode) |
| is_pass | int | yes | `1` pass, `0` reject |
| risk_level | string | yes | `low` / `medium` / `high` |
| reason | string | yes | rejection reason or pass reason |
| hit_rules | array | no | matched rule list |
| action | string | no | `publish` / `reject` |
## 5. Automatic audit policy
Use this when the user wants no human intervention.
### Strict full-auto mode
- clear clean content -> `pass`
- clear ad/contact content -> `reject`
- ambiguous but risky content -> `reject`
- media unreadable -> `reject`
- model request failed -> `reject`
- model returned invalid JSON -> `reject`
### Balanced mode
- clear clean content -> `pass`
- clear ad/contact content -> `reject`
- ambiguous or partially unreadable content -> `review`
- upstream/model/media temporary failure -> `review`
### Comment policy
- normal discussion -> `pass`
- any ad/contact signal -> `reject`
- uncertain disguised contact intent:
  - strict full-auto -> `reject`
  - balanced mode -> `review`
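
The two modes above amount to a lookup table. The Python sketch below is one way to express it; the signal names (`clean`, `ad_contact`, `ambiguous`, and so on) are hypothetical labels for the listed cases, and the fallback for unlisted signals is an assumption: strict mode fails closed, balanced mode sends to review.

```python
STRICT = "strict"
BALANCED = "balanced"

# Signal names are hypothetical labels for the cases listed in the policy.
_POLICY = {
    STRICT: {
        "clean": "pass",
        "ad_contact": "reject",
        "ambiguous": "reject",
        "media_unreadable": "reject",
        "model_failed": "reject",
        "invalid_json": "reject",
    },
    BALANCED: {
        "clean": "pass",
        "ad_contact": "reject",
        "ambiguous": "review",
        "media_unreadable": "review",
        "model_failed": "review",
    },
}


def decide(mode: str, signal: str) -> str:
    """Map (mode, signal) to an audit_status value."""
    table = _POLICY.get(mode)
    if table is None:
        raise ValueError("unknown mode: {}".format(mode))
    # Assumption: anything not listed fails closed in strict mode.
    fallback = "reject" if mode == STRICT else "review"
    return table.get(signal, fallback)
```

Keeping the policy as data makes it easy to diff when a project tightens or relaxes a rule.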
## 6. Error code spec
Recommended common error codes:
| Code | Meaning | Suggested handling |
|---:|---|---|
| 0 | success | normal processing |
| 4001 | invalid parameters | fix request and retry |
| 4002 | missing id | reject request |
| 4003 | empty content | reject request |
| 4004 | invalid media url | reject request or mark failed |
| 5001 | model request failed | strict mode reject; balanced mode manual review |
| 5002 | media download failed | strict mode reject; balanced mode manual review |
| 5003 | model returned invalid JSON | retry once or fail closed |
| 5004 | callback failed | retry callback with backoff |
| 5005 | callback timeout | retry callback with backoff |
### Error response example
```json
{
"code": 5002,
"message": "media download failed",
"data": {
"id": 10001
}
}
```
## 7. Action enum suggestion
Recommended `action` values:
- `publish`
- `reject`
- `replace_image`
- `replace_video`
- `trim_video`
- `rewrite_text`
- `block_comment`
- `review`
## 8. Suggested implementation flow
### Posts
1. pull pending posts
2. validate request fields
3. fetch title/content/images/videos
4. call model with strict moderation prompt
5. parse JSON result
6. normalize the result fields
7. map result to business callback contract
8. callback business system
9. retry callback if needed
### Comments
1. receive `id + text`
2. call model with comment-only moderation prompt
3. normalize the result fields
4. return structured result immediately
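
The "parse JSON result" and "normalize the result fields" steps can be sketched as follows. This is a Python illustration under stated assumptions, not the bundled PHP implementation: the helper names are hypothetical, invalid output is rejected as in strict full-auto mode, `is_pass` is derived from `audit_status` rather than trusted from the model, and the default `risk_level` is a guess.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # built at runtime so this sketch contains no literal fence


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if the model added it."""
    text = text.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        head, sep, rest = text.partition("\n")
        # Drop a first line that is empty or a language tag such as "json".
        if sep and not head.strip().startswith(("{", "[")):
            text = rest
    return text.strip()


def normalize_result(raw_text: str, item_id: Any) -> Dict[str, Any]:
    """Parse model output into callback fields; fail closed on bad output."""
    try:
        data = json.loads(strip_code_fence(raw_text))
        if not isinstance(data, dict):
            raise ValueError("result is not a JSON object")
        status = data["audit_status"]
        if status not in ("pass", "reject"):
            raise ValueError("unexpected audit_status")
    except (ValueError, KeyError):
        # Strict full-auto assumption: invalid JSON is treated as a rejection.
        return {"id": item_id, "audit_status": "reject", "is_pass": 0,
                "risk_level": "high", "reason": "model returned invalid JSON"}
    risk = data.get("risk_level")
    if risk not in ("low", "medium", "high"):
        risk = "low" if status == "pass" else "high"  # assumed default
    return {
        "id": item_id,
        "audit_status": status,
        "is_pass": 1 if status == "pass" else 0,  # derived, not trusted
        "risk_level": risk,
        "reason": str(data.get("reason", "")),
        "hit_rules": list(data.get("hit_rules") or []),
    }
```

In balanced mode the `except` branch would return `review` instead of `reject`.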
## 9. Security and reliability notes
- never hardcode API keys in source code or docs
- use environment variables such as `XAI_API_KEY`
- keep model `temperature = 0`
- enforce JSON-only output
- add timeout and connect timeout
- log `id`, `trace_id`, result, reason, and callback status
- if using strict full-auto mode, fail closed on upstream/model/media errors
- if callback is important, retry and keep a dead-letter record
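
The last note (retry the callback and keep a dead-letter record) might look like the Python sketch below. The function name, the `send` callable, the in-memory `dead_letters` list, and the doubling delay are all illustrative assumptions; a real deployment would persist the dead-letter record.

```python
import time
from typing import Any, Callable, Dict, List


def deliver_callback(
    send: Callable[[Dict[str, Any]], bool],
    payload: Dict[str, Any],
    dead_letters: List[Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the callback up to max_attempts times, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            if send(payload):
                return True
        except Exception:
            # Treated as a transient transport failure (5004/5005 in the error table).
            pass
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Keep the undelivered result so it can be replayed later.
    dead_letters.append(payload)
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting.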
## 10. Prompt contract suggestion
Recommended system contract:
```text
You are a strict moderation assistant. Review title, content, images, and videos. Block ads and contact information. Respect whitelist rules. Return JSON only. In full-auto mode, reject ambiguous risky content instead of asking for manual review.
```
Recommended user payload shape:
```json
{
"id": 10001,
"title": "xx",
"content": "xxx",
"imgs": ["https://..."],
"videos": ["https://..."],
"whitelist": [],
"custom_rules": [],
"output": "json"
}
```
FILE:references/install-and-usage.md
# Install and Usage
Use this reference when the user asks how to install, import, share, or operate the `post-content-moderation` skill.
## Files
- Skill folder: `skills/post-content-moderation`
- Packaged file: `skills/dist/post-content-moderation.skill`
## Installation ideas
Choose the method that matches the user's environment:
### Method 1: copy the unpacked folder into a workspace
Place the folder under a workspace skill directory such as:
```text
skills/post-content-moderation
```
Required file structure:
```text
post-content-moderation/
├── SKILL.md
├── references/
└── scripts/
```
### Method 2: distribute the packaged `.skill` file
Share this file:
```text
skills/dist/post-content-moderation.skill
```
Import it using the target environment's skill import/install flow.
## What this skill does after installation
- reviews title and body with a default ad/contact blocking policy
- can be used as a moderation prompt/policy skill for multimodal review tasks
- supports whitelist exceptions
- supports custom project rules
- supports strict full-auto mode and balanced mode
- can return human-readable output or JSON output
- includes optional PHP demo scripts for remote pull + callback workflows
## Important safety note
This skill is **not** a purely local moderation pack if you use the bundled PHP demos.
The demo scripts can:
- call external model APIs
- pull pending moderation items from a remote service
- send moderation results to a remote callback service
- expose text fields and media URLs to third-party services depending on your integration
Also note:
- the bundled media inspector is currently a placeholder only
- image/video moderation claims require your own verified OCR / QR / ASR / frame extraction pipeline before production use
- you should not present this demo as a complete media-audit implementation without those additions
## Recommended operating pattern
Provide moderation input in this shape:
```text
审核模式:严格全自动 / 平衡模式
标题:
正文:
图片:<number / description / attached images>
视频:<description / transcript / key frames>
白名单:
自定义规则:
输出要求:普通文本 / JSON
```
## Suggested rollout checklist
Before using this skill in production, confirm:
- whitelist is defined narrowly
- callback contract is fixed
- strict full-auto vs balanced mode is chosen
- media URLs are readable by the audit service
- timeout / retry / fail-close policy is agreed
- sample pass/reject cases were tested
- outbound hosts are allowlisted
- secrets are injected only through environment variables
- a dry-run path exists before callback writes are enabled
- operators know which fields may be sent to external model providers
## Example request
```text
请审核这个帖子。
审核模式:严格全自动
标题:副业分享
正文:最近发现一个不错的项目
图片:第1张图右下角有二维码
视频:无
白名单:无
输出要求:给结论+原因+处理建议
```
## Example response
```text
审核结果:拒绝
风险等级:高
审核范围:标题、正文、图片
命中字段:图片
命中位置:第1张图片右下角
命中规则:联系方式
原因:图片中二维码构成站外引流风险。
处理建议:删除或替换该图片。
```
## Deployment note
If the user wants team-wide consistency, keep one shared base rule set and define project-specific whitelist entries separately, instead of changing the default moderation policy every time.
FILE:references/php-demo-suite.md
# PHP Demo Suite
Use this reference when the user wants runnable PHP demo files instead of only a single client class.
## Bundled demo scripts
- `scripts/config.php` — centralized config
- `scripts/moderation_support.php` — object-oriented support classes
- `scripts/php_xai_client_example.php` — shared moderation client class
- `scripts/pull_pending_posts.php` — pull pending post list from business API
- `scripts/audit_posts.php` — pull -> audit -> callback full flow
- `scripts/callback_audit_result.php` — callback one moderation result from stdin JSON
- `scripts/audit_comment.php` — moderate one comment from stdin JSON
## Code style
This version follows these conventions:
- PHP 7.3 compatible syntax
- arrays use `[]`
- prefer object-oriented structure over procedural helpers
- keep entry scripts thin
- keep config and support logic separate
- add small infrastructure-style utility classes for maintainability
- add DTO-style wrappers for migration friendliness
- add service layer so command files stop owning business orchestration
## Main classes
### Constants
- `ModerationConst`
### Exceptions
- `ModerationException`
- `ModerationConfigException`
- `ModerationHttpException`
- `ModerationValidationException`
### Interfaces
- `ModerationLoggerInterface`
- `ModerationRuleProviderInterface`
- `ModerationMediaInspectorInterface`
### Utility / support classes
- `ModerationLogger`
- `ModerationTraceIdBuilder`
- `ModerationRetry`
- `ModerationCallbackRetryPolicy`
- `ModerationConfig`
- `ModerationRuleLoader`
- `ModerationHttpClient`
- `ModerationInput`
- `ModerationMediaInspectorPlaceholder`
- `ModerationResultBuilder`
- `ModerationResultFormatter`
- `ModerationAppContext`
### DTOs
- `ModerationPostDto`
- `ModerationCommentDto`
- `ModerationAuditResultDto`
### Services
- `PostModerationService`
- `CommentModerationService`
### Business client
- `PostContentModerationClient`
### Command classes
- `PullPendingPostsCommand`
- `AuditPostsCommand`
- `CallbackAuditResultCommand`
- `AuditCommentCommand`
## What became more engineered
Compared with the previous version, this version adds:
- service layer between commands and the moderation client
- command files that focus on IO instead of moderation orchestration
- a clearer split between transport/client logic and moderation workflow logic
## Suggested next migration path
When moving into a real Yaf project, map these parts like this:
- `ModerationConst` -> library/constants
- DTOs -> library/dto
- interfaces -> library/contracts
- `PostContentModerationClient` -> library/client
- services -> library/service
- `ModerationRuleProviderInterface` + implementation -> library/service or model-backed provider
- `AuditPostsCommand` -> cron/task entry
- `AuditCommentCommand` -> controller/service action
FILE:references/php-example-notes.md
# PHP Example Notes
Use this reference when the user wants to adapt the bundled PHP example to their own backend.
## Bundled files
- `scripts/config.php`
- `scripts/moderation_support.php`
- `scripts/php_xai_client_example.php`
## Style direction
This demo intentionally uses:
- PHP 7.3 compatible syntax
- short array syntax `[]`
- object-oriented structure
- small command entry files
- separated config and support classes
- a migration-friendly engineering skeleton
- DTO-style wrappers for post/comment/result data
- service layer between command and client
## Structure
### `config.php`
Keep configuration in one place:
- x.ai endpoint
- api key
- model name
- pull interface url
- callback url
- timeout settings
- host allowlists
- dry-run switch
- whitelist
- custom rules
### `moderation_support.php`
Keep shared support code in classes:
- constants
- custom exceptions
- logger interface + implementation
- trace id builder
- retry and callback retry policy
- config loader
- URL host allowlist validation
- rule provider interface + loader
- HTTP client
- stdin input reader
- media inspector interface + placeholder
- DTO classes
- result builder and formatter
- post/comment moderation services
- app context
### `php_xai_client_example.php`
Keep moderation transport responsibilities in one class:
- post moderation request
- comment moderation request
- callback request
- model response parsing
## Production advice
- do not store real API keys in git
- keep `temperature = 0`
- log raw model failures separately from content rejections
- if full-auto strict mode is enabled, fail closed on model/network/media errors
- if callback fails, retry with backoff instead of dropping the result silently
- replace placeholder media inspection with real image/video preprocessing
- keep outbound destinations on a strict allowlist
- use `dry_run` while testing pull/audit pipelines before enabling result callbacks
- if moving into Yaf, split the current support file into constants, dto, contracts, client, formatter, provider, and service directories
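
As an illustration of the "strict allowlist" advice, here is a minimal Python sketch of an outbound host check. It is not the bundled PHP validator; `is_host_allowed` is a hypothetical name, and requiring `https` plus an exact host match are assumptions chosen to be conservative.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_host_allowed(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Allow only https URLs whose host exactly matches an allowlist entry."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are refused.
        return False
    if parts.scheme != "https" or not host:
        return False
    allowed = {entry.strip().lower() for entry in allowed_hosts}
    return host.lower() in allowed
```

Exact matching avoids the common mistake of substring checks, where `api.x.ai.evil.example` would pass a test for `api.x.ai`.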
## Important limitation
The current bundled media inspector is only a placeholder. It can count attached media and return a simple status object, but it does **not** actually inspect image or video content. If you need real media moderation, implement OCR, QR detection, frame extraction, and ASR in your own pipeline before using the result for automatic production decisions.
FILE:references/prompt-templates.md
# Prompt Templates
Use this reference when the user needs reusable prompt text for post moderation, comment moderation, or strict JSON-only output.
## 1. Post moderation system prompt
```text
You are a strict moderation assistant.
Review title, content, images, and videos together.
Block advertising and contact information.
Treat disguised contact details as violations, including variants such as v, vx, wx, 薇, 微, 卫星, 扣扣, split digits, mixed symbols, Chinese numerals, and hidden text in media.
Respect whitelist rules and custom business rules.
If the workflow is fully automatic and content is ambiguous but risky, reject it instead of asking for manual review.
Return JSON only.
```
## 2. Post moderation user payload template
```json
{
"id": 10001,
"title": "xx",
"content": "xxx",
"imgs": ["https://..."],
"videos": ["https://..."],
"other": 0,
"whitelist": [
"官方客服微信 service_official,仅限售后"
],
"custom_rules": [
"普通帖子图片不允许出现二维码",
"视频不允许出现个人微信、手机号、群号"
],
"output": "json"
}
```
## 3. Post moderation JSON schema prompt
```text
Return JSON only with this schema:
{
"id": 10001,
"audit_status": "pass|reject",
"is_pass": 1,
"risk_level": "low|medium|high",
"reason": "string",
"hit_rules": ["advertising","contact_info"],
"hit_fields": ["title","content","images","videos"],
"hit_positions": ["title","image_1","video_end_card"],
"action": "publish|reject|replace_image|replace_video|trim_video|rewrite_text"
}
```
## 4. Comment moderation system prompt
```text
You are a strict comment moderation assistant.
Review comment text only.
Block advertising and contact information.
Treat disguised contact details as violations.
Respect whitelist rules and custom business rules.
If the workflow is fully automatic and content is ambiguous but risky, reject it.
Return JSON only.
```
## 5. Comment moderation user payload template
```json
{
"id": 90001,
"text": "加V了解一下 abc123456",
"whitelist": [],
"custom_rules": [
"评论出现微信号一律拒绝"
],
"output": "json"
}
```
## 6. Comment moderation JSON schema prompt
```text
Return JSON only with this schema:
{
"id": 90001,
"audit_status": "pass|reject",
"is_pass": 1,
"risk_level": "low|medium|high",
"reason": "string",
"hit_rules": ["advertising","contact_info"],
"action": "publish|reject"
}
```
## 7. Strict anti-formatting suffix
Append this when the model tends to add explanations outside JSON:
```text
Do not add markdown. Do not wrap JSON in code fences. Do not add any explanation before or after JSON.
```
## 8. Full-auto conservative rule suffix
Append this when no manual review is allowed:
```text
This workflow has no manual review. If content is ambiguous but has meaningful risk, return reject.
If media cannot be read completely, return reject.
```
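
The templates in this file can be combined into one request body. The Python sketch below assumes the chat-completions style shape used in the x.ai request example (`model`, `temperature`, `stream`, `messages`); `build_chat_request` is a hypothetical helper, and joining the system prompt, schema prompt, and suffixes with newlines is one reasonable choice rather than a requirement.

```python
import json
from typing import Any, Dict, List


def build_chat_request(
    model: str,
    system_prompt: str,
    schema_prompt: str,
    payload: Dict[str, Any],
    suffixes: List[str],
) -> Dict[str, Any]:
    """Combine prompt templates into a chat-completions style request body."""
    system_text = "\n".join([system_prompt, schema_prompt] + list(suffixes))
    return {
        "model": model,
        "temperature": 0,
        "stream": False,
        "messages": [
            {"role": "system", "content": system_text},
            # ensure_ascii=False keeps Chinese text readable in the prompt.
            {"role": "user", "content": json.dumps(payload, ensure_ascii=False)},
        ],
    }
```

The API key is deliberately absent from the body; it belongs in a request header read from an environment variable such as `XAI_API_KEY`.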
FILE:references/release-notes.zh-CN.md
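# Release Notes
## Version 1.1.1
This release is a security-hardening and documentation-correction update ahead of production integration. The focus is not on extending moderation rules but on reducing the risk of misuse, misconfiguration, and misunderstanding.
### Changes in this release
#### 1. Outbound host allowlist
Adds the allowlist / allowed_hosts configuration, which restricts the external addresses the moderation scripts may reach.
Applies to:
- the model API address
- the pending-content pull address
- the moderation result callback address
This helps avoid a configuration mistake sending moderation data to an unintended third-party address.
#### 2. Dry-run mode
Adds support for `MODERATION_DRY_RUN=1`.
When enabled:
- the pull and audit flows still run
- moderation results are not called back to the business system
- it is better suited to integration testing, acceptance, and canary validation
This lets you verify that the moderation pipeline is stable before deciding to write results back to the business system.
#### 3. Clarified network and data boundaries
The documentation now states explicitly that:
- this skill is not a purely local rule engine
- when the demo scripts are used, post text, comment text, rule configuration, image URLs, video URLs, and similar data may be sent to an external moderation service
- the data compliance scope and the desensitization strategy should be settled before going live
#### 4. Corrected description of media moderation capability
The earlier documentation could be read as saying that the bundled scripts already provide complete image/video moderation.
It is now explicit that:
- the bundled `ModerationMediaInspector` is only a placeholder
- it returns basic status only and performs no real OCR, QR code recognition, video frame extraction, or speech recognition
- if the business needs automatic image/video moderation, it must build and verify its own media processing pipeline
#### 5. Tightened model prompts
Adds constraints that keep the model from "pretending to have seen image or video content" when evidence is insufficient.
The principles are:
- conclude only when there is evidence
- without evidence, do not fabricate media hits
- in automated scenarios, handle content that lacks evidence but carries risk conservatively
#### 6. PHP 7.3 compatibility retained
These changes remain compatible with PHP 7.3 and suit integration into existing legacy or Yaf projects.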
---
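## Usage recommendations
### Recommended integration order
1. Configure a strict `allowed_hosts` first
2. Enable `MODERATION_DRY_RUN=1` first
3. Run the full pull → audit → callback pipeline with sample data
4. Confirm that the business side accepts the scope of data processed by the external model
5. Confirm whether images/videos really have an independent media moderation capability
6. Then disable dry-run and switch to real callbacks
### Not recommended
- using arbitrary URLs as pull/callback addresses by default
- sending complete sensitive business data without desensitization
- treating placeholder media inspection results as "real image/video moderation completed"
- going live with fully automatic blocking without a whitelist and an error policy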
---
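## Who this is for
This version is a better fit for:
- teams that have a business moderation API and need pull + callback
- projects that want LLM-assisted text moderation
- scenarios that need a conservative rollout with integration testing before launch
This version should not yet be described as:
- a complete local multimodal moderation engine
- an out-of-the-box video OCR / ASR / frame-extraction moderation solution
- a full-media moderation system ready for production with no additions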
---
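## Upgrade summary
If you already use an earlier version, add these settings after upgrading:
- `xai.allowed_hosts`
- `moderation.allowed_hosts`
- `MODERATION_DRY_RUN`
And reconfirm:
- pull_url
- callback_url
- the scope of outbound data
- whether media moderation is actually in place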
FILE:references/rule-template.md
# Rule Template
Use this template when a user wants to customize moderation rules for a specific project, community, or platform.
## Minimal template
```text
审核对象:标题 / 正文 / 图片 / 视频 / 全部
白名单:
- 允许出现的品牌、账号、域名、术语、官方二维码、指定场景
默认拦截:
- 广告
- 联系方式
新增禁止词:
-
新增禁止场景:
-
明确放行场景:
-
媒体规则:
- 图片是否允许二维码
- 视频是否允许口播联系方式
- 是否允许官方水印
- 是否允许站外引导文案
输出格式:
- 仅给结论
- 给结论+原因
- 给结论+原因+改写建议
- 返回 JSON 结果
冲突处理:
- 优先白名单 / 优先拦截 / 冲突时人工复核
```
## Example
```text
审核对象:全部
白名单:
- Apple、华为
- 官方域名 example.com
- 官方客服微信 service_official,仅限售后
- 官方售后二维码,仅限售后说明图
默认拦截:
- 广告
- 联系方式
新增禁止词:
- 返利
- 代发
- 保过
新增禁止场景:
- 招代理
- 引导站外交易
- 图中出现个人联系方式
- 视频结尾引导扫码
明确放行场景:
- 合法售后说明
- 官方公告
媒体规则:
- 普通帖子图片不允许出现二维码
- 视频不允许出现个人微信、手机号、群号
- 官方账号水印可保留
输出格式:
- 给结论+原因+改写建议
冲突处理:
- 冲突时人工复核
```
## Normalization guidance
Convert the user's free-form rules into a short reusable policy. Keep rules concrete and narrow. Avoid broad whitelisting such as "所有品牌都允许" ("all brands are allowed") unless the user explicitly insists.
Audit legacy PHP projects, especially Yaf-based PHP 7.3 codebases, for architecture issues, security risks, performance problems, compatibility risks, and ma...
---
name: yaf-php-audit
version: 1.1.0
description: Audit legacy PHP projects, especially Yaf-based PHP 7.3 codebases, for architecture issues, security risks, performance problems, compatibility risks, and maintainability concerns. Works best with Yaf-style projects, but is also useful for first-pass triage of many traditional PHP codebases with similar structure. Use when reviewing a PHP/Yaf project, producing a structured code audit report, or triaging many similar projects with consistent audit dimensions.
emoji: 🔍
user-invocable: true
homepage: https://github.com/XavierMary56/OmniPublish
metadata:
  openclaw:
    requires:
      bins:
        - bash
        - grep
        - find
---
# Yaf PHP Audit
Audit a legacy PHP project with a focus on Yaf-style structure, PHP 7.3 compatibility, and practical risk discovery.
## Overview
Use this skill to produce a structured audit report for a single PHP project or to apply the same audit dimensions across many similar projects. Prefer evidence from code over assumptions. Prefer small, realistic remediation advice over broad refactors.
## Audit Workflow
1. Confirm the target project directory.
2. Identify the main entrypoints under `public`.
3. Read the request flow from entrypoint to controller, then to library/model/config where relevant.
4. Summarize the current structure before listing risks.
5. Focus on security, performance, reliability, and maintainability issues that have operational impact.
6. Distinguish confirmed issues from suspected risks.
7. Use `references/checklist.md` as the audit checklist and report skeleton.
8. Use `scripts/scan_project.sh` to perform a first-pass static scan before deeper manual review.
## Scope
Prioritize review of:
- `public`
- `application/controllers`
- `application/models`
- `application/library`
- `application/modules`
- `conf`
If present, also inspect:
- callback handlers
- payment flows
- cron/task scripts
- login/auth logic
- Redis usage
- external HTTP/RPC calls
## What to Check
### Structure and Architecture
Check for:
- mixed or unclear entry responsibilities
- controllers containing heavy business logic
- duplicated logic across controllers/modules
- scattered DB/Redis/HTTP calls without stable encapsulation
- confusing naming or inconsistent layering
### Security
Check for:
- SQL injection risk
- unsafe SQL string concatenation
- missing input validation
- weak callback verification
- upload/file handling risk
- dangerous functions such as `eval`, `exec`, `shell_exec`, `system`, `passthru`
- missing `timeout` / `connect_timeout` in external HTTP calls
- auth / permission bypass risk
- sensitive operations without sufficient logging
### Performance
Check for:
- N+1 query patterns
- queries inside loops
- excessive `SELECT *`
- missing cache abstraction
- Redis misuse
- expensive Redis commands such as `sdiff` / `sdiffstore`
- large batch tasks without chunking
- pagination / sorting patterns with full-scan risk
### Reliability and Consistency
Check for:
- idempotency problems in callbacks or retry paths
- race conditions around status updates
- missing transaction boundaries
- inconsistent cache update strategy
- fragile cron/task implementations
- missing failure logging on critical paths
### PHP 7.3 Compatibility
Flag syntax or patterns that require newer PHP versions. Do not recommend PHP 7.4+ or 8.x-only syntax unless you clearly mark it as incompatible with PHP 7.3.
### Business High-Risk Areas
Always pay extra attention to:
- payment
- callback / notify
- user status changes
- login/session
- risk-control logic
- scheduled jobs
- bulk processing
- data synchronization
## Reporting Format
Structure the audit output as:
1. target project
2. audit conclusion
3. project structure overview
4. current logic summary
5. risk findings
6. risk level and priority
7. remediation suggestions
8. verification suggestions
For each important finding, include:
- file path
- problem summary
- impact
- recommendation
## Bulk Audit Guidance
When many similar projects exist:
1. Apply the same checklist to every project.
2. Generate a short per-project summary first.
3. Rank projects by risk level and business criticality.
4. Spend the most attention on payment, callback, auth, Redis, SQL, and task-heavy projects.
5. Avoid writing long narrative reports for every low-risk project.
## Resource Usage
- Read `references/checklist.md` when you need the detailed checklist, risk grading guidance, or report template.
- Run `scripts/scan_project.sh <project-root> [output-file]` for a first-pass scan that highlights likely hotspots and optionally writes a text report to disk.
- Run `scripts/scan_workspace.sh <workspace-root> [output-dir]` to batch-scan many sibling projects and generate a summary table, high-risk list, and per-project text reports.
## Output Preference
Be concise but complete. The report should help humans decide what to fix first.
FILE:README.md
# yaf-php-audit
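An OpenClaw skill for **auditing legacy PHP / Yaf codebases**, tuned for **PHP 7.3 + Yaf** style projects.
It fits the following scenarios:
- running a first-pass code audit on a single PHP/Yaf project
- batch-triaging a set of legacy projects with similar structure
- quickly locating hotspots in high-risk areas such as payment, callbacks, tasks, Redis, SQL, and permissions
- providing a shared checklist, scripts, and report output for deeper manual audits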
---
## Directory structure
```text
yaf-php-audit/
├── SKILL.md
├── README.md
├── references/
│   └── checklist.md
└── scripts/
    ├── scan_project.sh
    └── scan_workspace.sh
```
---
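## Features
### 1. Single-project audit
Script: `scripts/scan_project.sh`
Capabilities:
- check whether the project structure matches the common Yaf layout
- scan entrypoint files
- scan for dangerous functions
- scan for raw input usage
- scan for suspected SQL concatenation
- scan for callback / payment / task keywords
- scan for suspected PHP 7.4+ / 8.x syntax
- output a text report
### 2. Batch project audit
Script: `scripts/scan_workspace.sh`
Capabilities:
- scan every first-level project directory in the workspace
- generate one text report per project
- generate the `summary.csv` summary table
- generate the human-readable `summary.md` summary
- generate the `high-risk.txt` list of high-risk projects
### 3. Audit standard
Reference file: `references/checklist.md`
It standardizes:
- structure check dimensions
- security check dimensions
- performance check dimensions
- reliability check dimensions
- risk grading criteria
- the report template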
---
## Deployment
### Option 1: deploy as a directory
Copy the whole directory into the OpenClaw workspace skills directory:
```bash
mkdir -p ~/.openclaw/workspace/skills
cp -r yaf-php-audit ~/.openclaw/workspace/skills/
```
The resulting directory should look like:
```text
~/.openclaw/workspace/skills/yaf-php-audit/
```
### Option 2: distribute as a package
If you already have the `.skill` package, import or distribute it using your OpenClaw installation flow.
Example package name:
```text
yaf-php-audit.skill
```
### Option 3: publish to ClawHub
To share the skill with other OpenClaw users, publishing to ClawHub is recommended.
Log in first:
```bash
clawhub login
clawhub whoami
```
Then publish:
```bash
clawhub publish /path/to/yaf-php-audit \
--slug yaf-php-audit \
--name "Yaf PHP Audit" \
--version 1.1.0 \
--changelog "Initial public release: single-project audit reports, batch workspace scan, summary outputs, high-risk list, checklist, and README."
```
---
## Usage
### Single-project scan
Outputs text in a "project audit report" format.
Print directly to the terminal:
```bash
bash scripts/scan_project.sh /path/to/project
```
Write to a text file:
```bash
bash scripts/scan_project.sh /path/to/project /path/to/output/project-report.txt
```
Example:
```bash
bash scripts/scan_project.sh /mnt/d/Users/Public/php20250819/2026www/51dm /mnt/d/Users/Public/php20250819/2026www/audit-output/51dm.txt
```
### Batch-scan an entire workspace
```bash
bash scripts/scan_workspace.sh /path/to/workspace /path/to/audit-output
```
Example:
```bash
bash scripts/scan_workspace.sh /mnt/d/Users/Public/php20250819/2026www /mnt/d/Users/Public/php20250819/2026www/audit-output
```
---
## Output
The batch scan output directory looks like this by default:
```text
audit-output/
├── summary.csv
├── summary.md
├── high-risk.txt
└── projects/
    ├── 51dm.txt
    ├── project-a.txt
    └── project-b.txt
```
### `projects/*.txt`
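One first-pass scan report per project, containing:
- project name / path / generation time
- directory structure check
- entrypoint file list
- dangerous function hits
- raw input hits
- suspected SQL concatenation
- expensive Redis commands
- external call clues
- callback / payment / task keywords
- suspected newer PHP syntax
### `summary.csv`
Suited to sorting, filtering, and importing into spreadsheet tools.
Fields: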
- `project`
- `risk_level`
- `dangerous_hits`
- `raw_input_hits`
- `callback_hits`
- `payment_hits`
- `task_hits`
- `php_new_syntax_hits`
- `notes`
### `summary.md`
Suited to quick manual review.
### `high-risk.txt`
Lists the projects coarsely graded as `high`, so they can be prioritized for manual review.
---
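## Risk levels
This is a **coarse first-pass grading**, not a final audit conclusion.
- `high`: high complexity / highly sensitive business / several high-risk traits combined
- `medium`: a clear risk surface exists; manual review is recommended
- `low`: few first-pass hits, which does not mean the project is safe
Recommended flow:
1. Start with `summary.csv` or `summary.md`
2. Handle `high-risk.txt` first
3. Then run a deep manual audit on the high-risk projects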
---
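## Applicable scope
Better suited to:
- legacy PHP 7.3 projects
- Yaf-style projects
- first-pass static audits of traditional PHP projects
- business code copied across many projects
- business systems centered on payment, callbacks, tasks, and content distribution
Not a direct replacement for:
- penetration testing
- a real static analyzer
- full data-flow security analysis
- a final manual audit conclusion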
---
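## Notes
- The script output of this skill is a **first-pass scan result**, not a final vulnerability verdict.
- Hits must be confirmed manually against the code context.
- Projects with many hits in payment, callbacks, login state, or task systems should be deep-audited first.
- In batch scenarios, do not start by writing long reports for every project; rank by risk first.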
---
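## Recommended workflow
### Single project
1. Run `scan_project.sh`
2. Read the text report
3. Deep-read the key files indicated by the hits
4. Write the formal audit conclusion
### Multiple projects
1. Run `scan_workspace.sh`
2. Read `summary.csv`
3. Sort so that high-risk projects come first
4. Deep-audit the top few projects manually
5. Write the formal summary report last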
---
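## Suggestions for further development
Possible future enhancements:
- output project reports in Markdown
- automatically extract the list of payment/callback-related files
- add a dedicated scan for callback signature verification
- add a dedicated scan for transaction/idempotency keywords
- add SQL/Redis hotspot statistics
- add rule sets for specific project frameworks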
FILE:references/checklist.md
# Yaf PHP Audit Checklist
Use this checklist to keep reviews consistent across many similar PHP/Yaf projects.
## 1. Target Project
- Confirm project root.
- Confirm whether the project is really Yaf-style.
- Confirm whether the runtime target is PHP 7.3.
- Check whether there is a project-level `AGENTS.md` with extra constraints.
## 2. Structure Overview
Record:
- main directories
- `public` entrypoints
- controller/model/library/module layout
- config files
- task/callback/payment-related directories if present
## 3. Entry and Request Flow
Review:
- `public/api`
- `public/www`
- `public/web`
- `public/adm`
- `public/nav`
- `public/pwa`
Check:
- where bootstrap happens
- how routing reaches controllers
- whether there are custom dispatch rules
- whether entrypoints have mixed responsibilities
## 4. Security Checks
Look for:
- raw SQL concatenation
- direct use of `$_GET`, `$_POST`, `$_REQUEST` without validation
- dangerous functions: `eval`, `exec`, `shell_exec`, `system`, `passthru`, `unserialize`
- file upload and path handling risks
- callback signature or source verification weaknesses
- auth/permission checks that appear easy to bypass
- secrets or credentials committed to config/code
- hardcoded passwords / api_key / token literal strings (`password = 'value'`, `api_key = "value"`)
- login, logout, session, auth, permission, role, privilege keyword clusters (confirm auth coverage)
## 5. Performance Checks
Look for:
- database access inside loops
- repeated model queries in loops
- `SELECT *` on large tables
- missing pagination limits
- Redis anti-patterns
- blocked external calls on request path
- no timeout / connect_timeout on HTTP requests
- `foreach` / `for` / `while` containing direct model or DB call (N+1 pattern)
- `static $cache = []` or `static $var = array()` used as in-process cache antipattern
## 6. Reliability Checks
Look for:
- callback re-entry and idempotency problems
- status changes without transaction protection
- cache/database inconsistency windows
- weak retry behavior
- missing error handling and failure logs
- fragile batch and task scripts
## 7. Compatibility Checks
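Look for code that assumes PHP 7.4+ or 8.x features, including:
- typed properties (7.4)
- arrow functions `fn` (7.4)
- null coalescing assignment `??=` (7.4)
- match expressions (8.0)
- constructor property promotion (8.0)
- union types (8.0)
- attributes (8.0)
- nullsafe operator (8.0)
- enums (8.1)
- readonly properties (8.1)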
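
A few of these checks can be approximated with regular expressions. The Python sketch below is a rough heuristic only, not the logic of `scripts/scan_project.sh`: the patterns and the `find_newer_syntax` name are assumptions, they cover only part of the list, and they can fire inside strings or comments, so every hit still needs manual confirmation.

```python
import re
from typing import List, Tuple

# (label, pattern) pairs. Heuristics only: they also fire inside strings and comments.
NEWER_SYNTAX = [
    ("match expression (8.0)", re.compile(r"(?<![>$:\w])match\s*\(")),
    ("nullsafe operator (8.0)", re.compile(r"\?->")),
    ("attribute (8.0)", re.compile(r"^[ \t]*#\[", re.MULTILINE)),
    ("enum (8.1)", re.compile(r"^[ \t]*enum\s+\w+", re.MULTILINE)),
    ("readonly property (8.1)", re.compile(r"\b(?:public|protected|private)\s+readonly\b")),
]


def find_newer_syntax(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line_number, label) pairs for every heuristic hit."""
    hits = []
    for label, pattern in NEWER_SYNTAX:
        for m in pattern.finditer(source):
            line_no = source.count("\n", 0, m.start()) + 1
            hits.append((line_no, label))
    return sorted(hits)
```

The lookbehind on `match` skips calls such as `preg_match(` and `$obj->match(`, which are valid on PHP 7.3.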
## 8. High-Risk Business Flows
Review with extra care:
- payment
- callback / notify
- login/session
- user status changes
- risk-control logic
- scheduled tasks
- batch processing
- data sync jobs
## 9. Risk Grading
### High
- exploitable security risk
- payment/callback/login state inconsistency
- clear data corruption or major reliability risk
- obvious full-scan / N+1 / blocking external dependency on critical path
- hardcoded credential (password, api_key, secret) confirmed in source code
### Medium
- significant maintainability problem with operational impact
- probable performance issue under load
- weak validation or weak logging on sensitive path
- structure issues that increase defect risk
### Low
- style issue with little operational impact
- low-confidence suspicion with limited evidence
- documentation/naming issues only
## 10. Batch Audit Workflow
When auditing many similar projects:
1. Run `scripts/scan_workspace.sh <workspace-root> [output-dir]`.
2. Review `summary.csv` or `summary.md` first.
3. Sort by `risk_level`, then prioritize projects tagged `payment-heavy`, `callback-heavy`, `task-heavy`, or `dangerous-fns`.
4. Deep-read the highest-risk projects first.
5. Keep the per-project reports as evidence snapshots, not final audit conclusions.
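
The ordering in step 3 can be sketched as a small shell helper. This is a minimal sketch, assuming the `summary.csv` layout written by `scan_workspace.sh` (header row, `risk_level` in column 2, semicolon-separated tags in the last column) and project names without commas; `rank_projects` is a hypothetical name, not part of the skill's scripts.

```bash
# rank_projects: hypothetical helper, not part of the skill's scripts.
# Reads a summary.csv and prints "risk_level,project,notes" rows ordered
# high -> medium -> low, with priority-tagged projects first within each level.
rank_projects() {
  local csv="$1"
  awk -F',' 'NR > 1 {
    # Map the risk level to a numeric sort key; unknown values such as scan_failed sort last.
    rank = ($2 == "high") ? 0 : ($2 == "medium") ? 1 : ($2 == "low") ? 2 : 3
    # Projects carrying one of the priority tags sort ahead of the rest.
    tagged = ($NF ~ /(payment-heavy|callback-heavy|task-heavy|dangerous-fns)/) ? 0 : 1
    printf "%d,%d,%s,%s,%s\n", rank, tagged, $2, $1, $NF
  }' "$csv" | sort -t',' -k1,1n -k2,2n -k4,4 | cut -d',' -f3-
}
```

For example, `rank_projects audit-output/summary.csv | head -n 10` would give a first reading list for the deep-read step.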
## 11. Report Template
### Target project
- project name:
- path:
- type:
### Audit conclusion
- one-paragraph summary
### Project structure overview
- entrypoints:
- main modules:
- notable directories:
### Current logic summary
- request flow:
- key business areas:
### Risk findings
For each finding record:
- level:
- file:
- summary:
- impact:
- recommendation:
- confidence: confirmed / suspected
### Priority suggestions
- P0:
- P1:
- P2:
### Verification suggestions
- syntax check
- minimal functional path
- callback retry path
- repeated request/idempotency path
- SQL/Redis hotspot validation if applicable
FILE:scripts/scan_project.sh
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 1 ]; then
echo "Usage: $0 <project-root> [output-file]" >&2
exit 1
fi
project_root="$1"
output_file="${2:-}"
if [ ! -d "$project_root" ]; then
echo "[ERROR] project root not found: $project_root" >&2
exit 1
fi
project_name="$(basename "$project_root")"
tmp_report="$(mktemp)"
trap 'rm -f "$tmp_report"' EXIT
collect_raw() {
cd "$project_root"
echo "=== structure ==="
for path in public application/controllers application/models application/library application/modules conf; do
if [ -e "$path" ]; then
echo "[OK] $path"
else
echo "[MISSING] $path"
fi
done
echo
echo "=== entrypoints ==="
find public -maxdepth 2 -type f 2>/dev/null | sed 's#^#- #' | sort || true
echo
print_hits() {
label="$1"
pattern="$2"
echo "=== $label ==="
grep -RInE --include='*.php' --include='*.inc' --include='*.phtml' "$pattern" application public conf 2>/dev/null | sed -n '1,120p' || true
echo
}
print_hits "dangerous functions" '(^|[^a-zA-Z0-9_])(eval|exec|shell_exec|system|passthru|proc_open|popen|unserialize)[[:space:]]*\('
print_hits "raw superglobals" '\$_(GET|POST|REQUEST|COOKIE|FILES)\b'
print_hits "sql concatenation suspects" '(select|update|delete|insert)[^;]*\.[^;]*\$|->query\s*\(|->execute\s*\('
print_hits "select star" 'select\s+\*'
print_hits "redis high-cost commands" '(sdiff|sdiffstore)[[:space:]]*\('
print_hits "external http without obvious timeout clues" '(curl_init|curl_setopt|file_get_contents\s*\(|request\s*\()'
print_hits "callback and notify keywords" '(callback|notify)'
print_hits "payment keywords" '(pay|order|refund|charge)'
print_hits "task and cron keywords" '(cron|task|queue|worker)'
print_hits "php 7.4+ syntax suspects" '(match[[:space:]]*\(|fn[[:space:]]*\(|\?->|readonly|enum[[:space:]]+|public[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=)'
print_hits "hardcoded credentials suspects" '(password|passwd|pwd|api_key|secret|token)\s*=\s*['"'"'"][^'"'"'"]{4,}'
print_hits "login and auth keywords" '(login|logout|session|token|auth|permission|role|privilege)'
print_hits "curl without explicit timeout check" 'curl_init\s*\('
print_hits "static array cache antipattern" 'static\s+\$[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*(array\s*\(|\[)'
print_hits "loop with db suspects" '(foreach|for\s*\(|while\s*\()[^{]*\{[^}]*(->find|->where|->query|->select|->first|->get|->fetch)'
}
count_hits() {
local file="$1"
local section="$2"
awk -v section="$section" '
$0 == "=== " section " ===" { in_section=1; next }
/^=== / && in_section { exit }
in_section && NF { count++ }
END { print count+0 }
' "$file"
}
print_section() {
local file="$1"
local section="$2"
awk -v section="$section" '
$0 == "=== " section " ===" { in_section=1; next }
/^=== / && in_section { exit }
in_section { print }
' "$file"
}
calc_risk() {
local dangerous="$1"
local raw="$2"
local callback="$3"
local payment="$4"
local task="$5"
local phpnew="$6"
local hardcoded="$7"
if [ "$dangerous" -ge 5 ] || { [ "$callback" -ge 10 ] && [ "$payment" -ge 10 ]; } || [ "$task" -ge 15 ] || [ "$hardcoded" -ge 3 ]; then
echo "high"
elif [ "$dangerous" -ge 1 ] || [ "$callback" -ge 3 ] || [ "$payment" -ge 3 ] || [ "$raw" -ge 20 ] || [ "$phpnew" -ge 1 ] || [ "$hardcoded" -ge 1 ]; then
echo "medium"
else
echo "low"
fi
}
report_note() {
local risk="$1"
local callback="$2"
local payment="$3"
local task="$4"
local dangerous="$5"
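if [ "$risk" = "high" ]; then
echo "High-priority project: manually deep-review the payment, callback, task, and dangerous-function hits."
elif [ "$callback" -ge 3 ] || [ "$payment" -ge 3 ]; then
echo "Business-sensitive flows detected: review callback and payment status changes first."
elif [ "$task" -ge 5 ] || [ "$dangerous" -ge 1 ]; then
echo "Task flows or sensitive functions are in use: add a manual review."
else
echo "Few first-pass hits: keep tracking as a low-priority project."
fi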
}
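# Run in a subshell so the cd inside collect_raw does not affect how a relative output path is resolved later.
(collect_raw) > "$tmp_report"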
dangerous="$(count_hits "$tmp_report" "dangerous functions")"
raw="$(count_hits "$tmp_report" "raw superglobals")"
sqlsus="$(count_hits "$tmp_report" "sql concatenation suspects")"
selectstar="$(count_hits "$tmp_report" "select star")"
redisrisk="$(count_hits "$tmp_report" "redis high-cost commands")"
httprisk="$(count_hits "$tmp_report" "external http without obvious timeout clues")"
callback="$(count_hits "$tmp_report" "callback and notify keywords")"
payment="$(count_hits "$tmp_report" "payment keywords")"
task="$(count_hits "$tmp_report" "task and cron keywords")"
phpnew="$(count_hits "$tmp_report" "php 7.4+ syntax suspects")"
hardcoded="$(count_hits "$tmp_report" "hardcoded credentials suspects")"
authkw="$(count_hits "$tmp_report" "login and auth keywords")"
curlinit="$(count_hits "$tmp_report" "curl without explicit timeout check")"
staticcache="$(count_hits "$tmp_report" "static array cache antipattern")"
loopdb="$(count_hits "$tmp_report" "loop with db suspects")"
risk="$(calc_risk "$dangerous" "$raw" "$callback" "$payment" "$task" "$phpnew" "$hardcoded")"
summary_note="$(report_note "$risk" "$callback" "$payment" "$task" "$dangerous")"
generate_report() {
echo "# Project Audit Report"
echo
echo "## 1. Project Information"
echo "- Project name: $project_name"
echo "- Project path: $project_root"
echo "- Generated at: $(date '+%Y-%m-%d %H:%M:%S %z')"
echo "- First-pass risk level: $risk"
echo
echo "## 2. Audit Conclusion"
echo "- Conclusion: $summary_note"
echo
echo "## 3. Structure Overview"
print_section "$tmp_report" "structure"
echo
echo "## 4. Entrypoints"
print_section "$tmp_report" "entrypoints"
echo
echo "## 5. Risk Statistics"
echo "- dangerous functions: $dangerous"
echo "- raw superglobals: $raw"
echo "- sql concatenation suspects: $sqlsus"
echo "- select star: $selectstar"
echo "- redis high-cost commands: $redisrisk"
echo "- external http clues: $httprisk"
echo "- callback and notify keywords: $callback"
echo "- payment keywords: $payment"
echo "- task and cron keywords: $task"
echo "- php 7.4+/8.x syntax suspects: $phpnew"
echo "- hardcoded credentials suspects: $hardcoded"
echo "- login and auth keywords: $authkw"
echo "- curl without explicit timeout check: $curlinit"
echo "- static array cache antipattern: $staticcache"
echo "- loop with db suspects: $loopdb"
echo
echo "## 6. Risk Details"
echo
echo "### 6.1 dangerous functions"
print_section "$tmp_report" "dangerous functions"
echo
echo "### 6.2 raw superglobals"
print_section "$tmp_report" "raw superglobals"
echo
echo "### 6.3 sql concatenation suspects"
print_section "$tmp_report" "sql concatenation suspects"
echo
echo "### 6.4 select star"
print_section "$tmp_report" "select star"
echo
echo "### 6.5 redis high-cost commands"
print_section "$tmp_report" "redis high-cost commands"
echo
echo "### 6.6 external http without obvious timeout clues"
print_section "$tmp_report" "external http without obvious timeout clues"
echo
echo "### 6.7 callback and notify keywords"
print_section "$tmp_report" "callback and notify keywords"
echo
echo "### 6.8 payment keywords"
print_section "$tmp_report" "payment keywords"
echo
echo "### 6.9 task and cron keywords"
print_section "$tmp_report" "task and cron keywords"
echo
echo "### 6.10 php 7.4+ syntax suspects"
print_section "$tmp_report" "php 7.4+ syntax suspects"
echo
echo "### 6.11 hardcoded credentials suspects"
print_section "$tmp_report" "hardcoded credentials suspects"
echo
echo "### 6.12 login and auth keywords"
print_section "$tmp_report" "login and auth keywords"
echo
echo "### 6.13 curl without explicit timeout check"
print_section "$tmp_report" "curl without explicit timeout check"
echo
echo "### 6.14 static array cache antipattern"
print_section "$tmp_report" "static array cache antipattern"
echo
echo "### 6.15 loop with db suspects"
print_section "$tmp_report" "loop with db suspects"
echo
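echo "## 7. Audit Recommendations"
echo "- Manually review payment, callback, login-state, and task flows first."
echo "- Verify that external HTTP calls set timeout / connect_timeout."
echo "- Trace call chains to confirm idempotency, transaction boundaries, N+1 queries, and cache consistency."
echo
echo "## 8. Notes"
echo "- This is a first-pass static audit report, intended for risk screening and prioritization."
echo "- Hits are not confirmed vulnerabilities; high-risk projects still need a manual deep review."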
}
if [ -n "$output_file" ]; then
mkdir -p "$(dirname "$output_file")"
generate_report > "$output_file"
printf '[OK] report written: %s\n' "$output_file"
else
generate_report
fi
FILE:scripts/scan_workspace.sh
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 1 ]; then
echo "Usage: $0 <workspace-root> [output-dir]" >&2
exit 1
fi
workspace_root="$1"
output_dir="${2:-$workspace_root/audit-output}"
if [ ! -d "$workspace_root" ]; then
echo "[ERROR] workspace root not found: $workspace_root" >&2
exit 1
fi
script_dir="$(cd "$(dirname "$0")" && pwd)"
project_scan="$script_dir/scan_project.sh"
if [ ! -x "$project_scan" ]; then
echo "[ERROR] missing executable project scanner: $project_scan" >&2
exit 1
fi
mkdir -p "$output_dir/projects"
summary_csv="$output_dir/summary.csv"
summary_md="$output_dir/summary.md"
high_risk_txt="$output_dir/high-risk.txt"
echo 'project,risk_level,dangerous_hits,raw_input_hits,callback_hits,payment_hits,task_hits,php_new_syntax_hits,hardcoded_hits,loopdb_hits,staticcache_hits,notes' > "$summary_csv"
: > "$high_risk_txt"
{
echo "# Workspace Audit Summary"
echo
echo "- workspace: $workspace_root"
echo "- generated_at: $(date '+%Y-%m-%d %H:%M:%S %z')"
echo
echo "| project | risk | dangerous | raw input | callback | payment | task | php 7.4+/8.x | hardcoded | loop+db | static-cache | notes |"
echo "|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---|"
} > "$summary_md"
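count_hits() {
local file="$1"
local section="$2"
# The per-project report lists hits under "### 6.N <section>" headings,
# so match the heading text; the scanner's "=== ... ===" markers are not in the report.
awk -v section="$section" '
/^### / {
if (in_section) exit
title = $0
sub(/^### [0-9.]+ /, "", title)
if (title == section) in_section = 1
next
}
/^## / && in_section { exit }
in_section && NF { count++ }
END { print count+0 }
' "$file"
}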
risk_level() {
local dangerous="$1"
local raw="$2"
local callback="$3"
local payment="$4"
local task="$5"
local phpnew="$6"
local hardcoded="$7"
if [ "$dangerous" -ge 5 ] || { [ "$callback" -ge 10 ] && [ "$payment" -ge 10 ]; } || [ "$task" -ge 15 ] || [ "$hardcoded" -ge 3 ]; then
echo "high"
elif [ "$dangerous" -ge 1 ] || [ "$callback" -ge 3 ] || [ "$payment" -ge 3 ] || [ "$raw" -ge 20 ] || [ "$phpnew" -ge 1 ] || [ "$hardcoded" -ge 1 ]; then
echo "medium"
else
echo "low"
fi
}
project_notes() {
local dangerous="$1"
local callback="$2"
local payment="$3"
local task="$4"
local raw="$5"
local hardcoded="$6"
local loopdb="$7"
local staticcache="$8"
local notes=()
[ "$dangerous" -ge 1 ] && notes+=("dangerous-fns")
[ "$callback" -ge 3 ] && notes+=("callback-heavy")
[ "$payment" -ge 3 ] && notes+=("payment-heavy")
[ "$task" -ge 5 ] && notes+=("task-heavy")
[ "$raw" -ge 20 ] && notes+=("raw-input-heavy")
[ "$hardcoded" -ge 1 ] && notes+=("hardcoded-creds")
[ "$loopdb" -ge 1 ] && notes+=("loop-db-risk")
[ "$staticcache" -ge 1 ] && notes+=("static-cache")
if [ "${#notes[@]}" -eq 0 ]; then
echo "general-review"
else
local IFS=';'
echo "${notes[*]}"
fi
}
find "$workspace_root" -mindepth 1 -maxdepth 1 -type d | sort | while read -r project_dir; do
project_name="$(basename "$project_dir")"
case "$project_name" in
.git|dist|skills|audit-output|audit-output-test|vendor|node_modules)
continue
;;
esac
report_file="$output_dir/projects/${project_name}.txt"
if ! bash "$project_scan" "$project_dir" "$report_file" >/dev/null 2>&1; then
echo "[WARN] scan failed: $project_name" >&2
echo "$project_name,scan_failed,0,0,0,0,0,0,0,0,0,scan-failed" >> "$summary_csv"
echo "| $project_name | scan_failed | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | scan-failed |" >> "$summary_md"
continue
fi
dangerous="$(count_hits "$report_file" "dangerous functions")"
raw="$(count_hits "$report_file" "raw superglobals")"
callback="$(count_hits "$report_file" "callback and notify keywords")"
payment="$(count_hits "$report_file" "payment keywords")"
task="$(count_hits "$report_file" "task and cron keywords")"
phpnew="$(count_hits "$report_file" "php 7.4+ syntax suspects")"
hardcoded="$(count_hits "$report_file" "hardcoded credentials suspects")"
loopdb="$(count_hits "$report_file" "loop with db suspects")"
staticcache="$(count_hits "$report_file" "static array cache antipattern")"
risk="$(risk_level "$dangerous" "$raw" "$callback" "$payment" "$task" "$phpnew" "$hardcoded")"
notes="$(project_notes "$dangerous" "$callback" "$payment" "$task" "$raw" "$hardcoded" "$loopdb" "$staticcache")"
printf '%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n' \
"$project_name" "$risk" "$dangerous" "$raw" "$callback" "$payment" "$task" "$phpnew" \
"$hardcoded" "$loopdb" "$staticcache" "$notes" >> "$summary_csv"
printf '| %s | %s | %s | %s | %s | %s | %s | %s | %s | %s | %s | %s |\n' \
"$project_name" "$risk" "$dangerous" "$raw" "$callback" "$payment" "$task" "$phpnew" \
"$hardcoded" "$loopdb" "$staticcache" "$notes" >> "$summary_md"
if [ "$risk" = "high" ]; then
printf '%s | dangerous=%s raw=%s callback=%s payment=%s task=%s phpnew=%s hardcoded=%s loopdb=%s staticcache=%s | %s\n' \
"$project_name" "$dangerous" "$raw" "$callback" "$payment" "$task" "$phpnew" \
"$hardcoded" "$loopdb" "$staticcache" "$notes" >> "$high_risk_txt"
fi
done
echo
printf '[OK] summary csv: %s\n' "$summary_csv"
printf '[OK] summary md: %s\n' "$summary_md"
printf '[OK] high risk list: %s\n' "$high_risk_txt"
printf '[OK] project reports dir: %s\n' "$output_dir/projects"