@clawhub-mengbin92-0388fb031f
Professional long-form novel analysis and character relationship mapping + theme song generation. Automatically handles GBK/UTF-8 encoding, multi-GB file chu...
---
name: novel-character-graph
description: "Professional long-form novel analysis and character relationship mapping + theme song generation. Automatically handles GBK/UTF-8 encoding, multi-GB file chunking, chapter-by-chapter incremental parsing, supports TXT/EPUB/PDF/DOCX novel formats. Outputs complete character system, relationship maps, worldviews, weapon/item systems, interactive visualization HTML, and AI-generated theme songs (Suno). Use when users ask for: novel analysis, character mapping, relationship graphs, worldview architecture, story timeline organization, manga/comic adaptation settings, novel theme song, ending song, OST."
tags: [novel, analysis, character, relationship, graph, xianxia, wuxia, fantasy, song, music]
triggers:
- 小说分析
- 人物图谱
- 人物关系
- 关系图谱
- 世界观
- 小说人物
- 角色设定
- 主题曲
- 片头曲
- 片尾曲
- OST
- 小说配乐
required_commands:
- iconv
- ebook-convert
- pdftotext
- pandoc
usage_hint: >
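  Use this skill when the user mentions novel analysis (「小说分析」), character graphs (「人物图谱」), relationship graphs (「关系图」), worldview (「世界观」), or character settings (「角色设定」).
  Also use it when the user mentions a theme song (「主题曲」), opening song (「片头曲」), ending song (「片尾曲」), OST, or soundtrack (「配乐」). The song feature works on its own;
  no full character analysis is needed first. Generate directly from the novel genre and mood the user describes.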
---
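# Novel Character Graph: Professional Analysis Skill + Theme Song Generation
## Core Capabilities
This Skill performs in-depth analysis of long-form novels running to a million characters or more, works around the problem that large files cannot be parsed in a single pass, and can generate a dedicated theme song for the novel.
---
## Step 1: File Preprocessing
### 1.1 File Format Detection and Encoding Repair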
```bash
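# 1. Inspect basic file information
file -i <novel_file>
wc -l <novel_file>
ls -lh <novel_file>
# 2. Convert to UTF-8 (many older Chinese novel files are GBK-encoded;
#    skip this step if `file -i` already reports charset=utf-8)
iconv -f GBK -t UTF-8 <novel_file> > decoded.tmp 2>/dev/null && mv decoded.tmp <novel_file>_utf8.txt
# 3. Fallbacks if the GBK conversion fails
iconv -f GB18030 -t UTF-8 <novel_file> 2>/dev/null
iconv -f BIG5 -t UTF-8 <novel_file> 2>/dev/null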
```
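The same fallback idea can be sketched in Python. This is a minimal sketch, not part of the skill's scripts: `decode_novel_bytes` and `CANDIDATE_ENCODINGS` are hypothetical names, and the ordering (UTF-8 first, then GB18030, then BIG5) is an assumption. Note that GB18030 accepts most byte sequences, so a BIG5 file may decode "successfully" into mojibake and should still be checked by eye.

```python
from typing import Optional, Tuple

# Candidate encodings, tried in order. UTF-8 goes first because it is strict:
# GBK-encoded Chinese text almost never happens to be valid UTF-8.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030", "big5")


def decode_novel_bytes(raw: bytes) -> Tuple[str, Optional[str]]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode with replacement characters and report no encoding,
    # so the caller knows the result needs manual inspection.
    return raw.decode("utf-8", errors="replace"), None
```

A caller would read the file in binary mode (`open(path, "rb").read()`) and write the returned text back out as UTF-8.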
### 1.2 Supported Novel Formats
| Format | How to process | Required package |
|------|---------|---------|
| ✅ `.txt` | Read directly in chunks | - |
| ✅ `.epub` | `ebook-convert file.epub file.txt` | `calibre` |
| ✅ `.pdf` | `pdftotext -layout file.pdf output.txt` | `poppler-utils` |
| ✅ `.docx` | `pandoc -s file.docx -o output.txt` | `pandoc` |
| ✅ `.mobi` | `ebook-convert file.mobi file.txt` | `calibre` |
> Install dependencies: `sudo apt install calibre poppler-utils pandoc`
---
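## Step 2: Chunked Parsing of Large Files
### 2.1 Chunking Strategy (chunking is mandatory above 100,000 lines)
```
File size < 500KB → read in a single pass
500KB - 2MB → split into 2-4 chunks
2MB - 10MB → parse incrementally, 500 lines per chunk
> 10MB → parse incrementally, 1000 lines per chunk
```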
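The thresholds above can be written down as a small lookup function. This is a sketch under stated assumptions: `chunk_plan` and the returned dictionary keys are hypothetical, 1KB is taken as 1024 bytes, and the upper bound of each range is treated as inclusive because the table does not say which side the boundaries belong to.

```python
KB = 1024
MB = 1024 * KB


def chunk_plan(size_bytes: int) -> dict:
    """Map a file size to the chunking strategy from the table above."""
    if size_bytes < 500 * KB:
        return {"mode": "single_pass"}
    if size_bytes <= 2 * MB:
        # The table allows anywhere from 2 to 4 chunks in this range.
        return {"mode": "fixed_parts", "parts": (2, 4)}
    if size_bytes <= 10 * MB:
        return {"mode": "by_lines", "lines_per_chunk": 500}
    return {"mode": "by_lines", "lines_per_chunk": 1000}
```

In practice the size would come from `os.path.getsize(path)`.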
### 2.2 Shell Template for Chunked Reading
```bash
# Template commands for reading in 500-line chunks
head -n 500 <novel_file>                     # chunk 1
tail -n +501 <novel_file> | head -n 500      # chunk 2
tail -n +1001 <novel_file> | head -n 500     # chunk 3
# ... and so on
```
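For callers already working in Python, the same 500-line windows can be produced without re-reading the file from the top for every chunk. A minimal sketch; `iter_line_chunks` is a hypothetical helper, not one of the skill's scripts.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_line_chunks(lines: Iterable[str], lines_per_chunk: int = 500) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `lines_per_chunk` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, lines_per_chunk))
        if not chunk:
            return  # input exhausted
        yield chunk
```

Because an open text file is itself an iterable of lines, `for chunk in iter_line_chunks(open(path, encoding="utf-8"))` streams the novel once, one chunk at a time.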
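### 2.3 Items to Extract from Every Chunk
For each chunk, extract the following and update the running record incrementally:
- ✅ Full names of newly introduced characters, plus aliases and nicknames
- ✅ Changes in character relationships (hostile / allied / lovers / master-disciple / blood relatives)
- ✅ Key plot points and timeline progress
- ✅ Updates to the martial arts / cultivation technique / artifact / elixir systems
- ✅ Changes in factions, sects, and nations
- ✅ Important foreshadowing and additions to the worldview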
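One way to hold the running record between chunks is a small state object that each chunk's findings are merged into. This is only a sketch: `NovelState`, `merge_chunk`, and the shape of `chunk_result` are assumptions made for illustration, and only three of the six items above are modelled (the others would follow the same pattern).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class NovelState:
    # full name -> every alias or nickname seen so far
    aliases: Dict[str, Set[str]] = field(default_factory=dict)
    # unordered character pair -> most recent relationship label
    relations: Dict[Tuple[str, ...], str] = field(default_factory=dict)
    # plot events in reading order
    timeline: List[str] = field(default_factory=list)


def merge_chunk(state: NovelState, chunk_result: dict) -> NovelState:
    """Fold one chunk's extracted facts into the running state."""
    for name, aliases in chunk_result.get("characters", {}).items():
        state.aliases.setdefault(name, set()).update(aliases)
    for a, b, relation in chunk_result.get("relations", []):
        # Sort the pair so (A, B) and (B, A) update the same entry;
        # a later chunk overwrites the earlier label.
        state.relations[tuple(sorted((a, b)))] = relation
    state.timeline.extend(chunk_result.get("events", []))
    return state
```

Overwriting the relationship label keeps only the latest status; a record that needs the full history of a relationship would append to a list instead.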
---
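## Step 3: Standardized Character Modeling
### 3.1 Character Tier Classification
| Tier | Criteria | Level of detail |
|------|---------|---------|
| S-tier core characters | Protagonist group + final boss + key mentors | In-depth analysis: personality / growth arc / classic scenes / manga settings |
| A-tier major characters | Main villains + main allies + main heroines | Full character card |
| B-tier minor characters | Important supporting roles + sect leaders | Basic information + allegiance |
| C-tier NPCs | Passers-by + one-off cannon fodder | Counted only, not expanded |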
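### 3.2 Standard Character Card Template
```markdown
### 🎭 Character name
- **Identity**: detailed role and position
- **Cultivation / realm**:
- **Personality traits**: 3-5 core keywords
- **Core abilities / techniques**:
- **Relationship network**:
  - ❤️ Lover: xxx
  - 👨‍🏫 Master: xxx
  - ⚔️ Nemesis: xxx
  - 👬 Sworn brother: xxx
- **Classic storyboard / manga settings**:
```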
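If the character data is held as dictionaries, the card can be rendered mechanically. A minimal sketch: `render_character_card` and the dictionary keys (`identity`, `realm`, `traits`, `abilities`, `relations`, `storyboard`) are hypothetical names chosen for this example.

```python
def render_character_card(card: dict) -> str:
    """Render one character dict as a Markdown card following the template above."""
    lines = [
        f"### 🎭 {card['name']}",
        f"- **Identity**: {card.get('identity', '')}",
        f"- **Cultivation / realm**: {card.get('realm', '')}",
        f"- **Personality traits**: {', '.join(card.get('traits', []))}",
        f"- **Core abilities / techniques**: {', '.join(card.get('abilities', []))}",
        "- **Relationship network**:",
    ]
    # Each relation is a (label, other character) pair, e.g. ("❤️ Lover", "xxx").
    for label, other in card.get("relations", []):
        lines.append(f"  - {label}: {other}")
    lines.append(f"- **Classic storyboard / manga settings**: {card.get('storyboard', '')}")
    return "\n".join(lines)
```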
---
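## Step 4: Relationship Graph Conventions
### 4.1 Six Core Relationship Types
Mark each type consistently with the standard emoji:
| Relationship type | Emoji |
|---------|-----------|
| Romance / Dao companions | ❤️ 💕 |
| Master-disciple / inheritance | 👨‍🏫 👩‍🏫 |
| Sworn brothers / comrades | 👬 🤝 |
| Blood relatives / family | 👨‍👩‍👧 |
| Hostile / mortal enemies | ⚔️ 💀 |
| Superior-subordinate / affiliation | 👑 🛡️ |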
### 4.2 ASCII Relationship Overview Convention
```
                 【Protagonist】⭐
        ┌──────────────┼──────────────┐
 【Heroine 1】💕   【Brother】👬   【Master】👨‍🏫
        │              │              │
   Family line     Team line    Technique line
```
---
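## Step 5: Delivery Standards
### 5.1 The Three Required Files
| No. | File name | Contents |
|-------|--------|------|
| 1 | `《书名》_人物完整谱系.md` | Full character system + faction structure + technique system + timeline |
| 2 | `《书名》_核心人物关系.md` | In-depth analysis of S-tier and A-tier characters + manga storyboard settings |
| 3 | `index.html` | Interactive graph page built with TailwindCSS + SVG |
> `书名` in the file names stands for the novel's title.
### 5.2 Standard Components of the Interactive Page
The HTML page must contain 4 tabs:
1. 🕸️ **Relationship graph** - animated SVG connection diagram
2. 👤 **Characters** - frosted-glass character cards
3. 🌍 **Factions** - faction distribution + power comparison
4. 🎨 **Manga settings** - art style, color palette, and classic storyboards for each character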
---
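## Step 6: Quality Checklist
Verify before delivery:
- ✅ **Encoding is correct** - no mojibake, no stray question marks, no leftover GBK bytes
- ✅ **Character count** - at least 100 for a long novel, 50 for a mid-length novel, 20 for a short one
- ✅ **Nothing missing** - all four categories (heroines / villains / mentors / sworn brothers) are covered
- ✅ **Relationships are clear** - romance / hostile / master-disciple / faction relations are unambiguous
- ✅ **Worldview** - the technique / realm / artifact / elixir systems are complete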
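The two countable checks (minimum character count and coverage of the four role categories) lend themselves to automation. This is a sketch under assumptions: `check_character_coverage`, the `role` key, and the English role labels are hypothetical, and the remaining checklist items still need human judgment.

```python
# Minimum character counts from the checklist, keyed by novel length class.
MIN_CHARACTERS = {"long": 100, "medium": 50, "short": 20}
# The four role categories that must each appear at least once.
REQUIRED_ROLES = ("heroine", "villain", "mentor", "brother")


def check_character_coverage(length_class: str, characters: list) -> list:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    minimum = MIN_CHARACTERS[length_class]
    if len(characters) < minimum:
        problems.append(f"only {len(characters)} characters, expected at least {minimum}")
    present = {c.get("role") for c in characters}
    for role in REQUIRED_ROLES:
        if role not in present:
            problems.append(f"no character with role '{role}'")
    return problems
```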
---
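## Step 7: Theme Song Generation (Optional Add-on)
> Run this section when the user asks for a theme song, opening song, ending song, OST, or soundtrack.
> **Note**: the theme song feature works on its own; no full character analysis is needed first.
> The user can describe the novel's genre and the desired style directly, skipping Steps 1 to 6 and going straight to this section.
### 7.1 Recommending Themes for the User to Choose From
After analyzing the novel, recommend 3-5 of the preset themes below based on its **genre / subject matter / emotional tone**:
| Theme style | Suitable scenes | Mood keywords |
|---------|---------|-----------|
| 🔥 **Hot-blooded battle** | Protagonist power-ups, decisive battles, comebacks | Fiery, passionate, rousing, unyielding |
| 🌙 **Tender ancient-style** | Romance arcs, partings, longing | Graceful, sorrowful, beautiful, devoted |
| ⚔️ **Jianghu chivalry** | Wuxia novels, sect conflicts | Bold, carefree, loyal, exhilarating |
| 🌌 **Epic grandeur** | Worldview showcases, wars of gods and demons | Solemn, vast, mysterious, fated |
| ❄️ **Icy heartbreak** | Tragic endings, loss, death | Oppressive, grief-stricken, lonely, shattered |
| ☀️ **Dawn of hope** | Protagonist growth, victory, rebirth | Hopeful, warm, rising, redemptive |
| 🔮 **Fate and reincarnation** | Transmigration, system novels, rebirth | Cyclical, fated, chaotic, transcendent |
| 🌸 **Campus youth** | Modern romance, campus love stories | Fresh, sweet, fluttering, innocent |
**The user may also specify a theme style directly** and move on to section 7.2.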
### 7.2 Composing the Theme Song with Suno AI
Build the Suno prompt from the novel's content:
#### Style Field Formula
```
Genre + Mood + Era + Instruments + Vocal Style + Production + Dynamics
```
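The formula is a fixed ordering of seven comma-separated parts, which a small helper can enforce. A minimal sketch; `build_style` and the keyword names are hypothetical, and nothing here calls Suno itself.

```python
# Field order taken from the formula above.
STYLE_FIELDS = ("genre", "mood", "era", "instruments", "vocal_style", "production", "dynamics")


def build_style(**parts: str) -> str:
    """Join the supplied parts in formula order, skipping any that are omitted."""
    unknown = set(parts) - set(STYLE_FIELDS)
    if unknown:
        raise ValueError(f"unknown style fields: {sorted(unknown)}")
    return ", ".join(parts[name].strip() for name in STYLE_FIELDS if parts.get(name))
```

Keyword order at the call site does not matter; the output always follows the formula's order.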
#### Reference Style Templates for Novel Theme Songs
**🔥 Hot-blooded battle**
```
Epic xianxia battle anthem, intense heroic atmosphere, ancient Chinese fantasy,
drums and erhu lead, male vocalist with powerful belting, orchestral percussion,
minor key with triumphant modulations, building from ominous intro to explosive chorus,
90BPM D minor
```
**🌙 Tender ancient-style**
```
Ancient Chinese romantic ballad, melancholic and ethereal, guqin and bamboo flute,
female vocalist with soulful nasal tone, sparse arrangement with gradual orchestral layers,
emotional climax in the chorus, Bb major 65BPM
```
**⚔️ Jianghu chivalry**
```
Wuxia martial arts world soundtrack, bold and free-spirited, traditional Chinese instruments
pipa and dizi, male vocalist with raw gritty tone, rhythmic drums with folk elements,
outro fades to peaceful resolution, G minor 85BPM
```
**🌌 Epic grandeur**
```
Epic dark fantasy orchestral, divine and mysterious atmosphere, full symphonic orchestra
with choir, operatic male vocals with reverb, Gregorian chant elements,
strong sense of fate, E minor 120BPM
```
#### Lyrics Structure Tags (written inside the lyrics)
```
[Intro] [Verse] [Pre-Chorus] [Chorus]
[Post-Chorus] [Bridge] [Interlude]
[Instrumental] [Outro]
```
#### Dynamics Tags (describe the vocal delivery)
| Tag | Meaning |
|------|------|
| `[Powerful]` | Strong, explosive delivery |
| `[Whispered]` | Soft, whispered delivery |
| `[Soulful]` | Deeply emotional |
| `[Gritty]` | Rough and husky |
| `[Falsetto]` | Airy falsetto |
| `[Building Energy]` | Gradual crescendo |
| `[Emotional Climax]` | Emotional peak |
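### 7.3 Combining the Theme Song with Video
Once the audio is generated, merge it with the video using ffmpeg:
```bash
# Option 1: 6-second highlight (video clip + start of the song)
ffmpeg -y -i <video_file> -i <audio_file> \
  -vf "fade=t=in:st=0:d=0.5,fade=t=out:st=5.5:d=0.5" \
  -c:v libx264 -c:a aac -shortest -t 6 \
  <output_file>
# Option 2: looped version (looped video + full song)
# -stream_loop 9 repeats the clip 9 more times after the first play; set the fade-out
# start (st=54 here) to <duration_seconds> minus the 2-second fade length
ffmpeg -y -stream_loop 9 -i <video_file> -i <audio_file> \
  -vf "fade=t=in:st=0:d=1,fade=t=out:st=54:d=2" \
  -c:v libx264 -c:a aac -t <duration_seconds> \
  <output_file>
```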
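In the looped variant the fade-out start has to track the output duration, so it can help to compute the command rather than edit it by hand. The following sketch only builds the argument list and does not run ffmpeg; `build_loop_command` and its parameter names are hypothetical, and it uses only the flags that already appear above.

```python
from typing import List


def build_loop_command(video: str, audio: str, output: str, duration_s: float,
                       loops: int = 9, fade_in_s: float = 1.0,
                       fade_out_s: float = 2.0) -> List[str]:
    """Build the looped-version ffmpeg arguments with a fade-out that ends at duration_s."""
    fade_out_start = max(0.0, duration_s - fade_out_s)
    # ":g" formatting drops trailing zeros, so 54.0 is written as "54".
    vf = (f"fade=t=in:st=0:d={fade_in_s:g},"
          f"fade=t=out:st={fade_out_start:g}:d={fade_out_s:g}")
    return ["ffmpeg", "-y", "-stream_loop", str(loops), "-i", video, "-i", audio,
            "-vf", vf, "-c:v", "libx264", "-c:a", "aac",
            "-t", f"{duration_s:g}", output]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with file names that contain spaces.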
---
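## Step 8: Advanced Scripts (Optional)
> The scripts below live in the `scripts/` directory. Verify the environment before running them; dependencies may be installed without asking the user for approval.
- `scripts/chapter_split.py` - split a large file by chapter
- `scripts/character_extract.py` - heuristic extraction of candidate character names from a surname list
- `scripts/generate_html.py` - generate the interactive graph page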
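### 8.1 chapter_split.py (chapter splitting)
```python
#!/usr/bin/env python3
"""Chapter splitter: split a large novel file into one file per chapter."""
import re
import os
import sys


def split_by_chapters(file_path, output_dir=None):
    """Detect chapter headings and write each chapter to its own file."""
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Common heading at line start: 第X章/节/卷/部/篇, optional colon, optional title
    chapter_pattern = re.compile(
        r'^第[一二三四五六七八九十百千\d]+[章节卷部篇][ \t]*[::]?[ \t]*.*$',
        re.MULTILINE
    )
    matches = list(chapter_pattern.finditer(content))
    if not matches:
        # Fallback: a bare "第X章" marker anywhere in the text
        chapter_pattern = re.compile(r'第[一二三四五六七八九十百千\d]+章')
        matches = list(chapter_pattern.finditer(content))
    # Slice from each heading to the next so the heading stays with its chapter;
    # part 0 is whatever precedes the first heading (preface, blurb)
    starts = [0] + [m.start() for m in matches]
    ends = starts[1:] + [len(content)]
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    output_dir = output_dir or os.path.dirname(file_path) or '.'
    os.makedirs(output_dir, exist_ok=True)
    for i, (start, end) in enumerate(zip(starts, ends)):
        chapter_content = content[start:end].strip()
        if chapter_content:
            # i is the ordinal of the heading found, not the number written in it
            output_path = os.path.join(output_dir, f"{base_name}_第{i}章.txt")
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(chapter_content)
            print(f"OK: {output_path}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python chapter_split.py <novel.txt>")
        sys.exit(1)
    split_by_chapters(sys.argv[1])
```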
### 8.2 character_extract.py (name extraction)
```python
#!/usr/bin/env python3
"""Helper for extracting candidate Chinese personal names (surname heuristic, no word segmentation)."""
import re
import os
import sys


def extract_names(text, top_n=100):
    """Extract likely Chinese personal names from text using a surname character list."""
    # Surname characters (compound surnames are flattened into the same character class)
    surnames = r'[赵钱孙李周吴郑王冯陈褚卫蒋沈韩杨朱秦尤许何吕施张孔曹严华金魏陶姜戚谢邹喻柏水窦章云苏潘葛奚范彭郎鲁韦昌马苗凤花方俞任袁柳酆鲍史唐费廉岑薛雷贺倪汤滕殷罗毕郝邬安常乐于时傅皮卞齐康伍余元卜顾孟平黄穆萧尹姚邵堪汪祁毛禹狄米贝明臧计伏成戴谈宋茅庞熊纪舒屈项祝董梁杜阮蓝闵席季麻强贾路娄危江童颜郭梅盛林刁钟徐邱骆高夏蔡田樊胡凌霍虞万支柯咎管卢莫经房裘缪干解应宗丁宣贲邓郁单杭洪包诸左右崔吉钮龚程嵇邢滑裴陆荣翁荀羊于惠甄曲面加羊舌微巢关蒯相查后荆红游竺权逯盖益桓公万俟司马上官欧阳夏侯诸葛闻人东方赫连皇甫尉迟公羊澹台公冶宗政濮阳淳于单于太叔申屠公孙仲孙轩辕令狐钟离宇文长孙慕容鲜于闾丘司徒司空亓官司寇仉督子车颛孙端木巫马公西漆雕乐正壤驷公良拓跋夹谷宰父谷利段干百里东郭南门呼延归海羊舌微生岳帅缑亢况后有琴梁丘左丘东门西门商牟佘佴伯赏南宫墨哈谯笪年爱阳佟]'
    # Simple match: one surname character followed by 1-3 Chinese characters.
    # The quantifier is greedy, so candidates tend to run long and need manual review.
    pattern = re.compile(rf'{surnames}[\u4e00-\u9fff]{{1,3}}')
    names = pattern.findall(text)
    # Deduplicate and count occurrences
    name_freq = {}
    for name in names:
        name_freq[name] = name_freq.get(name, 0) + 1
    sorted_names = sorted(name_freq.items(), key=lambda x: x[1], reverse=True)
    return sorted_names[:top_n]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python character_extract.py <novel.txt> [top_n]")
        sys.exit(1)
    file_path = sys.argv[1]
    top_n = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    names = extract_names(text, top_n)
    print(f"Extracted {len(names)} candidate names:")
    for name, freq in names:
        print(f"  {name}: {freq}")
```
### 8.3 generate_html.py (interactive graph generation)
```python
#!/usr/bin/env python3
"""Generate the interactive character relationship graph HTML."""
import os
import sys

TEMPLATE_PATH = os.path.join(os.path.dirname(__file__), '../assets/graph_template.html')


def generate_html(title, subtitle, characters, relationships, output_path='index.html'):
    """Generate the HTML graph page (abridged: only the title and character cards are filled in here)."""
    with open(TEMPLATE_PATH, 'r', encoding='utf-8') as f:
        template = f.read()
    # Fill in the title placeholders
    template = template.replace('{{TITLE}}', title)
    template = template.replace('{{SUBTITLE}}', subtitle)
    # Build the character card HTML
    cards_html = ''
    for char in characters:
        cards_html += f'''
        <div class="character-card bg-gray-800/50 rounded-2xl p-6 border border-gray-700/50">
            <h3 class="text-xl font-bold text-orange-400 mb-2">{char['name']}</h3>
            <p class="text-gray-300 text-sm">{char.get('identity', '')}</p>
            <p class="text-gray-400 text-xs mt-2">修为:{char.get('realm', '未知')}</p>
        </div>
        '''
    # Substitute the card placeholder in the template
    template = template.replace('<!-- CHARACTER_CARDS -->', cards_html)
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(template)
    print(f"OK: {output_path}")


if __name__ == "__main__":
    print("Usage: from generate_html import generate_html")
    print("       generate_html('小说名', '副标题', characters_list, relationships_list)")
```
FILE:_meta.json
{
"ownerId": "kn73pfccc8e95j989m8js0x9zx82p86q",
"slug": "novel-character-graph",
"version": "1.0.1",
"publishedAt": 1776926026962
}
FILE:assets/graph_template.html
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{TITLE}} - 人物关系图谱</title>
<script src="https://cdn.tailwindcss.com"></script>
<style>
@import url('https://fonts.googleapis.com/css2?family=Ma+Shan+Zheng&family=Noto+Sans+SC:wght@400;500;700&display=swap');
body { font-family: 'Noto Sans SC', sans-serif; background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%); min-height: 100vh; }
.title-font { font-family: 'Ma Shan Zheng', cursive; }
.glow { text-shadow: 0 0 20px rgba(255, 107, 107, 0.8), 0 0 40px rgba(255, 107, 107, 0.4); }
.character-card { transition: all 0.4s cubic-bezier(0.175, 0.885, 0.32, 1.275); backdrop-filter: blur(10px); }
.character-card:hover { transform: translateY(-10px) scale(1.05); box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.5); }
.relation-line { stroke-dasharray: 5, 5; animation: dash 2s linear infinite; }
@keyframes dash { to { stroke-dashoffset: -20; } }
@keyframes float { 0%, 100% { transform: translateY(0px); } 50% { transform: translateY(-10px); } }
.float-animation { animation: float 3s ease-in-out infinite; }
.tab-active { background: linear-gradient(135deg, #f97316 0%, #ea580c 100%); }
</style>
</head>
<body class="text-white p-8">
<div class="max-w-7xl mx-auto">
<!-- 标题区域 -->
<div class="text-center mb-12">
<h1 class="title-font text-6xl md:text-8xl glow text-orange-400 mb-4">{{TITLE}}</h1>
<p class="text-xl text-gray-300">{{SUBTITLE}}</p>
</div>
<!-- 标签页切换 -->
<div class="flex justify-center gap-4 mb-8 flex-wrap">
<button onclick="showTab('graph')" id="tab-graph" class="tab-active px-6 py-3 rounded-full font-bold transition-all duration-300">🕸️ 关系图谱</button>
<button onclick="showTab('characters')" id="tab-characters" class="bg-gray-700/50 hover:bg-gray-600/50 px-6 py-3 rounded-full font-bold transition-all duration-300">👤 人物介绍</button>
<button onclick="showTab('world')" id="tab-world" class="bg-gray-700/50 hover:bg-gray-600/50 px-6 py-3 rounded-full font-bold transition-all duration-300">🌍 势力格局</button>
<button onclick="showTab('comic')" id="tab-comic" class="bg-gray-700/50 hover:bg-gray-600/50 px-6 py-3 rounded-full font-bold transition-all duration-300">🎨 漫画设定</button>
</div>
<!-- 内容面板占位 -->
<div id="content-graph" class="content-panel">
<div class="bg-gray-800/30 rounded-3xl p-8 backdrop-blur-lg border border-gray-700/50">
<h2 class="title-font text-4xl text-center text-orange-400 mb-8">核心人物关系网</h2>
<!-- SVG图谱位置 -->
<div class="flex justify-center">
<!-- CHARACTER_GRAPH_SVG -->
</div>
</div>
</div>
<div id="content-characters" class="content-panel hidden">
<div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6">
<!-- CHARACTER_CARDS -->
</div>
</div>
<div id="content-world" class="content-panel hidden">
<!-- WORLD_VIEW -->
</div>
<div id="content-comic" class="content-panel hidden">
<!-- COMIC_SETTINGS -->
</div>
</div>
<script>
function showTab(tabName) {
document.querySelectorAll('.content-panel').forEach(p => p.classList.add('hidden'));
document.querySelectorAll('[id^="tab-"]').forEach(t => {
t.classList.remove('tab-active');
t.classList.add('bg-gray-700/50', 'hover:bg-gray-600/50');
});
document.getElementById('content-' + tabName).classList.remove('hidden');
document.getElementById('tab-' + tabName).classList.add('tab-active');
document.getElementById('tab-' + tabName).classList.remove('bg-gray-700/50', 'hover:bg-gray-600/50');
}
</script>
</body>
</html>
FILE:references/format_handbook.md
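# Novel Format Handling Handbook
## Fixing Encoding Problems
### Diagnosing Encoding Problems in Chinese Novels
Nearly all encoding problems in Chinese TXT novels come down to which of these encodings the file uses:
- Older novels: GBK / GB18030 (from long-running sites such as Qidian and Zongheng)
- Newer novels: UTF-8
- Novels from Taiwan: BIG5
### Batch Encoding Conversion Script
```bash
# Batch-convert GBK to UTF-8 in place
for f in *.txt; do
  tmp="${f%.txt}_utf8.txt"
  iconv -f GBK -t UTF-8 "$f" > "$tmp" 2>/dev/null && mv "$tmp" "$f" && echo "OK: $f" || { rm -f "$tmp"; echo "FAIL: $f"; }
done
```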
---
## Installing the Conversion Tools for Each Format
### EPUB/MOBI
```bash
# Debian/Ubuntu (installs the tools for every format in this handbook)
sudo apt install calibre poppler-utils pandoc
# Conversion commands
ebook-convert novel.epub novel.txt
ebook-convert novel.mobi novel.txt
```
### PDF
```bash
# Text extraction that preserves the page layout
pdftotext -layout novel.pdf output.txt
pdftotext -f 10 -l 100 novel.pdf output.txt # restrict to a page range
```
### DOCX
```bash
pandoc -s novel.docx -o output.txt
```
---
## Shell Script for Splitting Large Files
```bash
#!/bin/bash
# split_novel.sh - print a file in fixed-size line chunks
FILE="$1"
LINES_PER_CHUNK=500
TOTAL_LINES=$(wc -l < "$FILE")
CHUNKS=$(( (TOTAL_LINES + LINES_PER_CHUNK - 1) / LINES_PER_CHUNK ))
echo "Total: $TOTAL_LINES lines, split into $CHUNKS chunks"
for i in $(seq 1 $CHUNKS); do
  START=$(( (i-1) * LINES_PER_CHUNK + 1 ))
  echo "=== Chunk $i / $CHUNKS (lines $START - $((START + LINES_PER_CHUNK - 1))) ==="
  tail -n +"$START" "$FILE" | head -n "$LINES_PER_CHUNK"
done
```
FILE:scripts/chapter_split.py
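#!/usr/bin/env python3
"""Chapter splitter: split a large novel file into one file per chapter."""
import re
import os
import sys


def split_by_chapters(file_path, output_dir=None):
    """Detect chapter headings and write each chapter to its own file."""
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Common heading at line start (第X章 / 第X卷 / 第X部), optional colon, optional title
    chapter_pattern = re.compile(
        r'^第[一二三四五六七八九十百千\d]+[章节卷部篇][ \t]*[::]?[ \t]*.*$',
        re.MULTILINE
    )
    matches = list(chapter_pattern.finditer(content))
    if not matches:
        # Fallback: a bare "第X章" marker anywhere in the text
        chapter_pattern = re.compile(r'第[一二三四五六七八九十百千\d]+章')
        matches = list(chapter_pattern.finditer(content))
    # Slice from each heading to the next so the heading stays with its chapter;
    # part 0 is whatever precedes the first heading (preface, blurb)
    starts = [0] + [m.start() for m in matches]
    ends = starts[1:] + [len(content)]
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    output_dir = output_dir or os.path.dirname(file_path) or '.'
    os.makedirs(output_dir, exist_ok=True)
    for i, (start, end) in enumerate(zip(starts, ends)):
        chapter_content = content[start:end].strip()
        if chapter_content:
            # i is the ordinal of the heading found, not the number written in it
            output_path = os.path.join(output_dir, f"{base_name}_第{i}章.txt")
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(chapter_content)
            print(f"OK: {output_path}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python chapter_split.py <novel.txt>")
        sys.exit(1)
    split_by_chapters(sys.argv[1])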
FILE:scripts/character_extract.py
#!/usr/bin/env python3
"""Helper for extracting candidate Chinese personal names (surname heuristic, no word segmentation)."""
import re
import os
import sys


def extract_names(text, top_n=100):
    """Extract likely Chinese personal names from text using a surname character list."""
    # Surname characters (compound surnames are flattened into the same character class)
    surnames = r'[赵钱孙李周吴郑王冯陈褚卫蒋沈韩杨朱秦尤许何吕施张孔曹严华金魏陶姜戚谢邹喻柏水窦章云苏潘葛奚范彭郎鲁韦昌马苗凤花方俞任袁柳酆鲍史唐费廉岑薛雷贺倪汤滕殷罗毕郝邬安常乐于时傅皮卞齐康伍余元卜顾孟平黄穆萧尹姚邵堪汪祁毛禹狄米贝明臧计伏成戴谈宋茅庞熊纪舒屈项祝董梁杜阮蓝闵席季麻强贾路娄危江童颜郭梅盛林刁钟徐邱骆高夏蔡田樊胡凌霍虞万支柯咎管卢莫经房裘缪干解应宗丁宣贲邓郁单杭洪包诸左右崔吉钮龚程嵇邢滑裴陆荣翁荀羊于惠甄曲面加羊舌微巢关蒯相查后荆红游竺权逯盖益桓公万俟司马上官欧阳夏侯诸葛闻人东方赫连皇甫尉迟公羊澹台公冶宗政濮阳淳于单于太叔申屠公孙仲孙轩辕令狐钟离宇文长孙慕容鲜于闾丘司徒司空亓官司寇仉督子车颛孙端木巫马公西漆雕乐正壤驷公良拓跋夹谷宰父谷利段干百里东郭南门呼延归海羊舌微生岳帅缑亢况后有琴梁丘左丘东门西门商牟佘佴伯赏南宫墨哈谯笪年爱阳佟]'
    # Simple match: one surname character followed by 1-3 Chinese characters.
    # Braces are doubled because this is an f-string; a single {1,3} would be
    # evaluated as a Python tuple and break the quantifier.
    pattern = re.compile(rf'{surnames}[\u4e00-\u9fff]{{1,3}}')
    names = pattern.findall(text)
    # Deduplicate and count occurrences
    name_freq = {}
    for name in names:
        name_freq[name] = name_freq.get(name, 0) + 1
    sorted_names = sorted(name_freq.items(), key=lambda x: x[1], reverse=True)
    return sorted_names[:top_n]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python character_extract.py <novel.txt> [top_n]")
        sys.exit(1)
    file_path = sys.argv[1]
    top_n = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    names = extract_names(text, top_n)
    print(f"Extracted {len(names)} candidate names:")
    for name, freq in names:
        print(f"  {name}: {freq}")
FILE:scripts/generate_html.py
#!/usr/bin/env python3
"""Generate the interactive character relationship graph HTML."""
import os
import sys

TEMPLATE_PATH = os.path.join(os.path.dirname(__file__), '../assets/graph_template.html')


def generate_html(title, subtitle, characters, relationships, output_path='index.html'):
    """Generate the HTML graph page.

    Args:
        title: novel title
        subtitle: subtitle
        characters: list of dict; each character has name/identity/realm fields
        relationships: list of tuple, (character A, character B, relation type)
        output_path: path of the HTML file to write
    """
    with open(TEMPLATE_PATH, 'r', encoding='utf-8') as f:
        template = f.read()
    # Fill in the title placeholders
    template = template.replace('{{TITLE}}', title)
    template = template.replace('{{SUBTITLE}}', subtitle)
    # Build the character card HTML
    cards_html = ''
    for char in characters:
        cards_html += f'''
        <div class="character-card bg-gray-800/50 rounded-2xl p-6 border border-gray-700/50">
            <h3 class="text-xl font-bold text-orange-400 mb-2">{char['name']}</h3>
            <p class="text-gray-300 text-sm">{char.get('identity', '')}</p>
            <p class="text-gray-400 text-xs mt-2">修为:{char.get('realm', '未知')}</p>
        </div>
        '''
    # Build the relationship graph SVG
    graph_svg = generate_svg_graph(characters, relationships)
    # Substitute the placeholders in the template
    template = template.replace('<!-- CHARACTER_CARDS -->', cards_html)
    template = template.replace('<!-- CHARACTER_GRAPH_SVG -->', graph_svg)
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(template)
    print(f"OK: {output_path}")
    return output_path


def generate_svg_graph(characters, relationships, width=1200, height=800):
    """Build the SVG relationship graph.

    Args:
        characters: character list
        relationships: relation list [(A, B, 'type'), ...]
    """
    svg_parts = [f'<svg viewBox="0 0 {width} {height}" class="w-full h-auto">']
    relations_colors = {
        '爱情': '#ff6b9d',
        '师徒': '#c084fc',
        '兄弟': '#60a5fa',
        '血缘': '#fb923c',
        '敌对': '#ef4444',
        '从属': '#a3e635',
    }
    # Simple grid layout: at most 20 nodes, 5 per row, each centered in its cell,
    # so every node stays inside the viewBox
    shown = characters[:20]
    cols = 5
    rows = max(1, (len(shown) + cols - 1) // cols)
    positions = {}
    for i, char in enumerate(shown):
        x = int((i % cols + 0.5) * width / cols)
        y = int((i // cols + 0.5) * height / rows)
        positions[char['name']] = (x, y)
    # Draw edges first so the nodes are painted on top of them
    for a, b, rtype in relationships:
        if a not in positions or b not in positions:
            continue  # skip relations whose endpoints are not among the drawn nodes
        (ax, ay), (bx, by) = positions[a], positions[b]
        color = relations_colors.get(rtype, '#94a3b8')
        svg_parts.append(
            f'<line x1="{ax}" y1="{ay}" x2="{bx}" y2="{by}" '
            f'stroke="{color}" stroke-width="2" stroke-dasharray="5,5" opacity="0.7"/>'
        )
    # Draw the character nodes
    for char in shown:
        x, y = positions[char['name']]
        svg_parts.append(
            f'<circle cx="{x}" cy="{y}" r="30" fill="#f97316" opacity="0.8"/>'
            f'<text x="{x}" y="{y+5}" text-anchor="middle" fill="white" font-size="12">'
            f'{char["name"][:4]}</text>'
        )
    svg_parts.append('</svg>')
    return '\n'.join(svg_parts)


if __name__ == "__main__":
    print("Usage: from generate_html import generate_html")
    print("       generate_html('小说名', '副标题', characters_list, relationships_list)")
Analyze and validate SKILL.md files for best practices, common issues, and improvement suggestions. Use when reviewing a Skill, creating a new Skill, or when...
---
name: skill-linter
description: Analyze and validate SKILL.md files for best practices, common issues, and improvement suggestions. Use when reviewing a Skill, creating a new Skill, or when asked to check/audit/improve a SKILL.md file.
allowed-tools: Read, Edit, Write
---
# Skill Linter & Advisor
Analyze SKILL.md files against Claude Code Skills best practices and provide actionable feedback.
## Analysis Process
1. **Read the SKILL.md file** - Load the complete content
2. **Parse frontmatter** - Validate YAML structure and required fields
3. **Check content structure** - Verify best practices for the markdown body
4. **Compare against patterns** - Match against known good Skill patterns
5. **Generate report** - Provide structured feedback with severity levels
## Validation Checklist
### Frontmatter (YAML Header)
| Check | Severity | Description |
|-------|----------|-------------|
| Has `---` delimiters | 🔴 Critical | Must have opening and closing `---` |
| Valid YAML syntax | 🔴 Critical | YAML must parse without errors |
| Has `name` field | 🟡 Warning | Defaults to directory name, but explicit is better |
| Has `description` | 🔴 Critical | Required for auto-trigger to work |
| Description quality | 🟡 Warning | Should be specific, mention when to use |
| `disable-model-invocation` | 🟢 Info | Only set if you want manual-only |
| `user-invocable` | 🟢 Info | Set to false to hide from / menu |
| `allowed-tools` | 🟡 Warning | Specify if Skill needs specific tools |
| `model` override | 🟢 Info | Only if you need specific model |
| `context: fork` | 🟢 Info | Use for long-running or isolated tasks |
| `agent` with context | 🟢 Info | Required when context: fork |
### Content Structure
| Check | Severity | Description |
|-------|----------|-------------|
| Has clear title/heading | 🟡 Warning | First line should indicate purpose |
| Has process/steps | 🟡 Warning | Skills should have actionable steps |
| Has output format | 🟡 Warning | Define expected output structure |
| Uses specific language | 🟡 Warning | Avoid vague terms like "etc.", "something", "things", "stuff" |
| Has examples | 🟢 Info | Concrete examples improve reliability |
| Has constraints/guardrails | 🟢 Info | Define what NOT to do |
| Appropriate length | 🟡 Warning | Too short (<50 words) or too long (>1500 words), the thresholds used by `skill_linter.py` |
### Common Issues
| Issue | Severity | Fix |
|-------|----------|-----|
| Missing description | 🔴 Critical | Add description explaining when to trigger |
| Description too vague | 🟡 Warning | Be specific about use cases |
| No clear output format | 🟡 Warning | Add expected output structure |
| Missing tool declarations | 🟡 Warning | Add `allowed-tools` if using tools |
| Too many responsibilities | 🟡 Warning | Split into multiple focused Skills |
| Hardcoded paths | 🟡 Warning | Use variables or relative paths |
| No error handling guidance | 🟢 Info | Add what to do when things go wrong |
## Output Format
```
# Skill Analysis Report
## File: {filepath}
### Frontmatter Analysis
| Field | Status | Value | Notes |
|-------|--------|-------|-------|
| name | ✅/⚠️/❌ | {value} | {feedback} |
| description | ✅/⚠️/❌ | {value} | {feedback} |
| ... | | | |
**Frontmatter Score:** X/10
### Content Analysis
| Check | Status | Notes |
|-------|--------|-------|
| Has clear purpose | ✅/⚠️/❌ | {feedback} |
| Actionable steps | ✅/⚠️/❌ | {feedback} |
| Output format defined | ✅/⚠️/❌ | {feedback} |
| Has examples | ✅/⚠️/❌ | {feedback} |
| Appropriate length | ✅/⚠️/❌ | {word_count} words |
**Content Score:** X/10
### Issues Found
#### 🔴 Critical (Must Fix)
1. {issue description} → {fix suggestion}
#### 🟡 Warnings (Should Fix)
1. {issue description} → {fix suggestion}
#### 🟢 Suggestions (Nice to Have)
1. {issue description} → {fix suggestion}
### Overall Assessment
**Total Score:** X/10
**Verdict:**
- ✅ Excellent - Ready to use
- 🟡 Good - Minor improvements suggested
- ⚠️ Needs Work - Address warnings before using
- ❌ Critical Issues - Must fix before using
### Recommended Actions
1. {action item}
2. {action item}
3. {action item}
### Improved Version (Optional)
If significant improvements are needed, provide a rewritten SKILL.md:
~~~yaml
---
# improved frontmatter
---
# Improved content...
~~~
```
## Skill Patterns Reference
### Pattern 1: Checklist/Task Skill
For: Code review, testing, validation tasks
Structure:
- Clear trigger description
- Step-by-step process
- Checklist categories
- Severity ratings
- Structured output format
### Pattern 2: Generator Skill
For: Documentation, commit messages, reports
Structure:
- Input requirements
- Analysis steps
- Template/format specification
- Examples
- Constraints
### Pattern 3: Explorer/Research Skill
For: Code exploration, debugging, analysis
Structure:
- Context gathering (!commands)
- Investigation steps
- What to look for
- How to present findings
### Pattern 4: Workflow Skill
For: Multi-step processes, releases, deployments
Structure:
- Prerequisites check
- Sequential steps
- Validation points
- Rollback guidance
## Examples of Good Descriptions
✅ **Good:**
- "Perform a thorough code review following the team checklist. Use when reviewing code changes, pull requests, or when the user asks for a code review."
- "Generate API documentation from source code. Use when the user asks to document an API endpoint, route handler, or controller."
- "Create a standardized git commit message following Conventional Commits format. Use when the user asks to commit or create a commit message."
❌ **Bad:**
- "Does code review" (too vague)
- "Helps with documentation" (when?)
- "A skill for git" (too broad)
## Examples of Good Output Formats
✅ **Good:**
```markdown
## Output Format
Structure your review as:
**Summary**
[One-paragraph overall assessment]
**Critical Issues**
[Must fix before merging]
**Approved?**
[YES / NO / YES WITH CONDITIONS]
```
❌ **Bad:**
```markdown
Just give me a review of the code.
```
FILE:skill_linter.py
#!/usr/bin/env python3
"""
Skill Linter - check SKILL.md files against best practices
"""
import sys
import re
import yaml
from pathlib import Path
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    severity: str  # critical, warning, info
    category: str
    message: str
    suggestion: str


@dataclass
class AnalysisResult:
    filepath: str
    frontmatter: dict = field(default_factory=dict)
    content: str = ""
    issues: List[Issue] = field(default_factory=list)
    word_count: int = 0

    @property
    def critical_count(self):
        return sum(1 for i in self.issues if i.severity == "critical")

    @property
    def warning_count(self):
        return sum(1 for i in self.issues if i.severity == "warning")

    @property
    def info_count(self):
        return sum(1 for i in self.issues if i.severity == "info")
def parse_skill_md(filepath: str) -> AnalysisResult:
    """Parse a SKILL.md file."""
    result = AnalysisResult(filepath=filepath)
    try:
        content = Path(filepath).read_text(encoding='utf-8')
    except Exception as e:
        result.issues.append(Issue(
            severity="critical",
            category="file",
            message=f"Cannot read file: {e}",
            suggestion="Check the file path and permissions"
        ))
        return result
    result.content = content
    result.word_count = len(content.split())
    # Parse the frontmatter
    frontmatter_match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
    if frontmatter_match:
        try:
            result.frontmatter = yaml.safe_load(frontmatter_match.group(1)) or {}
        except yaml.YAMLError as e:
            result.issues.append(Issue(
                severity="critical",
                category="frontmatter",
                message=f"YAML parse error: {e}",
                suggestion="Check the YAML syntax; make sure there are no tabs or formatting errors"
            ))
    else:
        result.issues.append(Issue(
            severity="critical",
            category="frontmatter",
            message="Missing YAML frontmatter (--- delimiters)",
            suggestion="Add ---\n[your configuration]\n--- at the top of the file"
        ))
    return result
def analyze_frontmatter(result: AnalysisResult):
    """Analyze the frontmatter."""
    fm = result.frontmatter
    # Check name
    if 'name' not in fm:
        result.issues.append(Issue(
            severity="warning",
            category="frontmatter",
            message="Missing 'name' field",
            suggestion="Add a name field to state the Skill name explicitly, e.g. name: my-skill"
        ))
    elif not fm.get('name'):
        result.issues.append(Issue(
            severity="warning",
            category="frontmatter",
            message="'name' field is empty",
            suggestion="Provide a meaningful Skill name"
        ))
    # Check description
    if 'description' not in fm:
        result.issues.append(Issue(
            severity="critical",
            category="frontmatter",
            message="Missing 'description' field",
            suggestion="description is required; it tells Claude when to use the Skill. For example: 'Perform code review when asked to review changes'"
        ))
    else:
        # An empty `description:` key parses as None, so normalize to a string first
        desc = str(fm.get('description') or '')
        if len(desc) < 20:
            result.issues.append(Issue(
                severity="warning",
                category="frontmatter",
                message="description is too short",
                suggestion="description should state what the Skill does and when it triggers, in at least 20 characters"
            ))
        elif len(desc) > 300:
            result.issues.append(Issue(
                severity="info",
                category="frontmatter",
                message="description is long",
                suggestion="description is used to judge relevance; keep it concise"
            ))
        # Check description quality (whole-word match, so "fetch" does not trigger "etc")
        vague_terms = ['etc', 'something', 'things', 'stuff']
        for term in vague_terms:
            if re.search(rf'\b{term}\b', desc, re.IGNORECASE):
                result.issues.append(Issue(
                    severity="warning",
                    category="frontmatter",
                    message=f"description contains the vague term '{term}'",
                    suggestion="Use a specific description and avoid vague terms"
                ))
    # Check allowed-tools
    if 'allowed-tools' not in fm:
        result.issues.append(Issue(
            severity="info",
            category="frontmatter",
            message="'allowed-tools' is not specified",
            suggestion="If the Skill needs specific tools, add an allowed-tools field, e.g. allowed-tools: Read, Edit, Bash"
        ))
    # Check context and agent
    if fm.get('context') == 'fork' and 'agent' not in fm:
        result.issues.append(Issue(
            severity="warning",
            category="frontmatter",
            message="context: fork is set but no agent is specified",
            suggestion="When using context: fork, specify the agent type, e.g. agent: Explore"
        ))
def analyze_content(result: AnalysisResult):
    """Analyze the body content."""
    content = result.content
    # Take the content that follows the frontmatter
    frontmatter_match = re.match(r'^---\s*\n.*?\n---\s*\n', content, re.DOTALL)
    if frontmatter_match:
        body = content[frontmatter_match.end():]
    else:
        body = content
    # Check for a heading
    if not re.search(r'^#+\s+', body, re.MULTILINE):
        result.issues.append(Issue(
            severity="warning",
            category="content",
            message="Content has no heading",
            suggestion="Add a clear heading that states the purpose of the Skill"
        ))
    # Check for steps or a process
    has_steps = bool(re.search(r'(^|\n)\d+\.', body)) or \
        bool(re.search(r'(^|\n)[\*\-\+]\s', body)) or \
        bool(re.search(r'(?i)(step|process|workflow|procedure)', body))
    if not has_steps:
        result.issues.append(Issue(
            severity="warning",
            category="content",
            message="Content has no explicit steps or process",
            suggestion="Add numbered steps or a bullet list to guide Claude's execution"
        ))
    # Check for an output format description
    has_output_format = bool(re.search(r'(?i)(output format|output|format|structure|template)', body))
    if not has_output_format:
        result.issues.append(Issue(
            severity="info",
            category="content",
            message="No output format is specified",
            suggestion="Add an '## Output Format' section that defines the expected output structure"
        ))
    # Check for examples
    has_examples = bool(re.search(r'```', body)) or \
        bool(re.search(r'(?i)(example|for example|e\.g\.)', body))
    if not has_examples:
        result.issues.append(Issue(
            severity="info",
            category="content",
            message="Content has no examples",
            suggestion="Add code blocks or concrete examples to help Claude understand the expected output"
        ))
    # Check length
    word_count = len(body.split())
    if word_count < 50:
        result.issues.append(Issue(
            severity="warning",
            category="content",
            message=f"Content is short ({word_count} words)",
            suggestion="Consider adding more detail and guidance to make the Skill more useful"
        ))
    elif word_count > 1500:
        result.issues.append(Issue(
            severity="info",
            category="content",
            message=f"Content is long ({word_count} words)",
            suggestion="Long content is fine, but make sure the structure is clear enough for Claude to follow"
        ))
    # Check for vague wording
    vague_patterns = [
        (r'\betc\.?\b', "Using 'etc' may be too vague"),
        (r'\bsomething\b', "Using 'something' is not specific enough"),
        (r'\bthings?\b', "Using 'things' is not specific enough"),
        (r'\bstuff\b', "Using 'stuff' is not professional enough"),
    ]
    for pattern, msg in vague_patterns:
        if re.search(pattern, body, re.IGNORECASE):
            result.issues.append(Issue(
                severity="info",
                category="content",
                message=msg,
                suggestion="Replace vague wording with specific terms"
            ))
def calculate_score(result: AnalysisResult) -> int:
    """Compute the total score: start at 10 and deduct per issue, never below 0."""
    score = 10
    score -= result.critical_count * 3
    score -= result.warning_count * 1
    score -= result.info_count * 0.5
    return max(0, int(score))
def print_report(result: AnalysisResult):
    """Print the analysis report."""
    print(f"\n{'='*60}")
    print("# Skill Analysis Report")
    print(f"{'='*60}")
    print(f"\n## File: {result.filepath}")
    # Frontmatter section
    print("\n## Frontmatter Analysis")
    print("\n| Field | Value |")
    print("|------|-----|")
    for key, value in result.frontmatter.items():
        display_value = str(value)[:50] + "..." if len(str(value)) > 50 else str(value)
        print(f"| {key} | {display_value} |")
    # Issue list
    print("\n## Issues Found")
    critical_issues = [i for i in result.issues if i.severity == "critical"]
    warning_issues = [i for i in result.issues if i.severity == "warning"]
    info_issues = [i for i in result.issues if i.severity == "info"]
    if critical_issues:
        print("\n### 🔴 Critical (Must Fix)")
        for i, issue in enumerate(critical_issues, 1):
            print(f"{i}. **{issue.message}**")
            print(f"   → Suggestion: {issue.suggestion}")
    if warning_issues:
        print("\n### 🟡 Warnings (Should Fix)")
        for i, issue in enumerate(warning_issues, 1):
            print(f"{i}. **{issue.message}**")
            print(f"   → Suggestion: {issue.suggestion}")
    if info_issues:
        print("\n### 🟢 Suggestions (Nice to Have)")
        for i, issue in enumerate(info_issues, 1):
            print(f"{i}. **{issue.message}**")
            print(f"   → Suggestion: {issue.suggestion}")
    if not result.issues:
        print("\n✅ No issues found!")
    # Score
    score = calculate_score(result)
    print("\n## Overall Assessment")
    print(f"\n**Score:** {score}/10")
    if score >= 9:
        verdict = "✅ Excellent - Ready to use"
    elif score >= 7:
        verdict = "🟡 Good - Minor improvements suggested"
    elif score >= 5:
        verdict = "⚠️ Needs Work - Address warnings before using"
    else:
        verdict = "❌ Critical Issues - Must fix before using"
    print(f"**Verdict:** {verdict}")
    print(f"\n**Counts:** {result.critical_count} critical | {result.warning_count} warnings | {result.info_count} suggestions")
    print(f"**Word count:** {result.word_count} words")
def main():
if len(sys.argv) < 2:
print("用法: python3 skill_linter.py <SKILL.md 路径>")
print("示例: python3 skill_linter.py ~/.claude/skills/my-skill/SKILL.md")
sys.exit(1)
filepath = sys.argv[1]
if not Path(filepath).exists():
print(f"错误: 文件不存在: {filepath}")
sys.exit(1)
# 解析和分析
result = parse_skill_md(filepath)
if result.frontmatter: # 只有成功解析才继续
analyze_frontmatter(result)
analyze_content(result)
# 打印报告
print_report(result)
# 返回退出码
sys.exit(0 if result.critical_count == 0 else 1)
if __name__ == "__main__":
main()
Search the web and fetch URL content using DuckDuckGo. Use when the user wants to search for information online without requiring API keys or paid services....
---
name: duckduckgo-search
description: Search the web and fetch URL content using DuckDuckGo. Use when the user wants to search for information online without requiring API keys or paid services. Supports text search with results including title URL and snippet. Also supports URL fetching to extract readable content from web pages. Triggers on phrases like "search for" "look up" "find information about" "fetch url" "get page content" or when web_search is unavailable.
---
# DuckDuckGo Search & Fetch
Search the web and fetch URL content using DuckDuckGo (no API key required).
## Prerequisites
Install the dependency:
```bash
pip3 install duckduckgo-search
```
## Features
### 1. Web search (ddg_search.py)
```bash
python3 scripts/ddg_search.py "your search query" [--max-results 10]
```
### 2. Web page fetch (ddg_fetch.py)
```bash
python3 scripts/ddg_fetch.py "https://example.com" [--timeout 30]
```
## Usage Examples
### Search
```bash
# Basic search
python3 scripts/ddg_search.py "OpenClaw AI agent"
# Search with more results
python3 scripts/ddg_search.py "Python best practices" --max-results 15
```
### Fetch a web page
```bash
# Fetch a webpage
python3 scripts/ddg_fetch.py "https://openclaw.ai"
# With custom timeout
python3 scripts/ddg_fetch.py "https://example.com" --timeout 15
# Plain text output
python3 scripts/ddg_fetch.py "https://example.com" --format text
```
## Output Format
### Search results (JSON)
```json
{
"query": "search query",
"count": 10,
"results": [
{
"title": "Result title",
"url": "https://example.com",
"snippet": "Description snippet"
}
]
}
```
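A caller usually wants the result list rather than the raw payload. The sketch below is one way to unpack the JSON shown above; the helper name `parse_search_output` is hypothetical, and it assumes the script adds an `error` key to the payload when the search fails.

```python
import json
from typing import List


def parse_search_output(stdout: str) -> List[dict]:
    """Return the result entries from ddg_search.py JSON output.

    Raises RuntimeError when the payload reports an error and has no results.
    """
    payload = json.loads(stdout)
    results = payload.get("results", [])
    if payload.get("error") and not results:
        raise RuntimeError(f"search failed: {payload['error']}")
    # Keep only entries that carry the documented title and url fields.
    return [r for r in results if r.get("title") and r.get("url")]
```

Dropping entries without a title or URL is a design choice for this sketch, not something the script itself does.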
### Fetch results (JSON)
```json
{
"url": "https://example.com",
"title": "Page Title",
"text": "Extracted readable content...",
"description": "Meta description",
"status_code": 200,
"error": null
}
```
## Integration with OpenClaw
Example workflow
```python
# Search
result = exec({
"command": "python3 /path/to/skills/duckduckgo-search/scripts/ddg_search.py query"
})
# Parse: json.loads(result.stdout)
# Fetch URL
result = exec({
"command": "python3 /path/to/skills/duckduckgo-search/scripts/ddg_fetch.py https://example.com"
})
# Parse: json.loads(result.stdout)
```
FILE:README.md
# OpenClaw Skills
A collection of useful skills for OpenClaw.
## Skills
### skill-linter
Analyze and validate SKILL.md files for best practices, common issues, and improvement suggestions.
**Usage:**
```bash
/skill-linter /path/to/SKILL.md
```
### duckduckgo-search
Search the web and fetch URL content using DuckDuckGo (no API key required).
**Prerequisites:**
```bash
pip3 install duckduckgo-search
```
**Usage:**
```bash
# Search
python3 duckduckgo-search/scripts/ddg_search.py "your query" --max-results 10
# Fetch URL
python3 duckduckgo-search/scripts/ddg_fetch.py "https://example.com"
```
## Installation
Install via ClawHub:
```bash
clawhub install skill-linter
clawhub install duckduckgo-search
```
## License
MIT
FILE:scripts/ddg_fetch.py
#!/usr/bin/env python3
"""
DuckDuckGo Web Fetcher
Fetch and extract readable content from URLs
"""
import argparse
import json
import sys
import urllib.request
import urllib.parse
import re
from html.parser import HTMLParser
class MLStripper(HTMLParser):
"""Strip HTML tags and decode entities"""
def __init__(self):
super().__init__()
self.reset()
self.fed = []
def handle_data(self, d):
self.fed.append(d)
def handle_entityref(self, name):
entities = {
'amp': '&', 'lt': '<', 'gt': '>', 'quot': '"',
'apos': "'", 'nbsp': ' ', 'copy': '©', 'reg': '®',
'trade': '™', 'hellip': '…', 'mdash': '—', 'ndash': '–'
}
self.fed.append(entities.get(name, f'&{name};'))
def get_data(self):
return ''.join(self.fed)
def strip_tags(html):
    """Remove HTML tags from string"""
    s = MLStripper()
    try:
        s.feed(html)
    except Exception:
        # Fall back to the raw input if the parser fails on malformed markup
        return html
    return s.get_data()


def clean_text(text):
    """Clean and normalize text"""
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text)
    # Remove special characters but keep common ones.
    # \w already matches CJK characters on str patterns, so no extra
    # literal characters are needed in the class.
    text = re.sub(r'[^\w\s.,!?;:\-\'"()\[\]]', '', text)
    return text.strip()
def fetch_url(url: str, timeout: int = 30):
"""
Fetch a URL and extract readable content.
Args:
url: URL to fetch
timeout: Request timeout in seconds
Returns:
dict: Fetched content with title, text, and metadata
"""
result = {
"url": url,
"title": "",
"text": "",
"content": "",
"error": None,
"status_code": 0
}
try:
# Validate URL
parsed = urllib.parse.urlparse(url)
if not parsed.scheme:
# Add https if missing
url = 'https://' + url
parsed = urllib.parse.urlparse(url)
if not parsed.netloc:
result["error"] = "Invalid URL"
return result
# Build request
req = urllib.request.Request(
url,
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.0 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
'Accept-Encoding': 'gzip',  # only gzip responses are decompressed below
'DNT': '1',
'Connection': 'keep-alive',
}
)
# Make request
with urllib.request.urlopen(req, timeout=timeout) as response:
result["status_code"] = response.status
# Get content encoding
content_encoding = response.headers.get('Content-Encoding', '')
# Read content
content = response.read()
# Decode based on encoding
if content_encoding == 'gzip':
import gzip
content = gzip.decompress(content)
html = content.decode('utf-8', errors='ignore')
# Extract title
title_match = re.search(r'<title[^>]*>([^<]+)</title>', html, re.IGNORECASE)
if title_match:
result["title"] = strip_tags(title_match.group(1)).strip()
# Try to find main content
contentSelectors = [
# Common article content selectors
(r'<article[^>]*>(.*?)</article>', 'article'),
(r'<main[^>]*>(.*?)</main>', 'main'),
(r'<div[^>]*class="[^"]*content[^"]*"[^>]*>(.*?)</div>', 'content div'),
(r'<div[^>]*class="[^"]*article[^"]*"[^>]*>(.*?)</div>', 'article div'),
(r'<div[^>]*class="[^"]*post[^"]*"[^>]*>(.*?)</div>', 'post div'),
(r'<div[^>]*id="[^"]*content[^"]*"[^>]*>(.*?)</div>', 'content div'),
(r'<div[^>]*id="[^"]*main[^"]*"[^>]*>(.*?)</div>', 'main div'),
]
main_content = ""
for pattern, name in contentSelectors:
match = re.search(pattern, html, re.DOTALL | re.IGNORECASE)
if match:
main_content = match.group(1)
break
# If no content found, use body
if not main_content:
body_match = re.search(r'<body[^>]*>(.*?)</body>', html, re.DOTALL | re.IGNORECASE)
if body_match:
main_content = body_match.group(1)
# Extract text from content
if main_content:
# Remove script and style elements
main_content = re.sub(r'<script[^>]*>.*?</script>', '', main_content, flags=re.DOTALL | re.IGNORECASE)
main_content = re.sub(r'<style[^>]*>.*?</style>', '', main_content, flags=re.DOTALL | re.IGNORECASE)
main_content = re.sub(r'<noscript[^>]*>.*?</noscript>', '', main_content, flags=re.DOTALL | re.IGNORECASE)
main_content = re.sub(r'<!--.*?-->', '', main_content, flags=re.DOTALL)
# Strip tags and clean
text = strip_tags(main_content)
text = clean_text(text)
# Limit text length
max_length = 10000
if len(text) > max_length:
text = text[:max_length] + "..."
result["text"] = text
result["content"] = main_content[:5000] # Keep some HTML for reference
# Extract meta description
desc_match = re.search(r'<meta[^>]*name="description"[^>]*content="([^"]*)"', html, re.IGNORECASE)
if not desc_match:
desc_match = re.search(r'<meta[^>]*property="og:description"[^>]*content="([^"]*)"', html, re.IGNORECASE)
if desc_match:
result["description"] = desc_match.group(1)
return result
except urllib.error.HTTPError as e:
result["error"] = f"HTTP Error {e.code}: {e.reason}"
return result
except urllib.error.URLError as e:
result["error"] = f"URL Error: {str(e.reason)}"
return result
except Exception as e:
result["error"] = f"Error: {type(e).__name__}: {str(e)}"
return result
def main():
parser = argparse.ArgumentParser(description="Fetch URL content (no API key required)")
parser.add_argument("url", help="URL to fetch")
parser.add_argument("--timeout", "-t", type=int, default=30,
help="Request timeout in seconds (default: 30)")
parser.add_argument("--format", "-f", choices=["text", "json"], default="json",
help="Output format (default: json)")
args = parser.parse_args()
# Fetch content
result = fetch_url(args.url, args.timeout)
# Output
if args.format == "json":
print(json.dumps(result, ensure_ascii=False, indent=2))
else:
if result.get("error"):
print(f"Error: {result['error']}", file=sys.stderr)
sys.exit(1)
if result.get("title"):
print(f"Title: {result['title']}\n")
print(result.get("text", ""))
# Return error code if failed
if result.get("error") and not result.get("text"):
sys.exit(1)
if __name__ == "__main__":
main()
FILE:scripts/ddg_search.py
#!/usr/bin/env python3
"""
DuckDuckGo Search Script
Search the web using DuckDuckGo (no API key required)
Fallback to alternative search methods if DDG blocks
"""
import argparse
import json
import sys
import subprocess
import urllib.request
import urllib.parse
import urllib.error
import re
def search_with_curl(query: str, max_results: int = 10):
    """Use curl to fetch DuckDuckGo results"""
    try:
        encoded_query = urllib.parse.quote_plus(query)
        url = f"https://lite.duckduckgo.com/lite/?q={encoded_query}"
        cmd = [
            'curl', '-s', '-L',
            '-A', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.0',
            '--connect-timeout', '10',
            '--max-time', '20',
            url
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=25)
        html = result.stdout
        if result.returncode != 0:
            return {"error": f"curl failed: {result.stderr}", "results": []}
        # Parse results
        results = []
        # Pattern 1: Standard result format
        result_pattern = r'<tr>.*?class="result-link"[^>]*href="([^"]*)"[^>]*>([^<]*)</a>.*?class="result-snippet"[^>]*>([^<]*)<'
        matches = re.findall(result_pattern, html, re.DOTALL | re.IGNORECASE)
        for url, title, snippet in matches[:max_results]:
            title = title.strip()
            snippet = snippet.strip()
            # Clean up URL
            if url.startswith('/'):
                url = 'https://duckduckgo.com' + url
            if title:
                results.append({
                    "title": title,
                    "url": url,
                    "snippet": snippet[:200] if snippet else ""
                })
        return {"results": results}
    except Exception as e:
        return {"error": str(e), "results": []}
def search_ddg(query: str, max_results: int = 10):
    """
    Search DuckDuckGo
    """
    try:
        encoded_query = urllib.parse.quote_plus(query)
        url = f"https://lite.duckduckgo.com/lite/?q={encoded_query}"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.0',
            'Accept': 'text/html,application/xhtml+xml',
            'Accept-Language': 'en-US,en;q=0.9',
        }
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=15) as response:
            html = response.read().decode('utf-8', errors='ignore')
        # Parse results
        results = []
        # Find result rows
        rows = re.findall(r'<tr[^>]*>(.*?)</tr>', html, re.DOTALL)
        for row in rows:
            # Extract link
            link_match = re.search(r'<a[^>]*href="([^"]*)"[^>]*>([^<]*)</a>', row)
            if link_match:
                url = link_match.group(1)
                title = link_match.group(2).strip()
                # Extract snippet
                snippet_match = re.search(r'<td[^>]*class="[^"]*snippet[^"]*"[^>]*>([^<]*)<', row)
                snippet = snippet_match.group(1).strip() if snippet_match else ""
                # Clean URL: make relative links absolute
                if url.startswith('/'):
                    url = 'https://duckduckgo.com' + url
                # Decode DuckDuckGo redirect URLs
                if 'uddg=' in url:
                    match = re.search(r'uddg=([^&]+)', url)
                    if match:
                        url = urllib.parse.unquote(match.group(1))
                if title and len(title) > 3:
                    results.append({
                        "title": title,
                        "url": url,
                        "snippet": snippet[:200]
                    })
        # Truncate before counting so "count" matches the returned list
        results = results[:max_results]
        return {
            "query": query,
            "count": len(results),
            "results": results
        }
    except Exception as e:
        # Try curl fallback
        fallback = search_with_curl(query, max_results)
        if fallback.get("results"):
            return {
                "query": query,
                "count": len(fallback["results"]),
                "results": fallback["results"]
            }
        return {
            "query": query,
            "error": str(e),
            "results": []
        }
def main():
parser = argparse.ArgumentParser(description="Search DuckDuckGo (no API key required)")
parser.add_argument("query", help="Search query")
parser.add_argument("--max-results", "-n", type=int, default=10,
help="Maximum results (1-20, default: 10)")
args = parser.parse_args()
# Validate max_results
max_results = max(1, min(20, args.max_results))
# Perform search
result = search_ddg(args.query, max_results)
# Output JSON
print(json.dumps(result, ensure_ascii=False, indent=2))
# Return error code if search failed
if "error" in result and not result.get("results"):
sys.exit(1)
if __name__ == "__main__":
main()
Analyze and validate SKILL.md files for best practices, common issues, and improvement suggestions. Use when reviewing a Skill, creating a new Skill, or when...
---
name: skill-linter
description: Analyze and validate SKILL.md files for best practices, common issues, and improvement suggestions. Use when reviewing a Skill, creating a new Skill, or when asked to check/audit/improve a SKILL.md file.
allowed-tools: Read, Edit, Write
---
# Skill Linter & Advisor
Analyze SKILL.md files against Claude Code Skills best practices and provide actionable feedback.
## Analysis Process
1. **Read the SKILL.md file** - Load the complete content
2. **Parse frontmatter** - Validate YAML structure and required fields
3. **Check content structure** - Verify best practices for the markdown body
4. **Compare against patterns** - Match against known good Skill patterns
5. **Generate report** - Provide structured feedback with severity levels
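The five steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the linter itself: the names `split_frontmatter` and `lint` are hypothetical, and the header is read as flat `key: value` lines where a real implementation would use a YAML parser.

```python
import re
from typing import Dict, List, Tuple

FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, markdown body).

    Simplification: only flat `key: value` lines are read.
    """
    match = FRONTMATTER_RE.match(text)
    if not match:
        return {}, text
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields, text[match.end():]


def lint(text: str) -> List[Tuple[str, str]]:
    """Run steps 2-5 and return (severity, message) pairs."""
    fields, body = split_frontmatter(text)
    issues = []
    if not fields:
        issues.append(("critical", "missing or empty frontmatter"))
    elif not fields.get("description"):
        issues.append(("critical", "missing description"))
    if not re.search(r"^#+\s+", body, re.MULTILINE):
        issues.append(("warning", "no heading in body"))
    return issues
```

Returning plain `(severity, message)` pairs keeps the report step separate from the checks, so the same list can feed a score or a printed table.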
## Validation Checklist
### Frontmatter (YAML Header)
| Check | Severity | Description |
|-------|----------|-------------|
| Has `---` delimiters | 🔴 Critical | Must have opening and closing `---` |
| Valid YAML syntax | 🔴 Critical | YAML must parse without errors |
| Has `name` field | 🟡 Warning | Defaults to directory name, but explicit is better |
| Has `description` | 🔴 Critical | Required for auto-trigger to work |
| Description quality | 🟡 Warning | Should be specific, mention when to use |
| `disable-model-invocation` | 🟢 Info | Only set if you want manual-only |
| `user-invocable` | 🟢 Info | Set to false to hide from / menu |
| `allowed-tools` | 🟡 Warning | Specify if Skill needs specific tools |
| `model` override | 🟢 Info | Only if you need specific model |
| `context: fork` | 🟢 Info | Use for long-running or isolated tasks |
| `agent` with context | 🟢 Info | Required when context: fork |
### Content Structure
| Check | Severity | Description |
|-------|----------|-------------|
| Has clear title/heading | 🟡 Warning | First line should indicate purpose |
| Has process/steps | 🟡 Warning | Skills should have actionable steps |
| Has output format | 🟡 Warning | Define expected output structure |
| Uses specific language | 🟡 Warning | Avoid vague terms like "etc.", "something", "stuff" |
| Has examples | 🟢 Info | Concrete examples improve reliability |
| Has constraints/guardrails | 🟢 Info | Define what NOT to do |
| Appropriate length | 🟡 Warning | Too short (<100 words) or too long (>2000) |
### Common Issues
| Issue | Severity | Fix |
|-------|----------|-----|
| Missing description | 🔴 Critical | Add description explaining when to trigger |
| Description too vague | 🟡 Warning | Be specific about use cases |
| No clear output format | 🟡 Warning | Add expected output structure |
| Missing tool declarations | 🟡 Warning | Add `allowed-tools` if using tools |
| Too many responsibilities | 🟡 Warning | Split into multiple focused Skills |
| Hardcoded paths | 🟡 Warning | Use variables or relative paths |
| No error handling guidance | 🟢 Info | Add what to do when things go wrong |
## Output Format
```
# Skill Analysis Report
## File: {filepath}
### Frontmatter Analysis
| Field | Status | Value | Notes |
|-------|--------|-------|-------|
| name | ✅/⚠️/❌ | {value} | {feedback} |
| description | ✅/⚠️/❌ | {value} | {feedback} |
| ... | | | |
**Frontmatter Score:** X/10
### Content Analysis
| Check | Status | Notes |
|-------|--------|-------|
| Has clear purpose | ✅/⚠️/❌ | {feedback} |
| Actionable steps | ✅/⚠️/❌ | {feedback} |
| Output format defined | ✅/⚠️/❌ | {feedback} |
| Has examples | ✅/⚠️/❌ | {feedback} |
| Appropriate length | ✅/⚠️/❌ | {word_count} words |
**Content Score:** X/10
### Issues Found
#### 🔴 Critical (Must Fix)
1. {issue description} → {fix suggestion}
#### 🟡 Warnings (Should Fix)
1. {issue description} → {fix suggestion}
#### 🟢 Suggestions (Nice to Have)
1. {issue description} → {fix suggestion}
### Overall Assessment
**Total Score:** X/10
**Verdict:**
- ✅ Excellent - Ready to use
- 🟡 Good - Minor improvements suggested
- ⚠️ Needs Work - Address warnings before using
- ❌ Critical Issues - Must fix before using
### Recommended Actions
1. {action item}
2. {action item}
3. {action item}
### Improved Version (Optional)
If significant improvements are needed, provide a rewritten SKILL.md:
~~~yaml
---
# improved frontmatter
---
# Improved content...
~~~
```
## Skill Patterns Reference
### Pattern 1: Checklist/Task Skill
For: Code review, testing, validation tasks
Structure:
- Clear trigger description
- Step-by-step process
- Checklist categories
- Severity ratings
- Structured output format
### Pattern 2: Generator Skill
For: Documentation, commit messages, reports
Structure:
- Input requirements
- Analysis steps
- Template/format specification
- Examples
- Constraints
### Pattern 3: Explorer/Research Skill
For: Code exploration, debugging, analysis
Structure:
- Context gathering (!commands)
- Investigation steps
- What to look for
- How to present findings
### Pattern 4: Workflow Skill
For: Multi-step processes, releases, deployments
Structure:
- Prerequisites check
- Sequential steps
- Validation points
- Rollback guidance
## Examples of Good Descriptions
✅ **Good:**
- "Perform a thorough code review following the team checklist. Use when reviewing code changes, pull requests, or when the user asks for a code review."
- "Generate API documentation from source code. Use when the user asks to document an API endpoint, route handler, or controller."
- "Create a standardized git commit message following Conventional Commits format. Use when the user asks to commit or create a commit message."
❌ **Bad:**
- "Does code review" (too vague)
- "Helps with documentation" (when?)
- "A skill for git" (too broad)
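The contrast between the good and bad descriptions can be approximated with a simple heuristic. The sketch below is an assumption-laden illustration: `describe_issues` is a hypothetical helper, the 20-character minimum mirrors the threshold used in `skill_linter.py`, and the "Use when" check is an extra rule inferred from the good examples rather than something the linter enforces.

```python
import re
from typing import List


def describe_issues(description: str) -> List[str]:
    """Return reasons a Skill description may be too weak to trigger well."""
    issues = []
    text = description.strip()
    if len(text) < 20:
        issues.append("too short")
    # The good examples all state when the Skill should be used.
    if not re.search(r"\buse when\b", text, re.IGNORECASE):
        issues.append("no 'Use when ...' trigger")
    return issues
```

A description that passes both checks can still be vague, so treat an empty list as "nothing obviously wrong" rather than as approval.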
## Examples of Good Output Formats
✅ **Good:**
```markdown
## Output Format
Structure your review as:
**Summary**
[One-paragraph overall assessment]
**Critical Issues**
[Must fix before merging]
**Approved?**
[YES / NO / YES WITH CONDITIONS]
```
❌ **Bad:**
```markdown
Just give me a review of the code.
```
FILE:skill_linter.py
#!/usr/bin/env python3
"""
Skill Linter - 检查 SKILL.md 文件的最佳实践
"""
import sys
import re
import yaml
from pathlib import Path
from dataclasses import dataclass, field
from typing import List, Optional
@dataclass
class Issue:
severity: str # critical, warning, info
category: str
message: str
suggestion: str
@dataclass
class AnalysisResult:
filepath: str
frontmatter: dict = field(default_factory=dict)
content: str = ""
issues: List[Issue] = field(default_factory=list)
word_count: int = 0
@property
def critical_count(self):
return sum(1 for i in self.issues if i.severity == "critical")
@property
def warning_count(self):
return sum(1 for i in self.issues if i.severity == "warning")
@property
def info_count(self):
return sum(1 for i in self.issues if i.severity == "info")
def parse_skill_md(filepath: str) -> AnalysisResult:
"""解析 SKILL.md 文件"""
result = AnalysisResult(filepath=filepath)
try:
content = Path(filepath).read_text(encoding='utf-8')
except Exception as e:
result.issues.append(Issue(
severity="critical",
category="file",
message=f"无法读取文件: {e}",
suggestion="检查文件路径和权限"
))
return result
result.content = content
result.word_count = len(content.split())
# 解析 frontmatter
frontmatter_match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
if frontmatter_match:
try:
result.frontmatter = yaml.safe_load(frontmatter_match.group(1)) or {}
except yaml.YAMLError as e:
result.issues.append(Issue(
severity="critical",
category="frontmatter",
message=f"YAML 解析错误: {e}",
suggestion="检查 YAML 语法,确保没有制表符或格式错误"
))
else:
result.issues.append(Issue(
severity="critical",
category="frontmatter",
message="缺少 YAML frontmatter (--- 分隔符)",
suggestion="在文件开头添加 ---\n[你的配置]\n---"
))
return result
def analyze_frontmatter(result: AnalysisResult):
    """Analyze the frontmatter fields"""
    fm = result.frontmatter
    # Check name
    if 'name' not in fm:
        result.issues.append(Issue(
            severity="warning",
            category="frontmatter",
            message="缺少 'name' 字段",
            suggestion="添加 name 字段来明确 Skill 名称,如: name: my-skill"
        ))
    elif not fm.get('name'):
        result.issues.append(Issue(
            severity="warning",
            category="frontmatter",
            message="'name' 字段为空",
            suggestion="提供一个有意义的 Skill 名称"
        ))
    # Check description
    if 'description' not in fm:
        result.issues.append(Issue(
            severity="critical",
            category="frontmatter",
            message="缺少 'description' 字段",
            suggestion="description 是必需的,它告诉 Claude 何时使用该 Skill。例如: 'Perform code review when asked to review changes'"
        ))
    else:
        # An empty `description:` key parses as None, so normalize to str
        desc = str(fm.get('description') or '')
        if len(desc) < 20:
            result.issues.append(Issue(
                severity="warning",
                category="frontmatter",
                message="description 太短",
                suggestion="description 应该具体说明 Skill 的功能和触发时机,至少 20 个字符"
            ))
        elif len(desc) > 300:
            result.issues.append(Issue(
                severity="info",
                category="frontmatter",
                message="description 较长",
                suggestion="description 会被用于判断相关性,保持简洁明了"
            ))
        # Check description quality
        vague_terms = ['etc', 'something', 'things', 'stuff']
        for term in vague_terms:
            # Match whole words only, so "fetch" is not reported as "etc"
            if re.search(rf'\b{re.escape(term)}\b', desc, re.IGNORECASE):
                result.issues.append(Issue(
                    severity="warning",
                    category="frontmatter",
                    message=f"description 包含模糊词汇 '{term}'",
                    suggestion="使用具体的描述,避免模糊词汇"
                ))
    # Check allowed-tools
    if 'allowed-tools' not in fm:
        result.issues.append(Issue(
            severity="info",
            category="frontmatter",
            message="未指定 'allowed-tools'",
            suggestion="如果 Skill 需要特定工具,添加 allowed-tools 字段,如: allowed-tools: Read, Edit, Bash"
        ))
    # Check context and agent
    if fm.get('context') == 'fork' and 'agent' not in fm:
        result.issues.append(Issue(
            severity="warning",
            category="frontmatter",
            message="context: fork 但未指定 agent",
            suggestion="当使用 context: fork 时,应该指定 agent 类型,如: agent: Explore"
        ))
def analyze_content(result: AnalysisResult):
"""分析内容部分"""
content = result.content
# 提取 frontmatter 后的内容
frontmatter_match = re.match(r'^---\s*\n.*?\n---\s*\n', content, re.DOTALL)
if frontmatter_match:
body = content[frontmatter_match.end():]
else:
body = content
# 检查是否有标题
if not re.search(r'^#+\s+', body, re.MULTILINE):
result.issues.append(Issue(
severity="warning",
category="content",
message="内容缺少标题",
suggestion="添加一个清晰的标题来说明 Skill 的目的"
))
# 检查是否有步骤/流程
has_steps = bool(re.search(r'(^|\n)\d+\.', body)) or \
bool(re.search(r'(^|\n)[\*\-\+]\s', body)) or \
bool(re.search(r'(?i)(step|process|workflow|procedure)', body))
if not has_steps:
result.issues.append(Issue(
severity="warning",
category="content",
message="内容缺少明确的步骤或流程",
suggestion="添加编号步骤或项目符号列表来指导 Claude 执行"
))
# 检查是否有输出格式说明
has_output_format = bool(re.search(r'(?i)(output format|output|format|structure|template)', body))
if not has_output_format:
result.issues.append(Issue(
severity="info",
category="content",
message="未明确指定输出格式",
suggestion="添加 '## Output Format' 部分来定义期望的输出结构"
))
# 检查是否有示例
has_examples = bool(re.search(r'```', body)) or \
bool(re.search(r'(?i)(example|for example|e\.g\.)', body))
if not has_examples:
result.issues.append(Issue(
severity="info",
category="content",
message="内容缺少示例",
suggestion="添加代码块或具体示例来帮助 Claude 理解期望的输出"
))
# 检查长度
word_count = len(body.split())
if word_count < 50:
result.issues.append(Issue(
severity="warning",
category="content",
message=f"内容较短 ({word_count} 词)",
suggestion="考虑添加更多细节和指导,使 Skill 更实用"
))
elif word_count > 1500:
result.issues.append(Issue(
severity="info",
category="content",
message=f"内容较长 ({word_count} 词)",
suggestion="长内容没问题,但确保结构清晰,便于 Claude 遵循"
))
# 检查模糊词汇
vague_patterns = [
(r'\betc\.?\b', "使用 'etc' 可能过于模糊"),
(r'\bsomething\b', "使用 'something' 不够具体"),
(r'\bthings?\b', "使用 'things' 不够具体"),
(r'\bstuff\b', "使用 'stuff' 不够专业"),
]
for pattern, msg in vague_patterns:
if re.search(pattern, body, re.IGNORECASE):
result.issues.append(Issue(
severity="info",
category="content",
message=msg,
suggestion="使用具体的术语替代模糊词汇"
))
def calculate_score(result: AnalysisResult) -> int:
"""计算总分"""
score = 10
score -= result.critical_count * 3
score -= result.warning_count * 1
score -= result.info_count * 0.5
return max(0, int(score))
def print_report(result: AnalysisResult):
"""打印分析报告"""
print(f"\n{'='*60}")
print(f"# Skill Analysis Report")
print(f"{'='*60}")
print(f"\n## 文件: {result.filepath}")
# Frontmatter 部分
print(f"\n## Frontmatter 分析")
print(f"\n| 字段 | 值 |")
print(f"|------|-----|")
for key, value in result.frontmatter.items():
display_value = str(value)[:50] + "..." if len(str(value)) > 50 else str(value)
print(f"| {key} | {display_value} |")
# 问题列表
print(f"\n## 发现的问题")
critical_issues = [i for i in result.issues if i.severity == "critical"]
warning_issues = [i for i in result.issues if i.severity == "warning"]
info_issues = [i for i in result.issues if i.severity == "info"]
if critical_issues:
print(f"\n### 🔴 严重问题 (必须修复)")
for i, issue in enumerate(critical_issues, 1):
print(f"{i}. **{issue.message}**")
print(f" → 建议: {issue.suggestion}")
if warning_issues:
print(f"\n### 🟡 警告 (建议修复)")
for i, issue in enumerate(warning_issues, 1):
print(f"{i}. **{issue.message}**")
print(f" → 建议: {issue.suggestion}")
if info_issues:
print(f"\n### 🟢 建议 (可选)")
for i, issue in enumerate(info_issues, 1):
print(f"{i}. **{issue.message}**")
print(f" → 建议: {issue.suggestion}")
if not result.issues:
print("\n✅ 没有发现任何问题!")
# 评分
score = calculate_score(result)
print(f"\n## 总体评估")
print(f"\n**评分:** {score}/10")
if score >= 9:
verdict = "✅ 优秀 - 可以直接使用"
elif score >= 7:
verdict = "🟡 良好 - 建议进行小幅改进"
elif score >= 5:
verdict = "⚠️ 需要改进 - 使用前请修复警告"
else:
verdict = "❌ 严重问题 - 必须修复后才能使用"
print(f"**结论:** {verdict}")
print(f"\n**统计:** {result.critical_count} 严重 | {result.warning_count} 警告 | {result.info_count} 建议")
print(f"**字数:** {result.word_count} 词")
def main():
if len(sys.argv) < 2:
print("用法: python3 skill_linter.py <SKILL.md 路径>")
print("示例: python3 skill_linter.py ~/.claude/skills/my-skill/SKILL.md")
sys.exit(1)
filepath = sys.argv[1]
if not Path(filepath).exists():
print(f"错误: 文件不存在: {filepath}")
sys.exit(1)
# 解析和分析
result = parse_skill_md(filepath)
if result.frontmatter: # 只有成功解析才继续
analyze_frontmatter(result)
analyze_content(result)
# 打印报告
print_report(result)
# 返回退出码
sys.exit(0 if result.critical_count == 0 else 1)
if __name__ == "__main__":
main()
Search the web and fetch URL content using DuckDuckGo. Use when the user wants to search for information online without requiring API keys or paid services....
---
name: duckduckgo-search
description: Search the web and fetch URL content using DuckDuckGo. Use when the user wants to search for information online without requiring API keys or paid services. Supports text search with results including title URL and snippet. Also supports URL fetching to extract readable content from web pages. Triggers on phrases like "search for" "look up" "find information about" "fetch url" "get page content" or when web_search is unavailable.
---
# DuckDuckGo Search & Fetch
Search the web and fetch URL content using DuckDuckGo (no API key required).
## Prerequisites
Install the dependency:
```bash
pip3 install duckduckgo-search
```
## Features
### 1. Web search (ddg_search.py)
```bash
python3 scripts/ddg_search.py "your search query" [--max-results 10]
```
### 2. Web page fetch (ddg_fetch.py)
```bash
python3 scripts/ddg_fetch.py "https://example.com" [--timeout 30]
```
## Usage Examples
### Search
```bash
# Basic search
python3 scripts/ddg_search.py "OpenClaw AI agent"
# Search with more results
python3 scripts/ddg_search.py "Python best practices" --max-results 15
```
### Fetch a web page
```bash
# Fetch a webpage
python3 scripts/ddg_fetch.py "https://openclaw.ai"
# With custom timeout
python3 scripts/ddg_fetch.py "https://example.com" --timeout 15
# Plain text output
python3 scripts/ddg_fetch.py "https://example.com" --format text
```
## Output Format
### Search results (JSON)
```json
{
"query": "search query",
"count": 10,
"results": [
{
"title": "Result title",
"url": "https://example.com",
"snippet": "Description snippet"
}
]
}
```
### Fetch results (JSON)
```json
{
"url": "https://example.com",
"title": "Page Title",
"text": "Extracted readable content...",
"description": "Meta description",
"status_code": 200,
"error": null
}
```
## Integration with OpenClaw
Example workflow
```python
# Search
result = exec({
"command": "python3 /path/to/skills/duckduckgo-search/scripts/ddg_search.py query"
})
# Parse: json.loads(result.stdout)
# Fetch URL
result = exec({
"command": "python3 /path/to/skills/duckduckgo-search/scripts/ddg_fetch.py https://example.com"
})
# Parse: json.loads(result.stdout)
```
FILE:README.md
# DuckDuckGo Search & Fetch
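Web search and content fetching with DuckDuckGo. No API key is required and the service is free to use.
## Features
- 🔍 **Web search** - Text search that returns the title, URL, and snippet of each result
- 📄 **Web page fetch** - Extracts readable page content, with JSON or plain-text output
- 🔐 **No API key required** - Uses the DuckDuckGo service directly
- ⚡ **Lightweight** - Pure Python implementation with few dependencies
## Installation
### 1. Install the dependency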
```bash
pip3 install duckduckgo-search
```
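### 2. Use in OpenClaw
Use it through OpenClaw skills, or call the scripts directly.
## Usage
### Command line
#### Search
```bash
python3 scripts/ddg_search.py "search keywords" [--max-results 10]
```
**Parameters:**
- `query` - Search keywords
- `--max-results` - Maximum number of results (default 10; the script clamps the value to the range 1-20)
**Example:**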
```bash
python3 scripts/ddg_search.py "OpenClaw AI agent"
python3 scripts/ddg_search.py "Python tips" --max-results 15
```
#### Fetch a web page
```bash
python3 scripts/ddg_fetch.py "https://example.com" [--timeout 30] [--format json]
```
**Parameters:**
- `url` - URL of the page to fetch
- `--timeout` - Timeout in seconds (default 30)
- `--format` - Output format (`json` or `text`, default `json`)
**Example:**
```bash
python3 scripts/ddg_fetch.py "https://openclaw.ai"
python3 scripts/ddg_fetch.py "https://example.com" --timeout 15 --format text
```
## Output Format
### Search results (JSON)
```json
{
"query": "search query",
"count": 10,
"results": [
{
"title": "Result title",
"url": "https://example.com",
"snippet": "Description snippet"
}
]
}
```
### Fetch results (JSON)
```json
{
  "url": "https://example.com",
  "title": "Page Title",
  "text": "Extracted readable content...",
  "content": "<p>Raw HTML excerpt of the main content...</p>",
  "description": "Meta description",
  "status_code": 200,
  "error": null
}
```
## OpenClaw Integration
### Use as a Skill
In OpenClaw, the skill responds to these trigger phrases:
- "search for xxx"
- "look up xxx"
- "find information about xxx"
- "fetch url xxx"
- "get page content xxx"
### Programmatic Use
```python
import json
import subprocess

# Search
result = subprocess.run(
    ["python3", "scripts/ddg_search.py", "your query"],
    capture_output=True, text=True
)
data = json.loads(result.stdout)

# Fetch a page
result = subprocess.run(
    ["python3", "scripts/ddg_fetch.py", "https://example.com"],
    capture_output=True, text=True
)
data = json.loads(result.stdout)
```
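Both scripts print JSON even when they fail and then exit with a non-zero status, so a caller may prefer to inspect the payload rather than rely on the exit code. The following is a minimal sketch of such a wrapper; `run_json_script` and its `runner` parameter are hypothetical, and `runner` exists only so the function can be exercised without network access.

```python
import json
import subprocess
import sys
from typing import Callable, List


def run_json_script(args: List[str], runner: Callable = subprocess.run,
                    timeout: int = 60) -> dict:
    """Run a script that prints JSON on stdout and return the decoded object.

    Hypothetical wrapper: raises RuntimeError when stdout holds no JSON
    or when the payload carries a non-null "error" field.
    """
    proc = runner([sys.executable, *args],
                  capture_output=True, text=True, timeout=timeout)
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        raise RuntimeError(
            f"no JSON on stdout (exit {proc.returncode}): {proc.stderr.strip()}"
        ) from None
    if data.get("error"):
        raise RuntimeError(data["error"])
    return data
```

A call such as `run_json_script(["scripts/ddg_search.py", "your query"])` would then return the decoded search payload or raise with the script's own error message.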
## License
MIT License
FILE:_meta.json
{
"ownerId": "kn73pfccc8e95j989m8js0x9zx82p86q",
"slug": "ddg-search-fetch",
"version": "1.0.0",
"publishedAt": 1773929871938
}
FILE:scripts/ddg_fetch.py
#!/usr/bin/env python3
"""
DuckDuckGo Web Fetcher
Fetch and extract readable content from URLs
"""
import argparse
import json
import sys
import urllib.request
import urllib.parse
import urllib.error
import re
from html.parser import HTMLParser
class MLStripper(HTMLParser):
    """Strip HTML tags and decode entities"""

    def __init__(self):
        # With convert_charrefs=True, HTMLParser decodes entities such as
        # &amp; itself, so handle_data already receives plain text
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    """Remove HTML tags from string"""
    s = MLStripper()
    try:
        s.feed(html)
        # close() flushes text the parser may still be buffering
        s.close()
    except Exception:
        return html
    return s.get_data()
def clean_text(text):
    """Clean and normalize text"""
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text)
    # Remove special characters but keep common ones. \w already matches
    # CJK characters; the two ranges keep CJK and fullwidth punctuation.
    text = re.sub(r'[^\w\s.,!?;:\-\'"()\[\]\u3001-\u303f\uff01-\uff5e]', '', text)
    return text.strip()
def fetch_url(url: str, timeout: int = 30):
    """
    Fetch a URL and extract readable content.

    Args:
        url: URL to fetch
        timeout: Request timeout in seconds

    Returns:
        dict: Fetched content with title, text, and metadata
    """
    result = {
        "url": url,
        "title": "",
        "text": "",
        "content": "",
        "description": "",
        "error": None,
        "status_code": 0
    }
    try:
        # Validate URL
        parsed = urllib.parse.urlparse(url)
        if not parsed.scheme:
            # Add https if missing
            url = 'https://' + url
            parsed = urllib.parse.urlparse(url)
        if not parsed.netloc:
            result["error"] = "Invalid URL"
            return result
        # Build request
        req = urllib.request.Request(
            url,
            headers={
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.0 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.0',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
                # Advertise only gzip, the one encoding decoded below
                'Accept-Encoding': 'gzip',
                'DNT': '1',
                'Connection': 'keep-alive',
            }
        )
        # Make request
        with urllib.request.urlopen(req, timeout=timeout) as response:
            result["status_code"] = response.status
            # Get content encoding and the charset declared by the server
            content_encoding = response.headers.get('Content-Encoding', '')
            charset = response.headers.get_content_charset() or 'utf-8'
            # Read content
            content = response.read()
        # Decode based on encoding
        if content_encoding.strip().lower() == 'gzip':
            import gzip
            content = gzip.decompress(content)
        try:
            html = content.decode(charset, errors='ignore')
        except LookupError:
            # Unknown charset label: fall back to UTF-8
            html = content.decode('utf-8', errors='ignore')
        # Extract title
        title_match = re.search(r'<title[^>]*>([^<]+)</title>', html, re.IGNORECASE)
        if title_match:
            result["title"] = strip_tags(title_match.group(1)).strip()
        # Try to find main content. The non-greedy patterns stop at the
        # first closing tag, so nested <div> content may be truncated.
        content_selectors = [
            # Common article content selectors
            (r'<article[^>]*>(.*?)</article>', 'article'),
            (r'<main[^>]*>(.*?)</main>', 'main'),
            (r'<div[^>]*class="[^"]*content[^"]*"[^>]*>(.*?)</div>', 'content div'),
            (r'<div[^>]*class="[^"]*article[^"]*"[^>]*>(.*?)</div>', 'article div'),
            (r'<div[^>]*class="[^"]*post[^"]*"[^>]*>(.*?)</div>', 'post div'),
            (r'<div[^>]*id="[^"]*content[^"]*"[^>]*>(.*?)</div>', 'content div'),
            (r'<div[^>]*id="[^"]*main[^"]*"[^>]*>(.*?)</div>', 'main div'),
        ]
        main_content = ""
        for pattern, _name in content_selectors:
            match = re.search(pattern, html, re.DOTALL | re.IGNORECASE)
            if match:
                main_content = match.group(1)
                break
        # If no content found, use body
        if not main_content:
            body_match = re.search(r'<body[^>]*>(.*?)</body>', html, re.DOTALL | re.IGNORECASE)
            if body_match:
                main_content = body_match.group(1)
        # Extract text from content
        if main_content:
            # Remove script and style elements
            main_content = re.sub(r'<script[^>]*>.*?</script>', '', main_content, flags=re.DOTALL | re.IGNORECASE)
            main_content = re.sub(r'<style[^>]*>.*?</style>', '', main_content, flags=re.DOTALL | re.IGNORECASE)
            main_content = re.sub(r'<noscript[^>]*>.*?</noscript>', '', main_content, flags=re.DOTALL | re.IGNORECASE)
            main_content = re.sub(r'<!--.*?-->', '', main_content, flags=re.DOTALL)
            # Strip tags and clean
            text = strip_tags(main_content)
            text = clean_text(text)
            # Limit text length
            max_length = 10000
            if len(text) > max_length:
                text = text[:max_length] + "..."
            result["text"] = text
            result["content"] = main_content[:5000]  # Keep some HTML for reference
        # Extract meta description
        desc_match = re.search(r'<meta[^>]*name="description"[^>]*content="([^"]*)"', html, re.IGNORECASE)
        if not desc_match:
            desc_match = re.search(r'<meta[^>]*property="og:description"[^>]*content="([^"]*)"', html, re.IGNORECASE)
        if desc_match:
            result["description"] = desc_match.group(1)
        return result
    except urllib.error.HTTPError as e:
        result["error"] = f"HTTP Error {e.code}: {e.reason}"
        return result
    except urllib.error.URLError as e:
        result["error"] = f"URL Error: {str(e.reason)}"
        return result
    except Exception as e:
        result["error"] = f"Error: {type(e).__name__}: {str(e)}"
        return result
def main():
    parser = argparse.ArgumentParser(description="Fetch URL content (no API key required)")
    parser.add_argument("url", help="URL to fetch")
    parser.add_argument("--timeout", "-t", type=int, default=30,
                        help="Request timeout in seconds (default: 30)")
    parser.add_argument("--format", "-f", choices=["text", "json"], default="json",
                        help="Output format (default: json)")
    args = parser.parse_args()
    # Fetch content
    result = fetch_url(args.url, args.timeout)
    # Output
    if args.format == "json":
        print(json.dumps(result, ensure_ascii=False, indent=2))
    else:
        if result.get("error"):
            print(f"Error: {result['error']}", file=sys.stderr)
            sys.exit(1)
        if result.get("title"):
            print(f"Title: {result['title']}\n")
        print(result.get("text", ""))
    # Return error code if failed
    if result.get("error") and not result.get("text"):
        sys.exit(1)


if __name__ == "__main__":
    main()
FILE:scripts/ddg_search.py
#!/usr/bin/env python3
"""
DuckDuckGo Search Script
Search the web using DuckDuckGo (no API key required)
Falls back to fetching the results with curl if the urllib request fails
"""
import argparse
import json
import sys
import subprocess
import urllib.request
import urllib.parse
import urllib.error
import re
def search_with_curl(query: str, max_results: int = 10):
    """Use curl to fetch DuckDuckGo results"""
    try:
        encoded_query = urllib.parse.quote_plus(query)
        url = f"https://lite.duckduckgo.com/lite/?q={encoded_query}"
        cmd = [
            'curl', '-s', '-L',
            '-A', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.0',
            '--connect-timeout', '10',
            '--max-time', '20',
            url
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=25)
        if result.returncode != 0:
            return {"error": f"curl failed: {result.stderr}", "results": []}
        html = result.stdout
        # Parse results
        results = []
        # Pattern 1: Standard result format
        result_pattern = r'<tr>.*?class="result-link"[^>]*href="([^"]*)"[^>]*>([^<]*)</a>.*?class="result-snippet"[^>]*>([^<]*)<'
        matches = re.findall(result_pattern, html, re.DOTALL | re.IGNORECASE)
        for link, title, snippet in matches[:max_results]:
            title = title.strip()
            snippet = snippet.strip()
            # Clean up URL
            if link.startswith('/'):
                link = 'https://duckduckgo.com' + link
            # Decode DuckDuckGo redirect URLs, as search_ddg does
            if 'uddg=' in link:
                target = re.search(r'uddg=([^&]+)', link)
                if target:
                    link = urllib.parse.unquote(target.group(1))
            if title:
                results.append({
                    "title": title,
                    "url": link,
                    "snippet": snippet[:200] if snippet else ""
                })
        return {"results": results}
    except Exception as e:
        return {"error": str(e), "results": []}
def search_ddg(query: str, max_results: int = 10):
    """
    Search DuckDuckGo
    """
    try:
        encoded_query = urllib.parse.quote_plus(query)
        url = f"https://lite.duckduckgo.com/lite/?q={encoded_query}"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.0',
            'Accept': 'text/html,application/xhtml+xml',
            'Accept-Language': 'en-US,en;q=0.9',
        }
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=15) as response:
            html = response.read().decode('utf-8', errors='ignore')
        # Parse results
        results = []
        # Find result rows
        rows = re.findall(r'<tr[^>]*>(.*?)</tr>', html, re.DOTALL)
        for row in rows:
            # Extract link
            link_match = re.search(r'<a[^>]*href="([^"]*)"[^>]*>([^<]*)</a>', row)
            if link_match:
                link = link_match.group(1)
                title = link_match.group(2).strip()
                # Extract snippet
                snippet_match = re.search(r'<td[^>]*class="[^"]*snippet[^"]*"[^>]*>([^<]*)<', row)
                snippet = snippet_match.group(1).strip() if snippet_match else ""
                # Clean URL
                if link.startswith('/'):
                    link = 'https://duckduckgo.com' + link
                # Decode DuckDuckGo redirect URLs
                if 'uddg=' in link:
                    match = re.search(r'uddg=([^&]+)', link)
                    if match:
                        link = urllib.parse.unquote(match.group(1))
                if title and len(title) > 3:
                    results.append({
                        "title": title,
                        "url": link,
                        "snippet": snippet[:200]
                    })
        # Truncate before counting so "count" matches the returned list
        results = results[:max_results]
        return {
            "query": query,
            "count": len(results),
            "results": results
        }
    except Exception as e:
        # Try curl fallback
        fallback = search_with_curl(query, max_results)
        if fallback.get("results"):
            return {
                "query": query,
                "count": len(fallback["results"]),
                "results": fallback["results"]
            }
        return {
            "query": query,
            "error": str(e),
            "results": []
        }
def main():
    parser = argparse.ArgumentParser(description="Search DuckDuckGo (no API key required)")
    parser.add_argument("query", help="Search query")
    parser.add_argument("--max-results", "-n", type=int, default=10,
                        help="Maximum results (1-20, default: 10)")
    args = parser.parse_args()
    # Validate max_results
    max_results = max(1, min(20, args.max_results))
    # Perform search
    result = search_ddg(args.query, max_results)
    # Output JSON
    print(json.dumps(result, ensure_ascii=False, indent=2))
    # Return error code if search failed
    if "error" in result and not result.get("results"):
        sys.exit(1)


if __name__ == "__main__":
    main()