@clawhub-scottkiss-295f29f174
Map service address lookup and distance-query workflow. MUST use this skill when the user asks for coordinates (坐标), latitude/longitude, POI locations (e.g.,...
---
name: map-address-query
description: Map service address lookup and distance-query workflow. MUST use this skill when the user asks for coordinates (坐标), latitude/longitude, POI locations (e.g., specific buildings, shops, or addresses like "北京市朝阳区望京街10号的坐标"), or when calculating route distance and time between places. Supports converting physical locations or names into GPS coordinates.current support Tencent map.
---
# Map Address Query
First, run `./scripts/qq_map_cli.sh` (or `.\scripts\qq_map_cli.bat` on Windows) to download the CLI tool.
Then, use `./scripts/bin/qq-map-cli` (or `.\scripts\bin\qq-map-cli.exe` on Windows) to query Tencent Location Service from the terminal.
## Quick Start
- If the global config file does not exist, create one first:
```bash
# On Mac/Linux
./scripts/bin/qq-map-cli setup --config ~/.qq_map_cli_config.json
# On Windows (CMD/PowerShell)
.\scripts\bin\qq-map-cli.exe setup --config %USERPROFILE%\.qq_map_cli_config.json
```
- If the user already has a Tencent Location Service key, write it directly during setup:
```bash
./scripts/bin/qq-map-cli setup --config ~/.qq_map_cli_config.json --key "your-key"
# Windows: .\scripts\bin\qq-map-cli.exe setup --config %USERPROFILE%\.qq_map_cli_config.json --key "your-key"
```
- The CLI reads the key from one of these places: `--key`, `QQ_MAP_KEY`, `--config` file path.
- Always append the global config flag: `--config ~/.qq_map_cli_config.json` (or `--config %USERPROFILE%\.qq_map_cli_config.json` on Windows) to all commands.
- Run the bundled CLI from the skill directory: `./scripts/bin/qq-map-cli <subcommand> ...` (or `.\scripts\bin\qq-map-cli.exe` on Windows).
- Prefer `--json` when another script or workflow needs structured output.
## How to get Tencent Location Service key
- visit https://lbs.qq.com/dev/console/application/mine
- click "创建应用" button to create a new application
- click "添加key" button to add a new key
- copy the "Key" value to the `--key` flag
## Commands
### Geocode One Address
Use `geocoder` when the user wants latitude/longitude or structured address fields for one place.
```bash
./scripts/bin/qq-map-cli geocoder \
--config ~/.qq_map_cli_config.json \
--address "北京市海淀区彩和坊路海淀西大街74号"
# Windows: .\scripts\bin\qq-map-cli.exe geocoder --config %USERPROFILE%\.qq_map_cli_config.json --address "..."
```
### Measure Distance Between Two Named Addresses
Use `address-distance` when the user asks for the distance or travel time between two address names.
```bash
./scripts/bin/qq-map-cli address-distance \
--config ~/.qq_map_cli_config.json \
--from-address "北京市海淀区中关村大街27号" \
--to-address "北京市朝阳区望京街10号" \
--mode driving
# Windows: .\scripts\bin\qq-map-cli.exe address-distance --config %USERPROFILE%\.qq_map_cli_config.json ...
```
### Run Coordinate or Matrix Queries
Use `distance-matrix` when the user already has coordinates, needs one-to-many or many-to-many queries, or wants raw `from` and `to` payload control.
```bash
./scripts/bin/qq-map-cli distance-matrix \
--config ~/.qq_map_cli_config.json \
--origin 39.984154,116.307490 \
--destination 39.908692,116.397477 \
--mode walking
# Windows: .\scripts\bin\qq-map-cli.exe distance-matrix --config %USERPROFILE%\.qq_map_cli_config.json ...
```
## Workflow
1. Check whether a global config already exists at `~/.qq_map_cli_config.json` (or `%USERPROFILE%\.qq_map_cli_config.json`) or `QQ_MAP_KEY`.
2. If neither exists, run `setup` with the global `--config` flag to create the config file in the user's home directory.
3. If the user can provide the Tencent key immediately, prefer `setup --config ~/.qq_map_cli_config.json --key "..."` so the config is complete in one step.
4. If the key is still missing after setup, stop and ask the user for the Tencent Location Service key instead of trying live queries.
5. After the user provides the key, immediately persist it globally by running `setup --config ~/.qq_map_cli_config.json --key "..." --force`.
6. Treat `~/.qq_map_cli_config.json` as a globally persistent config so the user does not need to configure it again for queries in other directories.
7. Start with the fullest address available, including city and district, to improve geocoder accuracy.
8. Use `address-distance` by default for natural-language requests like "这两个地址相距多远".
9. Switch to `distance-matrix` for batch routing, coordinate-based inputs, or advanced raw payloads.
10. Retry with `--policy 1` when the user gives a shorter or fuzzier address and strict parsing looks too brittle.
11. Change `--mode` to `walking`, `bicycling`, or `straight` when the user asks for non-driving results.
12. Keep API keys in the user's workspace config or environment variables, not inside the skill folder.
## Output Notes
- Default output is human-readable and suited to chat replies.
- `--json` returns raw API responses for downstream automation.
- `address-distance --json` returns both geocoder responses and the final distance-matrix response in one object.
## Troubleshooting
- **SSL Certificate Error on macOS**: If you see `[SSL: CERTIFICATE_VERIFY_FAILED]`, run the downloaded binary with `SSL_CERT_FILE=/etc/ssl/cert.pem ./scripts/bin/qq-map-cli ...` to manually point to macOS's certificates.
FILE:README.md
# Map Address Query Skill
[English](#english) | [中文](#chinese)
---
<a id="english"></a>
## English
### Overview
A skill that provides map service address lookup and distance-query workflows. Currently supports Tencent Location Service (QQ Map).
It enables AI agents and users to:
- Convert a natural language address to geographic coordinates (Latitude/Longitude).
- Measure travel distance and duration between two locations.
- Calculate route distance matrices.
### Installation & Setup
1. **Download the CLI tool:**
- **Mac/Linux**: Run `./scripts/qq_map_cli.sh`
- **Windows**: Run `.\scripts\qq_map_cli.bat`
2. **Get an API Key:**
- Visit [Tencent Location Service Console](https://lbs.qq.com/dev/console/application/mine).
- Click "创建应用" (Create Application) and "添加key" (Add Key).
- Copy the generated Key.
3. **Configure the Key globally:**
- **Mac/Linux**: `./scripts/bin/qq-map-cli setup --config ~/.qq_map_cli_config.json --key "your-key" --force`
- **Windows**: `.\scripts\bin\qq-map-cli.exe setup --config %USERPROFILE%\.qq_map_cli_config.json --key "your-key" --force`
### Usage
Once configured, the CLI can be used from any directory to query addresses:
```bash
# Geocode an address
./scripts/bin/qq-map-cli geocoder --config ~/.qq_map_cli_config.json --address "杭州大厦"
# Measure distance
./scripts/bin/qq-map-cli address-distance --config ~/.qq_map_cli_config.json --from-address "杭州西站" --to-address "杭州大厦" --mode driving
```
*(For raw JSON output, append `--json` to any command)*
### Troubleshooting
- **macOS SSL Error (`[SSL: CERTIFICATE_VERIFY_FAILED]`)**:
Prefix your command with `SSL_CERT_FILE=/etc/ssl/cert.pem` to use the macOS system certificates.
---
<a id="chinese"></a>
## 中文
### 简介
这是一个提供地图地址查询与距离测算工作流的技能插件。目前支持**腾讯位置服务(QQ地图)**。
它可以赋予 AI 代理或用户以下能力:
- 将自然语言地址转换为精确的地理坐标(经纬度)。
- 测量两个地点之间的实际物理距离与预估通行时间。
- 计算批量路线距离矩阵。
### 安装与配置
1. **自动下载依赖命令行工具:**
- **Mac/Linux**: 运行 `./scripts/qq_map_cli.sh`
- **Windows**: 运行 `.\scripts\qq_map_cli.bat`
2. **获取 API Key:**
- 访问 [腾讯位置服务控制台](https://lbs.qq.com/dev/console/application/mine)。
- 点击“创建应用”,然后“添加 Key”。
- 复制生成的 Key 字符串。
3. **全局配置您的 Key:**
- **Mac/Linux**: `./scripts/bin/qq-map-cli setup --config ~/.qq_map_cli_config.json --key "您的key" --force`
- **Windows**: `.\scripts\bin\qq-map-cli.exe setup --config %USERPROFILE%\.qq_map_cli_config.json --key "您的key" --force`
### 使用方法
配置好全局密钥后,即可随时随地在终端发起查询:
```bash
# 查询具体地址坐标
./scripts/bin/qq-map-cli geocoder --config ~/.qq_map_cli_config.json --address "杭州大厦"
# 计算两地通行距离
./scripts/bin/qq-map-cli address-distance --config ~/.qq_map_cli_config.json --from-address "杭州西站" --to-address "杭州大厦" --mode driving
```
*(如果需要供程序对接的结构化 JSON 数据结果,可以在任何命令结尾加上 `--json`)*
### 常见问题 Troubleshooting
- **macOS 下出现 SSL 证书报错 (`[SSL: CERTIFICATE_VERIFY_FAILED]`)**:
这属于跨平台二进制文件的常见问题。在命令行开头加上 `SSL_CERT_FILE=/etc/ssl/cert.pem` 即可调用系统的根证书完成环境验证。例如:
`SSL_CERT_FILE=/etc/ssl/cert.pem ./scripts/bin/qq-map-cli geocoder ...`
FILE:agents/openai.yaml
interface:
display_name: "Map Address Query"
short_description: "Geocode addresses and measure travel distance"
default_prompt: "Use $map-address-query to help set up map address queries, resolve addresses, and calculate distances."
FILE:scripts/qq_map_cli.sh
#!/bin/bash
set -e
# Detect OS
OS="$(uname -s)"
ARCH="$(uname -m)"
CMD_NAME="qq-map-cli"
ZIP_FILE="qq-map-cli.zip"
DOWNLOAD_URL=""
if [ "$OS" = "Darwin" ]; then
DOWNLOAD_URL="https://github.com/scottkiss/qq-map-cli/releases/download/v1.0.2/qq-map-cli-darwin-arm64.zip"
CMD_NAME="qq-map-cli"
elif [ "$OS" = "Linux" ]; then
DOWNLOAD_URL="https://github.com/scottkiss/qq-map-cli/releases/download/v1.0.2/qq-map-cli-linux-x86_64.zip"
CMD_NAME="qq-map-cli"
elif echo "$OS" | grep -iq 'mingw\|cygwin\|msys\|windows_nt'; then
DOWNLOAD_URL="https://github.com/scottkiss/qq-map-cli/releases/download/v1.0.2/qq-map-cli-windows-x86_64.zip"
CMD_NAME="qq-map-cli.exe"
else
echo "Unsupported OS: $OS"
exit 1
fi
DIR="$( cd "$( dirname "BASH_SOURCE[0]" )" && pwd )"
BIN_DIR="$DIR/bin"
CMD_PATH="$BIN_DIR/$CMD_NAME"
if [ ! -x "$CMD_PATH" ] && [ ! -f "$CMD_PATH" ]; then
echo "Downloading $CMD_NAME..." >&2
mkdir -p "$BIN_DIR"
curl -L -s "$DOWNLOAD_URL" -o "$BIN_DIR/$ZIP_FILE"
unzip -q -o "$BIN_DIR/$ZIP_FILE" -d "$BIN_DIR"
if [ "$CMD_NAME" = "qq-map-cli" ]; then
chmod +x "$CMD_PATH"
fi
rm -f "$BIN_DIR/$ZIP_FILE"
echo "Download complete: $CMD_PATH" >&2
else
echo "$CMD_NAME is already downloaded at $CMD_PATH" >&2
fi
Convert scanned PDF documents into Word text documents using a free, local OCR engine or remote api.
---
name: pdf2word-skills
description: Convert scanned PDF documents into Word text documents using a free, local OCR engine or remote api.
---
# PDF to Word Converter
[🇨🇳 简体中文 / Simplified Chinese](README_zh.md)
A skill to extract text from scanned PDF documents and convert them into reusable Word (`.docx`) files using the free, local `docr` OCR engine.
## Prerequisites
1. Initialize the OCR engine by downloading the binaries:
```bash
bash scripts/install.sh
```
2. Install the required Python dependencies:
```bash
pip install -r scripts/requirements.txt
```
## Usage
Run the Python script passing the input PDF file and the desired output `.docx` file path. You can also append any additional standard `docr` arguments (such as engine preferences).
```bash
python scripts/pdf2word.py <input.pdf> <output.docx> [docr_args...]
```
### Examples
Convert a single file with the default local engine:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx
```
### Using Other API Engines
By default, the script uses the local `RapidOCR` engine. The underlying `docr` tool also supports other engines like the Google Gemini API for potentially higher recognition accuracy on complex layouts.
To use Gemini, first configure your API key:
```bash
mkdir -p ~/.ocr
echo "gemini_api_key=your_gemini_key" > ~/.ocr/config
```
Then pass the `-engine gemini` argument to the script:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini
```
If your document has **tables**, you can force Gemini to output them in Markdown format so the script can parse them into native Word tables:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini -prompt "Extract all text and preserve tables in Markdown format using | symbols."
```
### How it Works
1. The script calls `docr`, which uses the specified OCR model (RapidOCR by default) to read text from the scanned PDF.
2. The extracted text is temporarily stored.
3. The `python-docx` library is used to read the temporary text and construct a formatted Word document.
4. Temporary files are cleaned up automatically.
FILE:README.md
---
name: pdf2word-skills
description: Convert scanned PDF documents into Word text documents using a free, local OCR engine or remote api.
---
# PDF to Word Converter
[🇨🇳 简体中文 / Simplified Chinese](README_zh.md)
A skill to extract text from scanned PDF documents and convert them into reusable Word (`.docx`) files using the free, local `docr` OCR engine.
## Prerequisites
1. Initialize the OCR engine by downloading the binaries:
```bash
bash scripts/install.sh
```
2. Install the required Python dependencies:
```bash
pip install -r scripts/requirements.txt
```
## Usage
Run the Python script passing the input PDF file and the desired output `.docx` file path. You can also append any additional standard `docr` arguments (such as engine preferences).
```bash
python scripts/pdf2word.py <input.pdf> <output.docx> [docr_args...]
```
### Examples
Convert a single file with the default local engine:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx
```
### Using Other API Engines
By default, the script uses the local `RapidOCR` engine. The underlying `docr` tool also supports other engines like the Google Gemini API for potentially higher recognition accuracy on complex layouts.
To use Gemini, first configure your API key:
```bash
mkdir -p ~/.ocr
echo "gemini_api_key=your_gemini_key" > ~/.ocr/config
```
Then pass the `-engine gemini` argument to the script:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini
```
If your document has **tables**, you can force Gemini to output them in Markdown format so the script can parse them into native Word tables:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini -prompt "Extract all text and preserve tables in Markdown format using | symbols."
```
### How it Works
1. The script calls `docr`, which uses the specified OCR model (RapidOCR by default) to read text from the scanned PDF.
2. The extracted text is temporarily stored.
3. The `python-docx` library is used to read the temporary text and construct a formatted Word document.
4. Temporary files are cleaned up automatically.
FILE:README_zh.md
# PDF 转 Word 技能
[English](SKILL.md)
这是一个使用**免费**的本地 OCR 引擎 `docr` 将扫描版 PDF 文档提取并转换为可编辑的 Word (`.docx`) 文件的技能。
## 准备工作
1. 下载并初始化 OCR 引擎二进制文件:
```bash
bash scripts/install.sh
```
2. 安装所需的 Python 依赖项:
```bash
pip install -r scripts/requirements.txt
```
## 使用方法
运行 Python 脚本,传入输入的 PDF 文件路径和期望输出的 `.docx` 文件路径。您还可以在最后附加 `docr` 支持的其他参数(如引擎选择)。
```bash
python scripts/pdf2word.py <输入.pdf> <输出.docx> [docr参数...]
```
### 示例
使用默认的本地引擎转换单个文件:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx
```
### 使用其他 API 引擎
默认情况下,脚本使用本地的 `RapidOCR` 引擎。底层的 `docr` 工具也支持使用其他引擎(如 Google Gemini API),这可以在处理复杂排版时提供更高的识别准确率。
要使用 Gemini,请先配置您的 API 密钥:
```bash
mkdir -p ~/.ocr
echo "gemini_api_key=your_gemini_key" > ~/.ocr/config
```
然后在运行脚本时传入 `-engine gemini` 参数:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini
```
如果您的文档中包含**表格**,您可以利用 prompt 功能强制 Gemini 输出 Markdown 格式的表格,脚本会自动将它们转换回原生的 Word 表格:
```bash
python scripts/pdf2word.py sample.pdf sample_output.docx -engine gemini -prompt "请提取所有文本,务必使用 | 符号将表格保持为 Markdown 格式输出。"
```
### 工作原理
1. 脚本调用 `docr`,使用指定的 OCR 模型(默认为 RapidOCR)从扫描的 PDF 中读取文本。
2. 提取的文本被临时存储。
3. 使用 `python-docx` 库读取临时文本内容,并构建为格式化的 Word 文档。
4. 自动清理临时文件。
FILE:scripts/install.sh
#!/bin/bash
# Define the version to download
VERSION="v1.0.0"
# Detect OS
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
EXT=""
if [[ "$OS" == *"mingw"* ]] || [[ "$OS" == *"msys"* ]] || [[ "$OS" == *"cygwin"* ]]; then
OS="windows"
EXT=".exe"
elif [ "$OS" != "darwin" ] && [ "$OS" != "linux" ]; then
echo "Unsupported OS: $OS"
exit 1
fi
# Detect Architecture
ARCH=$(uname -m)
if [ "$ARCH" = "x86_64" ]; then
ARCH="amd64"
elif [ "$ARCH" = "aarch64" ] || [ "$ARCH" = "arm64" ]; then
ARCH="arm64"
else
echo "Unsupported architecture: $ARCH"
exit 1
fi
# Construct filename and URL
FILENAME="docr-$OS-$ARCH$EXT"
DOWNLOAD_URL="https://github.com/scottkiss/doc-ocr/releases/download/$VERSION/$FILENAME"
# Set target directory relative to the script location
SCRIPT_DIR="$(cd "$(dirname "BASH_SOURCE[0]")" && pwd)"
TARGET_DIR="$SCRIPT_DIR/docr"
TARGET_FILE="$TARGET_DIR/docr$EXT"
echo "Downloading docr $VERSION for $OS ($ARCH)..."
# Ensure the target directory exists
mkdir -p "$TARGET_DIR"
# Download the binary
curl -L -o "$TARGET_FILE" "$DOWNLOAD_URL"
if [ $? -eq 0 ]; then
# Make it executable
chmod +x "$TARGET_FILE"
echo "Successfully downloaded and installed to $TARGET_FILE"
else
echo "Failed to download $FILENAME from $DOWNLOAD_URL"
exit 1
fi
FILE:scripts/pdf2word.py
#!/usr/bin/env python3
import sys
import os
import subprocess
try:
from docx import Document
except ImportError:
print("Error: python-docx library is not installed.")
print("Please install requirements first: pip install -r requirements.txt")
sys.exit(1)
def main():
if len(sys.argv) < 3:
print("Usage: python scripts/pdf2word.py <input_pdf> <output_docx> [docr_args...]")
print("Example: python scripts/pdf2word.py sample.pdf output.docx")
print("Example with Gemini API: python scripts/pdf2word.py sample.pdf output.docx -engine gemini")
sys.exit(1)
input_pdf = sys.argv[1]
output_docx = sys.argv[2]
extra_args = sys.argv[3:]
if not os.path.exists(input_pdf):
print(f"Error: Input file '{input_pdf}' not found.")
sys.exit(1)
script_dir = os.path.dirname(os.path.abspath(__file__))
docr_path = os.path.join(script_dir, "docr", "docr")
if not os.path.exists(docr_path):
print("Error: docr binary not found.")
print("Please run scripts/install.sh first to download the OCR engine.")
sys.exit(1)
print(f"==> Extracting text from {input_pdf} using docr...")
temp_text_file = output_docx + ".txt"
try:
# Pass extra custom arguments to docr
cmd = [docr_path, input_pdf, "-o", temp_text_file] + extra_args
subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e:
print(f"Error: OCR extraction failed: {e}")
sys.exit(1)
if not os.path.exists(temp_text_file):
print(f"Error: Expected text file {temp_text_file} was not created by docr.")
sys.exit(1)
print(f"==> Converting extracted text to Word Document ({output_docx})...")
doc = Document()
with open(temp_text_file, "r", encoding="utf-8") as f:
is_first_page = True
table_data = []
def commit_table():
if not table_data:
return
# Remove the markdown separator line (e.g., |---|---|)
rows = [r for r in table_data if r.replace('|', '').replace('-', '').replace(':', '').replace(' ', '') != '']
if rows:
cols_count = max(len(r.split('|')) - 2 for r in rows)
if cols_count > 0:
table = doc.add_table(rows=len(rows), cols=cols_count)
table.style = 'Table Grid'
for i, row in enumerate(rows):
cells = [c.strip() for c in row.split('|')[1:-1]]
for j, val in enumerate(cells):
if j < len(table.columns):
table.cell(i, j).text = val
table_data.clear()
for line in f:
clean_line = line.strip()
# Simple Markdown table detection
if clean_line.startswith("|") and clean_line.endswith("|"):
table_data.append(clean_line)
continue
else:
commit_table()
if clean_line:
# Handle docr's page delimiter "<!-- Page X -->"
if clean_line.startswith("<!-- Page ") and clean_line.endswith(" -->"):
if not is_first_page:
doc.add_page_break()
is_first_page = False
continue
doc.add_paragraph(clean_line)
commit_table() # In case the file ends with a table
try:
doc.save(output_docx)
print(f"==> Successfully saved to {output_docx}")
except Exception as e:
print(f"Error saving Word doc: {e}")
finally:
# Cleanup temporary text file
try:
if os.path.exists(temp_text_file):
os.remove(temp_text_file)
except OSError as e:
print(f"Warning: Could not remove temporary file {temp_text_file}: {e}")
if __name__ == "__main__":
main()
FILE:scripts/requirements.txt
python-docx>=1.1.0
OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local).
---
name: doc-ocr-skills
description: OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local).
---
# Document OCR Skill (docr)
Uses **Gemini 2.5 Flash**, **PaddleOCR**, or **RapidOCR** (local) to recognize text from scanned PDFs and images. Compiled as a single Go binary.
## Prerequisites
- API Key configured in `~/.ocr/config` (not needed for Paddle/Rapid)
- For RapidOCR engine: `pip install rapidocr_onnxruntime`
- For PaddleOCR engine: `pip install paddleocr paddlepaddle`
### API Key Configuration
Create the config file:
```bash
mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key
EOF
```
## Quick Start
> **Path Variable**: All commands below use `$DOCR`. Before running any command, set this variable:
> ```bash
> SKILL_DIR="$(cd "$(dirname "<path-to-this-SKILL.md>")" && pwd)"
> DOCR="$SKILL_DIR/scripts/docr/docr"
> ```
```bash
# OCR a single document using RapidOCR (default)
$DOCR document.pdf
$DOCR image.jpg
# Use Gemini engine
$DOCR -engine gemini document.pdf
# Use PaddleOCR local engine
$DOCR -engine paddle document.pdf
# Specify output file
$DOCR document.pdf -o result.txt
# Batch process all supported files in a directory
$DOCR -batch ./docs/ -o ./outputs/
```
## Engines
| Engine | Flag | API Key Config | Doc Handling |
|--------|------|---------------|--------------|
| **RapidOCR** (default) | `-engine rapid` | None | Local OCR |
| **Gemini** | `-engine gemini` | `gemini_api_key` | Cloud Vision API |
| **PaddleOCR** (local) | `-engine paddle` | None | Local OCR |
## CLI Reference
```
docr [options] <file or directory>
Options:
-engine string OCR engine: rapid (default) / gemini / paddle
-e string Engine (short flag)
-o string Output file path or directory (batch mode)
-output string Output path (long flag)
-batch Batch mode: process all files in directory
-prompt string Custom recognition prompt (gemini)
```
## Installation
We provide pre-compiled binaries to get you started quickly.
```bash
cd doc-ocr-skills/scripts
./install.sh
```
This script will detect your OS (`darwin`/`linux`) and architecture (`amd64`/`arm64`) and download the appropriate version of `docr`.
### Building from Source (Optional)
If you prefer to build from source, ensure you have **Go 1.21+** installed:
```bash
cd doc-ocr-skills/scripts/docr
go build -o docr .
```
## Error Handling
| Error | Solution |
|-------|----------|
| `config file not found` | Create `~/.ocr/config` with API keys |
| `gemini_api_key not found` | Add `gemini_api_key=VALUE` to config |
| `file not found` | Verify the document file path |
| API timeout | Retry; large files may need longer |
FILE:README.md
# Document OCR Skill (docr)
[](https://golang.org/)
[](LICENSE)
[中文版](./README_CN.md)
A powerful, single-binary CLI tool for Optical Character Recognition (OCR) on scanned PDFs and images. It supports multiple engines including **Gemini 2.5 Flash**, **PaddleOCR**, and **RapidOCR**.
## ✨ Features
- **Multi-Engine Support**: Choose between cloud-based (Gemini) or local (PaddleOCR, RapidOCR) engines.
- **Single Binary**: Compiled Go binary for easy distribution and installation.
- **Batch Processing**: Process entire directories of documents with a single command.
- **Cross-Platform**: Supports macOS (Darwin), Linux, and Windows.
- **Custom Prompts**: Enhance recognition results with custom prompts when using Gemini.
## 🚀 Quick Start
### Quick Install (Single Command)
```bash
npx skills add scottkiss/doc-ocr-skills
```
### Installation (Binary)
Use our convenient install script to download the latest pre-compiled binary for your system:
```bash
curl -sSL https://raw.githubusercontent.com/scottkiss/doc-ocr-skills/main/scripts/install.sh | bash
```
*Note: The script installs the `docr` binary into a `docr` directory relative to where it's run. Add it to your PATH for global access.*
### Building from Source
If you have Go 1.21+ installed:
```bash
git clone https://github.com/scottkiss/doc-ocr-skills.git
cd doc-ocr-skills/scripts/docr
go build -o docr .
```
## 🛠 Prerequisites
### Local Engines (Optional)
If you plan to use local OCR engines, ensure the corresponding Python packages are installed:
- **RapidOCR** (Default): `pip install rapidocr_onnxruntime`
- **PaddleOCR**: `pip install paddleocr paddlepaddle`
### API Configuration (For Gemini)
To use the Gemini engine, create a configuration file at `~/.ocr/config`:
```bash
mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key_here
EOF
```
## 📖 Usage
### Basic OCR
Recognize text using the default engine (RapidOCR):
```bash
docr document.pdf
docr image.png
```
### Specifying an Engine
```bash
# Use Google Gemini (Requires API Key)
docr -engine gemini document.pdf
# Use PaddleOCR (Local)
docr -engine paddle document.pdf
```
### Batch Processing
Process all supported files in a directory and save results to an output folder:
```bash
docr -batch ./input_docs/ -o ./output_results/
```
### CLI Options
| Flag | Description |
|------|-------------|
| `-engine`, `-e` | OCR engine to use: `rapid` (default), `gemini`, or `paddle`. |
| `-o`, `-output` | Path to the output file or directory (for batch mode). |
| `-batch` | Enable batch processing for a directory. |
| `-prompt` | Custom recognition prompt (only applicable to Gemini). |
## ❗ Troubleshooting
| Issue | Solution |
|-------|----------|
| `config file not found` | Ensure `~/.ocr/config` exists. |
| `gemini_api_key not found` | Check if the key is correctly set in the config file. |
| `pip: command not found` | Ensure Python and pip are installed for local engines. |
| Permission Denied | Run `chmod +x docr` on the binary. |
## 📄 License
This project is licensed under the MIT License.
FILE:README_CN.md
# Document OCR Skill (docr)
[](https://golang.org/)
[](LICENSE)
[English Version](./README.md)
一个强大、单二进制文件的命令行工具,用于对扫描的 PDF 和图像进行光学字符识别 (OCR)。支持多种引擎,包括 **Gemini 2.5 Flash**、**PaddleOCR** 和 **RapidOCR**。
## ✨ 功能特性
- **多引擎支持**:可选云端 (Gemini) 或本地 (PaddleOCR, RapidOCR) 引擎。
- **单二进制文件**:使用 Go 编译,方便分发和安装。
- **批量处理**:支持单条命令处理整个目录的文档。
- **跨平台**:支持 macOS (Darwin)、Linux 和 Windows。
- **自定义提示词**:使用 Gemini 时,可以通过自定义提示词增强识别效果。
## 🚀 快速开始
### 一键安装
```bash
npx skills add scottkiss/doc-ocr-skills
```
### 手动安装 (二进制)
使用我们的安装脚本下载适用于您系统的最新预编译二进制文件:
```bash
curl -sSL https://raw.githubusercontent.com/scottkiss/doc-ocr-skills/main/scripts/install.sh | bash
```
*注意:脚本会将 `docr` 二进制文件安装到运行目录下的 `docr` 文件夹中。建议将其添加到 PATH 以便全局访问。*
### 源码编译
如果您安装了 Go 1.21+:
```bash
git clone https://github.com/scottkiss/doc-ocr-skills.git
cd doc-ocr-skills/scripts/docr
go build -o docr .
```
## 🛠 前置条件
### 本地引擎 (可选)
如果您打算使用本地 OCR 引擎,请确保已安装相应的 Python 包:
- **RapidOCR** (默认): `pip install rapidocr_onnxruntime`
- **PaddleOCR**: `pip install paddleocr paddlepaddle`
### API 配置 (针对 Gemini)
要使用 Gemini 引擎,请在 `~/.ocr/config` 创建配置文件:
```bash
mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=您的_gemini_密钥
EOF
```
## 📖 使用方法
### 基础识别
使用默认引擎 (RapidOCR) 识别文本:
```bash
docr document.pdf
docr image.png
```
### 指定引擎
```bash
# 使用 Google Gemini (需要 API Key)
docr -engine gemini document.pdf
# 使用 PaddleOCR (本地)
docr -engine paddle document.pdf
```
### 批量处理
处理目录下所有受支持的文件,并将结果保存到输出文件夹:
```bash
docr -batch ./input_docs/ -o ./output_results/
```
### 命令行选项
| 选项 (Flag) | 描述 |
|------|-------------|
| `-engine`, `-e` | 使用的 OCR 引擎:`rapid` (默认), `gemini`, 或 `paddle`。 |
| `-o`, `-output` | 输出文件或目录的路径(用于批量模式)。 |
| `-batch` | 开启目录批量处理模式。 |
| `-prompt` | 自定义识别提示词(仅适用于 Gemini)。 |
## ❗ 故障排除
| 问题 | 解决方案 |
|-------|----------|
| `config file not found` | 确保 `~/.ocr/config` 文件存在。 |
| `gemini_api_key not found` | 检查配置文件中是否已正确设置密钥。 |
| `pip: command not found` | 确保已安装 Python 和 pip 以使用本地引擎。 |
| Permission Denied | 对二进制文件运行 `chmod +x docr`。 |
## 📄 开源协议
本项目采用 MIT 协议开源。
FILE:scripts/install.sh
#!/bin/bash
# Define the version to download
VERSION="v1.0.0"
# Detect OS
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
EXT=""
if [[ "$OS" == *"mingw"* ]] || [[ "$OS" == *"msys"* ]] || [[ "$OS" == *"cygwin"* ]]; then
OS="windows"
EXT=".exe"
elif [ "$OS" != "darwin" ] && [ "$OS" != "linux" ]; then
echo "Unsupported OS: $OS"
exit 1
fi
# Detect Architecture
ARCH=$(uname -m)
if [ "$ARCH" = "x86_64" ]; then
ARCH="amd64"
elif [ "$ARCH" = "aarch64" ] || [ "$ARCH" = "arm64" ]; then
ARCH="arm64"
else
echo "Unsupported architecture: $ARCH"
exit 1
fi
# Construct filename and URL
FILENAME="docr-$OS-$ARCH$EXT"
DOWNLOAD_URL="https://github.com/scottkiss/doc-ocr/releases/download/$VERSION/$FILENAME"
# Set target directory relative to the script location
SCRIPT_DIR="$(cd "$(dirname "BASH_SOURCE[0]")" && pwd)"
TARGET_DIR="$SCRIPT_DIR/docr"
TARGET_FILE="$TARGET_DIR/docr$EXT"
echo "Downloading docr $VERSION for $OS ($ARCH)..."
# Ensure the target directory exists
mkdir -p "$TARGET_DIR"
# Download the binary
curl -L -o "$TARGET_FILE" "$DOWNLOAD_URL"
if [ $? -eq 0 ]; then
# Make it executable
chmod +x "$TARGET_FILE"
echo "Successfully downloaded and installed to $TARGET_FILE"
else
echo "Failed to download $FILENAME from $DOWNLOAD_URL"
exit 1
fi