@clawhub-kaarl92-770881a495
Perform OCR on image files (jpg, png, bmp, gif, tiff) using the system's `tesseract` binary and return extracted plain text.
---
name: sm-ocr-scanner
version: 1.0.0
description: Perform OCR on image files (jpg, png, bmp, gif, tiff) using the system's `tesseract` binary and return extracted plain text.
---
# sm-ocr-scanner (working skill)
## Overview
This skill uses the local **Tesseract OCR** program to extract text from common image formats. It is ready to use as long as `tesseract` is installed on the system.
## Usage
```bash
# Call via the skill script (recommended)
~/.openclaw/workspace/skills/sm-ocr-scanner/scripts/ocr.sh <path-to-image-file>
```
Example:
```bash
~/.openclaw/workspace/skills/sm-ocr-scanner/scripts/ocr.sh /root/.openclaw/media/inbound/916f6187-cc22-4c62-bcfc-7b72198c8a10.png
```
The recognized text is written to **STDOUT**.
## Options
- The call uses `-l eng` to force the English language data. For other languages, adjust the flag, e.g. `-l deu` for German.
- Tesseract does not detect the language automatically. If you remove the `-l` flag, it falls back to its default language, `eng`. To recognize several languages in one pass, join the codes with `+`, e.g. `-l eng+deu`.
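To illustrate the language option, here is a minimal Python sketch that assembles the same kind of `tesseract` command line the wrapper runs. The helper name `build_tesseract_cmd` is hypothetical and not part of the skill. Joining several language codes with `+` is standard Tesseract syntax, and each code only works if its language data is installed.

```python
from typing import List, Optional, Sequence


def build_tesseract_cmd(
    image_path: str, languages: Optional[Sequence[str]] = ("eng",)
) -> List[str]:
    """Return the argv list for an OCR run that prints text to stdout.

    Hypothetical helper for illustration; the skill script hard-codes `-l eng`.
    """
    cmd = ["tesseract", image_path, "stdout"]
    if languages:
        # Several languages are joined with "+", e.g. "eng+deu".
        cmd += ["-l", "+".join(languages)]
    return cmd


print(build_tesseract_cmd("scan.png"))
print(build_tesseract_cmd("scan.png", ["eng", "deu"]))
```

Returning a list rather than a single string keeps file names with spaces intact when the command is later passed to `subprocess.run`.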
## Integration with OpenClaw (optional)
If you later want to run the skill from the command line under a short name, you can add an alias to your `~/.bashrc` (or `~/.zshrc`):
```bash
alias sm-ocr-scanner='~/.openclaw/workspace/skills/sm-ocr-scanner/scripts/ocr.sh'
```
Then you can simply type `sm-ocr-scanner <file>`.
## Note
The original placeholder skill was not functional. Adding this Bash wrapper turns it into a real OCR tool that is ready to use immediately.
FILE:assets/example_asset.txt
# Example Asset File
This placeholder represents where asset files would be stored.
Replace with actual asset files (templates, images, fonts, etc.) or delete if not needed.
Asset files are NOT intended to be loaded into context, but rather used within
the output Codex produces.
Example asset files from other skills:
- Brand guidelines: logo.png, slides_template.pptx
- Frontend builder: hello-world/ directory with HTML/React boilerplate
- Typography: custom-font.ttf, font-family.woff2
- Data: sample_data.csv, test_dataset.json
## Common Asset Types
- Templates: .pptx, .docx, boilerplate directories
- Images: .png, .jpg, .svg, .gif
- Fonts: .ttf, .otf, .woff, .woff2
- Boilerplate code: Project directories, starter files
- Icons: .ico, .svg
- Data files: .csv, .json, .xml, .yaml
Note: This is a text placeholder. Actual assets can be any file type.
FILE:references/api_reference.md
# Reference Documentation for OCR
## OCR.space API (Demo)
- **Endpoint:** `https://api.ocr.space/parse/image`
- **Method:** `POST`
- **Parameters:**
  - `apikey` (string) – API key. The public demo key is `helloworld` (limited usage, suitable for testing).
  - `url` (string, optional) – Direct URL to the image to process.
  - `language` (string, optional) – Language code (e.g., `eng`, `deu`, `spa`). Default `eng`.
  - `isOverlayRequired` (bool, optional) – Set to `false` for plain text output.
  - `file` (binary, optional) – Upload a local image file when not using a URL.
- **Response (JSON):**
```json
{
  "ParsedResults": [{"ParsedText": "..."}],
  "IsErroredOnProcessing": false,
  "ErrorMessage": []
}
```
- **Rate limits:** The demo key is heavily throttled; for production use obtain a personal API key from https://ocr.space/ocrapi.
- **Supported formats:** JPG, PNG, BMP, GIF, TIFF.
- **Notes:** The demo key may reject large images (>1 MB) and has a daily request cap.
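
The format and size notes above can be turned into a local pre-flight check before anything is uploaded. The sketch below is only an illustration: `check_demo_upload`, `DEMO_MAX_BYTES`, and the extension set are hypothetical names, the 1 MB threshold is taken from the note above rather than from a verified service limit, and the mapping from format names to file extensions is an assumption.

```python
import os

# Assumed threshold, taken from the ">1 MB" note above.
DEMO_MAX_BYTES = 1 * 1024 * 1024
# Assumed file extensions for the listed formats (JPG, PNG, BMP, GIF, TIFF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}


def check_demo_upload(path: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > DEMO_MAX_BYTES:
        problems.append("file is larger than 1 MB")
    return problems


print(check_demo_upload("missing-file.pdf"))
```

An empty result does not guarantee that the service accepts the image; it only rules out the cases listed in this reference.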
## Usage in the OCR Skill
The `scripts/example.py` script demonstrates how to call this API for both URLs and local files, handling errors and returning the extracted plain text.
FILE:scripts/example.py
#!/usr/bin/env python3
"""
OCR helper script for the ocr skill.
Uses the free demo API from ocr.space (apikey=helloworld).
Supports image URLs or local image files (jpg, png, bmp, gif, tiff).
Returns the extracted plain text or an error message.
"""
import os
import sys

import requests

API_URL = "https://api.ocr.space/parse/image"
DEMO_KEY = "helloworld"


def perform_ocr(source, language="eng"):
    """Perform OCR on an image.

    Args:
        source (str): URL or local file path to the image.
        language (str): OCR language code (default 'eng').

    Returns:
        str: Extracted text or error description.
    """
    data = {
        "apikey": DEMO_KEY,
        "language": language,
        "isOverlayRequired": "false",
    }
    try:
        if source.startswith(("http://", "https://")):
            # Remote image: pass the URL and let the service fetch it.
            data["url"] = source
            resp = requests.post(API_URL, data=data, timeout=30)
        else:
            if not os.path.isfile(source):
                return f"Error: file not found – {source}"
            # Local image: upload it; the context manager closes the file.
            with open(source, "rb") as fh:
                resp = requests.post(API_URL, data=data, files={"file": fh}, timeout=30)
    except requests.RequestException as exc:
        return f"Error: request failed – {exc}"

    if resp.status_code != 200:
        return f"Error: HTTP {resp.status_code} from OCR service"
    try:
        result = resp.json()
    except ValueError:
        return "Error: OCR service returned invalid JSON"
    if result.get("IsErroredOnProcessing"):
        # Defensive: accept a list (possibly empty) or a plain string.
        message = result.get("ErrorMessage") or "unknown error"
        if isinstance(message, list):
            message = message[0]
        return f"Error: {message}"
    parsed = result.get("ParsedResults") or []
    if not parsed:
        return "Error: No parsed results returned"
    return parsed[0].get("ParsedText", "")


def main():
    if len(sys.argv) < 2:
        print("Usage: example.py <image_path_or_url> [language]")
        sys.exit(1)
    source = sys.argv[1]
    language = sys.argv[2] if len(sys.argv) > 2 else "eng"
    print(perform_ocr(source, language))


if __name__ == "__main__":
    main()
FILE:scripts/ocr.sh
#!/usr/bin/env bash
set -e
# ------------------------------------------------------------
# OCR wrapper - supports image files (jpg, png, ...) and PDF
# ------------------------------------------------------------
usage() {
    echo "Usage: $0 <file>" >&2
    echo "  <file>  image (jpg/png/...) or PDF file" >&2
    exit 1
}
if [ -z "$1" ]; then
    usage
fi
INPUT="$1"
EXTENSION="${INPUT##*.}"  # file extension (text after the last dot)
# ---- Function: OCR for a single image ----
ocr_image() {
    local img="$1"
    # -l eng selects English. Without -l, tesseract also defaults to eng;
    # it does not detect the language automatically.
    tesseract "$img" stdout -l eng || true
}
# ---- PDF processing (if needed) ----
if [[ "$EXTENSION" =~ ^[Pp][Dd][Ff]$ ]]; then
    # Convert each page to a temporary PNG (300 dpi gives good quality).
    # WORKDIR is used instead of TMPDIR so the standard variable is not overwritten.
    WORKDIR=$(mktemp -d)
    trap 'rm -rf "$WORKDIR"' EXIT  # clean up even if a step fails
    # pdftoppm creates files: page-1.png, page-2.png, ...
    pdftoppm -png -r 300 "$INPUT" "$WORKDIR/page"
    # Loop over all generated PNGs and run OCR
    for img in "$WORKDIR"/*.png; do
        echo "--- Page: $(basename "$img") ---"
        ocr_image "$img"
    done
else
    # Image file - run OCR directly
    ocr_image "$INPUT"
fi
Perform OCR on image files (jpg, png, bmp, gif, tiff) using the system's `tesseract` binary and return extracted plain text.
---
name: ocr-scanner-image
version: 1.0.0
description: Perform OCR on image files (jpg, png, bmp, gif, tiff) using the system's `tesseract` binary and return extracted plain text.
---
# OCR-Scanner-Image (working skill)
## Overview
This skill uses the local **Tesseract OCR** program to extract text from common image formats. It is ready to use as long as `tesseract` is installed on the system.
## Usage
```bash
# Call via the skill script (recommended)
~/.openclaw/workspace/skills/ocr-scanner-image/scripts/ocr.sh <path-to-image-file>
```
Example:
```bash
~/.openclaw/workspace/skills/ocr-scanner-image/scripts/ocr.sh /root/.openclaw/media/inbound/916f6187-cc22-4c62-bcfc-7b72198c8a10.png
```
The recognized text is written to **STDOUT**.
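Because the text arrives on standard output, another program can capture it directly. The following is a minimal Python sketch under that assumption; `run_and_capture` is a hypothetical helper, and the script path in the comment is the install location from the usage example, which may differ on your system.

```python
import subprocess
from typing import Sequence


def run_and_capture(command: Sequence[str], image_path: str, timeout: float = 120.0) -> str:
    """Run `command image_path` and return whatever it wrote to stdout."""
    completed = subprocess.run(
        [*command, image_path],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Example call (assumed path, adjust as needed):
# import os
# script = os.path.expanduser(
#     "~/.openclaw/workspace/skills/ocr-scanner-image/scripts/ocr.sh")
# print(run_and_capture([script], "/path/to/image.png"))
```

Tesseract's diagnostics go to standard error, so they stay out of the returned text.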
## Options
- The call uses `-l eng` to force the English language data. For other languages, adjust the flag, e.g. `-l deu` for German.
- Tesseract does not detect the language automatically. If you remove the `-l` flag, it falls back to its default language, `eng`. To recognize several languages in one pass, join the codes with `+`, e.g. `-l eng+deu`.
## Integration with OpenClaw (optional)
If you later want to run the skill from the command line under a short name, you can add an alias to your `~/.bashrc` (or `~/.zshrc`):
```bash
alias ocr-image='~/.openclaw/workspace/skills/ocr-scanner-image/scripts/ocr.sh'
```
Then you can simply type `ocr-image <file>`.
## Note
The original placeholder skill was not functional. Adding this Bash wrapper turns it into a real OCR tool that is ready to use immediately.
FILE:assets/example_asset.txt
# Example Asset File
This placeholder represents where asset files would be stored.
Replace with actual asset files (templates, images, fonts, etc.) or delete if not needed.
Asset files are NOT intended to be loaded into context, but rather used within
the output Codex produces.
Example asset files from other skills:
- Brand guidelines: logo.png, slides_template.pptx
- Frontend builder: hello-world/ directory with HTML/React boilerplate
- Typography: custom-font.ttf, font-family.woff2
- Data: sample_data.csv, test_dataset.json
## Common Asset Types
- Templates: .pptx, .docx, boilerplate directories
- Images: .png, .jpg, .svg, .gif
- Fonts: .ttf, .otf, .woff, .woff2
- Boilerplate code: Project directories, starter files
- Icons: .ico, .svg
- Data files: .csv, .json, .xml, .yaml
Note: This is a text placeholder. Actual assets can be any file type.
FILE:references/api_reference.md
# Reference Documentation for OCR
## OCR.space API (Demo)
- **Endpoint:** `https://api.ocr.space/parse/image`
- **Method:** `POST`
- **Parameters:**
  - `apikey` (string) – API key. The public demo key is `helloworld` (limited usage, suitable for testing).
  - `url` (string, optional) – Direct URL to the image to process.
  - `language` (string, optional) – Language code (e.g., `eng`, `deu`, `spa`). Default `eng`.
  - `isOverlayRequired` (bool, optional) – Set to `false` for plain text output.
  - `file` (binary, optional) – Upload a local image file when not using a URL.
- **Response (JSON):**
```json
{
  "ParsedResults": [{"ParsedText": "..."}],
  "IsErroredOnProcessing": false,
  "ErrorMessage": []
}
```
- **Rate limits:** The demo key is heavily throttled; for production use obtain a personal API key from https://ocr.space/ocrapi.
- **Supported formats:** JPG, PNG, BMP, GIF, TIFF.
- **Notes:** The demo key may reject large images (>1 MB) and has a daily request cap.
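
The response shape listed above can be unpacked with a small helper. This is a sketch based only on the three fields shown in this reference; `extract_text` is a hypothetical name, and accepting `ErrorMessage` as either a list or a plain string is a defensive assumption rather than documented behavior.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join the ParsedText of every result, or raise ValueError on a reported error."""
    if response.get("IsErroredOnProcessing"):
        message = response.get("ErrorMessage") or "unknown error"
        if isinstance(message, list):
            message = "; ".join(str(m) for m in message)
        raise ValueError(f"OCR failed: {message}")
    results = response.get("ParsedResults") or []
    return "\n".join(r.get("ParsedText", "") for r in results)


sample = {
    "ParsedResults": [{"ParsedText": "Hello"}, {"ParsedText": "World"}],
    "IsErroredOnProcessing": False,
    "ErrorMessage": [],
}
print(extract_text(sample))
```

Raising an exception instead of returning an error string lets the caller tell a failed request apart from an image that happens to contain the word "Error".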
## Usage in the OCR Skill
The `scripts/example.py` script demonstrates how to call this API for both URLs and local files, handling errors and returning the extracted plain text.
FILE:scripts/example.py
#!/usr/bin/env python3
"""
OCR helper script for the ocr skill.
Uses the free demo API from ocr.space (apikey=helloworld).
Supports image URLs or local image files (jpg, png, bmp, gif, tiff).
Returns the extracted plain text or an error message.
"""
import os
import sys

import requests

API_URL = "https://api.ocr.space/parse/image"
DEMO_KEY = "helloworld"


def perform_ocr(source, language="eng"):
    """Perform OCR on an image.

    Args:
        source (str): URL or local file path to the image.
        language (str): OCR language code (default 'eng').

    Returns:
        str: Extracted text or error description.
    """
    data = {
        "apikey": DEMO_KEY,
        "language": language,
        "isOverlayRequired": "false",
    }
    try:
        if source.startswith(("http://", "https://")):
            # Remote image: pass the URL and let the service fetch it.
            data["url"] = source
            resp = requests.post(API_URL, data=data, timeout=30)
        else:
            if not os.path.isfile(source):
                return f"Error: file not found – {source}"
            # Local image: upload it; the context manager closes the file.
            with open(source, "rb") as fh:
                resp = requests.post(API_URL, data=data, files={"file": fh}, timeout=30)
    except requests.RequestException as exc:
        return f"Error: request failed – {exc}"

    if resp.status_code != 200:
        return f"Error: HTTP {resp.status_code} from OCR service"
    try:
        result = resp.json()
    except ValueError:
        return "Error: OCR service returned invalid JSON"
    if result.get("IsErroredOnProcessing"):
        # Defensive: accept a list (possibly empty) or a plain string.
        message = result.get("ErrorMessage") or "unknown error"
        if isinstance(message, list):
            message = message[0]
        return f"Error: {message}"
    parsed = result.get("ParsedResults") or []
    if not parsed:
        return "Error: No parsed results returned"
    return parsed[0].get("ParsedText", "")


def main():
    if len(sys.argv) < 2:
        print("Usage: example.py <image_path_or_url> [language]")
        sys.exit(1)
    source = sys.argv[1]
    language = sys.argv[2] if len(sys.argv) > 2 else "eng"
    print(perform_ocr(source, language))


if __name__ == "__main__":
    main()
FILE:scripts/ocr.sh
#!/usr/bin/env bash
set -e
# ------------------------------------------------------------
# OCR wrapper - supports image files (jpg, png, ...) and PDF
# ------------------------------------------------------------
usage() {
    echo "Usage: $0 <file>" >&2
    echo "  <file>  image (jpg/png/...) or PDF file" >&2
    exit 1
}
if [ -z "$1" ]; then
    usage
fi
INPUT="$1"
EXTENSION="${INPUT##*.}"  # file extension (text after the last dot)
# ---- Function: OCR for a single image ----
ocr_image() {
    local img="$1"
    # -l eng selects English. Without -l, tesseract also defaults to eng;
    # it does not detect the language automatically.
    tesseract "$img" stdout -l eng || true
}
# ---- PDF processing (if needed) ----
if [[ "$EXTENSION" =~ ^[Pp][Dd][Ff]$ ]]; then
    # Convert each page to a temporary PNG (300 dpi gives good quality).
    # WORKDIR is used instead of TMPDIR so the standard variable is not overwritten.
    WORKDIR=$(mktemp -d)
    trap 'rm -rf "$WORKDIR"' EXIT  # clean up even if a step fails
    # pdftoppm creates files: page-1.png, page-2.png, ...
    pdftoppm -png -r 300 "$INPUT" "$WORKDIR/page"
    # Loop over all generated PNGs and run OCR
    for img in "$WORKDIR"/*.png; do
        echo "--- Page: $(basename "$img") ---"
        ocr_image "$img"
    done
else
    # Image file - run OCR directly
    ocr_image "$INPUT"
fi