@clawhub-jackyun1024-cad784d9f3
---
name: screen-vision
description: macOS screen OCR & click automation via Apple Vision + ScreenCaptureKit. Capture any window or screen region, extract text with coordinates, find text, and click on it — all from the terminal.
homepage: https://github.com/jackyun1024/mac-screen-vision
user-invocable: true
allowed-tools: Bash, Read
argument-hint: "<command> [options] (e.g. list --app Safari, find 'Submit', tap 'OK' --retry 3)"
---
# screen-vision
macOS screen OCR & click automation CLI powered by Apple Vision + ScreenCaptureKit.
## IMPORTANT: Auto-Setup (run first)
Before running any screen-vision command, check if the binary exists. If not, run the setup script:
```bash
command -v screen-vision &>/dev/null || bash "${CLAUDE_SKILL_DIR}/setup.sh"
```
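This installs `screen-vision` (via Homebrew when available, otherwise a pre-built binary on Apple Silicon or a source build on Intel). If Homebrew is present it also installs `cliclick`, which the `tap` command needs in order to click.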
## Requirements
- macOS 14.0+ (Sonoma)
- Screen Recording permission for your terminal app (System Settings > Privacy & Security > Screen Recording)
## Commands
| Command | Description | Output |
|---------|-------------|--------|
| `screen-vision ocr [--app NAME]` | OCR everything in the capture target | JSON array `[{text, x, y, w, h, confidence}]` |
| `screen-vision list [--app NAME]` | OCR, printed as a list | Human-readable text with coordinates |
| `screen-vision find "text" [--app NAME]` | Find text | JSON `{text, x, y, found}` |
| `screen-vision has "text" [--app NAME]` | Check text exists | Exit code 0 (found) / 1 (not found) |
| `screen-vision tap "text" [--app NAME] [--retry N]` | Find + click | JSON `{text, x, y, tapped}` |
| `screen-vision wait "text" [--timeout SEC]` | Poll until text appears | JSON `{text, x, y, found}` |
## Capture Priority
When more than one capture option is given, the leftmost option below wins:
```
--region x,y,w,h > --app "AppName" > full screen (default)
```
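The precedence rule above can be pictured as a small resolver. This is a hypothetical sketch, not the tool's actual argument parser: `resolve_capture_target` is an illustrative name, and it only reports which capture mode would win for a given set of flags.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the capture precedence:
# --region beats --app, which beats the full-screen default.
# resolve_capture_target is an illustrative helper, not part of screen-vision.
resolve_capture_target() {
  local region="" app=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --region) region="${2:-}"; shift 2 || shift ;;  # consume flag and value
      --app)    app="${2:-}";    shift 2 || shift ;;
      *)        shift ;;                              # ignore everything else
    esac
  done
  if [[ -n "$region" ]]; then
    echo "region:$region"
  elif [[ -n "$app" ]]; then
    echo "app:$app"
  else
    echo "screen"
  fi
}

# Both flags given: the region wins regardless of argument order.
resolve_capture_target --app "Safari" --region 0,0,800,600
resolve_capture_target --app "Safari"
resolve_capture_target
```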
## Usage Patterns
### OCR a specific app window
```bash
screen-vision list --app "Safari"
```
### Check if text is visible (for conditionals)
```bash
screen-vision has "Submit" --app "MyApp" && echo "Found" || echo "Not found"
```
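Because `has` reports its result through the exit code, it also fits an `if` statement. That avoids a pitfall of the `cmd && a || b` form, where `b` also runs if `a` itself fails. A minimal sketch: `tap_if_visible` and the `SCREEN_VISION` override variable are hypothetical names introduced for illustration, while the `has` and `tap` invocations follow the command table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: click a label only when it is currently visible.
# SCREEN_VISION is an assumed override so the binary can be swapped out (e.g. in tests).
# Usage: tap_if_visible "TEXT" "APP NAME"
tap_if_visible() {
  local text="$1" app="$2"
  local sv="${SCREEN_VISION:-screen-vision}"
  if "$sv" has "$text" --app "$app"; then
    # Text is on screen: locate it again and click it.
    "$sv" tap "$text" --app "$app"
  else
    echo "'$text' is not visible in $app" >&2
    return 1
  fi
}
```

For example, `tap_if_visible "Submit" "MyApp"` clicks the button when it is on screen and returns 1 otherwise.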
### Click on text with retry
```bash
screen-vision tap "OK" --app "MyApp" --retry 3
```
### Wait for text to appear (e.g. loading complete)
```bash
screen-vision wait "Complete" --timeout 30
```
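`wait` pairs naturally with `tap` when a click should happen only after a screen has finished loading. The sketch below does not rely on the exit status of `wait`, which this document does not specify; it confirms the result with `has` instead. `wait_then_tap` and the `SCREEN_VISION` override variable are hypothetical names used for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for one label, then click another.
# SCREEN_VISION is an assumed override so the binary can be swapped out (e.g. in tests).
# Usage: wait_then_tap "TEXT TO WAIT FOR" "TEXT TO CLICK" [TIMEOUT_SECONDS]
wait_then_tap() {
  local wait_for="$1" tap_text="$2" timeout="${3:-30}"
  local sv="${SCREEN_VISION:-screen-vision}"
  # The exit status of `wait` is not documented here, so ignore it...
  "$sv" wait "$wait_for" --timeout "$timeout" >/dev/null || true
  # ...and confirm with `has`, whose exit code is documented (0 found, 1 not found).
  if "$sv" has "$wait_for"; then
    "$sv" tap "$tap_text" --retry 3
  else
    echo "Timed out waiting for '$wait_for'" >&2
    return 1
  fi
}
```

For example, `wait_then_tap "Complete" "OK" 30` waits up to 30 seconds for "Complete" and then clicks "OK".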
### Full screen OCR as JSON (pipe to jq)
```bash
screen-vision ocr | jq '.[].text'
```
## $ARGUMENTS Handling
Parse the user's request to determine which command to run:
- "화면에 뭐 있어?" / "what's on screen?" → `screen-vision list`
- "~찾아" / "find ~" → `screen-vision find "text"`
- "~클릭해" / "click ~" → `screen-vision tap "text"`
- "~보여?" / "is ~ visible?" → `screen-vision has "text"`
- "~뜰 때까지 기다려" / "wait for ~" → `screen-vision wait "text"`
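The mapping above amounts to a dispatch on the shape of the request. As a rough illustration only (the skill relies on the request being interpreted rather than pattern-matched, and `route_request` is a hypothetical name), an English-only version could look like this:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map an English request to the screen-vision command it implies.
# Prints the command instead of running it, so it can be inspected first.
route_request() {
  local request="$1" target
  case "$request" in
    "what's on screen?")
      echo "screen-vision list" ;;
    "find "*)
      printf 'screen-vision find "%s"\n' "${request#find }" ;;
    "click "*)
      printf 'screen-vision tap "%s"\n' "${request#click }" ;;
    "is "*" visible?")
      target="${request#is }"
      # The trailing "?" in the pattern is a one-character wildcard; it matches the literal "?".
      target="${target% visible?}"
      printf 'screen-vision has "%s"\n' "$target" ;;
    "wait for "*)
      printf 'screen-vision wait "%s"\n' "${request#wait for }" ;;
    *)
      echo "unrecognized request: $request" >&2
      return 1 ;;
  esac
}

route_request "click OK"
route_request "is Submit visible?"
```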
FILE:setup.sh
#!/bin/bash
# screen-vision auto-setup: installs CLI binary + cliclick if missing
set -euo pipefail
ok() { echo " ✓ $1"; }
install_step() { echo " ⟳ $1..."; }
echo "screen-vision setup"
echo "==================="
# 1. screen-vision binary
if command -v screen-vision &>/dev/null; then
  ok "screen-vision $(screen-vision --help 2>&1 | head -1 | grep -oE 'v[0-9.]+' || echo 'installed')"
else
  if command -v brew &>/dev/null; then
    install_step "Installing screen-vision via Homebrew"
    brew install jackyun1024/tap/screen-vision
    ok "screen-vision installed via Homebrew"
  elif [[ "$(uname -m)" == "arm64" ]]; then
    install_step "Downloading pre-built binary (Apple Silicon)"
    # -f makes curl fail on HTTP errors instead of piping an error page into tar
    curl -fsSL https://github.com/jackyun1024/mac-screen-vision/releases/download/v1.0.0/screen-vision-1.0.0-arm64-macos.tar.gz | tar xz -C /usr/local/bin/
    chmod +x /usr/local/bin/screen-vision
    ok "screen-vision installed to /usr/local/bin/"
  else
    install_step "Building screen-vision from source (Intel)"
    # Avoid the name TMPDIR: overwriting it would redirect temp files for every later command
    BUILD_DIR=$(mktemp -d)
    git clone --depth 1 https://github.com/jackyun1024/mac-screen-vision.git "$BUILD_DIR/sv"
    # Build in a subshell so the working directory still exists after BUILD_DIR is removed
    (cd "$BUILD_DIR/sv" && swift build -c release)
    cp "$BUILD_DIR/sv/.build/release/screen-vision" /usr/local/bin/
    rm -rf "$BUILD_DIR"
    ok "screen-vision built and installed to /usr/local/bin/"
  fi
fi
# 2. cliclick (for tap command)
if command -v cliclick &>/dev/null; then
  ok "cliclick installed"
else
  if command -v brew &>/dev/null; then
    install_step "Installing cliclick via Homebrew"
    brew install cliclick
    ok "cliclick installed"
  else
    echo " ⚠ cliclick not found. 'tap' command won't work."
    echo " Install manually: https://github.com/BlueM/cliclick"
  fi
fi
# 3. Screen Recording permission check
echo ""
if screen-vision has "$(hostname -s)" 2>/dev/null || screen-vision list 2>/dev/null | head -1 | grep -q '\['; then
  ok "Screen Recording permission granted"
else
  echo " ⚠ Screen Recording permission may be needed."
  echo " System Settings > Privacy & Security > Screen Recording"
  echo " Add your terminal app (Terminal, iTerm2, Warp, etc.)"
fi
echo ""
echo "Setup complete. Try: screen-vision list"