@clawhub-matthew77-ac7442ae63
Provides image analysis and description from URLs or local files plus real-time web search using MiniMax's Token Plan API.
---
name: minimax-api
description: Enables image understanding and web search via MiniMax's Token Plan API. Use when asked to analyze/describe images, extract information from images, or search the web. Handles both HTTP/HTTPS image URLs and local file paths (absolute paths). Triggers on: "analyze this image", "describe this picture", "what's in this image", "search the web for", "look up", "web search".
---
# Minimax API
## Overview
Provides two capabilities via MiniMax's Token Plan API:
1. **Image understanding** — analyze images via VLM
2. **Web search** — real-time web search
API base URL: `https://api.minimaxi.com`
## Capabilities
### 1. understand_image
Analyzes an image and returns a text description.
**Input:**
- `image_url`: HTTP/HTTPS URL or absolute local file path (e.g., `/home/user/photo.png`, `D:\images\photo.png`)
- `prompt`: What to ask about the image
**Output:** Text description from the VLM.
**Script:** `scripts/minimax_image.py`
**Usage:**
```bash
export MINIMAX_API_KEY="your_api_key"
python3 skills/minimax-api/scripts/minimax_image.py \
--prompt "Describe this image briefly" \
--image-url "https://example.com/photo.jpg"
# Or with local file
python3 skills/minimax-api/scripts/minimax_image.py \
--prompt "Extract text from this image" \
--image-url "/home/user/documents/receipt.png"
```
### 2. web_search
Performs a web search and returns formatted results.
**Input:**
- `query`: Search query string
**Output:** Formatted text listing the organic results (title, date, link, snippet) followed by related searches.
**Script:** `scripts/minimax_search.py`
**Usage:**
```bash
export MINIMAX_API_KEY="your_api_key"
python3 skills/minimax-api/scripts/minimax_search.py \
--query "MiniMax M2.7 release notes"
```
## Setup
**Required:** A MiniMax API key from [platform.minimaxi.com](https://platform.minimaxi.com).
Set it as an environment variable:
```bash
export MINIMAX_API_KEY="your_api_key_here"
```
Add the above line to your `~/.bashrc` (or `.zshrc`) to make it permanent.
Alternatively, pass `--api-key` directly on the command line (not recommended — exposes key in shell history).
## API Reference
See `references/api_spec.md` for full API documentation including request/response schemas, error codes, and headers.
FILE:references/api_spec.md
# MiniMax Token Plan API Specification
## Overview
MiniMax provides two API endpoints accessible via the Token Plan subscription:
- **VLM API** — image understanding (vision)
- **Search API** — web search
Base URL: `https://api.minimaxi.com`
## Authentication
All requests require:
```
Authorization: Bearer {API_KEY}
MM-API-Source: minimax-coding-plan-mcp
Content-Type: application/json
```
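As a rough illustration, the headers above might be attached to a request as follows. The helper name `build_request` is hypothetical; the base URL and the `/v1/coding_plan/search` path are the ones given in this specification. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.minimaxi.com"


def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the required headers.

    `build_request` is a hypothetical helper name used for illustration only.
    """
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "MM-API-Source": "minimax-coding-plan-mcp",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("test-key", "/v1/coding_plan/search", {"q": "hello"})
print(req.get_method(), req.full_url)  # POST https://api.minimaxi.com/v1/coding_plan/search
```

Passing the result to `urllib.request.urlopen` would perform the call, which is how the bundled scripts send their requests.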
## Endpoints
### 1. Image Understanding (VLM)
**Endpoint:** `POST /v1/coding_plan/vlm`
**Request Body:**
```json
{
"prompt": "string - The question/instruction about the image",
"image_url": "string - Image as data URL (data:image/{format};base64,{data})"
}
```
**Supported Image Formats:** JPEG, PNG, WebP
**Image URL Formats:**
- `data:image/jpeg;base64,{base64_data}` — base64 encoded image data
- `data:image/png;base64,{base64_data}` — base64 encoded PNG
- `data:image/webp;base64,{base64_data}` — base64 encoded WebP
- `https://example.com/image.jpg` — HTTP/HTTPS URL (only works via MCP server which auto-converts)
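The data URL forms listed above can be produced with the standard library alone. The sketch below is illustrative: `to_data_url` is a hypothetical helper, and restricting `fmt` to the three supported formats is an assumption drawn from the list above. `scripts/minimax_image.py` performs an equivalent conversion after downloading a URL or reading a local file.

```python
import base64

SUPPORTED_FORMATS = {"jpeg", "png", "webp"}


def to_data_url(image_bytes: bytes, fmt: str = "jpeg") -> str:
    """Encode raw image bytes as data:image/{fmt};base64,{data}."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {fmt}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{fmt};base64,{encoded}"


print(to_data_url(b"abc", "png"))  # data:image/png;base64,YWJj
```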
**Response:**
```json
{
"content": "string - The VLM's text response",
"base_resp": {
"status_code": 0,
"status_msg": "string"
}
}
```
**Error Codes:**
- `0` — Success
- `1004` — Authentication error (invalid API key)
- `2013` — Invalid parameters (e.g., invalid image URL)
- `2038` — Real-name verification required
### 2. Web Search
**Endpoint:** `POST /v1/coding_plan/search`
**Request Body:**
```json
{
"q": "string - Search query"
}
```
**Response:**
```json
{
"organic": [
{
"title": "string - Result title",
"link": "string - Result URL",
"snippet": "string - Result description",
"date": "string - Publication date (if available)"
}
],
"related_searches": [
{
"query": "string - Related search query"
}
],
"base_resp": {
"status_code": 0,
"status_msg": "string"
}
}
```
## Error Handling
All endpoints return a `base_resp` object:
| status_code | Meaning |
|-------------|---------|
| 0 | Success |
| 1004 | Auth error — check API key |
| 2013 | Invalid parameters |
| 2038 | Real-name verification needed |
On error, the tool exits with code 1 and prints the error message to stderr.
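A caller that consumes the JSON directly could apply the same check before using the response. This is a minimal sketch: `check_base_resp` and `MiniMaxAPIError` are hypothetical names, and the code-to-meaning mapping is copied from the table above.

```python
STATUS_MEANINGS = {
    1004: "Auth error, check API key",
    2013: "Invalid parameters",
    2038: "Real-name verification needed",
}


class MiniMaxAPIError(RuntimeError):
    """Hypothetical exception type for non-zero status codes."""


def check_base_resp(result: dict) -> dict:
    """Return the response unchanged, or raise if base_resp reports an error."""
    base_resp = result.get("base_resp", {})
    code = base_resp.get("status_code", 0)
    if code != 0:
        meaning = STATUS_MEANINGS.get(code, "Unknown error")
        status_msg = base_resp.get("status_msg", "")
        raise MiniMaxAPIError(f"{code}: {meaning} ({status_msg})")
    return result


ok = check_base_resp({"content": "a cat", "base_resp": {"status_code": 0, "status_msg": "success"}})
print(ok["content"])  # a cat
```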
FILE:scripts/minimax_image.py
#!/usr/bin/env python3
"""
Minimax VLM (Vision Language Model) API client.
Handles image understanding via MiniMax Token Plan API.
Usage:
python3 minimax_image.py --api-key KEY --prompt PROMPT --image-url URL_OR_PATH
"""
import argparse
import base64
import json
import os
import sys
import urllib.request
import urllib.error

def download_image_as_base64(url: str) -> str:
    """Download HTTP/HTTPS image and return base64 data URL."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            image_data = response.read()
            content_type = response.headers.get('Content-Type', 'image/jpeg').lower()
            # Detect format from the Content-Type header
            if 'png' in content_type:
                fmt = 'png'
            elif 'webp' in content_type:
                fmt = 'webp'
            elif 'jpeg' in content_type or 'jpg' in content_type:
                fmt = 'jpeg'
            else:
                fmt = 'jpeg'  # default
            b64_data = base64.b64encode(image_data).decode('utf-8')
            return f"data:image/{fmt};base64,{b64_data}"
    except urllib.error.URLError as e:
        print(f"Error downloading image: {e}", file=sys.stderr)
        sys.exit(1)

def local_file_to_base64(path: str) -> str:
    """Read local image file and return base64 data URL."""
    # Paths are used as given (including Windows paths like D:\...)
    if not os.path.exists(path):
        print(f"Error: File not found: {path}", file=sys.stderr)
        sys.exit(1)
    # Detect format from extension
    lower_path = path.lower()
    if lower_path.endswith('.png'):
        fmt = 'png'
    elif lower_path.endswith('.webp'):
        fmt = 'webp'
    elif lower_path.endswith('.jpg') or lower_path.endswith('.jpeg'):
        fmt = 'jpeg'
    else:
        fmt = 'jpeg'  # default
    try:
        with open(path, 'rb') as f:
            image_data = f.read()
        b64_data = base64.b64encode(image_data).decode('utf-8')
        return f"data:image/{fmt};base64,{b64_data}"
    except IOError as e:
        print(f"Error reading file: {e}", file=sys.stderr)
        sys.exit(1)

def process_image_url(image_url: str) -> str:
    """
    Process image input and convert to base64 data URL.
    Handles:
    - HTTP/HTTPS URLs: downloads and converts to base64
    - Local file paths: reads file and converts to base64
    - Base64 data URLs: passes through as-is
    """
    if image_url.startswith('data:'):
        # Already a data URL
        return image_url
    elif image_url.startswith(('http://', 'https://')):
        return download_image_as_base64(image_url)
    else:
        # Local file path
        return local_file_to_base64(image_url)

def call_vlm_api(api_key: str, prompt: str, image_url: str) -> dict:
    """Call MiniMax VLM API."""
    # Process image (convert to base64 if needed)
    processed_image_url = process_image_url(image_url)
    url = "https://api.minimaxi.com/v1/coding_plan/vlm"
    payload = {
        "prompt": prompt,
        "image_url": processed_image_url
    }
    data = json.dumps(payload).encode('utf-8')
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json',
            'MM-API-Source': 'minimax-coding-plan-mcp'
        },
        method='POST'
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as response:
            result = json.loads(response.read().decode('utf-8'))
            return result
    except urllib.error.HTTPError as e:
        error_body = e.read().decode('utf-8') if e.fp else ''
        print(f"API Error {e.code}: {error_body}", file=sys.stderr)
        sys.exit(1)
    except urllib.error.URLError as e:
        print(f"Request Error: {e}", file=sys.stderr)
        sys.exit(1)

def main():
    parser = argparse.ArgumentParser(description='Minimax VLM Image Understanding')
    # Fall back to the MINIMAX_API_KEY environment variable, as the skill docs describe
    parser.add_argument('--api-key', default=os.environ.get('MINIMAX_API_KEY'),
                        help='Minimax API key (defaults to $MINIMAX_API_KEY)')
    parser.add_argument('--prompt', required=True, help='Prompt/question about the image')
    parser.add_argument('--image-url', required=True, help='Image URL or local file path')
    args = parser.parse_args()
    if not args.api_key:
        parser.error('API key required: set MINIMAX_API_KEY or pass --api-key')
    result = call_vlm_api(args.api_key, args.prompt, args.image_url)
    # Check for API errors
    base_resp = result.get('base_resp', {})
    if base_resp.get('status_code', 0) != 0:
        status_msg = base_resp.get('status_msg', 'Unknown error')
        print(f"API Error: {status_msg}", file=sys.stderr)
        sys.exit(1)
    # Output the content
    content = result.get('content', '')
    if content:
        print(content)
    else:
        print("No content returned from API", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
FILE:scripts/minimax_search.py
#!/usr/bin/env python3
"""
Minimax Web Search API client.
Performs web searches via MiniMax Token Plan API.
Usage:
python3 minimax_search.py --api-key KEY --query "search query"
Example:
python3 minimax_search.py --api-key KEY --query "MiniMax M2.7 release notes"
"""
import argparse
import json
import os
import sys
import urllib.request
import urllib.error


def call_search_api(api_key: str, query: str) -> dict:
    """Call MiniMax Search API."""
    url = "https://api.minimaxi.com/v1/coding_plan/search"
    payload = {
        "q": query
    }
    data = json.dumps(payload).encode('utf-8')
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json',
            'MM-API-Source': 'minimax-coding-plan-mcp'
        },
        method='POST'
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as response:
            result = json.loads(response.read().decode('utf-8'))
            return result
    except urllib.error.HTTPError as e:
        error_body = e.read().decode('utf-8') if e.fp else ''
        print(f"API Error {e.code}: {error_body}", file=sys.stderr)
        sys.exit(1)
    except urllib.error.URLError as e:
        print(f"Request Error: {e}", file=sys.stderr)
        sys.exit(1)


def format_results(result: dict) -> str:
    """Format search results as readable text."""
    lines = []
    base_resp = result.get('base_resp', {})
    if base_resp.get('status_code', 0) != 0:
        return f"API Error: {base_resp.get('status_msg', 'Unknown error')}"
    organic = result.get('organic', [])
    if organic:
        lines.append("=== Search Results ===")
        for i, item in enumerate(organic, 1):
            title = item.get('title', 'No title')
            link = item.get('link', '')
            snippet = item.get('snippet', '')
            date = item.get('date', '')
            lines.append(f"\n[{i}] {title}")
            if date:
                lines.append(f" Date: {date}")
            lines.append(f" Link: {link}")
            if snippet:
                lines.append(f" {snippet}")
    related = result.get('related_searches', [])
    if related:
        lines.append("\n=== Related Searches ===")
        for item in related:
            query = item.get('query', '')
            if query:
                lines.append(f" - {query}")
    return '\n'.join(lines) if lines else "No results found"


def main():
    parser = argparse.ArgumentParser(description='Minimax Web Search')
    # Fall back to the MINIMAX_API_KEY environment variable, as the skill docs describe
    parser.add_argument('--api-key', default=os.environ.get('MINIMAX_API_KEY'),
                        help='Minimax API key (defaults to $MINIMAX_API_KEY)')
    parser.add_argument('--query', required=True, help='Search query')
    args = parser.parse_args()
    if not args.api_key:
        parser.error('API key required: set MINIMAX_API_KEY or pass --api-key')
    result = call_search_api(args.api_key, args.query)
    # Report API-level errors on stderr with exit code 1, matching the API spec
    base_resp = result.get('base_resp', {})
    if base_resp.get('status_code', 0) != 0:
        print(f"API Error: {base_resp.get('status_msg', 'Unknown error')}", file=sys.stderr)
        sys.exit(1)
    print(format_results(result))


if __name__ == '__main__':
    main()
Web search using Tavily's LLM-optimized API. Returns relevant results with content snippets, scores, and metadata.
---
name: tavily-search
description: Web search using Tavily's LLM-optimized API. Returns relevant results with content snippets, scores, and metadata.
homepage: https://tavily.com
metadata: {"openclaw":{"emoji":"🔍","requires":{"bins":["node"],"env":["TAVILY_API_KEY"]},"primaryEnv":"TAVILY_API_KEY"}}
---
# Tavily Search
Search the web and get relevant results optimized for LLM consumption.
## Authentication
Get your API key at https://tavily.com and add to your OpenClaw config:
```json
{
"skills": {
"entries": {
"tavily-search": {
"enabled": true,
"apiKey": "tvly-YOUR_API_KEY_HERE"
}
}
}
}
```
Or set the environment variable:
```bash
export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
```
## Quick Start
### Using the Script
```bash
node {baseDir}/scripts/search.mjs "query"
node {baseDir}/scripts/search.mjs "query" -n 10
node {baseDir}/scripts/search.mjs "query" --depth advanced
node {baseDir}/scripts/search.mjs "query" --topic news
```
### Examples
```bash
# Basic search
node {baseDir}/scripts/search.mjs "python async patterns"
# With more results
node {baseDir}/scripts/search.mjs "React hooks tutorial" -n 10
# Advanced search
node {baseDir}/scripts/search.mjs "machine learning" --depth advanced
# News search
node {baseDir}/scripts/search.mjs "AI news" --topic news
# Domain-filtered search
node {baseDir}/scripts/search.mjs "Python docs" --include-domains docs.python.org
```
## Options
| Option | Description | Default |
|--------|-------------|---------|
| `-n <count>` | Number of results (1-20) | 10 |
| `--depth <mode>` | Search depth: `ultra-fast`, `fast`, `basic`, `advanced` | `basic` |
| `--topic <topic>` | Topic: `general` or `news` | `general` |
| `--time-range <range>` | Time range: `day`, `week`, `month`, `year` | - |
| `--include-domains <domains>` | Comma-separated domains to include | - |
| `--exclude-domains <domains>` | Comma-separated domains to exclude | - |
| `--raw-content` | Include full page content | false |
| `--json` | Output raw JSON | false |
## Search Depth
| Depth | Latency | Relevance | Use Case |
|-------|---------|-----------|----------|
| `ultra-fast` | Lowest | Lower | Real-time chat, autocomplete |
| `fast` | Low | Good | Need chunks but latency matters |
| `basic` | Medium | High | General-purpose, balanced |
| `advanced` | Higher | Highest | Precision matters, research |
## Tips
- **Keep queries under 400 characters** - Think search query, not prompt
- **Break complex queries into sub-queries** - Better results than one massive query
- **Use `--include-domains`** to focus on trusted sources
- **Use `--time-range`** for recent information
- **Filter by `score`** (0-1) to get highest relevance results
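
The score filter in the last tip could be applied to the `--json` output along these lines. This is a sketch, not part of the skill: `filter_by_score` is a hypothetical helper, and it assumes the JSON has a top-level `results` list whose items carry a numeric `score`, which is how `scripts/search.mjs` reads the response.

```python
def filter_by_score(data: dict, min_score: float = 0.5) -> list:
    """Keep results scoring at least min_score, highest first.

    Results without a score are treated as 0 (an assumption of this sketch).
    """
    results = data.get("results", [])
    kept = [r for r in results if r.get("score", 0) >= min_score]
    return sorted(kept, key=lambda r: r.get("score", 0), reverse=True)


sample = {
    "results": [
        {"title": "Weak match", "url": "https://example.com/a", "score": 0.31},
        {"title": "Strong match", "url": "https://example.com/b", "score": 0.92},
        {"title": "Decent match", "url": "https://example.com/c", "score": 0.64},
    ]
}
for r in filter_by_score(sample, 0.5):
    print(r["score"], r["title"])
# 0.92 Strong match
# 0.64 Decent match
```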
FILE:scripts/search.mjs
#!/usr/bin/env node
function usage() {
console.error(`Usage: search.mjs "query" [options]
Options:
-n <count> Number of results (1-20, default: 10)
--depth <mode> Search depth: ultra-fast, fast, basic, advanced (default: basic)
--topic <topic> Topic: general or news (default: general)
--time-range <range> Time range: day, week, month, year
--include-domains <list> Comma-separated domains to include
--exclude-domains <list> Comma-separated domains to exclude
--raw-content Include full page content
--json Output raw JSON
Examples:
search.mjs "python async patterns"
search.mjs "React hooks tutorial" -n 10
search.mjs "AI news" --topic news --time-range week
search.mjs "Python docs" --include-domains docs.python.org,realpython.com`);
process.exit(2);
}
const args = process.argv.slice(2);
if (args.length === 0 || args[0] === "-h" || args[0] === "--help") usage();
const query = args[0];
let maxResults = 10;
let searchDepth = "basic";
let topic = "general";
let timeRange = null;
let includeDomains = [];
let excludeDomains = [];
let includeRawContent = false;
let outputJson = false;
for (let i = 1; i < args.length; i++) {
const a = args[i];
if (a === "-n") {
maxResults = Number.parseInt(args[i + 1] ?? "10", 10);
i++;
continue;
}
if (a === "--depth") {
searchDepth = args[i + 1] ?? "basic";
i++;
continue;
}
if (a === "--topic") {
topic = args[i + 1] ?? "general";
i++;
continue;
}
if (a === "--time-range") {
timeRange = args[i + 1];
i++;
continue;
}
if (a === "--include-domains") {
includeDomains = (args[i + 1] ?? "").split(",").map(d => d.trim()).filter(Boolean);
i++;
continue;
}
if (a === "--exclude-domains") {
excludeDomains = (args[i + 1] ?? "").split(",").map(d => d.trim()).filter(Boolean);
i++;
continue;
}
if (a === "--raw-content") {
includeRawContent = true;
continue;
}
if (a === "--json") {
outputJson = true;
continue;
}
console.error(`Unknown arg: ${a}`);
usage();
}
const apiKey = (process.env.TAVILY_API_KEY ?? "").trim();
if (!apiKey) {
console.error("Error: TAVILY_API_KEY not set");
console.error("Get your API key at https://tavily.com");
process.exit(1);
}
const body = {
query: query,
max_results: Math.max(1, Math.min(maxResults, 20)),
search_depth: searchDepth,
topic: topic,
include_raw_content: includeRawContent,
};
if (timeRange) body.time_range = timeRange;
if (includeDomains.length > 0) body.include_domains = includeDomains;
if (excludeDomains.length > 0) body.exclude_domains = excludeDomains;
const resp = await fetch("https://api.tavily.com/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
  },
  body: JSON.stringify(body),
});
if (!resp.ok) {
  const text = await resp.text().catch(() => "");
  throw new Error(`Tavily Search failed (${resp.status}): ${text}`);
}
const data = await resp.json();
if (outputJson) {
console.log(JSON.stringify(data, null, 2));
process.exit(0);
}
// Print AI answer if available
if (data.answer) {
console.log("## Answer\n");
console.log(data.answer);
console.log("\n---\n");
}
// Print results
const results = (data.results ?? []).slice(0, maxResults);
console.log(`## Sources (${results.length} results)\n`);
for (const r of results) {
  const title = String(r?.title ?? "").trim();
  const url = String(r?.url ?? "").trim();
  const content = String(r?.content ?? "").trim();
  const score = r?.score ? ` (relevance: ${(r.score * 100).toFixed(0)}%)` : "";
  if (!title || !url) continue;
  console.log(`- **${title}**${score}`);
  console.log(` ${url}`);
  if (content) {
    // Show at most 300 characters of the snippet, with an ellipsis when truncated
    console.log(` ${content.slice(0, 300)}${content.length > 300 ? "..." : ""}`);
  }
  console.log();
}
if (data.response_time) {
  console.log(`\nResponse time: ${data.response_time}s`);
}
Comprehensive research grounded in web data with explicit citations. Use when you need multi-source synthesis—comparisons, current events, market analysis, d...
---
name: tavily-research
description: Comprehensive research grounded in web data with explicit citations. Use when you need multi-source synthesis—comparisons, current events, market analysis, detailed reports.
homepage: https://tavily.com
metadata: {"openclaw":{"emoji":"📊","requires":{"bins":["node"],"env":["TAVILY_API_KEY"]},"primaryEnv":"TAVILY_API_KEY"}}
---
# Tavily Research
Conduct comprehensive research on any topic with automatic source gathering, analysis, and response generation with citations.
## Authentication
Get your API key at https://tavily.com and add to your OpenClaw config:
```json
{
"skills": {
"entries": {
"tavily-research": {
"enabled": true,
"apiKey": "tvly-YOUR_API_KEY_HERE"
}
}
}
}
```
Or set the environment variable:
```bash
export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
```
## Quick Start
### Using the Script
```bash
node {baseDir}/scripts/research.mjs "query"
node {baseDir}/scripts/research.mjs "query" --model pro
node {baseDir}/scripts/research.mjs "query" --output report.md
```
### Examples
```bash
# Quick overview
node {baseDir}/scripts/research.mjs "What is retrieval augmented generation?"
# Comprehensive analysis
node {baseDir}/scripts/research.mjs "LangGraph vs CrewAI for multi-agent systems" --model pro
# Market research with output file
node {baseDir}/scripts/research.mjs "Fintech startup landscape 2025" --model pro --output fintech-report.md
# Technical comparison
node {baseDir}/scripts/research.mjs "React vs Vue vs Svelte" --model pro
```
## Options
| Option | Description | Default |
|--------|-------------|---------|
| `--model <model>` | Model: `mini`, `pro`, `auto` | `mini` |
| `--output <file>` | Save report to file | - |
| `--json` | Output raw JSON | false |
## Model Selection
**Rule of thumb**: "what does X do?" → mini. "X vs Y vs Z" or "best way to..." → pro.
| Model | Use Case | Speed |
|-------|----------|-------|
| `mini` | Single topic, targeted research | ~30s |
| `pro` | Comprehensive multi-angle analysis | ~60-120s |
| `auto` | API chooses based on complexity | Varies |
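
For reference, the way `scripts/research.mjs` turns the model choice into request parameters can be summarized as below. The Python function name `research_params` is hypothetical; the values mirror the script, where any model other than `mini` keeps the script's defaults of `advanced` depth and 10 results.

```python
def research_params(model: str = "mini") -> dict:
    """Mirror the model-to-parameter mapping used by scripts/research.mjs."""
    # Defaults in the script's request body (kept for "pro", "auto", and anything else)
    params = {"search_depth": "advanced", "max_results": 10}
    if model == "mini":
        params = {"search_depth": "basic", "max_results": 5}
    return params


print(research_params("mini"))  # {'search_depth': 'basic', 'max_results': 5}
print(research_params("pro"))   # {'search_depth': 'advanced', 'max_results': 10}
```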
## Output Format
The research includes:
- **AI-generated answer**: Comprehensive synthesis
- **Sources**: Citations with titles, URLs, and relevance scores
- **Metadata**: Query, response time, and statistics
## Tips
- Research can take 30-120 seconds depending on complexity
- Use `--model pro` for comparisons, market analysis, or detailed reports
- Use `--output` to save reports for later reference
- The `auto` model lets Tavily choose based on query complexity
FILE:scripts/research.mjs
#!/usr/bin/env node
function usage() {
console.error(`Usage: research.mjs "query" [options]
Options:
--model <model> Model: mini, pro, auto (default: mini)
--output <file> Save report to file
--json Output raw JSON
Examples:
research.mjs "What is retrieval augmented generation?"
research.mjs "LangGraph vs CrewAI" --model pro
research.mjs "Fintech landscape 2025" --model pro --output report.md`);
process.exit(2);
}
const args = process.argv.slice(2);
if (args.length === 0 || args[0] === "-h" || args[0] === "--help") usage();
const query = args[0];
let model = "mini";
let outputFile = null;
let outputJson = false;
for (let i = 1; i < args.length; i++) {
const a = args[i];
if (a === "--model") {
model = args[i + 1] ?? "mini";
i++;
continue;
}
if (a === "--output") {
outputFile = args[i + 1];
i++;
continue;
}
if (a === "--json") {
outputJson = true;
continue;
}
console.error(`Unknown arg: ${a}`);
usage();
}
const apiKey = (process.env.TAVILY_API_KEY ?? "").trim();
if (!apiKey) {
console.error("Error: TAVILY_API_KEY not set");
console.error("Get your API key at https://tavily.com");
process.exit(1);
}
const body = {
query: query,
search_depth: "advanced",
include_answer: true,
include_raw_content: false,
max_results: 10,
topic: "general",
};
// Map model to search_depth
if (model === "mini") {
body.search_depth = "basic";
body.max_results = 5;
} else if (model === "pro") {
body.search_depth = "advanced";
body.max_results = 10;
}
const resp = await fetch("https://api.tavily.com/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
  },
  body: JSON.stringify(body),
});
if (!resp.ok) {
  const text = await resp.text().catch(() => "");
  throw new Error(`Tavily Research failed (${resp.status}): ${text}`);
}
const data = await resp.json();
if (outputJson) {
console.log(JSON.stringify(data, null, 2));
process.exit(0);
}
// Build report
let report = "";
report += `# Research Report: ${query}\n\n`;
if (data.answer) {
  report += `## Summary\n\n${data.answer}\n\n`;
  report += `---\n\n`;
}
// Sources
const results = data.results ?? [];
report += `## Sources (${results.length})\n\n`;
for (const r of results) {
  const title = String(r?.title ?? "").trim();
  const url = String(r?.url ?? "").trim();
  const content = String(r?.content ?? "").trim();
  const score = r?.score ? ` (relevance: ${(r.score * 100).toFixed(0)}%)` : "";
  if (!title || !url) continue;
  report += `### ${title}${score}\n\n`;
  report += `${url}\n\n`;
  if (content) {
    report += `${content}\n\n`;
  }
}
if (data.response_time) {
  report += `\n---\n\nResponse time: ${data.response_time}s\n`;
}
// Output
if (outputFile) {
const fs = await import("fs");
fs.writeFileSync(outputFile, report, "utf-8");
console.log(`Report saved to ${outputFile}`);
} else {
console.log(report);
}
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages.
---
name: tavily-extract
description: Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages.
homepage: https://tavily.com
metadata: {"openclaw":{"emoji":"📄","requires":{"bins":["node"],"env":["TAVILY_API_KEY"]},"primaryEnv":"TAVILY_API_KEY"}}
---
# Tavily Extract
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
## Authentication
Get your API key at https://tavily.com and add to your OpenClaw config:
```json
{
"skills": {
"entries": {
"tavily-extract": {
"enabled": true,
"apiKey": "tvly-YOUR_API_KEY_HERE"
}
}
}
}
```
Or set the environment variable:
```bash
export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
```
## Quick Start
### Using the Script
```bash
node {baseDir}/scripts/extract.mjs "https://example.com/article"
node {baseDir}/scripts/extract.mjs "url1,url2,url3"
node {baseDir}/scripts/extract.mjs "url" --query "authentication API"
```
### Examples
```bash
# Single URL
node {baseDir}/scripts/extract.mjs "https://docs.python.org/3/tutorial/classes.html"
# Multiple URLs
node {baseDir}/scripts/extract.mjs "https://example.com/page1,https://example.com/page2"
# With query focus
node {baseDir}/scripts/extract.mjs "https://example.com/docs" --query "authentication API"
# Advanced extraction for JS pages
node {baseDir}/scripts/extract.mjs "https://app.example.com" --depth advanced --timeout 60
```
## Options
| Option | Description | Default |
|--------|-------------|---------|
| `--query <text>` | Rerank chunks by relevance | - |
| `--chunks <n>` | Chunks per URL (1-5, requires query) | 3 |
| `--depth <mode>` | Extract depth: `basic` or `advanced` | `basic` |
| `--format <fmt>` | Output format: `markdown` or `text` | `markdown` |
| `--timeout <sec>` | Max wait time (1-60 seconds) | varies |
| `--json` | Output raw JSON | false |
## Extract Depth
| Depth | When to Use |
|-------|-------------|
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |
## Tips
- **Max 20 URLs per request** - batch larger lists
- **Use `--query` + `--chunks`** to get only relevant content
- **Try `basic` first**, fall back to `advanced` if content is missing
- **Set longer `--timeout`** for slow pages (up to 60s)
- **Check `failed_results`** in JSON output for URLs that couldn't be extracted
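
Batching a longer URL list to respect the 20-URL cap might look like the sketch below. `batch_urls` is a hypothetical helper; each returned string is a comma-separated list in the form `extract.mjs` takes as its first argument.

```python
def batch_urls(urls: list, batch_size: int = 20) -> list:
    """Split urls into comma-separated groups of at most batch_size each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        ",".join(urls[i:i + batch_size])
        for i in range(0, len(urls), batch_size)
    ]


urls = [f"https://example.com/page{n}" for n in range(45)]
batches = batch_urls(urls)
print(len(batches))  # 3
print([len(b.split(",")) for b in batches])  # [20, 20, 5]
```

Each batch would then be passed to a separate `extract.mjs` invocation.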
FILE:scripts/extract.mjs
#!/usr/bin/env node
function usage() {
console.error(`Usage: extract.mjs "url1,url2,..." [options]
URLs are comma-separated (max 20).
Options:
--query <text> Rerank chunks by relevance
--chunks <n> Chunks per URL (1-5, requires query)
--depth <mode> Extract depth: basic or advanced (default: basic)
--format <fmt> Output format: markdown or text (default: markdown)
--timeout <sec> Max wait time (1-60 seconds)
--json Output raw JSON
Examples:
extract.mjs "https://docs.python.org/3/tutorial/classes.html"
extract.mjs "https://example.com/page1,https://example.com/page2"
extract.mjs "https://example.com/docs" --query "authentication API"
extract.mjs "https://app.example.com" --depth advanced --timeout 60`);
process.exit(2);
}
const args = process.argv.slice(2);
if (args.length === 0 || args[0] === "-h" || args[0] === "--help") usage();
const urlsInput = args[0];
let query = null;
let chunksPerSource = 3;
let extractDepth = "basic";
let format = "markdown";
let timeout = null;
let outputJson = false;
for (let i = 1; i < args.length; i++) {
const a = args[i];
if (a === "--query") {
query = args[i + 1];
i++;
continue;
}
if (a === "--chunks") {
chunksPerSource = Number.parseInt(args[i + 1] ?? "3", 10);
i++;
continue;
}
if (a === "--depth") {
extractDepth = args[i + 1] ?? "basic";
i++;
continue;
}
if (a === "--format") {
format = args[i + 1] ?? "markdown";
i++;
continue;
}
if (a === "--timeout") {
timeout = Number.parseFloat(args[i + 1]);
i++;
continue;
}
if (a === "--json") {
outputJson = true;
continue;
}
console.error(`Unknown arg: ${a}`);
usage();
}
const urls = urlsInput.split(",").map(u => u.trim()).filter(Boolean);
if (urls.length === 0) {
console.error("Error: No URLs provided");
process.exit(1);
}
if (urls.length > 20) {
console.error("Error: Max 20 URLs per request");
process.exit(1);
}
const apiKey = (process.env.TAVILY_API_KEY ?? "").trim();
if (!apiKey) {
console.error("Error: TAVILY_API_KEY not set");
console.error("Get your API key at https://tavily.com");
process.exit(1);
}
const body = {
urls: urls,
extract_depth: extractDepth,
format: format,
};
if (query) body.query = query;
if (query && chunksPerSource) body.chunks_per_source = chunksPerSource;
if (timeout) body.timeout = timeout;
const resp = await fetch("https://api.tavily.com/extract", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
  },
  body: JSON.stringify(body),
});
if (!resp.ok) {
  const text = await resp.text().catch(() => "");
  throw new Error(`Tavily Extract failed (${resp.status}): ${text}`);
}
const data = await resp.json();
if (outputJson) {
console.log(JSON.stringify(data, null, 2));
process.exit(0);
}
// Print results
const results = data.results ?? [];
console.log(`## Extracted Content (${results.length} URLs)\n`);
for (const r of results) {
  const url = String(r?.url ?? "").trim();
  const content = String(r?.raw_content ?? "").trim();
  if (!url) continue;
  console.log(`### ${url}\n`);
  if (content) {
    console.log(content);
  }
  console.log("\n---\n");
}
// Print failed results
const failed = data.failed_results ?? [];
if (failed.length > 0) {
  console.log(`## Failed (${failed.length})\n`);
  for (const f of failed) {
    console.log(`- ${f.url}: ${f.error ?? "Unknown error"}`);
  }
}
if (data.response_time) {
  console.log(`\nResponse time: ${data.response_time}s`);
}
Crawl any website and save pages as local markdown files. Ideal for downloading documentation, knowledge bases, or web content for offline access or analysis.
---
name: tavily-crawl
description: Crawl any website and save pages as local markdown files. Ideal for downloading documentation, knowledge bases, or web content for offline access or analysis.
homepage: https://tavily.com
metadata: {"openclaw":{"emoji":"🕷️","requires":{"bins":["node"],"env":["TAVILY_API_KEY"]},"primaryEnv":"TAVILY_API_KEY"}}
---
# Tavily Crawl
Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.
## Authentication
Get your API key at https://tavily.com and add to your OpenClaw config:
```json
{
"skills": {
"entries": {
"tavily-crawl": {
"enabled": true,
"apiKey": "tvly-YOUR_API_KEY_HERE"
}
}
}
}
```
Or set the environment variable:
```bash
export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
```
## Quick Start
### Using the Script
```bash
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50
```
### Examples
```bash
# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50
# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs
# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 \
--select "/docs/.*" --exclude "/blog/.*"
# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
--instructions "Find API documentation" --chunks 3
```
## Options
| Option | Description | Default |
|--------|-------------|---------|
| `--depth <n>` | Crawl depth (1-5) | 1 |
| `--breadth <n>` | Links per page | 20 |
| `--limit <n>` | Total pages cap | 50 |
| `--output <dir>` | Save pages to directory | - |
| `--instructions <text>` | Natural language guidance | - |
| `--chunks <n>` | Chunks per page (1-5, requires instructions) | - |
| `--depth-mode <mode>` | Extract depth: `basic` or `advanced` | `basic` |
| `--select <pattern>` | Regex pattern to include | - |
| `--exclude <pattern>` | Regex pattern to exclude | - |
| `--timeout <sec>` | Max wait time (10-150 seconds) | 150 |
| `--json` | Output raw JSON | false |
## Depth vs Performance
| Depth | Typical Pages | Time |
|-------|---------------|------|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5000 | Many minutes |
**Start with `--depth 1`** and increase only if needed.
## Crawl for Context vs Data Collection
**For agentic use (feeding results into context):** Always use `--instructions` + `--chunks`. This returns only relevant chunks instead of full pages, preventing context window explosion.
**For data collection (saving to files):** Omit `--chunks` to get full page content.
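
The two modes differ only in which flags are passed. The sketch below builds the argument list for either mode; `crawl_args` is a hypothetical helper, the flags are the ones from the options table above, and the conservative defaults (`--depth 1`, `--limit 20`) follow the tips in this document.

```python
def crawl_args(url, *, instructions=None, chunks=None, output_dir=None,
               depth=1, limit=20):
    """Build the argument list for crawl.mjs (everything after the script path)."""
    args = [url, "--depth", str(depth), "--limit", str(limit)]
    if chunks is not None and not instructions:
        # The options table states that --chunks requires --instructions
        raise ValueError("--chunks requires --instructions")
    if instructions:
        args += ["--instructions", instructions]
        if chunks is not None:
            args += ["--chunks", str(chunks)]
    if output_dir:
        args += ["--output", output_dir]
    return args


# Agentic use: relevant chunks only
print(crawl_args("https://docs.example.com", instructions="Find API documentation", chunks=3))
# Data collection: full pages saved to a directory
print(crawl_args("https://docs.example.com", output_dir="./docs", depth=2, limit=50))
```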
## Tips
- **Always use `--chunks` for agentic workflows** - prevents context explosion when feeding results to LLMs
- **Omit `--chunks` only for data collection** - when saving full pages to files
- **Start conservative** (`--depth 1`, `--limit 20`) and scale up
- **Use path patterns** to focus on relevant sections
- **Always set a `--limit`** to prevent runaway crawls
FILE:scripts/crawl.mjs
#!/usr/bin/env node
import fs from "fs";
import path from "path";
function usage() {
console.error(`Usage: crawl.mjs "url" [options]
Options:
--depth <n> Crawl depth (1-5, default: 1)
--breadth <n> Links per page (default: 20)
--limit <n> Total pages cap (default: 50)
--output <dir> Save pages to directory
--instructions <text> Natural language guidance
--chunks <n> Chunks per page (1-5, requires instructions)
--depth-mode <mode> Extract depth: basic or advanced (default: basic)
--select <pattern> Regex pattern to include
--exclude <pattern> Regex pattern to exclude
--timeout <sec> Max wait time (10-150 seconds, default: 150)
--json Output raw JSON
Examples:
crawl.mjs "https://docs.example.com"
crawl.mjs "https://docs.example.com" --depth 2 --limit 50
crawl.mjs "https://docs.example.com" --depth 2 --output ./docs
crawl.mjs "https://example.com" --instructions "Find API docs" --chunks 3`);
process.exit(2);
}
const args = process.argv.slice(2);
if (args.length === 0 || args[0] === "-h" || args[0] === "--help") usage();
const url = args[0];
let maxDepth = 1;
let maxBreadth = 20;
let limit = 50;
let outputDir = null;
let instructions = null;
let chunksPerSource = null;
let extractDepth = "basic";
let selectPaths = null;
let excludePaths = null;
let timeout = 150;
let outputJson = false;
for (let i = 1; i < args.length; i++) {
const a = args[i];
if (a === "--depth") {
maxDepth = Number.parseInt(args[i + 1] ?? "1", 10);
i++;
continue;
}
if (a === "--breadth") {
maxBreadth = Number.parseInt(args[i + 1] ?? "20", 10);
i++;
continue;
}
if (a === "--limit") {
limit = Number.parseInt(args[i + 1] ?? "50", 10);
i++;
continue;
}
if (a === "--output") {
outputDir = args[i + 1];
i++;
continue;
}
if (a === "--instructions") {
instructions = args[i + 1];
i++;
continue;
}
if (a === "--chunks") {
chunksPerSource = Number.parseInt(args[i + 1], 10);
i++;
continue;
}
if (a === "--depth-mode") {
extractDepth = args[i + 1] ?? "basic";
i++;
continue;
}
if (a === "--select") {
selectPaths = args[i + 1];
i++;
continue;
}
if (a === "--exclude") {
excludePaths = args[i + 1];
i++;
continue;
}
if (a === "--timeout") {
timeout = Number.parseFloat(args[i + 1] ?? "150"); // parseFloat takes no radix argument
i++;
continue;
}
if (a === "--json") {
outputJson = true;
continue;
}
console.error(`Unknown arg: ${a}`);
usage();
}
const apiKey = (process.env.TAVILY_API_KEY ?? "").trim();
if (!apiKey) {
console.error("Error: TAVILY_API_KEY not set");
console.error("Get your API key at https://tavily.com");
process.exit(1);
}
const body = {
url: url,
max_depth: maxDepth,
max_breadth: maxBreadth,
limit: limit,
extract_depth: extractDepth,
};
if (instructions) body.instructions = instructions;
if (instructions && chunksPerSource) body.chunks_per_source = chunksPerSource;
if (selectPaths) body.select_paths = [selectPaths];
if (excludePaths) body.exclude_paths = [excludePaths];
if (timeout) body.timeout = timeout;
console.error(`Crawling ${url} (depth: ${maxDepth}, limit: ${limit})...`);
const resp = await fetch("https://api.tavily.com/crawl", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${apiKey}`,
},
body: JSON.stringify(body),
});
if (!resp.ok) {
const text = await resp.text().catch(() => "");
throw new Error(`Tavily Crawl failed (${resp.status}): ${text}`);
}
const data = await resp.json();
if (outputJson) {
console.log(JSON.stringify(data, null, 2));
process.exit(0);
}
// Print results
const results = data.results ?? [];
console.error(`\nCrawled ${results.length} pages\n`);
if (outputDir) {
// Save to files
fs.mkdirSync(outputDir, { recursive: true });
for (const r of results) {
const pageUrl = String(r?.url ?? "").trim();
const content = String(r?.raw_content ?? "").trim();
if (!pageUrl) continue;
// Generate filename from URL
const urlObj = new URL(pageUrl);
let filename = urlObj.pathname.replace(/[^a-zA-Z0-9_-]/g, "_") || "index";
filename = filename.replace(/^_+|_+$/g, "");
filename = filename || "index";
filename += ".md";
const filepath = path.join(outputDir, filename);
fs.writeFileSync(filepath, content, "utf-8");
console.error(`Saved: ${filepath}`);
}
console.error(`\nAll pages saved to ${outputDir}/`);
} else {
// Print to stdout
console.log(`## Crawl Results (${results.length} pages)\n`);
for (const r of results) {
const pageUrl = String(r?.url ?? "").trim();
const content = String(r?.raw_content ?? "").trim();
if (!pageUrl) continue;
console.log(`### ${pageUrl}\n`);
if (content) {
console.log(content);
}
console.log("\n---\n");
}
}
if (data.response_time) {
console.error(`\nResponse time: ${data.response_time}s`);
}